DataStage Developer Guide
Contents
Preface xi
About This Guide xii
Compiling Your Operators xii
Sun Solaris xii
HP-UX xii
Compaq Tru-64 xiii
AIX xiii
Linux Redhat xiii
Class and Member Function Descriptions xiii
HTML-Documented Header Files xiii
Text Header Files xiv
The Orchestrate Documentation Set xiv
Typographic Conventions xv
Typographic Formats xv
Cross-References xvi
Syntax Conventions xvi
Using the Adobe Acrobat Reader xvi
Searching for Text in Orchestrate Documents xvii
Assistance and Additional Information xviii
APT_SubProcessOperator::commandLine() 3-5
Subprocess Operator Example Derivation 3-5
Serializing Pointers 11-11
Serializing Arrays 11-11
Persistence Macros 11-13
Index Index-1
Orchestrate 7.0 Developer Guide
Preface
Describes this Guide and the conventions it uses, and tells you how to find class
information in header files, how to do text searches in Orchestrate documents, and
how to contact customer support.
About This Guide xii
Compiling Your Operators xii
Sun Solaris xii
HP-UX xii
Compaq Tru-64 xiii
AIX xiii
Linux Redhat xiii
Class and Member Function Descriptions xiii
HTML-Documented Header Files xiii
Text Header Files xiv
The Orchestrate Documentation Set xiv
Typographic Conventions xv
Typographic Formats xv
Cross-References xvi
Syntax Conventions xvi
Using the Adobe Acrobat Reader xvi
Searching for Text in Orchestrate Documents xvii
Assistance and Additional Information xviii
Sun Solaris
Use the Sun Pro C++ compiler and the dbx debugger.
Use these compiler options:
-library=iostream -dalign -g -I$(APT_ORCHHOME)/include
HP-UX
Use the HP ANSI C++ A3.33 or A3.37 compiler and the gdb debugger.
Use these compiler options:
-g -I$(APT_ORCHHOME)/include
Compaq Tru-64
Use the Compaq C++ 6.2, 6.3, or 6.5 compiler, and the ladebug debugger.
Use these compiler options:
-stdnew -ieee -nopt -g -I$(APT_ORCHHOME)/include
AIX
Use IBM VisualAge(R) C++ Professional for AIX, version 5.0.20 or 6.0, and the
dbx debugger.
Linux Redhat
Use the gcc/g++ 2.96 compiler, and the dbx debugger.
Use these compiler options:
-g -fPIC -I$(APT_ORCHHOME)/include
Typographic Conventions
Typographic Formats
Table 1 Typographic Formats
bold italic serif — Used for Orchestrate technical terms within the text that defines them. Example: "In pipeline parallelism, each operation runs when it has input data available."
Cross-References
Most cross-references indicate sections located in this book. They are hotspots and
appear in blue typeface in the online version of the document. When there are
references to other books in the documentation set, their names appear in italics.
Syntax Conventions
Operator syntax is presented as you would enter it as part of an osh command.
For a description of the general syntax of an osh command, refer to the Orchestrate
7.0 User Guide.
The following syntax conventions are used throughout this book:
• A vertical bar (|) separates mutually exclusive alternatives.
• Braces ({ }) are used with vertical bars (|) to indicate that one of several
mutually exclusive alternatives is required; for example, {a | b} indicates that
a or b is required.
• If one or more items are enclosed in braces ({ }) and separated by commas, the
items are synonymous. Thus {a , b} indicates that a and b have exactly the
same effect.
• Brackets ([ ]) indicate that the item(s) inside are optional. For example, [a | b]
indicates a, or b, or neither.
• Ellipsis (...) indicates that the preceding item occurs zero or more times. If a
user-provided string is indicated, it may represent a different item in each
occurrence. For example:
– [-key field ...] means zero or more occurrences of -key field, where field
may be a different string in each occurrence.
– To indicate one or more occurrences of an item, the item occurs first
without brackets and ellipsis and then with, for example
-key field [-key field ...]
To see if your copy of Reader has Search, look for the Search icon on the
Reader toolbar, and make sure it is present and not dimmed. The Search icon
is located alongside the somewhat similar Find icon.
If you do not have the appropriate version of Acrobat installed, you may use the
Acrobat Reader 4.05 included with Orchestrate. Find the version you need in one
of the following platform-specific directories:
$APT_ORCHHOME/etc/acroread-sun-405.tar.gz
$APT_ORCHHOME/etc/acroread-aix-405.tar.gz
$APT_ORCHHOME/etc/acroread-osf1-405.tar.gz
$APT_ORCHHOME/etc/acroread-hpux-405.tar.gz
Use the UNIX gunzip and tar commands to unpack the files. Then cd to the
directory *.install (where * contains an abbreviation of your platform name)
and follow the instructions in the INSTGUID.TXT file.
The Orchestrate online documentation set and the full-text search index are located
in $APT_ORCHHOME/doc.
a If the following message appears at the bottom of the Adobe Acrobat Search
window:
No selected indexes are available for search
b Click Add, then navigate the Add Index window to find the full-text search
index, Orchestrate.pdx, located in $APT_ORCHHOME/doc.
c Select the index, then click OK on the Index Selection window.
3 The Search Results window lists the documents that contain your search text,
ordered according to the number of search hits.
4 Select a book in the list.
5 Use the Previous Highlight and Next Highlight buttons to move to the
previous or next instances of your search text.
1 Creating Operators
Describes how to derive your own custom operators from Orchestrate’s
APT_Operator class.
Overview 1-2
Derivation Requirements 1-3
The APT_Operator Class Interface 1-4
APT_Operator Derivation Examples 1-5
APT_Operator Derivation With No Arguments 1-5
APT_Operator Derivation With Arguments 1-9
Including Additional APT_Operator Member Functions 1-14
Defining Parallel and Sequential Operators 1-15
Creating Parallel Operators 1-16
Creating Sequential Operators 1-16
Creating Operators that can be Either Parallel or Sequential 1-17
Specifying Input and Output Interface Schemas 1-17
Introducing Schema Variables 1-19
Introducing Operators with Dynamic Interface Schemas 1-19
Specifying a Partitioning Method 1-20
Specifying Partitioning Style and Sorting Requirements 1-23
Specifying Partitioning Style 1-23
Specifying Sort Keys 1-24
Specifying a Collection Method 1-25
Using Cursors 1-26
Using Field Accessors 1-29
Field Accessors and Schema Variables 1-32
Field Accessors and Dynamic Operators 1-33
Examples of Operator Derivations With Options 1-33
Overview
Operators are the basic functional units of an Orchestrate application. Operators
read records from input data sets, perform actions on the input records, and write
results to output data sets. An operator may perform an action as simple as
copying records from an input data set to an output data set without
modification. Alternatively, an operator may modify a record by adding,
removing, or modifying fields during execution.
You can use predefined operators in your application. Orchestrate supplies a
number of such operators, such as the copy operator (APT_CopyOperator) and
the sample operator (APT_SampleOperator). The Orchestrate 7.0 Operators
Reference describes Orchestrate’s predefined operators.
You can also define your own operators. One type of operator that you can create
wraps a standard UNIX command for execution within an Orchestrate step. This
facility is documented in the Orchestrate 7.0 User Guide.
You can also derive classes from one of three Orchestrate base classes:
APT_Operator, APT_CompositeOperator, and APT_SubProcessOperator.
These correspond to the three types of operators you can define:
• Operators
• Composite operators, which contain more than one operator
• Subprocess operators, which allow you to incorporate a third-party
executable into the framework
This chapter describes the steps required to derive new operators from
APT_Operator. You follow much the same process for operators derived from
APT_CompositeOperator and APT_SubProcessOperator. See Chapter 2,
Creating Composite Operators, and Chapter 3, Creating Subprocess Operators,
for information on these operators. This chapter also provides examples of
variations in the creation process, such as using schema variables or a dynamic
interface schema.
Derivation Requirements
Deriving an operator from the base class APT_Operator requires the following:
• Using the public and private interface of APT_Operator.
• Including support for Run-Time Type Information (RTTI) and persistence.
• Enabling Orchestrate’s argc/argv command-line argument checking facility
by providing a description of the arguments expected by your operator to the
APT_DEFINE_OSH_NAME macro, and overriding the virtual function
initializeFromArgs_() to make your operator osh aware. Orchestrate’s
argument-checking facility is described in Chapter 5, Orchestrate’s
Argument-List Processor.
• Overriding two pure virtual functions, describeOperator() and runLocally(),
and, within those functions, specifying the interface schema that defines the
number and type of data sets that may be used as input or output and how the
records should be processed.
In addition, you can optionally specify:
– The execution mode for the operator: parallel or sequential
– The partitioning method for parallel operators
– The collection method for sequential operators
– The use of cursors, which define the current input or output record of a
data set.
– The use of field accessors, which provide access to any field within a
record; or schema variables, which simplify access to an entire input or
output record.
– Data transfers
• Including support for detecting error and warning conditions and for relaying
that information to users.
• Compiling your operator code, and including its shared library in the libraries
optionally defined by your OSH_PRELOAD_LIBS environment variable.
• Running your operator.
The next figure shows the interface to APT_Operator. You use the member
functions to attach data sets to the operator as well as to attach view and transfer
adapters.
The two pure virtual functions, describeOperator() and runLocally(), and the
virtual function initializeFromArgs_() are included in the protected interface.
You must override these three functions.
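Taken together, these requirements give every APT_Operator derivation the same basic shape. The header sketch below is illustrative only: the class name is a placeholder, and the RTTI and persistence macros shown are the declaration-side counterparts of the support described above; verify their exact form against the Orchestrate headers.

```cpp
#include <apt_framework/orchestrate.h>

// Placeholder class name for illustration.
class MyOperator : public APT_Operator
{
    APT_DECLARE_RTTI(MyOperator);        // Run-Time Type Information support
    APT_DECLARE_PERSISTENT(MyOperator);  // persistence support

public:
    MyOperator();

protected:
    // The virtual function and two pure virtual functions that every
    // derivation must override:
    virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
                                           APT_Operator::InitializeContext context);
    virtual APT_Status describeOperator();  // pre-parallel setup
    virtual APT_Status runLocally();        // per-partition processing
};
```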
The runLocally() function is invoked in parallel for each instance of the operator.
After runLocally() has been called, the postFinalRunLocally() function is
invoked for each data set partition. The default implementation of both functions
is a no-op.
iField:int32; sField:string;
ExampleOperator
iField:int32; sField:string;
Here is the definition for ExampleOperator. The operator simply changes the
contents of its two record fields and then writes each record to a single output
dataset. Comments follow the code.
Code
1 #include <apt_framework/orchestrate.h>
class ExampleOperator : public APT_Operator
{
public:
ExampleOperator();
protected:
9 virtual APT_Status describeOperator();
virtual APT_Status runLocally();
11 virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
};
16 ExampleOperator::ExampleOperator()
{}
24 APT_Status ExampleOperator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema("record (iField:int32; sField:string)", 0);
setOutputInterfaceSchema("record (iField:int32; sField:string)", 0);
return APT_StatusOk;
}
33 APT_Status ExampleOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
APT_InputAccessorToInt32 iFieldInAcc("iField", &inCur);
APT_InputAccessorToString sFieldInAcc("sField", &inCur);
APT_OutputAccessorToInt32 iFieldOutAcc("iField", &outCur);
APT_OutputAccessorToString sFieldOutAcc("sField", &outCur);
while (inCur.getRecord())
{
cout << "*sFieldInAcc = " << *sFieldInAcc << endl;
cout << "*iFieldInAcc = " << *iFieldInAcc << endl;
*sFieldOutAcc = "XXXXX" + *sFieldInAcc;
*iFieldOutAcc = *iFieldInAcc + 100;
cout << "*iFieldOutAcc = " << *iFieldOutAcc << endl;
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
9-11 You must override the virtual function initializeFromArgs_() and the two pure
virtual functions, describeOperator() and runLocally(). The overrides appear in
this example.
13 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. Orchestrate’s argument-checking facility is described in Chapter 5,
Orchestrate’s Argument-List Processor. Also see
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this
macro.
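For example, the macro invocation for ExampleOperator might look like the following line; the osh name example and the argument-description macro name are hypothetical, not taken from the Orchestrate headers:

```cpp
// Hypothetical invocation: registers ExampleOperator under the osh name
// "example", with EXAMPLE_ARGS_DESC as its argument description string.
APT_DEFINE_OSH_NAME(ExampleOperator, example, EXAMPLE_ARGS_DESC);
```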
inRec:*;
HelloWorldOp
outRec:*;
The definition for HelloWorldOp follows. Table 3 contains the code for the header
file, and Table 4 has the code for the .C file. The operator takes two arguments,
which determine how many times the string “hello world” should be printed and
whether it should be printed in upper or lowercase. The operator simply copies
its input to output. Comments follow the code in the tables.
Code
1 #include <apt_framework/orchestrate.h>
public:
7 HelloWorldOp();
8 void setNumTimes(APT_Int32 numTimes);
9 void setUppercase(bool uppercase);
protected:
11 virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
12 virtual APT_Status describeOperator();
13 virtual APT_Status runLocally();
private:
15 APT_Int32 numTimes_;
16 bool uppercase_;
};
Comments
8-9 Declare the C++ initialization methods for this operator. These methods are called
from initializeFromArgs_().
11-13 Declare the three virtual functions that must be overridden. The function
initializeFromArgs_() is the osh initialization function which makes the operator osh
aware. The function describeOperator() specifies the pre-parallel initialization steps,
and runLocally() specifies the parallel execution steps.
Code
1 #include "hello.h"
2 #include <apt_framework/orchestrate.h>
3 #define HELLO_ARGS_DESC \
"{uppercase={optional, description='capitalize or not'}, " \
"numtimes={value={type={int, min=1, max=10}, usageName='times'}, " \
"optional, description='number of times to print message'}}"
7 HelloWorldOp::HelloWorldOp()
: numTimes_(1),
uppercase_(false)
{}
38 APT_Status HelloWorldOp::describeOperator()
{
setKind(eParallel);
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema("record (in:*)", 0);
setOutputInterfaceSchema("record (out:*)", 0);
declareTransfer("in", "out", 0, 0);
setCheckpointStateHandling(eNoState);
return APT_StatusOk;
}
49 APT_Status HelloWorldOp::runLocally()
{
APT_Status status = APT_StatusOk;
APT_InputCursor inCur;
setupInputCursor(&inCur, 0);
APT_OutputCursor outCur;
setupOutputCursor(&outCur, 0);
while(inCur.getRecord() && status == APT_StatusOk)
{
transfer(0);
outCur.putRecord();
int i;
for (i = 0; i < numTimes_; i++)
{
if (uppercase_)
cout << "HELLO, WORLD" << endl;
else
cout << "hello, world" << endl;
}
}
return status;
}
Comments
1 hello.h is the header file for this example operator. It is shown in Table 3.
4 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. Orchestrate’s argument-checking facility is fully described in Chapter 5,
Orchestrate’s Argument-List Processor,. Also see
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this macro.
11-18 The functions setNumTimes() and setUppercase() are the initialization methods for this
operator. The method setNumTimes() sets the number of times to print the string, and
the method setUppercase() sets whether to print in uppercase letters.
19 With your override of initializeFromArgs_(), you can traverse the property list
Orchestrate creates from your argument description string and transfer information
from the operator’s arguments to the class instance, making it osh aware.
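An override of initializeFromArgs_() for HelloWorldOp might be sketched as follows. Treat the property-list traversal calls as assumptions to verify against the APT_PropertyList header; the point is the overall shape: do nothing in the run-time context, then copy each recognized argument into a data member.

```cpp
APT_Status HelloWorldOp::initializeFromArgs_(const APT_PropertyList &args,
    APT_Operator::InitializeContext context)
{
    APT_Status status = APT_StatusOk;
    if (context == APT_Operator::eRun)   // nothing to do at run time
        return status;

    // Walk the property list Orchestrate built from the argument
    // description string (traversal calls are assumptions).
    for (int i = 0; i < args.count(); i++)
    {
        if (args[i].name() == "uppercase")
            uppercase_ = true;
        else if (args[i].name() == "numtimes")
            numTimes_ = (APT_Int32)
                args[i].valueList().getProperty("value", 0).valueDFloat(0);
    }
    return status;
}
```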
• setCollectionMethod()
Specify the collection method for a sequential operator.
• setKind(): Specify parallel or sequential execution of the operator.
• setNodeMap()
Specify how many partitions an operator has and which node each partition
runs on.
• setPartitionMethod()
Specify the partitioning method for a parallel operator.
• setPreservePartitioningFlag() or clearPreservePartitioningFlag()
Modify the preserve-partitioning flag in an output data set.
• setRuntimeArgs()
Specifies an initialization property list passed to initializeFromArgs_() at
runtime.
• setWorkingDirectory()
Set the working directory before runLocally() is called.
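Several of these functions are typically called together from within describeOperator(). The sketch below combines calls that appear elsewhere in this chapter; the operator name and schema strings are illustrative only.

```cpp
APT_Status MyOperator::describeOperator()
{
    setKind(APT_Operator::eParallel);            // parallel execution
    setInputDataSets(1);
    setOutputDataSets(1);
    setInputInterfaceSchema("record (a:int32; inRec:*;)", 0);
    setOutputInterfaceSchema("record (a:int32; outRec:*;)", 0);
    setPartitionMethod(APT_Operator::eAny, 0);   // partitioning for input 0
    return APT_StatusOk;
}
```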
All steps containing a parallel operator will also contain a partitioner to divide the
input data set into individual partitions for each processing node in the system.
The partitioning method can either be supplied by the Orchestrate framework or
defined by you. See “Specifying a Partitioning Method” on page 1-20 for more
information.
Operator 1
(sequential)
Operator 2
(parallel)
In this figure, the interface schema for both the input and the output data sets is
the same. Both schemas require the data sets to have at least three fields: two
integer fields named field1 and field2 and a floating-point field named field3. Any
extra fields in the input data set are dropped by this particular operator. See the
Orchestrate 7.0 User Guide for more information on dropping fields from an input
data set.
Note that you can use view adapters to convert components of the record schema
of a data set to match components of the interface schema of an operator. See
“Using a View Adapter with a Partitioner” on page 8-16 and “Example Collector
with a View Adapter” on page 9-16 for information on view adapters.
outRec:*;
outRec:*;
In this example, the operator specifies both an input and an output interface
schema containing a single schema variable. This figure shows that the operator
allows the user to specify two fields of its input interface schema when the
operator is instantiated. The output interface schema is fixed.
Chapter 15, Advanced Features, contains an example implementation of dynamic
schema, as well as code that creates generic functions and accessors. The code is in
Table 41.
! To set and get the partitioning style of an operator input data set:
Use the APT_Operator functions below.
void setInputPartitioningStyle
(APT_PartitioningStyle::partitioningStyle, int inputDS)
APT_PartitioningStyle::partitioningStyle
getPartitioningStyle(int inputDS)
void setOutputPartitioningStyle
(APT_PartitioningStyle::partitioningStyle, int outputDS)
An example is:
{ key={value=lastname}, key={value=balance} }
! To get the name of the partitioner last used to partition an operator input:
Use this APT_Operator function:
APT_String APT_Operator::getPartitioningName(int input)
• other
You can define a custom collection method by deriving a class from
APT_Collector. Operators that use custom collectors have a collection method
of other.
By default, sequential operators use the collection method any. The any collection
method allows operator users to prefix the operator with a collection operator to
control the collection method. For example, a user could insert the ordered
collection operator in a step before the derived operator.
To set an explicit collection method for the operator that cannot be overridden,
you must include a call to APT_Operator::setCollectionMethod() within
APT_Operator::describeOperator().
You can also define your own collection method for each operator input. To do so,
you derive a collector class from APT_Collector. See Chapter 9, “Creating
Collectors” for more information.
Using Cursors
In order for your operators to process an input data set and write results to an
output data set, you need a mechanism for accessing the records and record fields
that make up a data set. Orchestrate provides three mechanisms that work
together for accessing the records and record fields of a data set: cursors,
subcursors, and field accessors.
Cursors let you reference specific records in a data set, and field accessors let you
access the individual fields in those records. You use cursors and field accessors
from within your override of the APT_Operator::runLocally() function. You use
subcursors only with vectors of subrecords. A subcursor allows you to identify
the current element of the vector.
This section describes cursors; the next section covers field accessors. See
Chapter 7, “Using Cursors and Accessors” for more information on subcursors.
You use two types of cursors with data sets: input cursors and output cursors.
Input cursors provide read-only access to the records of an input data set; output
cursors provide read/write access to the current record of an output data set.
To process an input data set, you initialize the input cursor to point to the current
input record. The cursor advances through the records until all have been
processed. Note that once a cursor has moved beyond a record of an input data
set, that record can no longer be accessed.
For the output data set, the output cursor initially points to the current output
record and advances through the rest of the records. As with input data sets, once
a cursor has moved beyond a record of an output data set, that record can no
longer be accessed.
The following figure shows an operator with a single input and a single output
data set:
direction of
cursor movement direction of
. output data set . cursor movement
. .
. .
Code
APT_Status AddOperator::runLocally()
{
3 APT_InputCursor inCur;
4 APT_OutputCursor outCur;
5 setupInputCursor(&inCur, 0);
6 setupOutputCursor(&outCur, 0);
7 while (inCur.getRecord())
{
9 // body of the loop
10 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
3 Define inCur, an instance of APT_InputCursor, the input cursor for the first data set
input to this operator.
4 Define outCur, an instance of APT_OutputCursor, the output cursor for the first data
set output by this operator.
9 Process the record. This includes writing any results to the current output record. You
should not call APT_OutputCursor::putRecord() until after you have written to the
first output record (unless you want default values), because the output cursor
initially points to the first empty record in an output data set.
Not all operators produce an output record for each input record. Also, operators
can produce more output records than there are input records. An operator can
process many input records before computing a single output record. You call
putRecord() only when you have completed processing an output record,
regardless of the number of input records you process between calls to
putRecord().
See Chapter 7, “Using Cursors and Accessors” for more information on cursors.
Note that within Orchestrate, the fields of an input record are considered read
only. There is no mechanism for you to write into the fields of the records of an
input data set. Because the fields of an output record are considered read/write,
you can modify the records of an output data set.
The following figure shows a sample operator and interface schemas:
input data set
field1:int32; field2:int32;
AddOperator
field1:int32; field2:int32; total:int32;
This operator adds two fields of a record and stores the sum in a third field. For
each of the components of an operator’s input and output interface schemas, you
define a single field accessor. In this case, therefore, you need two input accessors
for the input interface schema and three output accessors for the output interface
schema.
This example uses field accessors to explicitly copy field1 and field2 from an input
record to the corresponding fields in an output record. If the input data set had a
record schema that defined more than these two fields, all other fields would be
dropped by AddOperator and not copied to the output data set.
Table 6 contains an example override of APT_Operator::describeOperator().
Code
APT_Status AddOperator::describeOperator()
{
3 setInputDataSets(1);
4 setOutputDataSets(1);
5 setInputInterfaceSchema("record (field1:int32; field2:int32;)");
6 setOutputInterfaceSchema("record (field1:int32; field2:int32; total:int32;)", 0);
return APT_StatusOk;
}
Comments
5 Because input data sets are numbered starting from 0, specify the interface schema of
input 0. You can simply pass a string containing the interface schema as an argument
to APT_Operator::setInputInterfaceSchema(). For more information, see “Specifying
Input and Output Interface Schemas” on page 1-18. Note that you can omit the
argument specifying the number of the data set for this function because there is only
one input data set to the operator.
6 Specify the interface schema of output 0, the first output data set.
Code
APT_Status AddOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
7 APT_InputAccessorToInt32 field1InAcc("field1", &inCur);
8 APT_InputAccessorToInt32 field2InAcc("field2", &inCur);
9 APT_OutputAccessorToInt32 field1OutAcc("field1", &outCur);
10 APT_OutputAccessorToInt32 field2OutAcc("field2", &outCur);
11 APT_OutputAccessorToInt32 totalOutAcc("total", &outCur);
12 while (inCur.getRecord())
{
14 *totalOutAcc = *field1InAcc + *field2InAcc;
*field1OutAcc = *field1InAcc;
*field2OutAcc = *field2InAcc;
17 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
7-8 Define read-only field accessors for the fields of the input interface schema.
9 -11 Define read/write field accessors for the fields of the output interface schema.
12 Use APT_InputCursor::getRecord() to advance the input data set to the next input
record.
See Chapter 7, “Using Cursors and Accessors” for more information on accessors.
An operator with the interface schema shown below would need to define a field
accessor for field1 but not for the schema variable components inRec and outRec.
field1:int32; inRec:*;
outRec:*;
See “Including Schema Variables in an Interface Schema” on page 1-34 for more
information on schema variables.
outRec:*;
See “Working with a Dynamic Interface Schemas” on page 1-48 for more
information.
Background
The following figure shows an operator containing schema variables in both its
input and its output interface schemas:
field1:int32; inRec:*;
outRec:*;
This figure shows the same operator with an input data set:
inRec:*; inRec:*
transfer
outRec:*;
outRec:*
Many operators have a more complicated interface schema than that shown in the
previous figure. The next figure shows an operator with four elements in its input
interface schema and three elements in its output interface schema:
In this example, the operator adds two fields to the output data set and calculates
the values of those fields. For the operator to complete this task, you must define
field accessors for the three defined fields in the input data set as well as the two
new fields in the output data set. You cannot define accessors for the fields
represented by an interface variable.
The total output interface schema of this operator is the combination of the
schema variable outRec and the two new fields, as shown below:
The order of fields in the operator’s interface schema determines the order of
fields in the records of the output data set. In this example, the two new fields are
added to the front of the record. If the output interface schema were defined as:
"outRec:*; x:int32; y:int32;"
the two new fields would be added to the end of each output record.
A schema variable may contain fields with the same name as fields explicitly
stated in an operator’s output interface schema. For example, if the input data set
in the previous example contained a field named x, the schema variable would
also contain a field named x, as shown below.
name conflict
Orchestrate resolves name conflicts by dropping all fields with the same name as
a previous field in the schema and issuing a warning. In this example, Orchestrate
drops the second occurrence of field x, which is contained in the schema variable.
Performing Transfers
! To perform a transfer, you must:
1 Define the schema variables affected by the transfer using the
APT_Operator::declareTransfer() function within the body of
APT_Operator::describeOperator().
2 Implement the transfer from within the body of APT_Operator::runLocally()
using the function APT_Operator::transfer().
The code in Table 8 below shows the describeOperator() override for
NewOperator, the operator introduced in the previous figure:
Code
APT_Status NewOperator::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
declareTransfer("inRec", "outRec", 0, 0);
return APT_StatusOk;
}
Comments
The function declareTransfer() returns an index identifying this transfer. The first
call to declareTransfer() returns 0, the next call returns 1, and so on. You can store
this index value as a data member, then pass the value to
APT_Operator::transfer() in the operator’s runLocally() function to perform the
transfer.
Because this is the only transfer defined for this operator, the index is not stored.
You could, however, define an operator that takes more than one data set as
input, or one that uses multiple schema variables. If you do, you can define
multiple transfers and, therefore, store the returned index.
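When an operator does declare several transfers, the returned indices can be kept in data members, as in this sketch (the member names and the second input are illustrative):

```cpp
// In describeOperator(): store the index of each declared transfer.
transferIndex0_ = declareTransfer("inRec", "outRec", 0, 0);
transferIndex1_ = declareTransfer("inRec", "outRec", 1, 0);  // second input

// In runLocally(): pass the stored index to transfer() just before
// putRecord() on the corresponding output.
transfer(transferIndex0_);
transfer(transferIndex1_);
```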
With the transfer defined, you implement it. Typically, you call transfer() just
before you call putRecord(). The code in Table 9 below is the runLocally()
override for NewOperator:
Code
APT_Status NewOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
// define accessors
// initialize cursors
// initialize accessors
5 while (inCur.getRecord())
{
// access input record using input accessors
6 transfer(0);
7 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
6 After processing the input record and writing any fields to the output record, use
APT_Operator::transfer() to perform the transfer. This function copies the current
input record to the current output record.
key:string; inRec:*;
filterOperator
outRec:*;
This operator uses a single field accessor to determine the value of key. Because
the operator does not modify the record during the filter operation, you need not
define any output field accessors. Note that to iterate through the records of the
output data set, you must still define an output cursor.
FilterOperator uses the transfer mechanism to copy a record without modification
from the input data set to the output data set. If key is equal to “REMOVE”, the
operator simply does not transfer that record.
The derivation of FilterOperator from APT_Operator is shown in Table 10 below:
Code
#include <apt_framework/orchestrate.h>
class FilterOperator : public APT_Operator
{
public:
FilterOperator();
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
};
16 FilterOperator::FilterOperator()
{}
22 APT_Status FilterOperator::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
26 setInputInterfaceSchema("record (key:string; inRec:*;)", 0);
setOutputInterfaceSchema("record (outRec:*;)", 0);
declareTransfer("inRec", "outRec", 0, 0);
return APT_StatusOk;
}
APT_Status FilterOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
36 setupOutputCursor(&outCur, 0);
37 APT_InputAccessorToString keyInAcc("key", &inCur);
38 while (inCur.getRecord())
{
40 if (*keyInAcc != "REMOVE")
{
42 transfer(0);
43 outCur.putRecord();
}
}
return APT_StatusOk;
}
48 void FilterOperator::serialize(APT_Archive& archive, APT_UInt8)
{}
Comments
13 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. Orchestrate’s argument-checking facility is fully described in
Chapter 5, “Orchestrate’s Argument-List Processor”. Also see
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this
macro.
22 The describeOperator() function defines the input and output interface schema of
the operator, as well as the transfer from inRec to outRec.
26 Specify the interface schema of input 0 (the first input data set). You can simply
pass a string containing the interface schema as an argument to
APT_Operator::setInputInterfaceSchema().
37 Define a read-only field accessor for key in the operator’s input interface schema.
Note that because this operator does not access any fields of an output record, it
does not define any output field accessors.
48 FilterOperator does not define any data members; therefore, serialize() is empty.
You must provide serialize(), even if it is an empty function.
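For operators that do carry state, serialize() archives each data member. A sketch based on HelloWorldOp's two members follows; treat the archiving operator shown as an assumption to verify against the Orchestrate persistence headers, since the key point is simply that every data member is passed to the archive:

```cpp
void HelloWorldOp::serialize(APT_Archive& archive, APT_UInt8)
{
    // Archive each data member so the operator instance can be
    // reconstructed on the processing nodes (assumed Orchestrate
    // bidirectional serialization operator).
    archive || numTimes_;
    archive || uppercase_;
}
```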
interestTable customerData
CalculateInterestOperator
interestRate:dfloat; outRec:*;
The input interface schema of the lookup table specifies two fields: an account
type and an interest rate. The input interface schema of customerData specifies a
single field containing the account type as well as a schema variable.
The output interface schema of the operator specifies a schema variable and the
new field interestRate. This is the field that the operator prepends to each record
of customerData.
The following figure shows CalculateInterestOperator with its input and output
data sets:
[Figure: CalculateInterestOperator with its input data sets (interestTable and customerData) and its output data set, whose interface schema is interestRate:dfloat; outRec:*;]
Code
#include <apt_framework/orchestrate.h>
public:
CalculateInterestOperator();
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
16 CalculateInterestOperator::CalculateInterestOperator()
{}
APT_Status CalculateInterestOperator::describeOperator()
{
24 setKind(APT_Operator::eParallel);
25 setInputDataSets(2);
setOutputDataSets(1);
27 setPartitionMethod(APT_Operator::eEntire, 0);
28 setPartitionMethod(APT_Operator::eAny, 1);
29 setInputInterfaceSchema("record"
" (accountType:int8; interestRate:dfloat;)", 0);
30 setInputInterfaceSchema("record"
" (accountType:int8; inRec:*;)", 1);
31 setOutputInterfaceSchema("record"
" (interestRate:dfloat; outRec:*;)", 0);
return APT_StatusOk;
}
APT_Status CalculateInterestOperator::runLocally()
{
37 APT_InputCursor inCur0;
38 APT_InputCursor inCur1;
39 APT_OutputCursor outCur0;
setupInputCursor(&inCur0, 0);
setupInputCursor(&inCur1, 1);
setupOutputCursor(&outCur0, 0);
47 struct interestTableEntry
52 interestTableEntry lookupTable[10];
53 for (int i = 0; i < 10; ++i)
{
bool gotRecord = inCur0.getRecord();
APT_ASSERT(gotRecord);
lookupTable[i].aType = *tableAccountInAcc;
lookupTable[i].interest = *tableInterestInAcc;
}
APT_ASSERT(!inCur0.getRecord());
61 while (inCur1.getRecord())
{
int i;
for (i = 0; i < 10; i++)
{
if (lookupTable[i].aType == *customerAcctTypeInAcc)
{
68 *interestOutAcc = lookupTable[i].interest;
69 transfer(0);
70 outCur0.putRecord();
}
}
}
74 void CalculateInterestOperator::serialize(APT_Archive& archive,
APT_UInt8)
{}
Comments
13 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. Orchestrate’s argument-checking facility is fully described in Chapter 5,
“Orchestrate’s Argument-List Processor”. Also see
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this macro.
28 Set the partitioning method of input 1, the customer data set. Because this is a parallel
operator and the records in the data set are not related to each other, choose a
partitioning method of any.
32 Declare a transfer from input data set 1, the customer data set to the output data set.
47 Define a structure representing an entry in the lookup table. Each record in the lookup
table has two fields: an account type and the daily interest rate for that type of
account.
53 Use a for loop to read in each record of the table to initialize lookupTable.
Note that for example purposes, the number of elements in the array and the loop
counter are hard coded, but this practice is not recommended.
61 Use a while loop to read in each record of the customer data. Compare the account
type of the input record to each element in the lookup table to determine the daily
interest rate corresponding to the account type.
68 Read the matching interest rate from the table and add it to the current output record.
69 Transfer the entire input record from the customer data set to the output data set.
70 Update the current output record, then advance the output data set to the next output
record.
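The table-lookup join that runLocally() performs can be sketched in standard C++. The struct and function names below are illustrative stand-ins for the Orchestrate records and accessors, not real Orchestrate types:

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-ins for the two inputs of CalculateInterestOperator.
struct TableEntry { int aType; double interest; };   // lookup-table record
struct Customer   { int accountType; };              // customer record
struct OutRecord  { double interestRate; Customer customer; };

// For each customer, find the matching account type in the table and
// prepend the interest rate -- the join runLocally() performs.
std::vector<OutRecord> joinInterest(const std::vector<TableEntry>& table,
                                    const std::vector<Customer>& customers) {
    std::vector<OutRecord> out;
    for (const Customer& c : customers) {        // while (inCur1.getRecord())
        for (const TableEntry& e : table) {      // scan lookupTable
            if (e.aType == c.accountType)        // match on account type
                out.push_back({e.interest, c});  // set interest; transfer; putRecord
        }
    }
    return out;
}
```

The entire-partitioned table is small enough to scan linearly, which is why the operator can afford this nested loop; a larger table would call for a hash map keyed on account type.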
[Figure: ADynamicOperator, with the schema variable inRec:*; in its input interface schema and outRec:*; in its output interface schema]
This operator specifies a schema variable for both the input and the output
interface schema. In addition, when you instantiate the operator, you must
specify two fields of its input interface schema.
To create an operator with such a dynamic interface schema, you typically
provide a constructor that takes an argument defining the schema. This
constructor may take a single argument defining a single interface field, an array
of arguments defining multiple interface fields, or any combination of arguments
specific to your derived operator.
The constructor for the operator shown in the previous figure has the following
form:
ADynamicOperator (char * inputSchema);
The constructor for an operator that supports a dynamic interface must make the
interface schema definition available to the APT_Operator::describeOperator()
function, which contains calls to:
• setInputInterfaceSchema() to define the input interface schema
• setOutputInterfaceSchema() to define the output interface schema
For example:
setInputInterfaceSchema("record(a:string; b:int64; inRec:*;)", 0);
setOutputInterfaceSchema("record (outRec:*;)", 0);
[Figure: an APT_MultiFieldAccessor stepping through the fields of a record with previousField() and nextField()]
To read the field values, you use the following APT_MultiFieldAccessor member
functions:
• getInt8()
• getUInt8()
• getInt16()
• getUInt16()
• getInt32()
• getUInt32()
• getInt64()
• getUInt64()
• getSFloat()
• getDFloat()
• getString()
• getUString()
• getStringField()
• getUStringField()
• getRawField()
• getDate()
• getTime()
• getTimeStamp()
Note that getString() and getUString() return a copy of the field, and
getStringField() and getUStringField() return a reference to the field. You use
overloads of APT_MultiFieldAccessor::setValue() to write to the fields of a
record of an output data set.
After processing the first field, you update the APT_MultiFieldAccessor object to
the next field using the statement:
inAccessor.nextField();
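The positional, type-dispatched access that APT_MultiFieldAccessor provides can be sketched with std::variant in standard C++17. MultiFieldReader and its members are illustrative stand-ins, not the Orchestrate class:

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Illustrative stand-in for a record whose fields have mixed types and
// are read positionally, the way APT_MultiFieldAccessor steps with
// nextField() and reads through the getter matching the field's type.
using Field = std::variant<int, double, std::string>;

struct MultiFieldReader {
    std::vector<Field> fields;
    std::size_t pos = 0;
    void nextField() { ++pos; }   // advance, like inAccessor.nextField()
    int getInt32() const { return std::get<int>(fields[pos]); }
    double getDFloat() const { return std::get<double>(fields[pos]); }
    std::string getString() const { return std::get<std::string>(fields[pos]); }
};
```

As with the real accessor, the caller must call the getter that matches the current field's type; std::get throws on a mismatch, where Orchestrate's accessor has its own error handling.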
[Figure: ADynamicOperator with input interface schema a:type; b:type; inRec:*; and output interface schema outRec:*;]
Using this operator, a user can specify up to two fields of the operator’s input
interface schema. The constructor for this operator takes as input a string defining
these two fields. This string must be incorporated into the complete schema
definition statement required by setInputInterfaceSchema() in
describeOperator().
For example, if the constructor is called as:
ADynamicOperator myOp("field1:int32; field2:int16; ");
the complete input schema definition for this operator would be:
"record (field1:int32; field2:int16; inRec:*;) "
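The constructor's work here is plain string concatenation, as the sketch below shows in standard C++; makeInputSchema is an illustrative helper name, not part of Orchestrate:

```cpp
#include <cassert>
#include <string>

// Build the full input-schema definition the way ADynamicOperator's
// constructor does: wrap the caller's field list in "record(... inRec:*;)".
std::string makeInputSchema(const std::string& userFields) {
    return "record(" + userFields + "inRec:*;)";
}
```

Note that the caller's string must end with a field separator (as in "field1:int32; field2:int16; ") so that the appended schema variable parses correctly.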
Code
#include <apt_framework/orchestrate.h>
#include <strstream.h>
public:
8 ADynamicOperator(char * inSchema);
9 ADynamicOperator();
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
20 ADynamicOperator::ADynamicOperator(char * inSchema)
{
inputSchema = APT_String("record(") + inSchema + "inRec:*;)";
}
APT_Status ADynamicOperator::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema(inputSchema.data(), 0);
setOutputInterfaceSchema(“record (outRec:*;)”, 0);
return APT_StatusOk;
}
APT_Status ADynamicOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
43 APT_MultiFieldAccessor inAccessor(inputInterfaceSchema(0));
44 inAccessor.setup(&inCur);
while (inCur.getRecord())
{
transfer(0);
outCur.putRecord();
}
return APT_StatusOk;
52 void ADynamicOperator::serialize(APT_Archive& archive, APT_UInt8)
{
archive || inputSchema;
}
Comments
8 Declare a constructor that takes a string defining part of the input interface schema.
17 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. Orchestrate’s argument-checking facility is fully described in Chapter 5,
“Orchestrate’s Argument-List Processor”. Also see
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this macro.
20 The constructor creates the complete schema definition statement and writes it to the
private data member inputSchema.
Orchestrate maintains internal storage vectors for Job Monitor metadata and
summary messages. You add your custom information to the storage vectors using
these two functions, which are declared in
$APT_ORCHHOME/include/apt_framework/operator.h:
void addCustomMetadata(APT_CustomReportInfo &);
void addCustomSummary(APT_CustomReportInfo &);
void declareExternalData(
externalDataDirection direction,
APT_String computer,
APT_String softwareProduct,
APT_String storeType,
APT_String storeValue,
APT_String schemaType,
APT_String schemaValue,
APT_String collectionType,
APT_String collectionValue);
Table 13 lists the Orchestrate source and sink operators and the values of their
metadata categories:
Table 13 Metadata for Orchestrate Source/Sink Operators

Operator        Source/Sink  Computer     Software Product  Data Store  Data Schema  Data Collection
dataset(in)     Source       Conductor    FileSystem        Directory   File         File
db2loader       Source       Server Name  DB2               DB Name     Owner        Table
db2lookup       Source       Server Name  DB2               DB Name     Owner        Table
db2readop       Source       Server Name  DB2               DB Name     Owner        Table
hplread         Source       Server Name  Informix          DB Name     Owner        Table
import          Source       File Host    FileSystem        Directory   File         File
infxread        Source       Server Name  Informix          DB Name     Owner        Table
oralookup       Source       Server Name  Oracle            Oracle_SID  Owner        Table
oraread         Source       Server Name  Oracle            Oracle_SID  Owner        Table
sasdriver(in)   Source       File Host    SAS               Directory   File         File
sasimport       Source       File Host    SAS               Directory   File         File
teraread        Source       Server Name  Teradata          DB Name     Owner        Table
xpsread         Source       Server Name  Informix          DB Name     Owner        Table
dataset(out)    Sink         Conductor    FileSystem        Directory   File         File
db2writeop      Sink         Server Name  DB2               DB Name     Owner        Table
export          Sink         File Host    FileSystem        Directory   File         File
hplwrite        Sink         Server Name  Informix          DB Name     Owner        Table
infxwrite       Sink         Server Name  Informix          DB Name     Owner        Table
oraupsert       Sink         Server Name  Oracle            Oracle_SID  Owner        Table
orawrite        Sink         Server Name  Oracle            Oracle_SID  Owner        Table
sasdriver(out)  Sink         File Host    SAS               Directory   File         File
sasexport       Sink         File Host    SAS               Directory   File         File
terawrite       Sink         Server Name  Teradata          DB Name     Owner        Table
xpswrite        Sink         Server Name  Informix          DB Name     Owner        Table
2
Creating Composite Operators
Describes the steps required to define composite operators.
Overview 2-1
A Composite Operator with Two Suboperators 2-2
The APT_CompositeOperator Class Interface 2-4
Composite Operator Example Derivation 2-5
Overview
Composite operators consist of one or more operators that collectively perform a
single action. By combining multiple operators into a composite operator, the
operator user is presented with a single interface, making application
development easier. The operators that make up a composite operator are called
suboperators.
This chapter describes the steps required to define composite operators, using
classes you derive from the Orchestrate base class APT_CompositeOperator. You
follow much the same procedure for creating operators derived from
APT_CompositeOperator as you do for operators derived from APT_Operator.
See Chapter 1, “Creating Operators” for more information on the basic procedure
for creating operators.
A Composite Operator with Two Suboperators
[Figure: a step containing an import operator, a composite operator whose two suboperators (sort and filter) are connected by a virtual data set, and an export operator]
As an example of the first reason, you may want to create a single operator that
removes all duplicate records from a data set. To do this, you need an operator to
sort the data set so that the duplicate records are adjacent. One solution is to select
a sort operator followed by the remdup operator which removes duplicate
records. By creating a single composite operator from these two operators, you
have to instantiate and reference only a single operator.
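The sort-then-remove-duplicates pipeline that the composite operator packages is the classic sort/unique idiom, sketched here on a plain vector in standard C++ (the function name is ours, not Orchestrate's):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort so duplicate values become adjacent (the sort suboperator),
// then erase adjacent repeats (the remdup suboperator).
std::vector<int> removeDuplicates(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

The composite operator plays the role of this single function: the user sees one call, and the two-stage pipeline inside is an implementation detail.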
The following figure illustrates the second reason for creating composite
operators: allowing an operator to use data already partitioned by the previous
operator. Here, you can see that the first suboperator partitions the input data
sets. The second suboperator does not define a partitioning method; instead, it
uses the already partitioned data from the previous operator.
[Figure: a composite operator whose first suboperator partitions the input data sets and whose second suboperator uses the already partitioned data]
You could use a composite operator to create a join and filter operator. In this
case, the first suboperator would join the two data sets and the second
suboperator would filter records from the combined data set. The second
suboperator, filter, simply processes the already partitioned data set. In order for
this to occur, you must specify same as the partitioning method of the second
suboperator.
Code
#include <apt_framework/orchestrate.h>
public:
RemoveDuplicatesOperator();
protected:
9 virtual APT_Status describeOperator();
private:
11 APT_DataSet * tempDS;
12 SortOperator * sortOp;
13 RemoveOperator * removeOp;
};
14 #define ARGS_DESC "{. . .}"
15 APT_DEFINE_OSH_NAME(RemoveDuplicatesOperator, RemoveDups, ARGS_DESC);
16 APT_IMPLEMENT_RTTI_ONEBASE(RemoveDuplicatesOperator,
APT_CompositeOperator);
17 APT_IMPLEMENT_PERSISTENT(RemoveDuplicatesOperator);
APT_Status RemoveDuplicatesOperator::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
36 markSubOperator(sortOp);
37 markSubOperator(removeOp);
return APT_StatusOk;
}
Comments
11 Define a pointer to a temporary data set that connects the two suboperators. This
data set will be dynamically allocated in the constructor for RemoveDuplicates.
12 Dynamically allocate the suboperators of the composite operator. This line defines
sortOp, a pointer to a SortOperator, as a private data member of the composite
operator. You instantiate SortOperator in the constructor for RemoveDuplicates.
15 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. See $APT_ORCHHOME/include/apt_framework/osh_name.h for
documentation on this macro. Orchestrate’s argument-checking facility is fully
described in Chapter 5, “Orchestrate’s Argument-List Processor”.
27 Specify tempDS as the output data set of sortOp. Note that you need not specify an
input data set; the function APT_CompositeOperator::redirectInput(), called in the
override of APT_Operator::describeOperator(), specifies the input data set.
28 Specify tempDS as the input data set of removeOp. Note that you do not have to
specify an output data set. The function
APT_CompositeOperator::redirectOutput() in the override of
APT_Operator::describeOperator() specifies the output data set.
3
Creating Subprocess
Operators
Tells you how to incorporate a third-party executable into your
Orchestrate application.
Overview 3-1
The APT_SubProcessOperator Class Interface 3-2
Deriving from APT_SubProcessOperator 3-3
APT_SubProcessOperator::runSource() 3-3
APT_SubProcessOperator::runSink() 3-4
APT_SubProcessOperator::commandLine() 3-5
Subprocess Operator Example Derivation 3-5
Overview
You can incorporate a third-party application package into your Orchestrate
application even if it was not written for execution in a parallel environment;
Orchestrate runs the application in parallel.
To execute a third-party application, you create a subprocess operator, which is
derived from the Orchestrate abstract base class APT_SubProcessOperator. A
subprocess operator has all the characteristics of a standard Orchestrate operator,
including:
• Taking data sets as input
• Writing to data sets as output
• Using cursors to access records
The APT_SubProcessOperator Class Interface
[Figure: a subprocess operator for a third-party application; runSource() feeds the application's stdin through a UNIX pipe, runSink() reads the application's stdout through a UNIX pipe, commandLine() supplies the command that invokes the application, and the application's stderr goes to the error log]
The subprocess operator shown in this figure takes a single input data set and
writes its results to a single output data set. The third-party application must be
configured to take its input from stdin, to write any output to stdout, and to write
any error messages to stderr.
2 Create a buffer large enough to hold the current input record and any
additional information that you want to add to the record.
3 Call APT_Operator::transferToBuffer() to transfer the current input record to
the buffer you created in step 2.
4 Perform any preprocessing on the record required by the third-party
application.
5 Call APT_SubProcessOperator::writeToSubProcess() to copy the buffer to
the third-party application. This application must be configured to receive
inputs over stdin.
APT_SubProcessOperator::runSink()
The function runSink() reads a buffer back from a third-party application. The
returned buffer may contain a record, record fields, results calculated from a
record, and any other output information. runSink() allows you to perform any
post-processing on the results after getting them back from
APT_SubProcessOperator.
To read a fixed-length buffer back from the third-party application, you perform
the following steps:
1 Determine the buffer length. Typically, you call
APT_SubProcessOperator::readFromSubProcess() to read a fixed-length
buffer containing the length of the results buffer. The buffer is read from the
stdout of the subprocess.
2 Allocate a buffer equal to the length determined in step 1.
3 Call APT_SubProcessOperator::readFromSubProcess() again, this time
reading the fixed-length buffer from the third-party application.
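The fixed-length protocol in steps 1 through 3 is length-prefixed framing: a fixed-size header carries the size of the payload that follows. The sketch below works over an in-memory byte stream in standard C++, with the readFromSubProcess() calls replaced by plain buffer reads:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Read one length-prefixed message: a 4-byte length header followed by
// that many payload bytes -- the same framing as the two
// readFromSubProcess() calls described in the steps above.
std::string readFramed(const std::vector<unsigned char>& stream,
                       std::size_t& off) {
    std::uint32_t len = 0;
    std::memcpy(&len, &stream[off], sizeof len);   // step 1: read the length
    off += sizeof len;
    std::string payload(reinterpret_cast<const char*>(&stream[off]),
                        len);                      // steps 2-3: read the payload
    off += len;
    return payload;
}
```

A real subprocess protocol must also agree on byte order and header width between the operator and the third-party program; this sketch assumes both sides use the host's native layout.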
To read a variable-length buffer, you perform the following steps:
1 Call APT_SubProcessOperator::getReadBuffer() to read a block of data back
from the subprocess. This function returns a pointer to a buffer containing
data read from the subprocess.
2 Parse the buffer to determine field and record boundaries. Process the buffer
as necessary.
To read a character-delimited buffer, call
APT_SubProcessOperator::getTerminatedReadBuffer() to read a block of data
up to a specified delimiter back from the subprocess. This function returns a
pointer to a buffer containing data read from the subprocess.
To write the returned buffer to an output data set, you typically would call
APT_Operator::transferFromBuffer(). This function transfers the results buffer to
the output record of the operator.
APT_SubProcessOperator::commandLine()
You override the pure virtual function commandLine() to pass a command line
string to the third-party application. This string is used to execute the subprocess.
As part of the command line, you must configure the subprocess to receive all
input over stdin and write all output to stdout. Orchestrate calls this function
once to invoke the third-party application on each processing node of your
system.
For example, the UNIX gzip command takes its input from standard input by
default. You can use the -c option to configure gzip to write its output to standard
output. An example command line for gzip would be:
"/usr/local/bin/gzip -c"
Including any redirection symbols, for either input to or output from gzip, causes
an error.
Zip9Operator
Zip9Operator takes as input a data set containing exactly four fields, one of which
contains a five-digit zip code. As output, this operator creates a data set with
exactly four fields, one of which is a nine-digit zip code.
The input strings cannot contain commas because the UNIX utility ZIP9 takes as
input comma-separated strings. Also, the total line length of the strings must be
less than or equal to 80 characters.
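Building the comma-separated line and enforcing the 80-character limit can be sketched in standard C++; buildZipLine is an illustrative helper, not part of the operator's actual source:

```cpp
#include <string>
#include <vector>

// Join address fields with commas, as runSource() does before calling
// writeToSubProcess(); returns an empty string if the result would
// exceed the 80-character line length ZIP9 accepts.
std::string buildZipLine(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) line += ',';
        line += fields[i];
    }
    return line.size() <= 80 ? line : std::string();
}
```

A production operator would also reject fields containing commas, since a comma inside a field would shift the remaining fields ZIP9 parses.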
Code
#include <apt_framework/orchestrate.h>
public:
Zip9Operator();
protected:
9 virtual APT_Status describeOperator();
10 virtual APT_String commandLine() const;
11 virtual APT_Status runSink();
12 virtual APT_Status runSource();
17 Zip9Operator::Zip9Operator()
{}
APT_Status Zip9Operator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
28 setInputInterfaceSchema("record"
" (street:string; city:string; state:string; zip5:string;)", 0);
29 setOutputInterfaceSchema("record"
" (street:string; city:string; state:string; zip9:string;)", 0);
36 APT_Status Zip9Operator::runSource()
{
APT_InputCursor inCur;
setupInputCursor (&inCur, 0);
char linebuf[80];
while (inCur.getRecord())
{
// This code builds a comma-separated string containing the street,
// city, state, and 5-digit zipcode, and passes it to ZIP9.
writeToSubProcess(linebuf, lineLength);
}
66 APT_Status Zip9Operator::runSink()
{
APT_OutputCursor outCur;
setupOutputCursor(&outCur, 0);
char linebuf[80];
while (1)
{
// read a single line of text from the subprocess
size_t lineLength = readTerminatedFromSubProcess(
linebuf, '\n', 80);
if (lineLength == 0) break;
Comments
14 With APT_DEFINE_OSH_NAME, you connect the class name to the name used
to invoke the operator from osh and pass your argument description string to
Orchestrate. See $APT_ORCHHOME/include/apt_framework/osh_name.h for
documentation on this macro. Orchestrate’s argument-checking facility is
described in Chapter 5, “Orchestrate’s Argument-List Processor”.
28 Specify the input interface schema of the first input data set.
29 Specify the output interface schema of the first output data set.
112 Zip9Operator does not define any data members; therefore, serialize() is empty.
You must provide serialize() even if it is an empty function.
4
Localizing Messages
Describes the process of changing operator message code so that Orchestrate
messages can be output in multiple languages.
Introduction 4-2
Using the Messaging Macros 4-2
Including Messaging Macros in Your Source File 4-2
Descriptions of the Message Macros 4-3
APT_MSG() 4-3
APT_NLS() 4-4
APT_DECLARE_MSG_LOG() 4-5
Message Environment Variables 4-5
Localizing Message Code 4-5
Using the APT_MSG() Macro 4-5
Example Conversions 4-6
Converting a Message With No Run-Time Variables 4-6
Converting a Message With Run-time Variables 4-6
Converting a Multi-Line Message 4-6
Steps for Converting Your Pre-NLS Messages 4-7
Eliminating Deprecated Interfaces 4-9
Introduction
Orchestrate National Language Support (NLS) makes it possible for you to
process data in international languages using Unicode character sets. Orchestrate
uses International Components for Unicode (ICU) libraries to support NLS
functionality. For information on Orchestrate’s national language support, see
Chapter 7: National Language Support in the Orchestrate 7.0 User Guide; and access
the ICU home page:
http://oss.software.ibm.com/developerworks/opensource/icu/project
A key part of internationalizing Orchestrate is to make it possible for you to
localize its English-language messages. Orchestrate currently provides the
functionality for outputting messages in Japanese. The next section describes the
source code that makes it possible for you to output Orchestrate’s messages in
any language so that Orchestrate can be used throughout the world.
Orchestrate’s code utilizes the source-code localization functionality of the ICU
library. Orchestrate provides macros that interface with ICU, so that you can
localize Orchestrate without having to deal directly with the ICU substructure.
long int, unsigned long int, long long, unsigned long long, float, double,
char*, APT_String, APT_Int53, UChar*, or APT_UString.
APT_NLS()
This macro does not issue an Orchestrate message but simply returns an
APT_UString containing the localized version of the englishString. This is
needed to pass localized strings to other functions that output messages. This
macro can be called from .C files only.
Its syntax is:
APT_NLS(englishString, argumentArray )
An example is:
APT_Formattable args [] = { hostname };
error_ = APT_NLS("Invalid hostname: {0}", args);
Here a member variable is set to a localized string that can be output later.
The two arguments to APT_NLS() are identical to the englishString and
argumentArray arguments of APT_MSG(). See “APT_MSG()” on page 4-3 for the
argument descriptions.
If no run-time arguments are needed, the value of the APT_Formattable array
should be NULL.
englishString cannot contain a right parenthesis followed by a semi-colon ( ); ).
See “Including Messaging Macros in Your Source File” on page 4-2 for how to
add a message to the messaging system.
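The {0}, {1}, ... placeholders in the englishString are positional substitutions filled from the APT_Formattable array. The minimal stand-in formatter below, in standard C++, only illustrates the substitution scheme; Orchestrate's real implementation is backed by ICU message formatting:

```cpp
#include <string>
#include <vector>

// Substitute "{0}", "{1}", ... in the template string with the matching
// argument -- the positional scheme APT_NLS("... {0}", args) relies on.
// Single-digit indices only, for brevity.
std::string formatMessage(std::string tmpl,
                          const std::vector<std::string>& args) {
    for (std::size_t i = 0; i < args.size() && i < 10; ++i) {
        std::string key = "{" + std::to_string(i) + "}";
        // Replace every occurrence of the key, resuming after the
        // substituted text so the argument itself is never rescanned.
        for (std::size_t p = tmpl.find(key); p != std::string::npos;
             p = tmpl.find(key, p + args[i].size())) {
            tmpl.replace(p, key.size(), args[i]);
        }
    }
    return tmpl;
}
```

Keeping the arguments positional (rather than concatenating them into the string) is what lets translators reorder them for languages with different word order.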
APT_DECLARE_MSG_LOG()
This macro uniquely identifies a message. It must appear in files that call
APT_MSG() and APT_NLS().
Its syntax is:
APT_DECLARE_MSG_LOG(moduleId , "$Revision:$");
where stem is a string specifying a unique identifier for the source module, and
CUST is the identifier for custom operators. For example:
APT_Error::SourceModule APT_customOpId("CUST");
APT_DECLARE_MSG_LOG(APT_customOp, "$Revision:$")
See “Including Messaging Macros in Your Source File” on page 4-2 for how to
add a message to the messaging system.
Example Conversions
Converting a Message With No Run-Time Variables
This errorLog() message has no runtime variables:
*errorLog() << "There must be at least 1 coordinate in the input
vectors." << endl;
errorLog().logError(APT_CLUSTERQUALITYOP_ERROR_START+2);
This converts to:
APT_MSG(Error, "There must be at least 1 coordinate in the input "
"vectors.", NULL, errorLog());
Converting a Multi-Line Message
This message uses several stream operators (<<) over three lines; however, it is a
single error message and should be converted using a single APT_MSG() macro:
APT_MSG(Error, "Output schema has duplicate field names. If the "
"-flatten key is being used it is likely generating a scalar "
"field name that is the same as another input field.",
NULL, errorLog());
into this:
if (rep_->numInputs_ > 128)
{
APT_MSG(Fatal,
"The number of inputs attached must be no more than 128",
NULL, NULL);
}
into this:
if (rep_->numInputs_ > 128)
{
APT_MSG(Fatal,
"The number of inputs attached must be no more than 128",
NULL,localErrorModule);
}
into this:
APT_Formattable args[] = {name, flagString(dispositionWhenOK),
flagString(dispositionWhenFailed), fullName};
APT_MSG(Info,"APT_PMCleanUp::registerFileImpl({0}, {1}, {2})"
" - path registered is {3}", args, NULL);
into this:
APT_MSG(Info, "Timestamp message test 1", NULL, localErrorModule);
into this:
APT_MSG(Fatal, "Timestamp message test", NULL, NULL);
into this:
APT_MSG(Fatal, "Timestamp message test", NULL, errorModule);
into this:
APT_Formattable args [] = { numCoords };
APT_MSG(Error,
"There must be at least {0} coordinates in the input vectors.",
args, errorLog());
to:
log.appendLog(subLog, APT_NLS("Error when checking operator:"));
to:
APT_Formattable args [] = { path_.unparse(), sub(vecLen, vecElt),
data, bufferSave-recStart };
*log.prepend(APT_NLS("Trouble importing field \"{0}{1}\"{2}, at
offset: {3}: ", args));
to:
APT_Formattable args [] =
{ rep_->goodRecords_+ rep_->badRecords_ };
*logp.dump(APT_NLS("Import warning at record {0}: ", args));
The classes and functions that have been changed for NLS are too numerous to
list. Refer to the header file for the function or class that causes the error, and
make one of these two types of changes:
• Change one or more function prototype parameters containing char* or
APT_String to parameters that use APT_UString. This can often be done by
casting.
or
• In cases where the NLS version of a function could not be overloaded, a “U”
has been added to the beginning of a name, changing it from function_name to
Ufunction_name in the header file. An example is:
#ifndef _NLS_NO_DEPRECATED_STRING_INTERFACE_
APT_String ident(); // deprecated
#endif
APT_UString Uident();
Substituting the function defined on the fourth line of the example for the
function defined on the first line resolves the compilation error in your .C file.
Overview 5-2
The Advantages of Argument-List Processing 5-2
Argument-List Descriptions 5-3
Supplying an Argument-List Description 5-3
Argument-List Description Elements 5-3
Structure of the Orchestrate-Generated Property List 5-4
Argument-List Description Syntax 5-5
Argument-List Description Examples 5-16
Property List Encoding of Operator Arguments 5-19
Example Property List 5-19
Traversing a Property List 5-19
Error Handling 5-20
Argument-Description Syntax Errors 5-20
Command-Line Argument Errors 5-20
Usage Strings 5-21
Converting Your Wrappered Operators 5-22
Converting a Composite Operator 5-23
Passing Arguments to a Suboperator 5-23
Example Code 5-24
Registering Your Operator Library 5-25
Executing Your Custom Operator 5-25
Overview
Orchestrate provides a standardized argc/argv argument list processor to parse
the command-line arguments to your custom operators, including your custom
partitioner and collector operators.
By defining an argument description string in your C++ code, you provide a
description of the arguments expected by your custom operator. Orchestrate uses
your description to guide its interpretation of the argument list given to your
custom operator at run time.
Using your argument description string and the current run-time argument list,
Orchestrate detects any invalid-argument errors that prevent execution of your
operator. If there are no errors, Orchestrate provides your operator code with a
property-list of the validated arguments.
With the knowledge that Orchestrate initially and automatically performs a large
set of error checks, you can add operator code that traverses the Orchestrate-
generated property list to get the argument-validation information you need to
decide how to run your operator.
This chapter gives you the syntax you need to write your argument descriptions.
It also supplies examples of argument-description strings along with sample osh
commands for the described operators, as well as examples of Orchestrate-
generated property lists that are based on the sample argument-description
strings and osh command-line arguments. It also shows you how to traverse a
property-list.
Note Your argument specification string now replaces the separate .op wrapper file
that was required in previous versions of Orchestrate; however, your existing
wrapped operators continue to be supported. To take advantage of
Orchestrate’s argument processing for your existing wrapped operators, convert
them by following the steps outlined in “Converting Your Wrappered
Operators” on page 5-22.
• It creates a property list you can traverse to examine the input arguments and
their values.
• It supplies error messages for arguments that do not conform to their
descriptions.
Argument-List Descriptions
Supplying an Argument-List Description
You supply a description of the arguments your operator accepts to the
APT_DEFINE_OSH_NAME macro in the form of a quoted list of argument
descriptions. Argument-list descriptions can contain Unicode characters. The
description’s character set is determined by the -output_charset option, and its locale is the
same as the locale for Orchestrate output messages. Locale is user-specified at
install time. See the Orchestrate 7.0 User Guide for information on character sets
and locales.
Note The terms arguments and subarguments correspond to operator options and
suboptions.
The operator’s implementation may be spread across several files, but one of the
files is usually named with the operator’s name plus a .C extension. The call to
APT_DEFINE_OSH_NAME can go at the beginning of this file.
The macro accepts three arguments:
APT_DEFINE_OSH_NAME(C, O, U)
C is the class name associated with the operator, O is the osh name for the
operator, and U is the argument description string. For example:
APT_DEFINE_OSH_NAME(APT_TsortOperator, tsort, TSORT_ARGS_DESC)
At run time, Orchestrate uses your argument description and the actual
arguments given to your operator to produce a property-list encoding of the
arguments and their values to your override of initializeFromArgs_().
The APT_DEFINE_OSH_NAME macro is defined in
$APT_ORCHHOME/include/apt_framework/osh_name.h. The initializeFromArgs_()
function is defined in $APT_ORCHHOME/include/apt_framework/operator.h,
partitioner.h, and collector.h.
The argument description string for the tsort operator is given in full in
“Argument-List Description Examples” on page 5-16.
For an operator without arguments, you can supply a minimal argument
description, which need only include the otherInfo parameter and its
description subparameter.
For example:
{ otherInfo = {description = "this operator has no arguments."} }
The argument name is always present as the argName property in the list. Value
properties will be present according to whether the argument item has any
values. Subarguments will be present when the argument item has
subarguments. If an argument item does not have a value or subarguments, it just
appears as an empty property in the property list.
The property list presents argument items in the order in which they are
encountered on the command line. For example, given the argument description
for the tsort operator and this osh command line:
tsort -key product -ci -sorted -hash
-key productid int32
-memory 32
-stats
The format of your argument description string must adhere to these rules:
1 Comments are not permitted inside the argument description.
2 Each line of the description string must end in a backslash and be quoted.
For example:
#define HELLO_ARGS_DESC \
"{ "\
"  uppercase = {optional, description = 'capitalize?'}, "\
"  numtimes = { value = { type = {int, min=1, max=10 }, "\
"                         usageName='times'}, "\
"              optional, "\
"              description='number of times to print' "\
"  } "\
"}"
Note For readability, the argument-description syntax table and the examples in the
header files omit the quotes and backslashes.
24 otherInfo = req
{
26 exclusive = {name, name, ... }, op; 0 or more
27 exclusive_required = {name, name, ... }, op; 0 or more
28 implies = {name, name, ... }, op; 0 or more
29 description = string, op; goes in usage string
30 inputs = dataset_type_descriptions, req
31 outputs = dataset_type_descriptions, req
34 type =
{
string, must be the first property
list = { string, string, ... }, op; list of legal values
regexp = regexp, op; regexp for legal values
case = sensitive | insensitive op; default: case-insensitive
40 }
42 type =
{
ustring, must be the first property
list = { ustring, ustring, ... }, op; list of legal values
regexp = regexp, op; regexp for legal values
case = sensitive | insensitive op; default: case-insensitive
48 }
50 type =
{
int, must be the first property
min = int, op; no lower limit by default
max = int, op; no upper limit by default
list = {int, int, ... } op; list of legal values; list exclusive with min/max
56 }
58 type =
{
float, must be the first property
min = float, op; no lower limit by default
max = float, op; no upper limit by default
list = {float, float, ... } op; list of legal values; list exclusive with min/max
64 }
66 type =
{
fieldType, must be the first property
75 type =
{
propList, must be first property
elideTopBraces, op
requiredProperties = { property, property, ... }, op
81 }
83 type =
{
schema, must be the first property
acceptSubField, op; default: top-level only
acceptVector, op; default: no vectors
acceptSchemaVar op; default: no schema vars
91 type =
{
fieldName, must be the first property
input | output, req
acceptSubField op; default: top-level only
96 }
98 type =
{
fieldTypeName, must be the first property
list = { name, name, ... }, op; list of legal type names
noParams op; default: params accepted
103 }
105 type =
{
pathName, must be the first property
canHaveHost, op
defaultExtension = string op
110 }
inputs | outputs =
{
113 portTypeName = req; name of the dataset
{
115 description = string, req
Syntax Line    Comment
4 There can be zero or more value properties to describe the values that are
expected to follow an argument’s name on the command line. The order in
which the value entries are listed determines the order in which the values
must be presented on the command line after the argument name.
6 A single type subproperty must be supplied. Its syntax allows you to specify
one of nine data types which are described starting on line 34 below.
7 The required usageName property defines the name of the type as it appears
on the command line.
8 The optional property indicates that the value need not be supplied on the
command line. Only the final value is optional. The optional property itself is
optional.
9 You can use the optional default flag property to specify a value to be used
when no value is given on the command line. This only affects the generated
usage string. The type’s print/scan generic function is used to read the literal
value of the type.
10 The deprecated flag property is optional. When supplied, the value description
is omitted by default from generated usage strings.
13-14 By default, an argument vector may have any number of items matching an
argument description. You can restrict the number of times an argument may
occur by using the optional minOccurrences and maxOccurrences properties.
The default values are 0 for minOccurrences and any integer for
maxOccurrences.
16 With oshName, you specify a non-default name for the argument on the osh
command line.
17 An argument may optionally have one or more osh alias names, allowing you
to provide abbreviations, variant spellings, and so on. You specify them using
the oshAlias property.
19 You use the optional default flag property to indicate that the argument
represents a default that is in force if this argument (and typically other related
arguments in an exclusion set with this argument) is not present. This
information is put into the generated usage string, and has no other effect.
20 With the hidden flag property you can optionally describe arguments that are
not normally exposed to the operator user. Hidden argument descriptions are,
by default, omitted from generated usage strings.
21 You can optionally use the deprecated flag property to indicate that an
argument description exists only for backward compatibility. Its argument
description is omitted by default from generated usage strings.
26 With the exclusive constraint, you optionally name a set of arguments which
are mutually exclusive. Multiple exclusive sets may be defined.
28 An implies constraint specifies that if one given argument occurs, then another
given argument must also be supplied.
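For instance, a hedged sketch of an otherInfo clause combining these constraints (the option names a, b, and c are hypothetical, and the assumption here is that an implies pair is read left to right, i.e. if -c occurs then -a must also be supplied):

```
otherInfo = { exclusive = {a, b},
              implies = {c, a},
              description = "constraint example" }
```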
30-31 Both the inputs and outputs properties are required. If they are omitted, warning
messages are emitted when the operator is run.
34-102 The type property must be the first subproperty in a value clause. It must be an
Orchestrate type.
For example:
value = { type = int32,
usageName = “mbytes”,
default = 20
}
34-40 String type. The list and regexp subproperties optionally specify the legal
values, either in list form or in a regular expression. If neither of these two
subproperties is specified, any string value is accepted for the argument.
When case has its default value of insensitive, list matching is performed in a
case-insensitive manner, and the regexp is evaluated on a copy of the string
value that has been converted to lowercase.
42-48 Ustring type. The list and regexp subproperties optionally specify the legal
values, either in list form or in a regular expression. If neither of these two
subproperties is specified, any ustring value is accepted for the argument.
When case has its default value of insensitive, list matching is performed in a
case-insensitive manner, and the regexp is evaluated on a copy of the string
value that has been converted to lowercase.
50-56 Integer type. The min and max subproperties are optional. By default, there are
no lower or upper limits. The optional list subproperty specifies a list of legal
values. It is exclusive with min and max.
Integer values are 32 bits, signed. The field value is encoded as a dfloat in the
argument’s value = value property.
58-64 Float type. The min and max subproperties are optional. By default, there are
no lower or upper limits. The optional list subproperty specifies a list of legal
values. It is exclusive with min and max.
Floating-point values are double precision. The field value is encoded as a
dfloat in the argument’s value = value property.
66-73 FieldType type. The optional min and max subproperties may be specified if
the field type supports ordered comparison. By default, there are no lower or
upper limits. The optional list subproperty may be provided if the field type
supports equality comparisons. It specifies a list of legal values and is exclusive
with min and max.
The print/scan generic function is used to parse the type_literal values.
The optional compareOptions subproperty adjusts how comparisons are done
with the min, max, and list values.
The field value is encoded as a string in the argument’s value = value property.
91-96 FieldName type. You must also specify either input or output. The
acceptSubField subproperty is optional. Its default value is top-level field only.
98-103 FieldTypeName type. Using the optional list subproperty, you can specify
acceptable type names. The noParams subproperty is also optional. The default
is to accept type parameters.
113 You use the required portTypeName property to specify a one-word name for
the port. Input and output ports are the same as input and output datasets.
115 With the required description property, you can describe the purpose of the
port.
116 Use the oshName property to specify the port name for the osh command line.
117-118 You can restrict the number of times a portTypeName property may occur by
using the optional minOccurrences and maxOccurrences subproperties. The
default values are 0 for minOccurrences and any integer for maxOccurrences.
120 The optional required subproperty specifies one and only one occurrence of a
portTypeName property.
121 The optional once subproperty has the same functionality as the required
subproperty. It specifies one and only one occurrence of a portTypeName
property.
124 The constraints property is optional. If it is present, it may not be the empty list.
The syntax supplied provides simple constraint types that make it convenient
to describe most simple cases.
126 The ifarg constraint subproperty specifies that the port type does not appear
unless the argName has been specified. This subproperty can appear more than
once, to specify multiple “enabling” options combined by logical OR. An
example is the reject option for import/export.
127 The ifnotarg constraint subproperty indicates that the port type only appears if
the argName has not been specified. This subproperty can appear more than
once to specify multiple “disabling” options, which are combined by logical
OR. An example is the createOnly option for the lookup operator.
128 The ifargis constraint subproperty indicates that the port type appears if the
specified argName has the specified argValue.
This suboption can be specified more than once to specify multiple “enabling”
values. It can be combined with ifarg or ifnotarg. If it is specified alone, it is
effectively equivalent to also specifying ifarg for the same argName.
An example is “ifNotFound = reject” for the lookup operator.
129 The argcount constraint subproperty indicates that the port type appears
exactly as many times as the argName appears. An example is the percent
option for the sample operator.
131 The portcount constraint subproperty indicates an output port type that
appears as many times as an input port type with the specified portName. An
example is the reject outputs of the merge operator.
133 The incomplete flag indicates the provided input/output description is not
completely accurate given the complexity of the operator’s behavior.
Note The example argument description strings in this section are shown without line
quotes and backslashes. See the argument description in “Argument-List
Description Syntax” on page 5-5 for an example with line quotes and
backslashes.
clustered={optional,
description=
"records are grouped by this key"
},
param={value={type={propList, elideTopBraces},
usageName="params"
},
optional,
description="extra parameters for sort key"
},
otherInfo={exclusive={ci, cs},
exclusive={asc, desc},
exclusive={sorted, clustered},
description="Sub-options for sort key:"
},
},
description="specifies a sort key"
},
memory={value={type={int32, min=4},
usageName="mbytes",
default=20
},
optional,
description="size of memory allocation"
},
flagCluster={optional,
description="generate flag field identifying
clustered/sorted key value changes
in output"
},
stable={optional,
default,
description="use stable sort algorithm"
},
nonStable={silentOshAlias="-unstable",
optional,
description="use non-stable sort algorithm
(can reorder same-key records)"
},
stats={oshAlias="-statistics",
optional,
description="print execution statistics"
},
unique={oshAlias="-distinct",
optional,
description=
"keep only first record of same-key runs in output"
},
keys={value={type={schema},
usageName="keyschema"
},
deprecated=key,
maxOccurrences=1,
description="schema specifying sort key(s)"
},
seq={silentOshAlias="-sequential",
deprecated,
optional,
description="select sequential execution mode"
},
otherInfo={exclusive={stable, nonStable},
           exclusive_required={key, keys},
           description="Orchestrate sort operator:",
           inputs={unSorted={description="unsorted dataset",
                             required}
                  },
           outputs={sorted={description="sorted dataset",
                            required}
                   }
          }
}
Orchestrate’s argument list processor generates this property list based on the
tsort argument description string shown in the previous section:
{ key={value=product, subArgs={ci, sorted, hash}},
key={value=productid, value=int32},
memory={value=32},
stats
}
APT_Status HelloWorldOp::initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context)
{
APT_Status status=APT_StatusOk;
if (context == APT_Operator::eRun) return status;
for (int i = 0; i < args.count(); i++)
{
const APT_Property& prop = args[i];
if (prop.name() == "numtimes")
numTimes_ = (int) prop.valueList().getProperty("value", 0)
.valueDFloat();
else if (prop.name() == "uppercase")
uppercase_ = true;
}
return status;
}
Error Handling
Argument-Description Syntax Errors
Syntax errors in your argument-description string are not noticed at compile time.
Instead, errors are generated at runtime when Orchestrate uses your argument
description string and the current operator arguments to produce a property-list
object. You can use Orchestrate’s error-log facility to capture errors. Chapter 14,
“Using the Error Log” on page 14-1 describes the error-log facility.
When an argument list is terminated, the constraints are checked for such aspects
as occurrence restrictions, exclusivity, and implies relationships; and any
violations are reported with the appropriate contextual information, including the
argument instances involved and appropriate usage-style information describing
the violated constraints.
When one or more errors are reported, only argument tokens and usage
information specific to the errors at hand are included in the error messages.
Whether or not the entire usage string is generated and issued in a summary
message is up to you, as is the responsibility of identifying the operator or other
context in which the argument-processing error occurred.
Usage Strings
Orchestrate generates a usage string from the description elements in your
argument description string. You can access an operator usage string from the
osh command line.
For example:
$ osh -usage tsort
The usage string generated for the tsort operator follows. The example assumes
that both deprecated options and current options have been requested.
Orchestrate sort operator:
-key -- specifies a sort key; 1 or more
name -- input field name
type -- field type; optional; DEPRECATED
Sub-options for sort key:
-ci -- case-insensitive comparison; optional
-cs -- case-sensitive comparison; optional; default
-ebcdic -- use EBCDIC collating sequence; optional
-hash -- hash partition using this key; optional
-asc or -ascending
-- ascending sort order; optional; default
-desc or -descending
-- descending sort order; optional
-sorted -- records are already sorted by this key; optional
-clustered -- records are grouped by this key; optional
-param -- extra parameters for sort key; optional
params -- property=value pair(s), without curly braces
The argc value should be the number of arguments in the array; there is
no need for a trailing null argument. The argc/argv array is stored in the
operator, and becomes accessible via the accessArgv() method.
Example Code
APT_Status ExampleCompositeOperator::initializeFromArgs_(
const APT_PropertyList &args, InitializeContext context)
{
// ... other processing occurs.
APT_String importProps[5];
importProps[0]="import";
importProps[1]="-schema";
importProps[2]="record (a: int32;)";
importProps[3]="-file";
// the composite's "filename" argument contains the name of the file
// to import.
importProps[4]=args.getProperty("filename").valueList().
getProperty("value").valueString();
// allocate the operator and initialize it with these arguments,
// default ident. Log any errors to our own error log.
// importOp is an APT_Operator* member variable of this class.
importOp = APT_Operator::lookupAndInitializeFromArgv(5, importProps,
errorLog());
if(!importOp || errorLog().hasError())
return APT_StatusFailed;
// allocation/initialization failed. Our errorLog will have any
// errors!
if(importOp->errorLog().hasError())
{
// an error occurred in the operator's initializeFromArgs() that
// wasn't caught in the argvchecker!
errorLog().appendLog(importOp->errorLog());
return APT_StatusFailed;
}
return APT_StatusOk;
}
...
APT_Status ExampleCompositeOperator::describeOperator()
{
// possibly other processing here.
markSubOperator(importOp);
return APT_StatusOk;
}
10 Your runLocally() method is invoked for each parallel instance of the operator
to do the actual work of reading records, processing them, and writing them
to output.
11 The postRun function is called after step completion.
6
Type Conversions
Describes how to use pre-defined conversion functions in your custom operators
and how to define your own conversion functions.
This chapter tells you how to use pre-defined type-conversion functions in your
custom operators and tells you how to define your own type-conversion
functions.
There are two kinds of type conversions in Orchestrate: default and non-default.
Non-default conversions are also called explicit or named conversions.
Orchestrate automatically performs type conversions when the output fields from
one operator become the input fields of the next operator in the data flow. These
default conversions are possible between Orchestrate’s built-in numeric types
which are listed in the section “Default Type Conversions” on page 6-2. You can
use the modify operator to perform other field-type conversions.
Orchestrate provides two classes for handling type conversion in your custom
operators. The APT_FieldConversionRegistry class keeps track of all the field
type conversions defined in Orchestrate, and the APT_FieldConversion class
provides the functions that perform the type conversions.
For more information on explicit conversion, see the modify and transform
operator chapters in the Orchestrate 7.0 Operators Reference.
The function addFieldConversion() adds the given field conversion object to the
field-conversion registry.
lookupAndParse() locates and returns the given explicit conversion.
lookupDefault() locates and returns the default conversion for the given source
and destination schema types; and lookupExplicit() locates and returns the
indicated explicit conversion.
You can also view class descriptions in the HTML-documented header files. Start your search with the index.html file
in $APT_ORCHHOME/doc/html/.
Code
#include <apt_framework/orchestrate.h>
2 #include <apt_framework/type/conversion.h>
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_
(const APT_PropertyList &args, InitializeContext context);
private:
// other data members
};
APT_Status PreDefinedConversionOperator::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema("record(dField:date)", 0);
setOutputInterfaceSchema("record(dField:date; sField:string[1])", 0);
return APT_StatusOk;
}
APT_Status PreDefinedConversionOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
while (inCur.getRecord())
{
*dFieldOutAcc = *dFieldInAcc;
37 APT_Assert(nameConv);
38 APT_Int8 weekday;
40 weekday = weekday + 1;
41 if ( nameConv ) nameConv->disOwn();
43 APT_ASSERT(conv);
return APT_StatusOk;
}
Comments
2 Include the Orchestrate header file, conversion.h, which defines the type-
conversion interface.
39 Call the member function convert() to perform the conversion between the input
field dField and the local variable weekday.
40 The default value of weekday is 0. Increment weekday by 1 to mark the first day.
41 Call disOwn() on the nameConv conversion object since it was returned via its
own() function.
42 Create a conversion object based on the default conversion between int8 and
string.
44 Call the member function convert() to perform the conversion between the local
variable weekday and the output field sField.
Code
#include <apt_util/string.h>
2 #include <apt_framework/rawfield.h>
3 #include <apt_framework/type/conversion.h>
public:
RawStringConversion();
10 virtual APT_Status convert
(const void *STval, void* DTval, void* data) const;
11 static bool registerConversion();
protected:
13 virtual APT_FieldConversion* clone() const;
};
APT_IMPLEMENT_RTTI_ONEBASE(RawStringConversion, APT_FieldConversion);
APT_IMPLEMENT_PERSISTENT(RawStringConversion);
17 RawStringConversion::RawStringConversion()
24 APT_Status RawStringConversion::convert
(const void* STval, void* DTval, void *data) const
{
26 const APT_String &s = *(const APT_String *)STval;
APT_RawField &d = *(APT_RawField *)DTval;
34 bool RawStringConversion::registerConversion()
{
APT_FieldConversionRegistry::get().addFieldConversion(sRawString);
return true;
}
Comments
11 Define the member function which is used to register the newly defined type
conversion.
26-28 The implementation for the conversion raw_from_string. References are used for
the source and the destination, with the pointers STval and DTval being cast into
APT_String and APT_RawField respectively. The member function
assignFrom() of the class APT_RawField is used to complete the conversion.
7
Using Cursors and Accessors
Tells you how to access records and record fields using cursors and field accessors,
gives you the interfaces to the APT_String and APT_Ustring classes, and lists the
Unicode utilities.
Overview 7 2
How Record Fields Are Referenced 7 2
Cursors 7 3
The APT_InputCursor and APT_OutputCursor Class Interfaces 7 3
Example of Accessing Records with Cursors 7 5
Field Accessors 7 7
Defining Field Accessors 7 8
Using Field Accessors 7 9
Using Field Accessors to Numeric Data Types 7 10
Using Field Accessors to Decimal Fields 7 12
The APT_Decimal Class Interface 7 13
Using Field Accessors to Fixed-Length Vector Fields 7 13
Using Field Accessors to Variable-Length Vectors 7 15
Using Field Accessors to Nullable Fields 7 17
Using Field Accessors to string and ustring Fields 7 21
APT_String and APT_UString Class Interface 7 21
Unicode Utilities 7 25
Using Field Accessors to Raw Fields 7 26
The APT_RawField Class Interface 7 26
Using Field Accessors to Date, Time, and Timestamp Fields 7 29
Using Field Accessors to Aggregate Fields 7 31
Accessing Vectors of Subrecord Fields 7 33
Overview
In order for you to process an input or output data set from within an operator,
you need a mechanism for accessing the records and the record fields that make
up a data set. Orchestrate provides three mechanisms that work together to
perform these tasks: cursors, subcursors, and field accessors.
Cursors let you reference the records of a data set. You use two types of cursors
with data sets: input cursors and output cursors. Input cursors provide read-only
access to the records of a data set; output cursors provide read and write access.
Field accessors, on the other hand, provide access to a field within a record of a
data set.
You use subcursors only with vectors of subrecords. A subcursor allows you to
identify the current element of a subrecord vector.
This chapter describes how to use cursors, subcursors, and field accessors, and
includes example operator derivations. Note that you also use field accessors
from within a derived partitioner or collector. See Chapter 8, “Creating
Partitioners” and Chapter 9, “Creating Collectors” for examples using accessors
within derived partitioners and collectors.
[Figure: a data set's records, with an arrow indicating the direction of cursor movement through them]
As you can see in this figure, a cursor defines the current input or output record of
a data set. Field accessors perform relative access to the current record; allowing
you to access the fields of the current record as defined by a cursor.
In order to access a different record, you update a cursor to move it through the
data set, creating a new current record. However, you do not have to update the
field accessors; they will automatically reference the record fields of the new
current record.
A record field is characterized by the field name and data type. Correspondingly,
Orchestrate supplies a different field accessor for every Orchestrate data type. In
order to access a record field, you must create an accessor for the field’s data type.
This chapter contains a description of the different accessor types, as well as
examples using them.
Cursors
Cursors let you reference specific records in a data set, while field accessors let
you access the individual fields in those records. You use cursors and field
accessors from within your override of the APT_Operator::runLocally() function.
This section describes cursors; the next section covers field accessors.
Each input and output data set requires its own cursor object. You use two
Orchestrate classes to represent cursors:
• APT_InputCursor: Defines a cursor object providing read access to an input
data set.
• APT_OutputCursor: Defines a cursor object providing read/write access to
an output data set.
The APT_InputCursor class defines the following functions for making input
records available for access:
A cursor defines the current input or output record of a data set. Field accessors
perform relative access to records; they only allow you to access the fields of the
current record as defined by either an input cursor or an output cursor.
To process an input data set, you initialize the input cursor to point to the current
input record. The cursor advances through the records until all have been
processed. Note that once a cursor has moved beyond a record of an input data
set, that record can no longer be accessed.
For the output data set, the output cursor initially points to the current output
record and advances through the rest of the records. As with input data sets, once
a cursor has moved beyond a record of an output data set, that record can no
longer be accessed.
The following figure shows an input and an output data set.
[Figure: an input data set and an empty output data set; arrows show the direction of cursor movement through the records of each]
When you first create an input cursor, it is uninitialized and does not reference a
record. Therefore, field accessors to the input data set do not reference valid data.
You must call APT_InputCursor::getRecord() to initialize the cursor and make
the first record in the data set the current record. You can then use field accessors
to access the fields of the input record.
When you have finished processing a record in an input data set, you again call
APT_InputCursor::getRecord() to advance the input cursor to the next record in
the data set, making it the current input record. When no more input records are
available, APT_InputCursor::getRecord() returns false. Commonly, you use a
while loop to determine when APT_InputCursor::getRecord() returns false.
When you first create an output cursor, it references the first record in the output
data set. If the record is valid, the record fields are set to the following default
values:
• Nullable fields are set to null.
• Integers = 0.
• Floats = 0.
• Dates = January 1, 0001.
• Decimals = 0.
• Times = 00:00:00 (midnight).
• Timestamps = 00:00:00 (midnight) on January 1, 0001.
• The length of variable-length string, ustring, and raw fields is set to 0.
• The characters of a fixed-length string and fixed-length ustring are set to null
(0x00) or to the pad character, if one is specified.
• The bytes of fixed-length raw fields are set to zero.
• The tag of a tagged aggregate is set to 0 to set the data type to be that of the
first field of the tagged aggregate.
• The length of variable-length vector fields is set to 0.
When you have finished writing to an output record, you must call
APT_OutputCursor::putRecord() to advance the output cursor to the next record
in the output data set, making it the current output record.
Code
APT_Status ExampleOperator::runLocally()
{
5 setupInputCursor(&inCur, 0);
6 setupOutputCursor(&outCur, 0);
7 while (inCur.getRecord())
{
9 // body of loop
10 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
3 Define inCur, an instance of APT_InputCursor, the input cursor for the first data
set input to this operator.
4 Define outCur, an instance of APT_OutputCursor, the output cursor for the first data set
output by this operator.
Field Accessors
Once you have defined a cursor to reference the records of a data set, you define
field accessors to reference record fields. You assign field accessors to each
component of a data set’s record schema that you want to access.
For an input or output data set, field accessors provide named access to the record
fields. Such access is necessary if an operator is to process data sets. No field
accessor is allowed for schema variables, which have no defined data type.
Operators use field accessors to read the fields of an input record and to write the
fields of an output record. Field accessors do not allow access to the entire data set
at one time; instead, they allow you to access the fields of the current input or
output record as defined by an input or output cursor.
Field accessors allow you to work with nullable fields. Using accessors, you can
determine if a field contains a null before processing the field, or you can set a
field to null.
Note that within Orchestrate, the fields of an input record are considered read
only. There is no mechanism for you to write into the fields of the records of an
input data set. Because the fields of an output record are considered read/write,
you can modify the records of an output data set.
This section describes both input and output accessors.
Here is example code that uses three of the field accessor classes:
// Define input accessors
APT_InputAccessorToInt32 aInAccessor;
APT_InputAccessorToSFloat bInAccessor;
APT_InputAccessorToString cInAccessor;
The remaining sections of this chapter describe how to use accessors with
different field types. These sections include:
• “Using Field Accessors to Numeric Data Types”
• “Using Field Accessors to Decimal Fields”
• “Using Field Accessors to Fixed-Length Vector Fields”
• “Using Field Accessors to Variable-Length Vectors”
• “Using Field Accessors to Nullable Fields”
• “Using Field Accessors to string and ustring Fields”
• “Using Field Accessors to Raw Fields”
• “Using Field Accessors to Date, Time, and Timestamp Fields”
• “Using Field Accessors to Aggregate Fields”
[Figure: AddOperator, with input interface schema field1:int32; field2:int32 and output interface schema field1:int32; field2:int32; total:int32]
This operator adds two fields of an input record and stores the sum in a field of
the output record. In addition, this operator copies the two fields of the input to
corresponding fields of the output.
For each of the components of an operator’s input and output interface schemas,
you define a single field accessor. In this case, therefore, you need two input
accessors for the input interface schema and three output accessors for the output
interface schema.
This example uses field accessors to explicitly copy field1 and field2 from an input
record to the corresponding fields in an output record. If the input data set had a
record schema that defined more than these two fields, all other fields would be
ignored by AddOperator and not copied to the output data set.
The code in Table 22 below is the describeOperator() function for AddOperator:
Code
APT_Status AddOperator::describeOperator()
{
3 setInputDataSets(1);
4 setOutputDataSets(1);
5 setInputInterfaceSchema("record(field1:int32; field2:int32;)", 0);
6 setOutputInterfaceSchema("record(field1:int32; field2:int32; total:int32;)", 0);
return APT_StatusOk;
}
Comments
5 Specify the interface schema of input 0 (input data sets are numbered starting from 0).
You can pass a string containing the interface schema as an argument to
APT_Operator::setInputInterfaceSchema().
6 Specify the interface schema of output 0 (the first output data set).
Code
APT_Status AddOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
7 APT_InputAccessorToInt32 field1InAcc("field1", &inCur);
8 APT_InputAccessorToInt32 field2InAcc("field2", &inCur);
9 APT_OutputAccessorToInt32 field1OutAcc("field1", &outCur);
10 APT_OutputAccessorToInt32 field2OutAcc("field2", &outCur);
11 APT_OutputAccessorToInt32 totalOutAcc("total", &outCur);
12 while (inCur.getRecord())
{
14 *totalOutAcc = *field1InAcc + *field2InAcc;
15 *field1OutAcc = *field1InAcc;
16 *field2OutAcc = *field2InAcc;
17 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
7-8 Define read-only accessors for the fields of the operator’s input interface schema.
9-11 Define read/write accessors for the fields of the operator’s output interface
schema.
12 Use APT_InputCursor::getRecord() to advance the input data set to the next input
record.
14-16 Dereference the field accessors to access the values of the record fields in both the
input and the output data sets.
public:
constructor content() operator==()
destructor effectiveIntegerDigits() operator!=()
assignFromDecimal() effectivePrecision() operator<()
assignFromDFloat() fractionStart() operator<=()
assignFromInt32() hash() operator>()
assignFromSInt64() integerDigits() operator>=()
assignFromString() integerSize() overlapP()
assignFromUInt64() isValid() precision()
asDFloat() leadingZero() releaseStorage()
asInteger() makeInvalid() repSize()
asIntegerS64() negativeP() setScale()
asIntegerU64() operator const void * signNybble()
asString() operator=() signOk()
clear() operator!() stringLength()
compare()
The APT_Decimal class does not provide arithmetic functions. In order to use a
decimal within an arithmetic expression, you must first convert it to an integer or
float. The following code creates an accessor to a decimal field, then uses the
accessor to call member functions of APT_Decimal to convert it to a dfloat:
APT_InputAccessorToDecimal field1InAcc("field1", &inCur);
APT_OutputAccessorToDecimal field1OutAcc("field1", &outCur);
while (inCur.getRecord())
{
APT_DFloat var2 = field1InAcc->asDFloat();
field1OutAcc->assignFromDFloat(var2);
. . .
}
[Figure: AddOperator, with input interface schema field1[10]:int32; and output
interface schema field1[10]:int32; total:int32;]
This operator adds all elements of the vector in the input record and stores the
sum in a field of the output record. In addition, this operator copies the input
vector to the output.
For a vector, you only need to define a single accessor to access all vector
elements. The runLocally() function for AddOperator would be written as shown
in Table 24:
Code
APT_Status AddOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
7 APT_InputAccessorToInt32 field1InAcc("field1", &inCur);
8 APT_OutputAccessorToInt32 field1OutAcc("field1", &outCur);
9 APT_OutputAccessorToInt32 totalOutAcc("total", &outCur);
while (inCur.getRecord())
{
12 *totalOutAcc = 0;
13 for (int i = 0; i < 10; i++)
{
15 *totalOutAcc += field1InAcc[i];
16 field1OutAcc[i] = field1InAcc[i];
}
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
7 Define a read-only accessor for the fields of the operator’s input interface schema.
8-9 Define read/write accessors for the fields of the operator’s output interface schema.
12 Clear the total in the output record. Note that the initial value of all numeric fields
in an output record is already 0 or NULL if the field is nullable. This statement is
included for clarity only.
13 Create a for loop to add the elements of the input vector to the output total field.
15 Dereference the field accessors to access the values of the vector elements.
16 Copies the value of element i of the input vector to element i of the output vector.
Since the output interface schema defines the length of the vector in the output
record, you do not have to set it; you must, however, set the vector length of a
variable-length vector.
Note that you also could have used the equivalent statement shown below to write
the field value:
field1OutAcc.setValueAt(i, field1InAcc.valueAt(i));
The following figure shows an operator containing a vector field in its interface
schemas:
[Figure: AddOperator, with input interface schema field1[]:int32; and output
interface schema field1[]:int32; total:int32;]
This operator adds all the elements of the vector in the input record and stores the
sum in a field of the output record. In addition, this operator copies the input
vector to the output. Because the input interface schema defines a variable-length
vector, the output interface schema contains a corresponding variable-length
vector.
For a vector, you only need to define a single accessor to access all vector
elements. The runLocally() function for AddOperator in Table 25 would be written
as:
Code
APT_Status AddOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
7 APT_InputAccessorToInt32 field1InAcc("field1", &inCur);
8 APT_OutputAccessorToInt32 field1OutAcc("field1", &outCur);
9 APT_OutputAccessorToInt32 totalOutAcc("total", &outCur);
while (inCur.getRecord())
{
12 *totalOutAcc = 0;
13 field1OutAcc.setVectorLength(field1InAcc.vectorLength());
14 for (int i = 0; i < field1InAcc.vectorLength(); i++)
{
16 *totalOutAcc += field1InAcc[i];
17 field1OutAcc[i] = field1InAcc[i];
}
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
7 Define a read-only accessor for the fields of the operator’s input interface schema.
8-9 Define read/write accessors for the fields of the operator’s output interface schema.
12 Clear the total in the output record. Note that the initial value of all numeric fields
in an output record is already 0 or NULL if the field is nullable. This statement is
included for clarity only.
13 Set the length of the variable-length vector field in the output record.
14 Create a for loop to add the elements of the input vector to the output total field.
APT_InputAccessorToInt32::vectorLength() returns the length of a vector field.
17 Copies the value of element i of the input vector to element i of the output vector.
You can also use the equivalent statement shown below to write the field value:
field1OutAcc.setValueAt(i, field1InAcc.valueAt(i));
As part of processing a record field, you can detect a null and take the appropriate
action. For instance, you can omit the null field from a calculation, signal an error
condition, or take some other action.
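The per-field decision described above can be modeled in plain C++ with std::optional standing in for a nullable field. This is an analogy only: Orchestrate itself uses accessor isNull() tests, and the function name here is illustrative.

```cpp
#include <optional>
#include <vector>

// Sum only the non-null values, omitting null fields from the
// calculation -- the same decision an operator makes per field.
int sumNonNull(const std::vector<std::optional<int>>& fields)
{
    int total = 0;
    for (const auto& f : fields)
        if (f.has_value())   // analogous to !accessor.isNull()
            total += *f;     // analogous to dereferencing the accessor
    return total;
}
```

Here a null simply drops out of the sum; an operator could equally log the condition or reject the record.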
To recognize a nullable field, the field of the operator’s interface must be defined
to be nullable. You include the keyword nullable in the interface specification of a
field to make it nullable. For example, all fields of the operator shown in the
following figure are nullable:
[Figure: AddOperator, with all interface schema fields nullable:
field1:nullable int32; field2:nullable int32; total:nullable int32;]
Code
APT_Status AddOperator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
6 setInputInterfaceSchema("record(field1:nullable int32; field2:nullable int32;)", 0);
setOutputInterfaceSchema("record(field1:nullable int32; field2:nullable int32; total:nullable int32;)", 0);
return APT_StatusOk;
}
Comments
6 Specify that all fields of the interface schema of input 0 are nullable. You can
individually specify any or all fields of the interface schema as nullable.
Code
APT_Status AddOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
APT_InputAccessorToInt32 field1InAcc("field1", &inCur);
APT_InputAccessorToInt32 field2InAcc("field2", &inCur);
APT_OutputAccessorToInt32 field1OutAcc("field1", &outCur);
APT_OutputAccessorToInt32 field2OutAcc("field2", &outCur);
APT_OutputAccessorToInt32 totalOutAcc("total", &outCur);
while (inCur.getRecord())
{
14 if (!field1InAcc.isNull())
*field1OutAcc = *field1InAcc;
16 if (!field2InAcc.isNull())
*field2OutAcc = *field2InAcc;
18 if (!field1InAcc.isNull() && !field2InAcc.isNull())
*totalOutAcc = *field1InAcc + *field2InAcc;
20 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
18 If both field1 and field2 contain valid data, perform the addition. Writing to total
clears the null indicator for the field.
public:
constructor isBound() operator>>()
destructor getCollationSeq() operator<<()
addPadding() hash() operator<=()
adopt() isBoundedLength() operator>=()
adopt_badarg() isEmpty() operator ||()
allocBuf() isEmpty_fixed() padChar()
append() isEqualCI() prepareForFielding()
append2() isFixedLength() prepend()
asFloat() isLower() removePadding()
asInteger() isUpper() replace()
assignFrom() isTransformNecessary() setBoundedLength()
badIndex() isVariableLength() setCollationSeq()
bind() length() setFixedLength()
clear() nonTerminatedContent() setLength()
compactPadding() offsetOfSubstring() setPadChar()
compare() occurrences() setVariableLength()
content() operator[ ]() substring()
data() operator+() terminatedContent()
data_nonOwn() operator+=() transform()
equal_CI() operator=() transformLength
equals() operator==() trimPadding()
equivalent() operator!=() toLower()
getChar() operator>() toUpper()
initFrom() operator<()
You access a field of type APT_String or APT_UString using the field accessor
APT_InputAccessorToString or APT_InputAccessorToUString and
APT_OutputAccessorToString or APT_OutputAccessorToUString. Once you
have defined and initialized the accessor, you then use indirect addressing, via
the dereferencing operator ->, to call a member function of APT_String or
APT_UString to process the field.
You can see from the class member functions that APT_String and APT_UString
allow you to copy a string field, using operator=, and compare string fields using
operator==, operator!=, isEqualCI(), and other functions. These classes also
contain member functions to access the contents and length of a string field.
The following code creates an input accessor to a string field, then uses the
accessor to call member functions of APT_String:
APT_InputAccessorToString field1InAcc("field1", &inCur);
while (inCur.getRecord())
{
size_t fieldLen = field1InAcc->length();
const char * buffer = field1InAcc->content();
. . .
}
[Figure: StringOperator, with input interface schema field1:string;
field2:string[10]; and output interface schema field1:ustring; field2:string[10];]
The runLocally() function shown in Table 28 below shows how the StringOperator
would be written:
Code
APT_Status StringOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
while (inCur.getRecord())
{
13 APT_Int32 fieldLen = field1InAcc->length();
14 const UChar * buffer = field1InAcc->content();
15 field1OutAcc->assignFrom(buffer, fieldLen);
16 *field2OutAcc = *field2InAcc;
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
14 Return a pointer to the contents of the ustring field. Note that because a string field
is not defined to be null terminated, you may need the length of the field as well.
15 Copy the buffer to the output field, including the ustring length. By default, a
variable-length string field in an output data set has a length of 0; you must set the
string length as part of writing to the field.
16 Directly copy the fixed-length input string field to the output string field.
If the input fixed-length string is longer than the output fixed-length string, the
input string is truncated to the length of the output. If the input string is shorter,
the output string is by default padded with zeros to the length of the output string.
You can call setPadChar() to specify a different pad character.
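The truncate-or-pad rule can be sketched in ordinary C++. This is a model of the behavior only, not the accessor implementation; the default pad character is taken to be the null byte here, per the "padded with zeros" statement above.

```cpp
#include <cstddef>
#include <string>

// Model of copying into a fixed-length string field: truncate a
// longer input, pad a shorter one with the pad character.
std::string copyToFixedLength(const std::string& src, std::size_t destLen,
                              char padChar = '\0')
{
    std::string dest = src.substr(0, destLen); // truncate if too long
    dest.resize(destLen, padChar);             // pad if too short
    return dest;
}
```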
Processing fixed and variable length vectors of string fields is the same as for
vectors of numeric data types. See “Using Field Accessors to Fixed-Length Vector
Fields” on page 7-13 or “Using Field Accessors to Variable-Length Vectors” on
page 7-15 for an example.
Unicode Utilities
The Unicode utility functions are listed in this section. The header file for these
functions is $APT_ORCHHOME/include/apt_util/unicode_utils.h.
Ctype Functions
These functions accept both char and UChar arguments unless otherwise noted:
isprint
isspace
isdigit
digit (UChar only)
isalpha
isalnum
islower
tolower
isupper
toupper
File-Related Functions
These functions accept both char and UChar arguments:
APT_fopen
APT_Ufprint
APT_Usprintf
APT_Usscanf
The <, >, <=, and >= operators also use a collation sequence if a non-default
sequence is available. The default collation sequence uses byte-wise comparisons.
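A byte-wise comparison of the kind the default collation performs can be sketched as follows (plain C++, not the Orchestrate implementation):

```cpp
#include <cstring>

// Byte-wise comparison: bytes compare as unsigned values, and a
// shorter prefix sorts before a longer string it is a prefix of.
int byteWiseCompare(const char* a, std::size_t aLen,
                    const char* b, std::size_t bLen)
{
    std::size_t n = aLen < bLen ? aLen : bLen;
    int cmp = std::memcmp(a, b, n);  // memcmp compares unsigned bytes
    if (cmp != 0)
        return cmp < 0 ? -1 : 1;
    return aLen == bLen ? 0 : (aLen < bLen ? -1 : 1);
}
```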
You can access a field of type APT_RawField using the field accessors
APT_InputAccessorToRawField and APT_OutputAccessorToRawField. Once
you have defined and initialized the accessor, you then use indirect addressing,
via the dereferencing operator ->, to call the member functions of APT_RawField
to process the field.
As the class member functions shown above indicate, you can assign to an
APT_RawField object using operator=, and compare APT_RawField
objects using operator==, operator!=, and other functions. APT_RawField
also contains member functions to access the contents of a raw field.
For example, the following code creates an input accessor to a raw field, then uses
the accessor to call member functions of APT_RawField:
[Figure: RawOperator, with input and output interface schema field1:raw;
field2:raw[10];]
Code
APT_Status RawOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
APT_InputAccessorToRawField field1InAcc("field1", &inCur);
APT_InputAccessorToRawField field2InAcc("field2", &inCur);
APT_OutputAccessorToRawField field1OutAcc("field1", &outCur);
APT_OutputAccessorToRawField field2OutAcc("field2", &outCur);
while (inCur.getRecord())
{
13 size_t fieldLen = field1InAcc->length();
14 const void * buffer = field1InAcc->content();
15 field1OutAcc->assignFrom(buffer, fieldLen);
16 *field2OutAcc = *field2InAcc;
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
Processing vectors of raw fields, either fixed or variable length, is the same as for
vectors of numeric data types. See “Using Field Accessors to Fixed-Length Vector
Fields” on page 7-13 or “Using Field Accessors to Variable-Length Vectors” on
page 7-15 for an example.
Once you have defined an accessor to one of these fields, you use the accessor’s
dereference operator, ->, to call member functions of the corresponding class to
process the field.
For example, the following figure shows an operator containing a date field in its
interface schemas:
[Figure: DateOperator, with input and output interface schema field1:date;]
Code
APT_Status DateOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
APT_InputAccessorToDate field1InAcc("field1", &inCur);
while (inCur.getRecord())
{
if (*field1InAcc < cutoffDate)
{
int year = field1InAcc->year();
int month = field1InAcc->month();
int day = field1InAcc->day();
. . .
}
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
AggregateOperator
In order to access the elements of an aggregate field, you must define accessors to:
• Each element of the aggregate
• The tag for a tagged aggregate
Code
APT_Status AggregateOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
7 APT_InputAccessorToInt32 aSubField1In("a.aSubField1", &inCur);
8 APT_InputAccessorToSFloat aSubField2In("a.aSubField2", &inCur);
9 APT_InputTagAccessor bTagIn;
10 inCur.setupTagAccessor("b", &bTagIn);
11 APT_InputAccessorToString bTaggedField1In("b.bTaggedField1", &inCur);
12 APT_InputAccessorToInt32 bTaggedField2In("b.bTaggedField2", &inCur);
13 APT_OutputAccessorToInt32 aSubField1Out("a.aSubField1", &outCur);
14 APT_OutputAccessorToSFloat aSubField2Out("a.aSubField2", &outCur);
15 APT_OutputTagAccessor bTagOut;
16 outCur.setupTagAccessor("b", &bTagOut);
17 APT_OutputAccessorToString bTaggedField1Out(
"b.bTaggedField1", &outCur);
18 APT_OutputAccessorToInt32 bTaggedField2Out(
"b.bTaggedField2", &outCur);
while (inCur.getRecord())
{
*aSubField1Out = *aSubField1In;
*aSubField2Out = *aSubField2In;
23 switch(bTagIn.tag())
{
case 0:
26 bTagOut.setTag(0);
27 *bTaggedField1Out = *bTaggedField1In;
break;
case 1:
bTagOut.setTag(1);
*bTaggedField2Out = *bTaggedField2In;
break;
default:
35 APT_ASSERT(0);
}
outCur.putRecord();
}
return APT_StatusOk;
}
Comments
7-8 Define input accessor elements for the subrecord. Note that you use dot-delimited
referencing to refer to the fields of an aggregate in much the same way that you do
for the elements of a C structure.
Once you have defined accessors for the subrecord aggregate elements, you access
the fields of a subrecord aggregate in the same way you access ordinary fields.
9-10 Define a tag accessor and accessor elements for the tagged aggregate.
13-18 Define a tag accessor and accessor elements for the output tagged aggregate.
23 Determine the active tag element. For tagged aggregates, only one element of the
tagged aggregate is active at one time. For an input data set, you use a tag accessor
to determine the currently active element.
26 Set the tag in the output tagged field. For an output record, you must set the tag in
order to specify the data type of a tagged field. You can change the tag for every
record in an output data set, but changing it may destroy data; once you set the tag
for a record, it is good practice not to change it.
35 Use the macro APT_ASSERT(0) to generate an assertion failure if the tag value is
not 0 or 1. This means that the tag has an invalid value because field b is defined to
contain only two elements. Note that this condition should never happen; therefore,
you handle it using an assertion. See Chapter 14, “Using the Error Log” for more
information.
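The set-the-tag-before-writing discipline resembles a C++ tagged union. A rough std::variant analogy follows (not Orchestrate code; the type and function names are illustrative):

```cpp
#include <string>
#include <variant>

// A tagged aggregate with two elements, like field b in the example:
// element 0 holds a string, element 1 holds an integer.
using TaggedField = std::variant<std::string, int>;

// index() plays the role of tag(): it reports which element is active.
std::size_t activeTag(const TaggedField& f) { return f.index(); }
```

Assigning the other alternative changes the tag and destroys the previous contents, which is why repeatedly changing the tag is discouraged.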
called a subcursor. Note that you still need to use cursors to identify the input or
output record and accessors to reference the individual fields of the subrecord.
Like cursors and accessors, Orchestrate defines two types of subcursors: one for
input data sets and one for output data sets. You use the following Orchestrate
classes to define subcursor objects:
• APT_InputSubCursor: provides access for iterating through a vector of
subrecords in an input data set
• APT_OutputSubCursor: provides access for iterating through a vector of
subrecords in an output data set
When you initialize a subcursor, it is set to refer to the first element in the
subrecord vector. Both subcursor classes contain the following member functions
for manipulating the location of the subcursor in a vector: next(), prev(),
setPosition(), and vectorLength(). In addition, APT_OutputSubCursor has the
member function setVectorLength() which you can use to modify the length of a
variable-length subrecord vector in an output data set.
Subcursors reference a vector element relative to the location of an input or
output cursor. As part of initializing a subcursor, you bind it to either an input or
output cursor. Updating an input cursor to the next input record, using
APT_InputCursor::getRecord(), or updating an output cursor to the next output
record, using APT_OutputCursor::putRecord(), resets all bound subcursors to
the first element in their associated vectors.
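A toy model of this positioning behavior may help (a sketch only; real subcursors are bound to cursors and give access to subrecord fields):

```cpp
#include <cstddef>

// Toy subcursor: tracks a position in a subrecord vector; advancing
// the bound record cursor resets the position to element 0.
class ToySubCursor
{
public:
    explicit ToySubCursor(std::size_t vectorLength)
        : len_(vectorLength), pos_(0) {}
    void next()                      { ++pos_; }
    void setPosition(std::size_t p)  { pos_ = p; }
    std::size_t vectorLength() const { return len_; }
    std::size_t position() const     { return pos_; }
    // what a getRecord()/putRecord() on the bound cursor triggers:
    void resetForNewRecord()         { pos_ = 0; }
private:
    std::size_t len_;
    std::size_t pos_;
};
```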
The following figure shows an operator containing a vector of subrecords:
[Figure: SubrecordVectorOperator, with input and output interface schema
a[10]:subrec(aSubField1:int32; aSubField2:sfloat;)]
Code
APT_Status SubrecordVectorOperator::runLocally()
{
APT_InputCursor inCur;
APT_OutputCursor outCur;
setupInputCursor(&inCur, 0);
setupOutputCursor(&outCur, 0);
13 while (inCur.getRecord())
{
15 for (int i = 0; i < inSubCur.vectorLength(); i++)
{
17 *aSubField1Out = *aSubField1In;
18 *aSubField2Out = *aSubField2In;
19 inSubCur.next();
20 outSubCur.next();
}
22 outCur.putRecord();
}
return APT_StatusOk;
}
Comments
You can also define a vector of subrecords and nest it within a subrecord that is
itself either a vector or a scalar. You use the same procedure described above for
nested subrecord vectors.
8
Creating Partitioners
Describes how to specify the partitioning method of a derived operator and how to
create a custom partitioner.
Overview 8 1
Setting the Preserve-Partitioning Flag 8 3
Choosing a Partitioning Method 8 3
Using the Default Partitioning Method 8 3
Using a Keyless Partitioning Method 8 4
Using the Class APT_HashPartitioner 8 5
Using The Class APT_RangePartitioner 8 7
The APT_Partitioner Class Interface 8 8
Creating a Custom Partitioner 8 8
Overriding APT_Partitioner::describePartitioner() 8 9
Overriding APT_Partitioner::setupInputs() 8 9
Overriding APT_Partitioner::partitionInput() 8 10
Example Partitioning Method Definition 8 11
Orchestrate Hashing Functions 8 15
Using a View Adapter with a Partitioner 8 16
Overview
When you create your own operator classes by deriving from APT_Operator, you
can directly control how an operator partitions data. You can use an Orchestrate-
supplied partitioning method as part of your derived parallel operator or define a
partitioning method by deriving from the base class APT_Partitioner. Once you
have derived a new APT_Partitioner class, you simply include the derived
partitioner within a new operator.
You can even build a library of partitioner classes to use when appropriate. With
this approach, you can mix and match operators and partitioners when different
forms of an operator differ only by how the operator partitions data.
This chapter describes how to specify the partitioning method of a derived
operator and also how to create a custom partitioning method.
When you derive a new parallel operator, you may choose the partitioning
method used by the operator. Orchestrate directly supports the keyless
partitioning methods any, round robin, random, same, and entire. For
partitioners, any is implemented as the most efficient partitioner available, and it
is currently either same or round robin. To use one of these methods, include a
call to APT_Operator::setPartitionMethod() within the
APT_Operator::describeOperator() function of your derived operator. See
“Using a Keyless Partitioning Method” on page 8-4 for information.
For the keyed partitioning method hash by field, Orchestrate supplies the
partitioner class APT_HashPartitioner which allows you to hash a record based
on one or more numeric integer or string fields. See “Using the Class
APT_HashPartitioner” on page 8-5 for information on this class.
Orchestrate also supplies the partitioner class APT_ModulusPartitioner for the
keyed partitioning method modulus. Partitioning is based on a numeric key field
modulo the number of partitions. It is similar to hash by field but involves
simpler computation.
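The computation behind the modulus method is just the key value modulo the partition count, which can be sketched as follows (plain C++; the real partitioner also reads the key through a field accessor):

```cpp
#include <cstdint>

// Modulus partitioning: a numeric key field modulo the number of
// partitions yields the partition number.
int modulusPartition(std::uint32_t key, int numPartitions)
{
    return static_cast<int>(key % static_cast<std::uint32_t>(numPartitions));
}
```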
For the range partitioning method, Orchestrate supplies the partitioner class
APT_RangePartitioner. See “Using The Class APT_RangePartitioner” on
page 8-7 for information on this class.
If you want to derive a new partitioning method, or a variation of one of the
methods described above, you can derive a partitioner class from
APT_Partitioner. See “The APT_Partitioner Class Interface” on page 8-8 for
information.
To set the partitioning method for a derived operator, you use the member
functions of both APT_Operator and APT_Partitioner. See Chapter 1, “Creating
Operators” for information on deriving from APT_Operator.
Also see the section “Specifying Partitioning Style and Sorting Requirements” on
page 1-23 which discusses an alternative to stipulating a specific partitioning
method for an operator. This facility allows you to specify your partitioning and
sorting requirements. Based on your specifications, Orchestrate inserts the
appropriate components into the data flow. This functionality makes it possible
for your operator users to write correct data flows without having to deal with
parallelism issues.
Another advantage of using the any partitioning method is that an operator user
can override the partitioning method. For example, you may create an operator
whose partitioning method is tightly coupled to the actual data set processed at
run time. Specifying the any partitioning method allows the user to change the
partitioning method for each instance of the operator.
The function APT_Operator::partitionMethod() returns APT_Operator::eAny for
an operator using the partitioning method any.
The first argument, pType, specifies the partitioning method as defined using one
of the following values:
• APT_Operator::eAny (default)
• APT_Operator::eRoundRobin
• APT_Operator::eRandom
• APT_Operator::eSame
• APT_Operator::eEntire
The second argument, inputDS, specifies the number of the input data set to the
operator. Note that the input data sets to an operator are numbered starting from
0.
For example, to use round robin partitioning with an operator that takes a single
input data set, include the following statements within the operator’s
describeOperator() function:
setKind(APT_Operator::eParallel);
setPartitionMethod(APT_Operator::eRoundRobin, 0);
If the operator has two input data sets and you want to partition the data sets
using random, you include the lines:
setKind(APT_Operator::eParallel);
setPartitionMethod(APT_Operator::eRandom, 0); // input data set 0
setPartitionMethod(APT_Operator::eRandom, 1); // input data set 1
[Figure: SortOperator with a hash partitioner; output interface schema out:*;]
APT_HashPartitioner does not define any interface schema; you use the
APT_HashPartitioner constructor or the member function
APT_HashPartitioner::setKey() to specify the key fields.
The constructor for APT_HashPartitioner has two overloads:
APT_HashPartitioner();
APT_HashPartitioner(const APT_FieldList& fList);
Key fields can be any field type, including raw, date, and timestamp.
APT_HashPartitioner determines the data type of each field from the operator’s
input interface schema.
SortOperator requires three fields as input: two integer fields and a string field.
You can specify the partitioner’s interface schema within the describeOperator()
function, as the code in Table 34 shows below:
Code
APT_Status SortOperator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema("record(field1:int32; field2:int32;
field3:string; in:*;)", 0);
setOutputInterfaceSchema("record(out:*;)", 0);
declareTransfer("in", "out", 0, 0);
return APT_StatusOk;
}
Comments
An application developer using this operator may use adapters to translate the
name of a dataset field and its data type in the input data set schema to match the
operator’s input interface schema. In the previous figure, the data set myDS is
input to the sort operator. An application developer could translate field a and
field c of myDS to field1 and field2 of the operator. Therefore, the hash partitioner
would partition the record by fields a and c. See “Using a View Adapter with a
Partitioner” on page 8-16 for more information on adapters.
Overriding APT_Partitioner::describePartitioner()
Many partitioning methods use the fields of a record to determine the partition
for the record. To access those fields, the partitioner must have an interface
schema, defined by overriding the pure virtual function describePartitioner().
The following figure shows a partitioner with a single integer field named
hashField as its interface schema:
[Figure: SortOperator with a partitioner whose interface schema is
hashField:int32; output interface schema out:*;]
The input dataset’s concrete schema must be compatible with the partitioner’s
interface schema. In this example, both schemas contain an integer field named
hashField. If an operator’s input interface schema is not compatible with the
partitioner’s schema, you can use an adapter to translate components.
viewAdaptedSchema() returns the dataset concrete schema as projected through
the view adapter. See “Using a View Adapter with a Partitioner” on page 8-16 for
information on view adapters.
A partitioner is not required to define an interface schema if it does not use record
fields as part of its method. This type of partitioner is called a keyless partitioner.
You still must provide an override to describePartitioner(), but the function
should just return APT_StatusOk. See “Example Partitioning Method
Definition” on page 8-11 for a sample override of describePartitioner().
Overriding APT_Partitioner::setupInputs()
After you have established the partitioner’s interface schema, you need to define
field accessors. Field accessors provide named access to any type of field within a
record of a data set. See Chapter 7, “Using Cursors and Accessors” for
information on field accessors. A field accessor is normally defined as a private
data member of the derived partitioner class. You override the pure virtual
function setupInputs() to initialize the field accessors.
The following figure shows a partitioner that defines a single integer field named
hashField as its interface schema:
[Figure: SortOperator with a partitioner whose interface schema is
hashField:int32; output interface schema out:*;]
In this example, you override the pure virtual function setupInputs() to initialize
the single field accessor used by a partitioner to access hashField. If your
partitioning method does not access record fields, you still must override
setupInputs(), but it should simply return APT_StatusOk. See “Example
Partitioning Method Definition” on page 8-11 for a sample override of
setupInputs().
Overriding APT_Partitioner::partitionInput()
You must override the pure virtual function APT_Partitioner::partitionInput() to
perform the actual partitioning operation. This function contains the code
defining your partitioning method. Here is the function prototype of
partitionInput():
virtual int partitionInput(int numPartitions) = 0;
[Figure: SortOperator with a partitioner whose interface schema is
hashField:int32; output interface schema out:*;]
To access the record hashField, the partitioner defines one accessor. The partitioner
schema and operator’s input interface schema both contain an integer field named
hashField. Therefore, they are compatible. If they were not compatible, you could
create a view adapter to translate the interface schema.
The code in Table 35 below shows the derivation of SortPartitioner, the partitioner
for this operator:
Code
#include <apt_framework/orchestrate.h>
2 class SortPartitioner : public APT_Partitioner
{
4 APT_DECLARE_RTTI(SortPartitioner);
5 APT_DECLARE_PERSISTENT(SortPartitioner);
public:
7 SortPartitioner();
protected:
9 virtual APT_Status describePartitioner();
10 virtual APT_Status setupInputs(int numPartitions);
11 virtual int partitionInput(int numPartitions);
private:
14 APT_InputAccessorToInt32 hashFieldAccessor;
19 SortPartitioner::SortPartitioner()
{}
27 APT_Status SortPartitioner::describePartitioner()
{
29 setInputInterfaceSchema("record(hashField:int32;)");
return APT_StatusOk;
}
Comments
7 Include the default constructor for SortPartitioner. This constructor is required for
persistent classes. See Chapter 11, “Enabling Object Persistence” for more
information.
16 With APT_DEFINE_OSH_NAME, you connect the class name to the name used to
invoke the operator from osh and pass your argument description string to
Orchestrate. See $APT_ORCHHOME/include/apt_framework/osh_name.h for
documentation on this macro. Orchestrate’s argument-checking facility is fully
described in Chapter 5, “Orchestrate’s Argument-List Processor”.
39 Use APT_hash() to compute a hash value for the integer field. APT_hash() returns
a partition number for a specified hash key. See the section “Function APT_hash()”
on page 6-14 for more information on this function.
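The shape of this hash-then-partition step can be sketched generically, with std::hash standing in for APT_hash() (whose actual signature this sketch does not reproduce):

```cpp
#include <cstdint>
#include <functional>

// Hash partitioning step: hash the key field, then reduce modulo the
// number of partitions so the result falls in [0, numPartitions).
int hashToPartition(std::int32_t key, int numPartitions)
{
    std::size_t h = std::hash<std::int32_t>{}(key); // stand-in for APT_hash()
    return static_cast<int>(h % static_cast<std::size_t>(numPartitions));
}
```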
Once you have defined your partitioner, you can use it with a derived operator.
Typically, you define the partitioner within the override of
APT_Operator::describeOperator().
To use SortPartitioner with SortOperator, you use
APT_Operator::setPartitionMethod() within the
APT_Operator::describeOperator() function of SortOperator.
setPartitionMethod() allows you to specify a partitioner class for your
partitioning method. Here is the function prototype of setPartitionMethod():
void setPartitionMethod(APT_Partitioner * partitioner,
const APT_ViewAdapter& adapter,
int inputDS);
You specify the hash key using the key argument. You use the keyLength argument
to specify the length of a character string if the string is not null-terminated. With
the caseSensitive argument you can specify whether the key represents a case-
sensitive string.
[Figure: SortOperator with a partitioner whose interface schema is
pField1:int32; pField2:int32; output interface schema out:*;]
The input interface schema of the partitioner defines two integer fields, pField1
and pField2, which it uses to partition the records of an input data set. This schema
is not compatible with the interface schema of the operator, so it is necessary to
define and initialize an APT_ViewAdapter within the describeOperator() function.
9
Creating Collectors
Describes how to derive your own custom collectors from APT_Collector.
Overview 9-1
Choosing a Collection Method 9-2
Using the Default Collection Method 9-3
Using a Keyless Collection Method 9-3
Using The Sorted-Merge Collector 9-4
An APT_SortedMergeCollector Example 9-4
The APT_Collector Class Interface 9-6
Creating a Custom Collector 9-6
How APT_Collector Operates 9-7
Overriding APT_Collector::describeCollector() 9-8
Overriding APT_Collector::setupInputs() 9-9
Overriding APT_Collector::selectInput() 9-10
Example Collector Derivation 9-11
Example Collector with a View Adapter 9-16
Overview
When you create your own sequential operator classes by deriving from
APT_Operator, you can directly control how an operator performs collection.
You can use an Orchestrate-supplied collection method as part of your derived
operator or define your own collection method by deriving from the base class
APT_Collector. Once you have derived a new APT_Collector class, you include
the derived collector within an operator.
You can even build a library of collector classes to use when appropriate. This lets
you mix and match operators and collectors when different forms of an operator
differ only by how the operator performs collection.
Choosing a Collection Method
When you derive a sequential operator, you can choose the collection method it
uses. Orchestrate directly supports the keyless collection methods any,
round robin, and ordered. The default collection method is any. With the any
method, an operator reads records on a first-come first-served basis.
To use one of these methods, you include a call to
APT_Operator::setCollectionMethod() within the
APT_Operator::describeOperator() function of your derived operator. See
“Using a Keyless Collection Method” on page 9-3 for information.
Orchestrate also supplies the sorted merge collection method, a keyed collector.
To use the sorted merge collection method, use the class
APT_SortedMergeCollector. See “Using The Sorted-Merge Collector” on page 9-4
for information.
To derive a new collection method, or a variation on one of the methods described
above, you can derive a collection class from APT_Collector. See “The
APT_Collector Class Interface” on page 9-6.
The ordered method requires that all records are read from partition 0 before
beginning to process records from partition 1. However, the records of partition 1
may actually be ready for processing before those from partition 0. In this case,
the sequential operator must wait, possibly creating a processing bottleneck in
your application. Note, though, that the ordered collection method is necessary if
you want to process a totally sorted data set with a sequential operator and
preserve the sort order.
Unless your sequential operator requires a deterministic order for processing
records, you typically will use the any collection method. If you want more
control over the order of records processed by the operator, you can use the
ordered method, the APT_SortedMergeCollector, or a custom collector that you
define.
Using a Keyless Collection Method
To use a keyless collection method, you call APT_Operator::setCollectionMethod().
The first argument, cType, specifies the collection method as defined by the
following values:
• APT_Operator::eCollectRoundRobin
• APT_Operator::eCollectOrdered
The second argument, inputDS, specifies the number of the input data set to the
operator. Note that these data sets are numbered starting from 0.
For example, to use round robin collection with a sequential operator that takes a
single input data set, include the following statements within the operator
describeOperator() function:
setKind(APT_Operator::eSequential);
setCollectionMethod(APT_Operator::eCollectRoundRobin, 0);
If the operator has two input data sets and you want to use ordered for both,
include the lines:
setKind(APT_Operator::eSequential);
setCollectionMethod(APT_Operator::eCollectOrdered, 0);
setCollectionMethod(APT_Operator::eCollectOrdered, 1);
An APT_SortedMergeCollector Example
Orchestrate supplies the class APT_SortedMergeCollector, which orders the
records processed by a sequential operator, based on one or more fields of a
record. APT_SortedMergeCollector uses a dynamic interface schema that allows
you to specify one or more numeric or string fields as input.
[Figure: MyOperator and its APT_SortedMergeCollector. The collector’s interface
schema is supplied at run time; the operator’s output interface schema is out:*;.]
APT_SortedMergeCollector does not define any interface schema; you use the
APT_SortedMergeCollector member function
APT_SortedMergeCollector::setKey() to specify the collecting key fields.
MyOperator requires three fields as input: two integer fields and a string field. You
can specify the collector’s interface schema within the operator’s
describeOperator() function, as shown below:
APT_Status MyOperator::describeOperator()
{
    setKind(APT_Operator::eSequential); // set mode to sequential
    setInputDataSets(1);
    setOutputDataSets(1);
    setInputInterfaceSchema(
        "record(field1:int32; field2:int32; field3:string; in:*;)", 0);
    setOutputInterfaceSchema("record (out:*;)", 0);
    declareTransfer("in", "out", 0, 0);
    // Define the collector
    APT_SortedMergeCollector * coll = new APT_SortedMergeCollector;
    coll->setKey("field1", "int32");
    coll->setKey("field2", "int32");
    setCollectionMethod(coll, APT_ViewAdapter(), 0);
    return APT_StatusOk;
}
The two calls to setKey() specify field1 as the primary collecting key field and
field2 as the secondary collecting key field for the APT_SortedMergeCollector
object. The function setKey() is used to specify both a field name and a data type
for the field.
The function APT_Operator::setCollectionMethod() specifies coll as the collector
for this operator. It is not necessary to use an input field adapter with this
collector; a default view adapter is passed instead.
Note that APT_Collector contains two other functions: the public member
function initializeFromArgs() and the protected function initializeFromArgs_().
You use these functions to enable Orchestrate’s argument-list processing facility
and to make your collector osh aware. As part of deriving a collector class, you
can include support for detecting error and warning conditions and for relaying
that information back to users. See Chapter 14, “Using the Error Log” for more
information.
How APT_Collector Operates
[Figure: a sequential operator and its collector reading from input partitions p0
through pN; each partition has a current record.]
In this figure, each input partition has a current record, corresponding to the
record that a sequential operator would read if it consumed a record from that
partition.
When a sequential operator calls APT_InputCursor::getRecord() as part of its
override of APT_Operator::runLocally() to obtain the next record from an input
data set, the collector determines the partition that supplies the record. The
selected partition then updates itself, so the next record in the partition becomes
the current record. When any partition becomes empty because it has supplied its
final record, that partition returns an End Of File (EOF) whenever a record is
requested from it.
The call to APT_InputCursor::getRecord() causes the operator to call
APT_Collector::selectInput(), one of the pure virtual functions that you must
override when defining a collector. This function returns the number of the input
partition supplying the record read by the operator. Your override of
selectInput() determines which partition supplies each record.
Overriding APT_Collector::describeCollector()
Many collection methods use the fields of a record to determine the order of
records processed by a sequential operator. To access those fields, the collector
must have an interface schema, defined by overriding the pure virtual function
describeCollector().
The following figure shows a collector with a single integer field named
collectorField as its interface schema:
[Figure: a collector with interface schema collectorField:int32; the operator’s
output interface schema is out:*;.]
Overriding APT_Collector::setupInputs()
After you have established the collector’s interface schema, you need to define
field accessors for each component of the collector’s interface schema. Field
accessors provide named access to any type of field within a record of a data set.
See the chapter Orchestrate Data Sets in the Orchestrate 7.0 User Guide for more
information. Field accessors are normally defined as private data members of the
derived collector class. You then override the pure virtual function setupInputs()
to initialize the field accessors.
The following figure shows a collector that defines a single integer field named
collectorField as its interface schema:
[Figure: a collector with interface schema collectorField:int32; the operator’s
output interface schema is out:*;.]
In this example, you override the pure virtual function setupInputs() to initialize
one field accessor for each partition of the input data set to access collectorField. If
your collection method does not access record fields, you still must override
setupInputs(), but it should just return APT_StatusOk. See “Example Collector
Derivation” on page 9-11 for a sample override of setupInputs().
Overriding APT_Collector::selectInput()
You must override the pure virtual function APT_Collector::selectInput() to
perform the actual collection operation. Here is the function prototype of
selectInput():
virtual int selectInput(int numPartitions) = 0;
selectInput() returns the number of the input partition supplying the next record
to the operator. Orchestrate calls selectInput() each time the operator reads a
record from the data set; you do not call it directly. The argument numPartitions
specifies the number of input partitions. Orchestrate passes numPartitions to
selectInput(), where numPartitions is guaranteed to be positive.
Your override of selectInput() must return an integer value denoting the partition
supplying the input record. This returned value must satisfy the requirement:
0 <= returnValue < numPartitions
Example Collector Derivation
[Figure: a sequential operator whose collector has the interface schema
collectorField:int32; the operator’s output interface schema is out:*;.]
This operator uses a collector that determines the next record by inspecting a
single integer field. The partition whose current record has the smallest value for
collectorField supplies the record to the operator. To access collectorField, the
collector defines one accessor for each partition of the input data set.
The collector’s schema and the operator’s input interface schema both contain an
integer field named collectorField. Therefore, they are compatible. If they were not,
you could create a view adapter to translate the interface schema. See “Example
Collector with a View Adapter” on page 9-16 for more information.
The code in Table 36 below shows the derivation of MyCollector, the collector for
this operator:
Code
#include <apt_framework/orchestrate.h>
class MyCollector : public APT_Collector
{
public:
7 MyCollector();
8 ~MyCollector();
protected:
10 virtual APT_Status describeCollector();
virtual APT_Status setupInputs(int numPartitions);
virtual int selectInput(int numPartitions);
13 virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
InitializeContext context);
private:
15 APT_InputAccessorToInt32 * collectorFieldAccessors;
16 int numParts;
};
21 MyCollector::MyCollector()
: collectorFieldAccessors(0), numParts(0)
{}
24 MyCollector::~MyCollector()
{
26 delete[] collectorFieldAccessors;
}
28 APT_Status MyCollector::initializeFromArgs_(const APT_PropertyList
&args, APT_Collector::InitializeContext context)
{
return APT_StatusOk;
}
32 APT_Status MyCollector::describeCollector()
{
34 setInputInterfaceSchema("record(collectorField:int32;)");
return APT_StatusOk;
}
Comments
7 Include the default constructor for MyCollector. This constructor is required for
persistent classes.
18 With APT_DEFINE_OSH_NAME, you connect the class name to the name used
to invoke the operator from osh and pass your argument description string to
Orchestrate. See “Argument-List Descriptions” on page 5-3 and
$APT_ORCHHOME/include/apt_framework/osh_name.h for documentation on this
macro. Orchestrate’s argument-checking facility is fully described in Chapter 5,
“Orchestrate’s Argument-List Processor”.
32 Override APT_Collector::describeCollector().
48 Define minVal to hold the current minimum value for collectorField and initialize it
to INT_MAX, the largest supported APT_Int32.
49 Define minPartIndex to hold the number of the partition with the minimum value
for collectorField.
Once you have defined your collector, you can use it with a derived operator.
Typically, you define the collector within the override of
APT_Operator::describeOperator().
To use MyCollector with MySequentialOperator, you use
APT_Operator::setCollectionMethod() within the
APT_Operator::describeOperator() function of MySequentialOperator.
setCollectionMethod() allows you to specify a collector object for your operator.
Here is the function prototype of setCollectionMethod():
void setCollectionMethod(APT_Collector * collector,
                         const APT_ViewAdapter& adapter,
                         int inputDS);
The following figure shows a sequential operator that combines the records of an
input data set, based on a single integer field of the records:
[Figure: MySequentialOperator and its collector; the operator’s output interface
schema is out:*;.]
The input interface schema of the collector defines the integer field collectorField
that it uses to combine the records of an input data set. This schema is not
compatible with the interface schema of the operator, so an APT_ViewAdapter
object must be defined and initialized within the describeOperator() function of
the derived operator. Here is the describeOperator() function for
MySequentialOperator:
#include <apt_framework/orchestrate.h>
APT_Status MySequentialOperator::describeOperator()
{
setKind(APT_Operator::eSequential);
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema(
"record(field1:int32; field2:int32; field3:string; in:*;)", 0);
setOutputInterfaceSchema("record (out:*;)", 0);
APT_ViewAdapter collectorAdapter(
"collectorField = field1;");
MyCollector * coll = new MyCollector;
setCollectionMethod(coll, collectorAdapter, 0);
return APT_StatusOk;
}
10
Using Field Lists
Describes field lists, which let you reference multiple schema components using a
single object.
Overview 10-1
Creating a Field List 10-2
The APT_FieldList Class Interface 10-2
Expanding a Field List 10-3
Using Field Lists with Operators 10-4
Overview
Field lists, represented by the class APT_FieldList, let you reference multiple
schema components using a single object. By doing so, you can pass one or more
field references to a function using a single argument.
You can also use field lists when creating operators with a dynamic interface
schema. This allows a user of the operator to specify some or all of the operator’s
interface schema.
This chapter describes field lists, then explains how to use field lists with
functions and operators.
Creating a Field List
A function that can operate on one or more fields of a data set using this record
schema could have the following prototype:
void processFields(const APT_FieldList& fList);
This function takes as an argument the list of fields to process. The field list you
include can have several forms, as shown below:
processFields("a, b, e"); // comma-separated list of fields
processFields("a - c"); // field range
processFields("a - c, e, f"); // field range and a comma-separated
// list
processFields("*"); // all fields
To create a field list, you use one or more of these elements:
• Individual field identifiers.
• Field ranges, which are two field identifiers separated by a hyphen. The first
field must appear earlier in the record schema definition than the second (the
fields must be in schema, not alphabetic, order). A field range includes all the
fields whose identifiers fall within the range.
• A wildcard (*), which represents all of the fields in the record schema.
You then specify the following record schema as the context and expand the list
using APT_FieldList::expand():
static char schema1[] = "record"
"( a:int32; "
" b:int32; "
" c:int16; "
" d:sfloat; "
" e:string; )";
listObject.expand(schema1);
The expanded field list contains three APT_FieldSelector objects: one each for the
fields a, b, and c. If the field list is already expanded, APT_FieldList::expand()
does nothing. Note that you cannot unexpand an expanded field list.
Selecting a different record schema as the context results in a different number of
selectors. Consider, for example, the following schema:
static char schema2[] = "record"
"( a:int32; "
" a1:int32; "
" a2:int16; "
" a3:sfloat; "
" a4:string; )";
In this example, the expanded field list contains five field selectors, one for each
field in the record schema.
After you expand a field list, you can access the APT_FieldSelector for each field
in the list. Using the APT_FieldSelector, you can then determine the data type of
the field and process the field accordingly.
Using Field Lists with Operators
[Figure: DynamicOperator, with input interface schema inRec:*; and output
interface schema outRec:*;.]
DynamicOperator:
• Takes a single data set as input
• Writes its results to a single output data set
• Has an input interface schema consisting of a single schema variable inRec
and an output interface schema consisting of a single schema variable outRec
To use this operator, you would specify the field list to its constructor, as shown
in this statement:
static char schema[] = "record" // Schema of input data set
"( a:int32; "
" b:int32; "
" c:int16; "
" d:sfloat; "
" e:string; )";
DynamicOperator aDynamicOp("a - c"); // Specify the input schema
Code
#include <apt_framework/orchestrate.h>
class DynamicOperator : public APT_Operator
{
public:
DynamicOperator();
8 DynamicOperator(const APT_FieldList& fList)
: inSchema(), inFList(fList)
{}
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_(const APT_PropertyList
&args,
APT_Operator::InitializeContext context);
private:
16 APT_Schema inSchema;
17 APT_FieldList inFList;
. . .
APT_Status DynamicOperator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
38 APT_SchemaField inField;
inField.setIdentifier("inRec");
inField.setTypeSpec("*");
41 inSchema.addField(inField);
42 setInputInterfaceSchema(inSchema, 0);
setOutputInterfaceSchema("record (outRec:*;)", 0);
return APT_StatusOk;
}
Comments
16 Define storage for the complete input interface schema to the operator.
17 Define storage for the input field list specifying the input schema.
25 Use APT_FieldList::expand() to expand the field list relative to the record schema
of the input data set.
Note that if the field list does not contain a wildcard or field range, the expand
function does nothing.
26 If any errors occurred during the expansion, print the description of the error and
return APT_StatusFailed from describeOperator().
32 Create a for loop to iterate through the field list. For each element in the field list,
add a schema field to inSchema, the input interface schema of the operator.
38 - 41 After adding all the fields of the field list to the input interface schema, you must
create a schema field component for the schema variable "inRec:*;" and add that
component to the input interface schema.
Use APT_Schema::addField() to add the schema variable to allow transfers from
the input data set to the output data set.
11
Enabling Object Persistence
Tells you how to make your derived components persistent so that your C++
objects will maintain their integrity throughout storage and loading on
multiple nodes.
Overview 11-2
Simple Persistence and Complex Persistence 11-2
Storing and Loading Simple and Complex Persistent Objects 11-3
Creating Archives 11-3
The APT_Archive Class Interface 11-4
Using the Serialization Operators 11-4
Defining Simple-Persistent Classes 11-7
Defining Complex-Persistent Classes 11-8
The APT_Persistent Class Interface 11-9
Implementing APT_Persistent::serialize() 11-10
Serializing Pointers 11-11
Serializing Arrays 11-11
Persistence Macros 11-13
Overview
As part of the process of executing a parallel application, Orchestrate creates
copies of objects, such as operators, data sets, and partitioners, and then transmits
and loads these objects into the memory of your multiple processing nodes. The
storing and loading of C++ objects to and from files and memory buffers is called
object persistence.
In order for your application to run on multiple nodes, all C++ objects utilized by
your Orchestrate application must support persistence to ensure that your objects
have data portability, structural integrity, valid inheritance, polymorphism, object
versioning, and interoperability with third-party software components.
All classes in the Orchestrate class library already support persistence. You must
add persistence only to those classes that you derive from Orchestrate base classes
or from your own class hierarchy. You add either simple or complex persistence to
a class depending on the functionality of the class.
The Orchestrate persistence mechanism is similar to the C++ stream I/O facility.
Instead of writing objects to streams or reading objects from streams, Orchestrate
stores and loads objects to an archive. An archive has an associated direction, or
mode: either storing for outputting objects or loading for inputting objects.
The process of storing or loading an object is called object serialization. Archives
use overloads of the input and output stream operators, operator>> and
operator<<, to load from and store to an archive. The Orchestrate persistence
mechanism also provides a bidirectional serialization operator, operator||,
which can both load from and store to an archive.
Orchestrate persistence provides two methods for making your classes persistent:
simple-persistence and complex-persistence. Simple-persistence requires less
memory, but it is not adequate for classes with complex functionality. The
following section lists the conditions that require complex-persistence for a class.
• You want data versioning for your class. This is necessary when you change
the definition of a class and then load an object that was stored using an older
class definition. At load time, complex-persistence automatically initializes
the object using the current class definition.
To make a class complex-persistent, you directly or indirectly derive your class
from the APT_Persistent base class.
If your class has none of the complex conditions outlined above, you can make
your class simple-persistent. To do this you simply need to define serialization
operators. Derivation from APT_Persistent is not necessary. The serialized
representation of a simple-persistent object consumes no archive storage other
than that used to serialize its data members.
The following section tells you how to create archives and store and load objects
to and from archives. This information applies to both simple and complex
persistence.
These sections tell you how to define persistent classes:
• “Defining Simple-Persistent Classes” on page 11-7
• “Defining Complex-Persistent Classes” on page 11-8
Creating Archives
The Orchestrate base class APT_Archive defines the base-level functionality of
the archive facility. Orchestrate uses the derived classes APT_FileArchive and
APT_MemoryArchive to perform object serialization. APT_FileArchive is used
to store objects to a file or load objects from a file. APT_MemoryArchive is used
to store objects to a memory buffer or load objects back from a memory buffer.
This line of code creates a loading file archive using the file output.dat:
APT_FileArchive ar("output.dat", APT_Archive::eLoading);
Note that you establish the archive mode when you create the archive object. Its
mode cannot be subsequently changed. Also, archives do not support a random-access
seeking mechanism. You must load objects back from an archive in the
order in which you store them.
Note that any complexity in the internal structure of Catalog does not complicate
the code required to store and load instances of Catalog.
The bidirectional operator, operator||, can perform either storing or loading. It
determines its action based on the mode of the archive supplied to the function.
Here is an example that uses operator||:
APT_FileArchive ar1("input.dat", APT_Archive::eLoading);
APT_FileArchive ar2("output.dat", APT_Archive::eStoring);
Catalog gCat;
ar1 || gCat; // load gCat from input.dat
ar2 || gCat; // store gCat to output.dat
The APT_Archive base class defines bidirectional serialization operators for the
following Orchestrate data types:
• signed or unsigned 8-, 16-, 32- or 64-bit integers
• single-precision (32 bits) or double-precision (64 bits) floats
• time of day
• boolean
The one-value serialization operators are shown below:
class APT_Archive
{
public:
...
friend APT_Archive& operator|| (APT_Archive& ar, APT_UInt8& d);
friend APT_Archive& operator|| (APT_Archive& ar, APT_Int8& d);
friend APT_Archive& operator|| (APT_Archive& ar, char& d);
class FPair
{
    float x_, y_;
public:
    ...
    friend APT_Archive& operator|| (APT_Archive& ar, FPair& d)
    {
        return ar || d.x_ || d.y_;
    }
};
// define operator<< and operator>>
APT_DIRECTIONAL_SERIALIZATION(FPair);
Note that you explicitly provide only the bidirectional serialization operator. By
including the APT_DIRECTIONAL_SERIALIZATION macro, you also provide
both the store and load unidirectional serialization operators. This allows FPair
objects to be serialized in the same manner as built-in types.
For example:
APT_MemoryArchive ar;
FPair fp = ...;
ar << fp;
For simple classes such as FPair, just defining serialization operators suffices to
make a class persistent. Simple-persistent classes have low overhead, since they
don't inherit from a persistence base class, and their serialized representation
consumes no archive storage other than that used to serialize the class members.
When loading, you always call the serialize() function on a default-constructed
object. This is because the loaded object has either just been dynamically
allocated, or it has been explicitly destroyed and default-constructed in place.
This policy simplifies the serialize() function, since it need not worry about the
previous state of an object. With simple persistence, a class's serialization operator
needs to recognize that it might be loading over the pre-load state of an object.
Note that when a simple-persistent object is loaded, the object's state is
overwritten by the serialization operator for the class. It is up to the serialization
operator to properly manage any state that the object may have had prior to
loading.
A crucial limitation of simple-persistent classes is that pointers to class objects
cannot be serialized. If it is necessary to serialize an object pointer, the class must
be complex-persistent.
#include <apt_framework/orchestrate.h>
APT_IMPLEMENT_PERSISTENT(ComplexPersistentClass); // Rule 2
APT_IMPLEMENT_RTTI_ONEBASE(ComplexPersistentClass, // Rule 5
APT_Persistent);
ComplexPersistentClass::ComplexPersistentClass() // Rule 3
: i_(0), f_(0.0) {}
ComplexPersistentClass::ComplexPersistentClass(int i, float f)
: i_(i), f_(f) {}
Implementing APT_Persistent::serialize()
You use the macro APT_DECLARE_PERSISTENT() to declare
APT_Persistent::serialize() as a private member function in a derived class. The
function prototype of APT_Persistent::serialize() is:
void serialize(APT_Archive& archive, APT_UInt8);
• If an object has already been loaded, a second load is not performed. Instead,
operator|| simply initializes the pointer to reference the loaded object.
• If you are loading an object via a reference to a class derived from
APT_Persistent, Orchestrate destroys the object, default constructs the object
in place, and then calls APT_Persistent::serialize() to load the object from the
archive.
• When loading objects of simple data types, such as integers and floats,
APT_Persistent::serialize() simply loads the object.
During loading, an object’s previous state is irrelevant because the object is
always default constructed. Your definition of APT_Persistent::serialize() should
not have any effects other than reading, writing, or otherwise modifying the
object being serialized.
Serializing Pointers
Using classes that support persistence lets you save and load objects directly or
use a pointer to an object. Basic data types such as integers and characters,
however, do not support serialization via a pointer.
You must write your APT_Persistent::serialize() overload to handle serialization
of pointers to data types that do not support the Orchestrate persistence
mechanism. For example, char * pointers can often be replaced by an instance of
the persistent class APT_String. Because APT_String supports persistence, you
can serialize a reference to an APT_String object.
Serializing Arrays
The persistence mechanism does not directly contain support for serializing
arrays. If your classes contain array members, you must build support for array
serialization within APT_Persistent::serialize().
Arrays are relatively simple to serialize. The serialization code depends on
whether the array is fixed-length (and fixed-allocation) or variable-length (and
dynamically allocated).
The example below contains both fixed-length and variable-length arrays. In this
example, ObjClass is a persistent class and Container is a persistent class
containing two ObjClass arrays. When writing the APT_Persistent::serialize()
definition for Container, you would handle the ObjClass arrays as follows:
#include <apt_framework/orchestrate.h>
class ObjClass : public APT_Persistent
{
APT_DECLARE_PERSISTENT(ObjClass);
APT_DECLARE_RTTI(ObjClass);
public:
ObjClass();
. . .
};
class Container: public APT_Persistent
{
APT_DECLARE_PERSISTENT(Container);
APT_DECLARE_RTTI(Container);
public:
Container()
:variable_(0), nVariable_(0)
{}
~Container()
{
delete[] variable_;
}
. . .
private:
ObjClass fixed_[12]; // define a fixed-length array
ObjClass* variable_; // define a variable-length array
int nVariable_; // contains length of array variable_
};
Code
#include <apt_framework/orchestrate.h>
APT_IMPLEMENT_RTTI_ONEBASE(Container, APT_Persistent);
APT_IMPLEMENT_PERSISTENT(Container);
Comments
8 Use a simple for loop to serialize the elements of the fixed-length array.
Persistence Macros
This section describes the macros that you use when declaring and defining
classes that support the Orchestrate persistence mechanism.
• APT_DECLARE_ABSTRACT_PERSISTENT() declares the operator||,
operator<<, operator>>, and serialize() definitions within the definition of an
abstract base class that supports persistence.
#define APT_DECLARE_ABSTRACT_PERSISTENT(className);
#define APT_DECLARE_PERSISTENT(className);
12
Run-Time Type Information
Describes Orchestrate’s run-time type information facility.
Overview 12-1
Determining the Run-Time Data Type of an Object 12-2
Performing Casts 12-3
Adding RTTI Support to Derived Classes 12-3
RTTI Macros 12-5
Derivation Macros 12-5
Macro APT_IMPLEMENT_RTTI_ONEBASE() 12-6
Application Macros 12-6
Overview
Any derived object class that you want to use in an Orchestrate application must
support the Run-Time Type Information (RTTI) facility. This applies to any class
you derive from an Orchestrate base class, directly or indirectly. RTTI lets you
determine the data type of an object at run-time so that Orchestrate can optimize
functions for some data types, simplify general-purpose functions, and perform
checked casts on pointers.
Support for RTTI is also required by the Orchestrate persistence mechanism. See
Chapter 11, “Enabling Object Persistence” for information.
The Orchestrate RTTI facility is based on the approach described in The C++
Programming Language, Second Edition, by Bjarne Stroustrup.
Deriving your classes from a common base class is not necessary to support RTTI.
The RTTI facility supports:
• Multiple inheritance
• Virtual base classes
• Downcasting from a virtual base class
• Checked casts to and from any base pointer or derived pointer
The two primary objectives of the RTTI facility are to determine the run-time data
type of an object and to cast a pointer to a derived type or a base type. These
objectives are described in the next two sections.
The dynamic data type, on the other hand, can change at run-time, as shown here:
DClass dObject; // Static type is DClass.
BClass * basePtr = &dObject; // Static type of basePtr is BClass,
// but its dynamic type is DClass.
const char * sType = APT_STATIC_TYPE(*basePtr).name(); // returns "BClass"
const char * dType = APT_DYNAMIC_TYPE(*basePtr).name(); // returns "DClass"
Performing Casts
Casting lets you assign a pointer to another pointer of a different data type. The
data types of both the pointer and the casted pointer must support the RTTI
facility.
You perform checked casts using the APT_PTR_CAST() macro which converts a
pointer to a pointer of a new data type. If the cast cannot be performed, the
pointer is set to 0. The data types of both the pointer and the casted pointer must
exist in the same inheritance hierarchy.
Shown below is an example using APT_PTR_CAST(). This example uses the
class DClass derived from the base class BClass:
BClass bObject;
BClass * bPtr = &bObject;
// cast bPtr to type DClass
DClass * dPtr = APT_PTR_CAST(DClass, bPtr);
if (dPtr)
{
// dPtr is non-zero: the cast succeeded.
// APT_PTR_CAST() returns 0 if the cast cannot be performed.
...
}
public:
ExampleOperator();
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
};
APT_IMPLEMENT_RTTI_ONEBASE(ExampleOperator, APT_Operator);
APT_IMPLEMENT_PERSISTENT(ExampleOperator);
...
The class MI_Class is derived from two base classes: B1 and B2. Typically, both
base classes should support the RTTI facility. If a base class does not support
RTTI, that class cannot be used as part of a checked cast.
Note that MI_Class is a completely user-defined class; it is not derived from an
Orchestrate base class.
In the .C file, you must include the macros:
APT_IMPLEMENT_RTTI_BEGIN(MI_Class);
APT_IMPLEMENT_RTTI_BASE(MI_Class, B1);
APT_IMPLEMENT_RTTI_BASE(MI_Class, B2);
APT_IMPLEMENT_RTTI_END(MI_Class);
...
RTTI Macros
This section describes the Orchestrate macros that support RTTI. These macros
fall into two categories:
• Derivation macros
• Application macros
Derivation Macros
You use RTTI derivation macros when you are declaring and defining object
classes that support the Orchestrate RTTI.
• APT_DECLARE_RTTI(). You insert this macro in a class definition to
support the RTTI facility.
APT_DECLARE_RTTI(className);
Macro APT_IMPLEMENT_RTTI_ONEBASE()
If a derived class has a single base class, you use this macro in the source code file
of the derived class to specify the base class name.
APT_IMPLEMENT_RTTI_ONEBASE(className, baseClass);
Application Macros
You use the RTTI application macros to determine the data type of objects
instantiated from a class that supports the Orchestrate RTTI facility.
• APT_DYNAMIC_TYPE(). This macro returns an APT_TypeInfo object
describing the dynamic, or run-time, data type of an object. The object must be
instantiated from a class that supports RTTI.
APT_TypeInfo& APT_DYNAMIC_TYPE(object);
object: Specifies an object instantiated from a class that supports the RTTI
facility.
The example below uses APT_DYNAMIC_TYPE(), with the class DClass
derived from the base class BClass:
DClass dObject;
BClass * basePtr = &dObject;
• APT_PTR_CAST(). This macro performs a checked cast, converting a pointer to a
pointer of a new data type:
APT_PTR_CAST(destType, sourcePtr);
destType: Specifies the resultant data type of the cast. The data types destType
and sourcePtr must exist in the same inheritance hierarchy; otherwise the
pointer is set to 0.
sourcePtr: A pointer to convert.
APT_PTR_CAST() returns 0 if sourcePtr does not exist in the same inheritance
hierarchy as destType, or if sourcePtr is equal to 0.
• APT_STATIC_TYPE(). This macro returns an APT_TypeInfo object
describing the static data type of an object reference. The class must support
RTTI.
const APT_TypeInfo& APT_STATIC_TYPE(object);
13
Debugging Your Applications
Describes Orchestrate’s debugging facilities.
Overview 13 1
Choosing a Debugger 13 2
Specifying Program Execution Mode 13 2
Modifying the Configuration File 13 3
Debugging With Third Degree on Tru64 13 4
Debugger Options with dbx 13 5
Debugging in Sequential Mode 13 5
Debugging in Parallel Mode 13 9
Profiling Your Application 13 11
The Score Dump 13 12
Overview
Parallel processing implies the simultaneous execution of multiple processes on
multiple nodes. Under these circumstances, tracking program execution with a
debugger can be extremely difficult. For this reason, the Orchestrate development
environment offers several features that simplify the debugging of parallel
applications. Using these features, you can start debugging your application as a
sequential program executing in a single process on a single node, and then
proceed by degrees until you execute the program in full parallel mode.
Choosing a Debugger
Orchestrate supports the following debuggers:
• dbx on AIX and Sun Solaris
• gdb on HP-UX
• ladebug on Tru64
Add the path to your debugger to your PATH variable. Because all debugging is
X-based, your DISPLAY environment variable must be set to an explicit host name.
Note that a host name of :0.0 (meaning "this host's X server") is not an acceptable
setting, as it would configure Orchestrate to open debugger windows on the
individual nodes of your machine.
If you are using dbx, the debugger starts in an xterm window; therefore,
Orchestrate must know where to find the xterm binary. The default location is
/usr/bin/X11/xterm. You can override this default by setting the
APT_PM_XTERM environment variable to the appropriate path.
Name your library file libop.so, and create an op.so link to it, where op is the
name of your operator. The first library is used to rebuild the osh executable;
the second library is used for operator registration.
For example, for the file sum.C, which contains the source code for the sum
operator, there should be two libraries: libsum.so and sum.so.
If the source code is not for operator implementation, it is not necessary to
create the link.
2 Rebuild the osh executable linked with your shared libraries:
/bin/cxx -D__STANDARD_TEMPLATE_SPECIALIZATION__
-D__STANDARD_TEMPLATE_INSTANTIATION__
-stdnew -ieee -msg_display_number -msg_disable 111
-D_XOPEN_SOURCE_EXTENDED -D_OSF_SOURCE -DRW_MULTI_THREAD -D_LP64
-pthread -g -no_inline osh.o -o osh.new
-L/APT_ORCHHOME_dir/lib -L/Custom_Operator_dir -lorchoslOSF1
-lorchOSF1 -lorchmonitorOSF1 -lorchcoreOSF1 -l*** -so_archive
-lpthreads -L/ICU_Lib_dir -lustdio -licuuc -lm
Where:
• APT_ORCHHOME_dir is the value of your $APT_ORCHHOME variable.
• Custom_Operator_dir is your shared-library directory. Place the
shared libraries built in step 1 in this directory.
• ICU_Lib_dir is the directory where the ICU libraries are installed.
• Your shared libraries replace ***.
To obtain osh.o, the osh object file, send email to:
[email protected]
In one-process mode:
• The application executes in a single UNIX process. You only need to run a
single debugger session, and you can set breakpoints anywhere in your code.
• Data is partitioned according to the number of nodes defined in the
configuration file.
• Orchestrate calls APT_Operator::runLocally() as a subroutine. Each
operator’s runLocally() function is called the number of times appropriate for
the number of partitions on which it must operate.
• Orchestrate operators are saved using the persistence mechanism after
initialization, and are loaded each time before execution of runLocally().
In many-process mode, rather than calling operators as subroutines, the
framework forks a new process for each instance of each operator and waits for it
to complete. If APT_EXECUTION_MODE is set to NO_SERIALIZE, the
persistence mechanism is not used to load and save objects. Turning off
persistence may be useful for tracking errors in serialization code in your derived
classes. If turning off serialization in a program that previously crashed results in
the program executing correctly, the problem may be located in your serialization
code.
You can also set APT_DEBUG_OPERATOR to specify the operators to start your
debugger on. If not set, no debugger is started. If set to an operator number (as
determined from the output of APT_DUMP_SCORE), your debugger is started
for that single operator. If set to -1, your debugger is started for all operators. See
“The Score Dump” on page 13-12 for more information on APT_DUMP_SCORE.
In addition, setting APT_DSINFO causes the process manager to print a report
showing the operators, processes, and datasets in the step it is running.
When running Orchestrate in parallel mode, you can set the environment variable
APT_DEBUG_PARTITION to control which partitions your debugger starts on.
If set to a single number, your debugger is started for just that partition. If set to
-1, your debugger is started for all partitions.
The following table shows the interaction of the APT_DEBUG_OPERATOR and
APT_DEBUG_PARTITION environment variables:
During program execution in sequential mode, virtual data sets are written to
files in the directory defined by the environment variable APT_SEQ_VDS_DIR,
which is, by default, the current working directory. These files are named using
the prefix aptvds. This allows you to examine a virtual data set as part of
debugging your application. Virtual data sets are deleted by the framework after
they are used.
Because your application executes on a single processing node in sequential
execution mode, it has access only to the resources of that node. Therefore, when
using the sequential execution mode, you typically construct a testing data set as
a subset of your complete data set. The testing data set should be small enough
that it can easily be handled on a single processing node using available system
resources.
Most application testing and debugging can be done in sequential one-process
mode.
Here is a typical list of debugging steps:
1 Set APT_EXECUTION_MODE to ONE_PROCESS.
2 Edit the configuration file to define a single processing node. This avoids data
partitioning, allowing you to concentrate on debugging your main program
and operators.
3 Check the use of global and static member variables.
When running in full parallel mode, globals and statics cannot be used to
communicate between operators or between the main program and operators.
In one-process mode, however, there is no separation of the global memory of
the main program and operators; therefore, bidirectional communication
through globals and statics is (temporarily) possible. A bug of this type will
not manifest itself until you run in many-process mode or in parallel mode.
You should guard against this possibility by careful coding and quick
inspection during testing. Test and debug your main program: command line
parsing, logical flow, and so on.
4 Test and debug your overrides of the persistence function
APT_Persistent::serialize().
In parallel execution mode, the serialize() function is used to store copies of
objects, such as operators and data sets, transmit the copies to the processing
nodes in your system, and load the objects into memory on the processing
node. In sequential mode, even though the application executes on a single
node, serialize() is still used to save objects after initialization and to
reinitialize objects before each use.
Because incorrect serialization can cause a variety of problems at different
stages of testing, it is wise to check this code early in the process. Temporarily
setting APT_EXECUTION_MODE to NO_SERIALIZE turns off serialization
and may help to localize a bug in your serialization code.
Test and debug your operator overrides of
APT_Operator::describeOperator() and APT_Operator::runLocally().
APT_Operator::describeOperator() configures an Orchestrate operator for
execution. When you derive your own operators, you override
APT_Operator::describeOperator() to configure the operator.
APT_Operator::runLocally() defines the action of an operator. As the name
implies, this function executes locally on each processing node when you are
running in parallel execution mode. In one-process sequential mode, this
function is called as a subroutine for each input data set partition for an
operator. In many-process mode, this function is executed in a separate
process for each input data set partition.
Because APT_Operator::describeOperator() always executes sequentially,
even when running in parallel execution mode, it is easier to debug than
APT_Operator::runLocally(). Bugs that occur in APT_Operator::runLocally()
may occur in parallel and are therefore harder to debug, particularly in the
maintenance phase of programs. For this reason, you should put as much
setup and initialization as possible in APT_Operator::describeOperator().
For example, it is better to initialize data structures in
APT_Operator::describeOperator() and store them in data members of an
operator class than to build and initialize the data structure on first use within
APT_Operator::runLocally(). Note that this makes serialization more
complicated, because more information must be serialized.
5 Test and debug your custom partitioners. In an earlier step, you edited the
configuration file to define just one processing node, so the partitionInput()
functions of your custom partitioners have so far only been called with an
argument of 1. If this seems to work, edit the configuration file to increase the
number of nodes and test the ability of your partitioners to divide data sets
The forking of the new process in many-process mode has two implications for
debugging. First, changes to global variables in the main program will be visible
to operator instances, although the reverse is not true. As stated earlier,
communication using global variables is not possible during parallel execution
and should be considered a bug, even though the application may work as
intended when run in sequential mode.
The second implication of forking is that operator instances in the main process
are never actually used, and remain freshly initialized while the forked processes
are executed. Because the forked processes are exact copies of the main process,
each forked operator instance is automatically initialized as it is forked. Thus, the
persistence mechanism is redundant in the many-process execution mode. To test
for bugs in the serialization code, you can turn off persistence by setting
APT_EXECUTION_MODE to NO_SERIALIZE.
To enable profiling of your operators, compile and link your dynamic libraries
with the -pg option, and profile your application as a sequential program
executing as a single process on one node.
When you run your application with osh.pg, a profile file named gmon.out is
generated. Use this command to produce a profile data report:
gprof [options ] osh.pg gmon.out
See the gprof man page for a detailed description of this utility.
For example, consider a simple application with one step and two operators: the
data set test_ds.ds flows through TestOperator0 and then TestOperator1, which
writes its output to the data set out_ds.ds.
ORCHESTRATE VX.Y
10 It has 2 operators:
11 op0[2p] (parallel TestOperator0) on node0[op0,p0] node1[op0,p1]
12 op1[2p] (parallel TestOperator1) on node0[op1,p0] node1[op1,p1]
Comments
7 Information about ds0, the first data set. Data sets are numbered in the form dsNN,
where NN specifies the data set number.
ds0 is read from the file test_ds.ds and is used as input to operator op0 (an instance
of TestOperator). The number in brackets, [2p], indicates that op0 has two
partitions.
8 Information about ds1. The data set is delivered as output by op0 and is used as
input by op1 (an instance of TestOperator).
9 Information about ds2. The data set is delivered as output by op1 and is written to
the file out_ds.ds.
11 Information about op0, the first operator. Operators are numbered in the form
opNN, where NN is the number of the operator in the step. The operator is an
instance of class TestOperator. It runs on two nodes, called node0 and node1 (the
available nodes were defined in the Orchestrate configuration file). The numbers
in brackets after each node indicate both the operator and the partition numbers,
which are used when debugging in parallel execution mode.
12 Information about op1. The operator is an instance of class TestOperator and runs
on nodes node0 and node1.
Note that the operators, data sets, and nodes are numbered sequentially from 0.
The operator and partition numbers are used to set breakpoints when executing
in parallel, as described in "Debugging in Parallel Mode" on page 13-9.
14
Using the Error Log
Describes the Orchestrate error log facility and the APT_ErrorLog class.
Overview 14 1
The APT_ErrorLog Class Interface 14 2
Base Class Error Log Functions 14 3
The Error Log Compared to the Assertion Failures Facility 14 3
Using the Error Log 14 4
Error Log Structure 14 4
Defining Message Identifiers 14 5
Using the Error Log in a Derived Operator 14 6
Overview
Orchestrate allows you to derive your own classes from its base classes. As part of
deriving a class, you must provide overrides for any pure virtual functions in the
base class and, optionally, overrides for any virtual functions.
Users of your derived class call its member functions to create and configure a
class instance. As part of this process, users may cause an error or warning
condition. Errors are often the result of an invalid configuration or missing
configuration information.
In Orchestrate, you can use the error-log facility in your derived class to hold
error and warning messages; Orchestrate then displays the accumulated
information to the user.
In order to relay error or warning information to the user, you write a message to
the APT_ErrorLog for that object. You may also write an informational message
to the error log that is not an error or a warning, but may be useful to users for
detecting and correcting problems.
If the condition is mild, meaning the object can still be used, you write a warning
message to the error log and return APT_StatusOk from the function override.
Upon return from an overridden function, Orchestrate checks the error log for the
object. If any warning messages are stored in the log, they are written to standard
error. By default, standard error corresponds to the screen on the workstation
used to invoke the application.
If the condition is severe enough that the object is unusable, you write the error
information to the log. Additionally, a function override should return
APT_StatusFailed in the case of an error. When a function returns
APT_StatusFailed, any error or warning information in the log is printed to
standard error and the application is terminated. For this reason, you should
make sure that error messages clearly state any information that will aid the user
in determining the cause of the error.
A function may detect several different error or warning conditions. The error log
allows you to write multiple messages to it, for errors, warnings, and any
additional information. You can use the member functions of APT_ErrorLog to
return a count of the number of errors, warnings, or informational messages
written to it.
Many function overrides combine both assertions and the error log. See the
Orchestrate 7.0 User Guide for more information.
The APT_ErrorLog class interface consists of an accumulator and three log
buffers: an error log, accessed through logError(), getError(), and resetError();
a warning log, accessed through logWarning(), getWarning(), and resetWarning();
and an information log, accessed through logInfo(), getInfo(), and resetInfo().
You write a string containing error, warning, or other information to the
accumulator using the operator* function. The operator* function allows you to
use the following syntax to create an error log and write to the accumulator:
APT_ErrorLog eLog(APT_userErrorSourceModule);
*eLog << "a string";
// Use the appropriate function to copy the accumulator to a log
// buffer
You then use a function such as getError() to access the error log buffer and
resetError() to clear the buffer. You use similar functions to access the warning
and informational log buffers.
You can also use the member function APT_ErrorLog::dump() to display the
information stored in an error log. This function writes all the information
contained in an APT_ErrorLog to standard error on the workstation that invoked
the application, then resets all the log buffers to the empty state.
One use of dump() is to periodically purge an APT_ErrorLog that may contain a
large number of messages. For example, you may have an error or warning
caused by each record processed by an operator. Because an operator can process
a huge number of records, the amount of memory consumed by an
APT_ErrorLog object can become correspondingly large. Using dump(), you can
purge the APT_ErrorLog object to prevent the object from overflowing memory.
When writing an error message to the error log, you would then specify a
message index value as shown below:
Code
#include <apt_framework/orchestrate.h>
public:
ExampleOperator();
8 void setKey(const APT_String& name);
void setStable();
protected:
virtual APT_Status describeOperator();
virtual APT_Status runLocally();
virtual APT_Status initializeFromArgs_(const APT_PropertyList &args,
APT_Operator::InitializeContext context);
private:
15 int numKeys;
// other data members
. . .
APT_Status ExampleOperator::describeOperator()
{
setKind(APT_Operator::eParallel);
setInputDataSets(1);
setOutputDataSets(1);
setInputInterfaceSchema("record (iField:int32; sField:string)", 0);
setOutputInterfaceSchema("record (iField:int32; sField:string)", 0);
25 if (numKeys == 0)
{
26 APT_ErrorLog& eLog = errorLog();
27 *eLog << "no keys specified for operator.";
28 eLog.logError(MESSAGE_ID_BASE + 1);
29 return APT_StatusFailed;
}
return APT_StatusOk;
}
Comments
15 The variable numKeys is a private variable containing the number of key fields
specified using setKey().
25 If the number of keys is 0, the user has not specified a key field, and an error
condition exists.
Note that you do not have to return from the function after detecting a single
error or warning. You may want to execute the entire function, logging multiple
error and/or warnings before returning. In this case, you can define a flag to
signal that an error occurred and check that flag before returning from the
function.
15
Advanced Features
Presents the functionality of several advanced features.
Generic Functions, Accessors, and Dynamic Schema
The sample code in this section defines a hierarchy of generic-function classes:
TypeOpGF is derived from APT_GenericFunction, and TypeOpStringGF and
TypeOpInt32GF are derived from TypeOpGF.
The code also demonstrates how generic accessors can be safely cast into strongly
typed accessors, by casting an APT_InputAccessorBase pointer to both a string
accessor and an int32 accessor. Using Orchestrate's default conversion mechanism,
the code also converts int8 field values to int32 values. In addition, the code
demonstrates how to build a schema variable field and add it to the schema so
that entire records can be transferred without change from input to output.
The following table shows the code of types.C. The code is followed by comments
which are keyed to code line numbers. Table 42 follows. It contains the types.h
header file which defines the TypesOp sample operator. The comments for Table
42 are included directly in the code.
Table 41 types.C
Code
1 #include "types.h"
public:
7 TypeOpGF(char *type) : APT_GenericFunction(type, "typeop", "") {};
APT_IMPLEMENT_RTTI_ONEBASE(TypeOpGF, APT_GenericFunction);
APT_IMPLEMENT_ABSTRACT_PERSISTENT(TypeOpGF);
public:
TypeOpStringGF() : TypeOpGF("string")
{}
APT_IMPLEMENT_RTTI_ONEBASE(TypeOpStringGF, TypeOpGF);
APT_IMPLEMENT_PERSISTENT(TypeOpStringGF);
APT_DECLARE_RTTI(TypeOpInt32GF);
APT_DECLARE_PERSISTENT(TypeOpInt32GF);
public:
TypeOpInt32GF() : TypeOpGF("int32")
{}
APT_IMPLEMENT_RTTI_ONEBASE(TypeOpInt32GF, TypeOpGF);
APT_IMPLEMENT_PERSISTENT(TypeOpInt32GF);
int registerGFs()
{
APT_GenericFunction *tmp = new TypeOpStringGF();
APT_GenericFunctionRegistry::get().addGenericFunction(tmp);
tmp = new TypeOpInt32GF();
APT_GenericFunctionRegistry::get().addGenericFunction(tmp);
return 0;
}
67 APT_IMPLEMENT_RTTI_ONEBASE(TypesOp, APT_Operator);
APT_IMPLEMENT_PERSISTENT(TypesOp);
69 APT_REGISTER_OPERATOR(TypesOp, types);
72 TypesOp::TypesOp()
{}
#define getDFloat(PROP) ( PROP.valueDFloat() )
#define getInt32(PROP) ( (APT_Int32) PROP.valueDFloat() )
#define isInt32(PROP) ( getDFloat(PROP) == getInt32(PROP) )
85 APT_Status TypesOp::describeOperator()
{
setInputDataSets(1);
setOutputDataSets(1);
APT_Schema schema=viewAdaptedSchema(0);
94 if (APT_PTR_CAST(APT_Int8Descriptor, fd))
{
APT_FieldTypeDescriptor *int32fd=APT_FieldTypeRegistry::get().
lookupBySchemaTypeName("int32");
field.setTypeDescriptor(int32fd);
}
100 }
APT_SchemaField schemaVar;
schemaVar.setIdentifier("in");
schemaVar.setKind(APT_SchemaField::eSchemaVariable);
schema.addField(schemaVar);
setInputInterfaceSchema(schema, 0);
setOutputInterfaceSchema("record (out:*)", 0);
declareTransfer("in", "out", 0, 0);
setCheckpointStateHandling(eNoState);
return APT_StatusOk;
}
APT_Status TypesOp::runLocally()
{
APT_Status status = APT_StatusOk;
int count=0;
for(; count < schema.numFields();)
{
118 if (schema.field(count).kind() != APT_SchemaField::eValue)
schema.removeField(count);
else
count++;
}
APT_OutputCursor outCur;
136 setupOutputCursor(&outCur, 0);
transfer(0);
outCur.putRecord();
149 }
delete[] gfs;
delete[] accessors;
return status;
}
Comments
1 Include the header file, types.h. It defines the TypesOp operator, which is directly
derived from APT_Operator.
15-30 & These code lines define the TypeOpStringGF and TypeOpInt32GF classes that
36-53 provide string and int32 implementations of generic functions. Except for data
type, the two implementations are very similar.
Overriding clone() is an APT_GenericFunction base-class requirement.
The display() function displays the data type and value. You can call this type of
function to perform many kinds of operations with any number of arguments. For
example: . . .
The display() function in this case takes a generic accessor base pointer so that the
return value can be safely cast to any Orchestrate data type pointer.
67-69 The APT_IMPLEMENT_RTTI_ONEBASE macro supports run-time type
information. The APT_IMPLEMENT_PERSISTENT macro defines Orchestrate’s
persistence mechanism, and APT_REGISTER_OPERATOR relates an operator to
its osh invocation.
70 The argument description string for TypesOp is empty because the operator takes
no arguments.
83 The serialize method is part of the Orchestrate persistence mechanism. Before the
operator is parallelized, this method is called to archive the values of its member
variables; and it is called again after parallelization, to restore those variables from
the archived values in each parallel copy of the operator. In this case, there are no
members to serialize.
94-100 A simple assignment is used to convert int8 types into int32 types because
Orchestrate supplies a default conversion between these types. When there is no
default conversion, an adapter must be used to perform the conversion.
118-130 In the parallel execution method runLocally(), schema variables are removed from
a copy of the schema in order to exclude them from having accessors assigned to
them. This method also skips over subrecs and tagged fields. The runLocally()
method then determines how many accessors are needed, and declares pointers to
generic functions to be used with the accessors.
131-136 The method also defines a cursor for the input data set, sets up accessors to their
corresponding fields, and defines a cursor for the output data set.
137-149 The while loop iterates over the available input records, calling the generic
functions. The loop could also have used APT_InputAccessorBase::type() to
return an enum on which to run switch(); however, in many cases that technique is
not as efficient as using generic functions, particularly when there are a lot of fields
in a record. Since a generic function has not been created for every data type, the
code guards against generic functions that have not been allocated.
Table 42 types.h
#include <apt_framework/orchestrate.h>
public:
// constructor
TypesOp();
protected:
// pre-parallel initialization
virtual APT_Status describeOperator();
Defining Units-of-Work
The producing operator is responsible for dividing its records into units-of-work
and for determining where to pick up on subsequent incoming units-of-work.
Any operator or dataset read which does not have specific unit-of-work semantics
has its entire output dataset treated as part of every unit-of-work for a step.
In the runLocally() or doFinalProcessing() method of your operator, your calls to
the markEndOfUnit() method on output cursors divide the records into units-
of-work.
The header file for this method is:
$APT_ORCHHOME/include/apt_framework/cursor.h
An operator that calls the markEndOfUnit() function on any output cursor does
not propagate any further incoming end-of-unit markers from its input datasets.
Instead, the operator may propagate its own end-of-unit markers.
Per-Partition Processing
The postFinalRunLocally() function is invoked for each dataset partition after
runLocally() is called. When unit-of-work processing is done, the runLocally()
function may be called multiple times. In this case, the postFinalRunLocally()
function is called after the last invocation of runLocally().
Combinable Operators
A combinable operator is an operator that is managed by the combinable operator
controller. A combinable operator processes an input record and then returns
control to the controller. This frees the Orchestrate framework from waiting for
the operator to consume all of its input. All Orchestrate combinable operators are
based on the abstract base class APT_CombinableOperator.
Advantages
Combinable operators can substantially improve performance for certain kinds of
operations by reducing the record transit time between operators. They do this by
eliminating the time that non-combinable operators need to pack records into a
buffer as the data passes between operators.
In addition, combinable operators allow you to process records whose size
exceeds the 32 kilobyte limit imposed by Orchestrate. When data flows between
two non-combinable operators, it is stored in a buffer which limits record size to
32 kilobytes. With combinable operators there is no buffer limit, so record size is
essentially unlimited until the combinable operator outputs to a non-combinable
operator or repartitions.
Disadvantages
Although combinable operators can confer substantial performance advantages
to data flows, combinable operators expose the complexity of the Orchestrate
internal API and as such their use comes with certain risks. These risks may be
minimized, however, by following the guidelines in this chapter. Also,
combinable operators are not always an appropriate choice. For example, using
combinable operators reduces pipeline parallelism and so could actually slow
down the data flow. See “Limitations and Risks” on page 17.
Another example shows how to write a combinable operator that can output
multiple records for each input record, making use of the
requestWriteOutputRecord() and writeOutputRecord() methods which are
described later in this chapter. Fully documented source code is in the file
$APT_ORCHHOME/examples/Custom_Ops/OrchAPI/ComplexCombinable/twiddle.C.
Note The source code files for this example and the first example have the same
filename, twiddle.C. Make sure you are in the proper directory when looking for
sample code.
Virtual Methods
A virtual method is a method that you must implement. The virtual methods
listed below are called by the framework.
• APT_Status doFinalProcessing()
This method is called once per operator instance. If this method is only
outputting one record, use transferAndPutRecord(). If there are multiple
outputs for an operator instance, use requestWriteOutputRecord() instead,
which returns control to the framework to process the outputs.
• APT_Status doInitialProcessing()
This method lets you generate output records before doing input processing.
This method is called once per operator instance. If this method is only
outputting one record, use transferAndPutRecord(). If there are multiple
outputs for an operator instance, use requestWriteOutputRecord() instead,
which returns control to the framework to process the outputs. For an
example of how this method is used, see twiddle.C, lines 351 - 389.
• void outputAbandoned(int outputDS)
This method sends a notification to the combinable operator that the output is
no longer needed. This method is called by the framework when the operator
following this operator has received all the input it needs.
• void processEOF(int inputDS)
You use this method for processing that needs to be done after the last record
for the input data set has been received. The framework calls this method once
for each input.
• APT_Status processInputRecord(int inputDS)
You use this method to apply processing on a per-record basis. Use no more
than one putRecord() or transferAndPutRecord() per output data set for each
input record. Call requestWriteOutputRecord() for each additional record
output in the method. For an example of how this is used, see twiddle.C, line
391 to the end of the file.
• APT_Status writeOutputRecord()
The framework calls this method when a call has been scheduled with
requestWriteOutputRecord(), letting the operator output one additional record
per output port.
Non-Virtual Methods
A non-virtual method is one that you do not need to implement. The combinable
operator that you write can call the following non-virtual methods.
• void abandonInput(int inputDS)
This method notifies Orchestrate to stop sending records from inputDS to the
operator. After this function is called, atEOF() for the input will return true
and processInputRecord() cannot be called with this input as the argument. If
atEOF(inputDS) is true, this function does nothing. Conceptually,
abandonInput() is called automatically for all inputs when the operator
terminates.
Only call this method if inputDS is less than inputDataSets() and
inputConsumptionPattern() equals eSpecificInput.
• int activeInput()
Returns the number of the currently active input. This value remains stable
throughout the entire dynamic scope of processInputRecord() or
processEOF(). Calls to setActiveInput() or advanceToNextInput() do not
affect the value of activeInput().
• void clearCombiningOverrideFlag()
Sets the combiningOverrideFlag() flag for the operator to false and sets the
hasCombiningOverrideFlag() flag to true.
• bool combiningOverrideFlag()
This function returns the combining override flag for the operator. It returns
true if setCombiningOverrideFlag() was called and false if
clearCombiningOverrideFlag() was called. Do not call this function unless
hasCombiningOverrideFlag() is true.
• bool hasCombiningOverrideFlag()
This function indicates if an operator's combining override flag value has been
set. If this flag is set to true, operator combining will be prevented and
combinable operators will be treated as ordinary operators. See also the
related functions combiningOverrideFlag(), setCombiningOverrideFlag(),
and clearCombiningOverrideFlag().
• APT_InputAccessorInterface* inputAccessorInterface(int inputDS)
This method provides access to the input accessor interface associated with
each input defined for this operator. Use it to set up input accessors. Call
this method in doInitialProcessing() instead of exposing input cursors.
To call this method, inputDS must be nonnegative and less than
inputDataSets().
• inputConsumptionPattern()
Returns the input consumption pattern for the operator. The possible values
include:
– eSpecificInput
This value specifies that the operator exercises direct control over the
consumption pattern by means of the setActiveInput() or
advanceToNextInput() functions. This is the default.
– eBalancedInput
• APT_OutputCursor* outputCursor(int outputDS)
This method provides access to the output cursor associated with each output
defined for this operator. Use it to set up output accessors, and to call
putRecord().
For an example of how this is used, see the call to this method from
doInitialProcessing() in twiddle.C, lines 385–386.
• int remainingOutputs()
Returns the number of outputs this operator has, not counting abandoned
outputs.
• void requestWriteOutputRecord()
Use this method when your combinable operator method needs to output
multiple records for a single input record.
Within the functions doInitialProcessing(), processInputRecord(),
processEOF(), doFinalProcessing(), and writeOutputRecord(), at most one
putRecord() or transferAndPutRecord() operation can be performed per
output port.
If your operator requires additional putRecord() operations, it must call
requestWriteOutputRecord() to schedule a call to writeOutputRecord(). The
requested call to writeOutputRecord() will take place before any other calls
back into this operator.
Multiple calls to this method within a single activation of
doInitialProcessing(), processInputRecord(), processEOF(),
doFinalProcessing(), and writeOutputRecord() have the same effect as a
single call. To write out multiple records, call requestWriteOutputRecord()
from writeOutputRecord() after the previous record has been output.
• void setCombiningOverrideFlag()
Sets the combiningOverrideFlag() flag for the operator to true and sets the
hasCombiningOverrideFlag() flag to true. When this flag is true, operator
combining is prevented.
• void terminateOperator(APT_Status status)
This method sets the termination status of the operator as specified and
terminates the operator as soon as the current function returns. If all input
has not been consumed, a warning is issued and the remaining input is
consumed.
This method must be called only from within the dynamic scope of
doInitialProcessing(), processInputRecord(), processEOF(), or
writeOutputRecord(). It may not be called during or after
doFinalProcessing().