Datastage Configuration File
Parallel Jobs
Objectives
Having completed this module, the
student will be able:
to create/edit a parallel job
configuration file
to specify node pooling
to specify disk pooling
Configuration File
A parallel job configuration file consists of
the following elements:
a pair of curly braces enclosing one or more
processing node definitions.
Each processing node definition consists of:
the logical name of the processing node
the name of the machine on which this processing
node will appear
a pair of curly braces containing
optionally, a node pool definition
one or more resource definitions
Node Definition
{
  node "name" {
    [ fastname "hostname" ]
    [ pool "poolname" [ "poolname" ... ] ]
    [ resource resource_type "location" [ { pool "diskpoolname" [ ... ] } ] ... ]
    [ resource resource_type "value" ... ]
  }
  [ more node definitions ]
}
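A minimal concrete instance of this grammar might look as follows; this is a sketch, and the host name and paths are illustrative, not taken from the slides:

```
{
  node "node0" {
    fastname "myhost"
    pool ""
    resource disk "/data/ds" {}
    resource scratchdisk "/scratch/ds" {}
  }
}
```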
Where Is It?
Location specified by environment
variable APT_CONFIG_FILE
Default name of configuration file is
default.apt
Default location is:
Configurations subdirectory in
DataStage Engine directory
${APT_ORCHHOME}/etc
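To run jobs against a particular configuration file, point the environment variable at it before the job starts. A minimal shell sketch (the path below is only an example, not a required location):

```shell
# Point the parallel engine at a specific configuration file.
# The path is illustrative; substitute your own Configurations directory.
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/4node.apt
echo "$APT_CONFIG_FILE"
```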
Node Name
Logical name for processing node
Must be unique within the
configuration file
for example node0, node1, node2,
node3
Node name used in error/warning
messages and diagnostic output
Number of nodes does not
necessarily correspond to number of
CPUs
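For example, a two-CPU SMP host can still be described with four logical nodes. A sketch of such a file, assuming illustrative host and path names:

```
{
  node "node0" {
    fastname "smp1"
    pool ""
    resource disk "/d0/data" {}
    resource scratchdisk "/d0/scratch" {}
  }
  node "node1" {
    fastname "smp1"
    pool ""
    resource disk "/d1/data" {}
    resource scratchdisk "/d1/scratch" {}
  }
  node "node2" {
    fastname "smp1"
    pool ""
    resource disk "/d2/data" {}
    resource scratchdisk "/d2/scratch" {}
  }
  node "node3" {
    fastname "smp1"
    pool ""
    resource disk "/d3/data" {}
    resource scratchdisk "/d3/scratch" {}
  }
}
```

All four logical nodes share one fastname, so the job runs four-way parallel on that single machine regardless of its CPU count.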
Fast Name
Identifies host machine on which
processing node appears
Name by which that machine is
referred to on the fastest network in
the system
Node Pools
Name(s) of pool(s) to which this
node is allocated
space-separated list of quoted pool
names
Some node pool names are reserved:
DB2, INFORMIX, ORACLE
sas, sort, syncsort
Nodes in a reserved pool perform only
tasks that are restricted to that pool
a stage can be constrained to a pool
through its input link properties
Node Pools
{
  node "node0" {
    fastname "srvr0"
    pool "" "p1" "app1"
    resource disk "/d0/data" {}
    resource scratchdisk "/d0/scratch" {}
  }
  node "node1" {
    fastname "srvr0"
    pool "" "p2" "app1"
    resource disk "/d1/data" {}
    resource scratchdisk "/d1/scratch" {}
  }
  node "node2" {
    fastname "srvr0"
    pool "" "p3" "app1" "app2"
    resource disk "/d2/data" {}
    resource scratchdisk "/d2/scratch" {}
  }
  node "node3" {
    fastname "srvr0"
    pool "" "p4" "app2" "sort"
    resource disk "/d3/data" {}
    resource scratchdisk "/var/scratch" {}
    resource scratchdisk "/d3/scratch" {}
  }
}
Disk Pools
{
  node "node0" {
    fastname "srvr0"
    pool "" "p1" "app1"
    resource disk "/d0/data" {pool "big"}
    resource scratchdisk "/d0/scratch" {}
  }
  node "node1" {
    fastname "srvr0"
    pool "" "p2" "app1"
    resource disk "/d1/data" {pool "big"}
    resource scratchdisk "/d1/scratch" {}
  }
  node "node2" {
    fastname "srvr0"
    pool "" "p3" "app1" "app2"
    resource disk "/d2/data" {}
    resource scratchdisk "/d2/scratch" {}
  }
  node "node3" {
    fastname "srvr0"
    pool "" "p4" "app2" "sort"
    resource disk "/d3/data" {}
    resource scratchdisk "/var/scratch" {}
    resource scratchdisk "/d3/scratch" {}
  }
}
Allocation of Resources
Occurs at run time
Proceeds in more than one phase
constraints from PX Engine arguments are
matched against the pools in the configuration file
additional constraints may come from an
explicit requirement for the same degree of
parallelism as the previous stage
the stage then allocates resources on the nodes
that are still available
Hints and Tips
"Give all nodes all the disk"
engine uses fairly large blocks for I/O,
so contention is unlikely to be an issue
round robin allocation
Having only one disk per node is
limiting
disk access limits speed of node
Hints and Tips
Avoid disks where the input files reside
at least until the input phase is finished
Ensure I/O is hitting different
spindles
Never use NFS for scratch disk
avoid using NFS for storage disk
More Information
Manager manual
The Parallel Engine Configuration File
On-line Help
topic Configuration File
links to Configuration File Editor
Review Questions, Lab Exercise
Answer the review questions for
Module 5 in your Lab book
Do the Lab Exercises for Module 5 in
your Lab book