
Datastage Configuration File

This document discusses parallel job configuration files. The configuration file specifies node and disk pooling as well as resource definitions for each processing node, using a syntax of curly braces, node names, and resource types and locations. Node pools reserve nodes for certain tasks, while disk pools allocate disks to nodes. The configuration file is used at run time to allocate resources to stages based on constraints and availability.

Uploaded by

debasis_das
Copyright
© Attribution Non-Commercial (BY-NC)

Configuration

Parallel Jobs
Objectives
• Having completed this module, the student will be able:
  • to create and edit a parallel job configuration file
  • to specify node pooling
  • to specify disk pooling
Configuration File
• A parallel job configuration file consists of the following elements:
  • A pair of curly braces enclosing one or more processing node definitions.
  • Each processing node definition consists of:
    • the logical name for the processing node
    • the name of the machine on which this processing node runs
    • a pair of curly braces containing
      • optionally, a node pool definition
      • one or more resource definitions
Configuration File Syntax

{
    node "name" {
        [ fastname "hostname" ]
        [ pools "poolname" "poolname" ... ]
        [ resource resource_type "location" [ { pools "diskpoolname" ... } ] ] ...
        [ resource resource_type "value" ... ]
    }
    [ more node definitions ]
}
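To illustrate the grammar above, here is a minimal Python sketch that renders a configuration file from a list of node descriptions. The helper name `render_config` and the dictionary keys are hypothetical, chosen only for this example. The keywords `pools` and `scratchdisk` follow the spelling used in IBM's shipped configuration files.

```python
def render_config(nodes):
    """Render node descriptions as configuration-file text.

    Each node is a dict with the hypothetical keys "name", "fastname",
    "pools" (list of node pool names) and "resources" (list of
    (resource_type, location, disk_pools) tuples).
    """
    lines = ["{"]
    for node in nodes:
        lines.append(f'    node "{node["name"]}" {{')
        lines.append(f'        fastname "{node["fastname"]}"')
        pools = " ".join(f'"{p}"' for p in node["pools"])
        lines.append(f"        pools {pools}")
        for rtype, location, disk_pools in node["resources"]:
            entry = f'        resource {rtype} "{location}"'
            if disk_pools:
                # Disk pool names go in braces after the location.
                names = " ".join(f'"{p}"' for p in disk_pools)
                entry += f" {{pools {names}}}"
            lines.append(entry)
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


example = render_config([
    {
        "name": "node0",
        "fastname": "srvr0",
        "pools": ["", "p1"],
        "resources": [
            ("disk", "/d0/data", ["big"]),
            ("scratchdisk", "/d0/scratch", []),
        ],
    },
])
print(example)
```

Generating the file from data rather than editing it by hand is only one possible workflow; the Configuration File Editor mentioned later in this module is the tool the course itself points to.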
Where Is It?
• Location specified by the environment variable APT_CONFIG_FILE
• Default name of the configuration file is default.apt
• Default location is:
  • the Configurations subdirectory in the DataStage Engine directory
  • ${APT_ORCHHOME}/etc
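The lookup through APT_CONFIG_FILE can be sketched in a few lines of Python. This is a minimal illustration, not engine code: the function name `resolve_config_file` is hypothetical, and the path in the usage lines is a made-up example rather than a product default.

```python
import os


def resolve_config_file(environ=None):
    """Return the configuration file path named by APT_CONFIG_FILE, or None."""
    env = os.environ if environ is None else environ
    path = env.get("APT_CONFIG_FILE", "").strip()
    return path or None


# A job run with this environment would read the four-node file.
print(resolve_config_file({"APT_CONFIG_FILE": "/opt/configs/four_node.apt"}))
# With the variable unset, the caller has to fall back to a default location.
print(resolve_config_file({}))
```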
Node Name
• Logical name for the processing node
• Must be unique within the configuration file
  • for example node0, node1, node2, node3
• Node name is used in error/warning messages and diagnostic output
• Number of nodes does not necessarily correspond to the number of CPUs
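The uniqueness rule is easy to check mechanically. The sketch below assumes the configuration text is available as a string and uses a simple regular expression rather than a full parser. The function name `duplicate_node_names` is hypothetical.

```python
import re
from collections import Counter

# Matches: node "some-name"  (the keyword followed by a quoted logical name)
NODE_NAME = re.compile(r'\bnode\s+"([^"]+)"')


def duplicate_node_names(config_text):
    """Return the logical node names that appear more than once."""
    counts = Counter(NODE_NAME.findall(config_text))
    return sorted(name for name, n in counts.items() if n > 1)


sample = '''
{
    node "node0" { fastname "srvr0" pools "" }
    node "node1" { fastname "srvr0" pools "" }
    node "node1" { fastname "srvr1" pools "" }
}
'''
print(duplicate_node_names(sample))
```

Note that node0 and node1 sharing the fastname srvr0 is not flagged: several logical nodes may run on one machine, which is why the node count need not match the CPU count.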
Fast Name
• Identifies the host machine on which the processing node runs
• Name by which that machine is referred to on the fastest network in the system
Node Pools
• Name(s) of the pool(s) to which this node is allocated
  • space-separated list of quoted pool names
• Some node pool names are reserved
  • DB2, INFORMIX, ORACLE
  • sas, sort, syncsort
• These nodes will perform a task that is restricted to this pool
  • input link properties of stage
Node Pools
{
    node "node0" {
        fastname "srvr0"
        pools "" "p1" "app1"
        resource disk "/d0/data"
        resource scratchdisk "/d0/scratch"
    }
    node "node1" {
        fastname "srvr0"
        pools "" "p2" "app1"
        resource disk "/d1/data"
        resource scratchdisk "/d1/scratch"
    }
    node "node2" {
        fastname "srvr0"
        pools "" "p3" "app1" "app2"
        resource disk "/d2/data"
        resource scratchdisk "/d2/scratch"
    }
    node "node3" {
        fastname "srvr0"
        pools "" "p4" "app2" "sort"
        resource disk "/d3/data"
        resource scratchdisk "/var/scratch"
        resource scratchdisk "/d3/scratch"
    }
}
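To see which nodes end up in which pool, the example can be indexed by pool name. The following Python sketch uses a small tokenizer written for this illustration. The function name `node_pools` is hypothetical, the parser accepts both `pool` and `pools` as the keyword, and the resource lines are left out of the sample for brevity.

```python
import re
from collections import defaultdict

# Tokens: quoted strings, braces, or bare words.
TOKEN = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')


def node_pools(config_text):
    """Map each node pool name to the logical nodes that belong to it."""
    tokens = TOKEN.findall(config_text)
    members = defaultdict(list)
    current = None
    depth = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
        elif tok == "node" and depth == 1:
            current = tokens[i + 1].strip('"')
            i += 1  # skip the node name
        elif tok in ("pool", "pools") and depth == 2:
            # Node-level pool list; disk pools sit one level deeper.
            i += 1
            while i < len(tokens) and tokens[i].startswith('"'):
                members[tokens[i].strip('"')].append(current)
                i += 1
            continue
        i += 1
    return dict(members)


sample = '''
{
    node "node0" { fastname "srvr0" pools "" "p1" "app1" }
    node "node1" { fastname "srvr0" pools "" "p2" "app1" }
    node "node2" { fastname "srvr0" pools "" "p3" "app1" "app2" }
    node "node3" { fastname "srvr0" pools "" "p4" "app2" "sort" }
}
'''
pools = node_pools(sample)
print(pools["app1"])
print(pools["sort"])
```

In this example all four nodes are in the default pool "", three are in app1, two in app2, and only node3 is in the reserved sort pool.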
Disk Pools
{
    node "node0" {
        fastname "srvr0"
        pools "" "p1" "app1"
        resource disk "/d0/data" {pools "big"}
        resource scratchdisk "/d0/scratch"
    }
    node "node1" {
        fastname "srvr0"
        pools "" "p2" "app1"
        resource disk "/d1/data" {pools "big"}
        resource scratchdisk "/d1/scratch"
    }
    node "node2" {
        fastname "srvr0"
        pools "" "p3" "app1" "app2"
        resource disk "/d2/data"
        resource scratchdisk "/d2/scratch"
    }
    node "node3" {
        fastname "srvr0"
        pools "" "p4" "app2" "sort"
        resource disk "/d3/data"
        resource scratchdisk "/var/scratch"
        resource scratchdisk "/d3/scratch"
    }
}
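The disk pool assignments in this example can be listed with a short Python sketch. The function name `disks_by_pool` is hypothetical, and the code assumes that a disk resource with no pool list belongs to the default pool "". It looks only at `resource disk` entries, so scratch disks are ignored.

```python
import re
from collections import defaultdict

# A disk resource, optionally followed by a brace-enclosed pool list.
DISK = re.compile(r'resource\s+disk\s+"([^"]+)"(?:\s*\{([^}]*)\})?')


def disks_by_pool(config_text):
    """Map each disk pool name to the disk locations assigned to it."""
    pools = defaultdict(list)
    for match in DISK.finditer(config_text):
        path, body = match.group(1), match.group(2) or ""
        # Assumption: no pool list means the default pool "".
        names = re.findall(r'"([^"]*)"', body) or [""]
        for name in names:
            pools[name].append(path)
    return dict(pools)


sample = '''
{
    node "node0" { fastname "srvr0" pools ""
        resource disk "/d0/data" {pools "big"}
        resource scratchdisk "/d0/scratch" }
    node "node1" { fastname "srvr0" pools ""
        resource disk "/d1/data" {pools "big"}
        resource scratchdisk "/d1/scratch" }
    node "node2" { fastname "srvr0" pools ""
        resource disk "/d2/data"
        resource scratchdisk "/d2/scratch" }
}
'''
by_pool = disks_by_pool(sample)
print(by_pool["big"])
print(by_pool[""])
```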
Allocation of Resources
• Occurs at run time
• More than one phase
  • Constraints from PX Engine arguments
    • matched against any pools in the configuration file
  • Additional constraints (perhaps) from an explicit requirement for the same degree of parallelism as the previous stage
• Stage allocates resources on the nodes that are still available
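The phases above can be pictured with a deliberately simplified Python model. This is not the engine's algorithm. The function name `allocate_nodes` is hypothetical, and the "same degree of parallelism" requirement is modelled here, as an assumption, by staying on the nodes the previous stage used.

```python
def allocate_nodes(node_pools, pool_constraints=(), previous_nodes=None):
    """Pick nodes for a stage (simplified model, not the engine's algorithm).

    node_pools maps node name -> set of pool names from the config file.
    """
    # Phase 1: keep nodes that belong to every requested pool;
    # with no constraint, fall back to the default pool "".
    wanted = set(pool_constraints) or {""}
    candidates = [n for n, pools in node_pools.items() if wanted <= pools]
    # Phase 2: optionally keep only the nodes used by the previous stage.
    if previous_nodes is not None:
        candidates = [n for n in candidates if n in previous_nodes]
    return candidates


config = {
    "node0": {"", "p1", "app1"},
    "node1": {"", "p2", "app1"},
    "node2": {"", "p3", "app1", "app2"},
    "node3": {"", "p4", "app2", "sort"},
}
print(allocate_nodes(config, ["app2"]))
print(allocate_nodes(config, ["app1"], previous_nodes={"node2", "node3"}))
```

The stage then takes its disk and scratch resources from whichever nodes survive both phases.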
Hints and Tips
• "Give all nodes all the disk"
  • engine uses fairly large blocks for I/O, so contention is unlikely to be an issue
  • round robin allocation
• Having only one disk per node is limiting
  • disk access limits speed of node
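As a rough picture of what round robin allocation means when a node has several disks, the Python sketch below hands each new item to the next disk in turn. The function name `assign_round_robin` and the item names are hypothetical. How the engine actually orders its disks is not described in this module.

```python
from itertools import cycle


def assign_round_robin(items, disks):
    """Assign each item to the next disk in turn (simple round robin)."""
    if not disks:
        raise ValueError("at least one disk is required")
    rotation = cycle(disks)
    return {item: next(rotation) for item in items}


placement = assign_round_robin(
    ["part0", "part1", "part2", "part3", "part4"],
    ["/d0/data", "/d1/data", "/d2/data"],
)
for item, disk in placement.items():
    print(item, disk)
```

With a single disk per node, every item lands on the same location, which is the limitation the slide warns about.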
Hints and Tips
• Avoid disks that hold the input files
  • at least until the input phase is finished
• Ensure I/O is hitting different spindles
• Never use NFS for scratch disk
  • avoid using NFS for storage disk
More Information
• Manager manual
  • The Parallel Engine Configuration File
• On-line Help
  • topic Configuration File
  • links to Configuration File Editor
Review Questions, Lab Exercise
• Answer the review questions for Module 5 in your Lab book
• Do the Lab Exercises for Module 5 in your Lab book
