DataStage Designer
Designer Guide
Version 7.0
August 2003
Part No. 00D-003DS70
Published by Ascential Software Corporation.
©1997-2003 Ascential Software Corporation. All rights reserved. Ascential, DataStage, QualityStage,
AuditStage, ProfileStage, and MetaStage are trademarks of Ascential Software Corporation or its affiliates and
may be registered in the United States or other jurisdictions. Windows is a trademark of Microsoft
Corporation. Unix is a registered trademark of The Open Group. Adobe and Acrobat are registered trademarks
of Adobe Systems Incorporated. Other marks are the property of the owners of those marks.
This product may contain or utilize third party components subject to the user documentation previously
provided by Ascential Software Corporation or contained herein.
Preface
Organization of This Manual .........................................................................................x
Documentation Conventions ....................................................................................... xi
User Interface Conventions ................................................................................. xiii
DataStage Documentation .......................................................................................... xiv
Chapter 1. Introduction
About Data Warehousing ........................................................................................... 1-1
Operational Databases Versus Data Warehouses ............................................. 1-2
Constructing the Data Warehouse ...................................................................... 1-2
Defining the Data Warehouse ............................................................................. 1-3
Data Extraction ...................................................................................................... 1-3
Data Aggregation .................................................................................................. 1-3
Data Transformation ............................................................................................. 1-3
Advantages of Data Warehousing ...................................................................... 1-4
About DataStage .......................................................................................................... 1-4
Client Components ............................................................................................... 1-5
Server Components .............................................................................................. 1-6
DataStage Projects ........................................................................................................ 1-6
DataStage Jobs .............................................................................................................. 1-6
DataStage NLS .............................................................................................................. 1-8
Character Set Maps and Locales ......................................................................... 1-9
DataStage Terms and Concepts ................................................................................ 1-10
The Job Run Options Dialog Box .............................................................................4-86
Chapter 5. Containers
Local Containers ...........................................................................................................5-1
Creating a Local Container ..................................................................................5-2
Viewing or Modifying a Local Container ..........................................................5-2
Using Input and Output Stages ..........................................................................5-3
Deconstructing a Local Container ......................................................................5-4
Shared Containers ........................................................................................................5-5
Creating a Shared Container ...............................................................................5-6
Viewing or Modifying a Shared Container Definition ....................................5-7
Editing Shared Container Definition Properties ...............................................5-8
Using a Shared Container in a Job ....................................................................5-10
Converting Containers ..............................................................................................5-17
Index
Preface
Organization of This Manual
This manual contains the following:
Chapter 1 contains an overview of data warehousing and
describes how DataStage can aid the development and population
of a data warehouse. It introduces the DataStage client and server
components and covers DataStage concepts and terminology.
Chapter 2 guides you through an example DataStage job to
familiarize you with the product.
Chapter 3 gives an overview of the DataStage Designer and its
user interface.
Chapter 4 describes how to develop a DataStage job using the
DataStage Designer.
Chapter 5 describes the use of local and shared containers in
DataStage.
Chapter 6 describes how to use the graphical job sequence
designer.
Chapter 7 describes the Intelligent Assistant which helps you
create simple jobs in DataStage.
Chapter 8 describes table definitions and their use within the
DataStage Designer.
Chapter 9 gives an overview of the powerful programming facilities
available within DataStage which make it easy to customize
your applications.
Appendix A covers how to navigate and edit the grids that appear
in many DataStage dialog boxes.
Appendix B provides troubleshooting advice.
Convention Usage
Bold In syntax, bold indicates commands, function
names, keywords, and options that must be
input exactly as shown. In text, bold indicates
keys to press, function names, and menu
selections.
UPPERCASE In syntax, uppercase indicates BASIC statements
and functions and SQL statements and
keywords.
Italic In syntax, italic indicates information that you
supply. In text, italic also indicates UNIX
commands and options, file names, and
pathnames.
Plain In text, plain indicates Windows NT commands
and options, file names, and path names.
Courier Courier indicates examples of source code and
system output.
Courier Bold In examples, courier bold indicates characters
that the user types or keys the user presses (for
example, <Return>).
[] Brackets enclose optional items. Do not type the
brackets unless indicated.
{} Braces enclose nonoptional items from which
you must select at least one. Do not type the
braces.
itemA | itemB A vertical bar separating items indicates that
you can choose only one item. Do not type the
vertical bar.
... Three periods indicate that more of the same
type of item can optionally follow.
➤ A right arrow between menu commands indicates you should
choose each command in sequence. For example, “Choose File ➤
Exit” means you should choose File from the menu bar, then
choose Exit from the File pull-down menu.
This line ➥ continues The continuation character is used in source
code examples to indicate a line that is too long
to fit on the page, but must be entered as a single
line on screen.
[Screenshot: a sample dialog box illustrating the user interface conventions, with callouts for the General tab, a browse button, a field, a check box, an option button, and a button.]
DataStage Documentation
DataStage documentation includes the following:
DataStage Designer Guide. This guide describes the DataStage
Designer, and gives a general description of how to create, design,
and develop a DataStage application.
DataStage Manager Guide. This guide describes the DataStage
Manager and describes how to use and maintain the DataStage
Repository.
DataStage Server: Server Job Developer Guide. This guide
describes the tools that are used in building a server job, and it
supplies programmer’s reference information.
DataStage Enterprise Edition: Parallel Job Developer Guide. This
guide describes the tools that are used in building a parallel job,
and it supplies programmer’s reference information.
DataStage Enterprise MVS Edition: Mainframe Job Developer
Guide. This guide describes the tools that are used in building a
mainframe job, and it supplies programmer’s reference
information.
DataStage Director Guide. This guide describes the DataStage
Director and how to validate, schedule, run, and monitor
DataStage server jobs.
DataStage Administrator Guide. This guide describes DataStage
setup, routine housekeeping, and administration.
DataStage Install and Upgrade Guide. This guide contains
instructions for installing DataStage on Windows and UNIX plat-
forms, and for upgrading existing installations of DataStage.
DataStage NLS Guide. This guide contains information about
using the NLS features that are available in DataStage when NLS
is installed.
These guides are also available online in PDF format. You can read
them with the Adobe Acrobat Reader supplied with DataStage. See
DataStage Install and Upgrade Guide for details about installing the
manuals and the Adobe Acrobat Reader.
You can use the Acrobat search facilities to search the whole DataStage
document set. To use this feature, first choose Edit ➤ Search ➤
1
Introduction
This chapter is an overview of data warehousing and DataStage.
The last few years have seen the continued growth of IT (information
technology) and a growing need for organizations to make better use of the
data they have at their disposal. This involves analyzing data in active
databases and comparing it with data in archive systems.
Although consolidating data into a data mart or data warehouse offers the
advantage of a competitive edge, the cost of doing so was high. It also required
the use of data warehousing tools from a number of vendors and the skill
to create a data warehouse.
Developing a data warehouse or data mart involves design of the data
warehouse and development of operational processes to populate and
maintain it. In addition to the initial setup, you must be able to handle on-
going evolution to accommodate new data sources, processing, and goals.
DataStage simplifies the data warehousing process. It is an integrated
product that supports extraction of the source data, cleansing, decoding,
transformation, integration, aggregation, and loading of target databases.
Although primarily aimed at data warehousing environments, DataStage
can also be used in any data handling, data migration, or data
reengineering project.
database can be accessed by all users, ensuring that each group in an
organization is accessing valuable, stable data.
A data warehouse is a “snapshot” of the operational databases combined
with data from archives. The data warehouse can be created or updated at
any time, with minimum disruption to operational systems. Any number
of analyses can be performed on the data, which would otherwise be
impractical on the operational sources.
Data Extraction
The data in operational or archive systems is the primary source of data for
the data warehouse. Operational databases can be indexed files,
networked databases, or relational database systems. Data extraction is
the process used to obtain data from operational sources, archives, and
external data sources.
Data Aggregation
An operational data source usually contains records of individual
transactions such as product sales. If the user of a data warehouse only
needs a summed total, you can reduce records to a more manageable
number by aggregating the data.
The summed (aggregated) total is stored in the data warehouse. Because
the number of records stored in the data warehouse is greatly reduced, it
is easier for the end user to browse and analyze the data.
Data Transformation
Because the data in a data warehouse comes from many sources, the data
may be in different formats or be inconsistent. Transformation is the
process that converts data to a required definition and value.
Data is transformed using routines based on a transformation rule. For
example, product codes can be mapped to a common format using a
transformation rule that applies only to product codes.
After data has been transformed it can be loaded into the data warehouse
in a recognized and required format.
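As an illustration only, a transformation rule like the product code example above might be written as a simple DataStage BASIC expression; the link and column names here are hypothetical, and a real rule would depend on your own data:

   * Standardize a product code: remove surrounding spaces and force uppercase
   UpCase(Trim(InLink.ProductCode))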
About DataStage
DataStage has the following features to aid the design and processing
required to build a data warehouse:
• Uses graphical design tools. With simple point-and-click
techniques you can draw a scheme to represent your processing
requirements.
• Extracts data from any number or type of database.
• Handles all the meta data definitions required to define your data
warehouse. You can view and modify the table definitions at any
point during the design of your application.
• Aggregates data. You can modify SQL SELECT statements used to
extract data, as illustrated in the example below.
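For instance, where a database stage extracts individual sales transactions (as described under Data Aggregation above), the generated SELECT statement could be edited along the following lines so that only summed totals are passed to the warehouse. The table and column names here are illustrative only:

   SELECT PRODUCT_CODE, SUM(QTY) AS TOTAL_QTY
   FROM SALES
   GROUP BY PRODUCT_CODE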
Client Components
DataStage has four client components which are installed on any PC
running Windows 2000 or Windows NT 4.0 with Service Pack 4 or later:
• DataStage Designer. A design interface used to create DataStage
applications (known as jobs). Each job specifies the data sources,
the transforms required, and the destination of the data. Jobs are
compiled to create executables that are scheduled by the Director
and run by the Server (mainframe jobs are transferred and run on
the mainframe).
• DataStage Director. A user interface used to validate, schedule,
run, and monitor DataStage server jobs and parallel jobs.
• DataStage Manager. A user interface used to view and edit the
contents of the Repository.
• DataStage Administrator. A user interface used to perform
administration tasks such as setting up DataStage users, creating and
moving projects, and setting up purging criteria.
Server Components
There are three server components:
• Repository. A central store that contains all the information
required to build a data mart or data warehouse.
• DataStage Server. Runs executable jobs that extract, transform,
and load data into a data warehouse.
• DataStage Package Installer. A user interface used to install
packaged DataStage jobs and plug-ins.
DataStage Projects
You always enter DataStage through a DataStage project. When you start
a DataStage client you are prompted to attach to a project. Each project
contains:
• DataStage jobs.
• Built-in components. These are predefined components used in a
job.
• User-defined components. These are customized components
created using the DataStage Manager. Each user-defined
component performs a specific task in a job.
A complete project may contain several jobs and user-defined
components.
There is a special class of project called a protected project. Normally
nothing can be added, deleted, or changed in a protected project. Users can
view objects in the project, and perform tasks that affect the way a job runs
rather than the job’s design. Users with Production Manager status can
import existing DataStage components into a protected project and
manipulate projects in other ways.
DataStage Jobs
There are three basic types of DataStage job:
• Server jobs. These are compiled and run on the DataStage server.
A server job will connect to databases on other machines as necessary,
extract data, process it, then write the data to the target data
warehouse.
[Diagram: a simple job design in which a data source is linked to a Transformer stage, which is in turn linked to the data warehouse.]
You must specify the data you want at each stage, and how it is handled.
For example, do you want all the columns in the source data, or only a
select few? Should the data be aggregated or converted before being
passed on to the next stage?
You can use DataStage with MetaBrokers in order to exchange meta data
with other data warehousing tools. You might, for example, import table
definitions from a data modelling tool.
DataStage NLS
DataStage has built-in National Language Support (NLS). With NLS
installed, DataStage can do the following:
• Process data in a wide range of languages
• Accept data in any character set into most DataStage fields
• Use local formats for dates, times, and money (server jobs)
• Sort data according to local rules
• Convert data between different encodings of the same language
(for example, for Japanese it can convert JIS to EUC)
DataStage NLS is optionally installed as part of the DataStage server. If
NLS is installed, various extra features (such as dialog box pages and
drop-down lists) appear in the product. If NLS is not installed, these
features do not appear.
NLS is implemented in different ways for server jobs and parallel jobs, and
each has its own set of maps:
• For server jobs, NLS is implemented by the DataStage server
engine.
• For parallel jobs, NLS is implemented using the ICU library.
DataStage Terms and Concepts
The following terms are used in DataStage:
Term Description
administrator The person who is responsible for the
maintenance and configuration of DataStage, and for
DataStage users.
after-job subroutine A routine that is executed after a job runs.
after-stage subroutine A routine that is executed after a stage
processes data.
Aggregator stage A stage type that computes totals or other
functions of sets of data.
Annotation A note attached to a DataStage job in the
Diagram window.
BCPLoad stage A plug-in stage supplied with DataStage that
bulk loads data into a Microsoft SQL Server or
Sybase table. (Server jobs only.)
before-job subroutine A routine that is executed before a job is run.
before-stage subroutine A routine that is executed before a stage
processes any data.
built-in data elements There are two types of built-in data elements:
those that represent the base types used by
DataStage during processing and those that
describe different date/time formats.
built-in transforms The transforms supplied with DataStage. See
DataStage Server Job Developer’s Guide for a
complete list.
Change Apply stage A parallel job stage that applies a set of
captured changes to a data set.
Change Capture stage A parallel job stage that compares two data
sets and records the differences between them.
Cluster Type of system providing parallel processing.
In cluster systems, there are multiple processors,
and each has its own hardware resources
such as disk and memory.
column definition Defines the columns contained in a data table.
Includes the column name and the type of data
contained in the column.
Data Set stage A parallel job stage. Stores a set of data.
DB2 stage A parallel stage that allows you to read and
write a DB2 database.
DB2 Load Ready Flat File stage A mainframe target stage. It writes data to a
flat file in Load Ready format and defines the
meta data required to generate the JCL and
control statements for invoking the DB2 Bulk
Loader.
Decode stage A parallel job stage that uses a UNIX
command to decode a previously encoded
data set.
Delimited Flat File stage A mainframe target stage that writes data to a
delimited flat file.
developer The person designing and developing
DataStage jobs.
Difference stage A parallel job stage that compares two data
sets and works out the difference between
them.
Encode stage A parallel job stage that encodes a data set
using a UNIX command.
Expand stage A parallel job stage that expands a previously
compressed data set.
Expression Editor An interactive editor that helps you to enter
correct expressions into a Transformer stage in
a DataStage job design.
External Filter stage A parallel job stage that uses an external
program to filter a data set.
External Routine stage A mainframe processing stage that calls an
external routine and passes row elements to it.
External Source stage A mainframe source stage that allows a
mainframe job to read data from an external source.
job A collection of linked stages, data elements,
and transforms that define how to extract,
cleanse, transform, integrate, and load data
into a target database. Jobs can either be server
jobs or mainframe jobs.
job control routine A routine that is used to create a controlling
job, which invokes and runs other jobs.
job sequence A controlling job which invokes and runs other
jobs, built using the graphical job sequencer.
Join stage A mainframe processing stage or parallel job
active stage that joins two input sources.
Link collector stage A server job stage that collects previously
partitioned data together.
Link partitioner stage A server job stage that allows you to partition
data so that it can be processed in parallel on an
SMP system.
local container A container which is local to the job in which it
was created.
Lookup stage A mainframe processing stage and Parallel
active stage that performs table lookups.
Lookup File stage A parallel job stage that provides storage for a
lookup table.
mainframe job A job that is transferred to a mainframe, then
compiled and run there.
Make Subrecord stage A parallel job stage that combines a number of
vectors to form a subrecord.
Make Vector stage A parallel job stage that combines a number of
fields to form a vector.
Merge stage A parallel job stage that combines data sets.
meta data Data about data, for example, a table definition
describing columns in which data is
structured.
MetaBroker A tool that allows you to exchange meta data
between DataStage and other data warehousing
tools.
Peek stage A parallel job stage that prints column values to
the screen as records are copied from its input
data set to one or more output data sets.
plug-in A definition for a plug-in stage.
plug-in stage A stage that performs specific processing that
is not supported by the standard server job or
parallel job stages.
Promote Subrecord stage A parallel job stage that promotes the
members of a subrecord to a top level field.
Relational stage A mainframe source/target stage that reads
from or writes to an MVS/DB2 database.
Remove duplicates stage A parallel job stage that removes duplicate
entries from a data set.
Repository A DataStage area where projects and jobs are
stored as well as definitions for all standard
and user-defined data elements, transforms,
and stages.
SAS stage A parallel job stage that allows you to run SAS
applications from within the DataStage job.
Parallel SAS Data Set stage A parallel job stage that provides storage for
SAS data sets.
Sample stage A parallel job stage that samples a data set.
Sequential File stage A stage that extracts data from, or writes data
to, a text file. (Server job and parallel job only)
server job A job that is compiled and run on the
DataStage server.
shared container A container which exists as a separate item in
the Repository and can be used by any server
job in the project. DataStage supports both
server and parallel shared containers.
SMP Type of system providing parallel processing.
In SMP (symmetric multiprocessing) systems,
there are multiple processors, but these share
other hardware resources such as disk and
memory.
Sort stage A mainframe processing stage or parallel job
active stage that sorts input columns.
2
Your First
DataStage Project
This chapter describes the steps you need to follow to create your first data
warehouse, using the sample data provided. The example builds a server
job and uses a UniVerse table called EXAMPLE1, which is automatically
copied into your DataStage project during server installation.
EXAMPLE1 represents an SQL table from a wholesaler who deals in car
parts. It contains details of the wheels they have in stock. There are
approximately 255 rows of data and four columns (a sketch of the table
layout follows the list):
• CODE. The product code for each type of wheel.
• PRODUCT. A text description of each type of wheel.
• DATE. The date new wheels arrived in stock (given in terms of
year, month, and day).
• QTY. The number of wheels in stock.
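In SQL terms, the table might be declared roughly as shown below. This is a sketch only; the column types are assumptions for illustration, and EXAMPLE1 is created for you during server installation, so you never need to create it yourself:

   CREATE TABLE EXAMPLE1 (
      CODE     CHAR(10),     -- product code for each type of wheel
      PRODUCT  VARCHAR(30),  -- text description of the wheel
      DATE     DATE,         -- arrival date, held in internal date format
      QTY      INTEGER       -- number of wheels in stock
   )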
The aim of this example is to develop and run a DataStage job that:
• Extracts the data from the file.
• Converts (transforms) the data in the DATE column from a
complete date (YYYY-MM-DD) stored in internal data format, to a
year and month (YYYY-MM) stored as a string.
• Loads data from the DATE, CODE, and QTY columns into a data
warehouse. The data warehouse is a sequential file that is created
when you run the job.
This dialog box appears when you start the DataStage Designer, Manager,
or Director client components from the DataStage program folder. In all
cases, you must attach to a project by entering your logon details.
To connect to a project:
1. Enter the name of your host in the Host system field. This is the name
of the system where the DataStage Server components are installed.
2. Enter your user name in the User name field. This is your user name
on the server system.
3. Enter your password in the Password field.
Note: If you are connecting to the server via LAN Manager, you can
select the Omit check box. The User name and Password fields
gray out and you log on to the server using your Windows NT
Domain account details.
4. Choose the project to connect to from the Project drop-down list box.
This list box displays all the projects installed on your DataStage
server. Choose your project from the list box. At this point, you may
only have one project installed on your system and this is displayed
by default.
Creating a Job
When a DataStage project is installed, it is empty and you must create the
jobs you need. Each DataStage job can load one or more data tables in the
final data warehouse. The number of jobs you have in a project depends
on your data sources and how often you want to extract data or load the
data warehouse.
The Format page contains information describing how the data would be
formatted when written to a sequential file. You do not need to edit this
page.
The Relationships page gives foreign key information about the table. We
are not using foreign keys in this exercise, so you do not need to edit this
page.
The NLS page is present if you have NLS installed. It shows the current
character set map for the table definitions. The map defines the character
set that the data is in. You do not need to edit this page.
Advanced Procedures
To manually enter table definitions, see Chapter 7, “Intelligent
Assistants.”
Adding Stages
Stages are added using the tool palette. This palette contains icons that
represent the components you can add to a job. The palette has different
groups to organize the tools available. Click the group title to open the
group. A typical tool palette is shown below:
Linking Stages
You need to add two links:
• One between the UniVerse and Transformer stages
• One between the Transformer and Sequential File stages
Links are always made in the direction the data will flow, that is, usually
left to right. When you add links, they are assigned default names. You can
use the default names in the example.
To add a link:
1. Right-click the first stage, hold the mouse button down, and drag the
link to the Transformer stage. Release the mouse button.
2. Right-click the Transformer stage and drag the link to the Sequential
File stage. The following screen shows how the Diagram window
looks when you have added the stages and links:
Advanced Procedures
For more advanced procedures, see the following topics in Chapter 4:
• “Moving Stages” on page 4-28
• “Renaming Stages” on page 4-28
• “Deleting Stages” on page 4-28
9. You can use the Data Browser to view the actual data that is to be
output from the UniVerse stage. Click the View Data… button to
open the Data Browser window.
Note: In server jobs column definitions are attached to a link. You can
view or edit them at either end of the link. If you change them in a
stage at one end of the link, the changes are automatically seen in
the stage at the other end of the link. This is how column definitions
are propagated through all the stages in a DataStage server
job, so the column definitions you loaded into the UniVerse stage
are viewed when you edit the Transformer stage.
Note: If the data in the other columns required transforming, you could
assign DataStage data elements to these columns too.
Input columns are shown on the left, output columns on the right. The
upper panes show the columns together with derivation details, the lower
panes show the column meta data. In this case, input columns have
already been defined for input link DSLink3. No output columns have
been defined for output link DSLink4, so the right panes are blank.
The next steps are to define the columns that will be output by the
Transformer stage, and to specify the transform that will enable the stage to
convert the type and format of dates before they are output.
1. Working in the upper-left pane of the Transformer Editor, select the
input columns that you want to derive output columns from. Click on
the CODE, DATE, and QTY columns while holding down the Ctrl
key.
2. Click the left mouse button again and, keeping it held down, drag the
selected columns to the output link in the upper-right pane. Drop the
columns over the Column Name field by releasing the mouse button.
The next step is to edit the meta data for the input and output links.
You will be transforming dates from YYYY-MM-DD, presented in
internal date format, to strings containing the date in the form YYYY-
MM. You need to select a data element for the input DATE column, to
specify that the date is input to the transform in internal format, and a
new SQL type and data element for the output DATE column, to
specify that it will be carrying a string. You do this in the lower-left
and lower-right panes of the Transformer Editor.
3. In the Data element field for the DSLink3.DATE column, select Date
from the drop-down list.
4. In the SQL type field for the DSLink4 DATE column, select Char
from the drop-down list.
5. In the Length field for the DSLink4 DATE column, enter 7.
6. In the Data element field for the DSLink4 DATE column, select
MONTH.TAG from the drop-down list.
Next you will specify the transform to apply to the input DATE
column to produce the output DATE column (the finished derivation
is sketched after these steps). You do this in the upper-right pane of
the Transformer Editor.
7. Double-click the Derivation field for the DSLink4 DATE column. The
Expression Editor box appears. At the moment, the box contains the
text DSLink3.DATE, which indicates that the output DATE column
12. Select DSLink3.DATE. This then becomes the argument for the
transform.
13. Click OK to save the changes and exit the Transformer Editor. Once
more the small icon appears on the output link from the transformer
stage to indicate that the link now has column definitions associated
with it.
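When you have finished, the derivation for the output DATE column should read something like the first expression below. The second expression is included only to illustrate, in DataStage BASIC terms, the kind of conversion MONTH.TAG performs; the built-in transform’s exact definition may differ:

   MONTH.TAG(DSLink3.DATE)
   * roughly equivalent BASIC: convert an internal date to a "YYYY-MM" string
   Oconv(DSLink3.DATE, "D-YM[4,2]")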
The job is compiled. The result of the compilation appears in the display
area. If the result of the compilation is Job successfully compiled
with no errors, you can go on to schedule or run the job. The
executable version of the job is stored in your project along with your job design.
If an error is displayed, click Show Error. The stage where the problem
occurs is highlighted in the job design. Check that all the input and output
column definitions have been specified correctly, and that you have
entered directory paths and file or table names where appropriate.
For more information about the error, click More. Click Close to close the
Compile Job window.
Highlight your job in the Job name column. To run the job, choose Job ➤
Run Now or click the Run button on the toolbar. The Job Run Options
dialog box appears and allows you to specify any parameter values and to
specify any job run limits. In this case, just click Run. The status changes
to Running. When the job is complete, the status changes to Finished.
Choose File ➤ Exit to close the DataStage Director window.
Refer to DataStage Director Guide for more information about scheduling
and running jobs.
Advanced Procedures
It is possible to run a job from within another job. For more information,
see “Job Control Routines” on page 4-67 and Chapter 6, “Job Sequences.”
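As a minimal sketch of what such a job control routine can look like in DataStage BASIC (the job name, parameter, and file name are hypothetical; see the references above for the full interface):

   * Attach the controlled job, set a parameter, run it, and wait for completion
   hJob = DSAttachJob("LoadWarehouse", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "TargetFile", "seqfile.txt")
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   If Status <> DSJS.RUNOK Then
      * Record the failure in the controlling job's log
      Call DSLogWarn("Job LoadWarehouse did not finish cleanly", "JobControl")
   End
   ErrCode = DSDetachJob(hJob)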
This chapter describes the main features of the DataStage Designer. It tells
you how to start the Designer and takes a quick tour of the user interface.
You can also start the Designer from the shortcut icon on the desktop, or
from the DataStage Suite applications bar if you have it installed.
Note: If you are connecting to the server via LAN Manager, you can
select the Omit check box. The User name and Password fields
gray out and you log on to the server using your Windows NT
Domain account details.
4. Choose the project to connect to from the Project drop-down list box.
This list box displays all the projects installed on your DataStage
server.
5. Click OK. The DataStage Designer window appears, by default with
the New dialog box open, allowing you to choose a type of job to
create. You can set options to specify that the Designer opens with an
empty server or mainframe job, or nothing at all, see “Specifying
Designer Options” on page 3-24.
Note: You can also start the DataStage Designer directly from the
DataStage Manager or Director by choosing Tools ➤ Run
Designer.
[Screenshot: the New dialog box, showing icons for Server Job, Parallel Job, and Mainframe Job.]
In the Designer Repository window you can perform any of the actions
that you can perform from the Repository tree in the Manager. When you
select a category in the tree, a shortcut menu allows you to create a new
item under that category or a new subcategory, or, for Table Definition
categories, import a table definition from a data source. When you select
an item in the tree, a shortcut menu allows you to perform various tasks
depending on the type of item selected:
• Data elements, machine profiles, routines,
transforms, IMS Databases, IMS Viewsets
You can create a copy of these items, rename
them, delete them, and display the properties
of the item. Provided the item is not read-only,
you can edit the properties.
• Jobs, shared containers
You can create a copy of these items, add them
to the palette, rename them, delete them, and
edit them in the diagram window.
• Stage types
You can add stage types to the diagram
window palette and display their properties. If
the stage belongs in a shortcut container,
DataStage will add it there. Otherwise it will
add it to the correct group. Provided the item
is not read-only, you can edit the properties.
• Table definitions
You can create a copy of table definitions,
rename them, delete them and display the
properties of the item. Provided the item is not
read-only, you can edit the properties. You can
also import table definitions from data sources.
It is a good idea to choose View ➤ Refresh from the main menu bar before
acting on any Repository items to ensure that you have a completely up-
to-date view.
You can drag certain types of item from the Repository window onto a
diagram window or the background, or onto specific components
within a job:
• Jobs – the job opens in a new diagram window or, if dragged to a
job sequence window, is added to the job sequence.
• Shared containers – if you drag one onto an open diagram window,
the shared container appears in the job. If you drag a shared
container onto the background a new diagram window opens
showing the contents of the shared container.
• Stage types – drag a stage type onto an open diagram window to
add it to the job or container. You can also drag it to the tool palette
to add it as a tool.
• Table definitions – drag a table definition onto a link to load the
column definitions for that link. The Select Columns dialog box
allows you to select a subset of columns from the table definition to
load if required.
You can also drag items of these types to the palette for easy access.
The diagram window is the canvas on which you design and display your
job. This window has the following components:
• Title bar. Displays the name of the job or shared container.
• Page tabs. If you use local containers in your job, the contents of
these containers are displayed in separate windows within the
job’s diagram window. Switch between views using the tabs at the
bottom of the diagram window.
[Toolbar diagram: buttons for New Job (choose the type from a drop-down list), Open job, Save, Save All, Job Properties, Cut, Copy, Paste, Construct shared container, Construct container, Compile, Run, Undo, Redo, Link marking, Annotations, Snap to grid, Grid lines, Zoom in, Zoom out, Print, and Help.]
The toolbar appears under the menu bar by default, but you can drag and
drop it anywhere on the screen. It will dock and un-dock as required.
Alternatively, you can hide the toolbar by choosing View ➤ Toolbar.
Tool Palette
The tool palette contains shortcuts to the components you can add to your
job design. By default the tool palette is docked to the DataStage Designer,
but you can drag and drop it anywhere on the screen. It will dock and un-
dock as required. Alternatively, you can hide the tool palette by choosing
View ➤ Palette.
There is a separate tool palette for server jobs, parallel jobs, mainframe
jobs, and job sequences (parallel shared containers use the parallel job
palette, server shared containers use the server job palette). Which one is
displayed depends on what is currently active in the Designer.
The palette has different groups to organize the tools available. Click the
group title to open the group. The Favorites group allows you to drag
frequently used tools there so you can access them quickly. You can also
drag other items there from the Repository window, such as jobs and
shared containers.
To add a stage to the Diagram window, choose its shortcut from the tool
palette and click the Diagram window. The stage is added at the insertion
point in the diagram window. If you click and drag on the diagram
window to draw a rectangle as an insertion point, the stage will be sized
to fit that rectangle. You can also drag stages from the tool palette or from
the Repository window and drop them on the Diagram window.
Some of the shortcuts on the tool palette give access to several stages. These
are called shortcut containers; you can recognize them because down
arrows appear when you hover the mouse pointer over them. Click the
arrow to see the list of items the icon gives access to:
Status Bar
The status bar appears at the bottom of the DataStage Designer window. It
displays one-line help for the window components and information on the
current state of job operations, for example, compilation of server jobs. You
can hide the status bar by choosing View ➤ Status Bar.
Debugger Toolbar
Server jobs: DataStage has a built-in debugger that can be used with server jobs or
server shared containers. The debugger toolbar contains buttons
representing debugger functions. You can hide the debugger toolbar by
[Screenshot: the debugger toolbar, including the Debug Window button.]
Shortcut Menus
There are a number of shortcut menus available which you display by
clicking the right mouse button. The menu displayed depends on where
you clicked.
• Background. Appears when you right-
click on the background area in the left of
the Designer (i.e. the space around
Diagram windows), or in any of the
toolbar background areas. Gives access to
the same items as the View menu (see
page 3-6).
The Toggle Annotations button in the toolbar allows you to specify
whether annotations are shown or not.
To insert an annotation, ensure the annotation option is on, then drag the
annotation icon from the tool palette onto the Diagram window. An
annotation box appears; you can resize it as required using the controls in the
boundary box. Alternatively, click an Annotation button in the tool palette,
then draw a bounding box of the required size on the Diagram window.
Annotations will always appear behind normal stages and links.
• Annotation text. Displays the text in the annotation. You can edit
this here if required.
• Vertical Justification. Choose whether the text aligns to the top,
middle, or bottom of the annotation box.
• Horizontal Justification. Choose whether the text aligns to the left,
center, or right of the annotation box.
Annotation Properties
The Annotation Properties dialog box is as follows:
Appearance Options
The Appearance options branch lets you control the appearance of the
DataStage Designer. It gives access to four pages: General, Repository
Tree, Palette, and Graphical Performance Monitor.
General
Repository Tree
This section allows you to choose what type of items are displayed in the
Repository tree in the Designer.
Palette
This section allows you to control how your tool palette is displayed.
Default Options
The Default options branch gives access to two pages: General and
Mainframe.
Mainframe
This page allows you to specify options that apply to mainframe jobs only.
• Base location for generated code. This specifies the base location
on the DataStage client where the generated code and JCL files for
a mainframe job are held. Each mainframe job holds these files in a
subdirectory of the base location reflecting the server host name,
project name, and job. For example, where the base location is
c:\Ascential\DataStage\Gencode, a complete pathname might be
c:\Ascential\DataStage\Gencode\R101\dstage\mjob1.
SMTP Defaults
This page allows you to specify default details for Email Notification activ-
ities in job sequences.
• SMTP Mail server name. The name of the server or its IP address.
• Senders email address. Given in the form
[email protected].
• Recipients email address. Given in the form
[email protected].
Prompting Options
The Prompting branch gives access to pages which determine the level of
prompting displayed when you perform various operations in the
Designer. There are three pages: General, Mainframe, and Server.
Confirmation
This page has options for specifying whether you should be warned when
performing various deletion and construction actions, allowing you to
confirm that you really want to carry out the action. Tick the boxes to have
the warnings, clear them otherwise.
Creating a Job
To create a job, choose File ➤ New from the DataStage Designer menu.
The New dialog box appears, choose one of the icons, depending on the
type of job or shared container you want to create, and click OK.
The Diagram window appears, in the right pane of the Designer, along
with the Toolbox for the chosen type of job. You can now save the job and
give it a name.
You can also find the job in the tree in the Repository window and double-
click it, or select it and choose Edit from its shortcut menu, or drag it onto
the background to open it.
The updated DataStage Designer window displays the chosen job in a
Diagram window.
Database
• ODBC. Extracts data from or loads data into databases
that support the industry standard Open Database
Connectivity API. This stage is also used as an interme-
diate stage for aggregating data. This is a passive stage.
• UniVerse. Extracts data from or loads data into UniVerse
databases. This stage is also used as an intermediate
stage for aggregating data. This is a passive stage.
• UniData. Extracts data from or loads data into UniData
databases. This is a passive stage.
File
• Hashed File. Extracts data from or loads data into data-
bases that contain hashed files. Also acts as an
intermediate stage for quick lookups. This is a passive
stage.
• Sequential File. Extracts data from, or loads data into,
operating system text files. This is a passive stage.
Real Time
• RTI Source. Entry point for a Job exposed as an RTI
service. The Table Definition specified on the output link
dictates the input arguments of the generated RTI
service.
• RTI Target. Exit point for a Job exposed as an RTI
service. The Table Definition on the input link dictates
the output arguments of the generated RTI service.
Database
• IMS. This is a source stage. It extracts data from an IMS
database or viewset.
File
• Complex Flat File. This is a source stage. It reads data
from a complex flat file.
Processing
• Transformer. This performs data transformation on
extracted data.
• Join. This is used to join data from two input tables and
produce one output table.
• Link Collector. Collects previously partitioned data together.
Database Stages
• DB2/UDB Enterprise. Allows you to read and write a
DB2 database.
File Stages
• Data set. Stores a set of data.
Processing Stages
• Transformer. Receives incoming data, transforms it in a
variety of ways, and outputs it to another stage in the
job.
• Aggregator. Classifies incoming data into groups,
computes totals and other summary functions for each
group, and passes them to another stage in the job.
Real Time
• RTI Source. Entry point for a Job exposed as an RTI
service. The Table Definition specified on the output link
dictates the input arguments of the generated RTI
service.
• RTI Target. Exit point for a Job exposed as an RTI
service. The Table Definition on the input link dictates
the output arguments of the generated RTI service.
Other Stages
• Parallel Shared Container. Represents a group of stages
and links. The group is replaced by a single Parallel
Shared Container stage in the Diagram window. Parallel
Shared Container stage in the Diagram window. Parallel
Shared Container stages are handled differently to other
stage types; they do not appear on the palette. You insert
from the Repository window.
• Local Container. Represents a group of stages and links.
The group is replaced by a single Container stage in the
Diagram window (these are similar to shared containers
but are entirely private to the job they are created in and
cannot be reused in other jobs).
• Container Input and Output. Represent the interface
that links a container stage to the rest of the job design.
Link marking is enabled by default. To disable it, click on the link mark
icon in the Designer toolbar, or deselect it in the Diagram menu, or the
Diagram shortcut menu.
Unattached Links
You can add links that are only attached to a stage at one end, although
they will need to be attached to a second stage before the job can success-
fully compile and run. Unattached links are shown in a special color (red
by default – but you can change this using the Options dialog, see
page 3-24).
By default, when you delete a stage, any attached links and their meta data
are left behind, with the link shown in red. You can choose Delete
including links from the Edit or shortcut menus to delete a selected stage
along with its connected links.
Link Marking
Parallel jobs: For parallel jobs, meta data is associated with a link, not a stage. If you
have link marking enabled, a small icon attaches to the link to indicate if
meta data is currently associated with it.
Link marking also shows you how data is partitioned or collected between
stages, and whether data is sorted. The following diagram shows the
different types of link marking. For an explanation, see DataStage Parallel
Job Developer’s Guide.
[Diagram: link marking examples showing a partition marker and a collection marker.]
Link marking is enabled by default. To disable it, click on the link mark
icon in the Designer toolbar, or deselect it in the Diagram menu, or the
Diagram shortcut menu.
Unattached Links
You can add links that are only attached to a stage at one end, although
they will need to be attached to a second stage before the job can success-
fully compile and run. Unattached links are shown in a special color (red
by default – but you can change this using the Options dialog, see
page 3-24).
Link marking is enabled by default. To disable it, click on the link mark
icon in the Designer toolbar, or deselect it in the Diagram menu, or the
Diagram shortcut menu.
Unattached Links
Unlike server and parallel jobs, you cannot have unattached links in a
mainframe job; both ends of a link must be attached to a stage. If you
delete a stage, the attached links are automatically deleted too.
Link Ordering
The Transformer stage in server jobs and various active stages in parallel
jobs allow you to specify the execution order of links coming into and/or
going out from the stage. When looking at a job design in DataStage, there
are two ways to look at the link execution order:
• Place the mouse pointer over a link that is an input to or an output
from a Transformer stage. A ToolTip appears displaying the
message:
Input execution order = n
for input links, and:
Output execution order = n
Adding Stages
There is no limit to the number of stages you can add to a job. We
recommend you position the stages as follows in the Diagram window:
• Server jobs
Renaming Stages
There are a number of ways to rename a stage:
• You can change its name in its stage editor.
• You can select the stage in the Diagram window and then edit the
name in the Property Browser.
• You can select the stage in the Diagram window, press Ctrl-R,
choose Rename from its shortcut menu, or choose Edit ➤ Rename
from the main menu and type a new name in the text box that
appears beneath the stage.
• Select the stage in the diagram window and start typing.
Deleting Stages
Stages can be deleted from the Diagram window. Choose one or more
stages and do one of the following:
• Press the Delete key.
• Choose Edit ➤ Delete.
• Choose Delete from the shortcut menu.
A message box appears. Click Yes to delete the stage or stages and remove
them from the Diagram window. (This confirmation prompting can be
turned off if required.)
When you delete stages in mainframe jobs, attached links are also deleted.
When you delete stages in server or parallel jobs, the links are left behind,
unless you choose Delete including links from the edit or shortcut menu.
Linking Stages
You can link stages in three ways:
Moving Links
Once positioned, a link can be moved to a new location in the Diagram
window. You can choose a new source or destination for the link, but not
both.
To move a link:
1. Click the link to move in the Diagram window. The link is
highlighted.
2. Click in the box at the end you want to move and drag the end to its
new location.
In server and parallel jobs you can move one end of a link without reat-
taching it to another stage. In mainframe jobs both ends must be attached
to a stage.
Deleting Links
Links can be deleted from the Diagram window. Choose the link and do
one of the following:
• Press the Delete key.
• Choose Edit ➤ Delete.
• Choose Delete from the shortcut menu.
A message box appears. Click Yes to delete the link. The link is removed
from the Diagram window.
Note: For server jobs, meta data is associated with a link, not a stage. If
you delete a link, the associated meta data is deleted too. If you
Renaming Links
There are a number of ways to rename a link:
• You can select it and start typing in a name in the text box that
appears.
• You can select the link in the Diagram window and then edit the
name in the Property Browser.
• You can select the link in the Diagram window, press Ctrl-R,
choose Rename from its shortcut menu, or choose Edit ➤ Rename
from the main menu and type a new name in the text box that
appears beneath the link.
• Select the link in the diagram window and start typing.
Editing Stages
When you have added the stages and links to the Diagram window, you
must edit the stages to specify the data you want to use and any aggrega-
tions or conversions required.
Data arrives into a stage on an input link and is output from a stage on an
output link. The properties of the stage and the data on each input and
output link are specified using a stage editor.
To edit a stage, do one of the following:
• Double-click the stage in the Diagram window.
• Select the stage and choose Properties… from the shortcut menu.
• Select the stage and choose Edit ➤ Properties.
Note: You can use Find… to enter the name of the table definition
you want. The table definition is selected in the tree when you
click OK.
5. If you cannot find the table definition, you can click Import ➤ Data
source type to import a table definition from a data source (see
“Importing a Table Definition” on page 8-11 for details).
6. Click OK. One of two things happens, depending on the type of stage
you are editing:
Use the arrow keys to move columns back and forth between the
Available columns list and the Selected columns list. The single
arrow buttons move highlighted columns, the double arrow
buttons move all items. By default all columns are selected for
loading. Click Find… to open a dialog box which lets you search
for a particular column. The shortcut menu also gives access to
Find… and Find Next. Click OK when you are happy with your
selection. This closes the Select Columns dialog box and loads the
selected columns into the stage.
For mainframe stages and certain parallel stages where the column
definitions derive from a CFD file, the Select Columns dialog box
may also contain a Create Filler check box. This happens when the
table definition the columns are being loaded from represents a
fixed-width table. Select this to cause sequences of unselected
columns to be collapsed into filler items. Filler columns are sized
appropriately, their datatype set to character, and name set to
FILLER_XX_YY where XX is the start offset and YY the end offset.
8. Click Yes or Yes to All to confirm the load. Changes are saved when
you save your job design.
Note: Be careful when cutting from one context and pasting into another.
For example, if you cut columns from an input link and paste them
onto an output link they could carry information that is wrong for
an output link and needs editing.
To cut a stage, select it in the canvas and select Edit ➤ Cut (or press
CTRL-X). To copy a stage, select it in the canvas and select Edit ➤ Copy
(or press CTRL-C). To paste the stage, select the destination canvas and
select Edit ➤ Paste (or press CTRL-V). Any links attached to a stage will
be cut and pasted too, complete with meta data. If there are name conflicts
with stages or links in the job into which you are pasting, DataStage will
automatically update the names.
If you want to cut or copy meta data along with the stages, you should
select source and destination stages, which will automatically select links
and associated meta data. These can then be cut or copied and pasted as a
group.
The Data Browser uses the meta data defined for that link. If there is
insufficient data associated with a link to allow browsing, the View Data…
button and shortcut menu command used to invoke the Data Browser are
disabled. If the Data Browser requires you to input some parameters
before it can determine what data to display, the Job Run Options dialog
box appears and collects the parameters (see “The Job Run Options Dialog
Box” on page 4-86).
The Display… button invokes the Column Display dialog box. This
allows you to simplify the data displayed by the Data Browser by choosing
to hide some of the columns. For server jobs, it also allows you to
normalize multivalued data to provide a 1NF view in the Data Browser.
This dialog box lists all the columns in the display, all of which are initially
selected. To hide a column, clear it.
For server jobs, the Normalize on drop-down list box allows you to select
an association or an unassociated multivalued column on which to
normalize the data. The default is Un-normalized, and choosing Un-
normalized will display the data in NF2 form with each row shown on a
single line. Alternatively you can select Un-Normalized (formatted),
which displays multivalued rows split over several lines.
In the example, the Data Browser would display all columns except
STARTDATE. The view would be normalized on the association PRICES.
If you alter anything on the job design you will lose the statistical
information until the next time you compile the job.
The job is compiled as soon as this dialog box appears. You must check the
display area for any compilation messages or errors that are generated.
For parallel jobs there is also a force compile option. The compilation of
parallel jobs is by default optimized such that transformer stages only get
recompiled if they have changed since the last compilation. The force
compile option overrides this and causes all transformer stages in the job
to be compiled. To select this option:
• Choose File ➤ Force Compile
Successful Compilation
If the Compile Job dialog box displays the message Job successfully
compiled with no errors, you can:
• Validate the job
• Run or schedule the job
• Release the job
• Package the job for deployment on other DataStage systems
Jobs are validated and run using the DataStage Director. See DataStage
Director Guide for additional information. More information about
compiling, releasing and debugging DataStage server jobs is in DataStage
Server Job Developer’s Guide. More information about compiling and
releasing parallel jobs is in the DataStage Parallel Job Developer’s Guide.
This command connects to the machine r101, with a username and password of fellp
and plaintextpassword, attaches to the project dstageprj, and compiles the job
mybigjob.
Compiler Wizard
DataStage also has a compiler wizard that will guide you through the
process of compiling jobs. You can start the wizard from the Tools menu
of the Designer, Manager, or Director clients. Select Tools ➤ Run
Multiple Job Compile.
The wizard proceeds as follows:
1. A screen prompts you to specify the criteria for selecting jobs to
compile. Choose one or more of:
• Server
• Parallel
• Sequence
• Mainframe
You can also specify that only currently uncompiled jobs will be
compiled, and that you want to manually select the jobs to compile.
2. Click Next>.
Code Generation
Code generation first validates the job design. If the validation fails, code
generation stops. Status messages about validation are in the Validation
and code generation status window. They give the names and locations of
the generated files, and indicate the database name and user name used by
each relational stage.
Three files are produced during code generation:
• COBOL program file which contains the actual COBOL code that
has been generated.
• Compile JCL file which contains the JCL that controls the compila-
tion of the COBOL code on the target mainframe machine.
• Run JCL file which contains the JCL that controls the running of
the job on the mainframe once it has been compiled.
Job Upload
Once you have successfully generated the mainframe code, you can
upload the files to the target mainframe, where the job is compiled and
run.
To upload a job, choose File ➤ Upload Job. The Remote System dialog
box appears, allowing you to specify information about connecting to the
target mainframe system. Once you have successfully connected to the
target machine, the Job Upload dialog box appears, allowing you to actu-
ally upload the job.
For more details about uploading jobs, see Mainframe Job Developer’s Guide.
Code Customization
When you check the Generate COPY statement for customization box in
the Code generation dialog box, DataStage provides four places in the
generated COBOL program that you can customize. You can add code to
be executed at program initialization or termination, or both. However,
you cannot add code that would affect the row-by-row processing of the
generated program.
When you check Generate COPY statement for customization, four addi-
tional COPY statements are added to the generated COBOL program:
– COPY ARDTUDAT. This statement is generated just before the
PROCEDURE DIVISION statement. You can use this to add
WORKING-STORAGE variables and/or a LINKAGE SECTION
to the program.
– COPY ARDTUBGN. This statement is generated just after the
PROCEDURE DIVISION statement. You can use this to add your
own program initialization code. If you included a LINKAGE
SECTION in ARDTUDAT, you can use this to add the USING
clause to the PROCEDURE DIVISION statement.
– COPY ARDTUEND. This statement is generated just before each
STOP RUN statement. You can use this to add your own program
termination code.
– COPY ARDTUCOD. This statement is generated as the last
statement in the COBOL program. You use this to add your own
paragraphs to the code. These paragraphs are those which are
PERFORMed from the code in ARDTUBGN and ARDTUEND.
Job Properties
Each job in a project has properties, including optional descriptions and
job parameters. To view and edit the job properties from the Designer,
open the job in the Diagram window and choose Edit ➤ Job Properties…
or, if it is not currently open, select it in the Repository window and choose
Properties from the shortcut menu.
The Job Properties dialog box appears. The dialog box differs depending
on whether it is a server job, parallel job, or a mainframe job. A server job
has up to six pages: General, Parameters, Job control, Dependencies,
Performance, and NLS. Parallel job properties are the same as server job
properties except they have an Execution page rather than a Performance
page, and also have a Generated OSH and Defaults page. A mainframe
job has five pages: General, Parameters, Environment, Extensions, and
Operational meta data.
You can also use the Parameters page to set different values for environment
variables while the job runs. The settings take effect only at run time;
they do not affect the permanent settings of environment variables.
The server job Parameters page is as follows:
Job Parameters
Specify the type of the parameter by choosing one of the following from
the drop-down list in the Type column:
• String. The default type.
• Encrypted. Used to specify a password. The default value is set by
double-clicking the Default Value cell to open the Setup Password
dialog box. Type the password in the Encrypted String field and
Note: You can also use job parameters in the Property name field on the
Properties tab in the stage type dialog box when you create a plug-
in. For more information, see DataStage Server Job Developer’s Guide.
You can also click New… at the top of the list to define a new environ-
ment variable. A dialog box appears allowing you to specify name and
prompt. The new variable is added to the Choose environment vari-
able list and you can click on it to add it to the parameters grid.
3. Set the required value in the Default Value column. This is the only
field you can edit for an environment variable. Depending on the
type of variable a further dialog box may appear to help you enter a
value.
When you run the job and specify a value for the environment variable,
you can specify the special value $ENV, which instructs DataStage to use
the current setting for the environment variable.
Environment variables are set up using the DataStage Administrator, see
DataStage Administrator Guide.
* Now wait for both jobs to finish before scheduling the third job
Dummy = DSWaitForJob(Hjob1)
Dummy = DSWaitForJob(Hjob2)
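The fragment above assumes that the handles Hjob1 and Hjob2 have already
been obtained and that both jobs have been started. As a minimal sketch only
(the job names MyJob1 and MyJob2 and the message text are illustrative, not
part of any shipped example), a job control routine entered on the Job control
page might attach and run the jobs before waiting for them:

   * Attach the jobs and run them (sketch only)
   Hjob1 = DSAttachJob("MyJob1", DSJ.ERRFATAL)
   ErrCode = DSRunJob(Hjob1, DSJ.RUNNORMAL)
   Hjob2 = DSAttachJob("MyJob2", DSJ.ERRFATAL)
   ErrCode = DSRunJob(Hjob2, DSJ.RUNNORMAL)
   * Now wait for both jobs to finish
   Dummy = DSWaitForJob(Hjob1)
   Dummy = DSWaitForJob(Hjob2)
   * Check the finishing status of the first job before going on
   Status1 = DSGetJobInfo(Hjob1, DSJ.JOBSTATUS)
   If Status1 <> DSJS.RUNOK Then
      Call DSLogFatal("MyJob1 did not finish OK", "JobControl")
   End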
When browsing for the location of a file on a UNIX server, there is an entry
called Root in the base locations drop-down list (called Drives on mk56 in
the above example).
Note: You cannot use row-buffering of either sort if your job uses
COMMON blocks in transform functions to pass data between stages.
The character set map defines the character set DataStage uses for this job.
You can select a specific character set map from the list or accept the
default setting for the whole project.
The locale determines the order for sorted data in the job. Select the project
default or choose one from the list.
The page shows the current defaults for date, time, timestamp, and
decimal separator. To change the default, clear the corresponding Project
default check box, then either select a new format from the drop-down list
or type in a new format.
The Parameters page lists any parameters that have been defined for the
job. If default values have been specified, these are displayed too. You can
enter a value in the Value column, edit the default, or accept the default as
it is. Click Set to Default to set a parameter to its default value, or click All
to Default to set all parameters to their default values. Click Property
Help to display any help text that has been defined for the selected param-
eter (this button is disabled if no help has been defined). Click OK when
you are satisfied with the values for the parameters.
When setting a value for an environment variable, you can specify the
special value $ENV, which instructs DataStage to use the current setting
for the environment variable. Note that you cannot use $ENV when
viewing data on Parallel jobs. You will be warned if you try to do this.
Local Containers
Server jobs and parallel jobs. The main purpose of using a DataStage local
container is to simplify a complex design visually to make it easier to
understand in the Diagram window. If the DataStage job has lots of stages
and links, it may be easier to create additional containers to describe a
particular sequence of steps.
Containers are linked to other stages or containers in the job by input and
output stages.
You can create a local container from scratch, or place a set of existing
stages and links within a container. A local container is only accessible to
the job in which it is created.
The first ODBC stage links to a stage in the container, and is represented
by a Container Input stage. A different stage in the container links to the
second ODBC stage, which is represented by a Container Output stage.
The container Diagram window includes the input and output stages
required to link to the two ODBC stages. Note that the link names match
those used for the links between the ODBC stages and the container in the
main Diagram window.
The way in which the Container Input and Output stages are used
depends on whether you construct a local container using existing stages
and links or create a new one:
• If you construct a local container from an existing group of stages
and links, the input and output stages are automatically added.
The link between the input or output stage and the stage in the
container has the same name as the link in the main job Diagram
window.
• If you create a new container, you must add stages to the container
Diagram window between the input and output stages. Link the
stages together and edit the link names to match the ones in the
main Diagram window.
You can have any number of links into and out of a local container; however,
all of the link names inside the container must match the link names into and
out of it in the job. Once a connection is made, editing meta data on either
side of the container edits the meta data on the connected stage in the job.
Shared Containers
Server jobs and parallel jobs. Shared containers also help you to simplify
your design but, unlike local containers, they are reusable by other jobs.
You can use shared containers to make common job components available
throughout the project.
You can also insert a server shared container into a parallel job as a way of
making server job functionality available. For example, you could use it to
give the parallel job access to the functionality of a plug-in stage. (Note
that you can only use server shared containers on SMP systems, not MPP
or cluster systems.)
Shared containers comprise groups of stages and links and are stored in
the Repository like DataStage jobs. When you insert a shared container
into a job, DataStage places an instance of that container into the design.
When you compile the job containing an instance of a shared container, the
code for the container is included in the compiled job. You can use the
DataStage debugger on instances of shared containers used within jobs.
When you add an instance of a shared container to a job, you will need to
map meta data for the links into and out of the container, as these may vary
in each job in which you use the shared container. If you change the
contents of a shared container, you will need to recompile those jobs that
use the container in order for the changes to take effect. For parallel shared
containers, you can take advantage of runtime column propagation to
avoid the need to map the meta data. If you enable runtime column prop-
agation, then, when the job runs, meta data will be automatically
propagated across the boundary between the shared container and the
stage(s) to which it connects in the job (see Parallel Job Developer’s Guide for
a description of runtime column propagation).
Note that there is nothing inherently parallel about a parallel shared
container - although the stages within it have parallel capability. The
stages themselves determine how the shared container code will run.
Conversely, when you include a server shared container in a parallel job,
the server stages have no parallel capability, but the entire container can
operate in parallel because the parallel job can execute multiple instances
of it.
You can create a shared container from scratch, or place a set of existing
stages and links within a shared container.
Note: If you encounter a problem when running a job which uses a server
shared container in a parallel job, you could try increasing the
value of the DSIPC_OPEN_TIMEOUT environment variable in the
Parallel ➤ Operator specific category of the environment variable
dialog box in the DataStage Administrator (see DataStage Adminis-
trator Guide).
A new Diagram window appears in the Designer, along with a Tool palette
which has the same content as for server jobs or parallel jobs, depending
on the type of shared container. You can now save the shared container
and give it a name. This is exactly the same as saving a job (see “Saving a
Job” on page 4-4).
To view or modify a shared container definition, do one of the following:
• Select its icon in the job design and select Open from the shortcut
menu.
• Choose File ➤ Open from the main menu and select the shared
container from the Open dialog box.
A Diagram window appears, showing the contents of the shared
container. You can edit the stages and links in a container in the same way
you do for a job.
The Parameters page is as follows:
This is similar to a general stage editor, and has Stage, Inputs, and
Outputs pages, each with subsidiary tabs.
Stage Page
• Stage Name. The name of the instance of the shared container. You
can edit this if required.
• Shared Container Name. The name of the shared container of
which this is an instance. You cannot change this.
The General tab enables you to add an optional description of the
container instance.
The Properties tab allows you to specify values for container parameters.
You need to have defined some parameters in the shared container prop-
erties for this tab to appear.
Inputs Page
When inserted in a job, a shared container instance already has meta data
defined for its various links. This meta data must exactly match, in all
properties, the meta data on the link that the job uses to connect to the
container. The Inputs page enables you to map meta data as required. The only exception
to this is where you are using runtime column propagation (RCP) with a
parallel shared container. If RCP is enabled for the job, and specifically for
the stage whose output connects to the shared container input, then meta
data will be propagated at run time, so there is no need to map it at design
time.
In all other cases, in order to match, the meta data on the links being
matched must have the same number of columns, with corresponding
properties for each.
The Inputs page for a server shared container has an Input field and two
tabs, General and Columns. The Inputs page for a parallel shared
container, or a server shared container used in a parallel job, has an addi-
tional tab: Partitioning.
• Input. Choose the input link to the container that you want to map.
The General page is as follows:
The Partitioning tab appears for parallel shared containers and when you
are using a server shared container within a parallel job. It has the same
fields and functionality as the Partitioning tab on all parallel stage editors.
See Chapter 3 of DataStage Parallel Job Developer’s Guide for details.
Outputs Page
The Outputs page enables you to map meta data between a container link
and the job link which connects to the container on the output side. It has
an Outputs field, and General and Columns tabs that perform functions
equivalent to those described for the Inputs page.
The Columns tab for parallel shared containers has a Runtime column
propagation check box. This is visible provided RCP is enabled for the job.
It shows whether RCP is switched on or off for the link the container link
is mapped onto. This removes the need to map the meta data.
Chapter 6. Job Sequences
DataStage provides a graphical Job Sequencer which allows you to specify
a sequence of server jobs or parallel jobs to run. The sequence can also
contain control information; for example, you can specify different courses
of action to take depending on whether a job in the sequence succeeds or
fails. Once you have defined a job sequence, it can be scheduled and run
using the DataStage Director. It appears in the DataStage Repository and
in the DataStage Director client as a job.
Job Sequence
Note: This tool is provided in addition to the batch job facilities of the
DataStage Director and the job control routine facilities of the
DataStage Designer.
Designing a job sequence is similar to designing a job. You create the job
sequence in the DataStage Designer, add activities (as opposed to stages)
from the tool palette, and join these together with triggers (as opposed to
links) to define control flow.
Each activity has properties that can be tested in trigger expressions and
passed to other activities further on in the sequence. Activities can also
have parameters, which are used to supply job parameters and routine
arguments.
The job sequence itself has properties, and can have parameters, which can
be passed to the activities it is sequencing.
You can open an existing job sequence in the same way you would open
an existing job (see “Opening an Existing Job” on page 4-2).
Triggers
The control flow in the sequence is dictated by how you interconnect
activity icons with triggers.
The trigger types available depend on the type of activity:
• Wait-for-file, ExecuteCommand: Unconditional; Otherwise; Conditional - OK;
Conditional - Failed; Conditional - Custom; Conditional - ReturnValue.
• Routine: Unconditional; Otherwise; Conditional - OK; Conditional - Failed;
Conditional - Custom; Conditional - ReturnValue.
• Job: Unconditional; Otherwise; Conditional - OK; Conditional - Failed;
Conditional - Warnings; Conditional - Custom; Conditional - UserStatus.
• Run-activity-on-exception, Sequencer, Email notification: Unconditional.
Note: If a job fails to run, for example because it was in the aborted state
when due to run, this will not fire a trigger. Job activities can only
fire triggers if they run. Non-running jobs can be handled by excep-
tion activities, or by choosing an execution action of reset then run
rather than just run for jobs (see page 6-16).
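As an illustration of a Conditional - Custom trigger (a sketch only; the
activity name job_1 is hypothetical), the expression attached to the trigger
might test the activity variables of a preceding job activity, for example
firing only when that job finished with an OK or a warnings status:

   job_1.$JobStatus = DSJS.RUNOK Or job_1.$JobStatus = DSJS.RUNWARN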
Nested Conditions
Each nested condition can have one input trigger and will normally have
multiple output triggers. You specify the condition it branches on by
editing the expressions attached to the output triggers in the Triggers page
of its Properties dialog box (see “Nested Condition Properties” on
page 6-23).
Sequencer
The Dependencies page of the Properties dialog box shows you the
dependencies the job sequence has. These may be functions, routines, or
jobs that the job sequence runs. Listing the dependencies of the job
sequence here ensures that, if the job sequence is packaged for use on
another system, all the required components will be included in the
package.
The details are as follows:
• Type. The type of item upon which the job sequence depends:
– Job. Released or unreleased job. If you have added a job to the
sequence, this will automatically be included in the dependen-
cies. If you subsequently delete the job from the sequence, you
must remove it from the dependencies list manually.
– Local. Locally cataloged BASIC functions and subroutines (i.e.,
Transforms and Before/After routines).
– Global. Globally cataloged BASIC functions and subroutines
(i.e., Custom UniVerse functions).
Activity Properties
When you have outlined your basic design by adding activities and trig-
gers to the diagram window, you fill in the details by editing the properties
of the activities. To edit an activity, do one of the following:
• Double-click the activity in the Diagram window.
• Select the activity and choose Properties… from the shortcut
menu.
• Select the activity and choose Edit ➤ Properties.
The format of the Properties dialog box depends on the type of activity. All
have a General page, however, and any activities with output triggers
have a Triggers page.
Sequencer Properties
In addition to the General and Triggers pages, the Properties dialog box
for a Sequencer control contains a Sequencer page.
2. Select the job to be used as a basis for the template. Click OK.
Another dialog box appears in order to collect details about your
template:
Administering Templates
To delete a template, start the Job-From-Template Assistant and select the
template. Click the Delete button. Use the same procedure to select and
delete empty categories.
The Assistant stores all the templates you create in the directory you
specified during your installation of DataStage. You browse this directory
when you create a new job from a template. Typically, all the developers
using the Designer save their templates in this single directory.
After installation, no dialog is available for changing the template
directory. You can, however, change the registry entry for the template
directory. The default registry value is:
[HKLM/SOFTWARE/Ascential Software/DataStage Client/
currentVersion/Intelligent Assistant/Templates]
2. Select the template to be used as the basis for the job. All the
templates in your template directory are displayed. If you have
custom templates authored by Consulting or other authorized
personnel, and you select one of these, a further dialog box prompts
3. When you have answered the questions, click Apply. You may cancel
at any time if you are unable to enter all the information. Another
dialog appears in order to collect the details of the job you are
creating:
3. When you have chosen your source, and supplied any information
required, click Next. The DataStage Select Table dialog box appears in
order to let you choose a table definition. The table definition speci-
fies the columns that the job will read. If the table definition for your
source data isn’t there, click Import in order to import a table definition.
4. Select a Table Definition from the tree structure and click OK. The
name of the chosen table definition is displayed in the wizard screen.
If you want to change this, click Change to open the Table Definition
dialog box again. This screen also allows you to specify the table
6. Select one of these stages to receive your data: Data Set, DB2, Infor-
mixXPS, Oracle, Sequential File, or Teradata. Enter additional
information when prompted by the dialog.
7. Click Next. The screen that appears shows the table definition that
will be used to write the data (this is the same as the one used to
extract the data). This screen also allows you to specify the table
8. Click Next. The next screen invites you to supply details about the job
that will be created. You must specify a job name and optionally
specify a job category. The job name should follow DataStage naming
10. When the job generation is complete, click Finish to exit the dialog.
All jobs consist of one source stage, one transformer stage, and one target
stage.
In order to support password maintenance, all passwords in your
generated jobs are parameterized and are prompted for at run time.
Table definitions are the key to your DataStage project and specify the data
to be used at each stage of a DataStage job. Table definitions are stored in
the Repository and are shared by all the jobs in a project. You need, as a
minimum, table definitions for each data source and one for each data
target in the data warehouse.
When you develop a DataStage job you will typically load your stages
with column definitions from table definitions held in the Repository.
You can import, create, or edit a table definition using either the DataStage
Designer or the DataStage Manager. (If you are dealing with a large
number of table definitions, we recommend that you use the Manager).
The information given here is the same as on the Format tab in one of the
following parallel job stages:
• Sequential File Stage
• File Set Stage
• External Source Stage
• External Target Stage
• Column Import Stage
• Column Export Stage
See DataStage Parallel Job Developer’s Guide for details.
The Defaults button gives access to a shortcut menu offering the choice of:
• Save current as default. Saves the settings you have made in this
dialog box as the default ones for your table definition.
• Reset defaults from factory settings. Resets to the defaults that
DataStage came with.
Note: You cannot use a server map unless it is loaded into DataStage. You
can load different maps using the DataStage Administrator. For
more information, see DataStage Administrator Guide.
2. Enter the general information for each column you want to define as
follows:
• Column name. Type in the name of the column. This is the only
mandatory field in the definition.
• Key. Select Yes or No from the drop-down list.
• Native type. For data sources with a platform type of OS390,
choose the native data type from the drop-down list. The contents
of the list are determined by the Access Type you specified on the
General page of the Table Definition dialog box. (The list is blank
for non-mainframe data sources.)
• SQL type. Choose from the drop-down list of supported SQL
types. If you are adding a table definition for platform type OS390,
you cannot manually enter an SQL type; it is automatically
derived from the Native type.
• Length. Type a number representing the length or precision of the
column.
Server Jobs. If you are specifying meta data for a server job type data
source or target, then the Edit Column Meta Data dialog box appears
with the Server tab on top. Enter any required information that is specific
to server jobs:
• Data element. Choose from the drop-down list of available data
elements.
• Display. Type a number representing the display length for the
column.
• Position. Visible only if you have specified Meta data supports
Multi-valued fields on the General page of the Table Definition
dialog box. Enter a number representing the field number.
• Type. Visible only if you have specified Meta data supports Multi-
valued fields on the General page of the Table Definition dialog
box. Choose S, M, MV, MS, or blank from the drop-down list.
• Association. Visible only if you have specified Meta data supports
Multi-valued fields on the General page of the Table Definition
dialog box. Type in the name of the association that the column
belongs to (if any).
• NLS Map. Visible only if NLS is enabled and Allow per-column
mapping has been selected on the NLS page of the Table Defini-
tion dialog box. Choose a separate character set map for a column,
which overrides the map set for the project or table. (The per-
column mapping feature is available only for sequential, ODBC, or
generic plug-in data source types.)
• Null String. This is the character that represents null in the data.
• Padding. This is the character used to pad missing columns. Set to
# by default.
The native data types map to COBOL and SQL attributes as follows:
• BINARY, length 2 bytes: COBOL usage PIC S9 to S9(4) COMP; SQL type SmallInt;
precision (p) 1 to 4; scale (s) n/a; storage length 2 bytes.
• BINARY, length 4 bytes: COBOL usage PIC S9(5) to S9(9) COMP; SQL type Integer;
precision (p) 5 to 9; scale (s) n/a; storage length 4 bytes.
• BINARY, length 8 bytes: COBOL usage PIC S9(10) to S9(18) COMP; SQL type Decimal;
precision (p) 10 to 18; scale (s) n/a; storage length 8 bytes.
• FLOAT (single), length 4 bytes: COBOL usage PIC COMP-1; SQL type Decimal;
precision (p) p+s (default 18); scale (s) s (default 4); storage length 4 bytes.
• FLOAT (double), length 8 bytes: COBOL usage PIC COMP-2; SQL type Decimal;
precision (p) p+s (default 18); scale (s) s (default 4); storage length 8 bytes.
Parallel Jobs. If you are specifying meta data for a parallel job type data
source or target, then the Edit Column Meta Data dialog box appears
with the Parallel tab on top. This allows you to enter detailed information
about the format of the column.
• Field level
This has the following properties:
– Bytes to Skip. Skip the specified number of bytes from the end of
the previous field to the beginning of this field.
– Delimiter. Specifies the trailing delimiter of the field. Type an
ASCII character or select one of whitespace, end, none, or null.
– whitespace. A whitespace character is used.
– end. Specifies that the last field in the record is composed of all
remaining bytes until the end of the record.
– none. No delimiter.
– null. Null character is used.
– Delimiter string. Specify a string to be written at the end of the
field. Enter one or more ASCII characters.
– Drop on Input. Specify this property when you must fully define
the layout of your input data but do not want this field actually
read into the data set.
This dialog box displays all the table definitions in the project in the
form of a table definition tree.
2. Double-click the appropriate branch to display the table definitions
available.
3. Select the table definition you want to use.
Note: You can use the Find… button to enter the name of the table
definition you want. The table definition is automatically high-
lighted in the tree when you click OK. You can use the Import
button to import a table definition from a data source.
4. If you cannot find the table definition, you can click Import ➤ Data
source type to import a table definition from a data source (see
“Importing a Table Definition” on page 8-11 for details).
Use the arrow keys to move columns back and forth between the
Available columns list and the Selected columns list. The single
arrow buttons move highlighted columns, the double arrow buttons
move all items. By default all columns are selected for loading. Click
Find… to open a dialog box which lets you search for a particular
column. The shortcut menu also gives access to Find… and Find Next.
Click OK when you are happy with your selection. This closes the
Select Columns dialog box and loads the selected columns into the
stage.
For mainframe stages and certain parallel stages where the column
definitions derive from a CFD file, the Select Columns dialog box may
also contain a Create Filler check box. This happens when the table
definition the columns are being loaded from represents a fixed-width
table. Select this to cause sequences of unselected columns to be
collapsed into filler items. Filler columns are sized appropriately, their
datatype set to character, and name set to FILLER_XX_YY where XX is
the start offset and YY the end offset. Using fillers results in a smaller
set of columns, saving space and processing time and making the
column set easier to understand.
If you are importing column definitions that have been derived from
a CFD file into server or parallel job stages, you are warned if any of
Propagating Values
You can propagate the values for the properties set in a column to several
other columns. Select the column whose values you want to propagate,
then hold down shift and select the columns you want to propagate to.
Choose Propagate values... from the shortcut menu to open the dialog
box.
The Data Browser uses the meta data defined in the data source. If there is
no data, a Data source is empty message appears instead of the Data
Browser.
The Display… button opens the Column Display dialog box. It allows
you to simplify the data displayed by the Data Browser by choosing to
hide some of the columns. It also allows you to normalize multivalued
data to provide a 1NF view in the Data Browser.
This dialog box lists all the columns in the display, and initially these are
all selected. To hide a column, clear it.
The Normalize on drop-down list box allows you to select an association
or an unassociated multivalued column on which to normalize the data.
The default is Un-Normalized, and choosing Un-Normalized will display
the data in NF2 form with each row shown on a single line.
In the example, the Data Browser would display all columns except
STARTDATE. The view would be normalized on the association PRICES.
Note: You do not need to edit the Format page for a stored procedure
definition.
Note: You do not need a result set if the stored procedure is used for input
(writing to a database). However, in this case, you must have input
parameters.
Note: You cannot use a map unless it is loaded into DataStage. You can
load different maps using the DataStage Administrator. For more
information, see DataStage Administrator Guide.
This chapter describes the programming tasks that you can perform in
DataStage.
The programming tasks that might be required depend on whether you
are working on server jobs, parallel jobs, or mainframe jobs. This chapter
provides a general introduction to the subject, telling you what you can
do. Details of programming tasks are in DataStage Server: Server Job Devel-
oper’s Guide, DataStage Enterprise Edition: Parallel Job Developer’s Guide, and
DataStage Enterprise MVS Edition: Mainframe Job Developer's Guide.
Note: When using shared libraries, you will need to ensure that the
libraries are in the right order in the LD_LIBRARY_PATH environment
variable (UNIX servers).
Programming Components
There are different types of programming components used in server jobs.
They fall within these three broad categories:
• Built-in. DataStage has several built-in programming components
that you can reuse in your server jobs as required. Some of the
built-in components are accessible using the DataStage Manager or
DataStage Designer, and you can copy code from these. Others are
Routines
Routines are stored in the Routines branch of the DataStage Repository,
where you can create, view, or edit them using the Routine dialog box. The
following program components are classified as routines:
• Transform functions. These are functions that you can use when
defining custom transforms. DataStage has a number of built-in
transform functions which are located in the Routines ➤ Exam-
ples ➤ Functions branch of the Repository. You can also define
your own transform functions in the Routine dialog box (a brief
sketch follows this list).
• Before/After subroutines. When designing a job, you can specify a
subroutine to run before or after the job, or before or after an active
stage. DataStage has a number of built-in before/after subroutines,
which are located in the Routines ➤ Built-in ➤ Before/After
branch in the Repository. You can also define your own
before/after subroutines using the Routine dialog box.
• Custom UniVerse functions. These are specialized BASIC func-
tions that have been defined outside DataStage. Using the Routine
dialog box, you can get DataStage to create a wrapper that enables
you to call these functions from within DataStage. These functions
are stored under the Routines branch in the Repository. You
specify the category when you create the routine. If NLS is enabled,
you should be aware of any mapping requirements when using
custom UniVerse functions.
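As a minimal sketch of the first two routine types (the routine name
MyBeforeSub, the argument names, and the message text are illustrative
assumptions, not shipped examples): a transform function defined in the
Routine dialog box with a single argument Arg1 simply assigns its result
to Ans, and a before/after subroutine receives an InputArg and an ErrorCode
argument, where a non-zero ErrorCode stops the job:

   * Transform function body: return the argument trimmed and in uppercase
   Ans = UpCase(Trim(Arg1))

   * Before/after subroutine body: log the input value and report success
   Call DSLogInfo("Called with input: " : InputArg, "MyBeforeSub")
   ErrorCode = 0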
Transforms
Transforms are stored in the Transforms branch of the DataStage Reposi-
tory, where you can create, view or edit them using the Transform dialog
box. Transforms specify the type of data transformed, the type it is trans-
formed into, and the expression that performs the transformation.
DataStage is supplied with a number of built-in transforms (which you
cannot edit). You can also define your own custom transforms, which are
stored in the Repository and can be used by other DataStage jobs.
When using the Expression Editor, the transforms appear under the DS
Transform… command on the Suggest Operand menu.
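For example (a sketch with hypothetical names), a custom transform could be
defined with a single argument MyDate and the definition below, which
reformats a date supplied as 2003/08/15 into 2003-08-15:

   Oconv(Iconv(MyDate, "D/YMD"), "D-YMD[4,2,2]")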
Functions
Functions take arguments and return a value. The word “function” is
applied to many components in DataStage:
• BASIC functions. These are one of the fundamental building
blocks of the BASIC language. When using the Expression Editor, you
can access the BASIC functions via the Function… command on the
Suggest Operand menu.
Expressions
An expression is an element of code that defines a value. The word
“expression” is used both as a specific part of BASIC syntax, and to
describe portions of code that you can enter when defining a job. Areas of
DataStage where you can use such expressions are:
• Defining breakpoints in the debugger
• Defining column derivations, key expressions and constraints in
Transformer stages
• Defining a custom transform
In each of these cases the DataStage Expression Editor guides you as to
what programming elements you can insert into the expression.
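For instance (a sketch; the link name DSLink3 and the column name AMOUNT
are illustrative), a column derivation in a Transformer stage is entered as
a BASIC expression such as:

   If IsNull(DSLink3.AMOUNT) Then 0 Else DSLink3.AMOUNT * 1.175

which substitutes zero for null values and otherwise applies a fixed uplift
to the incoming amount.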
Subroutines
A subroutine is a set of instructions that perform a specific task. Subrou-
tines do not return a value. The word “subroutine” is used both as a
specific part of BASIC syntax and to refer particularly to before/after
subroutines which carry out tasks either before or after a job or an active
stage. DataStage has many built-in before/after subroutines, or you can
define your own.
Macros
DataStage has a number of built-in macros. These can be used in expres-
sions, job control routines, and before/after subroutines. The available
macros are concerned with ascertaining job status.
When using the Expression Editor, they appear under the DS Macro…
command on the Suggest Operand menu.
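For example (a sketch only; the message text and calling name are
illustrative), the DSJobName and DSHostName macros could be used in a
before/after subroutine or a job control routine to write an audit message
to the job log:

   * Log which job is running and on which server (sketch)
   Call DSLogInfo("Job " : DSJobName : " running on " : DSHostName, "Audit")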
Expressions
Expressions are defined using a built-in language based on SQL3. For
more information about this language, see Mainframe Job Developer’s Guide.
You can use expressions to specify:
• Column derivations
• Key expressions
• Constraints
• Stage variables
You specify these in various mainframe job stage editors as follows:
• Transformer stage – column derivations for output links, stage
variables, and constraints for output links
• Relational stage – key expressions in output links
• Complex Flat File stage – key expressions in output links
• Fixed-Width Flat File stage – key expressions in output links
• Join stage – key expression in the join predicate
• External Routine stage – constraint in each stage instance
Routines
The External Routine stage enables your job to call a COBOL subroutine that
exists in a library external to DataStage. You must first define
the routine, details of the library, and its input and output arguments. The
routine definition is stored in the DataStage Repository and can be refer-
enced from any number of External Routine stages in any number of
mainframe jobs.
Defining and calling external routines is described in more detail in the
Mainframe Job Developer’s Guide.
Expressions
Expressions are used to define:
• Column derivations
• Constraints
• Stage variables
Expressions are defined using a built-in language. The Expression Editor
available from within the Transformer stage helps you with entering
appropriate programming elements. It operates for parallel jobs in much
the same way as it does for server jobs and mainframe jobs. It helps you to
enter correct expressions and can:
• Facilitate the entry of expression elements
• Validate variable names and the complete expression
For more details about the expression editor, and about the built-in
language, see DataStage Parallel Job Developer’s Guide.
Functions
For many expressions you can choose ready-made functions from the
built-in ones supplied with DataStage. You can also, however, define your
own functions that can be accessed from the Expression Editor. Such functions
must be supplied in a UNIX shared library or in a standard
object file (filename.o) and then referenced by defining a parallel routine
within the DataStage project that calls them. For details of how to define a
function, see DataStage Manager Guide.
Routines
Parallel jobs also have the ability to execute routines before or after an
active stage executes. These routines are defined and stored in the
DataStage Repository, and then called in the Triggers page of the partic-
ular Transformer stage Properties dialog box (see Parallel Job Developer’s
Guide for more details). These routines must be supplied in a UNIX shared
library or an object file, and do not return a value. For details of how to
define a routine, see DataStage Manager Guide.
DataStage uses grids in many dialog boxes for displaying data. This
system provides an easy way to view and edit tables. This appendix
describes how to navigate around grids and edit the values they contain.
Grids
The following screen shows a typical grid used in a DataStage dialog box:
On the left side of the grid is a row selector button. Click this to select a
row, which is then highlighted and ready for data input, or click any of the
cells in a row to select them. The current cell is highlighted by a chequered
border. The current cell is not visible if you have scrolled it out of sight.
Some cells allow you to type text, some to select a checkbox and some to
select a value from a drop-down list.
You can move columns within the definition by clicking the column
header and dragging it to its new position. You can resize columns to the
available space by double-clicking the column header splitter.
• Select and order columns. Allows you to select what columns are
displayed and in what order. The Grid Properties dialog box
displays the set of columns appropriate to the type of grid. The
example shows columns for a server job columns definition. You
can move columns within the definition by right-clicking on them
and dragging them to a new position. The numbers in the position
column show the new position.
• Allow freezing of left columns. Choose this to freeze the selected
columns so they never scroll out of view. Select the columns in the
grid by dragging the black vertical bar from next to the row
headers to the right side of the columns you want to freeze.
• Allow freezing of top rows. Choose this to freeze the selected rows
so they never scroll out of view. Select the rows in the grid by dragging
the black horizontal bar from below the column headers to just below the
rows you want to freeze.
The following keys are used for navigation in grids:
• Right Arrow. Move to the next cell on the right.
• Left Arrow. Move to the next cell on the left.
• Up Arrow. Move to the cell immediately above.
• Down Arrow. Move to the cell immediately below.
• Tab. Move to the next cell on the right. If the current cell is in the
rightmost column, move forward to the next control on the form.
• Shift-Tab. Move to the next cell on the left. If the current cell is in
the leftmost column, move back to the previous control on the form.
• Page Up. Scroll the page down.
• Page Down. Scroll the page up.
• Home. Move to the first cell in the current row.
• End. Move to the last cell in the current row.
The following keys are used for editing in grids:
• Esc. Cancel the current edit. The grid leaves edit mode, and the cell
reverts to its previous value. The focus does not move.
• Enter. Accept the current edit. The grid leaves edit mode, and the cell
shows the new value. When the focus moves away from a modified row, the
row is validated. If the data fails validation, a message box is displayed,
and the focus returns to the modified row.
• Up Arrow. Move the selection up a drop-down list or to the cell
immediately above.
• Down Arrow. Move the selection down a drop-down list or to the cell
immediately below.
• Left Arrow. Move the insertion point to the left in the current value.
When the extreme left of the value is reached, exit edit mode and move to
the next cell on the left.
• Right Arrow. Move the insertion point to the right in the current value.
When the extreme right of the value is reached, exit edit mode and move to
the next cell on the right.
• Ctrl-Enter. Enter a line break in a value.
Adding Rows
You can add a new row by entering data in the empty row. When you
move the focus away from the row, the new data is validated. If it passes
validation, it is added to the table, and a new empty row appears. Alterna-
tively, press the Insert key or choose Insert row… from the shortcut menu,
and a row is inserted with the default column name Newn, ready for you
to edit (where n is an integer providing a unique Newn column name).
Propagating Values
You can propagate the values for the properties set in a grid to several
rows in the grid. Select the column whose values you want to propagate,
then hold down shift and select the columns you want to propagate to.
Choose Propagate values... from the shortcut menu to open the dialog
box.
In the Property column, click the check box for the property or properties
whose values you want to propagate. The Usage field tells you if a partic-
ular property is applicable to certain types of job only (e.g. server,
mainframe, or parallel) or certain types of table definition (e.g. COBOL).
The Value field shows the value that will be propagated for a particular
property.
then you must edit the UNIAPI.INI file and change the value of the
PROTOCOL variable. In this case, change it from 11 to 12:
PROTOCOL = 12
#LC_TIME="<langdef>";export LC_TIME
#LC_MESSAGES="<langdef>"; export LC_MESSAGES
Take the following steps:
1. Replace all occurrences of <langdef> with the locale used by the
server (the locale must be one of those listed when you use the
locale -a command).
2. Remove the #s at the start of the lines.
3. Stop and restart the DataStage server:
To stop the server:
$DSHOME/bin/uv -admin -stop
To start the server:
$DSHOME/bin/uv -admin -start
Ensure that you allow sufficient time between executing stop and
start commands (minimum of 30 seconds recommended).
Miscellaneous Problems