IBM WebSphere DataStage
DataStage EE Basics
Creating a Job
DW & BI IMPACT Training 2008
Kolkata
© 2007 IBM Corporation
IBM GBS | WebSphere DataStage PX Training
Where We Are
MODULE 01 Introduction to DataStage
MODULE 02 DataStage Installation on Windows Platform
MODULE 03 Features of DataStage Clients
MODULE 04 Creating a Job
MODULE 05 Accessing Sequential Data
MODULE 06 Combining Data
MODULE 07 Splitting Data
MODULE 08 Transforming Data
MODULE 09 Sorting and Aggregating Data
MODULE 10 Accessing Relational Data
MODULE 11 Job Control
MODULE 12 Architecture and Parallelism Concepts
Module 04
Creating a Job
Developing Jobs in DataStage
Study Given Technical Specification
Plan Job Design
Decide Stage Types
Import metadata for the source and target (and any lookup reference) in Manager
Build job in Designer
Configure Stages
Define Job parameters
Maintain Design Standards
Compile job in Designer
Run and monitor job in Director
Study Given Technical Specification
Source and target column details
Source Information
Target Information
Mapping Rules – Business logic
Attaching to a Project
Open DataStage Designer
Plan Job Design
1. Analyze the source data and decide which stage to use to extract it
2. Implement the business logic in DataStage
3. Load the data into the database table
Job Parameters
Go to Job Properties
Use parameters defined at project level: user-defined environment variables. The default value can be hard-coded or retrieved at run time ($PROJDEF)
Create job-level parameters
Parameters can be passed from a sequence. A value defined at a higher level takes precedence
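The precedence rule above can be sketched in a few lines of Python. This is a simplified model for illustration only, not DataStage's own resolution logic: the function name `resolve_parameter`, the three dictionaries, and the sample parameter names and values are all hypothetical. It assumes that a value passed from a sequence overrides the job-level default, and that a job-level default of `$PROJDEF` defers to the project-level value at run time.

```python
PROJDEF = "$PROJDEF"  # marker: take the project-level default at run time


def resolve_parameter(name, project_defaults, job_defaults, sequence_values):
    """Return the effective value of a parameter (illustrative model only)."""
    # A value passed down from the calling sequence (the higher level) wins.
    if name in sequence_values:
        value = sequence_values[name]
    elif name in job_defaults:
        value = job_defaults[name]
    else:
        raise KeyError(f"parameter {name!r} is not defined for this job")

    # $PROJDEF means "look the value up in the project-level settings".
    if value == PROJDEF:
        if name not in project_defaults:
            raise KeyError(f"no project-level default for {name!r}")
        value = project_defaults[name]
    return value


# Hypothetical sample data
project_defaults = {"$SRC_DIR": "/data/in"}
job_defaults = {"$SRC_DIR": PROJDEF, "RunDate": "2008-01-01"}
sequence_values = {"RunDate": "2008-02-15"}

run_date = resolve_parameter("RunDate", project_defaults, job_defaults, sequence_values)
src_dir = resolve_parameter("$SRC_DIR", project_defaults, job_defaults, sequence_values)
```

Here `RunDate` takes the value passed from the sequence, while `$SRC_DIR` falls back to the project-level setting because its job-level default is `$PROJDEF`.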
A sample Job in Designer
Compile Job
Annotation
Job properties
Run Job
Job Properties
View generated script in OSH
Job parameters
Run any BASIC subroutine before or after the Job run
Short description of the Job
Full Job description to track modification history
Job Parameters
Parameter name
Parameter default value
Parameter prompt to be seen at run time
Parameter type
Project level parameters
Job level parameters
Add new environment variable
Design Standards – Naming Convention
Maintain Naming Conventions for stages and links
Use Job and stage annotations
Name stages after the data they access or the function they perform
DO NOT leave default stage names like Sequential_File_0
Use 2-character prefixes to indicate stage type
Name links for the data they carry
DO NOT leave default link names like DSLink1
Prefix all link names with “lnk_”
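A small checker can make these rules concrete. The sketch below is not a DataStage tool. The function names and regular expressions are assumptions: a default stage name is guessed to look like capitalized words followed by a number (as in Sequential_File_0), and a default link name like DSLink followed by a number. The 2-character stage prefixes are not checked because the slides do not list them.

```python
import re

# Assumed shapes of default names, inferred from the two examples on the slide.
DEFAULT_STAGE = re.compile(r"^[A-Z][A-Za-z]*(?:_[A-Z][A-Za-z]*)*_\d+$")
DEFAULT_LINK = re.compile(r"^DSLink\d+$")
LINK_PREFIX = "lnk_"


def check_stage_name(name):
    """Return a list of naming problems found for a stage name."""
    problems = []
    if DEFAULT_STAGE.match(name):
        problems.append("default stage name")
    return problems


def check_link_name(name):
    """Return a list of naming problems found for a link name."""
    problems = []
    if DEFAULT_LINK.match(name):
        problems.append("default link name")
    if not name.startswith(LINK_PREFIX):
        problems.append("missing 'lnk_' prefix")
    return problems


# Hypothetical names for illustration
for stage in ["Sequential_File_0", "sf_customers"]:
    print(stage, check_stage_name(stage))
for link in ["DSLink1", "lnk_customers"]:
    print(link, check_link_name(link))
```

An empty list means the name passes the checks modelled here; it does not show that the name is meaningful, which still needs review.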
Design Standards – Development Approach
Follow Iterative Job Design
• Use Copy and Peek stages as stubs
• Start small and build to final solution
• Start from the source and work outward
Test job in phases
• Small sections first, then increasing in complexity
• Use Peek stage/datasets to examine records
• Check data at various locations
• Check before and after processing
Solve the business problem before the performance problem
Compile/Validate/Run
Compiling a Job:
• creates the OSH script for the Job
• generates C++ code for the Transformer stages
Validating a Job:
• performs connectivity check and authentication
• Does everything except process rows: it opens any required files and connects to data sources.
• Prepares SQL and captures any syntax error messages returned by the database
• It is not necessary to validate a job every time you change its design. If the
change does not affect any passive stage, then re-validating it won't really
prove anything. Missing property values will be picked up when the job is
compiled.
• Validation is useful on initial deployment.
Run a Compiled Job
• DataStage Engine runs the processes on the available nodes [details later]
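Validation and runs are started from Director in this module, but they can also be scripted. The sketch below only builds an argument list for the `dsjob` command-line client and does not execute it. The options shown (`-run`, `-mode`, `-param`, `-jobstatus`) follow the `dsjob -run` syntax as understood here and should be checked against the installed version. The helper name, project name, job name, and parameter are hypothetical.

```python
ALLOWED_MODES = ("NORMAL", "RESET", "VALIDATE")


def dsjob_run_command(project, job, mode="NORMAL", params=None):
    """Build the argument list for a dsjob run or validate request."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    argv = ["dsjob", "-run", "-mode", mode]
    # Each job parameter is passed as a separate -param name=value pair.
    for name, value in (params or {}).items():
        argv += ["-param", f"{name}={value}"]
    # -jobstatus asks dsjob to wait and report the job's finishing status.
    argv += ["-jobstatus", project, job]
    return argv


# Hypothetical project and job names
validate_cmd = dsjob_run_command(
    "dstage1", "LoadCustomers", mode="VALIDATE", params={"RunDate": "2008-02-15"}
)
run_cmd = dsjob_run_command("dstage1", "LoadCustomers")
print(" ".join(validate_cmd))
```

Building the list separately from executing it keeps the example safe to run on a machine without DataStage installed; the list could later be handed to `subprocess.run` on a server where `dsjob` is available.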
Q&A