Syllabus of DataStage Course

This document outlines the modules in a DataStage course. The course covers data warehousing concepts, ETL processes, working with the DataStage Designer and Director, job types, file handling stages, data transformation techniques, database connectivity, deployment, monitoring, performance tuning, and best practices. The modules provide instruction on DataStage architecture, installation, job design, parallel processing, exception handling, and deployment to an Information Server.

Module 1: Introduction to Data Warehouse Concepts

• What is a Data Warehouse?
• Data Mart
• OLAP vs. OLTP
• Data Warehouse Architecture
• What is Data Modelling?
• Exploring Dimensional Modelling
• Exploring the Star Schema
• Explanation of the Snowflake Schema
• Understanding Dimensions
• Understanding Facts
• Slowly Changing Dimensions
• Lifecycle of a Data Warehouse

Module 2: Understanding ETL (Extraction, Transformation, Load)

• Overview of ETL
• Features and benefits for business
• Different SCD Types
• ETL tools in the market
• Explanation of staging tables
• Explanation of Transformation
• Loading data into tables at different stages

Module 3: Overview of InfoSphere DataStage

• What is InfoSphere DataStage?
• Architecture of DataStage
• Explanation of Topologies
• Components in DataStage
• Runtime Architecture
• OSH Script and Execution Flow

Module 4: Installation and Configuration of InfoSphere DataStage

• Prerequisites for InfoSphere DataStage
• Install InfoSphere DataStage
• Verify Installation
• Set up Environment Variables
• Create / Update / Delete projects
• User creation and granting permissions

Module 5: Working with DataStage Designer

• Overview of Designer
• Exploring DataStage Designer
• High-level overview of commonly used Stages
• Schema
• Pipeline Parallelism
• Manipulate the configuration file
• Repository Palette
• Passive and Active stages
• Annotations and creating jobs
• Import and Export Metadata
• Dataset Management
• Partitioning techniques

Module 6: DataStage Jobs

• Overview of Job types
• Explanation of Sequence and Parallel Jobs
• Explanation of Server Jobs
• Different stages
• Understanding Containers

Module 7: DataStage Director

• Introduction to DataStage Director
• Director User Interface
• Job status and view
• Compiling Single and Multiple jobs
• Run, Reset and Restart jobs
• Scheduling Batches
• Performance monitor

Module 8: Creating Parallel Jobs

• Overview of Parallel Jobs
• Design a Parallel Job using Designer
• Pipeline Parallelism
• Partition Parallelism
• NLS Mode Work
• Maps in Parallel Jobs
• Run Parallel Jobs

Module 9: Handling Files

• Introduction to file handling
• Sequential File and Complex Flat File stages
• Huge File Manipulation
• Error and Invalid Record Rejection

Module 10: File Stages

• File Stages
• Sequential File stage
• Explanation of Data Sets
• Complex Flat File stage
• Create jobs to read and write sequential files
• Reading multiple files using file patterns
• Null handling in File Stages
• Lookup File Set

Module 11: Combining and Partitioning Data

• Overview of data processing for combining and partitioning
• Combine data using the Lookup stage
• Combine data using the Merge stage
• Combine data using the Join stage
• Combine data using the Funnel stage

Module 12: Sorting and Aggregating Data

• Sort data using in-stage sorts and the Sort stage
• Data segregation using the Aggregator stage
• Unique data using the Remove Duplicates stage

Module 13: Data Transformation

• Understanding DataStage internal logical messages
• Column Generator and Row Generator
• Transform messages from one format to another
• Filter data based on business criteria
• Control data flow based on data conditions
• Cover real-time scenarios using different Processing Stages
• Route creation

Module 14: Working with Relational Databases

• Understanding Database Stages
• Database Metadata
• Explanation of the ODBC Connection
• Import Table Definitions
• Use Connector stages in a job
• Define SQL statements using the SQL Builder
• ODBC Connector
• Oracle Connector
• Parallel Job with Connector

Module 15: Advanced Parallel Jobs

• Overview of Type 1 and Type 2 processes
• Range lookup process
• Job Performance analysis
• Performance tuning

Module 16: Job Sequence

• Job activities in Sequencer
• Sequence Triggers
• Notification and Terminator activities
• Start Loop and End Loop activities
• Error and Exception handling activities

Module 17: Working with Data Cleansing

• Overview of Cleansing
• Explanation of the Standardization Workflow
• Create and Configure a Standardize Stage Job
• Explanation of Rule Sets
• Managing Rule Sets

Module 18: Exception Handling in DataStage

• Introduction to Exception Handling
• How to design a job to link with Exceptions
• Explanation of the Exception stage
• One-source and Two-source Match Exception Stage
• Route exceptions to the Exception Stage

Module 19: Deployment in InfoSphere DataStage

• Introduction to InfoSphere Information Server Manager
• Explanation of the Deployment life cycle
• Adding a Domain in Information Server Manager
• View job and asset properties
• Explanation of Deployment Packages
• Deployment Workflow
• Define a Deployment Package
• Set up the Deploy Path
• Deploy a Package
• Import and Export Assets
• Explanation of the various types of Source Control for DataStage

Module 20: Working with Monitoring Jobs

• Introduction to Monitoring Jobs
• Explanation of the Operations Console
• How to monitor jobs using the console

Module 21: Job Performance Tuning

• Understanding activities that impact performance
• Design Jobs for Optimal Performance
• Design flows that minimize CPU and memory usage
• Explanation of Buffering
• Deadlock prevention

Module 22: Best Practices for DataStage and Data Loading
