DataStage PPT

1. ETL (Extraction, Transformation, and Loading) is usually a batch process that handles large volumes of data from heterogeneous sources to load into data warehouses, marts, and analytical applications.
2. DataStage is an ETL tool that provides a graphical interface for designing data flows to extract, transform, and load data. It utilizes stages connected by links to represent these processes.
3. The Designer component allows creating and editing DataStage jobs, while the Director is used for scheduling, monitoring, and running jobs on the DataStage server.


ETL Basics

Extraction, Transformation & Load

Usually a batch process of large volumes of data.

Scenarios
- Load a warehouse, mart, analytical and reporting applications
- Application/Data Integration
- Load packaged applications, or external systems through their APIs or interface databases
- Data Migration

Extract
- Heterogeneous data sources: relational & non-relational databases; sequential flat files, complex flat files, COBOL files, VSAM data, XML data, etc.; packaged applications (e.g. SAP, Siebel, etc.)
- Incremental/changed data or complete/snapshot data
- Internal data or third-party data
- Push/Pull

Transform
- Cleansing & validation
  - Simple: range checks, duplicate checks, NULL value transforms, etc.
  - Specialized/Complex: name & address validations, de-duplication, etc.
- Computations (arithmetic, string, date, etc.)
- Pivot
- Split or Concatenate
- Aggregate
- Filter
- Join, look-up

Load
- Historical vs. Refresh load
- Incremental vs. Snapshot
- Bulk Loading vs. Record-level Loading
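The transform operations above are easiest to picture as per-row functions. As an illustration only (plain Python, not DataStage syntax; the column names quantity, unit_price, and region are invented for the example), a minimal sketch of a range check, a NULL value transform, and a derived column:

```python
# Illustrative only (plain Python, not DataStage syntax): a minimal sketch of the
# kind of row-level cleansing and computation an ETL transform step applies.
from datetime import date

def transform(row):
    """Validate and enrich one input record (a dict); return None to reject it."""
    if not 0 <= row["quantity"] <= 10000:                   # range check
        return None
    row["region"] = row.get("region") or "UNKNOWN"          # NULL value transform
    row["total"] = round(row["quantity"] * row["unit_price"], 2)  # simple computation
    row["load_date"] = date.today().isoformat()             # date function
    return row

rows = [{"quantity": 3, "unit_price": 9.99, "region": None},
        {"quantity": -1, "unit_price": 5.00, "region": "EU"}]
cleansed = [r for r in (transform(dict(r)) for r in rows) if r is not None]
print(cleansed)
```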

ETL Platform Options


- Database features including SQL, stored procedures, etc.: Oracle, Teradata, etc.
- Code-based custom scripts: PL/SQL, COBOL, Pro*C, etc.
- Engine-based products: IBM/Ascential DataStage, Informatica PowerCenter, Ab Initio

Usual features provided by ETL tools:

- Graphical data flow definition interfaces for easy development
- Native & ODBC connectivity to standard databases, packages, etc.
- Metadata maintenance components
- Metadata import & export from standard databases, packages, etc.
- Inbuilt standard functions & transformations, e.g. date, aggregate, sort, etc.
- Options for sharing or reusing developed components
- Facility to call external routines or write custom code for complex requirements
- Batch definition to handle dependencies between data flows to create the application
- ETL engines that handle the data manipulation without depending on the database engines
- Run-time support for monitoring the data flow and reading message logs
- Scheduling options

Architecture of a Typical ETL Tool

[Architecture diagram: an ETL Engine sits between the source and target databases, moving data between them and exchanging metadata with an ETL Metadata Repository.]

GUI-Based Development Environment
- Metadata definition/import/export
- Data flow & transformation definition
- Batch definition
- Test & debug
- Schedule

Run-time Environment
- Trigger ETL
- Monitor flow
- View logs

Optional additional functions


- Cleansing capability: name & address cleansing, de-duplication
- Data Profiling
- Metadata Management
- Run Audit
- Pre-built templates
- Additional adaptors for interfacing with third-party products, models & protocols

DataStage Components

Server Components
- DataStage Server
- Repository
- DataStage Package Installer

Client Components
- DataStage Designer
- DataStage Director
- DataStage Administrator
- DataStage Manager

[Component diagram: the Server engine moves data from sources to targets; the Repository holds the ETL metadata, maintained in an internal format.]

Director
- Execute Jobs
- Monitor Jobs, view job logs

Manager
- Manage the Repository
- Create custom routines & transforms
- Import & Export component definitions

Designer
- Assemble Jobs
- Debug
- Compile Jobs
- Execute Jobs

DataStage Server Components


DataStage Server:
- Available for: Windows NT, 2000, Server 2003; IBM AIX; HP Compaq Tru64; HP HP-UX; Red Hat Enterprise Linux AS; Sun Solaris
- The server runs the executables and manages the data

Repository:
- Contains all the metadata, mapping rules, etc.
- DataStage applications are organized into Projects; each server can handle multiple projects
- The DataStage repository is maintained in an internal format, not in a database

Package Installer

Note: DataStage uses OS-level security; only the root/admin user can administer the server.

DataStage Client Components


- Windows-based components
- Need to access the server at development time as well
- Designer: used to create DataStage jobs, which are compiled to create the executables
- Director: validate, schedule, run, and monitor jobs
- Manager: view and edit the contents of the Repository
- Administrator: set up users, create and move projects, and set up purging criteria
- Designer, Director & Manager can connect to one Project at a time

Most DataStage configuration tasks are carried out using the DataStage Administrator, a client program provided with DataStage. To access the DataStage Administrator:
1. From the Ascential DataStage program folder, choose DataStage Administrator.
2. Log on to the server. If you do so as an Administrator (for Windows NT servers), or as dsadm (for UNIX servers), you have unlimited administrative rights; otherwise your rights are restricted as described in the previous section.
3. The DataStage Administration window appears. The General page lets you set server-wide properties. It is enabled only when at least one project exists. The controls and buttons on this page are enabled only if you logged on as an administrator.

The DataStage Manager is:
1. Used to store and manage re-usable metadata for the jobs.
2. Used to import and export components from the file system to DataStage projects.
3. The primary interface to the DataStage Repository.
4. Also where custom routines and transforms can be created.

The DataStage Director is the client component that validates, runs, schedules, and monitors jobs run by the DataStage Server. It is the starting point for most of the tasks a DataStage operator needs to do in respect of DataStage jobs.

Job Category Pane

Menu Bar

Toolbar

Status Bar

Display Area

The display area is the main part of the DataStage Director window. There are three views:

Job Status - The default view, which appears in the right pane of the DataStage Director window. It displays the status of all jobs in the category currently selected in the job category tree. If you hide the job category pane, the Job Status view includes a Category column and displays the status of all server jobs in the current project, regardless of their category.

Job Schedule - Displays a summary of scheduled jobs and batches in the currently selected job category. If the job category pane is hidden, the display area shows all scheduled jobs and batches, regardless of their category.

Job Log - Displays the log file for a job chosen from the Job Status view or the Job Schedule view.

DataStage Designer is used to:
- Create DataStage jobs that are compiled into executable programs
- Design the jobs that extract, integrate, aggregate, load, and transform the data
- Create and reuse metadata and job components
It allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating, and loading data.

Use Designer to:
- Specify how data is extracted
- Specify data transformations
- Decode data going into the target tables using reference lookups
- Aggregate data
- Split data into multiple outputs on the basis of defined constraints

The Designer graphical interface lets you select Stage icons, drop them onto the Designer work area, and add links. Then, still working in the Designer, you define the required actions and processes for each stage and link. A job created with the Designer is easily scalable. This means that you can easily create a simple job, get it working, then insert further processing, additional data sources, and so on.

1.Enter the name of your host in the Host system field. This is the name of the system where the DataStage server components are installed. 2. Enter your user name in the User name field. This is your user name on the server system. 3. Enter your password in the Password field. 4. Choose the project to connect to from the Project list. This list box displays all the projects installed on your DataStage server. At this point, you may only have one project installed on your system and this is displayed by default. 5. Select the Save settings check box to save your logon settings

The DataStage Designer window consists of the following parts:
- One or more Job windows where you design your jobs
- The Property Browser window where you view the properties of the selected job
- The Repository window where you view components in a project
- A Toolbar from where you select Designer functions
- A Tool Palette from which you select job components
- A Debug Toolbar from where you select debug functions
- A Status Bar which displays one-line help for the window components, and information on the current state of job operations, for example, compilation
For full information about the Designer window, including the functions of the pull-down and shortcut menus, refer to the DataStage Designer Guide.

STAGES IN DATASTAGE

FILE: SEQUENTIAL FILE, DATA SET

PROCESSING: TRANSFORMER, COPY, FILTER, SORT, AGGREGATOR, FUNNEL, REMOVE DUPLICATES, JOIN, LOOKUP, MERGE, MODIFY

DATABASE: NETEZZA, TERADATA, ORACLE

Learn how to:
Create an Enterprise Edition Job that generates data, and take a look at some of that data.
- Create a Job
- Select and position stages
- Connect stages with links
- Import a schema
- Set stage options
- Save, Compile & Run a Job
- View and Delete Job Log

Stages used:
- Row Generator
- Peek

To create a new job: Select File > New and select Parallel, OR click the New Program icon on the toolbar.

Creating a New Job

Create the following flow:


- Select the Row Generator stage, drag it onto the Parallel Canvas, and drop it
- Select the Peek stage, drag it onto the Parallel Canvas, and drop it
- Right-click on the Row Generator stage and drag a Link onto Peek

Does your flow look like the one above?

Importing a Schema

1. Select to import the schema.
2. Enter the appropriate path and file name (the instructor will provide details).
3. You can also use the File Browser.

Importing a Schema

Make sure you put it into the right category; it should reflect your userid.

Click on Next/Import/Finish to import.



Importing a Schema - End Goal

Did everything go smoothly?


After clicking on Finish, select the imported schema

This lets you select which columns you want to bring in. We want all of the columns! Click OK.


Column Properties

Row Generator specific options: Here you can select specific properties for the data you are going to generate.

- Double-click here to access additional options
- Click on Next> to step through column properties
- Click on Close when done

Final Touches

Your job should look like this. However, the eye should not wink. Notice the new icon on the link, indicating the presence of metadata.

Next:
- Click on the Compile icon
- Save the job (Lab2) under your own Category
Did it compile successfully?

Ready to Run

Action: Click to Run. Click for Log.

Running Job

After you click, select Run. Click for Log.

Clearing the Job Log

Tips:
- Clear away unnecessary Job Logs
- Use the <Copy> button and paste as text into any editor of your choice

Objectives

Learn how to:
Modify the simple data-generating program to sort the data and save it.
- Create a copy of a Job
- Edit an existing Job
- Create a Dataset
- Handle Errors
- View a Dataset

New stage used:
- Sort

Create a Copy of a Job


If necessary, open the Job created in Lab 2.
1. Access the stage properties for the Peek stage
2. Select the Input tab
3. Override the default Partition type from (Auto) to Hash
Click here to specify Sort Insertion.
Next: Click OK. What happens?

Insert a Sort
Let's sort by birth_date:
- Select the birth_date column from the Available list
- Once selected, you should see birth_date listed under Selected

Food for Thought: Why Hash partitioning type?
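Food for thought, sketched out: hash partitioning on the sort key sends every row with the same key value to the same partition, so a sort performed independently inside each partition still keeps equal keys together. A minimal Python sketch of that idea (illustrative only, not DataStage internals; the column names follow the lab's generated schema):

```python
# Illustrative only (plain Python, not DataStage): hash partitioning on the sort key
# followed by a sort inside each partition, which is roughly what choosing Hash
# partitioning plus sort insertion gives you.
from zlib import crc32

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        p = crc32(str(row[key]).encode()) % num_partitions
        partitions[p].append(row)
    return partitions

rows = [{"name": "John Parker", "birth_date": "1979-04-24"},
        {"name": "Susan Calvin", "birth_date": "1967-12-24"},
        {"name": "Ann Claybourne", "birth_date": "1960-10-29"}]

for part in hash_partition(rows, key="birth_date", num_partitions=4):
    part.sort(key=lambda r: r["birth_date"])   # sort runs independently per partition
    print(part)
```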


Sort Insertion

Are your results sorted on birth_date?

Note the new icon appears on the link, denoting the presence of a sort.
Select Save As from the File menu and save the Job (Lab3).

Choose one of these to compile and run your job.

Let's Stage the Data

We'll now save the output of the sort for later use. Now attach a Dataset stage to the program by:
- Placing a Dataset stage on the Canvas
- Right-clicking on Peek and drawing a Link over to the Dataset stage

Your Job should now look like this:

Viewing a Dataset
Right-click on the Dataset stage and select View DSLinkX data (note: link names may vary). Click OK to bring up the Data Browser:

Objectives
Use the Lookup stage to replace state codes with state names

Learn how to:
- Use the Lookup operator
- Start thinking about partitioning

New operators used:
- Lookup
- Entire partitioner

Remember the Records in Lab 2?


They look like this:

John Parker M 1979-04-24 MA 0 1 0 0
Susan Calvin F 1967-12-24 IL 0 1 1 1
William Mandella M 1962-04-07 CA 0 1 2 2
Ann Claybourne F 1960-10-29 FL 0 1 3 3
Frank Chalmers M 1969-12-10 NY 0 1 4 4
Jane Studdock F 1962-02-24 TX 0 1 5 5

One of the fields is a two-character state code. Let's expand it out into a full state name.

The State Table


We have a table that maps state codes to state names:

Alabama	AL
Alaska	AK
American Samoa	AS
Arizona	AZ
Arkansas	AR
California	CA
Colorado	CO
Connecticut	CT
Delaware	DE
District of Columbia	DC
[...]

It is a Unix text file with a tab after the full state name. We imported this file in Lab 5a.

We'll use that table to tack on the expanded state name to the rows generated in Lab 2.
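Conceptually, the Lookup stage behaves like an in-memory map from the key column to the reference columns. A minimal Python sketch of the same idea (illustrative only, not DataStage; in the lab the table comes from the tab-delimited states.txt file, so a few rows are inlined here just to make the sketch self-contained):

```python
# Illustrative only (plain Python, not DataStage). Build a lookup table from
# tab-delimited "state name <TAB> code" rows and append the full state name to each row.
STATE_FILE_CONTENT = "Massachusetts\tMA\nNew York\tNY\nColorado\tCO\n"

def load_state_table(text):
    table = {}
    for line in text.splitlines():
        name, code = line.split("\t")
        table[code] = name                 # key: two-character state code
    return table

def lookup_state(rows, table):
    for row in rows:
        # Unmatched source rows could be routed to a reject link instead
        row["state_name"] = table.get(row["state"], "UNKNOWN")
        yield row

states = load_state_table(STATE_FILE_CONTENT)
source = [{"name": "John Parker", "state": "MA"},
          {"name": "Frank Chalmers", "state": "NY"}]
for out in lookup_state(source, states):
    print(out)
```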


What We're Going To Build...

- Uses the states.txt file as the lookup table
- Has a TAB delimiter between the state_name & state columns
- Use the state column as the lookup key
- Note that the source data has a column called state while the lookup table has state_code
- Use the same schema as Lab 2
- Generate 100 rows

Reminder: Don't forget to perform column mapping (see next slide).

Lookup Mapping


What You Should See...

Sample Output (make sure the state names match the state codes):

Peek,0: John Parker M 1979-04-24 0087228.46 MA 0 1 0 0 Massachusetts
Peek,0: Frank Chalmers M 1969-12-10 0004881.94 NY 0 1 4 4 New York
Peek,0: John Boone M 1964-04-16 0042729.03 CO 0 1 8 8 Colorado
Peek,0: Frank Sinatra M 1984-06-12 0082552.55 OH 0 1 12 12 Ohio
Peek,0: John Calvin M 1961-11-30 0025966.39 FL 0 1 16 16 Florida
Peek,0: Frank Studdock M 1962-10-29 0022976.45 KY 0 1 20 20 Kentucky
Peek,0: John Sarandon M 1964-06-03 0005305.48 MI 0 1 24 24 Michigan
Peek,0: Frank Austin M 1971-01-21 0098979.80 CA 0 1 28 28 California
Peek,0: John Mandella M 1981-06-16 0023340.92 NJ 0 1 32 32 New Jersey
Peek,0: Frank Glass M 1983-04-15 0068974.57 SD 0 1 36 36 South Dakota

Objectives

Learn how to:
- Use the Join stage to find out which products the customer purchased
- Use an InnerJoin

New stages used:
- Join
- Remdup
- Hash partitioner

Background - What We Have

Customers of ACME Hardware place orders for products. We have two simple tables to model this:

- customer_order table: tells us which orders were placed by each customer (columns: customer, order)
- order_product table: tells us how many of each product are in an order (columns: order, product, quantity)

[Sample rows shown on the original slide: customer numbers run 1-4, order numbers run 1000-1005, and the products include screws, nuts, bolts, nails, and washers with quantities such as 137, 200, 145, ...]

Note the data types involved. Use Integer and Varchar types where appropriate when defining the table definitions.

Background - What We Want


Q: Which products have been ordered by each customer?
A: Customer 1 has ordered washers, bolts, screws, and ...

Go ahead and assemble this flow, but do so in a more optimized manner (see next slide). Save it as Lab8a and a copy for Lab 9. Use cust_order.txt & order_prod_plus.txt as input files. See the previous slide for the file layouts (whitespace-delimited fields). Note: column ordering matters! Make sure you get the column data types correct also.

Job Optimizations

These two jobs are equivalent!

Notice the different partitioner/collector icons. Can you visually determine how the data is being handled?


We want to join the tables using the order number as the join key. This means the tables need to be hashed and sorted on the order number. The resulting table will have one record for each row of the order_product table, with the customer field added. Make sure you use order_prod_plus.txt as your input. If we then sort these records on customer and product and remove duplicated customer/product combinations, we have our answer; 234 records should be written out. A sketch of this logic follows.
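To make the data flow concrete, here is the same join-then-deduplicate logic as a minimal Python sketch (illustrative only, not DataStage; the in-line sample rows are invented, whereas the lab reads them from the input text files):

```python
# Illustrative only (plain Python, not DataStage): inner join on the order number,
# then sort on customer/product and drop duplicate customer/product combinations.
customer_order = [(1, 1000), (1, 1001), (2, 1002)]             # (customer, order)
order_product = [(1000, "screws", 137), (1001, "nuts", 200),
                 (1001, "screws", 135), (1002, "bolts", 145)]  # (order, product, quantity)

# Inner join: index customer_order by the join key (order number)
order_to_customer = {order: cust for cust, order in customer_order}
joined = [(order_to_customer[order], order, product, qty)
          for order, product, qty in order_product
          if order in order_to_customer]

# Sort on customer and product, then keep one row per customer/product combination
joined.sort(key=lambda r: (r[0], r[2]))
seen, result = set(), []
for cust, _order, product, _qty in joined:
    if (cust, product) not in seen:
        seen.add((cust, product))
        result.append((cust, product))
print(result)   # which products each customer has on order
```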

Using Lookup and Merge

This is what your flows should look like,

Using Lookup:

Note the "PhantomOrders" links leading to "Customerless" files


This is what your flows should look like,

Using Merge:

Order Matters!
Remember:
- Lookup captures unmatched Source rows, on the Primary link
- Merge captures unmatched Update rows, on the Secondary link(s)
(A short sketch of this difference follows the tip below.)

Tip:
Always check the Link Ordering Tab in the Stage page
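The reject-handling difference is easier to see in miniature. A Python sketch (illustrative only, not DataStage; the rows are invented) of where unmatched rows end up in each case:

```python
# Illustrative only (plain Python). Lookup can capture unmatched *source* (primary)
# rows on its reject link; Merge can capture unmatched *update* (secondary) rows on
# its reject link(s).
source = [{"order": 1000}, {"order": 1001}]                    # primary / master input
reference = {1001: {"customer": 1}, 1002: {"customer": 7}}     # lookup / update rows by order

# Lookup-style: source rows with no reference match go to the reject output
lookup_out, lookup_rejects = [], []
for row in source:
    match = reference.get(row["order"])
    if match:
        lookup_out.append({**row, **match})
    else:
        lookup_rejects.append(row)

# Merge-style: update rows never matched by a master row go to the reject output
matched_keys = {row["order"] for row in source}
merge_rejects = [{"order": k, **v} for k, v in reference.items() if k not in matched_keys]

print(lookup_rejects)   # [{'order': 1000}]                -- no reference match
print(merge_rejects)    # [{'order': 1002, 'customer': 7}] -- no master match
```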


New Results from Lookup and Merge


Outputs: Lookup and Merge should yield outputs with 234 rows, just as InnerJoin did.

Rejects: Lookup and Merge populate the "Customerless" file with the following two rows:
"1000","gaskets","28"
"1000","widgets","14"
You caught ACME Hardware red-handed: they tried to boost their stock by reporting a phantom order of 28 gaskets and 14 widgets!

Objectives

Use the Aggregator stage to see how many of each product each customer has on order.

New stages used:
- Aggregator

The Aggregator stage is a processing stage. It classifies data rows from a single input link into groups and computes totals or other aggregate functions for each group.

Our InnerJoin Job Was A Bit Incomplete...


We did almost enough work in the InnerJoin lab (Lab 8a) to find out how many of each product each customer has on order. Now that we know about the Aggregator stage, we can finish the job.
- Go back to the version of Lab 8a
- Remove the implicit Remdup (sort unique)
- Insert an Aggregator

Aggregator Options

- Method: sort (we could have a lot of customer/product groups)
- Grouping keys: customer and product
- Column for Calculation: quantity
- Function to apply: Sum
- Output Column - Name of result column: quantity
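The sort-method aggregation amounts to grouping sorted rows and summing within each group. A Python sketch of the same calculation (illustrative only, not DataStage; the sample rows are invented):

```python
# Illustrative only (plain Python): sort-method aggregation -- sort by the grouping
# keys, then sum the quantity within each (customer, product) group.
from itertools import groupby
from operator import itemgetter

rows = [
    {"customer": 1, "product": "screws", "quantity": 137},
    {"customer": 1, "product": "screws", "quantity": 135},
    {"customer": 2, "product": "nuts",   "quantity": 200},
]

rows.sort(key=itemgetter("customer", "product"))   # the sort method needs sorted input
aggregated = [
    {"customer": cust, "product": prod, "quantity": sum(r["quantity"] for r in group)}
    for (cust, prod), group in groupby(rows, key=itemgetter("customer", "product"))
]
print(aggregated)
```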


What You Should Have...


Your Job should look like this. Compile and Run your Job.


Modify Stage

The Modify stage is a processing stage. It can have a single input link and a single output link. The modify stage alters the record schema of its input data set. The modified data set is then output. You can drop or keep columns from the schema, or change the type of a column.

Dropping and Keeping Columns

The following example takes a data set comprising the following columns: CUSTID, NAME, ADDRESS, CITY, STATE, ZIP, AREA, PHONE, REPID, CREDITLIMIT, and COMMENTS.

The modify stage is used to drop the REPID, CREDITLIMIT, and COMMENTS columns. To do this, the stage properties are set as follows:
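The drop specification itself is not reproduced in this extract; judging from the keep alternative quoted below, it would read along the lines of:
DROP REPID, CREDITLIMIT, COMMENTS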

You could achieve the same effect by specifying which columns to keep, rather than which ones to drop. In the case of this example the required specification to use in the stage properties would be:
KEEP CUSTID, NAME, ADDRESS, CITY, STATE, ZIP, AREA, PHONE

Changing Data Type

You could also change the data types of one or more of the columns from the above example. Say you wanted to convert CUSTID from decimal to string; you would specify a new column to take the converted data, and specify the conversion in the stage properties:
conv_CUSTID:string = string_from_decimal(CUSTID)

Copy Stage
The Copy stage is a processing stage. It can have a single input link and any number of output links. The Copy stage copies a single input data set to a number of output data sets. Each record of the input data set is copied to every output data set. Records can be copied without modification, or you can drop or change the order of columns.

The Copy stage properties are fairly simple. The only property is Force, and we do not need to set it in this instance as we are copying to multiple data sets (and DataStage will not attempt to optimize it out of the job). We need to concentrate on telling DataStage which columns to drop on each output link. The easiest way to do this is using the Outputs page Mapping tab. When you open this for a link the left pane shows the input columns, simply drag the columns you want to preserve across to the right pane. We repeat this for each link as follows:
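The mapping screenshots are not reproduced in this text, but the idea is simple: every record goes to every output link, and each link keeps only the columns mapped onto it. A minimal Python sketch (illustrative only, not DataStage; the link and column names are invented):

```python
# Illustrative only (plain Python): copy every input record to every output link,
# keeping only the columns mapped onto each link.
output_columns = {"link1": ["id", "name"], "link2": ["id", "city"]}

rows = [{"id": 1, "name": "Ann", "city": "Boston"},
        {"id": 2, "name": "John", "city": "Denver"}]

outputs = {link: [{col: row[col] for col in cols} for row in rows]
           for link, cols in output_columns.items()}
print(outputs)
```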

Funnel Stage
The Funnel stage is a processing stage. It copies multiple input data sets to a single output data set. This operation is useful for combining separate data sets into a single large data set. The stage can have any number of input links and a single output link.

The continuous funnel method is selected on the Stage page Properties tab of the Funnel stage:

The continuous funnel method does not attempt to impose any order on the data it is processing. It simply writes rows as they become available on the input links. In our example the stage has written a row from each input link in turn. A sample of the final, funneled, data is as follows:
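The funneled sample itself is not reproduced in this extract, but the behaviour is easy to sketch. Illustrative only (plain Python, not DataStage): a continuous funnel writes rows as they become available, so here the two inputs are simply interleaved, one row from each in turn:

```python
# Illustrative only (plain Python): combine two inputs into a single output stream,
# taking a row from each input in turn and imposing no overall ordering.
from itertools import zip_longest

input_a = [{"id": 1}, {"id": 3}]
input_b = [{"id": 2}, {"id": 4}, {"id": 5}]

_SKIP = object()
funneled = [row for pair in zip_longest(input_a, input_b, fillvalue=_SKIP)
            for row in pair if row is not _SKIP]
print(funneled)
```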

Filter Stage
The Filter stage is a processing stage. It can have a single input link, any number of output links, and, optionally, a single reject link. The Filter stage transfers, unmodified, the records of the input data set which satisfy the specified requirements and filters out all other records. You can specify different requirements to route rows down different output links. The filtered-out records can be routed to a reject link, if required.

Specifying the Filter

The operation of the Filter stage is governed by the expressions you set in the Where property on the Properties tab. You can use the following elements to specify the expressions:
- Input columns
- Requirements involving the contents of the input columns
- Optional constants to be used in comparisons
- The Boolean operators AND and OR to combine requirements
When a record meets the requirements, it is written unchanged to the specified output link. The Where property supports standard SQL expressions, except when comparing strings.
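As a rough picture of what the Where expressions do, here is the routing logic in miniature as plain Python (illustrative only; the column names and predicates are invented, the real stage evaluates SQL-like Where strings rather than Python functions, and this sketch routes each row to the first matching output only, whereas the stage can also write a row to every output whose Where clause it satisfies):

```python
# Illustrative only (plain Python): route each input row to the first output whose
# predicate it satisfies, and send everything else to an optional reject output.
filters = [
    ("big_orders", lambda r: r["quantity"] > 300),              # like: WHERE quantity > 300
    ("west_coast", lambda r: r["state"] in ("CA", "OR", "WA")),  # like: WHERE state = 'CA' OR ...
]

outputs = {name: [] for name, _ in filters}
rejects = []

rows = [{"state": "CA", "quantity": 120},
        {"state": "NY", "quantity": 527},
        {"state": "TX", "quantity": 45}]

for row in rows:
    for name, predicate in filters:
        if predicate(row):
            outputs[name].append(row)   # written unchanged to the matching output link
            break
    else:
        rejects.append(row)             # filtered-out records go to the reject link

print(outputs, rejects)
```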
