DataStage Interview Questions

Difference between Informatica and Datastage

Both DataStage and Informatica are powerful ETL tools. Both tools do almost exactly the same thing in almost exactly the same way. Performance, maintainability, and learning curve are all similar and comparable. Below are a few points worth highlighting about both tools.

Multiple Partitions

Informatica offers dynamic partitioning, which is applied by default at the workflow level rather than at every stage/object level in a mapping/job. Informatica offers other partitioning choices at the workflow level as well.

DataStage's pipeline partitioning splits data into multiple partitions that are processed in parallel and then re-collected. DataStage lets you control partitioning in the job design based on the logic of the processing instead of defaulting the whole pipeline flow to one partition type. DataStage offers 7 different types of multi-processing partitions.
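To make the idea concrete, here is a minimal Python sketch of key-based (hash) partitioning, one of the partitioning methods DataStage offers. This is only an illustration of the concept, not DataStage code:

# Hash partitioning: rows with the same key always land in the same
# partition, so key-based operations can run independently per partition.
def hash_partition(rows, key, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

rows = [{"dept": 10, "name": "A"}, {"dept": 20, "name": "P"}, {"dept": 10, "name": "B"}]
for i, part in enumerate(hash_partition(rows, "dept", 2)):
    print("partition", i, part)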

User Interface
Informatica offers access to the development and monitoring effort through its 4 GUIs: PowerCenter Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
DataStage caters to developing and monitoring its jobs through 3 GUIs: IBM DataStage Designer (for development), Job Sequence Designer (for workflow design), and Director (for monitoring).

Version Control
Informatica offers built-in version control through its repository server, managed with the “Repository Manager” GUI console. A work-in-progress mapping cannot be opened until it is saved and checked back into the repository. Version control is done using check-in and check-out.

In DataStage, version control was offered as a component until Ascential DataStage 7.5.x. Ascential was acquired by IBM, and when DataStage was integrated into IBM Information Server (with DataStage at version 8.0.1), support for version control as a component was discontinued.

Repository based flow


Informatica offers a step-by-step approach to creating a data integration solution. Each object created while mapping a source to a target is saved into the repository project folder, categorized as Sources, Targets, Transformations, Mappings, Mapplets, User-defined functions, Business Components, Cubes, and Dimensions. Each object can be shared and dropped into a mapping across cross-functional development teams, which increases re-usability. Projects are folder-based, and their contents are viewable across projects.
DataStage offers a project-based integration solution; projects are not viewable across one another, and every project needs role-based access. The step-by-step effort of mapping a source to a target is captured in a job. To share objects within a job, separate objects called containers (local or shared) need to be created.

Data Encryption
Informatica has an offering within PowerCenter Designer as a separate transformation called the “Data Masking Transformation”.

Data masking or encryption needs to be done before data reaches the DataStage server.


Variety of Transformations
Informatica offers about 30 general transformations for processing incoming data.

DataStage offers about 40 data-transforming stages/objects. DataStage has the more powerful transformation engine, using functions (Oconv and Iconv) and routines; almost any transformation can be done.
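For readers unfamiliar with them: Iconv converts an external value into DataStage's internal representation, and Oconv converts it back out. A rough Python analogue of a date reformat done with Oconv(Iconv(...)) (the format strings below are Python's, not DataStage conversion codes):

from datetime import datetime

# Parse an external date into an internal value (like Iconv),
# then format it back out in a different layout (like Oconv).
internal = datetime.strptime("2016-04-15", "%Y-%m-%d")
external = internal.strftime("%m/%d/%Y")
print(external)  # 04/15/2016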

Source-to-target flow


Within Informatica’s PowerCenter Designer, first a source definition is created using the “Source Analyzer”, which imports the metadata; then a target definition is created using the “Target Designer”; then a transformation is created using the “Transformation Developer”; and finally the source, transformation, and target are mapped together using the “Mapping Designer”.

DataStage lets you drag and drop functionality, i.e. a stage, within one canvas area for a pipeline source-target job. Within the “DataStage Designer”, both source and target metadata need to be imported, after which you proceed with the variety of stages offered: database stages, transformation stages, etc.

The biggest difference between the two vendors' offerings in this area is that Informatica forces you to be organized through a step-by-step design process, while DataStage leaves the organization to you as a choice and gives you flexibility in dragging and dropping objects based on the logic flow.

Checking Dependencies
Informatica offers a separate Advanced Edition that helps with data lineage and impact analysis; you can go to individual targets and sources and check all the dependencies on them.
DataStage offers this through Designer: right-click on a job to perform a dependency or impact analysis.

Components Used

The Informatica ETL transformations are very specific-purpose, so you tend to need more boxes on the page to do the same thing. For example, a simple transform in Informatica would have a Source Table, Source Qualifier, Lookup, Router, 2 Update Strategies, and 2 Target Tables (9 boxes).

In DataStage, you would have a Table and a Hashed File for the lookup, plus a source Relational stage, a Transformer stage, and 2 links to a target Relational stage (5 boxes). The visual clutter in Informatica is a bit annoying.

Type of link

To link two components in Informatica, you have to link at the column level: each and every column must be connected between the two components.

In DataStage, you link at the component level and then map individual columns. This allows you to have coding templates that are already linked up; just add columns. I find this a big advantage in DataStage.

Reusability
Informatica offers ease of re-usability through Mapplets and Worklets for re-using mappings and workflows. This really improves productivity.


DataStage offers re-usability of a job through containers (local and shared). To re-use a Job Sequence (workflow), you need to make a copy, compile, and run it.

Code Generation and Compilation


Informatica’s thrust is auto-generated code: a mapping created by dropping a source-transformation-target does not need to be compiled.
DataStage requires a job to be compiled in order to run it successfully.

Heterogeneous Sources
In Informatica you can use both heterogeneous and homogeneous sources.
DataStage does not perform very well with heterogeneous sources; you might end up extracting data from all the sources, putting it into a hash file, and then starting your transformation.

Slowly Changing Dimension

Informatica supports Full History, Recent Values, and Current & Previous Values through its SCD wizards.

DataStage supports slowly changing dimensions only through custom scripts and does not have a wizard for this.

Dynamic Lookup Cache

Informatica's marvellous Dynamic Lookup Cache has no equivalent in DataStage Server Edition. It saves some effort and is very easily maintainable.
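For context, a dynamic lookup cache updates itself as rows pass through, so a key inserted earlier in the same run is found by later rows. A minimal Python sketch of the idea (an illustration of the behavior, not Informatica's implementation):

# The cache is updated while rows are processed, so duplicate keys
# within the same run are detected without re-querying the target.
cache = {}
inserts, updates = [], []
for row in [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]:
    if row["id"] in cache:
        updates.append(row)        # key already cached: route to update
    else:
        cache[row["id"]] = row     # new key: cache it and route to insert
        inserts.append(row)
print(len(inserts), "inserts,", len(updates), "updates")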

http://shortcut-tricks.blogspot.com/2016/04/difference-between-informatica-and.html

DataStage Scenario-Based Questions and Answers for Freshers and Experienced

1. Create a job to load the first 3 records from a flat file into a target table?

2. Create a job to load the last 3 records from a flat file into a target table?

3. Create a job to load the first record from a flat file into table A, the last record into table B, and the remaining records into table C?
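These three come down to head/tail or row-number logic (in DataStage, typically the Head/Tail stages or a Transformer with a row counter). A minimal Python sketch of the selection, using a list as a stand-in for the flat file:

records = ["r1", "r2", "r3", "r4", "r5", "r6"]   # stand-in for flat-file rows
first_three = records[:3]       # question 1
last_three  = records[-3:]      # question 2
table_a = records[:1]           # question 3: first record
table_b = records[-1:]          # question 3: last record
table_c = records[1:-1]         # question 3: everything in between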

4. Consider the following products data which contain duplicate records.

A
B
C
C
B
D
B


Answer the questions below.

Q1. Create a job to load all unique products into one table and the duplicate rows into another table.

The first table should contain the following output

A
D

The second target should contain the following output

B
B
B
C
C

Q2. Create a job to load each product once into one table and the remaining duplicated products into another table.

The first table should contain the following output

A
B
C
D

The second table should contain the following output

B
B
C

1. Consider the following employees data as the source.

employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000

Q1. Create a job to load the cumulative sum of employee salaries into a target table.
The target table data should look like:

employee_id, salary, cumulative_sum
-----------------------------------
10, 1000, 1000
20, 2000, 3000
30, 3000, 6000
40, 5000, 11000
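In DataStage this is usually done with a stage variable in a Transformer that accumulates the total; the logic, sketched in Python:

rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]
running = 0
for emp_id, salary in rows:
    running += salary               # stage-variable-style running total
    print(emp_id, salary, running)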

Q2. Create a job to get the previous row's salary for the current row. If no previous row exists for the current row, then the previous row salary should be displayed as null.

The output should look like:

employee_id, salary, pre_row_salary
-----------------------------------
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
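Again a stage-variable pattern: remember the previous row's salary and emit it with the current row. Sketched in Python (None stands in for Null); for Q3 below, the same idea works with the rows read in reverse order:

rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]
prev = None                          # holds the prior row's salary
for emp_id, salary in rows:
    print(emp_id, salary, prev)      # None/Null for the first row
    prev = salary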

Q3. Create a job to get the next row's salary for the current row. If there is no next row for the current row, then the next row salary should be displayed as null.

The output should look like:

employee_id, salary, next_row_salary
------------------------------------
10, 1000, 2000
20, 2000, 3000
30, 3000, 5000
40, 5000, Null

Q4. Create a job to find the sum of salaries of all employees and this sum should repeat for all the rows.

The output should look like:

employee_id, salary, salary_sum
-------------------------------
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000
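This takes two passes (in DataStage, typically an Aggregator computing the grand total, joined back to the input): compute the sum once, then attach it to every row. Sketch:

rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]
total = sum(salary for _, salary in rows)   # first pass: grand total
for emp_id, salary in rows:
    print(emp_id, salary, total)            # second pass: repeat on each row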

2. Consider the following employees table as the source.

department_no, employee_name
----------------------------
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S

Q1. Create a job to load a target table with the following values from the above source?

department_no, employee_list
--------------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S

Q2. Create a job to load a target table with the following values from the above source?

department_no, employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D


20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S

Q3. Create a job to load a target table with the following values from the above source?

department_no, employee_names
-----------------------------
10, A,B,C,D
20, P,Q,R,S
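All three variants hinge on sorting by department and concatenating names with a stage-variable-style accumulator that resets when the department changes. A Python sketch of Q2 and Q3 (Q1 simply carries the accumulator across department boundaries instead of resetting it):

rows = [(20, "R"), (10, "A"), (10, "D"), (20, "P"),
        (10, "B"), (10, "C"), (20, "Q"), (20, "S")]
rows.sort()                                  # by department, then name
acc, prev_dept, final = "", None, {}
for dept, name in rows:
    acc = name if dept != prev_dept else acc + "," + name
    prev_dept = dept
    print(dept, acc)                         # Q2: running list per department
    final[dept] = acc                        # Q3: last value per department
print(final)                                 # {10: 'A,B,C,D', 20: 'P,Q,R,S'}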

1. Consider the following product types data as the source.

Product_id, product_type
------------------------
10, video
10, Audio
20, Audio
30, Audio
40, Audio
50, Audio
10, Movie
20, Movie
30, Movie
40, Movie
50, Movie
60, Movie

Assume that only 3 product types are available in the source. The source contains 12 records, and you don't know how many products are available in each product type.

Q1. Create a job to select 9 products in such a way that 3 products are selected from video, 3 from Audio, and the remaining 3 from Movie.

Q2. In problem Q1 above, if the number of products in a particular product type is less than 3, you won't get a total of 9 records in the target table (for example, see the video type in the source data). Now design the job in such a way that even if a product type has fewer than 3 products, the shortfall is made up from the other product types. For example, if there is only 1 video product, the remaining 2 records should come from Audio or Movie, so that the total number of records in the target table is always 9.
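One way to guarantee nine output rows: take up to three products per type first, then top up from whatever is left over. A Python sketch of that redistribution (illustrative logic, not a specific DataStage design):

from collections import defaultdict

src = [(10, "video"), (10, "Audio"), (20, "Audio"), (30, "Audio"),
       (40, "Audio"), (50, "Audio"), (10, "Movie"), (20, "Movie"),
       (30, "Movie"), (40, "Movie"), (50, "Movie"), (60, "Movie")]
by_type = defaultdict(list)
for pid, ptype in src:
    by_type[ptype].append((pid, ptype))

picked, leftovers = [], []
for items in by_type.values():
    picked.extend(items[:3])                 # up to 3 per product type
    leftovers.extend(items[3:])              # spares from over-filled types
picked.extend(leftovers[:9 - len(picked)])   # top up the shortfall to 9
print(len(picked), picked)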

2. Create a job to convert column data into row data.


The source data looks like

col1, col2, col3
----------------
a, b, c
d, e, f

The target table data should look like

Col
---
a
b
c
d
e
f
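In DataStage this is a horizontal pivot (the Pivot stage); the underlying flattening, sketched in Python:

src = [("a", "b", "c"), ("d", "e", "f")]
col = [value for row in src for value in row]   # columns flattened into rows
print(col)                                      # ['a', 'b', 'c', 'd', 'e', 'f']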


3. Create a job to convert row data into column data.

The source data looks like

id, value
---------
10, a
10, b
10, c
20, d
20, e
20, f

The target table data should look like

id, col1, col2, col3
--------------------
10, a, b, c
20, d, e, f
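This is the reverse (vertical) pivot: group values by id and spread them across columns. A Python sketch, assuming each id has the same number of values:

from collections import defaultdict

src = [(10, "a"), (10, "b"), (10, "c"), (20, "d"), (20, "e"), (20, "f")]
grouped = defaultdict(list)
for key, value in src:
    grouped[key].append(value)     # collect values per id
for key, values in grouped.items():
    print(key, *values)            # 10 a b c / 20 d e f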

http://shortcut-tricks.blogspot.com/2016/04/datastage-scenario-based-questions-and.html

What is the exact difference between parallel jobs and server jobs?

Answer / kiran

Sorry guys, your answer is wrong.

Server jobs don't support partitioning or parallelism, run only on SMP machines, and their performance is low.

Parallel jobs support partitioning and parallelism and run on SMP, MPP, or cluster machines (if required, they can also run on USS machines). Their performance is high.


What is the exact difference between parallel jobs and server jobs?

Answer / madhava

Parallel jobs:
1. Run on the parallel engine.
2. Support pipeline and partition parallelism.
3. Are compiled into OSH.

Server jobs:
1. Run on the server engine.


2. Don't support parallelism or partitioning techniques.
3. Are compiled into BASIC.


What is the exact difference between parallel jobs and server jobs?

Answer / yarramasu

Server jobs have only a single node; parallel jobs have multiple nodes.
Server jobs can support only SMP and don't support MPP; parallel jobs support both SMP and MPP.
Server jobs don't support pipeline partitioning; parallel jobs do.
Server job performance is very low; parallel job performance is very high.

https://www.allinterview.com/showanswers/33491/what-is-exact-difference-between-parallel-jobs-and-server-jobs.html

What is the exact difference between parallel jobs and server jobs?

Answer / sathya seenu priya.a

Server jobs run on a single node; parallel jobs run on multiple nodes.
Server jobs do not support pipelining and partitioning; parallel jobs do.
Server jobs load when one job finishes; parallel job loads are synchronized.
Server jobs use symmetric multiprocessing (SMP); parallel jobs use both symmetric multiprocessing and massive parallel processing (SMP and MPP).



What is the exact difference between parallel jobs and server jobs?

Answer / bharath

SERVER JOBS
-> Run on a single node
-> Execute on the DS server engine
-> Handle smaller volumes of data
-> Slower data processing
-> Have fewer components (i.e. a smaller palette)
-> Compiled into BASIC

PARALLEL JOBS
-> Run on multiple nodes
-> Execute on the DS parallel engine
-> Handle huge volumes of data
-> Faster data processing
-> Have more components
-> Compiled into OSH (Orchestrate shell script), except the Transformer (C++ and OSH)


What is the exact difference between parallel jobs and server jobs?

Answer / poonam

Parallel jobs are carried out on multiple nodes/processors, but in server jobs there is no multitasking, to put it in simple terms. There are many technical differences, like no server sort stage, etc., but this is the basic difference.

https://www.allinterview.com/showanswers/33495/what-is-exact-difference-between-parallel-jobs-and-server-jobs.html

https://www.allinterview.com/company/1000/ibm/interview-questions/177/data-stage.html

