Interview Questions

In my current project my roles & responsibilities are basically:

 I work in an onsite-offshore model, so we receive our tasks from the onsite
team.
 As a developer, I first need to understand the physical data model, i.e. the dimensions and
facts and their relationships, as well as the functional specifications that describe the business
requirements designed by the Business Analyst.
 I am involved in preparing the source-to-target mapping sheet (tech specs), which
tells us what the source and target are, which source column maps to which target
column, and what the business logic is. This document gives a clear
picture for development.
 Creating Informatica mappings, sessions and workflows using different
transformations to implement the business logic.
 Preparing unit test cases as per the business requirement is also one of my
responsibilities.
 I am also involved in unit testing the mappings developed by myself.
 I perform source code reviews for the mappings and workflows developed by my team
members.
 I am also involved in preparing the deployment plan, which contains the list of
mappings and workflows to be migrated; based on this, the deployment team can
migrate the code from one environment to another.
 Once the code is rolled out to production, we work with the production support team
for 2 weeks, during which we give knowledge transfer (KT) in parallel. So we also prepare the KT
document for the production team.
Manufacturing or Supply Chain

Coming to My Current Project:

Currently I am working on the XXX project for the YYY client. YYY does not have a
manufacturing unit. What the BIZ (business) does here is, before a quarter ends, they call
for quotations from their primary supply channels; this process is called RFQ (Request for
Quotation). Once the BIZ creates an RFQ, a notification automatically goes to the supply channels.

These supply channels send back their quoted values, which we call the response
from the supply channel. After that, the BIZ starts negotiations with the supply channels for the
best deal and then approves the RFQs.

All these activities (creating RFQs, supplier responses, approving RFQs, etc.) are performed in
Oracle Apps, which is the source front-end application. This data gets stored in the OLTP system,
so the OLTP contains all the RFQ, supplier response and approval status data.

We have some Oracle jobs running between OLTP and ODS which replicate the OLTP data to
ODS. It is designed in such a way that any transaction entering into the OLTP is immediately
reflected into the ODS.

We have a staging area where we load the entire ODS data into staging tables. For this we
have created ETL Informatica mappings; these mappings truncate and reload the
staging tables on each session run. Before loading the staging tables we drop the indexes,
and after the bulk load we recreate the indexes using stored procedures.
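A minimal sketch of what such an index-maintenance procedure might look like (the procedure name and the assumption that all indexes on the staging table are rebuilt are illustrative, not taken from the project):

-- Hypothetical helper: rebuild all indexes on a staging table after a bulk load.
-- Assumes the owner has ALTER INDEX privileges on these indexes.
CREATE OR REPLACE PROCEDURE rebuild_stg_indexes (p_table_name IN VARCHAR2) AS
BEGIN
  FOR idx IN (SELECT index_name
                FROM user_indexes
               WHERE table_name = UPPER(p_table_name)) LOOP
    EXECUTE IMMEDIATE 'ALTER INDEX ' || idx.index_name || ' REBUILD';
  END LOOP;
END;
/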

Then we extract all this data from the staging area and load it into the dimensions and facts. On top of the
dims and facts we have created materialized views as per the report requirements.
Finally, the reports pull the data directly from the materialized views. The performance of these
reports/dashboards is always good because we are not doing any calculations at the reporting level. These
dashboards/reports can be used for analysis purposes, for example: how many RFQs were created,
how many RFQs were approved, how many RFQs received responses from the supply channels?

What is the budget? What budget was approved?

Who is the approving manager, with whom is the approval pending, and what is the past feedback on the
supply channels, etc.?

In the present system they don't have a BI design, so they use a manual process of
exporting SQL query data to Excel sheets and preparing pie charts using macros. In the
new system we are providing BI-fashion reports like drill-downs, drill-ups, pie charts, graphs,
detail reports and dashboards.

For Sales Project:


(Diagram omitted: Production Unit -> Delivery Centers -> Dist1 / Dist2 / Dist3, with Biz Reports.)

Coming to My Current Project:

The current system was designed in webMethods (a middleware delivery tool). Now they have found some
issues in the existing system: it does not support BI capabilities like drill-downs, drill-ups and so on. That is
why the Bizz (business or client) decided to migrate to Informatica.

The as-is system was designed in webMethods, and the to-be system is designed in Informatica for ETL
and BO for reporting.

We are replacing the exact same functionality of webMethods using Informatica.

Generally, once production is completed for any product, it is sent to the delivery centers; from the
DCs (delivery centers) it is shipped to supply channels or distributors, and from there it goes to the
end customers.

Before starting production of any product, Bizz approval is essential for the production unit.

Before taking that decision, Bizz has to do some analysis on the existing stock, previous sales history,
future orders, etc. To do this they need reports in BI fashion (drill-downs and drill-ups). These
reports are created in BO and show what is in stock in each delivery center, the shipping status,
the previous sales history, and the customer orders for each product across all
the delivery centers. Bizz buys these details from a third-party IMS company.

IMS collects the information from the different distributors and delivery centers, such as the on-hand
stock, the shipping stock, how many orders are in hand for the next quarter, and
the previous sales history for specific products.

We have a staging area where we load the entire IMS data into staging tables. For this we have
created ETL Informatica mappings; these mappings truncate and reload the staging tables
on each session run. Before loading the staging tables we drop the indexes, and after the
bulk load we recreate the indexes using stored procedures. After completion of the staging load we
load the data into our dims and facts. On top of our data model we have created materialized
views that contain the complete reporting calculations. From the materialized views we pull the data
into the BO reports with fewer joins and fewer aggregations, so report performance is good.

ORACLE

How strong are you in SQL & PL/SQL?

1) I am good at SQL. I write the source qualifier queries for Informatica mappings
as per the business requirement. (OR) I write SQL queries in the source qualifier for
Informatica mappings as per the business requirement.
2) I am comfortable working with joins, correlated queries, sub-queries, analyzing
tables, inline views and materialized views.
3) As an Informatica developer I did not get much opportunity to work on the PL/SQL side,
but I worked on a PL/SQL-to-Informatica migration project, so I do have exposure to
procedures, functions and triggers.

What is the difference between view and materialized view?

 A view has a logical existence and does not contain data; a materialized view has a physical existence.
 A view is not a database object; a materialized view is a database object.
 We cannot perform DML operations on a view; we can perform DML operations on a materialized view.
 SELECT * FROM a view fetches the data from the base table; SELECT * FROM a materialized view fetches the data stored in the materialized view itself.
 A view cannot be scheduled to refresh; a materialized view can be scheduled to refresh.
 We can keep aggregated data in a materialized view, and a materialized view can be created based on multiple tables.

Materialized View
A materialized view is very useful for reporting. If we don't have the materialized view, the report
will fetch the data directly from the dimensions and facts. This process is slow since it
involves multiple joins. If we put the same report logic into a materialized view, we can
fetch the data directly from the materialized view for reporting purposes and avoid
multiple joins at report run time.
We need to refresh the materialized view regularly; then the report can simply perform a select
statement on the materialized view.
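As a hedged illustration (the table and column names below are made up, not from the project), a materialized view with a scheduled refresh might look like this:

-- Hypothetical example: pre-aggregated sales by product, refreshed once a day.
CREATE MATERIALIZED VIEW mv_sales_summary
  BUILD IMMEDIATE
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 1   -- schedule a daily refresh
AS
SELECT d.product_id,
       SUM(f.sales_amount) AS total_sales
  FROM fact_sales f
  JOIN dim_product d ON d.product_key = f.product_key
 GROUP BY d.product_id;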
Difference between Trigger and Procedure

 Triggers do not need to be executed manually; they are fired automatically. Stored procedures need to be executed manually.
 Triggers run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.
Differences between sub-query and co-related sub-query

 A sub-query is executed once for the parent query, whereas a correlated sub-query is executed once for each row of the parent query.
 Sub-query example:
Select * from emp where deptno in (select deptno from dept);
 Correlated sub-query example:
Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno);
Differences between where clause and having clause

 Both the where clause and the having clause can be used to filter data.
 A where clause can be used without a group by, but a having clause must be used with a group by.
 The where clause applies to individual rows, whereas the having clause tests a condition on the group rather than on individual rows.
 The where clause is used to restrict rows; the having clause is used to restrict groups.
 Restrict a normal query with where; restrict a group by (aggregate) result with having.
 In the where clause every record is filtered individually; in the having clause filtering is applied to the aggregated records (group by functions).
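A small illustrative query against the classic EMP table (assumed here purely for illustration) showing both clauses together:

-- WHERE filters individual rows before grouping; HAVING filters the groups afterwards.
SELECT deptno,
       AVG(sal) AS avg_sal
  FROM emp
 WHERE job <> 'PRESIDENT'        -- row-level filter
 GROUP BY deptno
HAVING AVG(sal) > 2000;          -- group-level filter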
Differences between stored procedure and functions

 A stored procedure may or may not return values. A function must return a value through RETURN, and can return additional values using OUT arguments.
 A stored procedure is typically used to implement business logic; a function is typically used for calculations.
 A stored procedure is a pre-compiled statement, whereas a function is not a pre-compiled statement.
 A stored procedure can accept IN, OUT and IN OUT arguments, whereas a function normally accepts only IN arguments and returns its result through RETURN.
 Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
 A stored procedure cannot be invoked from SQL statements (e.g. SELECT); a function can be invoked from SQL statements, e.g. SELECT.
 A stored procedure can affect the state of the database using commit; a function (called from SQL) cannot affect the state of the database.
 A stored procedure is stored as pseudo-code in the database, i.e. in compiled form; a function is parsed and compiled at runtime.
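A minimal sketch, assuming the standard EMP sample table (the object names give_raise and annual_sal are illustrative only):

-- A procedure performs a task; it does not have to return anything.
CREATE OR REPLACE PROCEDURE give_raise (p_empno IN NUMBER, p_amount IN NUMBER) AS
BEGIN
  UPDATE emp SET sal = sal + p_amount WHERE empno = p_empno;
  COMMIT;                                   -- a procedure may commit
END;
/

-- A function computes and returns a value, so it can be called from SQL.
CREATE OR REPLACE FUNCTION annual_sal (p_empno IN NUMBER) RETURN NUMBER AS
  v_sal emp.sal%TYPE;
BEGIN
  SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
  RETURN v_sal * 12;
END;
/

-- Usage from SQL: SELECT empno, annual_sal(empno) FROM emp;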

Differences between rowid and rownum

 Rowid is an Oracle internal ID that is allocated every time a new record is inserted into a table; this ID is unique and cannot be changed by the user. Rownum is a row number returned by a select statement.
 Rowid is permanent; rownum is temporary.
 Rowid is a globally unique identifier for a row in the database: it is created when the row is inserted into the table and destroyed when the row is removed. The rownum pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
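For illustration, both pseudocolumns can be selected directly (EMP is assumed as a sample table):

-- ROWID is the physical address of each row; ROWNUM is assigned as rows are returned.
SELECT rowid, rownum, empno, ename
  FROM emp
 WHERE rownum <= 5;   -- limits the output to the first 5 rows fetched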

What is the difference between joiner and lookup

 On multiple matches, a joiner returns all matching records, whereas a lookup returns either the first record, the last record, any value, or an error value.
 In a joiner we cannot configure a persistent cache, shared cache, uncached mode or dynamic cache, whereas in a lookup we can.
 We cannot override the query in a joiner; in a lookup we can override the query to fetch the data from multiple tables.
 We cannot apply any filters along with the join condition in a joiner transformation; in a lookup we can apply filters along with the lookup conditions using a lookup query override.
 We cannot use relational operators in a joiner transformation (i.e. <, >, <= and so on), whereas in a lookup we can use relational operators.
What is the difference between source qualifier and lookup?

 A source qualifier pushes all the matching records, whereas in a lookup we can restrict whether to return the first value, last value or any value.
 In a source qualifier there is no concept of a cache, whereas a lookup is built around the cache concept.
 When both the source and the lookup table are in the same database we can use a source qualifier; when they exist in different databases we need to use a lookup.

What is the difference between source qualifier and Joiner

 We use a source qualifier to join tables when they are in the same database; we use a joiner to join tables when they are in different databases.
 In a source qualifier we can use any type of join between two tables, whereas a joiner supports only its 4 join types.
 We can join N sources in a single source qualifier using a SQL override, whereas a joiner can join only 2 sources, so to join N sources we need N-1 joiners.

Difference between Stop and Abort?

Stop:

You choose to stop the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service stops processing the task and all other tasks in
its path. The Integration Service continues running concurrent tasks like
backend stored procedures.

Abort:

You choose to abort the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service kills the DTM process and aborts the task.

How to find out duplicate records in table?

Select empno, count (*) from EMP group by empno having count (*)>1;

How to delete duplicate records in a table?

Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);

What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?

If a query is taking a long time, first I will run the query through EXPLAIN PLAN; the explain plan
process stores its data in the PLAN_TABLE.

It gives us the execution plan of the query, i.e. whether the query is using the relevant
indexes on the joining columns or whether indexes to support the query are missing.
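For example (the query itself is hypothetical; DBMS_XPLAN.DISPLAY is the standard way to read the plan):

-- Generate and then display the execution plan for a query.
EXPLAIN PLAN FOR
SELECT e.empno, d.dname
  FROM emp e
  JOIN dept d ON d.deptno = e.deptno;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);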

If the joining columns don't have indexes, the query will do a full table scan, and with a full table scan
the cost will be higher. In that case I will create indexes on the joining columns and re-run the
query; it should give better performance. We also need to analyze the tables if the last analysis
happened long back. The ANALYZE statement can be used to gather statistics for a specific
table, index or cluster, for example:

ANALYZE TABLE employees COMPUTE STATISTICS;

If we still have a performance issue, then we will use HINTS. A hint is nothing but a clue to the optimizer. We can use
hints like:

 ALL_ROWS
one of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

 FIRST_ROWS
one of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

 CHOOSE
one of the hints that 'invokes' the cost-based optimizer.
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the
statistics gathered.
 HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the
other table and uses the hash index to find the corresponding records. It is therefore not
suitable for < or > join conditions.

/*+ use_hash */

Hints are most useful to optimize the query performance.
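As an illustration (the hint syntax is standard Oracle; the query and the EMP/DEPT tables are just sample assumptions):

-- Ask the optimizer to favour fast retrieval of the first rows.
SELECT /*+ FIRST_ROWS */ e.empno, e.ename, d.dname
  FROM emp e
  JOIN dept d ON d.deptno = e.deptno
 WHERE d.loc = 'DALLAS';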

DWH Concepts

Difference between OLTP and DWH/DS/OLAP

 OLTP maintains only current information; OLAP contains the full history.
 OLTP is a normalized structure; OLAP is a de-normalized structure.
 OLTP is a volatile system; OLAP is a non-volatile system.
 OLTP cannot be used for reporting purposes; OLAP is a pure reporting system.
 Since OLTP is a normalized structure it requires multiple joins to fetch the data; OLAP does not require as many joins to fetch the data.
 OLTP is not time variant; OLAP is time variant.
 OLTP is a pure relational model; OLAP is a dimensional model.

What is a staging area and why do we need it in DWH?

If the target and source databases are different and the target table volume is high (it contains
millions of records), then without a staging table we would need to design the Informatica mapping
using a lookup to find out whether the record exists in the target table or not. Since the target has
huge volumes, it is costly to build the cache and it will hit performance.

If we create staging tables in the target database, we can simply do an outer join in the source
qualifier to determine insert/update; this approach gives good performance and
avoids a full table scan to determine inserts/updates on the target, as sketched below.
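A minimal sketch of such a source-qualifier override, assuming a staging table STG_CUSTOMER and a target table DIM_CUSTOMER (both names are hypothetical):

-- Left outer join from staging to target: if the target key is NULL the row is an insert,
-- otherwise it is a candidate for update.
SELECT s.customer_id,
       s.customer_name,
       CASE WHEN t.customer_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
  FROM stg_customer s
  LEFT OUTER JOIN dim_customer t
    ON t.customer_id = s.customer_id;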

We can also create indexes on the staging tables; since these tables are designed for a
specific application, this will not impact any other schemas/users.

While processing flat files into the data warehouse we can also perform cleansing.

Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data
is correct and accurate. During data cleansing, records are checked for accuracy and
consistency.

 Since it is a one-to-one mapping from ODS to staging, we do truncate and reload.
 We can create indexes in the staging area so that our source qualifier query performs at its best.
 If we have the staging area, we do not need to rely on Informatica transformations to
know whether the record exists or not.

ODS:

My understanding of ODS is that it is a replica of the OLTP system, and the need for it is to
reduce the burden on the production system (OLTP) while fetching data for loading the targets.
Hence it is a mandatory requirement for every warehouse.

So do we transfer data from OLTP to ODS every day to keep it up to date?

OLTP is a sensitive database; it should not be hit with many select statements, as that may impact
its performance, and if something goes wrong while fetching data from OLTP into the data
warehouse it will directly impact the business.

ODS is the replication of OLTP.
ODS usually gets refreshed through some Oracle jobs.

What is the difference between a primary key and a surrogate key?

A primary key is a special constraint on a column or set of columns. A primary key constraint
ensures that the column(s) so designated have no NULL values, and that every value is
unique. Physically, a primary key is implemented by the database system using a unique
index, and all the columns in the primary key must have been declared NOT NULL. A table
may have only one primary key, but it may be composite (consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key
instead of a "real" or natural key. Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate keys. So a surrogate is a
candidate key. A table could actually have more than one surrogate key, although this
would be unusual. The most common type of surrogate key is an incrementing integer, such
as an auto increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
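For illustration, a surrogate key is typically populated from a sequence (the table and sequence names below are made up):

-- A dimension keyed by a surrogate integer rather than the natural business key.
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

CREATE TABLE dim_customer (
  customer_key   NUMBER PRIMARY KEY,      -- surrogate key
  customer_id    VARCHAR2(20) NOT NULL,   -- natural/business key from the source
  customer_name  VARCHAR2(100)
);

INSERT INTO dim_customer (customer_key, customer_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C1001', 'Acme Corp');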
Have you done any Performance tuning in informatica?

1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a
staging table. We didn't have any transformations inside the mapping; it was a 1-to-1
mapping, so there was nothing to optimize at the mapping level. I created session
partitions using key range partitioning on the effective date column. It improved performance a lot:
rather than 4 hours, it ran in 30 minutes for the entire 40 million rows. Using
partitions, the DTM creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance at the mapping
level. Rather than using a lookup transformation, if we can do an outer join in the
source qualifier query override, this gives good performance when both the lookup
table and the source are in the same database; if the lookup table has huge volumes, then
creating the cache is costly.
3) Also, if we can optimize the mapping using fewer transformations, it always
gives better performance.
4) If any mapping takes a long time to execute, first we need to look into the source
and target statistics in the Workflow Monitor for the throughput, and also find out where
exactly the bottleneck is by looking at the busy percentage in the session log; this tells us
which transformation is taking more time. If the source query is the bottleneck,
it will show at the end of the session log as "query issued to database", which
means there is a performance issue in the source query and we need to tune that query.

How strong are you in UNIX?

1) I have whatever UNIX shell scripting knowledge Informatica requires, for example:

If we want to run workflows from UNIX we use PMCMD.

Below is the script to run a workflow from UNIX.

cd /pmar/informatica/pc/pmserver/

/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD \
    -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE

2) If we are supposed to process flat files using Informatica but those files exist on a
remote server, then we have to write a script to FTP them onto the Informatica server before we start
processing those files.

3) There is also file watching: if an indicator file is available in the specified location, then we
start our Informatica jobs; otherwise we send an email notification using the

mailx command, saying that the previous jobs didn't complete successfully, something like
that.

4) Using a shell script we update the parameter file with the session start time and end time.

This is the kind of scripting knowledge I have. If any new UNIX requirement comes up, I can
Google it, get the solution and implement it.

What is the use of shortcuts in Informatica?

If we copy source definitions, target definitions or mapplets from a shared folder to any
other folder, they become shortcuts.

Let's assume we have imported some source and target definitions into a shared folder, and after
that we are using those source and target definitions in another folder as shortcuts in
some mappings.

If any modification occurs in the backend (database) structure, like adding new columns or
dropping existing columns in either the source or the target, and we re-import it into the shared folder, those
changes are automatically reflected in all folders/mappings wherever we used those
source or target definitions.

How to concatenate row data through Informatica?

Source:

Ename EmpNo
stev 100
methew 100
john 101
tom 101

Target:

Ename EmpNo
Stevmethew 100
John tom 101
Ans:

Using a dynamic lookup on the target table:

If the record doesn't exist, do an insert into the target. If it already exists, then get the corresponding Ename
value from the lookup, concatenate it in an expression with the current Ename value, and then update the
target Ename column using an update strategy.

Using the variable-port approach:

Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous
record's information using variable ports. After that, use a router to insert the record if it is the first
occurrence; if it is already inserted, then update Ename with the concatenated value of the previous name and
the current name and update the target.

How to send unique (distinct) records into one target and duplicates into another target?

Source:

Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102

Output:

Target_1:

Ename EmpNo
Stev 100
John 101
Mathew 102

Target_2:

Ename EmpNo
Stev 100

Ans:

Using a dynamic lookup on the target table:

If the record doesn't exist, do an insert into Target_1. If it already exists, then send it to Target_2 using a
router.

Using the variable-port approach:

Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous
record's information using variable ports. After that, use a router to route the data into the targets: if it is the
first occurrence, send it to the first target; if it is already inserted, send it to Target_2.

How to do dynamic file generation in Informatica?

I want to generate a separate file for every employee (it should generate a file per Name).
It has to generate 5 flat files, and the name of each flat file is the corresponding employee name;
that is the requirement.

Below is my mapping.

Source (Table) -> SQ -> Target (FF)

Source:

Dept Ename EmpNo


A S 22
A R 27
B P 29
B X 30
B U 34

This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there.

We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the
target file.

In order to generate the target file names from the mapping, we should make use of the
special "FileName" port in the target file. You can't create this special port from the usual
New Port button; there is a special button labelled "F" at the right-most corner of the
target flat file when viewed in the Target Designer.

When you have different sets of input data with different target files to be created, use the same
target instance, but with a Transaction Control transformation which defines the boundary between the
source sets.

In the target flat file there is an option in the columns tab, i.e. FileName as a column;
when you click it, a non-editable column gets created in the metadata of the target.

In the Transaction Control transformation give the condition as:

iif(not isnull(emp_no), tc_commit_before, tc_continue_transaction)

Map the emp_no column (or the Ename column, per the requirement) to the target's FileName column.

The mapping will be like this:

source -> source qualifier -> transaction control -> target

Run it, and separate files will be created, named after each Ename.

How do you populate the 1st record to the 1st target, the 2nd record to the 2nd target, the 3rd record to the 3rd
target and the 4th record to the 1st target through Informatica?

We can do it using a sequence generator by setting end value = 3 and enabling the cycle option. Then
in the router take 3 groups:

In the 1st group specify the condition NEXTVAL = 1 and pass those records to the 1st target; similarly

in the 2nd group specify the condition NEXTVAL = 2 and pass those records to the 2nd target, and

in the 3rd group specify the condition NEXTVAL = 3 and pass those records to the 3rd target.

Since we have enabled the cycle option, after reaching the end value the sequence generator starts again
from 1; for the 4th record NEXTVAL is 1, so it goes to the 1st target.

How do you perform incremental logic or Delta or CDC?

Incremental means: suppose today we processed 100 records; for tomorrow's run you need to
extract only the records inserted or updated after the previous run, based on the last
updated timestamp (from yesterday's run). This process is called incremental or delta load.

Implementation for Incremental Load

Approach_1: Using SETMAXVARIABLE()

1) First we need to create a mapping variable ($$INCREMENT_TS) and assign its initial value as an
old date (01/01/1940).
2) Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >=
$$INCREMENT_TS (the mapping variable); see the sketch after this list.
3) In an expression, assign the maximum last_upd_date value to $$INCREMENT_TS
(the mapping variable) using SETMAXVARIABLE.
4) Because it is a variable, it stores the max last_upd_date value in the repository, so in the
next run our source qualifier query fetches only the records updated or inserted
after the previous run.
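A sketch of the source-qualifier override under these assumptions (the source table name and the date format mask are illustrative; Informatica substitutes the $$INCREMENT_TS value into the query text):

-- Pull only rows changed since the value stored in the mapping variable.
SELECT src.*
  FROM sales_orders src
 WHERE src.last_upd_date >= TO_DATE('$$INCREMENT_TS', 'MM/DD/YYYY HH24:MI:SS');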

(Screenshots omitted here: the mapping variable definition, the source qualifier override, the
expression that assigns the max last update date to the variable using SETMAXVARIABLE, and the
update strategy logic.)
Approach_2: Using a parameter file

1) First we need to create a mapping parameter ($$LastUpdateDateTime) and assign its
initial value as an old date (01/01/1940) in the parameter file.
2) Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >=
$$LastUpdateDateTime (the mapping parameter).
3) Update the mapping parameter ($$LastUpdateDateTime) value in the parameter
file using a shell script or another mapping after the first session completes
successfully.
4) Because it is a mapping parameter, every time we need to update the value in
the parameter file after completion of the main session.

Parameter file format:

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]

$DBConnection_Source=DMD2_GEMS_ETL

$DBConnection_Target=DMD2_GEMS_ETL

$$LastUpdateDateTime=01/01/1940

(Screenshots omitted here: updating the parameter file, the logic in the expression, the main
mapping, the SQL override in the SQ transformation, and the workflow design.)
Parameter file

It is a text file; below is the format of the parameter file. We place this file on the UNIX box where
we have installed our Informatica server.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]

$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921

$DBConnection_Target=DMD2_GEMS_ETL

$$CountryCode=AT

$$CustomerNumber=120165

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]

$DBConnection_Source=DEVL1C1_GEMS_ETL

$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921

$$CountryCode=BE

$$CustomerNumber=101495

Approach_3: Using Oracle control tables

1) First we need to create two control tables, cont_tbl_1 and cont_tbl_2, with the
structure (session_st_time, wf_name).
2) Then insert one record into each table with session_st_time = 1/1/1940 and the
workflow name.
3) Create two stored procedures: one to update cont_tbl_1 with the session start time,
with the stored procedure type property set to Source Pre-load.
4) For the 2nd stored procedure, set the stored procedure type property to Target
Post-load; this procedure updates the session start time in cont_tbl_2 from
cont_tbl_1.
5) Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >= (SELECT
session_st_time FROM cont_tbl_2 WHERE wf_name = 'actual workflow name'); see the sketch after
this list.
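A rough sketch of the control tables and the override under these assumptions (the source table and the workflow name are hypothetical):

-- Control tables that remember when each workflow last ran.
CREATE TABLE cont_tbl_1 (session_st_time DATE, wf_name VARCHAR2(100));
CREATE TABLE cont_tbl_2 (session_st_time DATE, wf_name VARCHAR2(100));

-- Source qualifier override: pull only rows changed since the last successful run.
SELECT src.*
  FROM sales_orders src                     -- hypothetical source table
 WHERE src.last_upd_date >= (SELECT c.session_st_time
                               FROM cont_tbl_2 c
                              WHERE c.wf_name = 'wf_sales_load');  -- actual workflow name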
How to load cumulative salary into the target?

Solution:

Using variable ports in an expression we can load the cumulative salary into the target (a SQL
equivalent is sketched below).
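For comparison only (this is not the Informatica variable-port approach described above), the same cumulative salary can be produced in SQL with an analytic function, assuming the EMP table:

-- Running total of salary, ordered by employee number.
SELECT empno,
       ename,
       sal,
       SUM(sal) OVER (ORDER BY empno) AS cumulative_sal
  FROM emp;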

Also, below is the logic for converting columns into rows without using a Normalizer
transformation:

1) The source will contain two columns, address and id.

2) Use a sorter to arrange the rows in ascending order.

3) Then create an expression as shown in the screenshot (omitted here).

4) Use an Aggregator transformation and check group by on the id port only, as shown in the
screenshot (omitted here).
Difference between dynamic lkp and static lkp cache?

1) In a dynamic lookup the cache gets refreshed as soon as a record is
inserted, updated or deleted in the lookup table, whereas in a static lookup the
cache does not get refreshed even though a record is inserted or updated
in the lookup table; it refreshes only in the next session run.
2) The best example of where we need to use a dynamic cache: suppose the first record
and the last record of the source are for the same key but there is a change in the
address. What the Informatica mapping has to do here is insert the first record and
update the target table with the last record.
3) If we use a static lookup, the first record goes to the lookup and is checked in the lookup
cache; based on the condition it will not find a match, so it returns a null value,
and then the router sends that record to the insert flow.
4) But this record is still not available in the cache memory, so when the last
record comes to the lookup it checks the cache, does not find a match and
returns null values; again it goes to the insert flow through the router, but it is supposed
to go to the update flow, because the cache didn't refresh when the first record was
inserted into the target table. So if we use a dynamic lookup we can achieve our
requirement, because as soon as the first record is inserted, the cache also
gets refreshed with the target data. When we process the last record it finds the
match in the cache, so it returns the value, and the router routes that record to the
update flow.
What is the difference between snow flake and star schema

 The star schema is the simplest data warehouse schema; the snowflake schema is a more complex data warehouse model than a star schema.
 In a star schema each dimension is represented in a single table and there should not be any hierarchies between dimensions; in a snowflake schema at least one hierarchy should exist between dimension tables.
 Both contain a fact table surrounded by dimension tables. If the dimensions are de-normalized, we say it is a star schema design; if a dimension is normalized, we say it is a snowflaked design.
 In a star schema only one join establishes the relationship between the fact table and any one of the dimension tables; in a snowflake schema, since there are relationships between the dimension tables, many joins are needed to fetch the data.
 A star schema optimizes performance by keeping queries simple and providing fast response time; all the information about each level is stored in one row. Snowflake schemas normalize dimensions to eliminate redundancy; the result is more complex queries and reduced query performance.
 It is called a star schema because the diagram resembles a star; it is called a snowflake schema because the diagram resembles a snowflake.

Difference between data mart and data warehouse

 A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective. A data warehouse is a "subject-oriented, integrated, time-variant, nonvolatile collection of data in support of decision making".
 A data mart is used at a business division/department level; a data warehouse is used at an enterprise level.
 A data mart is a subset of data from a data warehouse, built for specific user groups. A data warehouse is an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.
 By providing decision makers with only a subset of data from the data warehouse, privacy, performance and clarity objectives can be attained. The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.
Differences between connected lookup and unconnected lookup

 A connected lookup is connected to the pipeline and receives input values from the pipeline; an unconnected lookup is not connected to the pipeline and receives input values from the result of a :LKP expression in another transformation via arguments.
 We cannot use a connected lookup more than once in a mapping; we can use an unconnected lookup more than once within the mapping.
 A connected lookup can return multiple columns from the same row; an unconnected lookup designates one return port (R) and returns one column from each row.
 A connected lookup can be configured to use a dynamic cache; an unconnected lookup cannot.
 A connected lookup passes multiple output values to another transformation (link lookup/output ports to another transformation); an unconnected lookup passes one output value to another transformation (the lookup/output/return port passes the value to the transformation calling the :LKP expression).
 A connected lookup uses a dynamic or static cache; an unconnected lookup uses a static cache.
 A connected lookup supports user-defined default values; an unconnected lookup does not.
 In a connected lookup, the cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports; in an unconnected lookup, the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

What is the difference between joiner and lookup

 On multiple matches, a joiner returns all matching records, whereas a lookup returns either the first record, the last record, any value, or an error value.
 In a joiner we cannot configure a persistent cache, shared cache, uncached mode or dynamic cache, whereas in a lookup we can.
 We cannot override the query in a joiner; we can override the query in a lookup to fetch the data from multiple tables.
 We can perform an outer join in a joiner transformation; we cannot perform an outer join in a lookup transformation.
 We cannot use relational operators in a joiner transformation (i.e. <, >, <= and so on), whereas in a lookup we can use relational operators.

What is the difference between source qualifier and lookup

 A source qualifier pushes all the matching records, whereas in a lookup we can restrict whether to return the first value, last value or any value.
 In a source qualifier there is no concept of a cache, whereas a lookup is built around the cache concept.
 When both the source and the lookup table are in the same database we can use a source qualifier; when they exist in different databases we need to use a lookup.

Differences between dynamic lookup and static lookup

 In a dynamic lookup the cache gets refreshed as soon as a record is inserted, updated or deleted in the lookup table; in a static lookup the cache does not get refreshed even though a record is inserted or updated in the lookup table, and it refreshes only in the next session run.
 When we configure a lookup transformation to use a dynamic lookup cache, we can use only the equality operator in the lookup condition; the static cache is the default cache.
 With a dynamic cache, the NewLookupRow port is enabled automatically.
 The best example of where we need a dynamic cache: suppose the first record and the last record of the source are for the same key but there is a change in the address. What the Informatica mapping has to do is insert the first record and update the last record in the target table. If we use a static lookup, the first record goes to the lookup and is checked in the lookup cache; based on the condition it will not find a match, so it returns a null value and the router sends that record to the insert flow. But this record is still not available in the cache memory, so when the last record comes to the lookup it checks the cache, does not find a match and again returns a null value; it goes to the insert flow through the router, but it is supposed to go to the update flow, because the cache didn't get refreshed when the first record was inserted into the target table.

How to process multiple flat files into a single target table through Informatica if all the files have the
same structure?

We can process all the flat files through one mapping and one session using a list file.

First we need to create the list file using a UNIX script for all the flat files; the extension of the list file
is .LST.

This list file contains only the flat file names.

At the session level we need to set:

the source file directory as the list file path,

the source file name as the list file name,

and the file type as Indirect.

How to populate the file name to the target while loading multiple files using the list file concept?

In Informatica 8.6, by selecting the "Add Currently Processed Flat File Name" option in the
properties tab of the source definition (after importing the source file definition in the Source Analyzer),
a new column called "currently processed file name" is added. We can map this column to the target to
populate the file name.

SCD Type-II Effective-Date Approach

 We have a dimension in the current project called the resource dimension. Here we
are maintaining history to keep track of SCD changes.
 To maintain history in this slowly changing dimension (resource dimension), we
followed the SCD Type-II effective-date approach.
 My resource dimension structure would be eff-start-date, eff-end-date, the surrogate key and the
source columns.
 Whenever I do an insert into the dimension, I populate eff-start-date with sysdate,
eff-end-date with a future date, and the surrogate key from a sequence number.
 If the record is already present in my dimension but there is a change in the source data,
what I need to do is:
 Update the previous record's eff-end-date with sysdate and insert the source data as a new record.

Informatica design to implement the SCD Type-II effective-date approach

 Once we fetch the record from the source qualifier, we send it to a lookup to find out
whether the record is present in the target or not, based on the source primary key
column.
 Once we find the match in the lookup, we take the SCD columns and the surrogate key column
from the lookup into an expression transformation.
 In the lookup transformation we need to override the lookup query to fetch only the
active records from the dimension while building the cache (see the sketch below).
 In the expression transformation I compare the source with the lookup return data.
 If the source and target data are the same, I set a flag of 'S'.
 If the source and target data are different, I set a flag of 'U'.
 If the source data does not exist in the target, the lookup returns a null value and I
flag it as 'I'.
 Based on the flag values, in the router I route the data into the insert and update flows.
 If flag = 'I' or 'U' I pass it to the insert flow.
 If flag = 'U' I also pass the record to the eff-date update flow.
 When we do the insert we pass the sequence value to the surrogate key.
 Whenever we do the update we update the eff-end-date column based on the surrogate key
value returned by the lookup.
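A sketch of the lookup override that caches only the active rows, assuming the dimension is named DIM_RESOURCE and open records carry a far-future end date (both are assumptions, not details taken from the project):

-- Cache only the currently active (open-ended) dimension rows.
SELECT r.resource_sk,
       r.resource_id,
       r.eff_start_date,
       r.eff_end_date
  FROM dim_resource r
 WHERE r.eff_end_date = TO_DATE('12/31/9999', 'MM/DD/YYYY');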

Complex Mapping

 We have an order file requirement: every day the source
system places a file name with a timestamp on the Informatica server.
 We have to process the same date's file through Informatica.
 The source file directory contains files older than 30 days, with timestamps.
 For this requirement, if I hardcode the timestamp in the source file name it will process
the same file every day.
 So what I did here is create $InputFilename for the source file name.
 Then I use a parameter file to supply the value to the session variable
($InputFilename).
 To update this parameter file I created one more mapping.
 This mapping updates the parameter file with the timestamp appended to the file name.
 I make sure to run this parameter-file-update mapping before my actual mapping.

How to handle errors in informatica?

 We have a source with numerator and denominator values, and we need to calculate
num/deno when populating the target.
 If deno = 0, I should not load that record into the target table.
 We send those records to a flat file, and after completion of the 1st session run a shell
script checks the file size.
 If the file size is greater than zero, then it sends an email notification to the source
system POC (point of contact) along with the file of zero-denominator records and an appropriate email
subject and body.
 If the file size <= 0, that means there are no records in the flat file; in this case the shell script
does not send any email notification.
 Or:
 We are expecting a not-null value for one of the source columns.
 If it is null, that means it is an error record.
 We can use the above approach for error handling.

Worklet

A worklet is a set of reusable sessions. We cannot run a worklet without a workflow.

If we want to run 2 workflows one after another:

 If both workflows exist in the same folder, we can create 2 worklets rather than creating 2
workflows.
 Finally we can call these 2 worklets in one workflow.
 There we can set the dependency.
 If both workflows exist in different folders or repositories, then we cannot create a
worklet.
 Setting the dependency between these two workflows using a shell script is one
approach.
 The other approach is event wait and event raise.

In the shell script approach:

 As soon as the first workflow completes, we create a zero-byte file (indicator file).
 If the indicator file is available in the particular location, we run the second workflow.
 If the indicator file is not available, we wait for 5 minutes and again check for
the indicator. Like this we continue the loop for 5 times, i.e. 30 minutes.
 After 30 minutes, if the file does not exist, we send out an email notification.

Event wait and event raise approach:

In event wait it will wait for an infinite time, till the indicator file is available.

Why do we need a source qualifier?

Simply put, it performs a select statement.
The select statement fetches the data in the form of rows.
The source qualifier selects the data from the source table.
It identifies the records from the source.
It converts the data types from the database types to Informatica-understandable
data types.

The parameter file supplies the values to session-level variables and mapping-level
variables.

Variables are of two types:

 Session level variables


 Mapping level variables

Session level variables are of four types:


 $DBConnection_Source
 $DBConnection_Target
 $InputFile
 $OutputFile

Mapping level variables are of two types:


 Variable
 Parameter

What is the difference between mapping-level and session-level variables?
Mapping-level variables always start with $$.
Session-level variables always start with $.

Flat File
A flat file is a collection of data in a file in a specific format.

Informatica can support two types of flat files:

 Delimited
 Fixed width

For delimited files we need to specify the separator.

For fixed-width files we need to know the format first, i.e. how many characters to read
for each column.

For delimited files it is also necessary to know the structure of the file, for example whether it contains
headers.

If the file contains a header, then in the definition we need to skip the first row.

List file:

If we want to process multiple files with the same structure, we don't need multiple mappings
and multiple sessions.

We can use one mapping and one session using the list file option.

First we need to create the list file for all the files; then we can use this file in the main
mapping.

Aggregator Transformation:

Transformation type:
Active
Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums.
The Aggregator transformation is unlike the Expression transformation, in that you use the
Aggregator transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis only.

Components of the Aggregator Transformation:

The Aggregator is an active transformation, changing the number of rows in the pipeline.
The Aggregator transformation has the following components and options

Aggregate cache: The Integration Service stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row data in
the data cache.

Aggregate expression: Enter an expression in an output port. The expression can include
non-aggregate expressions and conditional clauses.

Group by port: Indicate how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.

Sorted input: Select this option to improve session performance. To use sorted input, you
must pass data to the Aggregator transformation sorted by group by port, in ascending or
descending order.

Aggregate Expressions:

The Designer allows aggregate expressions only in the Aggregator transformation. An
aggregate expression can include conditional clauses and non-aggregate functions. It can
also include one aggregate function nested within another aggregate function, such as:

MAX (COUNT (ITEM))

The result of an aggregate expression varies depending on the group by ports used in the
transformation
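As a rough SQL analogue (this is not the Aggregator expression language itself, and EMP is just an assumed sample table), grouping by different ports changes what an aggregate such as SUM returns:

-- Grouped by deptno: one SUM per department.
SELECT deptno, SUM(sal) AS dept_sal FROM emp GROUP BY deptno;

-- No group-by ports: a single SUM over all rows.
SELECT SUM(sal) AS total_sal FROM emp;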

Aggregate Functions

Use the following aggregate functions within an Aggregator transformation. You can nest
one aggregate function within another aggregate function.

The transformation language includes the following aggregate functions:

AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE

When you use any of these functions, you must use them in an expression within an
Aggregator transformation.

Tips

Use sorted input to decrease the use of aggregate caches.

Sorted input reduces the amount of data cached during the session and improves session
performance. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.

Limit connected input/output or output ports.

Limit the number of connected input/output or output ports to reduce the amount of data
the Aggregator transformation stores in the data cache.

Filter the data before aggregating it.

If you use a Filter transformation in the mapping, place the transformation before the
Aggregator transformation to reduce unnecessary aggregation.

Normalizer Transformation:

Transformation type:
Active
Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data.

The Normalizer transformation parses multiple-occurring columns from COBOL sources,
relational tables, or other sources. It can process multiple record types from a COBOL source
that contains a REDEFINES clause.

The Normalizer transformation generates a key for each source row. The Integration Service
increments the generated key sequence number each time it processes a source row. When
the source row contains a multiple-occurring column or a multiple-occurring group of
columns, the Normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
Transaction Control Transformation
Transformation type:
Active
Connected

PowerCenter lets you control commit and roll back transactions based on a set of rows that
pass through a Transaction Control transformation. A transaction is the set of rows bound
by commit or roll back rows. You can define a transaction based on a varying number of
input rows. You might want to define transactions based on a group of rows ordered on a
common key, such as employee ID or order entry date.

In PowerCenter, you define transaction control at the following levels:


Within a mapping. Within a mapping, you use the Transaction Control transformation to
define a transaction. You define transactions using an expression in a Transaction Control
transformation. Based on the return value of the expression, you can choose to commit, roll
back, or continue without any transaction changes.
Within a session. When you configure a session, you configure it for user-defined commit.
You can choose to commit or roll back a transaction if the Integration Service fails to
transform or write any row to the target.

When you run the session, the Integration Service evaluates the expression for each row
that enters the transformation. When it evaluates a commit row, it commits all rows in the
transaction to the target or targets. When the Integration Service evaluates a roll back row,
it rolls back all rows in the transaction from the target or targets.

If the mapping has a flat file target you can generate an output file each time the Integration
Service starts a new transaction. You can dynamically name each target flat file.

Union Transformation

1. What is a union transformation?

A union transformation is used to merge data from multiple sources, similar to the UNION ALL SQL
statement combining the results of two or more SQL statements.

2. As union transformation gives UNION ALL output, how you will get the UNION output?

Pass the output of union transformation to a sorter transformation. In the properties of sorter
transformation check the option select distinct. Alternatively you can pass the output of union
transformation to aggregator transformation and in the aggregator transformation specify all ports
as group by ports.

3. What are the guidelines to be followed while using union transformation?

The following rules and guidelines need to be taken care while working with union
transformation:

 You can create multiple input groups, but only one output group.
 All input groups and the output group must have matching ports. The precision, datatype,
and scale must be identical across all groups.
 The Union transformation does not remove duplicate rows. To remove duplicate rows,
you must add another transformation such as a Router or Filter transformation.
 You cannot use a Sequence Generator or Update Strategy transformation upstream from
a Union transformation.
 The Union transformation does not generate transactions.

4. Why union transformation is an active transformation?

Union is an active transformation because it combines two or more data streams into one.
Though the total number of rows passing into the Union is the same as the total number of rows
passing out of it, and the sequence of rows from any given input stream is preserved in the
output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might
not be row number 1 in the output stream. Union does not even guarantee that the output is
repeatable.

Aggregator Transformation

1. What is aggregator transformation?


Aggregator transformation performs aggregate calculations like sum, average, count etc. It is an
active transformation, changes the number of rows in the pipeline. Unlike expression
transformation (performs calculations on a row-by-row basis), an aggregator transformation
performs calculations on group of rows.

2. What is aggregate cache?


The integration service creates index and data cache in memory to process the aggregator
transformation and stores the data group in index cache, row data in data cache. If the
integration service requires more space, it stores the overflow values in cache files.

3. How can we improve performance of aggregate transformation?

 Use sorted input: Sort the data before passing into aggregator. The integration service
uses memory to process the aggregator transformation and it does not use cache memory.
 Filter the unwanted data before aggregating.
 Limit the number of input/output or output ports to reduce the amount of data the
aggregator transformation stores in the data cache.

4. What are the different types of aggregate functions?

The different types of aggregate functions are listed below:

 AVG
 COUNT
 FIRST
 LAST
 MAX
 MEDIAN
 MIN
 PERCENTILE
 STDDEV
 SUM
 VARIANCE

5. Why cannot you use both single level and nested aggregate functions in a single aggregate
transformation?

The nested aggregate function returns only one output row, whereas the single level aggregate
function returns more than one row. Since the number of rows returned is not the same, you cannot
use both single level and nested aggregate functions in the same transformation. If you include
both the single level and nested functions in the same aggregator, the Designer marks the
mapping or mapplet as invalid, so you need to create separate aggregator transformations.

6. Up to how many levels, you can nest the aggregate functions?

We can nest up to two levels only.


Example: MAX( SUM( ITEM ) )

7. What is incremental aggregation?

The integration service performs aggregate calculations and then stores the data in historical
cache. Next time when you run the session, the integration service reads only new data and uses
the historical cache to perform new aggregation calculations incrementally.

8. Why cannot we use sorted input option for incremental aggregation?

In incremental aggregation, the aggregate calculations are stored in historical cache on the
server. In this historical cache the data need not be in sorted order.  If you give sorted input, the
records come as presorted for that particular run but in the historical cache the data may not be
in the sorted order. That is why this option is not allowed.

9. How the NULL values are handled in Aggregator?

You can configure the integration service to treat null values in aggregator functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate functions.

Normalizer Transformation

1. What is normalizer transformation?

The normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data. This means it converts column
data into row data. Normalizer is an active transformation.

2. Which transformation is required to process the cobol sources?

Since COBOL sources contain denormalized data, the normalizer transformation is used to
normalize the COBOL sources.

3. What is generated key and generated column id in a normalizer transformation?

 The integration service increments the generated key sequence number each time it
processes a source row. When the source row contains a multiple-occurring column or a multiple-
occurring group of columns, the normalizer transformation returns a row for each occurrence.
Each row contains the same generated key value.
 The normalizer transformation has a generated column ID (GCID) port for each multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1,2 or
3 in the generated column ID.

4. What is VSAM?

VSAM (Virtual Storage Access Method) is a file access method for IBM mainframe operating
systems. VSAM organizes records in indexed or sequential flat files.

5. What is VSAM normalizer transformation?

The VSAM normalizer transformation is the source qualifier transformation for a COBOL source
definition. A COBOL source is a flat file that can contain multiple-occurring data and multiple types
of records in the same file.

6. What is pipeline normalizer transformation?

Pipeline normalizer transformation processes multiple-occurring data from relational tables or flat
files.

7. What is occurs clause and redefines clause in normalizer transformation?

 An occurs clause is specified when the source row has multiple-occurring columns.
 A redefines clause is specified when the source contains multiple record types that redefine
the same data area (different layouts of the same columns).

Rank Transformation

1. What is rank transformation?

A rank transformation is used to select top or bottom rank of data. This means, it
selects the largest or smallest numeric value in a port or group. Rank
transformation also selects the strings at the top or bottom of a session sort
order. Rank transformation is an active transformation.

2. What is rank cache?

The integration service compares each input row with the rows in the data cache; if the input row
out-ranks a cached row, the integration service replaces the cached row with the
input row. If you configure the rank transformation to rank across multiple groups,
the integration service ranks incrementally for each group it finds. The integration
service stores group information in the index cache and row data in the data cache.

3. What is RANKINDEX port?

The designer creates RANKINDEX port for each rank transformation. The
integration service uses the rank index port to store the ranking position for each
row in a group.

4. How do you specify the number of rows you want to rank in a rank
transformation?

In the rank transformation properties, there is an option 'Number of Ranks' for
specifying the number of rows you want to rank.

5. How to select either top or bottom ranking for a column?

In the rank transformation properties, there is an option 'Top/Bottom' for selecting
the top or bottom ranking for a column.

6. Can we specify ranking on more than one port?

No. We can rank the data based on only one port. In the Ports tab, you
have to check the R option to designate the port as a rank port, and this option
can be checked on only one port.

Joiner Transformation

1. What is a joiner transformation?

A joiner transformation joins two heterogeneous sources. You can also join data from the
same source. The joiner transformation joins sources that have at least one matching column. The
joiner uses a condition that matches one or more pairs of columns between the two sources.

2. How many joiner transformations are required to join n sources?

To join n sources n-1 joiner transformations are required.

3. What are the limitations of joiner transformation?

 You cannot use a joiner transformation when an input pipeline contains an update strategy
transformation.
 You cannot use a joiner if you connect a sequence generator transformation directly
before the joiner.

4. What are the different types of joins?

 Normal join: In a normal join, the integration service discards all the rows from the master
and detail source that do not match the join condition.
 Master outer join: A master outer join keeps all the rows of data from the detail source
and the matching rows from the master source. It discards the unmatched rows from the master
source.
 Detail outer join: A detail outer join keeps all the rows of data from the master source and
the matching rows from the detail source. It discards the unmatched rows from the detail
source.
 Full outer join: A full outer join keeps all rows of data from both the master and detail
sources.

5. What is joiner cache?

When the integration service processes a joiner transformation, it reads the rows from the master
source and builds the index and data caches. Then the integration service reads the detail
source and performs the join. In case of a sorted joiner, the integration service reads both sources
(master and detail) concurrently and builds the cache based on the master rows.

6. How to improve the performance of joiner transformation?

 Join sorted data whenever possible.


 For an unsorted Joiner transformation, designate the source with fewer rows as the
master source.
 For a sorted Joiner transformation, designate the source with fewer duplicate key values
as the master source.

7. Why joiner is a blocking transformation?

When the integration service processes an unsorted joiner transformation, it reads all master
rows before it reads the detail rows. To ensure it reads all master rows first, the
integration service blocks the detail source while it caches rows from the master source. Because it
blocks the detail source, the unsorted joiner is called a blocking transformation.

8. What are the settings used to configure the joiner transformation?

 Master and detail source


 Type of join
 Join condition

Router Transformation

1. What is a router transformation?

A router transformation is used to filter the rows in a mapping. Unlike a filter transformation, you can specify one
or more conditions in a router transformation. Router is an active transformation.

2. How to improve the performance of a session using router transformation?

Use a router transformation in a mapping instead of creating multiple filter transformations to
perform the same task. The router transformation is more efficient in this case. When you use a
router transformation in a mapping, the integration service processes the incoming data only
once. When you use multiple filter transformations, the integration service processes the
incoming data once for each transformation.

3. What are the different groups in router transformation?

The router transformation has the following types of groups:

 Input
 Output

4. How many types of output groups are there?

There are two types of output groups:

 User-defined group
 Default group

5. Where do you specify the filter conditions in the router transformation?

You can create the group filter conditions in the Groups tab using the expression editor.

6. Can you connect ports of two output groups from router transformation to a single target?

No. You cannot connect more than one output group to one target or a single input group
transformation.

Stored Procedure Transformation

1. What is a stored procedure?

A stored procedure is a precompiled collection of database procedural statements. Stored
procedures are stored and run within the database.

2. Give some examples where a stored procedure is used?

A stored procedure can be used to do the following tasks:

 Check the status of a target database before loading data into it.
 Determine if enough space exists in a database.
 Perform a specialized calculation.
 Drop and recreate indexes.

3. What is a connected stored procedure transformation?

The stored procedure transformation is connected to the other transformations in the mapping
pipeline.

4. In which scenarios a connected stored procedure transformation is used?

 Run a stored procedure every time a row passes through the mapping.
 Pass parameters to the stored procedure and receive multiple output parameters.

5. What is an unconnected stored procedure transformation?

The stored procedure transformation is not connected directly to the flow of the mapping. It either
runs before or after the session or is called by an expression in another transformation in the
mapping.

6. In which scenarios an unconnected stored procedure transformation is used?

 Run a stored procedure before or after a session


 Run a stored procedure once during a mapping, such as pre or post-session.
 Run a stored procedure based on data that passes through the mapping, such as when a
specific port does not contain a null value.
 Run nested stored procedures.
 Call a stored procedure multiple times within a mapping.

7. What are the options available to specify when the stored procedure transformation needs to
be run?

The following options describe when the stored procedure transformation runs:

 Normal: The stored procedure runs where the transformation exists in the mapping on a
row-by-row basis. This is useful for calling the stored procedure for each row of data that
passes through the mapping, such as running a calculation against an input port. Connected
stored procedures run only in normal mode.
 Pre-load of the Source: Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of data in
a temporary table.
 Post-load of the Source: After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
 Pre-load of the Target: Before the session sends data to the target, the stored procedure
runs. This is useful for verifying target tables or disk space on the target system.
 Post-load of the Target: After the session sends data to the target, the stored procedure
runs. This is useful for re-creating indexes on the database.

A connected stored procedure transformation runs only in Normal mode. An unconnected stored
procedure transformation can run in any of the above modes.

8. What is execution order in stored procedure transformation?

The order in which the Integration Service calls the stored procedure used in the transformation,
relative to any other stored procedures in the same mapping. Only used when the Stored
Procedure Type is set to anything except Normal and more than one stored procedure exists.

9. What is PROC_RESULT in stored procedure transformation?

PROC_RESULT is a system variable where the output of an unconnected stored procedure
transformation is assigned by default.

10. What are the parameter types in a stored procedure?

There are three types of parameters in a stored procedure:

 IN: Input passed to the stored procedure


 OUT: Output returned from the stored procedure
 INOUT: Defines the parameter as both input and output. Only Oracle supports this
parameter type.

Source Qualifier Transformation

1. What is a source qualifier transformation?

A source qualifier represents the rows that the integration service reads when it runs a session.
Source qualifier is an active transformation.

2. Why do you need a source qualifier transformation?

The source qualifier transformation converts the source data types into Informatica native data
types.

3. What are the different tasks a source qualifier can do?

 Join two or more tables originating from the same source database (homogeneous sources).
 Filter the rows.
 Sort the data.
 Select distinct values from the source.
 Create a custom query.
 Specify pre-SQL and post-SQL.

4. What is the default join in source qualifier transformation?

The source qualifier transformation joins the tables based on the primary key-foreign key
relationship.

5. How to create a custom join in source qualifier transformation?

When there is no primary key-foreign key relationship between the tables, you can specify a
custom join using the 'user-defined join' option in the properties tab of source qualifier.

6. How to join heterogeneous sources and flat files?

Use joiner transformation to join heterogeneous sources and flat files

7. How do you configure a source qualifier transformation?

 SQL Query

 User-Defined Join
 Source Filter
 Number of Sorted Ports
 Select Distinct
 Pre-SQL
 Post-SQL

Sequence Generator Transformation

1. What is a sequence generator transformation?

A sequence generator transformation generates numeric values. The sequence generator
transformation is a passive transformation.

2. What is the use of a sequence generator transformation?

A sequence generator is used to create unique primary key values, replace missing primary key
values or cycle through a sequential range of numbers.

3. What are the ports in sequence generator transformation?

A sequence generator contains two output ports. They are CURRVAL and NEXTVAL.

4. What is the maximum value that a sequence generator can generate?

The maximum value is 9,223,372,036,854,775,807

5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be the output
values of these ports?

The output values are


NEXTVAL  CURRVAL
1        2
2        3
3        4
4        5
5        6

6. What will be the output value, if you connect only CURRVAL to the target without connecting
NEXTVAL?

The integration service passes a constant value for each row.

7. What will be the value of CURRVAL in a sequence generator transformation?

CURRVAL is the sum of "NEXTVAL" and "Increment By" Value.

8. What is the number of cached values set to default for a sequence generator transformation?

For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.

9. How do you configure a sequence generator transformation?

The following properties need to be configured for a sequence generator transformation:

 Start Value
 Increment By
 End Value
 Current Value
 Cycle
 Number of Cached Values

Lookup Transformation

1. What is a lookup transformation?


A lookup transformation is used to look up data in a flat file, relational table, view or synonym.

2. What are the tasks of a lookup transformation?


The lookup transformation is used to perform the following tasks:

 Get a related value: Retrieve a value from the lookup table based on a value in the
source.
 Perform a calculation: Retrieve a value from a lookup table and use it in a calculation.
 Update slowly changing dimension tables: Determine whether rows exist in a target.

3. How do you configure a lookup transformation?


Configure the lookup transformation to perform the following types of lookups:

 Relational or flat file lookup


 Pipeline lookup
 Connected or unconnected lookup
 Cached or uncached lookup

4. What is a pipeline lookup transformation?


A pipeline lookup transformation is used to perform a lookup on application sources such as JMS,
MSMQ or SAP. A pipeline lookup transformation has a source qualifier as the lookup source.

5. What is connected and unconnected lookup transformation?

 A connected lookup transformation is connected to the other transformations in the mapping
pipeline. It receives source data, performs a lookup and returns data to the pipeline.
 An unconnected lookup transformation is not connected to the other transformations in
the mapping pipeline. A transformation in the pipeline calls the unconnected lookup with a :LKP
expression.

6. What are the differences between connected and unconnected lookup transformation?

 Connected lookup transformation receives input values directly from the pipeline.
Unconnected lookup transformation receives input values from the result of a :LKP expression
in another transformation.
 Connected lookup transformation can be configured as dynamic or static cache.
Unconnected lookup transformation can be configured only as static cache.

 Connected lookup transformation can return multiple columns from the same row or
insert into the dynamic lookup cache. Unconnected lookup transformation can return one
column from each row.
 If there is no match for the lookup condition, connected lookup transformation returns
default value for all output ports. If you configure dynamic caching, the Integration Service
inserts rows into the cache or leaves it unchanged. If there is no match for the lookup condition,
the unconnected lookup transformation returns null.
 In a connected lookup transformation, the cache includes the lookup source columns in
the lookup condition and the lookup source columns that are output ports. In an unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup condition and the
lookup/return port.
 Connected lookup transformation passes multiple output values to another
transformation. Unconnected lookup transformation passes one output value to another
transformation.
 Connected lookup transformation supports user-defined default values. Unconnected lookup
transformation does not support user-defined default values.

7. How do you handle multiple matches in a lookup transformation? Or, what is "Lookup Policy on
Multiple Match"?

The "Lookup Policy on Multiple Match" option determines which row the lookup
transformation returns when it finds multiple rows that match the lookup condition. You can configure
the lookup to return the first or last row, any matching row, or to report an error.

8. What is "Output Old Value on Update"?


This option is used when dynamic cache is enabled. When this option is enabled, the integration
service outputs old values out of the lookup/output ports. When the Integration Service updates a
row in the cache, it outputs the value that existed in the lookup cache before it updated the row
based on the input data. When the Integration Service inserts a new row in the cache, it outputs
null values. When you disable this property, the Integration Service outputs the same values out
of the lookup/output and input/output ports.

9. What is "Insert Else Update" and "Update Else Insert"?


These options are used when dynamic cache is enabled.

 Insert Else Update option applies to rows entering the lookup transformation with the row
type of insert. When this option is enabled, the integration service inserts new rows in the cache
and updates existing rows. When it is disabled, the Integration Service does not update existing
rows.
 Update Else Insert option applies to rows entering the lookup transformation with the row
type of update. When this option is enabled, the Integration Service updates existing rows and
inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.

10. What are the options available to configure a lookup cache?


The following options can be used to configure a lookup cache:

 Persistent cache
 Recache from lookup source
 Static cache
 Dynamic cache
 Shared Cache
 Pre-build lookup cache

11. What is a cached lookup transformation and uncached lookup transformation?

 Cached lookup transformation: The Integration Service builds a cache in memory when it
processes the first row of data in a cached Lookup transformation. The Integration Service
stores condition values in the index cache and output values in the data cache. The Integration
Service queries the cache for each row that enters the transformation.
 Uncached lookup transformation: For each row that enters the lookup transformation, the
Integration Service queries the lookup source and returns a value. The integration service does
not build a cache.

12. How does the integration service build the caches for a connected lookup transformation?
The Integration Service builds the lookup caches for a connected lookup transformation in the
following ways:

 Sequential cache: The Integration Service builds lookup caches sequentially. The
Integration Service builds the cache in memory when it processes the first row of the data in a
cached lookup transformation.
 Concurrent caches: The Integration Service builds lookup caches concurrently. It does
not need to wait for data to reach the Lookup transformation.

13. How does the integration service build the caches for an unconnected lookup transformation?
The Integration Service builds caches for unconnected lookup transformations sequentially.

14. What is a dynamic cache?


The dynamic cache represents the data in the target. The Integration Service builds the cache
when it processes the first lookup request. It queries the cache based on the lookup condition for
each row that passes into the transformation. The Integration Service updates the lookup cache
as it passes rows to the target. The integration service either inserts the row in the cache or
updates the row in the cache or makes no change to the cache.

15. When you use a dynamic cache, do you need to associate each lookup port with the input
port?
Yes. You need to associate each lookup/output port with the input/output port or a sequence ID.
The Integration Service uses the data in the associated port to insert or update rows in the
lookup cache.

16. What are the different values returned by NewLookupRow port?


The different values are

 0 - Integration Service does not update or insert the row in the cache.
 1 - Integration Service inserts the row into the cache.
 2 - Integration Service updates the row in the cache.

17. What is a persistent cache?


If the lookup source does not change between session runs, then you can improve the
performance by creating a persistent cache for the source. When a session runs for the first time,
the integration service creates the cache files and saves them to disk instead of deleting them.
The next time when the session runs, the integration service builds the memory from the cache
file.

18. What is a shared cache?


You can configure multiple Lookup transformations in a mapping to share a single lookup cache.
The Integration Service builds the cache when it processes the first Lookup transformation. It
uses the same cache to perform lookups for subsequent Lookup transformations that share the
cache.

19. What is unnamed cache and named cache?

 Unnamed cache: When Lookup transformations in a mapping have compatible caching
structures, the Integration Service shares the cache by default. You can only share static
unnamed caches.
 Named cache: Use a persistent named cache when you want to share a cache file
across mappings or share a dynamic and a static cache. The caching structures must match or
be compatible with a named cache. You can share static and dynamic named caches.

20. How do you improve the performance of lookup transformation?

 Create an index on the columns used in the lookup condition.
 Place conditions with an equality operator first.
 Cache small lookup tables.
 Join tables in the database: If the source and the lookup table are in the same database,
join the tables in the database rather than using a lookup transformation.
 Use a persistent cache for static lookups.
 Avoid ORDER BY on all columns in the lookup source. Explicitly specify the ORDER BY
clause on the required columns.
 For flat file lookups, provide sorted files as the lookup source.

Transaction Control Transformation

1. What is a transaction control transformation?

A transaction is a set of rows bound by a commit or rollback of rows. The transaction control
transformation is used to commit or rollback a group of rows.

2. What is the commit type if you have a transaction control transformation in the mapping?

The commit type is "user-defined".

3. What are the different transaction levels available in transaction control transformation?
The following are the transaction levels or built-in variables:

 TC_CONTINUE_TRANSACTION: The Integration Service does not perform any
transaction change for this row. This is the default value of the expression.
 TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new
transaction.
 TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the transaction, and begins a new transaction. The current row is in the
committed transaction.
 TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in
the new transaction.

 TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back the transaction, and begins a new transaction. The current row is in the rolled
back transaction.
 Basic Commands

 1. # system administrator prompt


$ user working prompt

 2. $pwd: It displays the present working directory (for example, if you log in as dwhlabs):

 /home/dwhlabs

 3. $logname: It displays the current username.

 dwhlabs

 4. $clear: It clears the screen.

 5. $date: It displays the current system date and time.

 Day Month date hour:min:sec standard name year


FRI MAY 23 10:50:30 IST 2008

 6. $banner: It prints a message in large letters.

 $banner "tecno"

 7. $exit: To logout from the current user

 8. $finger: It displays complete information about all the users who are logged in

 9. $who: To display information about all users who have logged into the system
currently (login name, terminal number, date and time).

 10. $who am i: It displays username, terminal number, date and time at which you logged
into the system.

 11. $cal: It displays the calendar. By default it shows the current month; you can also pass a year, or a month and year:

 $cal year
$cal month year

 12. $ls: This command lists files and directories, excluding hidden files.

 Administrator Commands

 1. # system administrator prompt
$ user working prompt

 2. #useradd: To add a new user account.

 #useradd user1

 3. #passwd: To set the password for a particular user account.

 #passwd user1
Enter password:
Retype password:
 System Run Levels

 #init: To change the system run levels.

 #init 0: To shut down the system.


#init 1 or s: To bring the system to single user mode.
#init 2: To bring the system to multi user mode with no resources shared.
#init 3: To bring the system to multi user mode with resources shared.
#init 6: To reboot the system to the default run level.
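 Before changing the run level, you can check the current one with who -r (a hedged sketch; the exact output format varies between systems):

 $who -r
.  run-level 3  May 23 10:50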

 ls command options

 $ls: This command is used to display list of files and directories.

 $ls -x: It displays files and directories width-wise (sorted across the rows).

 $ls | pg: It displays the list of files and directories one page at a time.

 $ls -a: It displays all files and directories, including hidden files (names starting with a dot) and the . and .. entries.

 $ls -F: It displays files and directories with an indicator character appended to each name:

 a1
a2
sample/
letter*
notes@

 / directory
* executable file
@ symbolic link file

 $ls -r: It displays files and directories in reverse order (descending order)

 $ls -R: It displays files and directories recursively.

 $ls -t: It displays files and directories sorted by modification time, most recent first.

 $ls -i: It displays files and directories along with their inode numbers.

 $ls -l: It displays files and directories in long list format.


file type and permissions   links   owner   group   size in bytes   date   filename

-rw-r--r--   1   Tec   group   560   Nov  3   01:30   sample

drwxr-xr-x   2   Tec   group   430   Mar  7   07:30   student

 Different types of files:


- ordinary file
d directory file
b block special file
c character special file
l symbolic link file

 Wild card characters or Meta characters



 Wild card characters: * ? [] - . ^ $

Wild card character   Description

*   It matches zero or more characters.

?   It matches any single character.

[]  It matches any single character in the given list.

-   It matches any single character in the given range (used inside []).

.   It matches any single character except the newline character (regular expressions, e.g. with grep).

^   It matches lines which start with the given pattern (regular expressions).

$   It matches lines which end with the given pattern (regular expressions).


 Examples:

 * wild card

 $ ls t* :  It displays all files starting with 't' character.

 $ ls *s: It displays all files ending with 's'

 $ ls b*k: It displays all files starting with b and ending with k


 ? wild card

 $ ls t??: It displays all files starting with 't' whose names are exactly 3 characters long.

 $ ls ?ram: It displays all files starting with any single character followed by 'ram'; the file name
must be 4 characters long.



 - wild card

 $ ls [a-z]ra: It displays all files starting with any character from a to z and ending with
'ra'. The length of the file name must be 3 characters.


 [ ] wild card

 $ ls [aeiou]ra: It displays all files starting with 'a' or 'e' or 'i' or 'o' or 'u' character and ending
with 'ra'.

 $ ls [a-b]*: It displays all files starting with any character from a to b.


 . wild card (regular expression, used with grep)

 $ grep '^t..u$' filename: It displays all lines that start with 't', end with 'u' and are exactly
4 characters long. The '.' never matches the newline character.


 ^ and $ wild cards (regular expressions, used with grep)

 $ grep 'sta$' filename: It displays all lines which end with 'sta'; the line can be any
number of characters long. Similarly, $ grep '^sta' filename displays all lines which start with 'sta'.
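 As a quick illustration of the regular-expression characters above, here is a small hedged sketch using a hypothetical file named fruits (the file name and its contents are only for demonstration):

 $printf "star\nsta\ncosta\nstable\n" > fruits
$grep '^sta' fruits      # lines starting with 'sta': star, sta, stable
$grep 'sta$' fruits      # lines ending with 'sta': sta, costa
$grep '^s..r$' fruits    # 4-character lines: 's', any two characters, then 'r': star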

 Filter Commands: grep, fgrep and egrep:
 $ grep: Globally search for a Regular Expression and Print it.

 This command is used to search for a particular pattern (a single string) in a file or directory, including
regular expressions (patterns which use wild card characters).

 Syntax - $grep pattern filename

 Ex: $grep dwhlabs dwh (It displays all the lines in the file dwh which contain the string 'dwhlabs') -
Character Pattern
Ex: $grep "dwhlabs website" dwh - character pattern
Ex: $grep '\<dwhlabs\>' dwh (It displays all the lines in the file which contain the exact word 'dwhlabs') -
Word Pattern
Ex: $grep '^UNIX$' language (It displays all the lines in the file which consist of the single word 'UNIX') -
Line Pattern

 $grep options

 $grep -i pattern filename (ignores case while matching)


$grep -c pattern filename (It displays only the total count of matching lines)
$grep -n pattern filename (It displays each matching line along with its line number)
$grep -l pattern filename (It displays only the names of the files that contain the pattern)
$grep -v pattern filename (It displays the lines that do not match the pattern)
(See the example after this list.)
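 A minimal sketch of these options, assuming a hypothetical file dwh that contains the string dwhlabs on some of its lines:

 $grep -i DWHLABS dwh      # matches dwhlabs, DWHLABS, DwhLabs, ...
$grep -c dwhlabs dwh      # prints only the number of matching lines
$grep -n dwhlabs dwh      # prints each matching line with its line number
$grep -l dwhlabs *        # prints only the names of files containing the pattern
$grep -v dwhlabs dwh      # prints the lines that do not contain the pattern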


 Q) How to remove blank lines in a file?

 A) $grep -v "^$" file1 > tmp && mv tmp file1

 Explanation: Here we first select the non-blank lines in file1 with grep -v and redirect the result to tmp
(a temporary file). Once that succeeds, we rename tmp back to file1. Now check file1: you will
not find any blank lines. (Note that a pipe is not suitable here, because mv takes file names as
arguments rather than reading from standard input.)
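 If sed is available, the same clean-up can be done in a single step; this is just an alternative sketch, not part of the original answer:

 $sed '/^$/d' file1 > tmp && mv tmp file1     # delete empty lines, then replace the original file

 With GNU sed you can also edit the file in place: $sed -i '/^$/d' file1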


 fgrep:

 This command is used to search for one or more fixed strings, but not regular expressions.

 Syntax - $fgrep "pattern1


>pattern2
>....
>patternn " filename
Ex: $fgrep "unix
>c++
>Data Warehouse" stud


 egrep: Extended grep

 This command is used to search for single or multiple patterns and also regular expressions.
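 For example, to search the hypothetical stud file used above for several patterns at once (the | alternation is an extended regular-expression feature; grep -E is the modern equivalent of egrep):

 $egrep "unix|c\+\+|Data Warehouse" stud     # lines containing any of the three patterns
$grep -E "unix|c\+\+" stud                  # same idea with grep -E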

 Filter Commands: cut, sort and uniq:


 Filter Commands:

 $ cut: This command is used to retrieve the required fields or characters from a file.

100 Rakesh UNIX HYD

101 Ramani C++ CHE

102 Prabhu C BAN

103 Jyosh DWH CHE

 Syntax - $cut -f 1-3 filename

 Ex. $cut -f 1-3 employee (the first three fields of the above file)

100 Rakesh UNIX

101 Ramani C++

102 Prabhu C

103 Jyosh DWH

 Ex. $cut -c 1-8 employee (first 8 characters)

100 Rake

101 Rama

102 Prab

103 Jyos

 Delimeters:

Default ( Tab )

: , ; * _

 $cut options

 $cut -f 1-3 employee


$cut -c 1-8 employee
$cut -d ',' -f 1-3 employee
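 A short hedged example of the -d option, assuming a hypothetical comma-separated file emp.csv (cut's default delimiter is the tab character, so -d is needed for anything else):

 $cat emp.csv
100,Rakesh,UNIX,HYD
101,Ramani,C++,CHE

 $cut -d ',' -f 2 emp.csv        # second field only: Rakesh, Ramani
$cut -d ',' -f 1,3 emp.csv      # first and third fields: 100,UNIX and 101,C++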

 $ sort: This command is used to sort the records of a file in ascending or descending order.

 $sort Options

 $sort -r filename (reverse)


$sort -u filename (unique records)
$sort -n filename (sorts numerically)


 How to sort data field wise?

 Syntax - $sort -k field_number filename


Ex: $sort -k 2 employee (sorts on the second field; the older syntax $sort +1 -2 employee is equivalent)
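 A minimal sketch of field-wise sorting with the -t (field separator) and -k (sort key) options, again on the hypothetical comma-separated file emp.csv:

 $sort -t ',' -k 2 emp.csv       # sort on the second field (name)
$sort -t ',' -k 1 -n emp.csv    # numeric sort on the first field (id)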


 $ uniq: This command is used to filter out duplicate lines from a file.
Note: The file must be in sorted order, because uniq only removes adjacent duplicate lines.

 Syntax - $uniq filename


Ex: $uniq employee

 $uniq Options

 Ex: $uniq -u employee (displays only the lines that are not duplicated)


Ex: $uniq -d employee (displays only the duplicated lines, once each)


 Question: How to remove duplicate lines in a file?

 Answer: $uniq employee>temp


$mv temp employee
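 Because uniq only collapses adjacent duplicate lines, a common variant (just a hedged sketch) is to sort first and pipe the result through uniq, or use sort -u directly:

 $sort employee | uniq > temp && mv temp employee
$sort -u employee     # same result, printed to the screen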

 $ tr: Translates the characters in string1 read from stdin into the corresponding characters in string2 on stdout.

$ sed: This command is used for editing files from a script or from the command line (stream editor).
$ head: Displays the first 'N' lines of a file.

$ tail: Displays the last 'N' lines of a file.

$ cmp: Compares two files and reports the first place where they differ.

$ diff: Compares two files and shows the differences line by line.

$ wc: Displays the line, word and character counts for a file.
(See the examples after this list.)
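 A few hedged one-line examples of the commands above, assuming two small hypothetical text files f1 and f2:

 $tr 'a-z' 'A-Z' < f1       # print f1 with lowercase letters translated to uppercase
$sed 's/unix/UNIX/g' f1    # replace every occurrence of unix with UNIX
$head -n 5 f1              # first 5 lines of f1
$tail -n 5 f1              # last 5 lines of f1
$cmp f1 f2                 # report the first byte/line where f1 and f2 differ
$diff f1 f2                # show the line-by-line differences
$wc f1                     # line, word and character counts for f1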

