Interview Questions
I work in an onsite-offshore model, so we receive our tasks from the onsite team.
As a developer, I first need to understand the physical data model, i.e. the dimensions and facts and their relationships, as well as the functional specification prepared by the Business Analyst that describes the business requirement.
I am involved in preparing the source-to-target mapping sheet (tech spec), which tells us what the source and target are, which source column maps to which target column, and what the business logic is. This document gives a clear picture for development.
Creating Informatica mappings, sessions and workflows using different transformations to implement the business logic.
Preparing unit test cases as per the business requirement is also one of my responsibilities, and I carry out unit testing for the mappings I develop.
I also do source-code reviews for the mappings and workflows developed by my team members.
I am involved in preparing the deployment plan, which lists the mappings and workflows to be migrated; based on this, the deployment team can migrate the code from one environment to another.
Once the code is rolled out to production, we work with the production support team for two weeks, during which we give knowledge transfer (KT) in parallel. We also prepare the KT document for the production team.
Manufacturing or Supply Chain
Currently I am working on the XXX project for the YYY client. YYY does not have its own manufacturing unit. Before each quarter ends, the business calls for quotations from its primary supply channels; this process is called an RFQ (Request for Quotation). Once the business creates an RFQ, a notification automatically goes to the supply channels, and they send back their quoted values, which we call the supplier response. The business then negotiates with the supply channels for the best deal and approves the RFQs.
All these activities (creating the RFQ, capturing the supplier response, approving the RFQ, etc.) are performed in Oracle Apps, which is the front-end source application. This data is stored in the OLTP system, so the OLTP contains all the RFQ, supplier response and approval status data.
We have some Oracle jobs running between the OLTP and the ODS which replicate the OLTP data to the ODS. It is designed in such a way that any transaction entering the OLTP is immediately reflected in the ODS.
We have a staging area where we load the entire ODS data into staging tables. For this we have created ETL Informatica mappings that truncate and reload the staging tables on each session run. Before loading the staging tables we drop the indexes, and after the bulk load we recreate them using stored procedures.
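As an illustration only (the table, index and procedure names here are made up, not the project's actual objects), the pre- and post-load steps around such a bulk staging load look roughly like this:

-- Pre-load: drop the index so the bulk load is not slowed by index maintenance
DROP INDEX stg_rfq_response_idx;
-- The session then truncates and reloads the staging table
TRUNCATE TABLE stg_rfq_response;
-- Post-load: recreate the index, typically from a stored procedure called by the session
CREATE OR REPLACE PROCEDURE recreate_stg_indexes AS
BEGIN
  EXECUTE IMMEDIATE
    'CREATE INDEX stg_rfq_response_idx ON stg_rfq_response (rfq_id, supplier_id)';
END;
/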
Then we extract all this data from staging and load it into the dimensions and facts. On top of the dims and facts we have created materialized views as per the report requirements.
Finally, the reports pull data directly from the materialized views. The performance of these reports and dashboards is always good because we are not doing any calculations at the reporting level. The dashboards and reports are used for analysis, for example: how many RFQs were created, how many were approved, how many received responses from the supply channels, which approval manager an RFQ is pending with, and what the past feedback of each supply channel is.
In the present system they do not have a BI design; they use a manual process of exporting SQL query output to Excel sheets and preparing pie charts with macros. In the new system we are providing BI-style reports such as drill-downs, drill-ups, pie charts, graphs, detail reports and dashboards.
Biz Reports
We are replacing the exact same functionality of webMethods using Informatica.
Generally, once production of a product is complete it is sent to the delivery centers (DCs); from the DCs it is shipped to the supply channels or distributors, and from there it goes to the end customers.
Before starting production of any product, business approval is essential for the production unit. Before taking that decision, the business has to analyse the existing stock, previous sales history, future orders and so on. For this they need BI-style reports (drill-downs and drill-ups) created in BO, showing what is in stock in each delivery center, the shipping status, the previous sales history, and the expected customer orders for each product across all the delivery centers. The business buys these details from a third-party IMS company.
IMS collects information from the different distributors and delivery centers, such as the on-hand stock, the shipping stock, how many orders are in hand for the next quarter, and the previous sales history for specific products.
We have a staging area where we load the entire IMS data into staging tables. For this we have created ETL Informatica mappings that truncate and reload the staging tables on each session run. Before loading the staging tables we drop the indexes, and after the bulk load we recreate them using stored procedures. After the staging load completes, we load the data into our dims and facts. On top of the data model we have created materialized views that hold the complete reporting calculations, and the BO reports pull the data from the materialized views with fewer joins and less aggregation, so report performance is good.
ORACLE
1) I am good at SQL; I write the source qualifier queries for Informatica mappings as per the business requirement. (OR) I write SQL queries in the source qualifier for Informatica mappings as per the business requirement.
2) I am comfortable working with joins, correlated queries, sub-queries, analyzing tables, inline views and materialized views.
3) As an Informatica developer I did not get much opportunity to work on the PL/SQL side, but I worked on a PL/SQL-to-Informatica migration project, so I do have exposure to procedures, functions and triggers.
Materialized View
A materialized view is very useful for reporting. Without it, a report fetches data directly from the dimensions and facts, which is slow because it involves multiple joins. If we put the report logic into a materialized view, the report can fetch the data directly from the materialized view, so we avoid the multiple joins at report run time.
The materialized view must be refreshed regularly; the report then simply performs a select on the materialized view.
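A minimal sketch of such a materialized view (the table and column names are illustrative, not the actual project model):

CREATE MATERIALIZED VIEW mv_rfq_summary
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT d.quarter,
       s.supplier_name,
       COUNT(f.rfq_id) AS total_rfqs,
       SUM(CASE WHEN f.status = 'APPROVED' THEN 1 ELSE 0 END) AS approved_rfqs
FROM   fact_rfq f
JOIN   dim_date d     ON d.date_key     = f.date_key
JOIN   dim_supplier s ON s.supplier_key = f.supplier_key
GROUP BY d.quarter, s.supplier_name;

-- Refresh after each load; reports then select straight from the MV with no joins
EXEC DBMS_MVIEW.REFRESH('MV_RFQ_SUMMARY');
SELECT * FROM mv_rfq_summary WHERE quarter = '2012-Q1';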
Difference between Stored Procedure and Function
A stored procedure is a pre-compiled statement; a function is not a pre-compiled statement.
A stored procedure can accept input and output arguments and need not return a value; a function takes arguments but must return exactly one value.
Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
A stored procedure cannot be invoked from SQL statements (e.g. SELECT); a function can be invoked from SQL statements (e.g. SELECT).
A stored procedure can affect the state of the database using commit; a function cannot affect the state of the database.
A stored procedure is stored as pseudo-code in the database, i.e. in compiled form; a function is parsed and compiled at runtime.
Rowid vs Rownum
Rowid is an Oracle internal ID that is allocated every time a new record is inserted into a table; this ID is unique and cannot be changed by the user. Rownum is a row number returned by a select statement.
Rowid is permanent; rownum is temporary.
Rowid is a globally unique identifier for a row in a database: it is created when the row is inserted into the table and destroyed when the row is removed. The rownum pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
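A quick illustration of the two pseudocolumns on the standard EMP table:

SELECT ROWID, ROWNUM, empno, ename
FROM   emp
WHERE  ROWNUM <= 3;
-- ROWID is a physical address (e.g. AAAR3sAAEAAAACXAAA) that stays with the row;
-- ROWNUM is simply 1, 2, 3 ... assigned in the order this query returns the rows.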
Joiner vs Lookup
In a joiner, on multiple matches it returns all matching records; in a lookup it returns either the first record, the last record, any matching value, or an error value.
In a joiner we cannot configure persistent cache, shared cache, uncached or dynamic cache, whereas in a lookup we can.
We cannot override the query in a joiner; we can override the query in a lookup to fetch data from multiple tables.
We cannot apply any filter along with the join condition in a joiner transformation; in a lookup we can apply filters along with the lookup condition using the lookup query override.
We cannot use relational operators (<, >, <= and so on) in a joiner transformation, whereas in a lookup we can.
What is the difference between source qualifier and lookup?
To join multiple homogeneous tables in a source qualifier we only need to override the SQL query, whereas to join N sources with joiner transformations we need N-1 joiners. A lookup, on the other hand, returns a related value based on a lookup condition and can be cached.
What is the difference between Stop and Abort?
Stop:
You choose to stop the workflow or task in the Workflow Monitor or through pmcmd. The Integration Service stops processing the task and all other tasks in its path, but it continues running concurrent tasks such as backend stored procedures.
Abort:
You choose to abort the workflow or task in the Workflow Monitor or through pmcmd. The Integration Service kills the DTM process and aborts the task.
How do you find and delete duplicate records in a table?
Select empno, count(*) from EMP group by empno having count(*) > 1;
Delete from EMP where rowid not in (select max(rowid) from EMP group by empno);
What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?
If a query is taking a long time, first I run it through EXPLAIN PLAN; the explain plan process stores its data in the PLAN_TABLE. It gives us the execution plan of the query, e.g. whether the query is using the relevant indexes on the joining columns or whether indexes to support the query are missing.
If the joining columns do not have indexes, the query does a full table scan and the cost is high. In that case we create indexes on the joining columns and rerun the query, which should give better performance. We also need to analyze the tables if they were analyzed long back; the ANALYZE statement can be used to gather statistics for a specific table, index or cluster.
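For example (illustrative table names), the plan can be generated and read like this, and statistics gathered if they are stale:

EXPLAIN PLAN FOR
SELECT o.order_id, c.cust_name
FROM   orders o
JOIN   customers c ON c.cust_id = o.cust_id;

-- The plan rows go to PLAN_TABLE; display them with DBMS_XPLAN
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Gather statistics if the tables were analyzed long back
ANALYZE TABLE orders COMPUTE STATISTICS;
-- (DBMS_STATS is the newer, preferred way)
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDERS');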
If we still have a performance issue, we use HINTS. A hint is nothing but a clue to the optimizer. We can use hints such as:
ALL_ROWS
One of the hints that invokes the cost-based optimizer. ALL_ROWS is usually used for batch processing or data warehousing systems.
FIRST_ROWS
One of the hints that invokes the cost-based optimizer. FIRST_ROWS is usually used for OLTP systems.
CHOOSE
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find the corresponding records. It is therefore not suitable for < or > join conditions.
/*+ use_hash */
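Hints go immediately after the SELECT keyword; for example (illustrative queries):

-- Force a hash join between the two tables
SELECT /*+ USE_HASH(o c) */ o.order_id, c.cust_name
FROM   orders o, customers c
WHERE  o.cust_id = c.cust_id;

-- Ask the optimizer for the fastest delivery of the first rows (typical OLTP hint)
SELECT /*+ FIRST_ROWS(10) */ * FROM orders WHERE cust_id = 101;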
DWH Concepts
OLTP vs DWH/DSS/OLAP
OLTP maintains only current information; OLAP contains the full history.
OLTP is a normalized structure; OLAP is a de-normalized structure.
OLTP is a volatile system; OLAP is a non-volatile system.
OLTP cannot be used for reporting purposes; OLAP is a pure reporting system.
Since OLTP is normalized, it requires multiple joins to fetch the data; OLAP does not require as many joins to fetch the data.
OLTP is not time variant; OLAP is time variant.
OLTP is a pure relational model; OLAP is a dimensional model.
If the source and target databases are different and the target table volume is high (some millions of records), then without a staging table we would have to design the Informatica mapping with a lookup to find out whether each record exists in the target. Since the target has huge volumes, building the lookup cache is costly and hits performance.
If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert versus update; this approach gives good performance and avoids a full table scan of the target to determine inserts/updates.
We can also create indexes on the staging tables; since these tables are designed for this specific application, they will not impact any other schemas/users.
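A sketch of such a source-qualifier override, assuming illustrative staging and target table names; the outer join flags each row as an insert or an update in a single pass:

SELECT s.order_id,
       s.order_amt,
       CASE WHEN t.order_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
FROM   stg_orders s
LEFT OUTER JOIN tgt_orders t
       ON t.order_id = s.order_id;
-- load_flag drives the router/update strategy: 'I' rows are inserted, 'U' rows are updated.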
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data
is correct and accurate. During data cleansing, records are checked for accuracy and
consistency.
ODS:
My understanding of ODS is that it is a replica of the OLTP system; the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading targets. Hence it is a mandatory requirement for every warehouse.
ODS is the replication of OLTP and is usually refreshed through some Oracle jobs.
A primary key is a special constraint on a column or set of columns. A primary key constraint
ensures that the column(s) so designated have no NULL values, and that every value is
unique. Physically, a primary key is implemented by the database system using a unique
index, and all the columns in the primary key must have been declared NOT NULL. A table
may have only one primary key, but it may be composite (consist of more than one column).
A surrogate key is any column or set of columns that can be declared as the primary key
instead of a "real" or natural key. Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate keys. So a surrogate is a
candidate key. A table could actually have more than one surrogate key, although this
would be unusual. The most common type of surrogate key is an incrementing integer, such
as an auto increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
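For example, in Oracle a surrogate key is usually populated from a sequence (names are illustrative):

CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1 CACHE 1000;

-- The surrogate key comes from the sequence; the natural key is kept as an ordinary column
INSERT INTO dim_customer (customer_sk, customer_natural_id, customer_name)
VALUES (customer_sk_seq.NEXTVAL, 'CUST-00123', 'Acme Ltd');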
Have you done any performance tuning in Informatica?
1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. There was no transformation inside the mapping; it was a 1-to-1 mapping, so there was nothing to optimize at the mapping level. I created session partitions using key range on the effective date column. It improved performance a lot: instead of 4 hours it ran in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup transformation, if we can do an outer join in the source qualifier query override, it gives good performance when both the lookup table and the source are in the same database. If the lookup table has huge volumes, creating the cache is costly.
3) Optimizing the mapping by using fewer transformations also always gives good performance.
4) If any mapping is taking a long time to execute, first we need to look at the source and target statistics in the monitor for the throughput, and find out where exactly the bottleneck is by looking at the busy percentage in the session log to see which transformation is taking more time. If the source query is the bottleneck, the session log will show "query issued to database" near the end, which means there is a performance issue in the source query and we need to tune that query.
1) We write shell scripts that navigate to the Informatica server directory, for example:
cd /pmar/informatica/pc/pmserver/
2) If we have to process flat files through Informatica but those files exist on a remote server, then we have to write a script to FTP them onto the Informatica server before starting to process them.
3) File watch means that if an indicator file is available in the specified location, we start our Informatica jobs; otherwise we send an email notification using the mailx command saying that the previous jobs did not complete successfully.
4) Using a shell script, we update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes, I can Google it, get the solution and implement it.
If we copy source definitions, target definitions or mapplets from a Shared folder to any other folder, they become shortcuts.
Let's assume we have imported some source and target definitions into a shared folder and we are using those definitions as shortcuts in mappings in another folder.
If any modification occurs in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we re-import the definition into the shared folder, those changes are automatically reflected in all folders/mappings wherever we used those source or target definitions.
How do you concatenate the Ename values of rows having the same EmpNo into a single target row?
Source:
Ename EmpNo
stev 100
methew 100
john 101
tom 101
Target:
Ename EmpNo
Stevmethew 100
John tom 101
Ans:
Approach 1: If the record does not exist, insert it into the target. If it already exists, get the corresponding Ename value from the lookup, concatenate it in an expression with the current Ename value, and then update the target Ename column using an update strategy.
Approach 2: Sort the data in the source qualifier on the EmpNo column, then use an expression to store the previous record's information in variable ports. After that, use a router: if the EmpNo comes for the first time, insert the record; if it has already been inserted, update Ename with the concatenated value of the previous and current names.
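For comparison only, not the Informatica mapping described above: the same target result can be produced in Oracle SQL (11gR2+) with LISTAGG, assuming a hypothetical source table src_emp:

SELECT empno,
       LISTAGG(ename, ' ') WITHIN GROUP (ORDER BY ename) AS ename
FROM   src_emp
GROUP  BY empno;
-- Returns one row per EmpNo with the names concatenated, e.g. 'john tom' for 101.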
How to send unique (distinct) records into one target and duplicates into another target?
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2:
Ename EmpNo
Stev 100
Ans:
Approach 1: If the record does not exist, insert it into Target_1. If it already exists, send it to Target_2 using a router.
Approach 2: Sort the data in the source qualifier on the EmpNo column, then use an expression to store the previous record's information in variable ports. After that, use a router to route the data: if the EmpNo comes for the first time, send it to Target_1; if it has already been inserted, send it to Target_2.
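Purely as an SQL illustration of the split (not the mapping logic itself), ROW_NUMBER can mark the first occurrence versus the duplicates on a hypothetical source table src_emp:

SELECT ename, empno,
       ROW_NUMBER() OVER (PARTITION BY empno, UPPER(ename) ORDER BY ename) AS rn
FROM   src_emp;
-- rn = 1 rows correspond to Target_1 (distinct); rn > 1 rows correspond to Target_2 (duplicates).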
I want to generate a separate file for every employee (one file per name). It has to generate 5 flat files, and the name of each flat file is the corresponding employee name; that is the requirement (mapping and source screenshots omitted).
This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there. We can achieve it with the Transaction Control transformation and the special "FileName" port in the target file.
In order to generate the target file names from the mapping, we should make use of the special "FileName" port in the target file. You can't create this special port from the usual New Port button; there is a special button with the label "F" on it at the right-most corner of the target flat file when viewed in the Target Designer.
When you have different sets of input data with different target files to be created, use the same target instance, but with a Transaction Control transformation which defines the boundary for each source set.
In the target flat file there is an option in the columns tab, i.e. FileName as column; when you click it, a non-editable column gets created in the metadata of the target.
How do you populate the 1st record to the 1st target, the 2nd record to the 2nd target, the 3rd record to the 3rd target and the 4th record back to the 1st target through Informatica?
We can do it using a sequence generator by setting End Value = 3 and enabling the Cycle option. Then in the router take 3 groups:
In the 1st group specify the condition seq next value = 1 and pass those records to the 1st target.
In the 2nd group specify the condition seq next value = 2 and pass those records to the 2nd target.
In the 3rd group specify the condition seq next value = 3 and pass those records to the 3rd target.
Since we have enabled the Cycle option, after reaching the end value the sequence generator starts again from 1, so for the 4th record seq next value is 1 and it goes to the 1st target.
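The same distribution can be visualised in SQL, for illustration only (the mapping itself uses the cycling sequence generator described above), with a hypothetical source table src_emp:

SELECT empno, ename,
       MOD(ROWNUM - 1, 3) + 1 AS target_no  -- 1,2,3,1,2,3,... like the cycling sequence
FROM   src_emp;
-- Rows with target_no = 1 go to the 1st target, 2 to the 2nd, 3 to the 3rd.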
Incremental means: suppose today we processed 100 records; for tomorrow's run we need to extract whatever records were newly inserted or updated after the previous run, based on the last-updated timestamp (from yesterday's run). This process is called incremental or delta loading.
Approach_1: Using a mapping variable (screenshots omitted). The logic is defined in the mapping variable, the source qualifier query filters on it, an expression assigns the maximum last-update-date value to the variable using the SETMAXVARIABLE function, and an update strategy flags the rows.
Approach_2: Using a parameter file
Parameter file format:
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
The remaining steps of this approach (screenshots omitted): updating the parameter file, the main mapping, the SQL override in the SQ transformation, and the workflow design.
Parameter file
It is a text file; below is the format for a parameter file. We place this file on the UNIX box where our Informatica server is installed.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]
$DBConnection_Source=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
Approach_3: Using control tables
1. First we create two control tables, cont_tbl_1 and cont_tbl_2, each with the structure (session_st_time, wf_name).
2. Then we insert one record into each table with session_st_time = 1/1/1940 and the workflow name.
3. We create two stored procedures. The first updates cont_tbl_1 with the session start time; set its stored procedure type property to Source Pre-load.
4. For the 2nd stored procedure, set the stored procedure type property to Target Post-load. This procedure updates the session_st_time in cont_tbl_2 from cont_tbl_1.
5. Then we override the source qualifier query to fetch only rows with LAST_UPD_DATE >= (select session_st_time from cont_tbl_2 where wf_name = 'actual workflow name'), as in the sketch below.
How to load cumulative salary into the target?
Solution:
Using variable ports in an expression transformation we can load the cumulative salary into the target.
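The mapping does this with variable ports; just for illustration, the equivalent result in Oracle SQL is the analytic running total:

SELECT empno, ename, sal,
       SUM(sal) OVER (ORDER BY empno
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sal
FROM   emp;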
Below is the logic for converting columns into rows without using a Normalizer transformation (screenshots omitted):
1) The source contains two columns, address and id.
2) Use an Aggregator transformation and check Group By on the id port only.
Difference between dynamic lookup cache and static lookup cache?
1. With a dynamic lookup, the cache gets refreshed as soon as a record is inserted, updated or deleted in the lookup table, whereas with a static lookup the cache does not get refreshed even though records are inserted or updated in the lookup table; it is refreshed only in the next session run.
2. The best example of where we need a dynamic cache: suppose the first record and the last record in the source are for the same key but with a change in the address. What the Informatica mapping has to do is insert the first record and update the target with the last record.
3. If we use a static lookup, the first record goes to the lookup, checks the cache based on the condition, finds no match, and returns a null value, so the router sends that record to the insert flow.
4. But this record is still not available in the cache memory, so when the last record comes to the lookup it also finds no match, returns null values, and again goes to the insert flow through the router; however, it was supposed to go to the update flow, because the cache was not refreshed when the first record was inserted into the target table. If we use a dynamic lookup we can achieve the requirement: when the first record is inserted, the cache is immediately refreshed with the target data, so when we process the last record it finds a match in the cache and returns the value, and the router routes that record to the update flow.
What is the difference between snowflake and star schema?
A star schema optimizes performance by keeping queries simple and providing fast response time; all the information about each level is stored in one row. Snowflake schemas normalize dimensions to eliminate redundancy; the result is more complex queries and reduced query performance.
A star schema is so called because the diagram resembles a star; a snowflake schema is so called because the diagram resembles a snowflake.
Joiner vs Lookup
In a joiner, on multiple matches it returns all matching records; in a lookup it returns either the first record, the last record, any matching value, or an error value.
In a joiner we cannot configure persistent cache, shared cache, uncached or dynamic cache, whereas in a lookup we can.
We cannot override the query in a joiner; we can override the query in a lookup to fetch data from multiple tables.
We can perform an outer join in a joiner transformation; we cannot perform an outer join in a lookup transformation.
We cannot use relational operators (<, >, <= and so on) in a joiner transformation, whereas in a lookup we can.
How to process multiple flat files into a single target table through Informatica if all the files have the same structure?
We can process all the flat files through one mapping and one session using a list file.
First we need to create the list file using a UNIX script for all the flat files; the extension of the list file is .LST.
How to populate the file name to the target while loading multiple files using the list file concept?
In Informatica 8.6, by selecting the "Add Currently Processed Flat File Name" option in the properties tab of the source definition (after importing the source file definition in the Source Analyzer), a new column called Currently Processed File Name is added. We can map this column to the target to populate the filename.
SCD Type-II Implementation
We have a dimension in the current project called the resource dimension. Here we maintain history to keep track of SCD changes.
To maintain the history in this slowly changing dimension we followed the SCD Type-II effective-date approach.
The resource dimension structure is eff-start-date, eff-end-date, surrogate key (s.k.) and the source columns.
Whenever I insert into the dimension I populate eff-start-date with sysdate, eff-end-date with a future date and s.k. with a sequence number.
If the record is already present in the dimension but there is a change in the source data, then I update the previous record's eff-end-date with sysdate and insert the changed data as a new record.
Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.
In the lookup transformation we override the lookup query to fetch only the active records from the dimension while building the cache.
Once we find the match in the lookup, we take the SCD columns and the s.k. column from the lookup into an expression transformation.
In the expression transformation I compare the source with the lookup return data:
If the source and target data are the same, I set the flag to 'S'.
If the source and target data are different, I set the flag to 'U'.
If the source data does not exist in the target (the lookup returns null), I set the flag to 'I'.
Based on the flag values, in the router I route the data into the insert and update flows:
If flag = 'I' or 'U', I pass the record to the insert flow.
If flag = 'U', I also pass the record to the eff-end-date update flow.
When we do the insert we pass the sequence value to s.k.; when we do the update we update the eff-end-date column based on the s.k. value returned by the lookup.
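A sketch of the lookup override that builds the cache only from the active dimension rows (column names are illustrative; the high end date 31-Dec-9999 is an assumed 'future date'):

SELECT resource_sk,
       resource_id,            -- source primary key used in the lookup condition
       scd_col_1, scd_col_2    -- columns compared with the source to detect changes
FROM   dim_resource
WHERE  eff_end_date = TO_DATE('31-12-9999', 'DD-MM-YYYY');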
Complex Mapping
We have an order file requirement: every day the source system places a file whose name has a timestamp appended on the Informatica server, and we have to process the current day's file through Informatica.
The source file directory contains files older than 30 days, each with a timestamp.
For this requirement, if I hard-code the source file name it will process the same file every day. So what I did here is create $InputFilename for the source file name and use a parameter file to supply the value to this session variable.
To update this parameter file I created one more mapping; it updates the parameter file with the timestamp appended to the file name. I make sure to run this parameter-file-update mapping before my actual mapping.
How to handle errors in Informatica?
We have a source with numerator and denominator values, and we need to calculate num/deno when populating the target.
If deno = 0, I should not load that record into the target table; we send those records to a flat file. After completion of the first session run, a shell script checks the file size.
If the file size is greater than zero, it sends an email notification to the source system POC (point of contact) with the deno-zero record file attached and an appropriate email subject and body.
If the file size is <= 0, that means there are no records in the flat file, and the shell script does not send any email notification.
Or:
We expect a not-null value for one of the source columns; if it is null, it is an error record, and we can use the same approach for error handling.
Worklet
A worklet is a set of reusable sessions. We cannot run a worklet without a workflow.
If both sets of sessions exist in the same folder, we can create 2 worklets rather than 2 workflows, call the 2 worklets in one workflow, and set the dependency there.
If the two workflows exist in different folders or repositories, then we cannot create a worklet. One approach is to set the dependency between the two workflows using a shell script; the other approach is event wait and event raise.
As soon as the first workflow completes, we create a zero-byte file (indicator file). If the indicator file is available in the particular location, we run the second workflow. If the indicator file is not available, we wait for 5 minutes and check again for the indicator; we continue this loop 5 times, i.e. 30 minutes. After 30 minutes, if the file still does not exist, we send out an email notification.
With event wait, it waits for an infinite time, till the indicator file is available.
A parameter file supplies the values to session-level variables and mapping-level variables.
What is the difference between mapping-level and session-level variables?
Mapping-level variables always start with $$; session-level variables start with $.
Flat File
A flat file is a collection of data in a file in a specific format. There are two types:
Delimited
Fixed width
For fixed width we need to know the format first, i.e. how many characters to read for each column.
For delimited files it is also necessary to know the structure, i.e. the delimiter and whether there are headers. If the file contains a header, then in the definition we need to skip the first row.
List file:
If we want to process multiple files with the same structure, we don't need multiple mappings and multiple sessions; we can use one mapping and one session with the list file option. First we create the list file for all the files, then we use this list file in the main session.
Aggregator Transformation:
Transformation type:
Active
Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums.
The Aggregator transformation is unlike the Expression transformation, in that you use the
Aggregator transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis only.
The Aggregator is an active transformation, changing the number of rows in the pipeline.
The Aggregator transformation has the following components and options
Aggregate cache: The Integration Service stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row data in
the data cache.
Aggregate expression: Enter an expression in an output port. The expression can include
non-aggregate expressions and conditional clauses.
Group by port: Indicate how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted input, you
must pass data to the Aggregator transformation sorted by group by port, in ascending or
descending order.
Aggregate Expressions:
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as MAX(COUNT(ITEM)).
The result of an aggregate expression varies depending on the group by ports used in the
transformation
Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You can nest
one aggregate function within another aggregate function.
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
When you use any of these functions, you must use them in an expression within an
Aggregator transformation.
Tips
Sorted input reduces the amount of data cached during the session and improves session
performance. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.
Limit the number of connected input/output or output ports to reduce the amount of data
the Aggregator transformation stores in the data cache.
If you use a Filter transformation in the mapping, place the transformation before the
Aggregator transformation to reduce unnecessary aggregation.
Normalizer Transformation:
Transformation type:
Active
Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data.
The Normalizer transformation generates a key for each source row. The Integration Service
increments the generated key sequence number each time it processes a source row. When
the source row contains a multiple-occurring column or a multiple-occurring group of
columns, the Normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
Transaction Control Transformation
Transformation type:
Active
Connected
PowerCenter lets you control commit and roll back transactions based on a set of rows that
pass through a Transaction Control transformation. A transaction is the set of rows bound
by commit or roll back rows. You can define a transaction based on a varying number of
input rows. You might want to define transactions based on a group of rows ordered on a
common key, such as employee ID or order entry date.
When you run the session, the Integration Service evaluates the expression for each row
that enters the transformation. When it evaluates a commit row, it commits all rows in the
transaction to the target or targets. When the Integration Service evaluates a roll back row,
it rolls back all rows in the transaction from the target or targets.
If the mapping has a flat file target you can generate an output file each time the Integration
Service starts a new transaction. You can dynamically name each target flat file.
Union Transformation
A union transformation is used to merge data from multiple sources, similar to the UNION ALL SQL statement which combines the results of two or more SQL statements.
2. As the union transformation gives UNION ALL output, how will you get the UNION output?
Pass the output of union transformation to a sorter transformation. In the properties of sorter
transformation check the option select distinct. Alternatively you can pass the output of union
transformation to aggregator transformation and in the aggregator transformation specify all ports
as group by ports.
The following rules and guidelines need to be taken care while working with union
transformation:
You can create multiple input groups, but only one output group.
All input groups and the output group must have matching ports. The precision, datatype,
and scale must be identical across all groups.
The Union transformation does not remove duplicate rows. To remove duplicate rows,
you must add another transformation such as a Router or Filter transformation.
You cannot use a Sequence Generator or Update Strategy transformation upstream from
a Union transformation.
The Union transformation does not generate transactions.
Union is an active transformation because it combines two or more data streams into one.
Though the total number of rows passing into the Union is the same as the total number of rows
passing out of it, and the sequence of rows from any given input stream is preserved in the
output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might
not be row number 1 in the output stream. Union does not even guarantee that the output is
repeatable
Aggregator Transformation
Use sorted input: Sort the data before passing into aggregator. The integration service
uses memory to process the aggregator transformation and it does not use cache memory.
Filter the unwanted data before aggregating.
Limit the number of input/output or output ports to reduce the amount of data the
aggregator transformation stores in the data cache.
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single aggregate
transformation?
The nested aggregate function returns only one output row, whereas the single level aggregate
function returns more than one row. Since the number of rows returned are not same, you cannot
use both single level and nested aggregate functions in the same transformation. If you include
both the single level and nested functions in the same aggregator, the designer marks the
mapping or mapplet as invalid. So, you need to create separate aggregator transformations.
The integration service performs aggregate calculations and then stores the data in historical
cache. Next time when you run the session, the integration service reads only new data and uses
the historical cache to perform new aggregation calculations incrementally.
In incremental aggregation, the aggregate calculations are stored in historical cache on the
server. In this historical cache the data need not be in sorted order. If you give sorted input, the
records come as presorted for that particular run but in the historical cache the data may not be
in the sorted order. That is why this option is not allowed.
You can configure the integration service to treat null values in aggregator functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate functions.
Normalizer Transformation
The normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. This means it converts column data into row data. Normalizer is an active transformation.
Since COBOL sources contain denormalized data, the normalizer transformation is used to normalize the COBOL sources.
The integration service increments the generated key sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the normalizer transformation returns a row for each occurrence. Each row contains the same generated key value.
The normalizer transformation has a generated column ID (GCID) port for each multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1,2 or
3 in the generated column ID.
4. What is VSAM?
VSAM (Virtual Storage Access Method) is a file access method for an IBM mainframe operating
system. VSAM organize records in indexed or sequential flat files.
The VSAM normalizer transformation is the source qualifier transformation for a COBOL source definition. A COBOL source is a flat file that can contain multiple-occurring data and multiple types of records in the same file.
Pipeline normalizer transformation processes multiple-occurring data from relational tables or flat
files.
An OCCURS clause is specified when the source row has a multiple-occurring column or group of columns.
A REDEFINES clause is specified when the source has multiple record types that reuse the same storage area, i.e. the same bytes are interpreted with different column layouts.
Rank Transformation
A rank transformation is used to select top or bottom rank of data. This means, it
selects the largest or smallest numeric value in a port or group. Rank
transformation also selects the strings at the top or bottom of a session sort
order. Rank transformation is an active transformation.
The integration service compares input rows in the data cache, if the input row
out-ranks a cached row, the integration service replaces the cached row with the
input row. If you configure the rank transformation to rank across multiple groups,
the integration service ranks incrementally for each group it finds. The integration
service stores group information in index cache and row data in data cache.
The designer creates RANKINDEX port for each rank transformation. The
integration service uses the rank index port to store the ranking position for each
row in a group.
4. How do you specify the number of rows you want to rank in a rank transformation?
The Number of Ranks property in the properties tab specifies how many rows to return for the top or bottom rank. The ranking itself is based on only one port: in the ports tab you check the R option to designate a port as the rank port, and this option can be checked on only one port.
Joiner Transformation
A joiner transformation joins two heterogeneous sources. You can also join data from the same source. The joiner transformation joins sources with at least one matching column, using a condition that matches one or more pairs of columns between the two sources.
You cannot use a joiner transformation when input pipeline contains an update strategy
transformation.
You cannot use a joiner if you connect a sequence generator transformation directly
before the joiner.
Normal join: In a normal join, the integration service discards all the rows from the master
and detail source that do not match the join condition.
Master outer join: A master outer join keeps all the rows of data from the detail source
and the matching rows from the master source. It discards the unmatched rows from the master
source.
Detail outer join: A detail outer join keeps all the rows of data from the master source and
the matching rows from the detail source. It discards the unmatched rows from the detail
source.
Full outer join: A full outer join keeps all rows of data from both the master and detail
rows.
When the integration service processes a joiner transformation, it reads the rows from the master source and builds the index and data caches. Then the integration service reads the detail source and performs the join. In the case of a sorted joiner, the integration service reads both sources (master and detail) concurrently and builds the cache based on the master rows.
When the integration service processes an unsorted joiner transformation, it reads all master rows before it reads the detail rows. To ensure it reads all master rows before the detail rows, the integration service blocks the detail source while it caches rows from the master source. Because it blocks the detail source, the unsorted joiner is called a blocking transformation.
Router Transformation
A router is used to filter the rows in a mapping. Unlike a filter transformation, you can specify one or more conditions in a router transformation. Router is an active transformation.
2. How to improve the performance of a session using router transformation?
Use a single router transformation instead of multiple filter transformations: the Integration Service processes the incoming data only once for the router, instead of once per filter.
A router transformation has the following types of groups:
Input
Output
User-defined group
Default group
You can create the group filter conditions in the groups tab using the expression editor.
6. Can you connect ports of two output groups from router transformation to a single target?
No. You cannot connect more than one output group to one target or a single input group
transformation.
Stored Procedure Transformation
A stored procedure transformation is used for tasks such as:
Check the status of a target database before loading data into it.
Determine if enough space exists in a database.
Perform a specialized calculation.
Drop and recreate indexes.
A connected stored procedure transformation is connected to the other transformations in the mapping pipeline. It is used to:
Run a stored procedure every time a row passes through the mapping.
Pass parameters to the stored procedure and receive multiple output parameters.
An unconnected stored procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session or is called by an expression in another transformation in the mapping.
7. What are the options available to specify when the stored procedure transformation needs to
be run?
The following options describe when the stored procedure transformation runs:
Normal: The stored procedure runs where the transformation exists in the mapping on a
row-by-row basis. This is useful for calling the stored procedure for each row of data that
passes through the mapping, such as running a calculation against an input port. Connected
stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of data in
a temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure
runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure
runs. This is useful for re-creating indexes on the database.
A connected stored procedure transformation runs only in Normal mode. An unconnected stored procedure transformation can run in all the above modes.
Execution Order: the order in which the Integration Service calls the stored procedure used in the transformation, relative to any other stored procedures in the same mapping. It is used only when the Stored Procedure Type is set to anything except Normal and more than one stored procedure exists.
9. What is PROC_RESULT in stored procedure transformation?
PROC_RESULT is a variable used when calling an unconnected stored procedure transformation from an expression, e.g. :SP.procedure_name(argument, PROC_RESULT). The value the stored procedure returns is captured in PROC_RESULT and becomes the result of the expression.
Source Qualifier Transformation
A source qualifier represents the rows that the integration service reads when it runs a session. Source qualifier is an active transformation.
The source qualifier transformation converts the source data types into informatica native data
types.
The source qualifier transformation can be used to:
Join two or more tables originating from the same source (homogeneous sources) database.
Filter the rows.
Sort the data.
Select distinct values from the source.
Create a custom query.
Specify pre-SQL and post-SQL.
The source qualifier transformation joins the tables based on the primary key-foreign key
relationship.
When there is no primary key-foreign key relationship between the tables, you can specify a
custom join using the 'user-defined join' option in the properties tab of source qualifier.
The following properties can be configured on the source qualifier transformation:
SQL Query
User-Defined Join
Source Filter
Number of Sorted Ports
Select Distinct
Pre-SQL
Post-SQL
Sequence Generator Transformation
A sequence generator is used to create unique primary key values, replace missing primary key values, or cycle through a sequential range of numbers.
A sequence generator contains two output ports: CURRVAL and NEXTVAL.
4. What is the maximum number of sequence that a sequence generator can generate?
5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be the output values of these ports?
NEXTVAL generates the sequence numbers, and for each row CURRVAL is NEXTVAL plus the Increment By value.
6. What will be the output value, if you connect only CURRVAL to the target without connecting NEXTVAL?
The Integration Service passes a constant value for each row.
8. What is the number of cached values set to default for a sequence generator transformation?
For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.
A sequence generator transformation has the following properties:
Start Value
Increment By
End Value
Current Value
Cycle
Number of Cached Values
Lookup Transformation
A lookup transformation is used to look up data in a flat file, relational table, view or synonym. It is typically used to:
Get a related value: Retrieve a value from the lookup table based on a value in the source.
Perform a calculation: Retrieve a value from a lookup table and use it in a calculation.
Update slowly changing dimension tables: Determine whether rows exist in a target.
6. What are the differences between connected and unconnected lookup transformation?
Connected lookup transformation receives input values directly from the pipeline.
Unconnected lookup transformation receives input values from the result of a :LKP expression
in another transformation.
Connected lookup transformation can be configured as dynamic or static cache.
Unconnected lookup transformation can be configured only as static cache.
Connected lookup transformation can return multiple columns from the same row or
insert into the dynamic lookup cache. Unconnected lookup transformation can return one
column from each row.
If there is no match for the lookup condition, connected lookup transformation returns
default value for all output ports. If you configure dynamic caching, the Integration Service
inserts rows into the cache or leaves it unchanged. If there is no match for the lookup condition,
the unconnected lookup transformation returns null.
In a connected lookup transformation, the cache includes the lookup source columns in
the lookup condition and the lookup source columns that are output ports. In an unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup condition and the
lookup/return port.
Connected lookup transformation passes multiple output values to another
transformation. Unconnected lookup transformation passes one output value to another
transformation.
Connected lookup transformation supports user-defined default values. Unconnected lookup transformation does not support user-defined default values.
7. How do you handle multiple matches in lookup transformation? or what is "Lookup Policy on
Multiple Match"?
"Lookup Policy on Multiple Match" option is used to determine which rows that the lookup
transformation returns when it finds multiple rows that match the lookup condition. You can select
lookup to return first or last row or any matching row or to report an error.
Insert Else Update option applies to rows entering the lookup transformation with the row type of insert. When this option is enabled, the Integration Service inserts new rows into the cache and updates existing rows; when it is disabled, the Integration Service does not update existing rows.
Update Else Insert option applies to rows entering the lookup transformation with the row
type of update. When this option is enabled, the Integration Service updates existing rows, and
inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.
The following types of lookup caches can be configured:
Persistent cache
Recache from lookup source
Static cache
Dynamic cache
Shared cache
Pre-build lookup cache
Cached lookup transformation: The Integration Service builds a cache in memory when it
processes the first row of data in a cached Lookup transformation. The Integration Service
stores condition values in the index cache and output values in the data cache. The Integration
Service queries the cache for each row that enters the transformation.
Uncached lookup transformation: For each row that enters the lookup transformation, the
Integration Service queries the lookup source and returns a value. The integration service does
not build a cache.
12. How the integration service builds the caches for connected lookup transformation?
The Integration Service builds the lookup caches for connected lookup transformation in the
following ways:
Sequential cache: The Integration Service builds lookup caches sequentially. The
Integration Service builds the cache in memory when it processes the first row of the data in a
cached lookup transformation.
Concurrent caches: The Integration Service builds lookup caches concurrently. It does
not need to wait for data to reach the Lookup transformation.
13. How the integration service builds the caches for unconnected lookup transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.
15. When you use a dynamic cache, do you need to associate each lookup port with the input
port?
Yes. You need to associate each lookup/output port with the input/output port or a sequence ID.
The Integration Service uses the data in the associated port to insert or update rows in the
lookup cache.
The NewLookupRow port of a dynamic lookup cache can have the following values:
0 - Integration Service does not update or insert the row in the cache.
1 - Integration Service inserts the row into the cache.
2 - Integration Service updates the row in the cache.
19. What is unnamed cache and named cache?
With a persistent lookup cache, an unnamed cache (no cache file name prefix specified) can be shared between lookup transformations in the same mapping, whereas a named cache (a cache file name prefix is specified) can be shared across mappings and sessions.
Transaction Control Transformation
A transaction is a set of rows bound by a commit or rollback of rows. The transaction control transformation is used to commit or rollback a group of rows.
2. What is the commit type if you have a transaction control transformation in the mapping?
The session commit type becomes user-defined commit; the commit and rollback points are driven by the transaction control expression rather than by a commit interval.
3. What are the different transaction levels available in transaction control transformation?
The following are the built-in variables that the transaction control expression can return:
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for the row (default).
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.
Basic Commands
2. $pwd: It displays the present working directory. (If you log in as dwhlabs it shows /home/dwhlabs.)
$banner "tecno": It displays the text "tecno" in large banner letters.
8. $finger: It displays complete information about all the users who are logged in
9. $who: To display information about all users who have logged into the system
currently (login name, terminal number, date and time).
10. $who am i: It displays username, terminal number, date and time at which you logged
into the system.
11. $cal: It displays the calendar of the current month; $cal year and $cal month year display the calendar for the given year or month.
12. $ls: This command is used to list files and directories, except hidden ones.
Administrator Commands
1. # system administrator prompt
$ user working prompt
#useradd user1: creates a new user called user1.
#passwd user1: sets the password for user1 (it prompts Enter password / Retype password).
System Run Levels
ls command options
$ls | pg: It displays list of files and directories page wise & width wise.
$ls -a: It displays all files and directories, including the . and .. entries and hidden files.
$ls -F: It marks each name with a symbol indicating its type, e.g.:
a1
a2
sample/
letter*
notes@
/ directory
* executable file
@ symbolic link file
$ls -r: It displays files and directories in reverse order (descending order)
$ls -R: It displays files and directories recursively.
$ls -t: It displays files and directories sorted by the date and time of last modification (most recent first).
$ls -i: It displays files and directories along with their inode numbers.
$ls -l: It displays the long listing: file type, permissions, link count, owner name, group, size in bytes, date, and filename.
Examples:
* wild card
? wild card
$ ls t??: It displays all files starting with 't' whose names are exactly 3 characters long.
$ ls ?ram: It displays all files starting with any single character followed by 'ram', i.e. names exactly 4 characters long.
- (range) wild card
$ ls [a-z]ra: It displays all files starting with any character between a and z and ending with 'ra'; the name must be 3 characters long.
[ ] wild card
$ ls [aeiou]ra: It displays all files starting with 'a' or 'e' or 'i' or 'o' or 'u' character and ending
with 'ra'.
. wild card
$ ls t..u: It displays all files starting with 't' and ending with 'u'; the name must be 4 characters long. It never includes the newline (enter key) character.
^ and $ wild cards (anchors)
$ ls *sta$: It displays all files ending with 'sta'. The name can be any number of characters.
Filter Commands: grep, fgrep and egrep:
$ grep: Globally search a Regular Expression and Print it.
This command is used to search a particular pattern (single string) in a file or directory and
regular expression (pattern which uses wild card characters).
Ex. $grep dwhlabs dwh (It displays all the lines in the file dwh which contain the pattern/string 'dwhlabs') - Character Pattern
Ex: $grep "dwhlabs website" dwh - character pattern
Ex. $grep '\<dwhlabs\>' dwh (It displays all the lines in the file which contain the exact word 'dwhlabs') - Word Pattern
Ex. $grep '^UNIX$' language (It displays all the lines in the file which consist of exactly the word 'UNIX') - Line Pattern
$grep options
Q) How to remove blank lines in a file?
$grep -v '^$' file1 > tmp
$mv tmp file1
Explanation: Redirection is used to combine more than one command. Here we first select the non-blank lines of file1 and redirect the result to tmp (a temporary file); this temporary file is then renamed back to file1. Now check file1 - you will not find any blank lines.
fgrep:
This command is used to search multiple strings or one string but not regular expression.
egrep: Extended grep
This command is used to search for single or multiple patterns and also regular expressions.
$ cut: This command is used to retrieve the required fields or characters from a file.
Ex. $cut -f 1-3 employee (It displays fields 1 to 3 of the employee file, e.g. 102 Prabhu C)
Ex. $cut -c 1-8 employee (It displays only the first 8 characters of each line):
100 Rake
101 Rama
102 Prab
103 Jyos
Delimiters:
Default (Tab)
: , ; * _
$cut options
$ sort: This command is used to sort the file records in ascending or descending order.
$sort Options
How to sort data field wise? Use the -t and -k options, e.g. $sort -t ':' -k 2 file (sorts on the 2nd field using ':' as the field delimiter).
$ uniq: This command is used to filter the duplicate lines from a file
Note: Always the file must be in sorting order.
$uniq Options
Question: How to remove duplicate lines in a file? $sort file | uniq (or $sort -u file).
$ tr: Translate the character in string1 from stdin into those in string2 in stdout.
$ sed: This command is used for editing files from a script or from the command line.
$ head: Displays the first 'N' lines of a file.