Informatica Handbook


Tech-Zealous Software Solutions

ORACLE STATEMENTS
Data Definition Language (DDL)
Create
Alter
Drop
Truncate

Data Manipulation Language (DML)


Insert
Update
Delete

Data Querying Language (DQL)


Select

Data Control Language (DCL)


Grant
Revoke

Transactional Control Language (TCL)


Commit
Rollback
Savepoint

Syntaxes:

CREATE OR REPLACE SYNONYM HZ_PARTIES FOR SCOTT.HZ_PARTIES

CREATE DATABASE LINK CAASEDW CONNECT TO ITO_ASA IDENTIFIED BY exact123 USING 'CAASEDW'

Materialized View syntax:


CREATE MATERIALIZED VIEW EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW
REFRESH COMPLETE
START WITH sysdate
NEXT TRUNC(SYSDATE+1)+ 4/24
WITH PRIMARY KEY
AS
select * from HWMD_MTH_ALL_METRICS_CURR_VW;

Another Method to refresh:


DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');

Case Statement:
Select NAME,
       (CASE
          WHEN (CLASS_CODE = 'Subscription') THEN ATTRIBUTE_CATEGORY
          ELSE TASK_TYPE
        END) TASK_TYPE,
       CURRENCY_CODE
From EMP;

Decode ()
Select empname, Decode (address, 'HYD', 'Hyderabad',
'Bang', 'Bangalore', address) as address from emp;

Procedure:
CREATE OR REPLACE PROCEDURE Update_bal (
  cust_id_IN IN NUMBER,
  amount_IN  IN NUMBER DEFAULT 1) AS
BEGIN
  UPDATE account_tbl SET amount = amount_IN WHERE cust_id = cust_id_IN;
END;

Trigger:
CREATE OR REPLACE TRIGGER EMP_AUR
AFTER UPDATE ON EMP            -- can also be defined as BEFORE UPDATE
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
  IF (:NEW.last_upd_tmst <> :OLD.last_upd_tmst) THEN
    -- Insert a record into the control table
    INSERT INTO emp_w VALUES ('wrk', SYSDATE);
  ELSE
    -- Call a procedure
    update_sysdate;
  END IF;
END;

ORACLE JOINS:

1. Equi join
2. Non-equi join
3. Self join
4. Natural join
5. Cross join
6. Outer join
1. Left outer

2. Right outer
3. Full outer

Equi Join/Inner Join/Simple Join:

SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno;


USING CLAUSE
SQL> select empno,ename,job ,dname,loc from emp e join dept d using(deptno);
ON CLAUSE
SQL> select empno,ename,job,dname,loc from emp e join dept d on(e.deptno=d.deptno);
Non-Equi Join

A join which contains an operator other than '=' in the join condition.
Ex: SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno >d.deptno;
OR
Ex: SQL> select e.ename,e.sal,s.salgrade from emp e ,grade s where e.sal between s.losal and s.hisal
Self Join

Joining a table to itself is called a self join.


Ex: SQL> select e1.empno, e2.ename, e1.job, e2.deptno from emp e1, emp e2 where
e1.empno=e2.mgr;
Natural Join

Natural join compares all the common columns.


Ex: SQL> select empno,ename,job,dname,loc from emp natural join dept;
Cross Join

This will give the cross product.


Ex: SQL> select empno,ename,job,dname,loc from emp cross join dept;
Outer Join

Outer join gives the non-matching records along with matching records.
Left Outer Join

This will display all matching records, plus the records from the left-hand side table that have no match in the right-hand side table.

Ex: SQL> select empno,ename,job,dname,loc from emp e left outer join dept d
on(e.deptno=d.deptno);
OR
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno(+);

Right Outer Join

This will display all matching records, plus the records from the right-hand side table that have no match in the left-hand side table.
Ex: SQL> select empno,ename,job,dname,loc from emp e right outer join dept d
on(e.deptno=d.deptno);
OR
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno(+) = d.deptno;

Full Outer Join

This will display all matching records together with the non-matching records from both tables.
Ex: SQL> select empno,ename,job,dname,loc from emp e full outer join dept d
on(e.deptno=d.deptno);
OR
Ex: SQL>select empno,ename,job,dname,loc from emp e, dept d where e.deptno = d.deptno(+)
Union
Select empno,ename,job,dname,loc from emp e, dept d where e.deptno (+) = d.deptno

What’s the difference between View and Materialized View?

View:

Why Use Views?


• To restrict data access
• To make complex queries easy
• To provide data independence
A simple view is one that:
– Derives data from only one table
– Contains no functions or groups of data
– Can perform DML operations through the view.

A complex view is one that:


– Derives data from many tables
– Contains functions or groups of data
– Does not always allow DML operations through the view
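For illustration, a simple view and a complex view on the emp and dept tables used elsewhere in this handbook could be created as follows (the view names are illustrative):

CREATE OR REPLACE VIEW emp_v AS
SELECT empno, ename, sal, deptno FROM emp;        -- simple view: one table, DML allowed

CREATE OR REPLACE VIEW dept_sal_v AS
SELECT d.dname, SUM(e.sal) AS total_sal           -- complex view: join + group function, DML restricted
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY d.dname;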


A view has a logical existence but a materialized view has a physical existence. Moreover, a materialized view can be indexed, analyzed and so on; all the things that we can do with a table can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule an MV to refresh, but a table can't be scheduled. An MV can be created based on multiple tables.

Materialized View:

In a DWH, materialized views are essential because if we do aggregate calculations on the reporting side as per the business requirement, report performance would be degraded. So, to improve report performance, rather than doing the calculations and joins on the reporting side, we put the same logic in the MV; then we can select the data directly from the MV without any joins or aggregations. We can also schedule the MV (materialized view) to refresh.
Inline view:

If we write a select statement in the FROM clause, that is nothing but an inline view.

Ex:
Get dept wise max sal along with empname and emp no.

Select a.empname, a.empno, b.sal, b.deptno


From EMP a, (Select max (sal) sal, deptno from EMP group by deptno) b
Where
a.sal=b.sal and
a.deptno=b.deptno

What is the difference between view and materialized view?

View vs. materialized view:

1. A view has a logical existence and does not contain data, whereas a materialized view has a physical existence.
2. A view is essentially a stored query, whereas a materialized view is a database object that stores data.
3. We cannot perform DML operations on a view, whereas we can perform DML operations on a materialized view.
4. When we do select * from a view it fetches the data from the base table, whereas when we do select * from a materialized view it fetches the data stored in the materialized view itself.
5. A view cannot be scheduled to refresh, whereas a materialized view can be scheduled to refresh.
6. We can keep aggregated data in a materialized view, and a materialized view can be created based on multiple tables.

What is the Difference between Delete, Truncate and Drop?

DELETE
The DELETE command is used to remove rows from a table. A WHERE clause can be used to remove only some rows. If no WHERE condition is specified, all rows are removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the table's rows, indexes and privileges will also be removed. The operation cannot be rolled back.
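For example, on the emp table used elsewhere in this handbook:

DELETE FROM emp WHERE deptno = 10;   -- removes only the matching rows; can be rolled back
ROLLBACK;

TRUNCATE TABLE emp;                  -- removes all rows; cannot be rolled back

DROP TABLE emp;                      -- removes the table, its rows, indexes and privileges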

Difference between Rowid and Rownum?

ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is the file number.

ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a number indicating the order in
which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1,
the second has 2, and so on.

You can use ROWNUM to limit the number of rows returned by a query, as in this example:

SELECT * FROM employees WHERE ROWNUM < 10;


SELECT * FROM employees WHERE ROWNUM > 10; /** Greater than does not work on ROWNUM**/

Rowid vs. Rownum:

1. ROWID is an Oracle-internal ID that is allocated every time a new record is inserted into a table; this ID is unique and cannot be changed by the user. ROWNUM is a row number returned by a SELECT statement.
2. ROWID is permanent, whereas ROWNUM is temporary.
3. ROWID is a globally unique identifier for a row in a database, created when the row is inserted into the table and destroyed when it is removed. The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.

Order of where and having:


SELECT column, group_function
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups; you use the HAVING clause to restrict groups.

Differences between where clause and having clause

Where clause vs. having clause:

1. Both the WHERE clause and the HAVING clause can be used to filter data.
2. The WHERE clause does not require a GROUP BY, but the HAVING clause must be used together with GROUP BY.
3. The WHERE clause applies to individual rows, whereas the HAVING clause tests a condition on the group rather than on individual rows.
4. The WHERE clause is used to restrict rows, whereas the HAVING clause is used to restrict groups.
5. WHERE restricts a normal query; HAVING restricts group (aggregate) functions.
6. With WHERE, every record is filtered individually; with HAVING, the filtering is applied to aggregated records (group by functions).

MERGE Statement

You can use merge command to perform insert and update in a single command.
Ex: Merge into student1 s1
Using (select * from student2) s2
On (s1.no=s2.no)
When matched then
Update set marks = s2.marks
When not matched then
Insert (s1.no, s1.name, s1.marks) Values (s2.no, s2.name, s2.marks);
What is the difference between sub-query & co-related sub query?
A sub query is executed once for the parent statement
Whereas the correlated sub query is executed once for each
row of the parent query.
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade = 'A' or
sal_grade = 'B');
Co-Related Sub query:
Example:
Find all employees who earn more than the average salary in their department.
SELECT last_name, salary, department_id FROM employees A
WHERE salary > (SELECT AVG (salary)
                FROM employees B WHERE B.department_id = A.department_id
                GROUP BY B.department_id);
EXISTS:
The EXISTS operator tests for existence of rows in
the results set of the subquery.

Select dname from dept where exists
(select 1 from EMP where dept.deptno = emp.deptno);

1. A sub-query is executed once for the parent query, whereas a co-related sub-query is executed once for each row of the parent query.
2. Sub-query example:
Select * from emp where deptno in (select deptno from dept);
3. Co-related sub-query example:
Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno group by a.deptno);

Indexes:
1. Bitmap indexes are most appropriate for columns having low distinct values, such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems; in fact, a bitmap index on a column with 100-percent unique values (a candidate column for a primary key) can be as efficient as a B-tree index.
2. When to create an index: you should create an index if
   a column contains a wide range of values,
   a column contains a large number of null values,
   one or more columns are frequently used together in a WHERE clause or a join condition, or
   the table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows.
3. By default, if you create an index it is a B-tree index.
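For example (the gender column is illustrative; the standard emp table has no such column):

CREATE INDEX emp_deptno_idx ON emp (deptno);          -- B-tree index (the default)

CREATE BITMAP INDEX emp_gender_bix ON emp (gender);   -- bitmap index on a low-cardinality column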

Why are hints required?

It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database change rapidly, so that the optimizer (or, more accurately, its statistics) is out of date. In this case, a hint can help.
You should first get the explain plan of your SQL and determine what changes can be made so that the code operates without hints, if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.

Analyzing tables and updating statistics: the ANALYZE statement


The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics
can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:
ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

EXEC DBMS_STATS.gather_table_stats ('SCOTT', 'EMPLOYEES');


Automatic Optimizer Statistics Collection


By default Oracle 10g automatically gathers optimizer statistics using a scheduled job called
GATHER_STATS_JOB. By default this job runs within maintenance windows between 10 P.M. to 6 A.M. week
nights and all day on weekends. The job calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC
internal procedure which gathers statistics for tables with either empty or stale statistics, similar to the
DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is
that the internal job prioritizes the work such that tables most urgently requiring statistics updates are
processed first.


Hint categories:
Hints can be categorized as follows:
1. ALL_ROWS
One of the hints that invokes the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
2. FIRST_ROWS
One of the hints that invokes the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
3. CHOOSE
One of the hints that invokes the cost-based optimizer.
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS) based on statistics gathered.
4. Hints for join orders
5. Hints for join operations
6. Hints for parallel execution, e.g. (/*+ PARALLEL(a,4) */); specify a degree such as 2, 4 or 16
7. Additional hints
8. HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find the corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */

Use a hint to force using an index:

SELECT /*+ INDEX (TABLE_NAME INDEX_NAME) */ COL1, COL2 FROM TABLE_NAME

To force a hash join (the emp/dept join here is illustrative):
SELECT /*+ USE_HASH(e) */ e.empno FROM emp e, dept d WHERE e.deptno = d.deptno;

ORDERED: this hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.
PARALLEL (table, instances): this specifies that the operation is to be done in parallel.
If an index cannot be created, we can go for /*+ PARALLEL(table, 8) */ for SELECT and UPDATE statements, for example when the WHERE clause contains conditions such as LIKE, NOT IN, >, <, <> that prevent index use.
Explain Plan:
The explain plan tells us whether the query is properly using indexes or not, what the cost of each operation is, and whether it is doing a full table scan or not; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created in SQL*Plus as follows:
SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;

SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
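Once PLAN_TABLE exists, a plan can be generated and displayed, for example with DBMS_XPLAN (the emp/dept query is illustrative):

EXPLAIN PLAN FOR
SELECT e.empno, e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);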


What is your tuning approach if a SQL query is taking a long time? Or, how do you tune a SQL query?

If a query is taking a long time, first I will run the query through EXPLAIN PLAN (the explain plan process stores data in the PLAN_TABLE).
It gives us the execution plan of the query, for example whether the query is using the relevant indexes on the joining columns or whether the indexes needed to support the query are missing.
If the joining columns don't have indexes, the query does a full table scan; if it is a full table scan the cost will be higher, so I will create indexes on the joining columns and re-run the query, which should give better performance. We also need to analyze the tables if they were analyzed long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster, for example:
ANALYZE TABLE employees COMPUTE STATISTICS;
If there is still a performance issue, I will use HINTS; a hint is nothing but a clue to the optimizer. We can use hints like:
1. ALL_ROWS
One of the hints that invokes the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
2. FIRST_ROWS
One of the hints that invokes the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
3. CHOOSE
One of the hints that invokes the cost-based optimizer.
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS) based on statistics gathered.
4. HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find the corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */
Hints are most useful to optimize query performance.

What are the differences between stored procedures and triggers?


A stored procedure is normally used for performing tasks, but a trigger is normally used for tracing and auditing logs.

Stored procedures must be called explicitly by the user in order to execute, but a trigger is invoked implicitly based on the events defined on the table.

A stored procedure can run independently, but a trigger is always part of a DML event on the table.

A stored procedure can be executed from a trigger, but a trigger cannot be executed from a stored procedure.

Stored procedures can have parameters, but a trigger cannot have any parameters.

Stored procedures are compiled collections of programs or SQL statements in the database.


Using stored procedures we can access and modify data present in many tables. Also, a stored procedure is not associated with any particular database object, whereas triggers are event-driven special procedures that are attached to a specific database object, say a table.
Stored procedures are not run automatically; they have to be called explicitly by the user, whereas triggers get executed when the particular event associated with the table gets fired.

Packages:
A package is a group of related procedures and functions, together with the cursors and variables they use. Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a unit in the database. For example, a package might contain several procedures and functions that process data related to the same transactions.
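A minimal sketch of a package built on the emp table (the package and subprogram names are illustrative):

CREATE OR REPLACE PACKAGE emp_pkg AS
  FUNCTION get_sal (p_empno IN NUMBER) RETURN NUMBER;
  PROCEDURE raise_sal (p_empno IN NUMBER, p_pct IN NUMBER);
END emp_pkg;
/
CREATE OR REPLACE PACKAGE BODY emp_pkg AS
  FUNCTION get_sal (p_empno IN NUMBER) RETURN NUMBER IS
    v_sal emp.sal%TYPE;
  BEGIN
    SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
    RETURN v_sal;
  END get_sal;

  PROCEDURE raise_sal (p_empno IN NUMBER, p_pct IN NUMBER) IS
  BEGIN
    UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
  END raise_sal;
END emp_pkg;
/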

Triggers:
Oracle lets you define procedures called triggers that run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.
Triggers are similar to stored procedures; a trigger stored in the database can include SQL and PL/SQL.

Types of Triggers
This section describes the different types of triggers:
1. Row Triggers and Statement Triggers
2. BEFORE and AFTER Triggers
3. INSTEAD OF Triggers
4. Triggers on System Events and User Events
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement. For example, if an
UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by
the UPDATE statement. If a triggering statement affects no rows, a row trigger is not run.
BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing--whether the trigger action is to be run before
or after the triggering statement. BEFORE and AFTER apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on tables, not on views.

Difference between Trigger and Procedure

1. A trigger does not need to be executed manually; triggers are fired automatically, whereas a procedure needs to be executed manually.
2. Triggers run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.


Differences between stored procedure and functions

1. A stored procedure may or may not return values, whereas a function must return at least one value (and can return more than one value using OUT arguments).
2. A stored procedure is used to implement business logic, whereas a function is used for calculations.
3. A stored procedure is a pre-compiled statement, stored in the database as pseudo-code (compiled form), whereas a function is parsed and compiled at runtime.
4. A stored procedure accepts more than one argument and can return values through OUT arguments, whereas a function returns its result through RETURN.
5. Stored procedures are mainly used to process tasks, whereas functions are mainly used to compute values.
6. A stored procedure cannot be invoked from SQL statements (e.g. SELECT), whereas a function can be invoked from SQL statements (e.g. SELECT).
7. A stored procedure can affect the state of the database using COMMIT, whereas a function cannot affect the state of the database.

Data files Overview:

A tablespace in an Oracle database consists of one or more physical datafiles. A datafile can be
associated with only one tablespace and only one database.
Table Space:
Oracle stores data logically in tablespaces and physically in datafiles associated with the corresponding
tablespace.
A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided
into logical units of storage called segments.
Control File:
A control file contains information about the associated database that is required for access by an
instance, both at startup and during normal operation. Control file information can be modified only by
Oracle; no database administrator or user can edit a control file.

IMPORTANT QUERIES

1. Get duplicate rows from the table:


Select empno, count (*) from EMP group by empno having count (*)>1;
2. Remove duplicates in the table:
Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);
3. Below query transpose columns into rows.
Name No Add1 Add2
abc 100 hyd bang
xyz 200 Mysore pune

Select name, no, add1 from A


UNION
Select name, no, add2 from A;

4. Below query transpose rows into columns.


select
emp_id,
max(decode(row_id,0,address))as address1,
max(decode(row_id,1,address)) as address2,
max(decode(row_id,2,address)) as address3
from (select emp_id,address,mod(rownum,3) row_id from ( select emp_id,address from emp order by
emp_id ))
group by emp_id

Other query:
select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
(select emp_id,address,rank() over (partition by emp_id order by emp_id,address )rank_id from temp )
group by
emp_id

5. Rank query:

Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);

6. Dense rank query:

The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from
emp);

7. Top 5 salaries by using rank:

Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from
emp) where r<=5;

OR

Select * from (select * from EMP order by sal desc) where rownum<=5;

8. 2nd highest sal:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from
EMP) where r=2;


9. Top sal:

Select * from EMP where sal= (select max (sal) from EMP);

10. How to display alternative rows in a table?

SQL> select * from emp where (rowid, 0) in (select rowid, mod(rownum, 2) from emp);

11. Hierarchical queries

Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but
process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;


DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical
and strategic decision making. It leverages methods and technologies that focus on counts, statistics and
business objectives to improve business performance.
The objective of Business Intelligence is to better understand customers and improve customer service,
make the supply and distribution chain more efficient, and to identify and address business problems and
opportunities quickly.
The warehouse is used for high-level data analysis: predictions, time-series analysis, financial analysis, what-if simulations and so on. Basically it is used for better decision making.

What is a Data Warehouse?

Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant Nonvolatile collection of data in


support of decision making".
In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business
division/department level.
Subject Oriented:
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent
whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed.
What is a DataMart?
A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a subset of a data warehouse with a focused objective.

What is the difference between a data warehouse and a data mart?

In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Mart is used on a business
division/department level.
A data mart only contains data specific to a particular subject areas.

Difference between data mart and data warehouse

1. A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; it is a data warehouse with a focused objective. A data warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making".
2. A data mart is used at a business division/department level, whereas a data warehouse is used at an enterprise level.
3. A data mart is a subset of data from a data warehouse, built for specific user groups. A data warehouse is an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.
4. By providing decision makers with only a subset of data from the data warehouse, privacy, performance and clarity objectives can be attained. The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.

What is a factless fact table?

A fact table that contains only primary keys from the dimension tables, and does not contain any measures, is called a factless fact table.

What is a Schema?

A graphical representation of the data structure; the first phase in the implementation of a universe.

What are the most important features of a data warehouse?

DRILL DOWN, DRILL ACROSS, graphs, pie charts, dashboards and TIME HANDLING.

Being able to drill down/drill across is the most basic requirement of an end user in a data warehouse. Drilling down most directly addresses the natural end-user need to see more detail in a result. Drill down should be as generic as possible because there is absolutely no good way to predict a user's drill-down path.

What is meant by the grain of the star schema?

In Data warehousing grain refers to the level of detail available in a given fact table as well as to the
level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain of the fact table
is the grain of the star schema.

What is a star schema?


A star schema is a data warehouse schema where there is only one "fact table" and many denormalized dimension tables.

The fact table contains primary keys from all the dimension tables and columns of additive, numeric facts.
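A minimal sketch of a star-schema fact table and one dimension (the table and column names are illustrative):

CREATE TABLE product_d (
  product_key  NUMBER PRIMARY KEY,    -- surrogate key of the dimension
  product_name VARCHAR2(100)
);

CREATE TABLE sales_f (
  date_key     NUMBER,
  product_key  NUMBER REFERENCES product_d (product_key),
  sales_amount NUMBER,                -- additive, numeric fact
  quantity     NUMBER                 -- additive, numeric fact
);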


What is a snowflake schema?

Unlike the star schema, the snowflake schema contains normalized dimension tables in a tree-like structure with many nesting levels.

Snowflake schema is easier to maintain but queries require more joins.

What is the difference between snow flake and star schema

1. The star schema is the simplest data warehouse schema, whereas the snowflake schema is a more complex data warehouse model than a star schema.
2. In a star schema each dimension is represented in a single table and there should be no hierarchies between dimension tables, whereas in a snowflake schema at least one hierarchy exists between dimension tables.
3. Both contain a fact table surrounded by dimension tables. If the dimensions are de-normalized we say it is a star schema design; if a dimension is normalized we say it is a snowflaked design.
4. In a star schema only one join establishes the relationship between the fact table and any one of the dimension tables, whereas in a snowflake schema the relationships between the dimension tables mean many joins are needed to fetch the data.
5. A star schema optimizes performance by keeping queries simple and providing fast response time; all the information about each level is stored in one row. Snowflake schemas normalize dimensions to eliminate redundancy, and the result is more complex queries and reduced query performance.
6. It is called a star schema because the diagram resembles a star; it is called a snowflake schema because the diagram resembles a snowflake.

What is Fact and Dimension?

A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point
for getting at the facts. Dimensions are things of interest to the business.

A set of level properties that describe a specific aspect of a business, used for analyzing the factual
measures.

What is Fact Table?

A Fact Table in a dimensional model consists of one or more numeric facts of importance to a
business. Examples of facts are as follows:
1. the number of products sold
2. the value of products sold
3. the number of products produced
4. the number of service calls received

What is Factless Fact Table?

Factless fact table captures the many-to-many relationships between dimensions, but contains no
numeric or textual facts. They are often used to record events or coverage information.

Common examples of factless fact tables include:


1. Identifying product promotion events (to determine promoted products that didn’t sell)
2. Tracking student attendance or registration events
3. Tracking insurance-related accident events
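A sketch of a factless fact table for the student-attendance example (the names are illustrative); it holds only dimension keys and no measures:

CREATE TABLE student_attendance_f (
  date_key    NUMBER NOT NULL,
  student_key NUMBER NOT NULL,
  course_key  NUMBER NOT NULL
);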

Types of facts?
There are three types of facts:

1. Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact
table.
2. Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions
in the fact table, but not the others.

3. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.

What is Granularity?

Principle: create fact tables with the most granular data possible to support analysis of the business
process.

In Data warehousing grain refers to the level of detail available in a given fact table as well as to the level
of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain of the fact table
is the grain of the star schema.

Facts: Facts must be consistent with the grain. All facts are at a uniform grain.

1. Watch for facts of mixed granularity


2. Total sales for day & monthly total
Dimensions: each dimension associated with fact table must take on a single value for each fact row.
1. Each dimension attribute must take on one value.
2. Outriggers are the exception, not the rule.

What is slowly Changing Dimension?


Slowly changing dimensions refers to the change in dimensional attributes over time.
An example of slowly changing dimension is a Resource dimension where attributes of a particular
employee change over time like their designation changes or dept changes etc.
What is Conformed Dimension?
Conformed Dimensions (CD): these dimensions are something that is built once in your model and can be
reused multiple times with different fact tables. For example, consider a model containing multiple fact
tables, representing different data marts. Now look for a dimension that is common to these facts tables.
In this example let’s consider that the product dimension is common and hence can be reused by
creating short cuts and joining the different fact tables. Some of the examples are time dimension,
customer dimensions, product dimension.

What is Junk Dimension?


A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient
place to store the junk attributes. A good example would be a trade fact in a company that brokers
equity trades.
When you consolidate lots of small dimensions, instead of having hundreds of small dimensions with only a few records in them cluttering your database with these mini 'identifier' tables, all records from all these small dimension tables are loaded into ONE dimension table, and we call this dimension table a junk dimension table (since we are storing all the junk in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so forth, and we can consolidate them in one dimension table called a junk dimension table.
It’s a dimension table which is used to keep junk attributes

What is a Degenerated Dimension?


An item that is in the fact table but is stripped of its description (because the description belongs in a dimension table) is referred to as a degenerated dimension. Since it looks like a dimension, but really sits in the fact table and has been degenerated of its description, it is called a degenerated dimension.
Degenerated dimension: a dimension which is located in the fact table is known as a degenerated dimension.

Dimensional Model:
A type of data modeling suited for data warehousing. In a dimensional model there are two types of tables: dimension tables and fact tables. A dimension table records information on each dimension, and a fact table records the "facts", or measures.
Data modeling
There are three levels of data modeling: conceptual, logical, and physical.

The differences between a logical data model and physical data model are shown below.
Logical vs Physical Data Modeling

A logical data model represents business information and defines business rules, whereas a physical data model represents the physical implementation of the model in a database. The logical objects map to physical objects as follows:

Logical Data Model        Physical Data Model
Entity                    Table
Attribute                 Column
Primary Key               Primary Key Constraint
Alternate Key             Unique Constraint or Unique Index
Inversion Key Entry       Non Unique Index
Rule                      Check Constraint, Default Value
Relationship              Foreign Key
Definition                Comment

Below is the simple data model.

Below is the source qualifier (SQ) for the project dimension.

EDIII – Logical Design

[Logical design diagram: the staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and ACW_DF_APPROVAL_STG load the fact tables ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F and ACW_DF_APPROVAL_F, which reference the dimension tables ACW_ORGANIZATION_D, ACW_PRODUCTS_D, ACW_PART_TO_PID_D, ACW_USERS_D, ACW_SUPPLY_CHANNEL_D and the EDW_TIME_HIERARCHY.]


What is a staging area and why do we need it in a DWH?

If the target and source databases are different and the target table volume is high (it contains millions of records), then without a staging table we would need to design the Informatica mapping using a lookup to find out whether the record exists or not in the target table. Since the target has huge volumes, it is costly to create the cache and it will hit the performance.

If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert/update; this approach gives good performance.

It avoids a full table scan to determine inserts/updates on the target.

Also, we can create indexes on the staging tables; since these tables are designed for a specific application, this will not impact any other schemas/users.

While processing flat files into the data warehouse we can also perform cleansing.

Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.

1. Since it is a one-to-one mapping from ODS to staging, we do truncate and reload.
2. We can create indexes in the staging state, so that our source qualifier performs best.
3. If we have the staging area, there is no need to rely on an Informatica lookup transformation to know whether the record exists or not.

Data cleansing

Weeding out unnecessary or unwanted things (characters and spaces etc) from incoming data to
make it more meaningful and informative

Data merging
Data can be gathered from heterogeneous systems and put together

Data scrubbing
Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect,
incomplete or duplicated before the data is passed to end user.

Data scrubbing is aimed at more than eliminating errors and redundancy. The goal is also to bring
consistency to various data sets that may have been created with different, incompatible business
rules.

ODS (Operational Data Sources):

My understanding of an ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading the targets. Hence it is a mandatory requirement for every warehouse.
So do we transfer data to the ODS from OLTP every day to keep it up to date?
OLTP is a sensitive database; it should not be hit with many select statements, as that may impact its performance, and if something goes wrong while fetching data from OLTP to the data warehouse it will directly impact the business.
The ODS is a replication of OLTP.
The ODS is usually refreshed through some Oracle jobs.
It enables management to gain a consistent picture of the business.

What is a surrogate key?


A surrogate key is a substitution for the natural primary key. It is a unique identifier or number (normally
created by a database sequence generator) for each record of a dimension table that can be used for
the primary key to the table.

A surrogate key is useful because natural keys may change.

What is the difference between a primary key and a surrogate key?

A primary key is a special constraint on a column or set of columns. A primary key constraint ensures that
the column(s) so designated have no NULL values, and that every value is unique. Physically, a primary
key is implemented by the database system using a unique index, and all the columns in the primary key
must have been declared NOT NULL. A table may have only one primary key, but it may be composite
(consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key instead of a
"real" or natural key. Sometimes there can be several natural keys that could be declared as the primary
key, and these are all called candidate keys. So a surrogate is a candidate key. A table could actually
have more than one surrogate key, although this would be unusual. The most common type of surrogate
key is an incrementing integer, such as an auto increment column in MySQL, or a sequence in Oracle, or
an identity column in SQL Server.
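A minimal Oracle sketch of generating a surrogate key with a sequence (the table, column and sequence names are illustrative):

CREATE SEQUENCE resource_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO resource_dim (resource_key, emp_no, designation, eff_start_date, eff_end_date)
VALUES (resource_dim_seq.NEXTVAL, 7369, 'ANALYST', SYSDATE, TO_DATE('31-12-9999', 'DD-MM-YYYY'));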


ETL-INFORMATICA
Differences between connected lookup and unconnected lookup

1. A connected lookup is connected to the pipeline and receives input values from the pipeline, whereas an unconnected lookup is not connected to the pipeline and receives input values from the result of a :LKP expression in another transformation via arguments.
2. A connected lookup cannot be used more than once in a mapping, whereas an unconnected lookup can be called more than once within the mapping.
3. A connected lookup can return multiple columns from the same row, whereas an unconnected lookup designates one return port (R) and returns one column from each row.
4. A connected lookup can be configured to use a dynamic cache; an unconnected lookup cannot.
5. A connected lookup passes multiple output values to another transformation (lookup/output ports are linked to another transformation), whereas an unconnected lookup passes one output value to another transformation (the lookup/output/return port passes the value to the transformation calling the :LKP expression).
6. A connected lookup uses a dynamic or static cache, whereas an unconnected lookup uses a static cache.
7. A connected lookup supports user-defined default values; an unconnected lookup does not.
8. For a connected lookup, the cache includes the lookup source columns in the lookup condition (as index cache) and the lookup source columns that are output ports (as data cache); for an unconnected lookup, the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Differences between dynamic lookup and static lookup

1. With a dynamic lookup cache, the cache gets refreshed as soon as a record is inserted, updated or deleted in the lookup table. With a static lookup cache, the cache does not get refreshed even though records are inserted or updated in the lookup table; it refreshes only in the next session run.
2. When we configure a lookup transformation to use a dynamic lookup cache, we can only use the equality operator in the lookup condition, and the NewLookupRow port is enabled automatically. The static cache is the default cache.
3. The best example of where we need a dynamic cache: suppose the first record and the last record from the source are the same, but there is a change in the address. What the Informatica mapping has to do here is insert the first record and update the last record in the target table. If we use a static lookup, the first record goes to the lookup and is checked against the lookup cache based on the condition; it will not find a match, so the lookup returns a null value and the router sends that record to the insert flow. But this record is still not available in the cache memory, so when the last record comes to the lookup it again finds no match, returns a null value, and goes to the insert flow through the router, although it is supposed to go to the update flow, because the cache did not get refreshed when the first record was inserted into the target table.

What is the difference between joiner and lookup

1. On multiple matches a joiner returns all matching records, whereas a lookup returns either the first record, the last record, any value, or an error value.
2. In a joiner we cannot configure persistent cache, shared cache, uncached mode or dynamic cache, whereas in a lookup we can.
3. We cannot override the query in a joiner, whereas in a lookup we can override the query to fetch data from multiple tables.
4. We can perform an outer join in a joiner transformation, whereas we cannot perform an outer join in a lookup transformation; however, a lookup by default works like a left outer join.
5. We cannot use relational operators (<, >, <= and so on) in a joiner transformation, whereas in a lookup we can.

What is the difference between source qualifier and lookup

1. A source qualifier pushes all the matching records, whereas in a lookup we can restrict the result to the first value, the last value or any value.
2. In a source qualifier there is no concept of a cache, whereas in a lookup the cache concept is central.
3. When both the source and the lookup table are in the same database we can use a source qualifier; when they exist in different databases we need to use a lookup.

Have you done any Performance tuning in informatica?

1. Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. We don't have any transformation inside the mapping; it is a 1-to-1 mapping, so there was nothing to optimize at the mapping level. I created session partitions using key range partitioning on the effective date column. It improved performance a lot: rather than 4 hours, it ran in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.

2. There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup transformation, if we can do an outer join in the source qualifier query override, this gives good performance when both the lookup table and the source are in the same database. If the lookup table has huge volumes, then creating the cache is costly.

3. Also, optimizing the mapping to use fewer transformations always gives better performance.

4. If any mapping is taking a long time to execute, first we need to look into the source and target statistics in the monitor for the throughput, and also find out where exactly the bottleneck is by looking at the busy percentage in the session log; that tells us which transformation is taking more time. If the source query is the bottleneck, it will show at the end of the session log as "query issued to database", which means there is a performance issue in the source query, and we need to tune that query.

Informatica Session Log shows busy percentage

If we look into the session log, it shows the busy percentage; based on that we need to find out where the bottleneck is.

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****

Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG]
has completed: Total Run Time = [7.193083] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000]

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point


[SQ_ACW_PCBA_APPROVAL_STG] has completed. The total run time was insufficient for any meaningful
statistics.

Thread [WRITER_1_*_1] created for [the write stage] of partition point [ACW_PCBA_APPROVAL_F1,
ACW_PCBA_APPROVAL_F] has completed: Total Run Time = [0.806521] secs, Total Idle Time = [0.000000]
secs, Busy Percentage = [100.000000]

Suppose I have to load 40 lakh (4 million) records into the target table and the workflow is taking about 10-11 hours to finish. I've already increased the cache size to 128 MB. There are no joiners, just lookups and expression transformations.

(1) If the lookup is uncached and has many records, try creating indexes on the columns used in the lookup condition, and try increasing the lookup cache. If this doesn't increase the performance and the target has any indexes, disable them in the target pre-load and enable them in the target post-load.

(2) Three things you can do with respect to it:

1. Increase the commit interval (by default it is 10000).
2. Use bulk mode instead of normal mode in case your target doesn't have primary keys, or use pre- and post-session SQL to implement the same (depending on the business requirement).
3. Use key partitioning to load the data faster.


(3) If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after completion of the session.

What is Constraint based loading in informatica?

By setting the Constraint Based Loading property at the session level in the Configuration tab, we can load the data into parent and child relational tables (primary key-foreign key).

Generally what it does is load the data first into the parent table and then into the child table.

What is the use of shortcuts in Informatica?

If we copy source definitions, target definitions or mapplets from a shared folder to any other folder, they become shortcuts.

Let's assume we have imported some source and target definitions into a shared folder, and after that we are using those source and target definitions in another folder, as shortcuts, in some mappings.

If any modifications occur in the backend (database) structure, like adding new columns or dropping existing columns either in the source or the target, and we re-import into the shared folder, those new changes are automatically reflected in all folders/mappings wherever we used those source or target definitions.

Target Update Override

If we don't have a primary key on the target table, we can perform updates using the Target Update Override option. By default, the Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.

Overriding the WHERE Clause

You can override the WHERE clause to include non-key columns. For example, you might want to update
records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED =:TU.DATE_SHIPPED,


TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_NAME = :TU.EMP_NAME and
EMP_NAME = 'MIKE SMITH'

If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.

SCD Type-II Effective-Date Approach


1. We have one of the dimensions in the current project called the resource dimension. Here we are maintaining history to keep track of SCD changes.
2. To maintain history in this slowly changing dimension (the resource dimension), we followed the SCD Type-II effective-date approach.
3. My resource dimension structure would be eff-start-date, eff-end-date, s.k (surrogate key) and the source columns.
4. Whenever I do an insert into the dimension, I populate eff-start-date with sysdate, eff-end-date with a future date and s.k with a sequence number.
5. If the record is already present in my dimension but there is a change in the source data, then:
6. I update the previous record's eff-end-date with sysdate and insert the changed source data as a new record.
Informatica design to implement SCD Type-II effective-date approach
1. Once you fetch the record from source qualifier. We will send it to lookup to find out whether the
record is present in the target or not based on source primary key column.
2. Once we find the match in the lookup we are taking SCD column from lookup and source
columns from SQ to expression transformation.
3. In the lookup transformation we need to override the lookup query to fetch only the active records from the dimension while building the cache (see the sketch after this list).
4. In expression transformation I can compare source with lookup return data.
5. If the source and target data is same then I can make a flag as ‘S’.
6. If the source and target data is different than I can make a flag as ‘U’.
7. If source data does not exists in the target that means lookup returns null value. I can flag it as ‘I’.
8. Based on the flag values in router I can route the data into insert and update flow.
9. If flag=’I’ or ‘U’ I will pass it to insert flow.
10. If flag=’U’ I will pass this record to eff-date update flow
11. When we do insert we are passing the sequence value to s.k.
12. Whenever we do update we are updating the eff-end-date column based on lookup return s.k
value.
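An illustrative sketch of the lookup SQL override mentioned in step 3 above, which builds the cache from active records only (the table and column names are assumptions, not the actual project objects):

SELECT resource_key, emp_no, designation, eff_end_date
FROM   resource_dim
WHERE  eff_end_date = TO_DATE('31-12-9999', 'DD-MM-YYYY')   -- active (current) records only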

Complex Mapping
1. We have an order file requirement: every day the source system places a file, with a timestamp in its name, on the Informatica server.
2. We have to process that day's file through Informatica.
3. The source file directory contains files older than 30 days, each with a timestamp.
4. For this requirement, if I hardcode the timestamp in the source file name it will process the same file every day.
5. So what I did here is I created $InputFilename for source file name.
6. Then I am going to use the parameter file to supply the values to session variables
($InputFilename).
7. To update this parameter file I have created one more mapping.
8. This mapping will update the parameter file with appended timestamp to file name.
9. I make sure to run this parameter file update mapping before my actual mapping.

How to handle errors in informatica?


1. We have a source with numerator and denominator values, and we need to calculate num/deno when populating the target.
2. If deno = 0, I should not load that record into the target table.
3. We need to send those records to a flat file. After completion of the first session run, a shell script checks the file size.
4. If the file size is greater than zero, the script sends an email notification to the source system POC (point of contact) along with the deno-zero record file and an appropriate email subject and body.
5. If the file size is <= 0, that means there are no records in the flat file, and in this case the shell script does not send any email notification.
6. Or:
7. We are expecting a not-null value for one of the source columns.
8. If it is null, that means it is an error record.
9. We can use the above approach for error handling.

Why do we need the source qualifier?


Simply put, it issues the SELECT statement against the source.
The SELECT statement fetches the data row by row.
The source qualifier selects the data from the source table.
It identifies the records coming from the source.

The parameter file supplies the values for session-level variables and mapping-level variables.
Variables are of two types:
1. Session level variables
2. Mapping level variables
Session level variables:
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as a database connection or source file. Use session parameters in the session properties, and then define the parameters in a parameter file. You can specify the parameter file for the session to use in the session properties. You can also specify it when you use pmcmd to start the session. The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With $PMSessionLogFile, you can change the name of the session log generated for the session. The Workflow Manager also allows you to create user-defined session parameters.

Naming Conventions for User-Defined Session Parameters


Parameter Type Naming Convention
Database Connection $DBConnectionName
Source File $InputFileName
Target File $OutputFileName
Lookup File $LookupFileName
Reject File $BadFileName

Use session parameters to make sessions more flexible. For example, you have the same type of
transactional data written to two different databases, and you use the database connections TransDB1
and TransDB2 to connect to the databases. You want to use the same mapping for both tables. Instead
of creating two sessions for the same mapping, you can create a database connection parameter,
$DBConnectionSource, and use it as the source database connection for the session. When you create a
parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the
session completes, you set $DBConnectionSource to TransDB2 and run the session again.
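A sketch of what such a parameter file entry could look like; the folder, workflow and session names below are placeholders, not taken from a real repository:

[MyFolder.WF:wf_load_trans.ST:s_m_load_trans]
$DBConnectionSource=TransDB1

For the second run you would change the value to TransDB2 and start the session again.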
You might use several session parameters together to make session management easier. For example,
you might use source file and database connection parameters to configure a session to read data from
different source files and write the results to different target databases. You can then use reject file
parameters to write the session reject files to the target machine. You can use the session log parameter,
$PMSessionLogFile, to write to different session logs in the target machine, as well.
When you use session parameters, you must define the parameters in the parameter file. Session
parameters do not have default values. When the PowerCenter Server cannot find a value for a session
parameter, it fails to initialize the session.
Mapping level variables are of two types:
1. Variable
2. Parameter

What is the difference between mapping-level and session-level variables?
Mapping-level variables always start with $$.
Session-level variables always start with $.

Flat File
A flat file is a collection of data in a file in a specific format.
Informatica supports two types of flat files:
1. Delimited
2. Fixed width
For a delimited file we need to specify the separator.
For a fixed-width file we need to know the format first, i.e. how many characters to read for each column.
For a delimited file it is also necessary to know the structure, for example whether the file has headers.
If the file contains a header, then in the definition we need to skip the first row.

List file:
If we want to process multiple files with the same structure, we don't need multiple mappings and multiple sessions.
We can use one mapping and one session with the list file option.
First we create a list file that names all the files; then we use this list file when running the main mapping (the session reads the list and processes each file in turn).
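For example, a list file such as orders.lst would simply contain one source file name per line (the paths below are illustrative only), and the session's source file name then points at the list file:

/interface/dev/etl/apo/srcfiles/orders_20070921.dat
/interface/dev/etl/apo/srcfiles/orders_20070922.dat
/interface/dev/etl/apo/srcfiles/orders_20070923.dat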

Parameter file Format:


It is a text file; below is the format of a parameter file. We usually place this file on the Unix box where the Informatica server is installed.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_BELUM]
$DBConnection_Source=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495

How do you perform incremental logic (Delta or CDC)?


Incremental means: suppose today we processed 100 records; in tomorrow's run we need to extract only the records newly inserted or updated after the previous run, based on the last-updated timestamp from yesterday's run. This process is called incremental, delta or CDC loading.
Approach_1: Using SETMAXVARIABLE()
1. First create a mapping variable ($$Pre_sess_max_upd) and assign an initial value of an old date (01/01/1940).
2. Then override the source qualifier query to fetch only LAST_UPD_DATE >= $$Pre_sess_max_upd (mapping variable); see the sketch after this list.
3. In an expression, assign the maximum last_upd_date value to $$Pre_sess_max_upd using SETMAXVARIABLE.
4. Because it is a mapping variable, the repository stores the maximum last_upd_date value, so in the next run the source qualifier query fetches only the records updated or inserted after the previous run.
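A minimal sketch of such a source qualifier override, assuming a hypothetical source table SRC_ORDERS with a LAST_UPD_DATE column; the Integration Service substitutes the mapping variable value into the SQL before running it:

SELECT SRC_ORDERS.ORDER_ID,
       SRC_ORDERS.ORDER_AMT,
       SRC_ORDERS.LAST_UPD_DATE
FROM   SRC_ORDERS
-- the TO_DATE format mask is an assumption; match it to how the variable value is stored
WHERE  SRC_ORDERS.LAST_UPD_DATE >= TO_DATE('$$Pre_sess_max_upd', 'MM/DD/YYYY HH24:MI:SS')

In the downstream expression, a variable port then calls SETMAXVARIABLE($$Pre_sess_max_upd, LAST_UPD_DATE) so that the repository keeps the largest timestamp seen during the run.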
Approach_2: Using a parameter file
1. First create a mapping parameter ($$Pre_sess_start_tmst) and assign an initial value of an old date (01/01/1940) in the parameter file.
2. Then override the source qualifier query to fetch only LAST_UPD_DATE >= $$Pre_sess_start_tmst (mapping parameter).
3. Update the mapping parameter ($$Pre_sess_start_tmst) value in the parameter file, using a shell script or another mapping, after the first session completes successfully.
4. Because it is a mapping parameter, we need to update its value in the parameter file after every completion of the main session.

Approach_3: Using Oracle control tables


1. First create two control tables, cont_tbl_1 and cont_tbl_2, with the structure session_st_time, wf_name.
2. Then insert one record into each table with session_st_time = 1/1/1940 and the workflow name.
3. Create two stored procedures. The first updates cont_tbl_1 with the session start time; set its Stored Procedure Type property to Source Pre-load.
4. For the 2nd stored procedure set the Stored Procedure Type property to Target Post-load; this procedure updates the session_st_time in cont_tbl_2 from cont_tbl_1.
5. Then override the source qualifier query to fetch only LAST_UPD_DATE >= (SELECT session_st_time FROM cont_tbl_2 WHERE wf_name = 'actual workflow name'). A sketch of these pieces follows this list.
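A rough sketch of the control-table pieces, using the hypothetical names above; the two procedures are deliberately minimal:

-- Source qualifier override: cont_tbl_2 holds the start time of the previous successful run
SELECT s.*
FROM   src_table s
WHERE  s.last_upd_date >= (SELECT c.session_st_time
                           FROM   cont_tbl_2 c
                           WHERE  c.wf_name = 'wf_actual_workflow_name')

-- Source pre-load procedure: stamp the start time of the current run
CREATE OR REPLACE PROCEDURE upd_cont_tbl_1 AS
BEGIN
  UPDATE cont_tbl_1
  SET    session_st_time = SYSDATE
  WHERE  wf_name = 'wf_actual_workflow_name';
END;

-- Target post-load procedure: promote the stamped time only after the load succeeds
CREATE OR REPLACE PROCEDURE upd_cont_tbl_2 AS
BEGIN
  UPDATE cont_tbl_2 c
  SET    c.session_st_time = (SELECT p.session_st_time
                              FROM   cont_tbl_1 p
                              WHERE  p.wf_name = c.wf_name)
  WHERE  c.wf_name = 'wf_actual_workflow_name';
END;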
Approach_1 walk-through (using SETMAXVARIABLE()): the screenshots below use a mapping variable named $$INCREMENT_TS.
1. First create the mapping variable ($$INCREMENT_TS) and assign an initial value of an old date (01/01/1940).
2. Then override the source qualifier query to fetch only LAST_UPD_DATE >= $$INCREMENT_TS (mapping variable).
3. In the expression, assign the maximum last_upd_date value to $$INCREMENT_TS (mapping variable) using SETMAXVARIABLE.
4. Because it is a mapping variable, the repository stores the maximum last_upd_date value, so in the next run the source qualifier query fetches only the records updated or inserted after the previous run.

Logic in the mapping variable (screenshot)

Logic in the SQ (screenshot)

In the expression, assign the maximum last-update-date value to the variable using the SETMAXVARIABLE function (screenshot)

Logic in the update strategy (screenshot)
Approach_2 walk-through (using a parameter file)


First create a mapping parameter ($$LastUpdateDateTime) and assign an initial value of an old date (01/01/1940) in the parameter file.
Then override the source qualifier query to fetch only LAST_UPD_DATE >= $$LastUpdateDateTime (mapping parameter).
Update the mapping parameter ($$LastUpdateDateTime) value in the parameter file, using a shell script or another mapping, after the first session completes successfully.
Because it is a mapping parameter, we need to update its value in the parameter file after every completion of the main session.
Parameter file:

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GE
HC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
$$LastUpdateDateTime=01/01/1940

Updating the parameter file (screenshot)

Logic in the expression (screenshot)

Main mapping (screenshot)

SQL override in the SQ transformation (screenshot)

Workflow design (screenshot)
Informatica Tuning

The aim of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica Server. Increase session performance by following the points below.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance.

1. Cache lookups if source table is under 500,000 rows and DON’T cache for tables over 500,000
rows.
2. Reduce the number of transformations. Don’t use an Expression Transformation to collect fields.
Don’t use an Update Transformation if only inserting. Insert mode is the default.
3. If a value is used in multiple ports, calculate the value once (in a variable) and reuse the result
instead of recalculating it for multiple ports.
4. Reuse objects where possible.
5. Delete unused ports particularly in the Source Qualifier and Lookups.
6. Use Operators in expressions over the use of functions.
7. Avoid using Stored Procedures, and call them only once during the mapping if possible.
8. Remember to turn off Verbose logging after you have finished debugging.

9. Use default values where possible instead of using IIF (ISNULL(X),,) in Expression port.
10. When overriding the Lookup SQL, always put a valid ORDER BY clause in the SQL. This causes the database to perform the sort rather than the Informatica Server while building the cache.
11. Improve session performance by using sorted data with the Joiner transformation. When the
Joiner transformation is configured to use sorted data, the Informatica Server improves
performance by minimizing disk input and output.
12. Improve session performance by using sorted input with the Aggregator Transformation since it
reduces the amount of data cached during the session.
13. Improve session performance by using limited number of connected input/output or output ports
to reduce the amount of data the Aggregator transformation stores in the data cache.
14. Use a Filter transformation prior to Aggregator transformation to reduce unnecessary
aggregation.
15. Performing a join in a database is faster than performing the join in the session, so use the Source Qualifier to perform the join where possible.
16. Designate the source with the smaller number of rows as the master source in Joiner transformations, since this reduces the search time and also the cache.
17. When using multiple conditions in a lookup, specify the conditions with the equality operator first.
18. Improve session performance by caching small lookup tables.
19. If the lookup table is on the same database as the source table, instead of using a Lookup
transformation, join the tables in the Source Qualifier Transformation itself if possible.
20. If the lookup table does not change between sessions, configure the Lookup transformation to
use a persistent lookup cache. The Informatica Server saves and reuses cache files from session to
session, eliminating the time required to read the lookup table.
21. Use :LKP reference qualifier in expressions only when calling unconnected Lookup
Transformations.
22. The Informatica Server generates an ORDER BY statement for a cached lookup that contains all lookup ports. By providing an override ORDER BY clause with fewer columns, session performance can be improved (see the sketch after this list).
23. Eliminate unnecessary data type conversions from mappings.
24. Reduce the number of rows being cached by using the Lookup SQL Override option to add a
WHERE clause to the default SQL statement.
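Combining points 22 and 24, a lookup SQL override could look like the sketch below; the table, filter and columns are hypothetical. The ORDER BY covers only the lookup condition column instead of all lookup ports:

SELECT CUST_ID,
       CUST_STATUS
FROM   DIM_CUSTOMER
WHERE  CUST_STATUS = 'ACTIVE'      -- extra WHERE clause reduces the number of rows cached (point 24)
ORDER BY CUST_ID                   -- ORDER BY on fewer columns than the generated default (point 22)

Depending on the PowerCenter version, the override may need to end with two dashes (--) so that the automatically generated ORDER BY clause is commented out.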
Tuning

Tuning a PowerCenter 8 ETL environment is not that straightforward. A chain is only as strong as the
weakest link. There are four crucial domains that require attention: system, network, database and the
PowerCenter 8 installation itself. It goes without saying that without a well performing infrastructure the
tuning of the PowerCenter 8 environment will not make much of a difference.

As the first three domains are located in the realms of administrators, this article will only briefly touch these
subjects and will mainly focus on the items available to developers within PowerCenter 8.

Tuning is an iterative process: at each iteration the largest bottleneck is removed, gradually improving
performance. Bottlenecks can occur on the system, on the database (either source or target), or within
the mapping or session ran by the Integration Service. To identify bottlenecks, run test sessions, monitor the
system usage and gather advanced performance statistics while running. Examine the session log in detail
as it provides valuable information concerning session performance. From the perspective of a developer,

search for performance problems in the following order:

1. source / target
2. mapping
3. session
4. system

If tuning the mapping and session still proves to be inadequate, the underlying system will need to be
examined closer. This extended examination needs to be done in close collaboration with the system
administrators and database administrators (DBA). They have several options to improve performance
without invoking hardware changes. Examples are distributing database files over different disks,
improving network bandwidth and lightening the server workload by moving other applications. However
if none of this helps, only hardware upgrades will bring redemption to your performance problems.

Session logs

The PowerCenter session log provides very detailed information that can be used to establish a baseline
and will identify potential problems.

Very useful for the developer are the detailed thread statistics that will help benchmarking your actions.
The thread statistics will show if the bottlenecks occur while transforming data or while reading/writing.
Always focus attention on the thread with the highest busy percentage first. For every thread, detailed information on the run and idle time is presented. The busy percentage is calculated as: (run time - idle time) / run time * 100.

Each session has a minimum of three threads:

1. reader thread
2. transformation thread
3. writer thread

An example:

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has
completed: Total Run Time = [31.984171] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_X_T_CT_F_SITE_WK_ENROLL] has completed: Total Run Time = [0.624996] secs, Total Idle Time =
[0.453115] secs, Busy Percentage = [27.501083].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [476.668825] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].

In this particular case it is obvious that the database can be considered as the main bottleneck. Both
reading and writing use most of the execution time. The actual transformations only use a very small
amount of time. If a reader or writer thread is 100% busy, consider partitioning the session. This will allow the
mapping to open several connections to the database, each reading/writing data from/to a partition
thus improving data read/write speed.

Severity Timestamp Node Thread Message Code Message


INFO 23/Dec/2008 09:02:22 node_etl02 MANAGER PETL_24031

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has
completed. The total run time was insufficient for any meaningful statistics.

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_T_CT_F_SITE_WK_BSC]
has completed: Total Run Time = [22.765478] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [30.937302] secs, Total Idle Time = [20.345600] secs, Busy Percentage =
[34.23602355].

In the example above, the transformation thread poses the largest bottleneck and needs to be dealt with
first. The reader thread finished so quickly no meaningful statistics were possible. The writer thread spends
the majority of time in the idle state, waiting for data to emerge from the transformation thread. Perhaps
an unsorted aggregator is used, causing the Integration Service to sort all data before releasing any
aggregated record?

The number of threads can increase if the sessions will read/write to multiple targets, if sessions have
multiple execution paths, if partitioning is used …

Establishing a baseline

To be able to benchmark the increase or decrease of performance following an action, it is important to establish a baseline to compare with. It is good practice to log in detail every iteration in the tuning
process. This log will enable the developer to clearly identify the actions that enhanced or decreased
performance and serve as later reference for future tuning efforts. Never assume that an action will
improve performance because the action is a best practice or worked before: always test and compare
with hard figures. The thread statistics are used to build this log.

This log file could look like this: (screenshot)

Optimally reading sources

Reading from sources breaks down into two distinct categories: reading relational sources and reading flat
files. Sometimes, both source types are combined in a single mapping.
A homogeneous join is a join between relational sources that combine data from a single origin: for example, a number of Oracle tables being joined.
A heterogeneous join is a join between sources that combine data from different origins: for example, when Oracle data is joined with a flat file.

Whatever source you are trying to read, always try to limit the incoming data stream maximally. Place
filters as early as possible in the mapping, preferably in the source qualifier. This will ensure only data
needed by the Integration Services is picked up from the database and transported over the network. If
you suspect the performance of reading relational data is not optimal, replace the relational source with a
flat file source containing the same data. If there is a difference in performance, the path towards the
source database should be investigated more closely, such as execution plan of the query, database
performance, network performance, network packet sizes, …

When using homogeneous relational sources, use a single source qualifier with a user-defined join instead of a joiner transformation. This forces the join to be executed on the database instead of on the PowerCenter 8 platform. If a joiner transformation is used instead, all data is first picked up from the database server, then transported to the PowerCenter 8 platform, sorted, and only as a last step joined by the Integration Service.
Consider pre-sorting the data in the database; this makes further sorting by the Integration Service for later aggregators, joiners, … unnecessary. Make sure the query executed on the database has a favourable execution plan. Use the explain plan facility (Oracle) to verify in the query's execution plan whether indexes are optimally used. Do not use synonyms or database links unless really needed, as these will slow down the data stream.
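As a sketch, a user-defined join plus pre-sort in the source qualifier override could look like this (Oracle, using the familiar EMP/DEPT tables as stand-ins); both the join and the ORDER BY execute on the database rather than on the PowerCenter server:

SELECT e.EMPNO,
       e.ENAME,
       e.DEPTNO,
       d.DNAME,
       d.LOC
FROM   EMP  e,
       DEPT d
WHERE  e.DEPTNO = d.DEPTNO          -- homogeneous join pushed to the database
ORDER BY e.DEPTNO                   -- pre-sort so later aggregators/joiners can use sorted input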

In general it is good practice to always generate keys for primary key and foreign key fields. If no key is
available or known a dummy key should be used. In the reference table an extra dummy record should be
inserted. This method will improve join performance when using homogeneous joins. In general three dummy rows should be included (an example insert follows the list):

1. 999999 Not applicable

2. 999998 Not available

3. 999997 Missing
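For a hypothetical customer dimension, the three dummy rows could be inserted as follows (the key values follow the list above; table and column names are illustrative):

INSERT INTO DIM_CUSTOMER (CUST_KEY, CUST_NAME) VALUES (999999, 'Not applicable');
INSERT INTO DIM_CUSTOMER (CUST_KEY, CUST_NAME) VALUES (999998, 'Not available');
INSERT INTO DIM_CUSTOMER (CUST_KEY, CUST_NAME) VALUES (999997, 'Missing');
COMMIT;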

When using heterogeneous sources there is no alternative but to use a joiner transformation. To ease up
matters for the Integration service, ensure that all relational sources are sorted and joined in advance in the
source qualifier. Flat file sources need to be sorted before joining, using a sorter transformation. When
joining the 2 sorted sources, check the sorted input property at the joiner transformation. The sorted input
option allows the joiner transformation to start passing records to subsequent transformations as soon as the
key value changes. Normal behaviour would be to hold passing data until all data is sorted and processed
in the joiner transformation.

By matching the session property 'Line Sequential buffer length' to the size of exactly one record, overhead is minimized. If possible, stage flat files in a staging table; joining and filtering can then be done in the database.

Optimally writing to targets

One of the most common performance issues in PowerCenter 8 is slow writing to target databases. This is
usually caused by a lack of database or network performance. You can test for this behaviour by replacing
the relational target with a flat file target. If performance increases considerably it is clear something is
wrong with the relational target.
Indexes are usually the root cause of slow target behaviour. In Oracle, every index on a table will decrease
the performance of an insert statement by 25%. The more indexes are defined, the slower insert/update
statements will be. Every time an update/insert statement is executed, the indexes need to be updated as
well. Try dropping the indexes on the target table. If this does not increase performance, the network is likely causing the problem.

In general avoid having too many targets in a single mapping. Increasing the commit interval will decrease
the amount of session overhead. Three different commit types are available for targets:

1. target based commit: fastest

2. source base commit: in between

3. user defined commit: slowest; avoid using user-defined commit when not really necessary

There are two likely scenarios when writing to targets:

Session is only inserting data

PowerCenter has two methods for inserting data in a relational target: normal or bulk loads. Normal loads
will generate DML-statements. Bulk loads will bypass the database log and are available for DB2, Sybase,
Oracle (SQL Loader), or Microsoft SQL Server. This loading method has a considerable performance gain
but has two drawbacks: the recovery of a session will not be possible, as no rollback data is kept by the database, and the target table cannot have any indexes defined on it when bulk loading, so drop and recreate the indexes before and after the session. For every case you will have to weigh whether dropping and recreating the indexes while using a bulk load outperforms a classic insert statement with all indexes in place.

Remember to use a very large commit interval when using bulk loads with Oracle and Microsoft SQL Server to avoid unnecessary overhead. Dropping and recreating indexes can be done by using pre- and post-session tasks or by calling a stored procedure within the mapping, for example as sketched below.
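For example, the pre- and post-session SQL could look like the sketch below (Oracle syntax; the index and table names are hypothetical):

-- Pre-session SQL: drop the index before the bulk load
DROP INDEX idx_sales_fact_cust;

-- Post-session SQL: recreate it once the load has finished
CREATE INDEX idx_sales_fact_cust ON sales_fact (cust_key);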

Session is mainly updating a limited set of data

When the session is updating a set of records in a large table, the use of a primary key or unique index is
absolutely necessary. Be sure to check the explain plan and verify the proper index usage. Sometimes it is
faster to only keep unique indexes while loading data and dropping the non-unique indexes not needed
by the session. These indexes can be recreated at the end of the session.

Clever mapping logic

Now that data is being read and written in the most optimal way, it is time to focus our attention on the actual mapping. The basic idea is simple: minimize the incoming data stream and create as little overhead as possible within the mapping. A first step in achieving this goal is to reduce the number of transformations
to a minimum by reusing common logic. Perhaps the use of the same lookup in different pipes could be
redesigned to only use the lookup once? By using clever caching strategies, cache can be reused in the
mapping or throughout the workflow. Especially with active transformations (transformations where the
number of records is being changed) the use of caching is extremely important. Active transformations
that reduce the number of records should be placed as early as possible in the mapping.

Data type conversions between transformations in the mapping are costly. Be sure to check if all explicit
and implicit conversions really are necessary. When the data from a source is passed directly to a target
without any other actions to be done, connect the source qualifier directly to the target without the use of
other transformations.

Single pass reading allows multiple targets being populated using the data from the same source qualifier.
Consider using single pass reading if there are multiple sessions using the same source: the existing mapping
logic can be combined by using multiple pipelines. Common data manipulations for all pipelines should be
done before splitting out the pipeline.

At times it is better not to create mappings at all: staging mappings could be replaced by snapshots or replication in the database. Databases are specialized in these types of data transfer and are in general far more efficient at such processing than passing the data through PowerCenter.

Transformation Mechanics

Every transformation has its specifics related to performance. In the section below the most important items
are discussed.
A joiner transformation should be used to join heterogeneous data sources. Homogeneous sources should
always be joined in the database by using the user defined join in the source qualifier transformation.

If not sorted at database level, always use a sorter transformation to sort the data before entering the joiner
transformation. Make sure the sorter transformation has sufficient cache to enable a 1-pass sort; not having sufficient cache will make performance plummet. The data could be sorted in the joiner, but there are three advantages to using the sorter transformation:

The use of sorted input enables the joiner transformation to start passing data to subsequent
transformations before all data was passed in the joiner transformation. Consequently, the transformations
following the joiner transformation start receiving data nearly immediately and do not have to wait until all
the data was sorted and joined in the joiner transformation. This logic is only valid when the source can be
sorted in the database: for example when joining SQL-Server and Oracle. Both sources can be sorted in the
database, making additional sorting using sorters superfluous. When a sorter is needed, for example when
joining Oracle and a flat file, the sorter will have to wait until all data is read from the flat file before records
to the joiner transformation can be passed.

The sorting algorithm used in the sorter is faster than the algorithm used in joiners or aggregators.

The use of sorted input in the joiner transformation, allows for a smaller cache size, leaving more memory for
other transformations or sessions. Again, when a flat file is used, a sorter will be needed prior to the joiner
transformation. Although the joiner transformation uses less cache, the sorter cache will need to be
sufficiently large to enable sorting all input records.

As outer joins are far more expensive than inner joins, try to avoid them as much as possible. The master source should be designated as the source containing fewer rows than the detail source. Join as early as
possible in the pipeline as this limits the number of pipes and decreases the amount of data being sent to
other transformations.

Only use a filter transformation for non-relational sources. When using relational sources, filter in the source
qualifier. Filter as early as possible in the data stream. Try filtering by using numeric lookup conditions. Numeric matching is considerably faster than the matching of strings. Avoid complex logic in the filter
condition. Be creative in rewriting complex expressions to the shortest possible length. When multiple filters
are needed, consider using a router transformation as this will simplify mapping logic.

A lookup transformation is used to lookup values from another table. Clever use of lookup caches can
make a huge difference in performance.

By default, lookup transformations are cached. The selected lookup fields from the lookup table are read
into memory and a lookup cache file is built every time the lookup is called. To minimize the usage of
lookup cache, only retrieve lookup ports that are really needed.

However, to cache or not to cache a lookup really depends on the situation. An uncached lookup makes
perfect sense if only a small percentage of lookup rows will be used. For example if we only need 200 rows
in a 10 000 000 rows table. In this particular case, building the lookup cache would require an extensive
amount of time. A direct select to the database for every lookup row will be much faster on the condition
that the lookup key in the database is indexed.

Sometimes a lookup is used multiple times in an execution path of a mapping or workflow. Re-caching the
lookup every time would be time consuming and unnecessary, as long as the lookup source table remains
unchanged. The persistent lookup cache property was created to handle this type of situation. Only when
calling the lookup the first time, the lookup cache file is refreshed. All following lookups reuse the persistent
cache file. Using a persistent cache can improve performance considerably because the Integration
Service builds the memory cache from the cache files instead of the database.

Use dynamic lookup cache when the lookup source table is a target in the mapping and updated
dynamically throughout the mapping. Normal lookups caches are static. The records that were inserted in
the session are not available to the lookup cache. When using a dynamic lookup cache, newly inserted or updated records are updated in the lookup cache immediately.

Ensure sufficient memory cache is available for the lookup. If not, the Integration server will have to write to
disk, slowing down.

By using the Additional Concurrent Pipelines property at session level, lookup caches will start building
concurrently at the start of the mapping. Normal behaviour would be that a lookup cache is created only
when the lookup is called. Pre-building caches versus building caches on demand can increase the total
session performance considerably, but only when the pre-built lookups will be used for sure in the session.
Again, the performance gain of setting this property will depend on the particular situation.

An aggregator transformation is an active transformation, used to group and aggregate data. If the input
was not sorted already, always use a sorter transformation in front of the aggregator transformation. As with
the joiner transformation the aggregator transformation will accumulate data until the dataset is complete
and only starts processing and sending output records from there on. When sorted input is used, the aggregator will process and send output records as soon as the first set of records is complete. This allows for much faster processing and smaller caches. Use as few functions and complex nested conditions as possible. Especially avoid the use of complex expressions in the group-by ports. If needed use an
expression transformation to build these expressions in advance. When using change data capture,
incremental aggregation will enhance performance considerably.

Sometimes, simple aggregations can be done by using an expression transformation that uses variables. In
certain cases this could be a valid alternative for an aggregation transformation.

Expression transformations are generally used to calculate variables. Try not to use complex nested
conditions, use decode instead. Functions are more expensive than operators; avoid using functions if the
same can be achieved by using operators. Implicit data type conversion is expensive. Try to convert data types as little as possible. Working with numbers is generally faster than working with strings. Be creative in
rewriting complex expressions to the shortest possible length.

The use of a sequence generator transformation versus a database sequence depends on the load
method of the target table. If using bulk loading, database sequences cannot be used. The sequence
generator transformation can overcome this problem. Every row is given a unique sequence number.
Typically a number of values are cached for performance reasons.

There is however a big catch. Unused sequence numbers at the end of the session are lost. The next time
the session is run, the sequence generator will cache a new batch of numbers.

For example: a sequence generator caches 10000 values. 10000 rows are loaded using the cached values. At row 10001, a new batch of sequence values is cached: from 10001 to 20000. However, the last row in the session is 10002. All values between 10002 and 20000 are lost. The next time the session is run, the first inserted row will have a key of 20001.

To avoid these gaps use a sequence generator in combination with an unconnected lookup. First look up
the latest key value in the target table. Then use an expression that will call the sequence generator and
add one to the key value that was just retrieved. The sequence generator should restart numbering at
every run. There are some advantages to this approach:

1. a database sequence is emulated and bulk loads remain possible.

2. if rows are deleted, gaps between key values will be caused by deletes and not by caching issues.

3. the limits of a sequence will not be reached so quickly

As an added advantage, the use of this method will prevent migration problems with persistent values
between repositories. This method is easy to implement and does not imply a performance penalty while
running the session.
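The unconnected lookup in this pattern is essentially a "select the highest key" against the target; a sketch of such a lookup override is shown below, with a hypothetical target table and key column. In the expression, the next key is then built from the lookup return value plus the value coming from the sequence generator, which restarts at 1 on every run:

-- Unconnected lookup override: fetch the highest key currently in the target
SELECT NVL(MAX(ORDER_KEY), 0) AS ORDER_KEY
FROM   F_ORDERS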

Clean up error handling

Dealing with transformation errors in advance can save a lot of time while executing a session. By default,
every error row is written into a bad file, which is a text based file containing all error rows. On top of that
the error data is logged into the session log that, if sufficient transformation errors occur, will explode in size.
Both cause a lot of extra overhead and slow down a session considerably.

In general, it is better to capture all data quality issues that could cause transformation errors in advance
and write flawed records into an error table.

Collecting Advanced Performance data

To really get inside a session and understand exactly what is happening within it, even more detailed performance data than is available in the session log can be captured. This can be done by enabling the checkbox 'Collect performance data' at session level.

This option will allow the developer to see detailed transformation based statistics in the Workflow Monitor
while running the session. When finished a performance file is written to the session log folder. For every
source qualifier and target definition performance details are provided, along with counters that show
performance information about each transformation.

A number of counters are of particular interest to the developer:

1. errorrows: should always be zero, if errors occur, remove the cause

2. readfromdisk/writetodisk: indicates not enough cache memory is available. Increase the cache
size until this counter is no longer shown.


Memory Optimization

Memory plays an important role when the Integration Service is running sessions. Optimizing cache sizes
can really make a huge difference in performance.

Buffer memory is used to hold source and target data while processing and is allocated when the session is
initialized. DTM Buffer is used to create the internal data structures. Buffer blocks are used to bring data in
and out of the Integration Service. Increasing the DTM buffer size will increase the amount of blocks
available to the Integration Service. Ideally a buffer block can contain 100 rows at the same time.
You can configure the amount of buffer memory yourself or you can configure the Integration Service to automatically calculate buffer settings at run time. Instead of calculating all values manually or by trial and error, run the session once on auto and retrieve the correct values from the session log:

Severity Timestamp Node Thread Message Code Message


INFO 12/30/2008 1:23:57 AM INFO8_ASPMIS003 MAPPING TM_6660 Total Buffer Pool size is 90000000 bytes
and Block size is 65536 bytes.

The Integration Service uses the index and data caches mostly for active transformations: aggregator, joiner, sorter, lookup, rank, …

Configuring the correct amount of cache is really necessary as the Integration Server will write and read
from disk if not properly sized. The index cache should be about half of the data cache. Cache files should
be stored on a fast drive and surely not on a network share.
The easiest way of calculating the correct cache sizes is by keeping the defaults on auto and examining
the session log. In the session log a line is written for every lookup that looks like this:

Severity Timestamp Node Thread Message Code Message


INFO 12/30/2008 1:26:11 AM INFO8_ASPMIS003 LKPDP_2:TRANSF_1_1 DBG_21641 LKP_VORIG_RECORD: Index
cache size = [12000800], Data cache size = [24002560]
Copy these values into the session properties. Be sure to verify the performance counters to validate no disk
reads/writes are done.
Sorter transformations need special attention concerning cache sizes as well. If not enough cache is
available, the sorter will require a multi pass sort, dropping the session performance. If so, a warning will be
displayed in the session log:

TRANSF_1_1_1> SORT_40427 Sorter Transformation [srt_PRESTATIE] required 4-pass sort (1-pass temp I/O:
19578880 bytes). You may try to set the cache size to 27 MB or higher for 1-pass in-memory sort.

The maximum amount of memory used by transformation caches is set by two properties:

1. Maximum Memory Allowed for Automatic Memory Attributes

2. Maximum Percentage of Total Memory Allowed for Automatic Memory Attributes

The smaller of the two is used. When the value is 0, the automatic memory attributes are disabled. If this value is set too low, an error will occur if a lookup with a manually configured cache wants to allocate more memory than is available. Keep in mind that sessions can run in parallel: every session will try to allocate RAM memory.

Ensure plenty of RAM-memory is available for the Integration Service. Do not assume that adding cache
memory will increase performance, at a certain point optimum performance is reached and adding
further memory will not be beneficial.

Further session optimization

High precision

The high precision mode will allow using decimals up to a precision of 28 digits. Using this kind of precision
will result in a performance penalty in reading and writing data. It is therefore recommended to disable

high precision when not really needed. When turned off, decimals are converted to doubles that have a
precision up to 15 digits.

Concurrent sessions

Depending on the available hardware, sessions can be run concurrently instead of sequentially. At Integration Service level the number of concurrent sessions can be set; this value defaults to 10. Depending on the number of CPUs on the PowerCenter server and on the source and target databases, this value can be increased or decreased. The next step is designing a workflow that launches a number of sessions concurrently. An optimal setting can be found by trial and error.

Session logging

The amount of detail in a session log is determined by the tracing level. This level ranges from 'Terse' to 'Verbose Data'. For debugging or testing purposes the 'Verbose Data' option will trace every row that passes through the mapping into the session log. At Terse, only initialization information, error messages and notification of rejected data are logged. It is quite clear that the 'Verbose Data' option causes a severe performance penalty.

For lookups, use the ‘Additional Concurrent Pipelines for Lookup Creation' to start building lookups as soon
as the session is initialized. By the time the lookups are needed in the session, the cache creation hopefully
is already finished.

Partitioning

If a transformation thread is 100% busy, consider adding a partition point in the segment. Pipeline
partitioning will allow for parallel execution within a single session. A session will have multiple threads for
processing data concurrently. Processing data in pipeline partitions can improve performance, but only if
enough CPU's are available. As a rule of thumb, 1.5 CPU's should be available per partition. Adding a
partition point will increase the number of pipeline stages. This means a transformation will logically be used
a number of times, so remember to multiply the cache memory of transformations, sessions, etc. by the number of partitions. Partitioning can be specified on sources/targets and on the mapping transformations.

Using partitioning requires the ‘Partition Option' in the PowerCenter license.



Pushdown Optimization

Pushdown optimization will push the transformation processing to the database level without extracting the data. This reduces the movement of data when source and target are in the same database instance. Possibly, more optimal database-specific processing can be used to enhance performance even further. The metadata and lineage, however, are kept in PowerCenter.

Three different options are possible:

1. Partial pushdown optimization to source: one or more transformations can be processed in the
source
2. Partial pushdown optimization to target: one or more transformations can be processed in the
target
3. Full pushdown optimization: all transformations can be processed in the database.
A number of transformations are not supported for pushdown, for example XML, Rank, Router, Normalizer and Update Strategy transformations.

Pushdown optimization can be used with sessions with multiple partitions, if the partition types are pass-through or key-range partitioning. You can configure a session for pushdown optimization in the session properties. Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Using pushdown requires the 'Pushdown Optimization Option' in the PowerCenter license.

Architecture

64-Bit PowerCenter versions will allow better memory usage as the 2GB limitation is removed. When
PowerCenter 8 is run on a grid, the workflows can be configured to use resources efficiently and maximize
scalability. Within a grid, tasks are distributed to nodes. To improve performance on a grid, the network
bandwidth between the nodes is of importance as a lot of data is transferred between the nodes. This data
should always be stored on local disks for optimal performance. This includes the caches and any source
and target file. Of course even 64-bit computing will not help if the system is not properly set up. Make sure plenty of disk space is available on the PowerCenter 8 server.
For optimal performance, consider running the Integration service in ASCII data movement mode when all
sources and targets use 7 or 8-bit ASCII as UNICODE can take up to 16 bits.

The repository database should be located on the PowerCenter machine. If not, the repository database should be physically separated from any target or source database. This prevents the same database machine from writing to a target while reading from the repository. Always use native connections over ODBC connections as they are a lot faster. Maximize the use of parallel operations on the database; parallelism will cut execution times considerably. Remove any other application from the PowerCenter server apart from the repository database installation.

Increase the database network packet size to further improve performance. For Oracle this can be done in the listener.ora and tnsnames.ora. Each database vendor has some specific options that can be beneficial for performance. For Oracle, the use of the IPC protocol instead of TCP can result in a performance gain of a factor of 2 to 6. Inter-Process Communication (IPC) removes the network layer between the client and the Oracle database server. This can only be used if the database resides on the same machine as the PowerCenter 8 server. Check the product documentation for further details.

By careful load monitoring of the target/source databases and the servers of PowerCenter and databases
while running a session, potential bottlenecks at database or system level can be identified. Perhaps the
database memory is insufficient? Perhaps too much swapping is occurring on the PowerCenter 8 server? Perhaps the CPUs are overloaded?
The tuning of servers and databases is just as important as delivering an optimized mapping and should not be ignored. Tuning a system for a data warehouse poses different challenges than tuning a system for an OLTP application. Try to involve DBAs and admins as soon as possible in this process so they fully understand the sensitivities involved with data warehousing.

Development Guidelines & UTP


The starting point of development is the logical model created by the Data Architect. This logical model forms the foundation for the metadata, which will be continuously maintained throughout the Data Warehouse Development Life Cycle (DWDLC). The logical model is formed from the requirements of the project. At the completion of the logical model, technical documentation is produced defining the sources, targets, requisite business-rule transformations, mappings and filters. This documentation serves as the basis for the creation of the Extraction, Transformation and Loading processes that actually move the data from the application sources into the Data Warehouse/Data Mart.

To start development on any data mart you should have the following things set up by the Informatica
Load Administrator

1. Informatica Folder. The development team in consultation with the BI Support Group can decide
a three-letter code for the project, which would be used to create the informatica folder as well
as Unix directory structure.

2. Informatica Userids for the developers


3. Unix directory structure for the data mart.
4. A schema XXXLOAD on DWDEV database.

Transformation Specifications

Before developing the mappings you need to prepare the specifications document for the mappings you need to develop. A good template is placed in the templates folder. You can use your own template as long as it has as much detail as, or more than, this template.
When estimating the time required to develop mappings, the rule of thumb is as follows:
1. Simple Mapping – 1 Person Day
2. Medium Complexity Mapping – 3 Person Days
3. Complex Mapping – 5 Person Days.
Usually the mapping for the fact table is most complex and should be allotted as much time for
development as possible.

Data Loading from Flat Files


It's an accepted best practice to always load a flat file into a staging table before any transformations are done on the data in the flat file.
Always use the LTRIM and RTRIM functions on string columns before loading data into a stage table.
You can also use the UPPER function on string columns, but before using it you need to ensure that the data is not case sensitive (e.g. ABC is different from Abc).
If you are loading data from a delimited file, then make sure the delimiter is not a character which could appear in the data itself. Avoid using comma-separated files; tilde (~) is a good delimiter to use.
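As an illustration, the same trimming and case normalisation can also be expressed in SQL once the file has been staged (the staging table and columns below are hypothetical); in the mapping itself the equivalent LTRIM, RTRIM and UPPER expression functions are applied to the string ports:

SELECT LTRIM(RTRIM(cust_name))          AS cust_name,
       UPPER(LTRIM(RTRIM(country_cde))) AS country_cde   -- UPPER only if the data is not case sensitive
FROM   stg_customer_file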

Failure Notification
Once in production, your sessions and batches need to send out a notification to the Support team when they fail. You can do this by configuring an email task at the session level.

Naming Conventions and usage of Transformations


Port Standards:
Input Ports – It will be necessary to change the name of input ports for lookups, expressions and filters where ports might have the same name. If ports do have the same name, they will default to having a number after the name. Change this default to a prefix of “in_”. This will allow you to keep track of input ports throughout your mappings.
Prefixed with: IN_

Variable Ports – Variable ports that are created within an Expression transformation should be prefixed with a “v_”. This will allow the developer to distinguish between input/output and variable ports. For more explanation of variable ports see the section “VARIABLES”.
Prefixed with: V_

Output Ports – If organic data is created with a transformation that will be mapped to the target, make

sure that it has the same name as the target port that it will be mapped to.
Prefixed with: O_

Quick Reference

Object Type Syntax


Folder XXX_<Data Mart Name>
Mapping m_fXY_ZZZ_<Target Table Name>_x.x
Session s_fXY_ZZZ_<Target Table Name>_x.x
Batch b_<Meaningful name representing the sessions inside>
Source Definition <Source Table Name>
Target Definition <Target Table Name>
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Joiner JNR_<Names of Joined Tables>
Lookup LKP_<Lookup Table Name>
Normalizer Norm_<Source Name>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Sequence Generator SEQ_<Target Column Name>
Source Qualifier SQ_<Source Table Name>
Stored Procedure STP_<Database Name>_<Procedure Name>
Update Strategy UPD_<Target Table Name>_xxx
Mapplet MPP_<Purpose>
Input Transformation INP_<Description of Data being funneled in>
Output Transformation OUT_<Description of Data being funneled out>
Database Connections XXX_<Database Name>_<Schema Name>

Unit Test Cases (UTC):

The QA life cycle consists of the following types of testing regimens:
1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. User Acceptance Testing

Unit Testing: The testing, by development, of the application modules to verify that each unit (module) itself meets the accepted user requirements and design and development standards.

Functional Testing: The testing of all the application's modules individually to ensure the modules, as released from development to QA, work together as designed and meet the accepted user requirements and system standards.

System Integration Testing: Testing of all of the application modules in the same environment, database instance, network and inter-related applications, as they would function in production. This includes security, volume and stress testing.

User Acceptance Testing (UAT): The testing of the entire application by the end users, ensuring the application functions as set forth in the system requirements documents and that the system meets the business needs.

UTP Template:

Each test case records: Step #, Description, Test Conditions, Expected Results, Actual Results, Pass or Fail (P or F), and Tested By.

Step 1 (SAP-CMS Interfaces)
Description: Check that the total count of records fetched from the source tables matches the total records in the PRCHG table for a particular session timestamp.
Test Conditions:
SOURCE: SELECT count(*) FROM XST_PRCHG_STG
TARGET: SELECT count(*) FROM _PRCHG
Expected Results: Both the source and target table load record counts should match.
Actual Results: Same as the expected
Pass or Fail (P or F): Pass
Tested By: Stev

Step 2
Description: Check whether all the target columns are getting populated correctly with source data.
Test Conditions:
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE FROM T_PRCHG
MINUS
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE FROM PRCHG
Expected Results: The query should return zero records (source and target values match).
Actual Results: Same as the expected
Pass or Fail (P or F): Pass
Tested By: Stev

Step 3
Description: Check the insert strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is not in the target table, then run the session.
Expected Results: It should insert a record into the target table with the source data.
Actual Results: Same as the expected
Pass or Fail (P or F): Pass
Tested By: Stev

Step 4
Description: Check the update strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is already present in the target table with different PRCHG_ST_CDE or PRCHG_TYP_CDE values, then run the session.
Expected Results: It should update the existing record in the target table with the source data.
Actual Results: Same as the expected
Pass or Fail (P or F): Pass
Tested By: Stev

UNIX
How strong are you in UNIX?

1) I have whatever Unix shell scripting knowledge Informatica requires, for example:


running workflows from Unix using pmcmd. Below is a script to run a workflow from Unix:

cd /pmar/informatica/pc/pmserver/

/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE

2) If we are supposed to process flat files through Informatica but the files exist on a remote server, then we have to write a script to FTP them to the Informatica server before starting to process those files.
3) File watching: if the indicator file is available in the specified location, then we start our Informatica jobs; otherwise we send an email notification using the mailx command saying that the previous jobs did not complete successfully, something like that.
4) Using a shell script, update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes up, I can research the solution and implement it.

Basic Commands:

cat > file1 (cat with output redirection creates a non-zero-byte file)
cat file1 file2 > all ----- combines file1 and file2 into 'all' (it will create the file if it doesn't exist)
cat file1 >> file2 ----- appends the contents of file1 to file2

1. > will redirect output from standard out (screen) to file or printer or whatever you like.

2. >> Filename will append at the end of a file called filename.

3. < will redirect input to a process or command.

How to create zero byte file?

touch filename (touch is the command to create a zero-byte file)

How to find all processes that are running

ps -A

Crontab command
The crontab command is used to schedule jobs. You must have permission from the Unix administrator to run this command. Jobs are scheduled using five fields, as follows.

Minutes (0-59) Hour (0-23) Day of month (1-31) Month (1-12) Day of week (0-6) (0 is Sunday)

So, for example, you want to schedule a job which runs the script named backup_jobs in the /usr/local/bin directory on Sunday (day 0) at 22:25 (10:25 pm) and on the 15th of the month. The entry in the crontab file will be as below; * represents all values.

25 22 15 * 0 /usr/local/bin/backup_jobs

The * here tells the system to run this in every month. (When both the day-of-month and day-of-week fields are restricted, cron runs the job when either field matches.)


Syntax is
crontab fileSo a create a file with the scheduled jobs as above and then typecrontab filename .This will
scheduled the jobs.
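
For example, a hypothetical walk-through (the file name mycron.txt is arbitrary):

cat > mycron.txt <<EOF
25 22 15 * 0 /usr/local/bin/backup_jobs
EOF
crontab mycron.txt     # install the schedule for the current user
crontab -l             # list the installed crontab to verify the entry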

The command below gives the total number of users logged in at this time:

who | wc -l

echo "`who | wc -l` are the total number of people logged in at this time."

The command below will display only directories:

$ ls -l | grep '^d'

Pipes:

The pipe symbol "|" is used to direct the output of one command to the input of another.
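
For example, combining commands shown earlier (pmserver is the PowerCenter server process referenced in the pmcmd path above; adjust the name to your environment):

ps -A | grep pmserver | wc -l      # count the running Informatica server processes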

Moving, renaming, and copying files:

cp file1 file2 copy a file

mv file1 newname move or rename a file

mv file1 ~/AAA/ move file1 into sub-directory AAA in your home directory.

rm file1 [file2 ...] remove or delete a file


To display hidden files

ls –a

Viewing and editing files:

cat filename Dump a file to the screen in ascii.

more filename View the file contents one screen at a time.

head filename Show the first few lines of a file.

head -5 filename Show the first 5 lines of a file.

tail filename Show the last few lines of a file.

tail -7 filename Show the last 7 lines of a file.

Searching for files :

find command

find -name aaa.txt Finds all the files named aaa.txt in the current directory or

any subdirectory tree.

find / -name vimrc Find all the files named 'vimrc' anywhere on the system.

find /usr/local/games -name "*xpilot*"

Find all files whose names contain the string 'xpilot' which

exist within the '/usr/local/games' directory tree.

You can find out what shell you are using by the command:

echo $SHELL

If the file exists, then send an email with the file as an attachment:

if [[ -f $your_file ]]; then
    uuencode $your_file $your_file | mailx -s "$your_file exists..." your_email_address
fi

The line below is the first line of a script:


#!/usr/bin/sh

Or

#!/bin/ksh

What does #! /bin/sh mean in a shell script?

It tells the system which interpreter to use for the script. As you know, the bash shell has some specific features
that other shells do not have, and vice versa; the same applies to Perl, Python and other languages.

In short, it tells your shell which interpreter to use when executing the statements in your shell script.

Interactive History

A feature of bash and tcsh (and sometimes other shells): you can use the up-arrow key to access your previous
commands, edit them, and re-execute them.

Basics of the vi editor

Opening a file

vi filename

Creating text

Edit modes: These keys enter editing modes and type in the text

of your document.

i Insert before current cursor position

I Insert at beginning of current line

a Insert (append) after current cursor position

A Append to end of line

r Replace 1 character

R Replace mode

<ESC> Terminate insertion or overwrite mode

Deletion of text

x Delete single character


dd Delete current line and put in buffer

:w Write the current file.

:w new.file Write the file to the name 'new.file'.

:w! existing.file Overwrite an existing file with the file currently being edited.

:wq Write the file and quit.

:q Quit.

:q! Quit with no changes.

Shell Script Scenario:

"How can we loop Informatica workflows when we have to run the same jobs multiple times?" A common
scenario for this is a history load. In order to minimize system load and achieve better performance, we
can split the history load into weekly or monthly time periods. In that case we have to run the same
workflow "n" number of times.

Solution:

1. Creating a Workflow List file:

Create a Workflow list file with “.lst” extension. Add all the workflows you might want to run in the
appropriate sequence.

File Format: <Informatica_Folder_name>, <Informatica_Workflow_name>

Example: wfl_ProcessName.lst
Folder_name, Workflow_Name1
Folder_name, Workflow_Name2
Folder_name, Workflow_Name3

2. Creating a Looping file:

Create a Data File with the Workflow list and Number of Loops (in other words number of re-runs needed for
the Workflow list) as a comma separated file.

File Format: <Workflow List file Without Extension>, <Number of loops>

Example: EDW_ETLLOOP.dat

wfl_ProcessName1, 5
wfl_ProcessName2, 10
wfl_ProcessName3, 2


3. Call Script W_WorkflowLooping.ksh:

This script is used to execute the workflow list for the required number of loops. For example, to process a
history load we have to run the same sequence of workflows "n" number of times; this process runs the
workflow list that many times. (A sample invocation is shown after the processing steps below.)

An added feature of the script is an optional termination file, which can be created in the given
directory to forcefully terminate the looping process. The advantage of the optional termination file is
that users can stop the looping process in case other processes are being affected by the looping
jobs, for example as sketched below.
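
For instance, using the directory and file-name suffix defined in the script below, an operator could stop the EDW_ETLLOOP run (names taken from the earlier example; the exact path is environment specific) with:

# Create the optional termination indicator; the looping script checks for this
# file on every iteration and exits cleanly when it is found.
touch /informatica/Loop_Dir/EDW_ETLLOOPTerminationInd.dat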

Processing Steps
1. Read the parameter values and assign them to variables.
2. Validate that the parameter is not null or an empty string. If empty, exit.
3. If the data file exists, read the workflow list file name and the number of loops.
   1. If the job termination file exists, then exit.
   2. Else
      1. Call the W_CallWorkFlow.sh script and pass <workflow list> as a variable.
4. Loop the previous step until the 'n' number of loops is reached.
5. Remove the data file.
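
A sample invocation, assuming the looping data file EDW_ETLLOOP.dat from the example above exists in /informatica/Loop_Dir (the script takes the file name without its extension):

./W_WorkflowLooping.ksh EDW_ETLLOOP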

#!/bin/ksh
###########################################################################
# FileName: W_WorkflowLooping.ksh
# Parameters: Parm $1 = Looping File name (no extension)

# Description: Performs looping of given Workflow List. Optional File can


# be used to terminate the looping process.
# Warnings: Program name variable must be defined.
#
# Date: 15-Aug-2008

########################## MODIFICATIONS LOG ###########################


# Changed By Date Description
# ------------- -------------- --------------------
#Manish Kothari 08/15/2008 Initial Version
###########################################################################
# sets the Environment variables and functions used in the script.
###########################################################################
#. /scripts/env_variable.ksh

###########################################################################
# Defines program variables
###########################################################################

DATA_FILE=$1
DATA_FILE_EXTN='dat'
LOG_FILE_SUFF='Log.log'
TERM_FILE_SUFF='TerminationInd.dat'


###########################################################################
# Check if the Data File Name is passed as a Parameter
###########################################################################
if [ -z $DATA_FILE ]
then
echo "!!! W_WorkflowLooping: $DATE ERROR - Data File Name Parameter not provided..!!!"
exit 1
fi

DATA_FILE_NAME=$DATA_FILE.$DATA_FILE_EXTN
LOG_FILE_NAME=$DATA_FILE$LOG_FILE_SUFF
JOB_TERMINATION_IND_FILE_NAME=$DATA_FILE$TERM_FILE_SUFF

DATA_FILE=/informatica/Loop_Dir/$DATA_FILE_NAME
LOG_FILE=/informatica/Unix_Log/$LOG_FILE_NAME
JOB_TERMINATION_IND_FILE=/informatica/Loop_Dir/$JOB_TERMINATION_IND_FILE_NAME

###########################################################################
# Update the status and log file - script is starting.
###########################################################################
echo "***** Starting script $0 on `date`." >> $LOG_FILE

###########################################################################
# Check whether the data files exists
###########################################################################
if [ -s $DATA_FILE ]
then
while read member
do
wf_list_file_name=`echo $member | awk -F"," '{print $1}'`
loop_count=`echo $member | awk -F"," '{print $2}'`
while [ $loop_count -gt 0 ]
do
if [ -f $JOB_TERMINATION_IND_FILE ]
then
rm $JOB_TERMINATION_IND_FILE
# rm $DATA_FILE
echo "Indicator file for terminating the load found in /informatica/Loop_Dir/ on `date`" >>
$LOG_FILE
exit 0
fi

#############################################################################
# Executing the workflows
#############################################################################
/informatica/Scripts/W_CallWorkFlow.sh $wf_list_file_name
PMRETCODE=$?


if [ "$PMRETCODE" -ne 0 ]
then
echo "Error in $wf_name Load on `date`" >> $LOG_FILE
exit 1
fi
loop_count=`expr $loop_count - 1`
done
done<$DATA_FILE
else
echo "Source Parameter file $DATA_FILE is missing on `date`" >> $LOG_FILE
exit 1
fi

###########################################################################
# Updates the status and log file - script is ending.
###########################################################################
echo "***** Ending script $0 with no errors on `date`.\n" >> $LOG_FILE
rm $DATA_FILE
exit 0

4. Call Script W_CallWorkFlow.sh:

This script is used to execute the workflows from the .lst file. In case of any error it creates a restart file
and starts from that point when the process is re-run.

This script requires the workflow list file name (without extension) as a parameter.

Processing Steps:
1. Read the parameter values and assign them to variables.
2. Validate that the parameter is not null or an empty string. If empty, exit.
3. Validate that the workflow list file exists and is not zero bytes. If it is missing or empty, exit.
4. Assign names to the restart file and the workflow list log file.
5. Read the folder name and workflow name from the .lst file.
   1. If the restart file is not zero bytes, loop until the restarting workflow name matches an entry in the
      workflow list, then execute that workflow with the pmcmd command.
   2. Else run the workflow with the pmcmd command. If any error occurs, create a restart file and exit.
6. Loop the previous step until all the workflows from the .lst file have been executed.

Ex: W_CallWorkFlow.sh <workflow list file name without .lst>

#!/bin/ksh


###########################################################################
# FileName: W_CallWorkFlow.sh
# Parameters: Parm $1 = Workflow List Filename (no extension)
#
# Purpose: Provides the ability to call the PMCMD command from the enterprise
# Scheduler or from Informatica Command Tasks.
#
# Warnings:
#
# Date: 08/28/2007
###########################################################################
########################### MODIFICATIONS LOG #############################
# Changed By Date Description
# ---------- -------- -----------
# Manish Kothari 08/15/2008 Initial Version
###########################################################################
#Include the environment file if any.
#. /scripts/xxx_env.ksh

###########################################################################
# Define Variables.
###########################################################################
DATE=`date '+ %Y-%m-%d %H:%M:%S'`
WORKFLOW_LIST_FILE=$1
WF_LIST_EXTN='lst'
WORKFLOW_LIST_DIR='informatica/WORKFLOW_LISTS/'
UNIXLOG_DIR='informatica/UNIXLOG_DIR'
INFA_REP='infarep:4400'
INFA_USER='USER_NAME'
INFA_PWD='INFA_PWD'
WF_LOG_FILE='informatica/LOGFILES_DIR'

###########################################################################
# Check if the WorkFlow List File Name is Passed as a Parameter
###########################################################################
if [ -z $WORKFLOW_LIST_FILE ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List Parameter not provided..!!!"
exit 1
fi
WORKFLOW_LIST=$WORKFLOW_LIST_DIR/$WORKFLOW_LIST_FILE.$WF_LIST_EXTN

###########################################################################
# Make sure that the WorkFlow List File is a Valid File and is
# Not Zero Bytes
###########################################################################

if [ ! -s $WORKFLOW_LIST ]


then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List File does not exist or is Zero Bytes!!!!"
exit 1
fi
###########################################################################
# Define the Variables that will be used in the Script
###########################################################################

RESTART_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.rst
WF_LOG_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.log
RESTART_WF_FLAG=1

while read WF_LIST_FILE_LINE


do
###########################################################################
# Get the INFA Folder and WF Name from the WorkStream File.
# This file is Comma Delimited and has a .lst extension
###########################################################################

INFA_FOLDER=`echo $WF_LIST_FILE_LINE|cut -f1 -d','`


WF_NAME=`echo $WF_LIST_FILE_LINE|cut -f2 -d','`

###########################################################################
# Check if a Re-Start File Exists. If it does it means that the script has
# started after failing on a previous run. Be Careful while modifying the
# contents of this file. If restarted, the script will start running WF's from the point of failure (POF)
###########################################################################

if [ -s $RESTART_FILE ]
then
###########################################################################
# If re-start file exists use the WF Name in the Re-start file to determine
# which Failed workflow from a previous run needs to be re-started
# Already completed WF's in a Workstream will be skipped
###########################################################################

RESTART_WF=`cat $RESTART_FILE|cut -f2 -d','`


if [ $WF_NAME != $RESTART_WF ]
then
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Skipping $WF_NAME" >> $WF_LOG_FILE
continue
else
if [ $RESTART_WF_FLAG -eq 1 ]
then
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Restarting at Workflow Name $WF_NAME \n"
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Restarting at Workflow Name $WF_NAME \n" >>
$WF_LOG_FILE
RESTART_WF_FLAG=0


fi
fi
fi
echo "W_CallWorkFlow: $DATE STARTING execution of Workflows $WF_NAME in $INFA_FOLDER using
$WORKFLOW_LIST_FILE" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE

#-------------------------------------------------------------------------
# Call Informatica pmcmd command with defined parameters.
#-------------------------------------------------------------------------
pmcmd startworkflow -u $INFA_USER -p $INFA_PWD -s $INFA_REP -f $INFA_FOLDER -wait $WF_NAME
PMRETCODE=$?

if [ "$PMRETCODE" -ne 0 ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR encountered in $WF_NAME in $INFA_FOLDER \n" >>
$WF_LOG_FILE
echo "!!! W_CallWorkFlow: $DATE Restart file for this workstream is $RESTART_FILE \n" >>
$WF_LOG_FILE
###########################################################################
# Incase a WorkFlow Fails the WF Name and the INFA Folder are logged into
# the re-start File. If the script starts again the WF mentioned in
# this file will be started
###########################################################################
echo "$INFA_FOLDER,$WF_NAME" > $RESTART_FILE
exit 1
fi
# Remove the restart file (if present) after a successful run of this workflow
if [ -f $RESTART_FILE ]; then rm $RESTART_FILE; fi
done< $WORKFLOW_LIST

if [ -f $RESTART_FILE ]
then
echo "!!! Problem either in Restart File or Workflow List. Please make sure WorkFlow Names are
correct in both Places" >> $WF_LOG_FILE
exit 1
fi

echo "************Ending Script W_CallWorkFlow.sh for the WorkStream


$WORKFLOW_LIST_FILE.lst************" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE
exit 0

Calling a stored proc from a command OR shell script


sqlplus -s user/password@connection_string <<END
execute UPD_WORKER_ATTR_FLAG;
exit;
END
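
A small, non-authoritative extension of the same call: WHENEVER SQLERROR makes sqlplus return a non-zero exit code when the procedure fails, so the calling script can react (the $LOG_FILE variable here is hypothetical):

sqlplus -s user/password@connection_string <<END
WHENEVER SQLERROR EXIT FAILURE
execute UPD_WORKER_ATTR_FLAG;
exit;
END

if [ $? -ne 0 ]
then
    echo "UPD_WORKER_ATTR_FLAG failed on `date`" >> $LOG_FILE
    exit 1
fi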

INFORMATICA TRANSFORMATIONS
New features of INFORMATICA 9 compared to INFORMATICA 8.6

Informatica 9 empowers line-of-business managers and business analysts to identify bad data and fix it
faster. Architecture-wise there are no differences between Informatica 8 and 9, but there are some new
features added in PowerCenter 9.

New Client tools


Informatica 9 includes the Informatica Developer and Informatica Analyst client tools.
The Informatica Developer tool is eclipse-based and supports both data integration and data quality for
enhanced productivity. From here you can update/refine those same rules, and create composite data
objects - e.g. Get customer details from a number of different sources and aggregate these up to a
Customer Data Object.
The Informatica Analyst tool is a browser-based tool for analysts, stewards and line of business
managers. This tool supports data profiling, specifying and validating rules (Scorecards), and monitoring
data quality.

Informatica Administrator
The powercenter Administration Console has been renamed the Informatica Administrator.
The Informatica Administrator is now a core service in the Informatica Domain that is used to configure and
manage all Informatica Services, Security and other domain objects (such as connections) used by the
new services.
The Informatica Administrator has a new interface. Some of the properties and configuration tasks from the
powercenter Administration Console have been moved to different locations in Informatica Administrator.
The Informatica Administrator is expanded to include new services and objects.
Cache Update in Lookup Transformation
You can update the lookup cache based on the results of an expression. When an expression is true, you
can add to or update the lookup cache. You can update the dynamic lookup cache with the results of an
expression.
Database deadlock resilience


In previous releases, when the Integration Service encountered a database deadlock during a lookup, the
session failed. Effective in 9.0, the session will not fail. When a deadlock occurs, the Integration Service
attempts to run the last statement in a lookup. You can configure the number of retry attempts and time
period between attempts.
Multiple rows return
Lookups can now be configured as an Active transformation to return Multiple Rows. We can configure the
Lookup transformation to return all rows that match a lookup condition. A Lookup transformation is an
active transformation when it can return more than one row for any given input row.
Limit the Session Log
You can limit the size of session logs for real-time sessions. You can limit the size by time or by file size. You
can also limit the number of log files for a session.
Auto-commit
We can enable auto-commit for each database connection. Each SQL statement in a query defines a
transaction. A commit occurs when the SQL statement completes or the next statement is executed,
whichever comes first.
Passive transformation
We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL
transformation runs in passive mode, the SQL transformation returns one output row for each input row.
Connection management
Database connections are centralized in the domain. We can create and view database connections in
Informatica Administrator, Informatica Developer, or Informatica Analyst. Create, view, edit, and grant
permissions on database connections in Informatica Administrator.
Monitoring
We can monitor profile jobs, scorecard jobs, preview jobs, mapping jobs, and SQL Data Services for each
Data Integration Service. View the status of each monitored object on the Monitoring tab of Informatica
Administrator.
Deployment
We can deploy, enable, and configure deployment units in the Informatica Administrator. Deploy
Deployment units to one or more Data Integration Services. Create deployment units in Informatica
Developer.
Model Repository Service
Application service that manages the Model repository. The Model repository is a relational database that
stores the metadata for projects created in Informatica Analyst and Informatica Designer. The Model
repository also stores run-time and configuration information for applications deployed to a Data Integration Service.
Data Integration Service
Application service that processes requests from Informatica Analyst and Informatica Developer to preview
or run data profiles and mappings. It also generates data previews for SQL data services and runs SQL
queries against the virtual views in an SQL data service. Create and enable a Data Integration Service on
the Domain tab of Informatica Administrator.
XML Parser
The XML Parser transformation can validate an XML document against a schema. The XML Parser
transformation routes invalid XML to an error port. When the XML is not valid, the XML Parser transformation
routes the XML and the error messages to a separate output group that we can connect to a target.
Enforcement of licensing restrictions
Powercenter will enforce the licensing restrictions based on the number of CPUs and repositories.


Also Informatica 9 supports data integration for the cloud as well as on premise. You can integrate the
data in cloud applications, as well as run Informatica 9 on cloud infrastructure.

Informatica Transformations

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set
of transformations that perform specific functions. For example, an Aggregator transformation performs
calculations on groups of data.
Transformations can be of two types:
1. Active Transformation
An active transformation can change the number of rows that pass through the transformation,
change the transaction boundary, can change the row type. For example, Filter, Transaction
Control and Update Strategy are active transformations.
The key point is to note that Designer does not allow you to connect multiple active
transformations or an active and a passive transformation to the same downstream transformation
or transformation input group because the Integration Service may not be able to concatenate
the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is
an exception to this rule. A SGT does not receive data. It generates unique numeric values. As a
result, the Integration Service does not encounter problems concatenating rows passed by a SGT
and an active transformation.
2. Passive Transformation.
A passive transformation does not change the number of rows that pass through it, maintains the
transaction boundary, and maintains the row type.
The key point is to note that Designer allows you to connect multiple transformations to the same
downstream transformation or transformation input group only if all transformations in the upstream
branches are passive. The transformation that originates the branch can be active or passive.
Transformations can be Connected or UnConnected to the data flow.
3. Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
4. UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called
within another transformation, and returns a value to that transformation.

Aggregator Transformation
Aggregator transformation performs aggregate functions like average, sum, count etc. on multiple rows or
groups. The Integration Service performs these calculations as it reads and stores data group and row data
in an aggregate cache. It is an Active & Connected transformation.

Difference b/w Aggregator and Expression Transformation?


Expression transformation permits you to perform calculations on a row-by-row basis only. In an Aggregator you
can perform calculations on groups.
An Aggregator transformation may, for example, have ports such as State, State_Count, Previous_State and State_Counter.
Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.
Aggregate expressions are allowed only in Aggregator transformations. They can include conditional clauses
and non-aggregate functions, and can also include one aggregate function nested inside another aggregate
function.
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE

Application Source Qualifier Transformation


Represents the rows that the Integration Service reads from an application, such as an ERP source, when it
runs a session. It is an Active & Connected transformation.

Custom Transformation
It works with procedures you create outside the Designer interface to extend PowerCenter functionality and
calls a procedure from a shared library or DLL. It is of the active/passive & connected type.
You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.
Custom transformation allows you to develop the transformation logic in a procedure. Some of the
PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom
transformations, such as blocking rules, also apply to transformations built using Custom transformations.
PowerCenter provides two sets of functions called generated and API functions. The Integration Service
uses generated functions to interface with the procedure. When you create a Custom transformation and
generate the source code files, the Designer includes the generated functions in the files. Use the API
functions in the procedure code to develop the transformation logic.
Difference between Custom and External Procedure Transformations? In a Custom transformation, input and output
functions occur separately. The Integration Service passes the input data to the procedure using an input
function. The output function is a separate function that you must enter in the procedure code to pass
output data to the Integration Service. In contrast, in the External Procedure transformation, an external
procedure function does both input and output, and its parameters consist of all the ports of the
transformation.

Data Masking Transformation


Passive & Connected. It is used to change sensitive production data to realistic test data for non
production environments. It creates masked data for development, testing, training and data mining. Data
relationship and referential integrity are maintained in the masked data.
For example: it returns a masked value that has a realistic format for an SSN, credit card number, birth date,
phone number, etc., but is not a valid value. Masking types: Key Masking, Random Masking, Expression
Masking, Special Mask format. Default is no masking.

Expression Transformation
Passive & Connected. are used to perform non-aggregate functions, i.e to calculate values in a single row.
Example: to calculate discount of each product or to concatenate first and last names or to convert date
to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping Designer.
Components: Transformation, Ports, Properties, Metadata Extensions.

External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer
interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the
COM layer of windows and bind it to external procedure transformation. To get this kind of extensibility, use
the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an
experienced programmer to use TX and use multi-threaded code in external procedures.

Filter Transformation
Active & Connected. It allows rows that meet the specified filter condition and removes the rows that do
not meet the condition. For example, to find all the employees who are working in NewYork or to find out
all the faculty member teaching Chemistry in a state. The input ports for the filter must come from a single
transformation. You cannot concatenate ports from more than one transformation into the Filter
transformation. Components: Transformation, Ports, Properties, Metadata Extensions.

HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With
an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve
data or post data to the target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM. Examples: GET, POST and SIMPLE POST.


Java Transformation
Active or Passive & Connected. It provides a simple native programming interface to define transformation
functionality with the Java programming language. You can use the Java transformation to quickly define
simple or moderately complex transformation functionality without advanced knowledge of the Java
programming language or an external Java development environment.

Joiner Transformation
Active & Connected. It is used to join data from two related heterogeneous sources residing in different
locations or to join data from the same source. In order to join two sources, there must be at least one
pair of matching columns between the sources, and you must specify one source as the master and the
other as detail. For example: to join a flat file and a relational source or to join two flat files or to join a
relational source and a XML source.
The Joiner transformation supports the following types of joins:
1. Normal
Normal join discards all the rows of data from the master and detail source that do not match,
based on the condition.
2. Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all the rows
from the detail source and the matching rows from the master source.
3. Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
4. Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy
transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly
before the Joiner transformation.

Lookup Transformation
Default Passive (can be configured active) & Connected or UnConnected. It is used to look up data in a
flat file, relational table, view, or synonym. It compares lookup transformation ports (input ports) to the
source column values based on the lookup condition. Later returned values can be passed to other
transformations. You can create a lookup definition from a source qualifier and can also use multiple
Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example,
the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example,
retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.
Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.


3) Cached or uncached lookup.


4) connected or unconnected lookup.

Normalizer Transformation
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-
occurring groups of columns in each source row and returns a row for each instance of the multiple-
occurring data. It is used mainly with COBOL sources where most of the time data is stored in de-
normalized format.
You can create following Normalizer transformation:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation
for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM
mainframe.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational
tables or flat files. This is default when you create a normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.

Rank Transformation
Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest
or smallest numeric value in a port or group or to return the strings at the top or the bottom of a session sort
order. For example, to select top 10 Regions where the sales volume was very high or to select 10 lowest
priced products. As an active transformation, it might change the number of rows passed through it. For example, if
you pass 100 rows to the Rank transformation but select to rank only the top 10 rows, only 10 rows pass from the Rank
transformation to the next transformation. You can connect ports from only one transformation to the Rank
transformation. You can also create local variables and write non-aggregate expressions.

Router Transformation
Active & Connected. It is similar to filter transformation because both allow you to apply a condition to test
data. The only difference is, filter transformation drops the data that do not meet the condition whereas
router has an option to capture the data that do not meet the condition and route it to a default output
group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a
mapping instead of creating multiple Filter transformations to perform the same task. The Router
transformation is more efficient.

Sequence Generator Transformation


Passive & Connected transformation. It is used to create unique primary key values or cycle through a
sequential range of numbers or to replace missing primary keys.
It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot
add ports to the transformation. NEXTVAL port generates a sequence of numbers by connecting it to a
transformation or target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By value.
You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a
Sequence Generator when you perform multiple loads to a single target.
For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default,
and the Integration Service does not cache values during the session. For non-reusable Sequence
Generator transformations, setting Number of Cached Values greater than zero can increase the number
of times the Integration Service accesses the repository during the session. It also causes sections of skipped
values since unused cached values are discarded at the end of each session.
For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize
discarded values, however it must be greater than one. When you reduce the Number of Cached Values,
you might increase the number of times the Integration Service accesses the repository to cache values
during the session.


Sorter Transformation
Active & Connected transformation. It is used sort data either in ascending or descending order according
to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and
specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping,
you specify one or more ports as a sort key and configure each sort key port to sort in ascending or
descending order.

Source Qualifier Transformation


Active & Connected transformation. When adding a relational or a flat file source definition to a mapping,
you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data
originating from the same source database, filter rows when the Integration Service reads source data,
Specify an outer join rather than the default inner join and to specify sorted ports.
It is also used to select only distinct values from the source and to create a custom query to issue a special
SELECT statement for the Integration Service to read source data .

SQL Transformation
Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a
pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database
connection information to the SQL transformation as input data at run time. The transformation processes
external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the
query and returns rows and database errors.

Stored Procedure Transformation


Passive & Connected or UnConnected transformation. It is useful to automate time-consuming tasks and it
is also used in error handling, to drop and recreate indexes and to determine the space in database, a
specialized calculation etc. The stored procedure must exist in the database before creating a Stored
Procedure transformation, and the stored procedure can exist in a source, target, or any database with a
valid connection to the Informatica Server. Stored Procedure is an executable script with SQL statements
and control statements, user-defined variables and conditional statements.

Transaction Control Transformation


Active & Connected. You can control commit and roll back of transactions based on a set of rows that
pass through a Transaction Control transformation. Transaction control can be defined within a mapping or
within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.

Union Transformation
Active & Connected. The Union transformation is a multiple input group transformation that you use to
merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from
multiple sources similar to the UNION ALL SQL statement to combine the results from two or more SQL
statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows.
Rules
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale
must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add
another transformation such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union
transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.


Unstructured Data Transformation


Active/Passive and connected. The Unstructured Data transformation is a transformation that processes
unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents.
It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.
Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation


Active & Connected transformation. It is used to update data in target table, either to maintain history of
data or recent changes. It flags rows for insert, update, delete or reject within a mapping.

XML Generator Transformation


Active & Connected transformation. It lets you create XML inside a pipeline. The XML Generator
transformation accepts data from multiple ports and writes XML through a single output port.

XML Parser Transformation


Active & Connected transformation. The XML Parser transformation lets you extract XML data from
messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML
Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the
pipeline.

XML Source Qualifier Transformation


Active & Connected transformation. XML Source Qualifier is used only with an XML source definition. It
represents the data elements that the Informatica Server reads when it executes a session with XML
sources. It has one input or output port for every column in the XML source.

External Procedure Transformation


Active & Connected/UnConnected transformation. Sometimes, the standard transformations such as
Expression transformation may not provide the functionality that you want. In such cases External
procedure is useful to develop complex functions within a dynamic link library (DLL) or UNIX shared library,
instead of creating the necessary Expression transformations in a mapping.

Advanced External Procedure Transformation


Active & Connected transformation. It operates in conjunction with procedures, which are created outside
of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external
transformation applications, such as sorting and aggregation, which require all input rows to be processed
before emitting any output rows.


Informatica Lookups

Lookups are expensive in terms of resources and time.


A set of tips on how to set up lookup transformations can dramatically improve the main constraints,
namely time and performance.
In this article you will learn about the following topics:
- Lookup cache
- Persistent lookup cache
- Unconnected lookup
- Order by clause within SQL

Lookup Cache
Problem:
For non-cached lookups, Informatica hits the database and runs the lookup query for each record
coming from the source. There is an impact in terms of time and resources: if there are 2 million rows
from the source qualifier, Informatica hits the database 2 million times with the same query.
Solution:
When a lookup is cached, Informatica queries the database, brings the whole set of rows to the
Informatica server and stores them in a cache file. When this lookup is called the next time, Informatica
uses the cached file. As a result, Informatica saves the time and the resources needed to hit the database
again.
When to cache a lookup?
As a general rule, we will use lookup cache when the following condition is satisfied:
N>>M
N is the number of records from the source
M is the number of records retrieved from the lookup
Note: Remember to implement database index on the columns used in the lookup condition to
provide better performance in non-cached lookups.

Persistent Lookup Cache


Problem:
Informatica caches the lookups by default. Let's consider the following scenario: a lookup table is
used many times in different mappings. In each Lookup transformation, Informatica builds the
same lookup cache table over and over again. Do we need to build the lookup cache every time
for each lookup?
Solution:
It is possible to build the cache file once instead of creating the same cache file N-times.
Just using the persistent cache option allows Informatica to reuse the cache built before, saving resources
and time.
Check the following parameters in the transformation to use Persistent Lookup cache:
- Lookup caching enabled
- Lookup cache persistent


Figure 1: Cache Persistent Enabled

From now onwards, the same cache file will be used in all the consecutive runs, saving time
building the cache file. However, the lookup data might change and then the cache must be
refreshed by either deleting the cache file or checking the option “Re-cache from lookup source”.

Figure 2: Re-cache from Lookup Source Enabled

In case of using a lookup reusable in multiple mappings we will have 1 mapping with “Re-cache”
option enabled while others will remain with the “Re-cache” option disabled. Whenever the cache
needs to be refreshed we just need to run the first mapping.
Note: Take into account that it is necessary to ensure data integrity in long-running ETL processes when the
underlying tables change frequently. Furthermore, Informatica PowerCenter cannot create cache files
larger than 2 GB. If a cache exceeds 2 GB, Informatica will create multiple cache files, and using
multiple files decreases performance. Hence, we might consider joining the lookup source table in
the database instead.

Unconnected lookup
Problem:

Imagine the following mapping with 1,000,000 records retrieved from the Source Qualifier:

Figure 3: Connected Lookup Transformation


Suppose that out of a million records, the condition is satisfied for only 10% of the records. With a
connected lookup, the lookup is still executed for the other 900,000 rows even though there is no match for them.
Solution:
It is possible to call the Lookup transformation only when the condition is satisfied. As a result, in our
scenario the transformation will be called and executed only 100,000 times out of 1M. The
solution is to use an Expression transformation that calls a Lookup transformation that is not
connected to the data flow:

Figure 4: Unconnected Lookup Transformation

For instance, an Expression transformation will contain a port with the following expression:
IIF (ISNULL (COUNTRY),
:LKP.LKP_COUNTRY (EMPLOYEE_ID), COUNTRY)

If the COUNTRY is null, then the lookup named LKP_COUNTRY is called with the parameter
EMPLOYEE_ID.
The ports in the lookup transformation are COUNTRY and EMPLOYEE_ID, EMPLOYEE_ID being the input port.

Order by clause within SQL


Informatica takes the time (and the effort) to bring in all the data for every port within the Lookup
transformation. Therefore, it is recommended to get rid of the ports that are not used, to avoid
additional processing.
It is also a best practice to apply the "ORDER BY" clause to the columns which are being used in
the lookup condition. By default, Informatica generates an ORDER BY on every column in the SELECT
statement, which it uses to build its own index; hence, redundant or unnecessary columns should not be here.
To avoid the default sort, just add a comment marker at the end of the SQL override:


Figure 5: To Avoid ORDER BY in SQL Override

To sum up, it is possible to enhance Informatica lookups by using different set of configurations in
order to increase performance as well as save resources and time. However, before applying any
of the mentioned features, an analysis of the tables and the SQL queries involved needs to be
done.

Performance Tuning Methodology


The Session Log contains a wealth of information; provided we know what we want, timings for each of the
activities can be derived from it.
For every session INFA creates
1. One Reader Thread
2. One Writer Thread
3. One Transformation Thread

Thread Statistics

1. Thread statistics reveal important information regarding how the session was run and how the
reader, writer and transformation threads were utilized.
2. The Busy Percentage of each thread is published by INFA at the end of the session log (see the grep one-liner after the log excerpts below).
3. By adding partition points judiciously and running it again we can slowly zero in on the
transformation bottleneck.
4. Number of threads cannot be increased for reader and writer, but through partition points we can
increase the number of transformation threads

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ] has completed: Total Run
.Time = [858.151535] secs, Total Idle Time = [842.536136] secs, Busy Percentage = [1.819655].

MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ] has completed:
Total Run Time = [857.485609] secs, Total Idle Time = [0.485609] secs, Busy Percentage = [100].

MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [Target Tables] has completed:
Total Run Time = [1573.351240] secs, Total Idle Time = [1523.193522] secs, Busy Percentage = [3.187954].
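
Since every thread writes one of these PETL_240xx messages, a quick way to pull just the utilization figures out of a session log from the shell is a simple grep (the log file path here is hypothetical):

grep "Busy Percentage" /informatica/SessLogs/s_m_load_prchg.log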

Bottleneck


1. The transformation thread is 100 % busy meaning that the bottleneck for performance lies in the
transformations.
2. If the Reader or Writer threads show 100% then there is a bottleneck in reading data from source or
writing data to targets
3. 100% is only relative. Always FOCUS on the thread that is the greatest among the Reader,
Transformation and Writer.
4. If the Busy percentage of all threads is less than 50% there may not be much of a bottleneck in the
mapping.

IMPROVING MAPPING PERFORMANCE - TIPS

1. Aggregator Transformation
You can use the following guidelines to optimize the performance of an Aggregator transformation.
1. Use Sorted Input to decrease the use of aggregate caches:
The Sorted Input option reduces the amount of data cached during the session and improves session
performance. Use this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to
the Aggregator transformation.


2. Limit connected input/output or output ports:


Limit the number of connected input/output or output ports to reduce the amount of data the
Aggregator transformation stores in the data cache.

3. Filter before aggregating:


If you use a Filter transformation in the mapping, place the transformation before the
Aggregator transformation to reduce unnecessary aggregation.

2. Filter Transformation

The following tips can help filter performance:

1. Use the Filter transformation early in the mapping:


To maximize session performance, keep the Filter transformation as close as possible to the
sources in the mapping. Rather than passing rows that you plan to discard through the mapping, you
can filter out unwanted data early in the flow of data from sources to targets.

2. Use the Source Qualifier to filter:


The Source Qualifier transformation provides an alternate way to filter rows. Rather than filtering
rows from within a mapping, the Source Qualifier transformation filters rows when read from a source.
The main difference is that the source qualifier limits the row set extracted from a source, while the Filter
transformation limits the row set sent to a target. Since a source qualifier reduces the number of rows
used throughout the mapping, it provides better performance.
However, the source qualifier only lets you filter rows from relational sources, while the Filter
transformation filters rows from any type of source. Also, note that since it runs in the database, you
must make sure that the source qualifier filter condition only uses standard SQL. The Filter transformation
can define a condition using any statement or transformation function that returns either a TRUE or
FALSE value.

3. Joiner Transformation

The following tips can help improve session performance:

1. Perform joins in a database:


Performing a join in a database is faster than performing a join in the session. In some cases,
this is not possible, such as joining tables from two different databases or flat file systems. If you want to
perform a join in a database, you can use the following options:
1. Create a pre-session stored procedure to join the tables in a database before running the
mapping.
2. Use the Source Qualifier transformation to perform the join.

2. Designate as the master source the source with the smaller number of records:
For optimal performance and disk storage, designate the master source as the source with the
lower number of rows. With a smaller master source, the data cache is smaller, and the search time is
shorter.

4. LookUp Transformation

Use the following tips when you configure the Lookup transformation:

1. Add an index to the columns used in a lookup condition:


If you have privileges to modify the database containing a lookup table, you can improve
performance for both cached and uncached lookups. This is important for very large lookup tables.
Since the Informatica Server needs to query, sort, and compare values in these columns, the index
needs to include every column used in a lookup condition.

2. Place conditions with an equality operator (=) first:


If a Lookup transformation specifies several conditions, you can improve lookup performance
by placing all the conditions that use the equality operator first in the list of conditions that appear
under the Condition tab.

3. Cache small lookup tables:


Improve session performance by caching small lookup tables. The result of the Lookup query
and processing is the same, regardless of whether you cache the lookup table or not.

4. Join tables in the database:


If the lookup table is on the same database as the source table in your mapping and caching
is not feasible, join the tables in the source database rather than using a Lookup transformation.

5. Deselect the lookup caching option in the Lookup transformation if there is no lookup override. This
improves session performance.

BEST PRACTICES OF DEVELOPING MAPPINGS IN INFORMATICA

1. Provide the join condition in Source Qualifier Transformation itself as far as possible. If it is
compulsory, use a Joiner Transformation.

2. Use functions in Source Qualifier transformation itself as far as possible (in SQL Override.)

3. Don’t bring all the columns into the Source Qualifier transformation. Take only the necessary
columns and delete all unwanted columns.

4. Too many joins in Source Qualifier can reduce the performance. Take the base table and the first
level of parents into one join condition, base table and next level of parents into another and so
on. Similarly, there can be multiple data flows, which can either insert or insert as well as update.

5. Better to use the sorted ports in Source Qualifier Transformation to avoid the Sorter transformation.

6. Use an Aggregator Transformation only if it is really required. Otherwise, calculate the aggregated values in
the Source Qualifier Transformation SQL override.

7. In case of Aggregators, ensure that proper Group By clause is used.

8. Do not Implement the Error Logic (if applicable) in Aggregator Transformation.

9. Data must be sorted on key columns before passing to Aggregator

10. Minimize aggregate function calls: SUM(A+B) will perform better than SUM(A)+SUM(B)

11. If you are using Aggregator & Lookup transformations, try to use Lookup after aggregation.

12. Don’t bring all the columns into the look up transformation. Take only the necessary columns and
delete all unwanted columns.


13. Using more lookups reduces the performance. If there are 2 lookups, try to club them into
one using a SQL override.

14. Use the Reusable Lookups on Dimensions for getting the Keys

15. Cache lookup rows if the number of rows in the lookup table is significantly less than the typical
number of source rows.

16. Share caches if several lookups are based on the same data set

17. Index the columns in the lookup condition if lookup is unavoidable

18. If you use a Filter transformation in mapping, keep it as close as possible to the sources in mapping
and before the Aggregator transformation

19. Avoid using Stored Procedure Transformation as far as possible.

20. Try to use proper index for the columns used in where conditions while searching.

21. Call the procedures in pre-session or post-session command.

22. Be careful while selecting the bulk load option. If bulk load is used, disable all constraints in pre-
session and enable them in post-session. Ensure that the mapping does not allow null, duplicates,
etc...

23. As far as possible try to convert procedures (functions) into informatica transformations.

24. Do not create multiple groups in Router (Like Error, Insert, Update etc), Try to Utilize the Default
Group.

25. Don't take two instances of the target table for insert/Update. Use Update Strategy Transformation
to achieve the same.

26. In case of joiners ensure that smaller tables are used as master tables

27. Configure the sorted input to the Joiner transformation to improve the session performance.

28. Use operators instead of functions since the Informatica Server reads expressions written with
operators faster than those with functions. For example, use the || operator instead of the
CONCAT () function.
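For instance (port and column names here are hypothetical, written in the same style as the other expressions in this handbook):

O_FULL_NAME = FIRST_NAME || ' ' || LAST_NAME                    (uses the || operator - preferred)
O_FULL_NAME = CONCAT (CONCAT (FIRST_NAME, ' '), LAST_NAME)      (nested CONCAT() calls - slower to evaluate)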

29. Keep data types consistent across the mapping from source to target to avoid unnecessary conversions.

30. If you are using the bulk load option increase the commit interval

31. Check the source queries at the backend while developing the mapping, and store all queries used
during development separately so that they can be reused during unit testing, which saves time.


SESSION LOGS
Information that reside in a session log:
- Allocation of system shared memory
- Execution of Pre-session commands/ Post-session commands
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Start/End timings for target loading
- Error encountered during session
- Load summary of Reader/Writer/ DTM statistics
Other Information
- By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and Writer thread codes have 3 digits and Transformation codes have 4 digits. The numbers
following a thread name indicate the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number
Log File Codes
Error Codes Description
BR - Related to reader process, including ERP, relational and flat file.
CMN - Related to database, memory allocation
DBGR - Related to debugger
EP- External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows shows the no of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (Without
Error)
(c) Rejected rows show the no of rows the writer could not apply to the target
(d) Affected rows shows the no of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping
(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped


Tracing Levels
Normal - Initialization and status information, Errors encountered, Transformation errors, rows
skipped,
summarize session details (Not at the level of individual rows)
Terse - Initialization information as well as error messages, and notification of rejected data
Verbose Init - In addition to Normal tracing: names of the index and data files used, and detailed
transformation statistics.
Verbose Data - In addition to Verbose Init: each row that passes into the mapping, and detailed
transformation statistics.
NOTE
When you enter tracing level in the session property sheet, you override tracing levels configured
for
transformations in the mapping.
Session Failures and Recovering Sessions
Two types of errors occurs in the server
- Non-Fatal
- Fatal
(a) Non-Fatal Errors
It is an error that does not force the session to stop on its first occurrence. Establish the error
threshold in the
session property sheet with the stop on option. When you enable this option, the server counts Non-
Fatal errors
that occur in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into the NOT-NULL field and
database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL
input.
(b) Fatal Errors
This occurs when the server cannot access the source, target or repository. This can include loss of
connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer (or) Sequence Generator transformations and the server cannot update
the sequence values in the repository, a fatal error occurs.
(c) Others
Usages of ABORT function in mapping logic, to abort a session when the server encounters a
transformation
error.
Stopping the server using pmcmd (or) Server Manager
Performing Recovery
- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the
row id of the last row committed to the target database. The server then reads all sources again
and starts processing from the next row id.
- By default, perform recovery is disabled in setup. Hence it won't make entries in the
OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session schedule: waiting to run,
initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many
times as needed.
- The normal reject loading process can also be done in session recovery process.
- The performance of recovery might be low, if
o Mapping contain mapping variables


o Commit interval is high


Un recoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target
and run the
session from the beginning.
Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a
session.
(a) Target based commit
- The server commits data based on the number of target rows and the key constraints on the target table.
The commit point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer after it reaches the commit interval.
When the buffer block is full, the Informatica server issues a commit command. As a result, the
amount of data committed at the commit point generally exceeds the commit interval.
- The server commits data to each target based on primary-foreign key constraints.
(b) Source based commit
- The server commits data based on the number of source rows. The commit point is the commit
interval you configure in the session properties.
- During a session, the server commits data to the target based on the number of rows from an
active source in a single pipeline. These rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that receive data
from that source qualifier.
- Although the Filter, Router and Update Strategy transformations are active transformations, the
server does not use them as active sources in a source based commit session.
- When the server runs a session, it identifies the active source for each pipeline in the mapping. The
server generates a commit row from the active source at every commit interval.
- When each target in the pipeline receives the commit rows, the server performs the commit.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the
writer or the target rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and reload it to relational targets using the reject
loading utility. (You cannot load rejected data into a flat file target.)
Each time you run a session, the server appends the rejected data to the reject file.
Locating the bad files: $PMBadFileDir/Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
(a) Row indicator
Row indicator tells the writer, what to do with the row of wrong data.
Row indicator Meaning Rejected By
0 Insert Writer or target
1 Update Writer or target
2 Delete Writer or target
3 Reject Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it
for reject.
(b) Column indicator
A column indicator is followed by the first column of data, and then another column indicator. Column
indicators appear after every column of data and define the type of data preceding them.


Column Indicator Meaning Writer Treats as


D Valid Data Good Data. The target accepts it unless a database error occurs, such as finding a
duplicate key.
O Overflow Bad Data.
N Null Bad Data.
T Truncated Bad Data
NOTE
NULL columns appear in the reject file with commas marking their column.
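Putting the two indicator types together, the sample reject-file row shown earlier can be read as follows (an informal reading based on the tables above):

3,D,1,D,D,0,D,1094345609,D,0,0.00
- Leading 3 : row indicator - the row was marked for reject by an update strategy expression (rejected by the writer).
- Each D    : column indicator - the adjacent column data is valid data.
- The remaining values (1, 0, 1094345609, 0, 0.00) are the column data themselves.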
Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data.
Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
Correct the mapping and target database to eliminate some of the rejected data when you run
the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended,
since they may contain misleading column indicators.
For example, a series of "N" indicators might lead you to believe the target database does not
accept NULL values, so you decide to change those NULL values to zero.
However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer
because of an update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will
contain inaccurate 0 values in place of NULL values.
Why writer can reject ?
- Data overflowed column constraints
- An update strategy expression
Why target database can Reject ?
- Data contains a NULL column
- Database errors, such as key violations
Steps for loading reject file:
- After correcting the rejected data, rename the rejected file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses the code
page of the server/OS. Hence do not change these in the middle of reject loading.
- Use the reject loader utility:
pmrejldr pmserver.cfg [folder name] [session name]
Other points
The server does not perform the following option, when using reject loader
(a) Source base commit
(b) Constraint based loading
(c) Truncated target table
(d) FTP targets
(e) External Loading
Multiple reject loaders
You can run the session several times and correct the rejected data from the several runs at once.
You can correct and load all of the reject files at once, or work on one or two reject files, load them,
and work on the others at a later time.
External Loading
You can configure a session to use Sybase IQ, Teradata and Oracle external loaders to load
session target files
into the respective databases.
The External Loader option can increase session performance since these databases can load
information directly


from files faster than SQL INSERT commands can load the same data into the database.
Method:
When a session uses an External Loader, the session creates a control file and a target flat file. The
control file contains information about the target flat file, such as the data format and loading
instructions for the External Loader. The control file has an extension of "*.ctl" and you can view the
file in $PMTargetFileDir.
For using an External Loader:
The following must be done:
- configure an external loader connection in the server manager
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in session property sheet.
Issues with External Loader:
- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging
- Code page requirements
- The server can use multiple External Loader within one session (Ex: you are having a session with
the two
target files. One with Oracle External Loader and another with Sybase External Loader)
Other Information:
- The External Loader performance depends upon the platform of the server
- The server loads data at different stages of the session
- The server writes External Loader initialization and completion messages in the session log. However,
details about External Loader performance are written to the External Loader log, which is stored in the
same target file directory.
- If the session contains errors, the server continues the External Loader process. If the session fails, the
server loads partial target data using the External Loader.
- The External Loader creates a reject file for data rejected by the database. The reject file has an
extension of "*.ldr".
- The External Loader saves the reject file in the target file directory.
- You can load corrected data from this file using the database reject loader, not through the Informatica
reject load utility (this applies to External Loader reject files only).
Configuring EL in session
- In the server manager, open the session property sheet
- Select File target, and then click flat file options
Caches
- The server creates index and data caches in memory for Aggregator, Rank, Joiner and Lookup
transformations in a mapping.
- The server stores key values in index caches and output values in data caches; if the server requires
more memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most circumstances, deletes
the cache files.
Cache storage overflow:


Transformation    Index cache                                        Data cache

Aggregator        Stores group values, as configured in the          Stores calculations based on the
                  Group-by ports.                                    Group-by ports.
Rank              Stores group values, as configured in the          Stores ranking information based on
                  Group-by ports.                                    the Group-by ports.
Joiner            Stores index values for the master source          Stores master source rows.
                  table, as configured in the join condition.
Lookup            Stores lookup condition information.               Stores lookup data that is not stored
                                                                     in the index cache.
Determining cache requirements
To calculate the cache size, you need to consider column and row requirements as well as
processing
overhead.
- server requires processing overhead to cache data and index information.
Column overhead includes a null indicator, and row overhead can include row to key information.
Steps:
- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache
requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
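A small worked example with hypothetical numbers, following the steps above and the "+ 7" row overhead used in the formulas below:

total column size = 100 bytes, row overhead = 7 bytes, number of groups = 10,000
minimum cache = 10,000 x (100 + 7) = 1,070,000 bytes (about 1 MB)
maximum cache = 2 x 1,070,000     = 2,140,000 bytes (about 2 MB)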
Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds
2 GB, you may find multiple index and data files in the directory. The server appends a number to the
end of the filename (PMAGG*.id*1, id*2, etc.).
Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it
completes the aggregation.
- When you partition a source, the server creates one memory cache and one disk cache for each
partition. It routes data from one partition to another based on the group key values of the
transformation.
- The server uses memory to process an Aggregator transformation with sorted ports; it does not use
cache memory, so you do not need to configure cache memory for Aggregators that use sorted ports.
Index cache:
#Groups ((∑ column size) + 7)
Aggregate data cache:
#Groups ((∑ column size) + 7)
Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with the rows
in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored
row with the input row.
- If the rank transformation is configured to rank across multiple groups, the server ranks
incrementally for


each group it finds .


Index Cache :
#Groups ((∑ column size) + 7)
Rank Data Cache:
#Group [(#Ranks * (∑ column size + 10)) + 20]
Joiner Cache:
- When server runs a session with joiner transformation, it reads all rows from the master source and
builds
memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and performs the joins
- Server creates the Index cache as it reads the master source into the data cache. The server uses
the
Index cache to test the join condition. When it finds a match, it retrieves rows values from the data
cache.
- To improve Joiner performance, the server aligns all data for the Joiner cache on an eight-byte
boundary.
Index Cache :
#Master rows [(∑ column size) + 16]
Joiner Data Cache:
#Master row [(∑ column size) + 8]
Lookup cache:
- When the server runs a Lookup transformation, it builds a cache in memory when it processes the first
row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for
each partition. If two Lookup transformations share the cache, the server does not allocate additional
memory for the second Lookup transformation.
- The server creates the index and data cache files in the lookup cache directory and uses the server
code page to create the files.
Index Cache :
#Rows in lookup table [(∑ column size) + 16]
Lookup Data Cache:
#Rows in lookup table [(∑ column size) + 8]
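For illustration, applying the lookup cache formulas above to a hypothetical lookup table of 50,000 rows with a total column size of 200 bytes in each formula:

Index cache = 50,000 x (200 + 16) = 10,800,000 bytes (about 10.3 MB)
Data cache  = 50,000 x (200 + 8)  = 10,400,000 bytes (about 9.9 MB)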
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the
session as it would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet
and every mapping using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation
Mapplet Won’t Support:
- Joiner


- Normalizer
- Pre/Post session stored procedure
- Target definitions
- XML source definitions
Types of Mapplets:
(a) Active Mapplets - Contains one or more active transformations
(b) Passive Mapplets - Contains only passive transformation
A copied mapplet is not an instance of the original mapplet. If you make changes to the original,
the copy does not inherit your changes.
You can use a single mapplet more than once in a mapping.
Ports
Default value for I/P port - NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values
Session Parameters
This parameter represent values you might want to change between sessions, such as DB Connection or
source file.
We can use session parameter in a session property sheet, then define the parameters in a session
parameter file.
The user defined session parameter are:
(a) DB Connection
(b) Source File directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameter to make sessions more flexible. For example, you have the same type of
transactional data
written to two different databases, and you use the database connections TransDB1 and TransDB2 to
connect to the
databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection
parameter, like
$DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the
session.
After it completes set the value to TransDB2 and run the session again.
NOTE:
You can use several parameter together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session
parameter, it fails to initialize the session.
Session Parameter File
- A parameter file is created by text editor.
- In that, we can specify the folder and session name, then list the parameters and variables used in the
session and assign each value.
- Save the parameter file in any directory, load to the server
- We can define following values in a parameter
o Mapping parameter
o Mapping variables
o Session parameters


- You can include parameter and variable information for more than one session in a single parameter file
by
creating separate sections, for each session with in the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A
batch parameter file has the same format as a session parameter file
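A minimal sketch of what such a parameter file might look like (the folder, session and file-related parameter names below are hypothetical; the format follows the folder/session heading and name=value lines described above, with $DBConnectionSource/TransDB1 taken from the example earlier in this section):

[Folder1.s_m_load_transactions]
$DBConnectionSource=TransDB1
$InputFile1=/data/incoming/trans_file.dat
$$LastRunTime=01/01/1900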
Locale
Informatica server can transform character data in two modes
(a) ASCII
a. Default one
b. Passes 7 byte, US-ASCII character data
(b) UNICODE
a. Passes 8 bytes, multi byte character data
b. It uses 2 bytes for each character to move data and performs additional checks at session level, to
ensure data integrity.
Code pages contains the encoding to specify characters in a set of one or more languages. We can
select a code page,
based on the type of character data in the mappings.
Compatibility between code pages is essential for accurate data movement.
The various code page components are
- Operating system Locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System Locale - System Default
(b) User locale - setting for date, time, display
(c) Input locale
Mapping Parameter and Variables
These represent values in mappings/mapplets.
If we declare mapping parameters and variables in a mapping, we can reuse the mapping by altering the
parameter and variable values in the session.
This reduces the overhead of creating multiple mappings when only certain attributes of a mapping
need to be changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session.
The server saves the value of a mapping variable to the repository at the end of each successful run and uses
that value the next time you run the session.
Mapping objects:
Source, Target, Transformation, Cubes, Dimension
Debugger
We can run the Debugger in two situations
(a) Before Session: After saving mapping, we can run some initial tests.
(b) After Session: real Debugging process
Metadata Reporter:
- A web-based application that allows you to run reports against repository metadata.
- Reports include executed sessions, lookup table dependencies, mappings and source/target schemas.


Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the Global Repository to store common objects that multiple developers can use
through shortcuts. These may include operational or application source definitions, reusable
transformations, mapplets and mappings.
(b) Local Repository
a. A Local Repository is any repository within the domain that is not the global repository. Use the Local Repository for
development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE:
- Once you create a global repository, you can not change it to a local repository
- However, you can promote the local to global repository
Batches
- Provide a way to group sessions for either serial or parallel execution by server
- Batches
o Sequential (Runs session one after another)
o Concurrent (Runs sessions at same time)

Nesting Batches
Each batch can contain any number of sessions/batches. We can nest batches several levels deep,
defining batches within batches.
Nested batches are useful when you want to control a complex series of sessions that must run sequentially
or concurrently.

Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default.
However, we can configure a batched session to run on its own schedule by selecting the "Use Absolute
Time Session" option.

Server Behavior
The server configured to run a batch overrides the server configuration to run sessions within the batch. If you
have multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if “Previous completes” and that
previous session fails.
Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch,
so that the Informatica server can run them in consecutive order.
They are two ways of running sessions, under this category
(a) Run the session, only if the previous completes successfully
(b) Always run the session (this is default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to
run the sessions separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches
in a particular order, place them into a sequential batch, just like sessions.
Stopping and aborting a session
- If the session you want to stop is a part of batch, you must stop the batch


- If the batch is part of nested batch, stop the outermost batch


- When you issue the stop command, the server stops reading data. It continues processing, writing
data and committing data to targets.
- If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar
to the stop command, except that it has a 60 second timeout. If the server cannot finish processing and committing
data within 60 seconds, it kills the DTM process and terminates the session.
Recovery:
- After a session being stopped/aborted, the session results can be recovered. When the recovery is
performed, the session continues from the point at which it stopped.
- If you do not recover the session, the server runs the entire session the next time.
- Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.
NOTE: The ABORT command and the ABORT function are different.
When can a Session Fail
- Server cannot allocate enough system resources
- Session exceeds the maximum no of sessions the server can run concurrently
- Server cannot obtain an execute lock for the session (the session is already locked)
- Server unable to execute post-session shell commands or post-load stored procedures
- Server encounters database errors
- Server encounter Transformation row errors (Ex: NULL value in non-null fields)
- Network related errors
When Pre/Post Shell Commands are useful
- To delete a reject file
- To archive target files before session begins
Session Performance
- Minimum log (Terse)
- Partitioning source data.
- Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
- Adding indexes.
- Changing commit Level.
- Using Filter trans to remove unwanted data movement.
- Increasing buffer memory, when large volume of data.
- Multiple lookups can reduce the performance. Verify the largest lookup table and tune the expressions.
- In session level, the causes are small cache size, low buffer memory and small commit interval.
- At system level,
o WIN NT/2000: use the Task Manager.
o UNIX: use vmstat and iostat.
Hierarchy of optimization
- Target.
- Source.
- Mapping
- Session.
- System.
Optimizing Target Databases:
- Drop indexes /constraints
- Increase checkpoint intervals.
- Use bulk loading /external loading.
- Turn off recovery.
- Increase database network packet size.
Source level
- Optimize the query (for example, tune the GROUP BY and ORDER BY clauses).
- Use conditional filters.
- Connect to RDBMS using IPC protocol.


Mapping
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations/ expressions.
Session:
- concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging area.
- Tune session parameters.
System:
- improve network speed.
- Use multiple PowerCenter servers (pmservers) on separate systems.
- Reduce paging.

Improving Performance at Session level


Optimizing the Session
Once you optimize your source database, target database, and mapping, you can focus on optimizing
the session. You can perform the following tasks to improve overall performance:
 Run concurrent batches.
 Partition sessions.
 Reduce error tracing.
 Remove staging areas.
 Tune session parameters.
Table lists the settings and values you can use to improve session performance:

Example Walkthrough

1. Go to Mappings Tab, Click Parameters and Variables Tab, Create a NEW port as below.


$$LastRunTime Variable date/time 19 0 Max


Give an Initial Value. For example 1/1/1900.

2. IN EXP Transformation, Create Variable as below:

SetLastRunTime (date/time) = SETVARIABLE ($$LastRunTime, SESSSTARTTIME)

3. Go to SOURCE QUALIFIER Transformation,

Click Properties Tab, In Source Filter area, ENTER the following Expression.
UpdateDateTime (Any Date Column from source) >= '$$LastRunTime'
AND
UPDATEDATETIME < '$$$SESSSTARTTIME'

4. Handle Nulls in DATE

iif(isnull(AgedDate),to_date('1/1/1900','MM/DD/YYYY'),trunc(AgedDate,'DAY'))

5. LOOK UP AND UPDATE STRATEGY EXPRESSION

First, declare a Look Up condition in Look Up Transformation.


For example,

EMPID_IN (column coming from source) = EMPID (column in target table)

Second, drag and drop these two columns into UPDATE Strategy Transformation.

Check the Value coming from source (EMPID_IN) with the column in the target table (EMPID). If both are
equal this means that the record already exists in the target. So we need to update the record
(DD_UPDATE). Else will insert the record coming from source into the target (DD_INSERT). See below for
UPDATE Strategy expression.

IIF ((EMPID_IN = EMPID), DD_UPDATE, DD_INSERT)

Note:Always the Update Strategy expression should be based on Primary keys in the target table.

6. EXPRESSION TRANSFORMATION

1. IIF (ISNULL (ServiceOrderDateValue1), TO_DATE ('1/1/1900','MM/DD/YYYY'), TRUNC (ServiceOrderDateValue1,'DAY'))
2. IIF (ISNULL (NpaNxxId1) or LENGTH (RTRIM (NpaNxxId1))=0 or TO_NUMBER (NpaNxxId1) <= 0, 'UNK', NpaNxxId1)
3. IIF (ISNULL (InstallMethodId),0,InstallMethodId)
4. Date_Diff(TRUNC(O_ServiceOrderDateValue),TRUNC(O_ServiceOrderDateValue), 'DD')

7. FILTER CONDITION


To pass only NOT NULL AND NOT SPACES VALUES THROUGH TRANSFORMATION.

IIF ( ISNULL(LENGTH(RTRIM(LTRIM(ADSLTN)))),0,LENGTH(RTRIM(LTRIM(ADSLTN))))>0

SECOND FILTER CONDITION [Pass only NOT NULL FROM FILTER]

iif(isnull(USER_NAME),FALSE,TRUE)


SENARIOS
1. Using the indirect method we can load multiple files with the same structure. How do we load the file
name into the database?
Input files
File1.txt
Andrew|PRES|Addline1|NJ|USA
Samy|NPRS|Addline1|NY|USA
File2.txt
Bharti|PRES|Addline1|KAR|INDIA
Ajay|PRES|Addline1|RAJ|INDIA
Bhawna|NPRS|Addline1|TN|INDIA
In database want to load the file name
File Name Name Type Address Line State Country
File1.txt Andrew PRES Addline1 NJ USA
File1.txt Samy NPRS Addline1 NY USA

File2.txt Bharti PRES Addline1 KAR INDIA


File2.txt Ajay PRES Addline1 RAJ INDIA
File2.txt Bhawna NPRS Addline1 TN INDIA
Ans:
This can be done by enabling the "Add Currently Processed Flat File Name" option.
Do this in the Source Analyzer while creating the source definition.


The CurrentlyProcessedFileName column will then be added to the source definition.

2. How to separate duplicate records into one target and unique records into another target

1|Piyush|Patra|
2|Somendra|Mohanthy
3|Santhosh|bishoyi
1|Piyush|Patra|
2|Somendra|Mohanthy


O/P
File1
1|Piyush|Patra|
2|Somendra|Mohanthy
File2
3|Santhosh|bishoyi

Solution:
This can be done with the help of an Aggregator.
Group by the columns on which you want to decide duplicate or unique.
Port     Expression    Group by
ID                     Yes
FName                  Yes
LName                  Yes
Count    count(ID)

In router condition for duplicate: Count >1 (or unique: Count =1)
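For reference, the same duplicate/unique split can be sketched in SQL (table and column names are hypothetical):

SELECT id, fname, lname, COUNT(*) AS cnt
FROM emp_src
GROUP BY id, fname, lname
HAVING COUNT(*) > 1;   -- duplicates (use HAVING COUNT(*) = 1 for the unique rows)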

3. How to load n number of records equally to 4 targets


Sol:
You can do this using the sequence generator and router


Sequence Generator
Use this to generate the record number from 1 to 4
Set following properties

Expression:
Use this to get the next value from Sequence generator
Router:
Use this to redirect output to 4 targets based on the group property


4. How to insert First_Name,Last_Name for below scenario


Source
First_Name Last_Name
Vijay
Jaiswal
Kuldeep
Solanki
Lalsingh
Bharti
Poornaya
Cherukmala
Rajeev
TK
Target (o/p)
First_Name Last_Name
Vijay Jaiswal
Kuldeep Solanki

Lalsingh Bharti

Poornaya Cherukmala

Rajeev TK

Solution:
Option1
You can assign a serial number to the source data, then group by the serial number and write to the target.

Expression
Use expression to assign the same serial number to first_name and last_name


Now data will be as below


Sl First_Name Last_Name
1 Vijay
1 Jaiswal
2 Kuldeep
2 Solanki
3 Lalsingh
3 Bharti
4 Poornaya
4 Cherukmala
5 Rajeev
5 TK
Aggregator
Group by SL. Since the Aggregator ignores null values by default, use the MAX or MIN function to get the
name combination.


5. Customer records entered in the OLTP system by different agent as below

First Name Last Name Address Entry_Date


Srini Reddy Cegedim, Bangalore 01-01-2011 10:05

Tarun Tanwar Capgemini, US 01-01-2011 10:15

Devashis Jain Symphony, Bangalore 01-01-2011 10:25

Srini Reddy Cegedim ,Bangalore 01-01-2011 11:20

In Data mart records are loaded on the same date 4:00 PM, records should be loaded as below
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00 01-01-2011 14:00:10

Tarun Tanwar Capgemini, US 01-01-2011 14:00:02

Devashis Jain Symphony, Bangalore 01-01-2011 14:00:05

Srini Reddy Cegedim Pvt Ltd ,Bangalore 01-01-2011 14:00:10


Solution:
If you use a static lookup, then for Srini Reddy only one record will be loaded, because the lookup returns
only one value.
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00

Tarun Tanwar Capgemini, US 01-01-2011 14:00:02

Dev Jain Symphony, Bangalore 01-01-2011 14:00:05

This can be done using dynamic lookup cache


Configure lookup cache as Dynamic

First Name Last Name Address NewLookupRow


Srini Reddy Cegedim, Bangalore 1

Tarun Tanwar Capgemini, US 1


Dev Jain Symphony, Bangalore 1

Srini Reddy Cegedim Pvt Ltd ,Bangalore 2

Use a Router to route rows for Insert (NewLookupRow = 1) and Insert-Update (NewLookupRow = 2).
For the Insert group, use an Update Strategy to insert the row.
For the Insert-Update group, use a Sequence Generator to create the surrogate key for the new version,
and use an Update Strategy to end-date the old record with the system date.

6. Needs to calculate the contribution to family income

Family ID Person ID Person Name Earning


100 1 Vijay 20000
100 2 Ajay 30000
200 3 Bharat 60000
200 4 Akash 60000
300 5 Sanjay 50000
O/P
Family ID Person ID Person Name %Contribution to family
100 1 Vijay 40
100 2 Ajay 60
200 3 Bharat 50
200 4 Akash 50
300 5 Sanjay 100

Solution1:
Using SQL Query
Select a.Family_ID,a.Person_Id,a.Person_name,(a.Earning/b.Tot_Sal)*100 as cont_To_Family from
(Select sum (Earning) as Tot_Sal,Family_ID from family_income group by Family_ID)b,family_income
a where a.Family_ID=b.Family_ID

Solution2:
This can be done with the help of a Joiner and an Aggregator.
Port Expression Group by


Family_id1 Yes
Sal
Tot_sal Sum(sal)

Use the joiner to join the records based on the family id

Port Master/Detail
Family_id Detail
Tot_Sal Detail
Person_id Master
Family_id Master
Person_name Master
Sal Master
Use the join condition:
Family_id1 = Family_id
Use an Expression to perform the calculation:
Port            Expression
Contribution    (Sal/Tot_sal)*100


7. To produce normalized output or Convert rows into columns

I/P
ID Month Sales
1 Jan 100
1 Feb 120
1 March 135
2 Jan 110
2 Feb 130
2 March 120

O/P
ID Jan Feb March
1 100 120 135
2 110 130 120

Prac_110_7 table source


Solution1: Using SQL Query

Select
ID,
Max (decode (Month, 'Jan', Sales)) as Jan,
Max (decode (Month, 'Feb', Sales)) as Feb,
Max (decode (Month, 'March', Sales)) as March
From (Select ID, Sales, Month from Normalized)
group by ID

Solution2:

Use the aggregator group by ID and use First function


FIRST (SALES, MONTH='Jan')
FIRST (SALES, MONTH='Feb')
FIRST (SALES, MONTH='March')


8. Data Scenario
When multiple records are coming from the source when join with another table for a single input.

Sample Logic:

1. The logic has been used to find a single valid crcust number from the source.

2. The source tables are TRAN, CUSTMR and CUSTOMER.

3. The crcust number will be pulled from the CUSTMR table by the normal join on TRAN table based
on the fields gtkey, gtcmpy and gtbrch.

4. If multiple records are coming from the custmr table for a single combination of gtkey, gtcmpy and
gtbrch then we can do lookup on the customer table based on crcust number from custmr table
and outlet_status in customer table should be in (‘1’ or spaces) or if we are getting only one crcust
number for a single combination then we can use that valid crcust number.


5. If we are getting only one crcust number from customer table then we can process that crcust
number or if we are getting multiple crcust numbers then we have to use filter, sorter and
aggregator transformations to get the valid crcust number without max or min on the multiple
records in the customer table.

The following query that has used to retrieve the source records from the first source table. (AS400
Environment)

Select

tran.gtdvsn,

tran.gtamnt,

tran.gtpsdt,

tran.gtpsmt,

tran.gtpsyr,

tran.gtlact,

Substr (digits (tran.gtkey), 1, 7) as gtkey,

Digits (tran.gtcmpy) as gtcmpy,

tran.gtbrch

From

Npc.tran Tran

Where

tran.gtcmpy = 300

And Tran.Gtamnt = 115.5

Source data:

GTDVSN GTAMNT GTPSDT GTPSMT GTPSYR GTLACT GTKEY GTCMPY GTBRCH


101 115.50 1090210 2 109 422000-0001 0002463 300


The following source query has used to retrieve the second table source records (AS400 Environment):

When there is a single input from the TRAN table, CUSTMR table populating multiple records (Normal Join).

SELECT
CUSTMR.CRCUST as CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC,
A.COUNT
FROM
NMC.CUSTMR , (SELECT COUNT (*) COUNT,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
GROUP BY
CUSTMR.CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3),
CUSTMR.CRPBRC) A
WHERE
CUSTMR.CRPCUS=A.CRPCUS and
SUBSTR (CUSTMR.CREXT1, 1, 3) =A.CREXT1 AND
CUSTMR.CRPBRC=A.CRPBRC AND
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRCUST IN ('0045907','0014150')

Source data from custmr table:

CRCUST CRPCUS CREXT1 CRPBRC COUNT


0014150 0002463 300 2
0045907 0002463 300 2

The below detail outer join on customer table has been used to get the valid crcust number when we are
getting multiple crcust numbers after the normal join.

The master table CUSTMR has joined with the detail table CUSTOMER based on CRCUST field (detail outer
join).

SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM

CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0014150','0045907')

Source data from customer table:

AR_NUM
0014150

The valid crcust number '0014150' will be processed and populated to the target.
Next, consider a source data set where multiple records are found in the customer table, and where we should
not simply use MAX or MIN on the crcust number.
Source data set:

Source query for TRAN table:

SELECT
TRAN.GTDVSN,
TRAN.GTAMNT,
TRAN.GTPSDT,
TRAN.GTPSMT,
TRAN.GTPSYR,
TRAN.GTLACT,
SUBSTR (DIGITS (TRAN.GTKEY), 1, 7) AS GTKEY,
DIGITS (TRAN.GTCMPY) AS GTCMPY,
TRAN.GTBRCH
FROM
NMP.TRAN TRAN
WHERE
TRAN.GTCMPY = 300
AND TRAN.GTAMNT IN (1030.5, 558.75, 728)

Source data from TRAN table:

GTDVSN GTAMNT GTPSDT GTPSMT GTPSYR GTLACT GTKEY GTCMPY GTBRCH


101 1030.50 1090210 2 109 422000-0001 0006078 300
101 558.75 1090210 2 109 2550-1043001 0006078 300
101 728.00 1090210 2 109 2550-422000 0006078 300

Source query for CUSTMR table:


SELECT CUSTMR.CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRPCUS= '0006078'

Source data from CUSTMR table:

CRCUST CRPCUS CREXT1 CRPBRC


0001877 0006078 300
0002392 0006078 300
0041271 0006078 300

Source query for CUSTOMER table:

SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM
CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0001877','0002392','0041271')

Source data from CUSTOMER table:

AR_NUM
0001877
0041271

The crcust numbers '0001877' and '0041271' are valid among those three crcust numbers. But the mapping
should populate only one crcust number among these two valid crcust numbers.

The below logic has been used to select a one valid crcust number.


Filter transformation:

The inputs for the filter transformation are coming from the joiner transformation which have done the detail
outer join on the master table custmr and the detail table customer table based on the crcust number.

Filter condition:

COUNT_CRCUST=1 OR (COUNT_CRCUST<>1 AND NOT ISNULL (AR_NUM))

COUNT_CRCUST=1 represents that the records from custmr table which have only one valid crcust number
for a single combination of gtkey, gtcmpy and gtbrch fields and that crcust number may not be present in
customer table.

COUNT_CRCUST<>1 AND NOT ISNULL (AR_NUM) represents that the records which have more than one
crcust number for a single combination of gtkey, gtcmpy and gtbrch fields and the ar_num from customer
table should not be null(It means that the multiple crcust numbers should have present in customer table
also).

The Filter transformation passes only the records that have valid crcust numbers.


Sorter Transformation:

The Sorter transformation is used to sort the records in descending order based on crcust number, crpcus,
crext1 and crpbrc. In the next Aggregator transformation we use MIN on the outlet_status field, which collapses
the multiple records; that is why the records are sorted first. Up to this Sorter transformation we have
processed the records from the customer table with all kinds of outlet_status values.

Aggregator transformation:

The Aggregator transformation is used to eliminate the crcust numbers whose outlet_status is not in
('1' or spaces), and to group the crcust numbers by the source fields crpcus, crext1 and
crpbrc.


The outlet_status is transformed to '1' when outlet_status is '1' or spaces; otherwise it is transformed to
'9'. We then take the MIN of outlet_status to consider only the records whose outlet_status is
'1' or spaces for the target.


Then we can join these records with the source table TRAN to get the valid crcust number.


This joiner transformation is used to join the records from both the inputs from the aggregator transformation
and TRAN source qualifier transformation.


Using this logic, we can find a single crcust number for a single combination of inputs from the source TRAN.


6. Data Scenario

When more than two continuous joiner transformations have to be used in the mapping and when the
source tables are belongs to the same database for those joins.

Sample Logic:

1. The tables that have used in the logic are SALES, PURCXREF and SALES_INDEX. All these tables are
belongs to the same database.

2. The fields to the target from source are historical_business_unit, item_code, order_qty, ship_date,
grp_code and event_order_ind.

3. This logic is applicable only for the records whose SALES.historical_business_unit is in
('BH', 'BW').

4. Optionally join to PURCXREF where PURCXREF.ITEM_ID = SALES.ITEM_ID

5. If join to PURCXREF is successful, then perform required join to SALES_INDEX table when
PURCXREF.ITEM_OD = SALES_INDEX.ITEM_ID.

6. If the join to PURCXREF is not successful, then perform the required join to SALES_INDEX where
SALES_INDEX.ITEM_ID = SALES.ITEM_ID.

7. The logic used for item_code is the concatenation of style_num, color_code,
attribution_code and size_num from SALES_INDEX, whether or not the join to PURCXREF is successful.

8. The logic used for order_qty is SUM (PURCXREF.ZCOMP_QTY *
SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is successful, and SUM
(SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is not successful.

The below sql query is used to reduce the number of joiners in the mapping.

Select

S.HISTORICAL_BUSINESS_UNIT,

(SI.STYLE_NUM || SI.COLOR_CODE || SI.ATTRIBUTION_CODE || SI.SIZE_NUM) AS ITEMCODE,

ROUND (SUM (CASE WHEN S.HISTORICAL_BUSINESS_UNIT IN ('CH','CW') AND

PCX.ZITEMID IS NOT NULL THEN

S.ORIGINAL_ORDER_UNITS * PCX.ZCOMP_QTY

ELSE

S.ORIGINAL_ORDER_UNITS

END),0) ORDER_QTY,

S. SHIP_DATE,

S.GRP_CDE,

S.EVENT_ORDER_IND

FROM

OPC.SALES S

LEFT OUTER JOIN (SELECT MAX (p.plant),

p.af_grid,

p.zcmp_grid,

p.zcomp_qty,

p.material,

p.component,

p.zitemid,

p.compitemid

FROM OPC.PURCXREF p

GROUP BY p.af_grid,

p.zcmp_grid,

p.zcomp_qty,

p.material,

p.component,

p.zitemid,

p.compitemid) PCX

ON S.ITEM_ID = PCX.ZITEMID

JOIN OPC.SALES_INDEX SI

ON (CASE WHEN S.HISTORICAL_BUSINESS_UNIT IN ('BH','BW') THEN



CASE WHEN PCX.ZITEMID IS NULL THEN

S.ITEM_ID

ELSE PCX.COMPITEMID END

ELSE S.ITEM_ID END) = SI.ITEM_ID

GROUP BY

S.HISTORICAL_BUSINESS_UNIT,

S.REQUESTED_SHIP_DATE,

S.GRP_CDE,

S.EVENT_ORDER_IND,

SI.STYLE_NUM || SI.COLOR_CODE || SI.ATTRIBUTION_CODE || SI.SIZE_NUM

9. How to populate 1st record to 1st target ,2nd record to 2nd target ,3rd record to 3rd target and 4th
record to 1st target through informatica?
We can do using sequence generator by setting end value=3 and enable cycle option.then in the router
take 3 goups
In 1st group specify condition as seq next value=1 pass those records to 1st target simillarly
In 2nd group specify condition as seq next value=2 pass those records to 2nd target
In 3rd group specify condition as seq next value=3 pass those records to 3rd target.
Since we have enabled cycle option after reaching end value sequence generator will start from 1,for the
4th record seq.next value is 1 so it will go to 1st target.
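A compact sketch of the setup just described (NEXTVAL is the port generated by the Sequence Generator):

Sequence Generator : Start Value = 1, End Value = 3, Cycle = enabled
Router groups      : GROUP_1 -> NEXTVAL = 1   (1st target)
                     GROUP_2 -> NEXTVAL = 2   (2nd target)
                     GROUP_3 -> NEXTVAL = 3   (3rd target)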

10. How to do dynamic file generation in Informatica?


We want to generate a separate file for every State (one file per state). For the sample data below it has to
generate 2 flat files, and the name of each flat file is the corresponding state name; that is the requirement.
Below is my mapping.
Source (Table) -> SQ -> Target (FF)
Source:
State Transaction City
AP 2 HYD
AP 1 TPT
KA 5 BANG
KA 7 MYSORE
KA 3 HUBLI
This functionality was added in Informatica 8.5; it is not available in earlier versions.
We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the target file.


In order to generate the target file names from the mapping, we should make use of the special "FileName"
port in the target file. You can't create this special port from the usual New port button. There is a special
button with label "F" on it to the right most corner of the target flat file when viewed in "Target Designer".
When you have different sets of input data with different target files created, use the same instance, but
with a Transaction Control transformation which defines the boundary for the source sets.
In the target flat file there is an option in the Columns tab, i.e. "FileName" as a column.
When you click it, a non-editable column gets created in the metadata of the target.
In the Transaction Control transformation give a condition such as
IIF (NOT ISNULL (State), TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
(in practice, committing when the State value changes from the previous row gives one file per state).
Map the State column to the target's FileName column.
Your mapping will be like this:
Source -> SQ -> Transaction Control -> Target
Run it, and separate files will be created, named after the State values.

11. How to concatenate row data through informatica?m_124


Source:
Ename EmpNo
stev 100
methew 100
john 101
tom 101

Target:
Ename EmpNo
Stev methew 100
John tom 101

Approach1: Using Dynamic Lookup on Target table:

Lookup (Dynamic):
Condition: EMPNO=IN_EMPNO
Click on “Output Old Value On Update “

Filter:
Filter Condition: NewLookupRow=2


Expression:
V_ENAME=LKP_ENAME || ' ' || IN_ENAME
O_ENAME= V_ENAME

Approch2: Using Var port:

Expression:
V_ENAME=IIF (EMPNO = V_PREV_EMPNO, V_PREV_ENAME || ' ' || ENAME)
O_ENAME= V_ENAME
V_PREV_ENAME= ENAME
V_PREV_EMPNO= EMPNO

Filter:
Filter Condition: O_ENAME != ''

One more solution exists (see mapping M_PG124), but it works only for two records per group.

12. How to send unique (distinct) records into one target and duplicates into another target?
(M_125_12)
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102

Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102

Target_2:


Ename EmpNo
Stev 100

Approch 1: Using Dynamic Lookup on Target table:

Lookup (Dynamic):
Condition: EmpNo= IN_EmpNo

Router:
If the row is coming for the first time, send it to Target_1.
Group filter condition: NewLookupRow=1

If it is already inserted, send it to Target_2.


Default Group

Approch2: Using Var port :


Sorter:
Sort EmpNo column as Ascending

Expression:
V_Seq_ID=1
V_EmpNo=IIF (EmpNo != V_Prev_EmpNo, 1, V_Seq_ID+1)
V_Prev_EmpNo= EmpNo
O_EmpNo= V_EmpNo

Router:
If the row is coming for the first time, send it to Target_1.
Group filter condition: O_EmpNo=1

If it is already inserted, send it to Target_2.


Default Group

13. How to process multiple flat files to single target table through informatica if all files are same
structure?
We can process all flat files through one mapping and one session using list file.
First we need to create a list file using a Unix script for all the flat files; the extension of the list file is .LST.
The list file contains only the flat file names.
At session level we need to set the source file directory as the list file path,
the source file name as the list file name,
and the source file type as Indirect.

14. How to populate file name to target while loading multiple files using list file concept.
In Informatica 8.6, select the "Add Currently Processed Flat File Name" option in the Properties tab of the source
definition after importing the source file definition in the Source Analyzer. It adds a new column called currently
processed file name; we can map this column to the target to populate the file name.


15. If we want to run 2 workflows one after another (how to set the dependency between workflows):
1. If both workflows exist in the same folder, we can create 2 worklets rather than creating 2 workflows.
2. Then we can call these 2 worklets in one workflow.
3. There we can set the dependency.
4. If the workflows exist in different folders or repositories, then we cannot create worklets.
5. One approach is to set the dependency between the two workflows using a shell script.
6. The other approach is Event-Wait and Event-Raise.
If the workflows exist in different folders or different repositories, we can use the approaches below.
Using a shell script
1. As soon as the first workflow completes, we create a zero-byte file (indicator file).
2. If the indicator file is available in the expected location, we run the second workflow.
3. If the indicator file is not available, we wait for 5 minutes and check for the indicator again.
We continue this loop 5 more times, i.e. 30 minutes in total.
4. After 30 minutes, if the file still does not exist, we send out an email notification.
Event-Wait and Event-Raise approach
We can put an Event-Wait task before the actual session in the workflow to wait for the indicator file. If the file is
available, it runs the session; otherwise the Event-Wait waits indefinitely until the indicator file is available.

5. How to load cumulative salary into the target?


Solution:
Using variable ports in an Expression transformation, we can load the cumulative salary into the target.
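A minimal sketch of the Expression transformation ports (port names are hypothetical); because a variable port keeps its value across rows, it accumulates the running total:

V_CUM_SAL (variable port) = V_CUM_SAL + SAL
O_CUM_SAL (output port)   = V_CUM_SAL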


6. I have 1000 records in my source table and the same in the target, but a new column "batchno" is added in the
target. This column should hold 10 for the first 100 records, 20 for the next 100 records, 30 for the next 100
records, and so on. How to achieve this? (M_test_seq)
First, the target table should have a primary key (unique key), then use an Update Strategy transformation to
update the target.
Either use a Sequence Generator or a variable port as a counter.
Option1:
In Sequence Generator:
Give Start Value = 1
End value = 100
Check the Properties as CYCLE.
Give the following condition In Expression Transformation:
O_SEQUENCE = NEXTVAL
v_COUNT = IIF (O_SEQUENCE = 1, v_COUNT + 10, v_COUNT)

Option2: Sequence Generator without cycle


iif(seq<=100,10,iif(seq>100 and seq<=200,20,iif(seq>200 and seq<=300,30,40)))

7. How can we call a stored procedure at session level under Target post SQL
We can call the procedures in a anonymous block within pre-sql /post -sql using the following syntax:
begin procedure_name()\; end\;

8. Consider the input records to be


Trade_ID AMOUNT STATUS CURRENCY APPLIED_DATE
100 78.86889 TRUE USD 10/25/2013 5:13
100 6.864 TRUE USD 10/25/2013 7:37
100 865.97 TRUE USD 10/25/2013 10:01
100 -0.879 FALSE USD 10/25/2013 12:25
200 8.99 FALSE EUR 10/25/2013 14:49
200 78 TRUE EUR 10/25/2013 17:13
200 53.98 TRUE EUR 10/25/2013 19:37

The data is ordered based on applied_date. When the latest record's status is FALSE for a trade_id, the
immediately preceding record's values should be passed to the target.
Output required:
200 53.98 TRUE EUR 10/25/2013 19:37

100 865.97 TRUE USD 10/25/2013 10:01


Solution1:

Sorter:
TRADE_ID=Ascending
STATUS= Ascending

Aggregator:
TRADE_ID=GroupBy
O_AMOUNT = IIF (LAST (STATUS) = 'FALSE', LAST (AMOUNT, STATUS != 'FALSE'), LAST (AMOUNT))

Solution2:

SQ:
Source Filter=LATEST_RECORD.STATUS='TRUE'

Aggregator:
TRADE_ID=GroupBy

9. I have an input column of data type string which has alphanumeric and special characters. I want
only the numbers [0-9] from it in my output port. (m_130_9)

REPLACECHR(0, REG_REPLACE(COLUMN_NAME, '[^0-9]', '/'), '/', NULL)


The reason why I have used a replace character function after using a regular expression was because,
REG_REPLACE did not allow me to replace the unwanted character with a NULL.

10. Input flatfile1 has 5 columns, flatfile2 has 3 columns (no common column) output should contain
merged data(8 columns) How to achieve ?
Take the first file's 5 columns into an Expression transformation and add an output column in it, let's say 'A'. Create
a mapping variable, let's say 'countA', having datatype integer. Now in port 'A' put an expression like
SETCOUNTVARIABLE(countA).


Take the second file's 3 columns into an Expression transformation and add an output column in it, let's say 'B'. Create a
mapping variable, let's say 'countB', having datatype integer. Now in port 'B' put an expression like
SETCOUNTVARIABLE(countB).

The above two steps are only meant for creating common fields with common data in the two pipelines.
Now join the two pipelines in a Joiner transformation on the condition A=B.
Then connect the required 8 ports to the target.

11. How do we restrict values to 0-9, A-Z, a-z and allowed special characters, and reject the records otherwise?
What function is used for this restriction?
Use REG_MATCH with an appropriate pattern, for example:
IIF(REG_MATCH(in_String,'^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$'),'Valid','Invalid')
This particular pattern is usually used to validate an email address.


12. File SOURCE-1


NO NAME
1 SATISH
2 KARTHIK
3 SWATHI
4 KEERTHI
File SOURCE-2
NO NAME
1 SATISH
2 KARTHIK
5 SANTHOSE
6 VASU

TARGET
3 SWATHI
4 KEERTHI
5 SANTHOSE
6 VASU
Here, as the source metadata is the same, we can use a UNION transformation. After that use an AGGREGATOR transformation with GroupBy on NO and count(NO), then a Filter transformation with the condition count = 1, and connect it to the target.
The flow is as follows:
src ---> sq ---> union ---> agg ---> filter ---> trg
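If both files were staged to tables, a hedged SQL sketch of the same UNION + count logic (assuming staging tables SOURCE1 and SOURCE2) would be:

SELECT no, name
FROM  (SELECT no, name FROM source1
       UNION ALL
       SELECT no, name FROM source2)
GROUP BY no, name
HAVING COUNT(*) = 1;   -- keeps only the rows that appear in exactly one source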

13. Below
Source
name sal
aaaa 2000
bbbb 3000
abcd 5000

Target
name sal
aaaa 2000
bbbb 3000
abcd 5000
total 10000 ---AS NEW ROW

SRC==>EXP==>AGGR==>UNION

SRC==> ==>TRG

In the AGGR, take one output port for the name field with the constant value 'total' and SUM(sal) as the sal field, then UNION it with the other pipeline.
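An equivalent SQL sketch (assuming the source is staged in a table SRC_SAL) shows the same idea:

SELECT name, sal FROM src_sal
UNION ALL
SELECT 'total' AS name, SUM(sal) AS sal FROM src_sal;  -- appends the total row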


14. We have 20 records in source system, when we run for the 1st time, it should load only 10 records
into the target, when you run for the second time it should load another 10 record which are not
loaded. How do we do that? Can we write a SQL query in source qualifier to do it.

If your source is relational, then you can write a SQL override.

M_PG130_14

SQ:

SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.MGR, EMP.SAL, EMP.COMM, EMP.DEPTNO, ROWNUM AS NUM FROM EMP

FLT: (NUM >= $$Var1 and NUM <=$$Var2)

NOTE: $$Var1 and $$Var2 are mapping parameters with initial values of 1 and 10 respectively.

After the first run completes, the parameters are increased to 11 and 20 respectively (set these new values in the parameter file before the next run).
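Alternatively, the same window can be pushed into the Source Qualifier SQL override itself; a sketch using the mapping parameters described in the note above:

-- ROWNUM is materialised in the inline view so the outer filter works correctly
SELECT *
FROM  (SELECT e.*, ROWNUM AS num FROM emp e)
WHERE num BETWEEN $$Var1 AND $$Var2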

15. I have a workflow like wf --> s1 --> s2 --> s3 --> s4: first start s1, then s2, then s3. Here my session s3 has to run 3 times before s4 starts.

Option1: Wf-->s1-->s2-->s3-->Instance_of_s3--->Instance_of_s3-->s4


It can also be achieved through an Assignment task: set a variable '$$Var = $$Var + 1', check 'MOD($$Var, 3) = 0' in the link condition, and make '$$Var' a persistent workflow variable.

Option2: Create s3 in another workflow and call it with the pmcmd command. Alternatively, write a shell script that calls session s1, then s2 after successful completion, then calls s3 three times in a loop, and finally s4.

16. Following source


Name gender

Ramya female

Ram male

deesha female

david male

kumar male

I want the target

male female

ram ramya

david deesha

kumar

Anybody give solution above question?

Create the mapping as below:

EXP1 ---> SRT1
/ \
Src --> Router Joiner ---> EXP3 ---> Targ
\ /
EXP2 ---> SRT2

Router: Filter on the basis of gender. All records with gender male will go into EXP1 and the records with gender female will go into EXP2.

In EXP1 and EXP2, create a variable port (Seq_Var) with the value Seq_Var + 1 and an output port Seq_Out (value = Seq_Var).

Pass the name and sequence ports from EXP1 and EXP2 to SRT1 and SRT2 (sorter) respectively.

In SRT1 and SRT2, sort on the basis of sequence.

In Joiner, use 'Full Outer Join' and keep 'Sorted Input' checked in properties.

Pass the male name values and female name values from joiner to EXP3 (Gather all the data). Pass all the
data to Target.
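For reference, a hedged SQL sketch of the same pivot (assuming the data is staged in a table PERSONS with NAME and GENDER columns):

-- pair the Nth male with the Nth female, like the joiner on the sequence ports
SELECT m.name AS male, f.name AS female
FROM  (SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
       FROM   persons WHERE gender = 'male') m
FULL OUTER JOIN
      (SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
       FROM   persons WHERE gender = 'female') f
ON    m.rn = f.rn;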

17. We have 30 workflows in 3 batches (batch1 = workflows 1-10, batch2 = workflows 11-20, batch3 = workflows 21-30). Batch1 must complete first; after that batch2 and batch3 have to run concurrently. How can you do this in UNIX?
Write three shell scripts, one per batch of workflows, using the pmcmd command to invoke the workflows. Write another shell script that calls the first batch script and, after it finishes, schedules the remaining two batch scripts (for example via crontab) so that they run concurrently.

18. How to display session logs based upon particular dates. If I want to display session logs for 1 week
from a particular date how can I do it without using unix
Open the session properties, go to the Config Object tab, then Log Options settings; set 'Save session log by' to 'Session timestamp'.

19. How to add source flat file header with output field names into target file?
When the target is a flat file, in the Mapping tab of the session click the flat file target; under the Header Options property select 'Output field names'.

20. What is Test load plan? Let us assume 10 records in source, how many rows will be loaded into
target?
No rows will be loaded into the target, because this is only a test.


The number of source records processed is controlled by the row count set at the session level once the test load option is enabled.

21. How do schedule a workflow in Informatica thrice in a day? Like run the workflow at 3am, 5am and
4pm?
In the Workflow Designer > Workflows > Schedulers you can set 3 different daily schedulers, each with its own execution time (3 AM, 5 AM and 4 PM).

22. Can we insert, update and delete in target tables with one Update Strategy transformation?
Yes, we can do insert, update and delete, and also reject, by using a single Update Strategy.

In update strategy expression can be:

IIF(somexvalue = xx, DD_INSERT, IIF(somexvalue = yy, DD_UPDATE, IIF(somexvalue = zz, DD_DELETE, DD_REJECT)))

23. If our source containing 1 terabyte data so while loading data into target what are the thing we
keep in mind?
1 TB is a huge amount of data, so a normal load type would take a very long time. To overcome this it is better to use bulk loading, along with large commit intervals, to get good performance. Another technique is external loading for databases such as Netezza and Teradata.

Use partition option: Works for both relational and file source systems

If source is file: Instead of loading huge amount of data at a time, we can always split the source if it is a file
using unix and extract the data from all the source files using indirect type and load into target.

24. Can we insert duplicate data with dynamic look up cache, if yes than why and if no why?
Duplicate data cannot be inserted using a dynamic lookup cache, because the dynamic lookup cache performs insert and update operations based on the keys it finds in the target table.

25. If there are 3 workflows are running and if 1st workflow fails then how could we start 2nd workflow or
if 2nd workflow fails how could we start 3rd workflow?
Option 1: Use a worklet for each workflow, create a single workflow with these worklets and define the dependency there.
Option2: Use a scheduling tools such as control M or Autosys to create dependency among workflows.

26. My source having the records like


Name Occurs
ram 3
sam 5
tom 8

and I want to load into target like ram record 3 times, sam record 5 times, tom record 8 times.

Option1:
SQ --> SQL Tran --> Target
In the SQL transformation: INSERT INTO <target_table> (c1) SELECT ?p_name? FROM dual CONNECT BY LEVEL <= ?p_occurs?

Option2:
SQ --> Stored Procedure Tran --> Target
Stored Proc: call a stored procedure within the pipeline, e.g. SP_INSERT(p_name, p_occurs); inside the procedure use p_occurs as the loop counter and insert a record on each iteration.
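A pure-SQL sketch of the same row multiplication (assuming a staging table SRC_OCCURS with NAME and OCCURS columns):

-- generate numbers 1..max(occurs) and join so each name repeats OCCURS times
SELECT s.name
FROM   src_occurs s
JOIN  (SELECT LEVEL AS n
       FROM   dual
       CONNECT BY LEVEL <= (SELECT MAX(occurs) FROM src_occurs)) g
  ON   g.n <= s.occurs
ORDER BY s.name, g.n;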

27. SOURCE

Name id dept sal


1 a1 A 100
2 b1 B 200
3 c1 C 300
4 d1 D 400

TARGET:

Name id dept sal


1 a1 A 100
2 b1 B 200
3 WER1 567 300
4 d1 D 400

I have a source and a target. How do I validate the data? Tell me the differences in data between the two tables above.
There are many ways to check this:
1. Do a column-level check, e.g. SRC.ID = TGT.ID but the other columns do not match on data.
2. Do a MINUS both ways, (SRC MINUS TGT) and (TGT MINUS SRC), for a particular day's load.
3. Do a referential integrity check: any id in SRC but not in TGT, or any id in TGT but not in SRC.
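A hedged example of check 2, assuming the source and target are staged as SRC and TGT with identical structures:

-- rows present in the source but missing or different in the target
SELECT * FROM src
MINUS
SELECT * FROM tgt;

-- rows present in the target but missing or different in the source
SELECT * FROM tgt
MINUS
SELECT * FROM src;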

28. We have 10 records in source in that good records go to relational target and bad records go to
target bad file ? Here if any bad records you need to send an email, if no bad records means no
need to send email
Option1:
Pre-session command: remove the existing bad file from infa_shared/BadFiles/XXXXX.txt
In the post-session command of the session, call a shell script that checks the bad file for a non-zero record count and sends the e-mail accordingly.
Option 2:
Use a link condition such as $<session_name>.TgtFailedRows > 0 and then call an Email task.

29. Why should we use star schema in datawarehouse design?


1) Query performance is high.

2) Queries are simpler and need fewer joins compared to a Snowflake schema.


30. How do we generate surrogate keys using a dynamic lookup? Can we use it for an SCD Type 2 mapping, and why?
Get surrogate key in as a lookup port and the set the associated port to sequence-ID.
When the Integration Service updates (old records) rows in the lookup cache it uses the primary key
(PK_PRIMARYKEY) values for rows in the cache and the target table. The Integration Service uses the
sequence ID to generate a primary key for the customer that it does not find in the cache. The Integration
Service inserts the primary key value into the lookup cache and it returns the value to the lookup/output
port.


31. I have the source like


col1 col2
a l
b p
a m
a n
b q
x y
How to get the target data like below
col1 col2
a l,m,n
b p,q
x y

src->sorter->exp->agg->tgt

Sorter:
Sort key = col1

Exp:
var1 = IIF(var2 = col1, var1 || ',' || col2, col2)
var2 = col1
output_port = var1

Agg:
Group by col1 (the last row in each group carries the complete concatenated list)
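For comparison, the same result can be produced directly in Oracle (11gR2 or later) with LISTAGG; a sketch assuming a table T with columns COL1 and COL2:

SELECT col1,
       LISTAGG(col2, ',') WITHIN GROUP (ORDER BY col2) AS col2_list
FROM   t
GROUP BY col1;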

32. I have Relational source like his.


JAN FEB MAR APR
100 200 300 400
500 600 700 800
900 100 200 300

I need to convert these rows into columns to the target


MONTH TOTAL
JAN 1500
FEB 900
MAR 1200
APR 1500

Source qualifier --> Normalizer --> Expr --> Agg --> target

Take a Normalizer transformation and create a normalized port named "detail" with occurrence 4. Connect the input ports from the Source Qualifier to each detail port in the Normalizer. Next take an Expression transformation; in it create an output port named MONTH, and in the expression editor write the logic as

DECODE (GCID_DETAIL, 1, 'JAN', 2, 'FEB', 3, 'MAR', 'APR')

In the Aggregator, group by MONTH and take the SUM of the numeric (detail) value.


33. Why is the UNION TRANSFORMATION an Active transformation?


Reason1: In Union Transformation we may combine the data from two (or) more sources. Assume Table-1
contains '10' rows and Table-2 contains '20' rows. If we combine the rows of Table-1 and Table-2 we will get
a total of '30' rows in the Target. So it is definitely an Active Transformation.

Reason2: The Union transformation is built on the Custom transformation, which is an active transformation.

34. Write sql query following table

City Gender No

chennai male 40

chennai female 35

bangalore male 25

bangalore female 25

mumbai female 15

I want the required output

City        Male   Female
chennai      40     35
bangalore    25     25
mumbai              15

Solution1: When want sum of NO of Gender


SELECT CITY,
SUM (DECODE (GENDER,'MALE', NO, 0)) MALE,
SUM (DECODE (GENDER,'FEMALE', NO, 0)) FEMALE
FROM TABLE_NAME GROUP BY CITY

Solution2: When want max of NO of Gender


SELECT CITY,
MAX (DECODE (GENDER,'MALE', NO, 0)) MALE,
MAX (DECODE (GENDER,'FEMALE', NO, 0)) FEMALE

FROM TABLE_NAME GROUP BY CITY


Solution3: Using a mapping

Agg:
O_MALE = SUM (DECODE (GENDER, 'male', NO, 0))
O_FEMALE = SUM (DECODE (GENDER, 'female', NO, 0))

35. Write sql query following source table

jan feb mar apr


100 200 300 400
500 600 700 800
900 100 200 300
I want the output format like

month total
jan 1500
feb 900
mar 1200
apr 1500
Solution1:
Using UNION ALL, You can achieve it, here is Your Query

SELECT 'JAN' AS MONTH, SUM (JAN) AS TOTAL FROM SRC_MONTHS


UNION ALL
SELECT 'FEB' AS MONTH, SUM (FEB) AS TOTAL FROM SRC_MONTHS
UNION ALL
SELECT 'MAR' AS MONTH, SUM (MAR) AS TOTAL FROM SRC_MONTHS
UNION ALL
SELECT 'APR' AS MONTH, SUM (APR) AS TOTAL FROM SRC_MONTHS

Solution2: Using a mapping (Source Qualifier --> Normalizer --> Expression --> Aggregator --> Target, as shown in question 32 above).


36. Write sql query following table amount year quarter

AMOUNT YEAR QUARTER
1000 2003 first
2000 2003 second
3000 2003 third
4000 2003 fourth
5000 2004 first
6000 2004 second
7000 2004 third
8000 2004 fourth

I want the output


YEAR Q1_AMOUNT Q2_AMOUNT Q3_AMOUNT Q4_AMOUNT
2003 1000 2000 3000 4000
2004 5000 6000 7000 8000

Solution1: Using SQL query


SELECT YEAR,
SUM (DECODE (QUARTER,'FIRST', AMOUNT)) Q1_AMOUNT,
SUM (DECODE (QUARTER,'SECOND', AMOUNT)) Q2_AMOUNT,
SUM (DECODE (QUARTER,'THIRD', AMOUNT)) Q3_AMOUNT,
SUM (DECODE (QUARTER,'FOURTH', AMOUNT)) Q4_AMOUNT
FROM TABLE_NAME GROUP BY YEAR
Or
SELECT YEAR,
MAX (DECODE (QUARTER,'FIRST', AMOUNT)) Q1_AMOUNT,
MAX (DECODE (QUARTER,'SECOND', AMOUNT)) Q2_AMOUNT,
MAX (DECODE (QUARTER,'THIRD', AMOUNT)) Q3_AMOUNT,
MAX (DECODE (QUARTER,'FOURTH', AMOUNT)) Q4_AMOUNT
FROM TABLE_NAME GROUP BY YEAR
Solution2: Using a mapping: Source --> Aggregator (group by YEAR; Q1_AMOUNT = MAX(DECODE(QUARTER, 'first', AMOUNT)), and similarly for Q2-Q4) --> Target.


37. If I have 10 records (1 to 10) in my source and I use a Router transformation with the conditions i>2, i=5 and i<2 in different groups, what is the output in the targets?
Router:

i>2: records 3 to 10 (Tgt1)

i=5: only record 5 (Tgt2)

i<2: only record 1 (Tgt3)

38. A flat file (.dat) has 1 lakh columns. Can I have an Excel file as the target?
No. An Excel sheet cannot hold 1 lakh columns (older Excel versions are limited to 256 columns and 65,536 rows), whereas a flat file can. The only option is to save it as a .CSV (comma separated values) file.

39. Write a query to remove null values from the following table.


col1 col2 col3
dinesh null null
null suresh null
null null prakesh

i want the output


col1 col2 col3
dinesh suresh prakesh

SELECT MAX (COL1), MAX (COL2), MAX (COL3) FROM TABLE_NAME

40. Write a SQL query for the following table, where some duplicates are present:


1
1
2
2
3
3
4
5


I want the output with unique values in one column and duplicate values in another column, in the following format:

Unique duplicate
1 1
2 2
3 3
4
5

Solution1: Using a SQL query

SELECT DISTINCT (DEPTNO) UNIQ, E.DUP FROM EMP
LEFT OUTER JOIN
(SELECT DEPTNO DUP FROM EMP GROUP BY DEPTNO HAVING COUNT (DEPTNO) > 1) E
ON (EMP.DEPTNO = E.DUP)
OR
SELECT DISTINCT (DEPTNO) UNIQ, E.DUP FROM EMP,
(SELECT DEPTNO DUP FROM EMP GROUP BY DEPTNO HAVING COUNT (DEPTNO) > 1) E
WHERE EMP.DEPTNO = E.DUP (+)

Solution2: Using a mapping

Agg:
DEPTNO=GroupBy
O_Count=Count (DEPTNO)

Sorter:
In properties click on distinct

Filter:
Filter Condition=O_Count >1

Joiner:
Condition: Dup_DEPTNO=Uniq_DEPTNO
Join Type=Master Outer Join
In properties click on sorted Input


41. How will you get the 1st, 3rd and 5th records of a table? What is the query in Oracle?
Display odd records:
SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp)

Display even records:

SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp)
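An alternative sketch that materialises ROWNUM first and then filters on MOD:

-- odd-positioned rows (1st, 3rd, 5th, ...); use MOD(rn, 2) = 0 for the even ones
SELECT *
FROM  (SELECT e.*, ROWNUM AS rn FROM emp e)
WHERE MOD(rn, 2) = 1;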

42. How will I stop my workflow after 10 errors?

In the session-level properties (Config Object tab, Error Handling section), set:

Stop on errors: 10

43. To improve the performance of aggregator we use sorted input option and use sorter t/r before
aggregator. But here we are increasing one more cache in our mapping i.e; sorter. So how can u
convince that you are increasing the performance?
Aggregate calculations take time. We can reduce that time by providing sorted input: the time taken to sort the rows and forward them from the Sorter to the Aggregator and downstream transformations is less than the time needed to do the aggregation calculations on unsorted data, because with sorted input the Aggregator can process each group as soon as it is complete instead of caching all the input rows.

44. In Router transformation I created two groups .


One is Passthrough=> True
Second one is CorrectId’s => Invest>50000
Default
Is there any difference between default group and Passthrough group in this scenario?
Yes, there is a difference in this scenario.
The first group (Passthrough) will pass all the records, with all invest values.
The second group will pass records with invest > 50000.
If you use the default group instead of the first group, the default group will pass only the records with invest <= 50000 (those that do not satisfy any other group condition).

45. I have a flat file as the target in a mapping. When I load data into it a second time, the records already in the flat file get overwritten. I don't want to overwrite the existing records.
Note: with a relational target we could do this by implementing CDC / incremental logic, but this is a flat file, so that technique alone will still overwrite it.
So what is the solution?
Double-click the session --> Mapping tab --> Target properties --> check the option 'Append if exists'.


Theory Question
1. What is sql override?
Overriding SQL in source qualifier or lookup for additional logic

2. Can we have multiple conditions in a Lookup?


Yes

3. Can we have multiple conditions in a Filter?


Yes

4. How the flags are called in Update strategy?


0 - DD_INSERT, 1- DD_UPDATE, 2- DD_DELETE, 3- DD_REJECT

5. Is it possible to run the session other than Server manager? If so how?


YES USING PMCMD

6. What is diff. Things you can do using PMCMD?


Start, Stop and abort the session

7. What is the use of power plug?


For 3rd party connectors to sap, mainframe, Peoplesoft

8. What kind of Test plan? What kind of validation you do?


In Informatica we create test SQL to compare the number of records and validation scripts to check that the data in the warehouse is loaded according to the logic incorporated.

9. What is the usage of unconnected/connected look up?


We use a lookup for connecting to a table in the source or a target. There are 2 ways in which a
lookup can be configured i.e. connected or unconnected

10. What is the difference between Connected and Unconnected Lookups ?


1. Connected Lookup Receives input values directly from the pipeline.
2. Unconnected Lookup Receives input values from the result of a :LKP expression in another
transformation.
3. Connected Lookup You can use a dynamic or static cache.
4. Unconnected Lookup You can use a static cache.
5. Connected Lookup Cache includes all lookup columns used in the mapping (that is, lookup table
columns included in the lookup condition and
6. lookup table columns linked as output ports to other transformations).
7. Unconnected Lookup Cache includes all lookup/output ports in the lookup condition and the
lookup/return port.
8. Connected Lookup Can return multiple columns from the same row or insert into the dynamic
lookup cache.
9. Unconnected Lookup: Designate one return port (R); it returns one column from each row.


10. Connected Lookup If there is no match for the lookup condition, the Informatica Server returns the
default value for all output ports. If you configure dynamic caching, the Informatica Server inserts
rows into the cache.
11. Unconnected Lookup If there is no match for the lookup condition, the Informatica Server returns
NULL.
12. Connected Lookup Pass multiple output values to another transformation. Link lookup/output ports
to another transformation.
13. Unconnected Lookup pass one output value to another transformation. The lookup/output/return
port passes the value to the transformation calling: LKP expression.
14. Connected Lookup Supports user-defined default values.
15. Unconnected Lookup Does not support user-defined default values

16. If you have data coming from different sources, what transformation will you use in your Designer?
Joiner Transformation

17. What are different ports in Informatica?


Input, Output, Variable, Return/Rank, Lookup and Master.

18. What is a Variable port? Why it is used?


Variable port is used to store intermediate results. Variable ports can reference input ports and variable
ports, but not output ports.

19. Diff between Active and passive transformation?


A transformation can be active or passive. An active transformation can change the number of records passed through it; a passive transformation never changes the record count.
Active transformations that might change the record count are Advanced External Procedure, Aggregator, Filter, Joiner, Normalizer, Rank, Update Strategy and Source Qualifier. If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation.
Passive transformations: Lookup, Expression, External Procedure, Sequence Generator, Stored Procedure.
You can connect only one active transformation to the same downstream transformation or target, but you can connect any number of passive transformations.

20. What are Mapplet?


A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation
logic and can contain as many transformations as you need.

21. What is Aggregate transformation?


An aggregator transformation allows you to perform aggregate calculations, such as average and sums.
The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator
transformation to perform calculations on groups.

22. What is Router Transformation? How is it different from Filter transformation?


A Router transformation is similar to a Filter transformation because both transformations allow you
to use a condition to test data. A Filter transformation tests data for one condition and drops the
rows of data that do not meet the condition. However, a router transformation tests data for one or
more conditions and gives you the option to route rows of data that do not meet any of the
conditions to default output group.


23. What are connected and unconnected transformations?


Connected transformations are the transformation, which are in the data flow, whereas unconnected
transformation will not be in the data flow. These are dealt in Lookup and Stored procedure
transformations.

24. What is Normalizer transformation?


Normalizer transformation normalizes records from COBOL and relational sources allowing you to organize
the data according to your needs. A normalizer transformation can appear anywhere in a data flow when
you normalize a relational source.

25. How to use a sequence created in Oracle in Informatica?

By using Stored procedure transformation
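A minimal sketch of that approach, assuming an Oracle sequence named EMP_SEQ already exists: wrap the sequence in a function and call the function through a Stored Procedure transformation.

-- hypothetical wrapper; the Stored Procedure transformation calls GET_EMP_SEQ
CREATE OR REPLACE FUNCTION get_emp_seq RETURN NUMBER AS
  v_id NUMBER;
BEGIN
  SELECT emp_seq.NEXTVAL INTO v_id FROM dual;
  RETURN v_id;
END;
/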

26. What are source qualifier transformations?


The source qualifier represents the records that the Informatica Server reads when it runs a session.

27. Significance of Source Qualifier Transformation?


When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source
Qualifier transformation. The Source Qualifier represents the records that the Informatica Server reads when
it runs a session. · To join data originating from the same DB.
· Filter records in the Source itself.
· To specify an outer join instead of a default inner join.
· To specify sorter ports.
· To select distinct values from the source.
· To create a custom query to issue a special select statement for the Informatica server to read
source data. For example, we might use a custom query to perform aggregate calculations or
execute a stored procedure.

28. What are cache and their types in Informatica?


The Informatica server creates index and data cache for aggregator, Rank, joiner and Lookup
transformations in a mapping. The Informatica server stores key values in the index cache and output
values in the data cache.

29. What is an incremental aggregation?


In Incremental aggregation, you apply captured changes in the source to aggregate calculations in a
session. If the source changes only incrementally and you can capture changes, you can configure the
session to process only those changes. This allows the Informatica server to update your target
incrementally, rather than forcing it to process the entire source and recalculate the same calculation
each time you run the session.

30. What is Reject loading?


During a session, the Informatica server creates a reject file for each target instance in the mapping. If the
writer or the target rejects data, the Informatica server writes the rejected row into reject file. The reject file
and session log contain information that helps you determine the cause of the reject. You can correct
reject files and load them to relational targets using the Informatica reject load utility. The reject loader also
creates another reject file for the data that the writer or target reject during the reject loading.


31. WHAT IS SESSION and BATCHES?


SESSION - A Session Is A set of instructions that tells the Informatica Server How And When To Move Data
From Sources To Targets. After creating the session, we can use either the server manager or the command
line program pmcmd to start or stop the session.
BATCHES - It Provides A Way to Group Sessions For Either Serial Or Parallel Execution By The Informatica
Server.
There Are Two Types OfBatches :
1. SEQUENTIAL - Run Session One after the Other.
2. CONCURRENT - Run Session At The Same Time.

32. What are 2 modes of data movement in Informatica Server?


The data movement mode depends on whether Informatica Server should process single byte or
multi-byte character data. This mode selection can affect the enforcement of code page
relationships and code page validation in the Informatica Client and Server.
a) Unicode –IS allows 2 bytes for each character and uses additional byte for each non-ascii
character (such as Japanese characters)
b) ASCII – IS holds all data in a single byte

33. Why do we use Lookup transformations?


Lookup Transformations can access data from relational tables that are not sources in mapping. With
Lookup transformation, we can accomplish the following tasks:
a) Get a related value - Get the Employee Name from the Employee table based on the
Employee ID
b) Perform Calculation
Update slowly changing dimension tables - We can use unconnected lookup transformation to
determine whether the records already exist in the target or not.

34. What are conformed dimensions?


Conformed dimensions are dimensions that are shared by (linked to) multiple fact tables.

35. What is Data warehousing?


A DW is a database used for querying, analysis and reporting. By definition a DW is subject oriented, integrated, non-volatile and time variant.
Subject Oriented: represents a subject area such as sales or marketing.
Integrated: data collected from multiple source systems is integrated into a single, user-readable format, e.g. male/female, 0/1, M/F, T/F.
Non-Volatile: a DW stores historical data.
Time Variant: stores data time-wise, e.g. weekly, monthly, quarterly, yearly.

36. What is a reusable transformation? What is a mapplet . Explain the difference between them
Reusable transformation: create one when you want a transformation that performs a common task, such as calculating the average salary in a department, to be reused across mappings.
Mapplet: a reusable object that represents a set of transformations.
The difference is that a reusable transformation is a single transformation, while a mapplet is a set of transformations.

37. What happens when you use the delete or update or reject or insert statement in your update
strategy?
Insert: treats all records as inserts; while inserting, if a record violates a primary key or foreign key constraint in the database, the record is rejected.

38. Where do you design your mappings?


Designer


39. Where do you define users and privileges in Informatica?


Repository manager

40. How do you debug the data in Informatica?


Use debugger in designer

41. When you run the session does debugger loads the data to target?
If you select the option discard target data then it will not load to target

42. Can you use flat file and table (relational) as source together?
Yes

43. Suppose I need to separate the data for delete and insert to target depending on the condition,
which transformation you use?
Router or filter

44. What is the difference between lookup Data cache and Index cache.
Index cache: contains the columns used in the lookup condition.
Data cache: contains the output columns other than the condition columns.

45. What is an indicator file and how it can be used?


An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script or batch file creates and sends this indicator file to a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.

46. Different Tools in Designer


· Source Analyzer
· Warehouse designer
· Transformation Developer
· Maplet designer
· Mapping designer

47. Components of Informatica


· Designer
· Workflow Manager
· Workflow Monitor

48. Different Tools in Workflow Manager


· Task Developer
· Worklet designer
· Workflow Designer

49. What is overview window? Why it is used?


It’s a window in which you can see all the transformations that are used for a mapping.

50. While using the Debugger, how will you find out which transformation is currently running?
The transformation currently being processed shows a moving arrow in its left-hand corner.

51. How do you load the data using Informatica?


Using workflow manager

52. What is a Filter Transformation? or what options you have in Filter Transformation?
The Filter transformation provides the means for filtering records in a mapping. You pass all the rows
from a source transformation through the Filter transformation, then enter a filter condition for the
transformation. All ports in a Filter transformation are input/output and only records that meet the
condition pass through the Filter transformation.

53. What happens to the discarded rows in Filter Transformation?


Discarded rows do not appear in the session log or reject files

54. What are the two programs that communicate with the Informatica Server?
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica
Server:
Server Manager. A client application used to create and manage sessions and batches, and to
monitor and stop the Informatica Server. You can use information provided through the Server
Manager to troubleshoot sessions and improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches, stop the
Informatica Server, and verify if the Informatica Server is running.

55. What you can do with Designer?


The Designer client application provides five tools to help you create mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol, ERP, and relational
sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.

56. What are different types of Tracing Levels you have in Transformations?
Tracing Levels in Transformations:
Terse - Indicates when the Informatica Server initializes the session and its components; summarizes session results, but not at the level of individual records.
Normal - Includes initialization information as well as error messages and notification of rejected data.
Verbose Initialization - Includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
Verbose Data - Includes all information provided with the Verbose Initialization setting.
Note: By default, the tracing level for every transformation is Normal. To add a slight performance boost, you can set the tracing level to Terse, writing the minimum of detail to the session log when running a session containing the transformation.

57. What is Mapplet and how do you create Mapplet?


A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation
logic and can contain as many transformations as you need. Create a mapplet when you want to use a
standardized set of transformation logic in several mappings. For example, if you have a several fact tables
that require a series of dimension keys, you can create a mapplet containing a series of Lookup


transformations to find each dimension key. You can then use the mapplet in each fact table mapping,
rather than recreate the same lookup logic in each mapping.

58. If data source is in the form of Excel Spread sheet then how do use?
PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file.
Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not
need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
· Install the Microsoft Excel ODBC driver on your system.
· Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
· Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric
data.
· Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges
display as source definitions when you import the source.

59. When do u use connected lookup and when do you use unconnected lookup?
A connected Lookup transformation is part of the mapping data flow. With connected lookups,
you can have multiple return values. That is, you can pass multiple values from the same row in the
lookup table out of the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions
Unconnected Lookups : -
An unconnected Lookup transformation exists separate from the data flow in the mapping. You
write an expression using the :LKP reference qualifier to call the lookup within another
transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly
changing dimension tables)
=> Calling the same lookup multiple times in one mapping

60. How many values it (informatica server) returns when it passes thru Connected Lookup n
Unconncted Lookup?
Connected Lookup can return multiple values where as Unconnected Lookup will return only one
values that is Return Value.

61. What kind of modifications you can do/perform with each Transformation?
Using transformations, you can modify data in the following ways:
Task - Transformation
Calculate a value - Expression
Perform an aggregate calculation - Aggregator
Modify text - Expression
Filter records - Filter, Source Qualifier
Order records queried by the Informatica Server - Source Qualifier
Call a stored procedure - Stored Procedure
Call a procedure in a shared library or in the COM layer of Windows NT - External Procedure
Generate primary keys - Sequence Generator
Limit records to a top or bottom range - Rank
Normalize records, including those read from COBOL sources - Normalizer
Look up values - Lookup
Determine whether to insert, delete, update or reject records - Update Strategy
Join records from different databases or flat file systems - Joiner

62. Expressions in Transformations, Explain briefly how do you use?


Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most obvious
examples of these are the Expression and Aggregator transformations, which perform calculations
on either single values or an entire range of values within a port. Transformations that use
expressions include the following:
Transformation - How It Uses Expressions
Expression - Calculates the result of an expression for each row passing through the transformation, using values from one or more ports.
Aggregator - Calculates the result of an aggregate expression, such as a sum or average, based on all data passing through a port or on groups within that data.
Filter - Filters records based on a condition you enter using an expression.
Rank - Filters the top or bottom range of records, based on a condition you enter using an expression.
Update Strategy - Assigns a numeric code to each record based on an expression, indicating whether the Informatica Server should insert, update, delete or reject the row.

63. In case of Flat files (which comes thru FTP as source) has not arrived then what happens
You get a fatal error which cause server to fail/stop the session.

64. What does a load manager do ?


The Load Manager is the primary PowerCenter Server process. It accepts requests from the PowerCenter
Client and from pmcmd. The Load Manager runs and monitors the workflow. It performs the following tasks:
1. Starts the session, creates DTM process and sends pre & post session emails.
2. Manages the session and batch scheduling
3. Locks the session and reads the session properties.
4. Expands the session and server variables and parameters
5. Validates the source and target code pages
6. Verifies the permissions and privileges
7. Creates session log file
8. Creates DTM process which executes the session
9. What is a cache?
Temporary memory area used to store intermediate results. Operations like sorting and grouping requires
cache.

10. What is an Expression transformation?


Expression transformation is used to calculate expressions on a row by row basis. Total_sal = Com * sal


11. I have two sources S1 having 100 records and S2 having 10000 records, I want to join them, using
joiner transformation. Which of these two sources (S1,S2) should be master to improve my
performance? Why?
S1 should be the master as it contains few records so that the usage of cache can be reduced, S2
should be detail.

12. I have a source and I want to generate sequence numbers using mappings in informatica. But I
don’t want to use sequence generator transformation. Is there any other way to do it?
YES, Use an unconnected lookup to get max key value and there on increment by 1 using an
expression variable OR write a stored procedure and use Stored Procedure Transformation.

13. What is a bad file?


Bad file is the file which contains the data rejected by the writer or target.

14. What is the first column of the bad file?


Record / Row indicator: 0, 1, 2, 3
0 - insert  -- rejected by writer/target
1 - update  -- rejected by writer/target
2 - delete  -- rejected by writer/target
3 - reject  -- rejected by writer, because the Update Strategy has marked it for reject

15. What are the contents of the cache directory in the server?
Index cache files and Data caches

16. Is lookup an Active transformation or Passive transformation?


Passive by default and can be configured to be active

17. Is SQL transformation an Active transformation or Passive transformation?


Active by default and can be configured to be passive

18. What is a Mapping?


Mapping represents the data flow between source and target

19. What are the types of transformations?


Passive and active

20. If a sequence generator (with increment of 1) is connected to (say) 3 targets and each target uses
the NEXTVAL port, what value will each target get?
Each target will get values incremented by 3 (for example 1, 4, 7, ...; 2, 5, 8, ...; 3, 6, 9, ...).

21. Difference between Source Based Commit Vs Target Based Commit


Target Based Commit
During a target-based commit session, the Informatica Server continues to fill the writer buffer after
it reaches the commit interval. When the buffer block is filled, the Informatica Server issues a
commit command. As a result, the amount of data committed at the commit point generally
exceeds the commit interval.
For example, a session is configured with target-based commit interval of 10,000. The writer buffers
fill every 7,500 rows. When the Informatica Server reaches the commit interval
of 10,000, it continues processing data until the writer buffer is filled. The second buffer fills at 15,000
rows, and the Informatica Server issues a commit to the target. If the session completes
successfully, the Informatica Server issues commits after 15,000, 22,500, 30,000, and 40,000 rows.
Source Based Commit


During a source-based commit session, the Informatica Server commits data to the target based
on the number of rows from an active source in a single pipeline. These rows are referred to as
source rows. A pipeline consists of a source qualifier and all the transformations and targets that
receive data from the source qualifier. An active source can be any of the following active
transformations:
Advanced External Procedure
Source Qualifier
Normalizer
Aggregator
Joiner
Rank
Sorter
Mapplet, if it contains one of the above transformations
Note: Although the Filter, Router, and Update Strategy transformations are active transformations,
the Informatica Server does not use them as active sources in a source-based commit session.

22. Have you used the Abort, Decode functions?


Abort can be used to Abort / stop the session on an error condition. If the primary key column
contains NULL, and you need to stop the session from continuing then you may use ABORT function
in the default value for the port. It can be used with IIF and DECODE function to Abort the session.

23. What do you know about the Informatica server architecture? Load Manager, DTM, Reader, Writer,
Transformer
o Load Manager is the first process started when the session runs. It checks for validity of mappings,
locks sessions and other objects.
o DTM process is started once the Load Manager has completed its job. It starts a thread for each
pipeline.
o Reader scans data from the specified sources.
o Writer manages the target/output data.
o Transformer performs the task specified in the mapping.

24. What are the default values for variables?


String = Null, Number = 0, Date = 1/1/1753

25. How many ways you can filter the records?


1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Ranker
5. Update strategy

26. How do you identify the bottlenecks in Mappings?


Bottlenecks can occur in
1. Targets
The most common performance bottleneck occurs when the informatica server writes to a target
database. You can identify target bottleneck by configuring the session to write to a flat file target.
If the session performance increases significantly when you write to a flat file, you have a target
bottleneck.
Solution :
Drop or Disable index or constraints
Perform bulk load (Ignores Database log)
Increase commit interval (Recovery is compromised)


Tune the database for RBS, Dynamic Extension etc.,

2. Sources
Set a Filter transformation with a FALSE condition after each Source Qualifier so that no records pass through.
If the time taken is the same, then the source is the bottleneck.
You can also identify the Source problem by
Read Test Session – where we copy the mapping with sources, SQ and remove all transformations
and connect to file target.
If the performance is same then there is a Source bottleneck.
Using database query – Copy the read query directly from the log. Execute the query against the
source database with a query tool. If the time it takes to execute the query and the time to fetch
the first row are significantly different, then the query can be modified using optimizer hints.
Solutions:
Optimize Queries using hints.
Use indexes wherever possible.

3. Mapping
If both Source and target are OK then problem could be in mapping.
Add a filter transformation before target and if the time is the same then there is a problem.
(OR) Look for the performance monitor in the Sessions property sheet and view the counters.
Solutions:
If High error rows and rows in lookup cache indicate a mapping bottleneck.
Optimize Single Pass Reading

27. How to improve the Session performance?


1 Run concurrent sessions
2 Partition sessions (Power center)
3. Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit
Interval, Tracing level (Normal, Terse, Verbose Init, Verbose Data)
The session has memory to hold 83 sources and targets. If it is more, then DTM can be
increased.
The informatica server uses the index and data caches for Aggregate, Rank, Lookup and
Joiner transformation. The server stores the transformed data from the above
transformation in the data cache before returning it to the data flow. It stores group
information for those transformations in index cache.
If the allocated data or index cache is not large enough to store the date, the server
stores the data in a temporary disk file as it processes
the session data. Each time the server pages to the disk the performance slows. This can
be seen from the counters .
Since generally data cache is larger than the index cache, it has to be more than the
index.
4. Remove Staging area
5. Tune off Session recovery
6. Reduce error tracing

28. What are Business components? Where it exists?


It is available in navigator inside the folder.

29. What are Short cuts? Where it is used?


Shortcuts allow you to use metadata across folders without making copies, ensuring uniform
metadata. A shortcut inherits all properties of the object to which it points. Once you create a
shortcut, you can configure the shortcut name and description.


When the object the shortcut references changes, the shortcut inherits those changes. By using a
shortcut instead of a copy, you ensure each use of the shortcut matches the original object. For
example, if you have a shortcut to a target definition, and you add a column to the definition, the
shortcut inherits the additional column.

· Scenario1
Here is a table with a single row; in the target table the same row should be populated 10 times.
Using a Normalizer we can do it. Hint: set Occurs to 10 in the Normalizer, connect the single input to all 10 occurrence ports, and you will get 10 output rows.
30. While importing the relational source definition from database, what are the metadata of source
you import?
Source name
Database location
Column names
Data types
Key constraints

31. How many ways U can update a relational source definition and what are they?
Two ways
1. Edit the definition
2. Re-import the definition

32. How many ways u create ports?


Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.

33. What r the unsupported repository objects for a mapplet?


COBOL source definition
Joiner transformations
Normalizer transformations
Non reusable sequence generator transformations.
Pre or post session stored procedures
Target definitions
Power mart 3.5 style Look Up functions
XML source definitions
IBM MQ source definitions

34. What are the mapping parameters and mapping variables?


Mapping parameter represents a constant value that you can define before running a session. A
mapping parameter retains the same value throughout the entire session.
When you use the mapping parameter in a mapping or maplet, then define the value of
parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout
the session. The informatica server saves the value of mapping variable to the repository at the end
of session run and uses that value next time you run the session.

35. Can you use the mapping parameters or variables created in one mapping into another mapping?
NO.
We can use mapping parameters or variables in any transformation of the same mapping or
mapplet in which U have created mapping parameters or variables.


36. Can u use the mapping parameters or variables created in one mapping into any other reusable
transformation?
Yes. Because reusable transformation is not contained with any maplet or mapping.

37. How can U improve session performance in aggregator transformation?


Use sorted input.

38. What is the difference between joiner transformation and source qualifier transformation?
You can join heterogeneous data sources in a Joiner transformation, which we cannot achieve in a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner.
The two relational sources should come from the same database in a Source Qualifier; in a Joiner you can join relational sources coming from different databases as well.
39. In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
A Joiner transformation cannot be used when:
1. Both input pipelines originate from the same Source Qualifier transformation.
2. Both input pipelines originate from the same Normalizer transformation.
3. Both input pipelines originate from the same Joiner transformation.
4. Either input pipeline contains an Update Strategy transformation.
5. Either input pipeline contains a connected or unconnected Sequence Generator transformation.
However, you can join data from a single pipeline by selecting the Sorted Input option in the Joiner transformation.

6. What are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join

7. What are the join types in joiner transformation?


Normal (default)
Master outer
Detail outer
Full outer

8. How the informatica server sorts the string values in Rank transformation?
When the informatica server runs in the ASCII data movement mode it sorts session data using
Binary sort order. If you configure the session to use a binary sort order, the informatica server
calculates the binary value of each string and returns the specified number of rows with the
highest binary values for the string.

9. What is the Rank index in Rank transformation?


The Designer automatically creates a RANKINDEX port for each Rank transformation. The
Informatica Server uses the Rank Index port to store the ranking position for each record in a group.
For example, if you create a Rank transformation that ranks the top 5 salespersons for each
quarter, the rank index numbers the salespeople from 1 to 5.

10. What is the Router transformation?


Input group
Output group


A Router transformation is similar to a Filter transformation because both transformations allow you
to use a condition to test data.
However, a Filter transformation tests data for one condition and drops the rows of data that do
not meet the condition. A Router transformation tests data for one or more conditions and gives
you the option to route rows of data that do not meet any of the conditions to a default output
group.
If you need to test the same input data based on multiple conditions, use a Router Transformation
in a mapping instead of creating multiple Filter transformations to perform the same task.

11. What are the types of groups in Router transformation?


The designer copies property information from the input ports of the input group to create a set of
output ports for each output group.
Two types of output groups
User defined groups
Default group
U cannot modify or delete default groups.

12. Why we use stored procedure transformation?


For populating and maintaining data bases.
1. To perform calculation: There will be many well tested calculations which we implement using
expression. Instead of using expression we can use stored procedure to store these calculations
and then use them by using connected or unconnected stored procedure transformation
2. Dropping and recreating indexes: whenever we have a huge number of records to be loaded into the target, it is better to drop the existing indexes and recreate them. For dropping and recreating indexes we can make use of a connected or unconnected Stored Procedure transformation.
3. Check the status of a target table before loading data into it.
4. To check the space left in Database

5. What are the types of data that passes between informatica server and stored procedure?
3 types of data
Input/Output parameters
Return Values
Status code.

6. What is the status code?


Status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is only used by the Informatica Server to determine whether to continue running the session or stop.

7. What are the tasks that source qualifier performs?


Join data originating from same source data base.
Filter records when the informatica server reads source data.
Specify an outer join rather than the default inner join specify sorted records.
Select only distinct values from the source.
Creating custom query to issue a special SELECT statement for the informatica server to read
source data.

8. What is the default join that source qualifier provides?


Inner equi join.

9. What are the basic needs to join two sources in a source qualifier?


Two sources should have primary and foreign key relationships.


Two sources should have matching data types.

10. What is update strategy transformation?


Flagging rows within a mapping.
Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete,
update, or reject.

Operation Constant Numeric Value


INSERT DD_INSERT 0

UPDATE DD_UPDATE 1

DELETE DD_DELETE 2

REJECT DD_REJECT 3

11. Describe two levels in which update strategy transformation sets?


Within a session: When you configure a session, you can instruct the Informatica Server to either
treat all records in the same way (for example, treat all records as inserts), or use instructions coded
into the session mapping to flag records for different database operations.
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag records
for insert, delete, update, or reject.

12. What is the default source option for update strategy transformation?
Data driven

13. What is Data driven?


The Informatica Server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject.
If you do not choose data driven option setting, the informatica server ignores all update strategy
transformations in the mapping.

14. What are the options in the target session of update strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

15. What are the types of mapping wizards that are to be provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both
wizards are designed to create mappings for loading and maintaining star schemas, a series of
dimensions related to a central fact table.

Getting Started Wizard: Creates mappings to load static fact and dimension tables, as well as
slowly growing dimension tables.
Slowly Changing Dimensions Wizard:. Creates mappings to load slowly changing dimension tables
based on the amount of historical dimension data you want to keep and the method you choose
to handle historical dimension data.

16. What are the types of mapping in Getting Started Wizard?


Simple Pass through mapping :
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to
drop all existing data from your table before loading new data.
Slowly Growing target :
Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load
new data when existing data does not require updates.

17. What are the mappings that we use for slowly changing dimension table?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting
the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not
need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the
target. Changes are tracked in the target table by versioning the primary key and creating a
version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table
when you want to keep a full history of dimension data in the table. Version numbers and
versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions to the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Informatica
Server saves existing data in different columns of the same row and replaces the existing data with
the updates.

18. What are the different types of Type2 dimension mapping?


Type 2 Dimension/Version Data mapping: In this mapping, an updated dimension from the source is inserted into the target with a new version number, and a newly added dimension from the source is inserted into the target with a new primary key.
Type 2 Dimension/Flag Current mapping: This mapping is also used for slowly changing dimensions. In addition, it creates a flag value for each changed or new dimension; the flag indicates whether the dimension is new or newly updated. Current dimensions are saved with a flag value of 1, and superseded dimensions are saved with a value of 0.
Type 2 Dimension/Effective Date Range mapping: This is another flavor of the Type 2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by an effective date range for each version of each dimension.
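For reference, a hedged SQL sketch of how the current row would be read back from a Type 2 dimension under each flavor; CUSTOMER_DIM and its columns are hypothetical names:

-- Version flavor: highest version number per natural key
select * from customer_dim c
where c.version = (select max(version) from customer_dim where cust_no = c.cust_no);

-- Current-flag flavor: flag value 1 marks the latest row
select * from customer_dim where current_flag = 1;

-- Effective date range flavor: SYSDATE falls inside the open version
select * from customer_dim where sysdate between begin_eff_date and end_eff_date;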

19. How can you recognize whether newly added rows in the source get inserted into the target?
In the Type 2 mapping we have three options to recognize the newly added rows:
Version number
Flag value
Effective date range

20. What are the two types of processes that the Informatica Server uses to run a session?
Load Manager process: Starts the session, creates the DTM process, and sends post-session email
when the session completes.


DTM process: Creates threads to initialize the session, read, write, and transform data, and
handle pre- and post-session operations.

21. Can you generate reports in Informatica?

Yes. By using the Metadata Reporter we can generate reports in Informatica.

22. What is the Metadata Reporter?

It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter, you can access information about your repository without knowledge of SQL, the transformation language, or the underlying tables in the repository.

23. Define mapping and sessions?


Mapping: It is a set of source and target definitions linked by transformation objects that define the
rules for transformation.
Session: It is a set of instructions that describe how and when to move data from source to targets.

24. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica Server?
Informatica Workflow Manager and Workflow Monitor.

25. Why do we partition a session in Informatica?

Partitioning improves session performance by reducing the time needed to read the source and load the data into the target.

26. What are the necessary tasks to achieve session partitioning?
Configure the session to partition source data.
Install the Informatica Server on a machine with multiple CPUs.

27. How does the Informatica Server increase session performance through partitioning the source?
For relational sources the Informatica Server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection. The Informatica Server reads multiple partitions of a single source concurrently; each partition is associated with a thread. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica Server reads multiple files concurrently. For loading the data, the Informatica Server creates a separate file for each partition of the source file. You can choose to merge the target files.

28. Why do you use repository connectivity?

Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

29. What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: When you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on that server. When you configure a session, the Load Manager maintains the list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: When the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.


Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: When the session starts, the Load Manager checks whether the user has privileges to run the session.

30. What is the DTM process?

After the Load Manager performs validations for the session, it creates the DTM process. The DTM process creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.

31. What are the different threads in the DTM process?

Master thread: Creates and manages all other threads.
Mapping thread: One mapping thread is created for each session; it fetches session and mapping information.
Pre- and post-session threads: Created to perform pre- and post-session operations.
Reader thread: One thread is created for each partition of a source; it reads data from the source.
Writer thread: Created to load data to the target.
Transformation thread: Created to transform data.

32. What are the data movement modes in Informatica?

The data movement mode determines how the Informatica Server handles character data. You choose the data movement mode in the Informatica Server configuration settings. Two data movement modes are available:
ASCII mode
Unicode mode

33. What are the output files that the Informatica Server creates while a session is running?
Informatica Server log: The Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, the creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping, such as table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: This file contains session performance details that help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages, one for a successful session and one for a failed session.
Indicator file: If you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: If the session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:
Aggregator transformation


Joiner transformation
Rank transformation
Lookup transformation

34. In which circumstances does the Informatica Server create reject files?

When it encounters DD_REJECT in an update strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.

35. What is polling?

Polling displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica Server.

36. Can you copy a session to a different folder or repository?

Yes. By using the Copy Session wizard you can copy a session to a different folder or repository, but the target folder or repository must contain the mapping used by that session. If it does not, copy the mapping first and then copy the session.

37. What is a batch? Describe the types of batches.

A grouping of sessions is known as a batch. Batches are of two types:
Sequential: Runs sessions one after the other.
Concurrent: Runs sessions at the same time.
If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after another. If you have several independent sessions, use a concurrent batch, which runs all the sessions at the same time.

38. Can you copy batches?

No.

39. How many sessions can you create in a batch?

Any number of sessions.

40. When does the Informatica Server mark a batch as failed?

If one of its sessions is configured to "run if previous completes" and that previous session fails.

41. What command is used to run a batch?

pmcmd is used to start a batch.

42. What are the different options used to configure the sequential batches?
Two options
Run the session only if previous session completes successfully.
Always runs the session.

43. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option "Always runs the session".

44. Can you start batches within a batch?

You cannot. If you want to start a batch that resides in another batch, create a new independent batch and copy the necessary sessions into the new batch.

45. Can you start a session inside a batch individually?


We can start an individual session only in a sequential batch; in a concurrent batch we cannot do this.

46. How can you stop a batch?
By using the Workflow Monitor or pmcmd, or by forcefully cancelling it.

47. What are session parameters?

Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
1. Database connections
2. Source file name: Use this parameter when you want to change the name or location of a session source file between session runs.
3. Target file name: Use this parameter when you want to change the name or location of a session target file between session runs.
4. Reject file name: Use this parameter when you want to change the name or location of a session reject file between session runs.

5. What is a parameter file?

A parameter file defines the values for parameters and variables used in a session. A parameter file is a plain text file created with a text editor such as WordPad or Notepad. You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
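A minimal parameter file sketch, assuming hypothetical folder, workflow, session, and parameter names (the exact heading format depends on the PowerCenter version; session parameters use a single $, mapping parameters and variables use $$):

[SALES_FOLDER.WF:wf_load_sales.ST:s_m_load_sales]
$DBConnection_Source=ORA_SRC_DEV
$InputFile1=/data/incoming/sales.dat
$BadFile1=/data/badfiles/sales.bad
$$LoadDate=2014-01-31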

6. How can you access a remote source in your session?

Relational source: To access a relational source located on a remote machine, configure a database connection to the data source.
File source: To access a remote source file, you must configure an FTP connection to the host machine before you create the session.
Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.

7. What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition. You can configure session properties to merge these target files.

8. What are the transformations that restrict the partitioning of sessions?

Advanced External Procedure and External Procedure transformations: These transformations contain a check box on the Properties tab to allow partitioning.
Aggregator transformation: If you use sorted ports, you cannot partition the associated source.
Joiner transformation: You cannot partition the master source for a Joiner transformation.
Normalizer transformation.
XML targets.

9. What is performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. Session performance can be increased by the following:


Network: The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize them.
Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs.
Fewer connections: Minimize the connections to sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server machine may improve session performance.
Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance; use a staging area only when it is mandatory.
Informatica Servers: You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.
ASCII mode: Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte while Unicode mode takes two bytes per character.
Source Qualifier: If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
Drop constraints: If the target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop or disable constraints and indexes before running the session (while loading facts and dimensions) and rebuild them after the session completes; a minimal SQL sketch follows this list.
Parallel sessions: Running sessions in parallel using concurrent batches also reduces loading time, so concurrent batches may increase session performance.
Partitioning: Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
Incremental aggregation: If a session contains an Aggregator transformation, in some cases you can use incremental aggregation to improve session performance.
Transformation errors: Avoid transformation errors to improve session performance. Before saving the mapping, validate it and rectify any transformation errors.
Lookup transformations: If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. The cache improves speed by saving previously read data so it does not have to be read again.
Filter transformations: If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
Group transformations: Aggregator, Rank, and Joiner transformations can decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option, i.e. sort the data before it reaches the transformation.
Packet size: You can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
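As an example of the drop constraints point above, a hedged Oracle sketch; the table, constraint, and index names are hypothetical:

-- Before the load: disable the foreign key and mark the index unusable
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_customer;
ALTER INDEX idx_sales_date UNUSABLE;

-- ... run the session ...

-- After the load: rebuild the index and re-enable the constraint
ALTER INDEX idx_sales_date REBUILD;
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_customer;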

10. Define informatica repository?


The Informatica repository is a relational database that stores information, or metadata, used by
the Informatica Server and Client tools.
Metadata can include information such as mappings describing how to transform source data,
sessions indicating when you want the Informatica Server to perform the transformations, and
connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions
and privileges, and product version.


Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in the specific format that the Informatica Server and client tools use.

11. What are the types of metadata that stores in repository?


The following are the types of metadata stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations

12. What is incremental aggregation?


When using incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes only incrementally and you can capture changes,
you can configure the session to process only those changes. This allows the Informatica Server to
update your target incrementally, rather than forcing it to process the entire source and
recalculate the same calculations each time you run the session.

13. What are the scheduling options to run a session?


You can schedule a session to run at a given time or interval, or you can run the session manually.
Different options of scheduling
Run only on demand: Informatica server runs the session only when user starts session explicitly
Run once: Informatica server runs the session only once at a specified date and time.
Run every: Informatica server runs the session at regular intervals as u configured.
Customized repeat: Informatica server runs the session at the dates and times specified in the
repeat dialog box.

14. What is tracing level and what are the types of tracing levels?
Tracing level represents the amount of information that the Informatica Server writes in a log file.
Types of tracing level
Normal
Verbose
Verbose init
Verbose data

15. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source, i.e. you need to build it as a DLL to access it in your mapping, and no database connection is needed.

16. Explain about Recovering sessions?


If you stop a session or if an error causes a session to stop, refer to the session and error logs to
determine the cause of failure. Correct the errors, and then complete the session. The method you
use to complete the session depends on the properties of the mapping, session, and Informatica
Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.

17. If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record the next time you run the session?
As explained above, the Informatica Server has three methods of recovering sessions. Use Perform Recovery to load the records from the point where the session failed.

18. Explain about perform recovery?


When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and
notes the row ID of the last row committed to the target database. The Informatica Server then
reads all sources again and starts processing from the next row ID.
For example, if the Informatica Server commits 10,000 rows before the session fails, when you run
recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery
in the Informatica Server setup before you run a session so the Informatica Server can create
and/or write entries in the OPB_SRVR_RECOVERY table.

19. How to recover the standalone session?


A standalone session is a session that is not nested in a batch. If a standalone session fails, you can
run recovery using a menu command or pmcmd. These options are not available for batched
sessions.
To recover sessions using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from
the menu.
To recover sessions using pmcmd:
1.From the command line, stop the session.
2. From the command line, start recovery.

20. How can u recover the session in sequential batches?


If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.
To recover sessions in sequential batches configured to stop on failure:
1.In the Server Manager, open the session property sheet.
2.On the Log Files tab, select Perform Recovery, and click OK.
3.Run the session.
4.After the batch completes, open the session property sheet.
5.Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server
attempts to recover the previous session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining
sessions in the batch complete, recover the failed session as a standalone session.


21. How to recover sessions in concurrent batches?


If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the
batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete
successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
1.Copy the failed session using Operations-Copy Session.
2.Drag the copied session outside the batch to be a standalone session.
3.Follow the steps to recover a standalone session.
4.Delete the standalone copy.

22. How can you complete unrecoverable sessions?


Under certain circumstances, when a session does not complete, you need to truncate the target
tables and run the session from the beginning. Run the session from the beginning when the
Informatica Server cannot run recovery or when running recovery might result in inconsistent data.

23. Under what circumstances does the Informatica Server result in an unrecoverable session?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform Recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.

24. If I modify my table in the back end, does it reflect in the Informatica warehouse, Mapping Designer, or Source Analyzer?
No. Informatica is not directly aware of the back-end database; it displays only the information stored in the repository. If you want back-end changes reflected on the Informatica screens, you have to re-import the definitions from the back end over a valid connection and replace the existing definitions with the imported ones.

25. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to the target?
No. Unless and until you join those three sources in the Source Qualifier, you cannot map them directly.

26. Informatica Server Variables


1. $PMRootDir
2. $PMSessionLogDir
3. $PMBadFileDir
4. $PMCacheDir
5. $PMTargetFileDir
6. $PMSourceFileDir
7. $PMExtProcDir
8. $PMTempDir
9. $PMSuccessEmailUser
10. $PMFailureEmailUser
11. $PMSessionLogCount
12. $PMSessionErrorThreshold
13. $PMWorkflowLogDir
14. $PMWorkflowLogCount


27. What are the main issues while working with flat files as sources and targets?
We need to specify the correct path in the session and mention whether the file is 'direct' or 'indirect'. Keep the file in the exact path that you specified in the session.
1. We cannot use SQL override; we have to use transformations for all our requirements.
2. Testing the flat files is a very tedious job.
3. The file format (source/target definition) should match exactly with the format of the data file. Most erroneous results come when the data file layout is not in sync with the actual file:
(i) The data file may be fixed width but the definition is delimited ---> truncated data.
(ii) The data file as well as the definition is delimited, but a wrong delimiter is specified:
(a) a delimiter other than the one present in the actual file, or
(b) a delimiter that also appears as a character in some field of the file ---> wrong data again.
(iii) Not specifying the NULL character properly may result in wrong data.
(iv) There are other settings/attributes while creating the file definition that one should be very careful about.
4. If you miss the link to any column of the target, then all the data will be placed in the wrong fields; the missed column won't exist in the target data file.

28. Explain how the Informatica Server process relates to mapping variables.
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation, and loading. The Load Manager reads the parameters and variables related to the session, mapping, and server, and passes the mapping parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source to target.
The PowerCenter Server holds two different values for a mapping variable during a session run:
1. Start value of a mapping variable
2. Current value of a mapping variable
Start value
The start value is the value of the variable at the start of the session. The start value can be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial value for the variable, or the default value based on the variable datatype.
The PowerCenter Server looks for the start value in the following order:
1. Value in the parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value using a variable function that you set for the variable. Unlike the start value, the current value can change as the PowerCenter Server evaluates it for each row that passes through the mapping.
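As a hedged illustration of how the current value is usually advanced, a mapping might carry a variable such as $$LastExtractDate (a hypothetical name), filter the source on its start value, and push the current value forward with a variable function:

-- Source Qualifier filter: extract only rows newer than the saved start value
UPDATE_DATE > TO_DATE('$$LastExtractDate', 'YYYY-MM-DD HH24:MI:SS')

-- Expression transformation port: raise the current value as rows pass through;
-- the final value is saved to the repository when the session succeeds
SETMAXVARIABLE($$LastExtractDate, UPDATE_DATE)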

3. A query to retrieve the latest records from the table sorted by version (SCD):
select * from dimension a
where a.version = (select max(b.version) from dimension b where a.dno = b.dno);

select * from dimension where sysdate between begin_effective_date and end_effective_date;


4. Which one is better performance-wise, Joiner or Lookup?

Are you looking up a flat file or a database table? Generally a sorted Joiner is more effective on flat files than a Lookup, because the sorted Joiner uses a merge join and caches fewer rows, whereas a Lookup always caches the whole file. If the file is not sorted, the two can be comparable. Lookups into a database table can be effective if the database can return sorted data fast and the amount of data is small, because the Lookup can build the whole cache in memory. If the database responds slowly or a large amount of data is processed, Lookup cache initialization can be really slow (the Lookup waits for the database and stores cached data on disk).
In that case it can be better to use a sorted Joiner, which writes data to the output as it reads it from the input.

5. How many types of sessions are there in Informatica?

Reusable and non-reusable sessions.
A session is a type of workflow task: a set of instructions that describes how to move data from sources to targets using a mapping.
Sessions in Informatica can be configured as:
1. Sequential: data moves from source to target one session after another.
2. Concurrent: the whole data moves simultaneously from source to target.

6. How can we remove/optimize source bottlenecks using query hints?

Create indexes on the source table columns used in filters and joins. You must first have proper indexes, and the table must be analyzed to gather statistics so the cost-based optimizer can use them. Use hints only after that, and use them carefully, because hints override the optimizer.
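For example, a hedged sketch of an index hint in a Source Qualifier SQL override; the index name EMP_DEPTNO_IDX is hypothetical and must actually exist:

-- Ask the optimizer to use the index on DEPTNO for this extract
SELECT /*+ INDEX(e EMP_DEPTNO_IDX) */ e.empno, e.ename, e.sal, e.deptno
FROM emp e
WHERE e.deptno = 20;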

7. What is target load order?

In a mapping, if there is more than one target table, we need to specify the order in which the target tables should be loaded.
Example: suppose our mapping has two pipelines loading two target tables:
1. Customer
2. Audit table
If the Customer table should be populated before the Audit table, we use the target load order to enforce that.

8. How did you handle errors? (ETL row errors)

If an error occurs, the row is written to the target_table.bad reject file.
The errors are of two types:
1. Row-based errors
2. Column-based errors
Column-based errors are identified by indicator characters:
D - good data, N - null data, O - overflow data, R - rejected data.
The data stored in the .bad file looks like:
D1232234O877NDDDN23

9. What is event-based scheduling?

In time-based scheduling, jobs run at a specified time. In some situations we have to run a job based on an event, for example only when a file arrives, whatever the time. In such cases event-based scheduling is used.
10. What are bulk and normal load? Where do we use bulk and where normal?
When we load data in bulk mode there is no entry in the database log files, so it is tough to recover data if the session fails at some point. In normal mode, every record is logged in the database log file and in the Informatica repository, so if the session fails it is easy to restart from the last committed point.


Bulk mode is very fast compared with normal mode.

Bulk mode is used for Oracle/SQL Server/Sybase. This mode improves performance by not writing to the database log.

11. What is CDC?


Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction.
With CDC, data extraction takes place at the same time that the insert, update, or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The change data thus captured is then made available to the target systems in a controlled manner.
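Where a dedicated CDC option is not available, a common hedged approximation is a timestamp-based extract; SRC_ORDERS, ETL_CONTROL, and their columns are hypothetical names:

-- Pull only the rows changed since the last successful extract
select s.*
from src_orders s
where s.last_upd_date > (select last_extract_date
                         from etl_control
                         where table_name = 'SRC_ORDERS');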

12. How do we do unit testing in Informatica? How do we load data in Informatica?

Unit testing is of two types:
1. Quantitative testing
2. Qualitative testing
Steps for quantitative testing:
1. First validate the mapping.
2. Create a session on the mapping and then run the workflow.
Once the session has succeeded, right-click on the session and go to the statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing. If the rows are successfully loaded, then we go for qualitative testing.
Steps for qualitative testing:
1. Take the DATM (the document where all business rules are mapped to the corresponding source columns) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded according to the DATM, go back to the code and rectify it.
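For the quantitative step, a simple hedged row-count comparison (the source and target names are hypothetical) can be run alongside the session statistics:

-- Compare the source row count with the rows loaded into the target
select (select count(*) from src_customers)    as source_rows,
       (select count(*) from tgt_customer_dim) as target_rows
from dual;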

13. How can we store previous session logs?

Run the session in timestamp mode; then the session log will not automatically overwrite the current session log.
We can also do it this way, using $PMSessionLogCount (which specifies the number of session-log runs to save):
Go to the session --> right-click --> select Edit Task --> go to the Config Object tab, then set the properties
Save Session Log By --> Runs
Save Session Log for These Runs --> the number of historical session logs you want.

GOOD LINKS

http://stackoverflow.com/questions/tagged/informatica
http://www.itnirvanas.com/2009/01/informatica-interview-questions-part-1.html
http://gogates.blogspot.in/2011/05/informatica-interview-questions.html
https://community.informatica.com/thread/38970
http://shan-informatica.blogspot.in/

http://www.info-etl.com/course-materials/informatica-powercenter-development-best-practices
http://informaticaramamohanreddy.blogspot.in/2012/08/final-interview-questions-etl.html
http://informaticaconcepts.blogspot.in/search/label/ScenaroBasedQuestions
http://baluinfomaticastar.blogspot.in/2011/06/dwh-material-with-informatica-material.html
