Informatica Handbook
ORACLE STATEMENTS
Data Definition Language (DDL)
Create
Alter
Drop
Truncate
Syntaxes:
CREATE DATABASE LINK CAASEDW CONNECT TO ITO_ASA IDENTIFIED BY exact123 USING 'CAASEDW';
Case Statement:
Select NAME,
(CASE
WHEN (CLASS_CODE = 'Subscription') THEN ATTRIBUTE_CATEGORY
ELSE TASK_TYPE
END) TASK_TYPE,
CURRENCY_CODE
From EMP;
Decode ()
Select empname, Decode (address, 'HYD', 'Hyderabad', 'Bang', 'Bangalore', address) as address from emp;
Procedure:
CREATE OR REPLACE PROCEDURE Update_bal (
cust_id_IN IN NUMBER,
amount_IN IN NUMBER DEFAULT 1) AS
BEGIN
UPDATE account_tbl SET amount = amount_IN WHERE cust_id = cust_id_IN;
END;
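A minimal sketch of calling the procedure from SQL*Plus (the values are illustrative):
EXEC Update_bal(101, 500);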
Trigger:
CREATE OR REPLACE TRIGGER EMP_AUR
AFTER UPDATE ON EMP
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
IF (:NEW.last_upd_tmst <> :OLD.last_upd_tmst) THEN
-- Insert a record into the control table
INSERT INTO emp_w VALUES ('wrk', SYSDATE);
ELSE
-- Call the procedure
update_sysdate;
END IF;
END;
ORACLE JOINS:
1. Equi join
2. Non-equi join
3. Self join
4. Natural join
5. Cross join
6. Outer join
1. Left outer
2. Right outer
3. Full outer
Non-equi join: a join that uses an operator other than '=' in the join condition.
Ex: SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno >d.deptno;
OR
Ex: SQL> select e.ename,e.sal,s.salgrade from emp e ,grade s where e.sal between s.losal and s.hisal
Self Join
A self join is a join of a table to itself, using two aliases of the same table.
Outer Join
An outer join returns the non-matching records along with the matching records.
Left Outer Join
This returns all matching records plus the records from the left-hand table that have no match in the right-hand table.
Ex: SQL> select empno,ename,job,dname,loc from emp e left outer join dept d
on(e.deptno=d.deptno);
OR
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno(+);
Right Outer Join
This returns all matching records plus the records from the right-hand table that have no match in the left-hand table.
Ex: SQL> select empno,ename,job,dname,loc from emp e right outer join dept d
on(e.deptno=d.deptno);
OR
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno(+) = d.deptno;
Full Outer Join
This displays all matching records and the non-matching records from both tables.
Ex: SQL> select empno,ename,job,dname,loc from emp e full outer join dept d
on(e.deptno=d.deptno);
OR
Ex: SQL>select empno,ename,job,dname,loc from emp e, dept d where e.deptno = d.deptno(+)
Union
Select empno,ename,job,dname,loc from emp e, dept d where e.deptno (+) = d.deptno
View:
A view has a logical existence, whereas a materialized view has a physical existence. Moreover, a materialized view can be indexed, analyzed and so on; everything that can be done with a table can also be done with a materialized view.
We can keep aggregated data in a materialized view. We can schedule the MV to refresh, but a table cannot be scheduled. An MV can be created based on multiple tables.
Materialized View:
In a DWH, materialized views are very essential because on the reporting side, if we do aggregate calculations as per the business requirement, report performance would be degraded. So to improve report performance, rather than doing the calculations and joins at the reporting side, if we put the same logic in the MV then we can directly select the data from the MV without any joins and aggregations. We can also schedule the MV (Materialized View) to refresh.
Inline view:
A SELECT statement written in the FROM clause is nothing but an inline view.
Ex:
Get dept wise max sal along with empname and emp no.
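A minimal sketch of such a query, assuming the standard EMP table (EMPNO, ENAME, SAL, DEPTNO):
SELECT e.empno, e.ename, e.sal, e.deptno
FROM emp e,
     (SELECT deptno, MAX(sal) AS max_sal FROM emp GROUP BY deptno) d  -- inline view
WHERE e.deptno = d.deptno
AND e.sal = d.max_sal;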
DELETE
The DELETE command is used to remove rows from a table. A WHERE clause can be used to only remove
some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE
operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo
it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All of the table's rows, indexes and privileges will also be removed. The operation cannot be rolled back.
ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is the file number.
ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a number indicating the order in
which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1,
the second has 2, and so on.
You can use ROWNUM to limit the number of rows returned by a query, as in this example:
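A minimal sketch (assuming the standard EMP table):
SELECT *
FROM emp
WHERE ROWNUM <= 10;   -- returns only the first 10 rows fetched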
Rowid vs Row-num
Rowid is an Oracle internal ID that is allocated every time a new record is inserted in a table. This ID is unique and cannot be changed by the user. Rowid is permanent. It is a globally unique identifier for a row in a database, created at the time the row is inserted into the table and destroyed when it is removed from the table.
Row-num is a row number returned by a SELECT statement. Row-num is temporary. The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
The WHERE clause cannot be used to restrict groups; the HAVING clause is used to restrict groups.
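A minimal sketch (assuming the standard EMP table):
SELECT deptno, AVG(sal)
FROM emp
GROUP BY deptno
HAVING AVG(sal) > 2000;   -- HAVING restricts the groups, not the individual rows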
MERGE Statement
You can use the MERGE command to perform an insert and an update in a single command.
Ex: Merge into student1 s1
Using (select * from student2) s2
On (s1.no=s2.no)
When matched then
Update set marks = s2.marks
When not matched then
Insert (s1.no, s1.name, s1.marks) Values (s2.no, s2.name, s2.marks);
What is the difference between sub-query & co-related sub query?
A sub query is executed once for the parent statement, whereas a correlated sub query is executed once for each row of the parent query.
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade='A' or sal_grade='B');
Co-Related Sub query:
Example:
Find all employees who earn more than the average salary in their department.
SELECT last_name, salary, department_id FROM employees A
WHERE salary > (SELECT AVG(salary)
FROM employees B WHERE B.department_id = A.department_id
GROUP BY B.department_id);
EXISTS:
The EXISTS operator tests for the existence of rows in the result set of the subquery.
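A minimal sketch (assuming the standard EMP and DEPT tables), listing departments that have at least one employee:
SELECT d.deptno, d.dname
FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.deptno = d.deptno);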
Indexes:
1. Bitmap indexes are most appropriate for columns having low distinct values, such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems; a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.
2. When to Create an Index
3. You should create an index if:
4. A column contains a wide range of values
5. A column contains a large number of null values
6. One or more columns are frequently used together in a WHERE clause or a join condition
7. The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows
8. By default, if you create an index it is a B-tree index.
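Minimal sketches of the two index types discussed above (table and column names are illustrative):
CREATE INDEX emp_deptno_idx ON emp (deptno);          -- default B-tree index
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);   -- bitmap index for a low-cardinality column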
It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.
Sometimes, however, the characteristics of the data in the database change rapidly, so that the optimizer (or, more accurately, its statistics) is out of date. In this case, a hint could help.
You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.
Hint categories:
Hints can be categorized as follows:
1. ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
2. FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
3. CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
1. Hints for Join Orders,
2. Hints for Join Operations,
3. Hints for Parallel Execution, (/*+ parallel(a,4) */) specify degree either 2 or 4 or 16
4. Additional Hints
5. HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and
uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.
/*+ use_hash */
Use a hint to force the use of an index:
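A minimal sketch (the index name emp_deptno_idx is illustrative):
SELECT /*+ INDEX(e emp_deptno_idx) */ empno, ename
FROM emp e
WHERE e.deptno = 10;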
What is your tuning approach if SQL query taking long time? Or how do u tune SQL query?
If a query is taking a long time, first run the query through EXPLAIN PLAN; the explain plan process stores data in the PLAN_TABLE.
It gives us the execution plan of the query, showing whether the query is using the relevant indexes on the joining columns or whether the indexes needed to support the query are missing.
If the joining columns don't have indexes, the query will do a full table scan; if it is a full table scan, the cost will be higher, so create indexes on the joining columns and run the query again; it should give better performance. The tables also need to be analyzed if the last analysis happened long back. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If there is still a performance issue, then use HINTS; a hint is nothing but a clue to the optimizer. We can use hints such as:
6. ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
7. FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
8. CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
9. HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and
uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.
/*+ use_hash */
Hints are most useful to optimize the query performance.
Using a stored procedure we can access and modify data present in many tables.
Also, a stored procedure is not associated with any particular database object.
But triggers are event-driven special procedures which are attached to a specific database object, say a table.
Stored procedures are not run automatically; they have to be called explicitly by the user. But triggers get executed automatically when the event they are associated with is fired.
Packages:
Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a unit in the database.
A package is a group of related procedures and functions, together with the cursors and variables they use; for example, a package that contains several procedures and functions that process related transactions.
Triggers:
Oracle lets you define procedures called triggers that run implicitly when an INSERT, UPDATE, or DELETE
statement is issued against the associated table
Triggers are similar to stored procedures. A trigger stored in the database can include SQL and PL/SQL statements.
Types of Triggers
This section describes the different types of triggers:
1. Row Triggers and Statement Triggers
2. BEFORE and AFTER Triggers
3. INSTEAD OF Triggers
4. Triggers on System Events and User Events
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement. For example, if an
UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by
the UPDATE statement. If a triggering statement affects no rows, a row trigger is not run.
BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing--whether the trigger action is to be run before
or after the triggering statement. BEFORE and AFTER apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on tables, not on views.
Table Space:
Oracle stores data logically in tablespaces and physically in datafiles associated with the corresponding tablespace.
A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided into logical units of storage called segments.
A tablespace in an Oracle database consists of one or more physical datafiles. A datafile can be associated with only one tablespace and only one database.
Control File:
A control file contains information about the associated database that is required for access by an
instance, both at startup and during normal operation. Control file information can be modified only by
Oracle; no database administrator or user can edit a control file.
IMPORTANT QUERIES
Query to transpose multiple address rows per employee into columns (add1, add2, add3):
select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
(select emp_id,address,rank() over (partition by emp_id order by emp_id,address )rank_id from temp )
group by
emp_id
5. Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);
The DENSE_RANK function works like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);
Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from
emp) where r<=5;
OR
Select * from (select * from EMP order by sal desc) where rownum<=5;
8. 2nd highest sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from
EMP) where r=2;
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
Query to select alternate (even-numbered) rows:
SQL> select * from emp where (rowid, 0) in (select rowid, mod(rownum,2) from emp);
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but
process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical
and strategic decision making. It leverages methods and technologies that focus on counts, statistics and
business objectives to improve business performance.
The objective of Business Intelligence is to better understand customers and improve customer service,
make the supply and distribution chain more efficient, and to identify and address business problems and
opportunities quickly.
A warehouse is used for high-level data analysis. It is used for predictions, time-series analysis, financial analysis, what-if simulations, etc. Basically it is used for better decision making.
In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Mart is used on a business
division/department level.
A data mart only contains data specific to a particular subject area.
Data Mart: usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective. It is used at a business division/department level. A Data Mart is a subset of data from a Data Warehouse, built for specific user groups. By providing decision makers with only a subset of data from the Data Warehouse, privacy, performance and clarity objectives can be attained.
Data Warehouse: a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making". It is used at an enterprise level. A Data Warehouse is simply an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making. The main objective of a Data Warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.
A fact table that contains only primary keys from the dimension tables, and does not contain any measures, is called a factless fact table.
What is a Schema?
DRILL DOWN, DRILL ACROSS, Graphs, PI charts, dashboards and TIME HANDLING
To be able to drill down/drill across is the most basic requirement of an end user in a data warehouse.
Drilling down most directly addresses the natural end-user need to see more detail in a result. Drill down should be as generic as possible because there is absolutely no good way to predict a user's drill-down path.
In Data warehousing grain refers to the level of detail available in a given fact table as well as to the
level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain of the fact table
is the grain of the star schema.
A fact table contains primary keys from all the dimension tables and other numeric columns of additive, numeric facts.
Unlike a star schema, a snowflake schema contains normalized dimension tables in a tree-like structure with many nesting levels.
Star schema: only one join establishes the relationship between the fact table and any one of the dimension tables. A star schema optimizes performance by keeping queries simple and providing fast response time; all the information about each level is stored in one row. It is called a star schema because the diagram resembles a star.
Snowflake schema: since there are relationships between the dimension tables, many joins have to be done to fetch the data. Snowflake schemas normalize dimensions to eliminate redundancy; the result is more complex queries and reduced query performance. It is called a snowflake schema because the diagram resembles a snowflake.
A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point
for getting at the facts. Dimensions are things of interest to the business.
A set of level properties that describe a specific aspect of a business, used for analyzing the factual
measures.
A Fact Table in a dimensional model consists of one or more numeric facts of importance to a
business. Examples of facts are as follows:
1. the number of products sold
2. the value of products sold
3. the number of products produced
4. the number of service calls received
Factless fact table captures the many-to-many relationships between dimensions, but contains no
numeric or textual facts. They are often used to record events or coverage information.
Types of facts?
There are three types of facts:
1. Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact
table.
2. Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions
in the fact table, but not the others.
3. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.
What is Granularity?
Principle: create fact tables with the most granular data possible to support analysis of the business
process.
In Data warehousing grain refers to the level of detail available in a given fact table as well as to the level
of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain of the fact table
is the grain of the star schema.
Facts: Facts must be consistent with the grain. All facts are at a uniform grain.
An item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a Degenerated Dimension. Since it looks like a dimension but actually sits in the fact table without its description, it is called a degenerated dimension.
Degenerated Dimension: a dimension which is located in the fact table.
Dimensional Model:
A type of data modeling suited for data warehousing. In a dimensional model there are two types of tables: dimension tables and fact tables. Dimension tables record information on each dimension, and fact tables record all the "facts", or measures.
Data modeling
There are three levels of data modeling. They are conceptual, logical, and physical
The differences between a logical data model and physical data model are shown below.
Logical vs Physical Data Modeling
Logical model: represents business information and defines business rules.
Physical model: represents the physical implementation of the model in a database.
Logical object -> Physical object
Entity -> Table
Attribute -> Column
Primary Key -> Primary Key Constraint
Alternate Key -> Unique Constraint or Unique Index
Inversion Key Entry -> Non Unique Index
Rule -> Check Constraint, Default Value
Relationship -> Foreign Key
Definition -> Comment
[Data model diagram: staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and ACW_DF_APPROVAL_STG; fact tables ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F and ACW_DF_APPROVAL_F; dimension tables ACW_ORGANIZATION_D, ACW_PRODUCTS_D, ACW_USERS_D, ACW_PART_TO_PID_D and ACW_SUPPLY_CHANNEL_D; and EDW_TIME_HIERARCHY.]
If the target and source databases are different and the target table volume is high (it contains some millions of records), then without a staging table we would need to design the Informatica mapping with a lookup to find out whether each record exists in the target table. Since the target has huge volumes, it is costly to create the cache and it will hit performance.
If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert/update; this approach gives good performance.
2. We can create indexes on the staging tables so that the source qualifier query performs at its best.
3. If we have a staging area, there is no need to rely on an Informatica transformation to know whether the record exists or not.
Data cleansing
Weeding out unnecessary or unwanted things (characters and spaces etc) from incoming data to
make it more meaningful and informative
Data merging
Data can be gathered from heterogeneous systems and put together
Data scrubbing
Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect,
incomplete or duplicated before the data is passed to end user.
Data scrubbing is aimed at more than eliminating errors and redundancy. The goal is also to bring
consistency to various data sets that may have been created with different, incompatible business
rules.
My understanding of an ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading targets. Hence it is a mandatory requirement for every warehouse.
So every day do we transfer data to the ODS from OLTP to keep it up to date?
OLTP is a sensitive database; it should not be hit with multiple heavy SELECT statements, as that may impact its
performance, and if something goes wrong while fetching data from OLTP to the data warehouse it will directly impact the business.
ODS is the replication of OLTP.
ODS is usually getting refreshed through some oracle jobs.
A data warehouse enables management to gain a consistent picture of the business.
A primary key is a special constraint on a column or set of columns. A primary key constraint ensures that
the column(s) so designated have no NULL values, and that every value is unique. Physically, a primary
key is implemented by the database system using a unique index, and all the columns in the primary key
must have been declared NOT NULL. A table may have only one primary key, but it may be composite
(consist of more than one column).
A surrogate key is any column or set of columns that can be declared as the primary key instead of a
"real" or natural key. Sometimes there can be several natural keys that could be declared as the primary
key, and these are all called candidate keys. So a surrogate is a candidate key. A table could actually
have more than one surrogate key, although this would be unusual. The most common type of surrogate
key is an incrementing integer, such as an auto increment column in MySQL, or a sequence in Oracle, or
an identity column in SQL Server.
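A minimal sketch of an Oracle sequence used to populate a surrogate key (the table and sequence names are illustrative):
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;
INSERT INTO customer_dim (customer_key, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'ABC Corp');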
ETL-INFORMATICA
Differences between connected lookup and unconnected lookup
A connected lookup receives input values directly from the pipeline, can return multiple columns to the mapping, and can use a dynamic cache. An unconnected lookup is called from another transformation through a :LKP expression, returns a single value, and uses only a static cache.
Scenario: the first record needs to be inserted and a later record with the same key should be updated in the target table. With a static cache the newly inserted record is not available in the cache memory, so when the later record comes to the lookup it checks the cache, does not find a match, and returns a null value; the record then goes to the insert flow through the router, although it is supposed to go to the update flow, because the cache did not get refreshed when the first record was inserted into the target table. A dynamic lookup cache avoids this, since the cache is updated as rows are inserted.
Joiner vs Lookup
In a joiner, on multiple matches it returns all matching records; in a lookup it returns either the first record, the last record, any value, or an error value.
In a joiner we cannot configure persistent cache, shared cache, uncached and dynamic cache, whereas in a lookup we can.
We cannot override the query in a joiner; in a lookup we can override the query to fetch the data from multiple tables.
We can perform an outer join in a joiner transformation; we cannot perform an outer join in a lookup transformation, but a lookup by default works as a left outer join.
We cannot use relational operators (i.e. <, >, <= and so on) in a joiner transformation, whereas in a lookup we can use them.

Source Qualifier vs Lookup
A source qualifier pushes all the matching records, whereas in a lookup we can restrict whether to return the first value, the last value or any value.
In a source qualifier there is no concept of cache, whereas a lookup is built around the cache concept.
When both the source and the lookup table are in the same database we can use a source qualifier; when the source and lookup table exist in different databases we need to use a lookup.
1. Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. We don't have any transformation inside the mapping; it is a 1-to-1 mapping, so there was nothing to optimize in the mapping itself. I created session partitions using key range on the effective date column. It improved performance a lot: rather than 4 hours it ran in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.
2. There was one more scenario where I got very good performance at the mapping level. Rather
than using a lookup transformation, if we are able to do an outer join in the source qualifier query override, this gives good performance when both the lookup table and the source are in the same database. If the lookup table has huge volumes, then creating the cache is costly.
3. Optimizing the mapping by using fewer transformations also gives good performance.
4. If any mapping is taking a long time to execute, first look into the source and target statistics in the monitor for the throughput, and then find out exactly where the bottleneck is by looking at the busy percentage in the session log; this shows which transformation is taking more time. If the source query is the bottleneck, the session log will show it near the end as "query issued to database", which means there is a performance issue in the source query and we need to tune it.
The session log shows the busy percentage; based on that we need to find out where the bottleneck is.
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG]
has completed: Total Run Time = [7.193083] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000]
Thread [WRITER_1_*_1] created for [the write stage] of partition point [ACW_PCBA_APPROVAL_F1,
ACW_PCBA_APPROVAL_F] has completed: Total Run Time = [0.806521] secs, Total Idle Time = [0.000000]
secs, Busy Percentage = [100.000000]
Suppose I have to load 40 lakh records into the target table and the workflow is taking about 10-11 hours to finish. I've already increased the cache size to 128 MB. There are no joiners, just lookups and expression transformations.
(1) If the lookups are uncached and have many records, try creating indexes on the columns used in the lookup condition, and try increasing the lookup cache. If this doesn't increase the performance and the target has any indexes, disable them in the target pre-load and enable them in the target post-load.
(3) If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
By setting the Constraint Based Loading property at the session level in the Configuration tab, we can load data into parent and child relational tables (primary key / foreign key).
Generally what it does is load the data first into the parent table and then into the child table.
If we copy source definitions or target definitions or mapplets from a Shared folder to any other folder, that copy becomes a shortcut.
Let's assume we have imported some source and target definitions in a shared folder, and we are using those source and target definitions in another folder as shortcuts in some mappings.
If any modification occurs in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we reimport into the shared folder, those changes automatically reflect in all folders/mappings wherever we used those source or target definitions.
If we don't have a primary key on the target table, we can perform updates using the Target Update Override option. By default, the Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.
You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Mike Smith only; to do this, you edit the WHERE clause as shown in the sketch below. If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.
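A sketch of such a target update override (the table and port names are illustrative):
UPDATE T_SALES
SET DATE_SHIPPED = :TU.DATE_SHIPPED,
    TOTAL_SALES = :TU.TOTAL_SALES
WHERE EMP_NAME = :TU.EMP_NAME AND EMP_NAME = 'MIKE SMITH'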
Complex Mapping
1. We have an order file requirement. The requirement is that every day the source system will place a file whose name carries a timestamp on the Informatica server.
2. We have to process the same date file through informatica.
3. Source file directory contain older than 30 days files with timestamps.
4. For this requirement if I hardcode the timestamp for source file name it will process the same file
every day.
5. So what I did here is I created $InputFilename for source file name.
6. Then I am going to use the parameter file to supply the values to session variables
($InputFilename).
7. To update this parameter file I have created one more mapping.
8. This mapping will update the parameter file with appended timestamp to file name.
9. I make sure to run this parameter file update mapping before my actual mapping.
5. If the file size is greater than zero, it will send an email notification to the source system POC (point of contact) along with the deno zero record file and an appropriate email subject and body.
6. If the file size <= 0, that means there are no records in the flat file; in this case the shell script will not send any email notification.
7. Or
8. We are expecting a not-null value for one of the source columns.
9. If it is null, that means it is an error record.
10. We can use the above approach for error handling.
A parameter file supplies the values to session-level variables and mapping-level variables.
Variables are of two types:
1. Session level variables
2. Mapping level variables
Session level variables:
Session parameters, like mapping parameters, represent values you might want to change between
sessions, such as a database connection or source file. Use session parameters in the session properties,
and then define the parameters in a parameter file. You can specify the parameter file for the session to
use in the session properties. You can also specify it when you use pmcmd to start the session. The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With $PMSessionLogFile, you can change the name of the session log generated for the session. The Workflow Manager also allows you to create user-defined session parameters.
Use session parameters to make sessions more flexible. For example, you have the same type of
transactional data written to two different databases, and you use the database connections TransDB1
and TransDB2 to connect to the databases. You want to use the same mapping for both tables. Instead
of creating two sessions for the same mapping, you can create a database connection parameter,
$DBConnectionSource, and use it as the source database connection for the session. When you create a
parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the
session completes, you set $DBConnectionSource to TransDB2 and run the session again.
You might use several session parameters together to make session management easier. For example,
you might use source file and database connection parameters to configure a session to read data from
different source files and write the results to different target databases. You can then use reject file
parameters to write the session reject files to the target machine. You can use the session log parameter,
$PMSessionLogFile, to write to different session logs in the target machine, as well.
When you use session parameters, you must define the parameters in the parameter file. Session
parameters do not have default values. When the PowerCenter Server cannot find a value for a session
parameter, it fails to initialize the session.
Mapping level variables are of two types:
1. Variable
2. Parameter
What is the difference between mapping level and session level variables?
Mapping-level variables always start with $$.
Session-level variables always start with $.
Flat File
Flat file is a collection of data in a file in the specific format.
Informatica can support two types of files
1. Delimiter
2. Fixed Width
For a delimited file we need to specify the separator.
For a fixed-width file we need to know the format first, i.e. how many characters to read for each column.
For a delimited file it is also necessary to know the structure, for example whether the file contains a header.
If the file contains a header, then in the definition we need to skip the first row.
List file:
If you want to process multiple files with the same structure, we don't need multiple mappings and multiple sessions.
We can use one mapping and one session using the list file option.
First we need to create a list file containing the paths of all the files; then we can use this file in the main mapping.
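A minimal sketch of a list file (the paths are illustrative):
/interface/dev/etl/apo/srcfiles/sales_20070901.dat
/interface/dev/etl/apo/srcfiles/sales_20070908.dat
/interface/dev/etl/apo/srcfiles/sales_20070915.dat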
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_BELUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
4. Because it is a mapping variable, it stores the max last_upd_date value in the repository; in the next run the source qualifier query will fetch only the records updated or inserted after the previous run.
The logic in the source qualifier is:
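A minimal sketch of the source qualifier override, assuming a LAST_UPD_DATE column and the mapping variable $$LastUpdateDateTime shown in the parameter file below (the other column names are illustrative):
SELECT emp_id, emp_name, sal, last_upd_date
FROM emp
WHERE last_upd_date > TO_DATE('$$LastUpdateDateTime', 'MM/DD/YYYY')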
In an expression transformation, assign the max last-update-date value to the variable using the SETMAXVARIABLE function.
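A minimal sketch of the expression port (the port names are illustrative):
v_LastUpdateDateTime = SETMAXVARIABLE($$LastUpdateDateTime, LAST_UPD_DATE)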
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GE
HC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
$$LastUpdateDateTime=01/01/1940
Main mapping
Workflow Design
Informatica Tuning
The aim of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server. Increase session performance by the following.
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance.
1. Cache lookups if source table is under 500,000 rows and DON’T cache for tables over 500,000
rows.
2. Reduce the number of transformations. Don’t use an Expression Transformation to collect fields.
Don’t use an Update Transformation if only inserting. Insert mode is the default.
3. If a value is used in multiple ports, calculate the value once (in a variable) and reuse the result
instead of recalculating it for multiple ports.
4. Reuse objects where possible.
5. Delete unused ports particularly in the Source Qualifier and Lookups.
6. Use Operators in expressions over the use of functions.
7. Avoid using Stored Procedures, and call them only once during the mapping if possible.
8. Remember to turn off Verbose logging after you have finished debugging.
9. Use default values where possible instead of using IIF (ISNULL(X),,) in Expression port.
10. When overriding the Lookup SQL, always ensure to put a valid Order By statement in the SQL. This
will cause the database to perform the order rather than Informatica Server while building the
Cache.
11. Improve session performance by using sorted data with the Joiner transformation. When the
Joiner transformation is configured to use sorted data, the Informatica Server improves
performance by minimizing disk input and output.
12. Improve session performance by using sorted input with the Aggregator Transformation since it
reduces the amount of data cached during the session.
13. Improve session performance by using limited number of connected input/output or output ports
to reduce the amount of data the Aggregator transformation stores in the data cache.
14. Use a Filter transformation prior to Aggregator transformation to reduce unnecessary
aggregation.
15. Performing a join in a database is faster than performing join in the session. Also use the Source
Qualifier to perform the join.
16. Define the source with less number of rows and master source in Joiner Transformations, since this
reduces the search time and also the cache.
17. When using multiple conditions in a lookup conditions, specify the conditions with the equality
operator first.
18. Improve session performance by caching small lookup tables.
19. If the lookup table is on the same database as the source table, instead of using a Lookup
transformation, join the tables in the Source Qualifier Transformation itself if possible.
20. If the lookup table does not change between sessions, configure the Lookup transformation to
use a persistent lookup cache. The Informatica Server saves and reuses cache files from session to
session, eliminating the time required to read the lookup table.
21. Use :LKP reference qualifier in expressions only when calling unconnected Lookup
Transformations.
22. Informatica Server generates an ORDER BY statement for a cached lookup that contains all
lookup ports. By providing an override ORDER BY clause with fewer columns, session performance
can be improved.
23. Eliminate unnecessary data type conversions from mappings.
24. Reduce the number of rows being cached by using the Lookup SQL Override option to add a
WHERE clause to the default SQL statement.
Tuning
Tuning a PowerCenter 8 ETL environment is not that straightforward. A chain is only as strong as the
weakest link. There are four crucial domains that require attention: system, network, database and the
PowerCenter 8 installation itself. It goes without saying that without a well performing infrastructure the
tuning of the PowerCenter 8 environment will not make much of a difference.
As the first three domains are located in the realms of administrators, this article will only briefly touch these
subjects and will mainly focus on the items available to developers within PowerCenter 8.
Tuning is an iterative process: at each iteration the largest bottleneck is removed, gradually improving
performance. Bottlenecks can occur on the system, on the database (either source or target), or within
the mapping or session ran by the Integration Service. To identify bottlenecks, run test sessions, monitor the
system usage and gather advanced performance statistics while running. Examine the session log in detail
as it provides valuable information concerning session performance. From the perspective of a developer, the areas to examine, in order, are:
1. source / target
2. mapping
3. session
4. system
If tuning the mapping and session still proves to be inadequate, the underlying system will need to be
examined closer. This extended examination needs to be done in close collaboration with the system
administrators and database administrators (DBA). They have several options to improve performance
without invoking hardware changes. Examples are distributing database files over different disks,
improving network bandwidth and lightening the server workload by moving other applications. However
if none of this helps, only hardware upgrades will bring redemption to your performance problems.
Session logs
The PowerCenter session log provides very detailed information that can be used to establish a baseline
and will identify potential problems.
Very useful for the developer are the detailed thread statistics that will help benchmarking your actions.
The thread statistics will show if the bottlenecks occur while transforming data or while reading/writing.
Always focus attention on the thread with the highest busy percentage first. For every thread, detailed information on the run and idle time is presented. The busy percentage is calculated as (run time - idle time) / run time * 100.
1. reader thread
2. transformation thread
3. writer thread
An example:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has
completed: Total Run Time = [31.984171] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_X_T_CT_F_SITE_WK_ENROLL] has completed: Total Run Time = [0.624996] secs, Total Idle Time =
[0.453115] secs, Busy Percentage = [27.501083].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [476.668825] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
In this particular case it is obvious that the database can be considered as the main bottleneck. Both
reading and writing use most of the execution time. The actual transformations only use a very small
amount of time. If a reader or writer thread is 100% busy, consider partitioning the session. This will allow the
mapping to open several connections to the database, each reading/writing data from/to a partition
thus improving data read/write speed.
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has
completed. The total run time was insufficient for any meaningful statistics.
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_T_CT_F_SITE_WK_BSC]
has completed: Total Run Time = [22.765478] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [30.937302] secs, Total Idle Time = [20.345600] secs, Busy Percentage =
[34.23602355].
In the example above, the transformation thread poses the largest bottleneck and needs to be dealt with
first. The reader thread finished so quickly no meaningful statistics were possible. The writer thread spends
the majority of time in the idle state, waiting for data to emerge from the transformation thread. Perhaps
an unsorted aggregator is used, causing the Integration Service to sort all data before releasing any
aggregated record?
The number of threads can increase if the sessions will read/write to multiple targets, if sessions have
multiple execution paths, if partitioning is used …
Establishing a baseline
Reading from sources breaks down into two distinct categories: reading relational sources and reading flat
files. Sometimes, both source types are combined in a single mapping.
A homogeneous join is a join between relational sources that combine data from a single origin: for
example a number of Oracle tables being joined.
A heterogeneous join is a join between sources that combine data from different origins: for example when Oracle data is joined with a flat file.
Whatever source you are trying to read, always try to limit the incoming data stream maximally. Place
filters as early as possible in the mapping, preferably in the source qualifier. This will ensure only data
needed by the Integration Services is picked up from the database and transported over the network. If
you suspect the performance of reading relational data is not optimal, replace the relational source with a
flat file source containing the same data. If there is a difference in performance, the path towards the
source database should be investigated more closely, such as execution plan of the query, database
performance, network performance, network package sizes,…
When using homogeneous relational sources, use a single source qualifier with a user defined join instead
of a joiner transformation. This will force the join being executed on the database instead of the
PowerCenter 8 platform. If a joiner transformation is used instead, all data is first picked up from the
database server, then transported to the PowerCenter 8 platform, sorted and only as a last step joined by
the Integration server.
Consider pre-sorting the data in the database, this will make further sorting for later aggregators, joiners,…
by the Integration Service unnecessary. Make sure the query executed on the database has a favourable
execution plan. Use the explain plan (Oracle) facility to verify the query's execution plan if indexes are
optimally used. Do not use synonyms or database links unless really needed as these will slow down the
data stream.
In general it is good practice to always generate keys for primary key and foreign key fields. If no key is
available or known a dummy key should be used. In the reference table an extra dummy record should be
inserted. This method will improve join performance when using homogeneous joins. In general three dummy
rows should be included:
3. 999997 Missing
When using heterogeneous sources there is no alternative but to use a joiner transformation. To ease up
matters for the Integration service, ensure that all relational sources are sorted and joined in advance in the
source qualifier. Flat file sources need to be sorted before joining, using a sorter transformation. When
joining the 2 sorted sources, check the sorted input property at the joiner transformation. The sorted input
option allows the joiner transformation to start passing records to subsequent transformations as soon as the
key value changes. Normal behaviour would be to hold passing data until all data is sorted and processed
in the joiner transformation.
By matching the session property Line Sequential buffer length to the size of exactly one record overhead is
minimized. If possible stage flat files in a staging table. Joining and filtering can then be done in the
database.
One of the most common performance issues in PowerCenter 8 is slow writing to target databases. This is
usually caused by a lack of database or network performance. You can test for this behaviour by replacing
the relational target with a flat file target. If performance increases considerably it is clear something is
wrong with the relational target.
Indexes are usually the root cause of slow target behaviour. In Oracle, every index on a table will decrease
the performance of an insert statement by 25%. The more indexes are defined, the slower insert/update
statements will be. Every time an update/insert statement is executed, the indexes need to be updated as
well. Try dropping the indexes on the target table. If this does not increase performance, the network is likely causing the problem.
In general avoid having too many targets in a single mapping. Increasing the commit interval will decrease
the amount of session overhead. Three different commit types are available for targets:
1. target-based commit
2. source-based commit
3. user defined commit: slowest, avoid using user defined commit when not really necessary
PowerCenter has two methods for inserting data in a relational target: normal or bulk loads. Normal loads
will generate DML-statements. Bulk loads will bypass the database log and are available for DB2, Sybase,
Oracle (SQL Loader), or Microsoft SQL Server. This loading method has a considerable performance gain
but has two drawbacks: the recovery of a session will not be possible, as no rollback data is kept by the database, and when bulk loading the target table cannot have any indexes defined on it, so drop and recreate the indexes before and after the session. For every case you will have to weigh whether dropping and
recreating the indexes while using a bulk load outperforms a classic insert-statement with all indexes in
place.
Remember to use a very large commit interval when using bulk loads with Oracle and Microsoft to avoid
unnecessary overhead. Dropping and recreating indexes can be done by using pre- and post-session tasks
or by calling a stored procedure within the mapping.
When the session is updating a set of records in a large table, the use of a primary key or unique index is
absolutely necessary. Be sure to check the explain plan and verify the proper index usage. Sometimes it is
faster to only keep unique indexes while loading data and dropping the non-unique indexes not needed
by the session. These indexes can be recreated at the end of the session.
Now data is being read and written in the most optimal way, it is time to focus our attention to the actual
mapping. The basic idea is simple: minimize the incoming data stream and create as little as possible
overhead within the mapping. A first step in achieving this goal is to reduce the number of transformations
to a minimum by reusing common logic. Perhaps the use of the same lookup in different pipes could be
redesigned to only use the lookup once? By using clever caching strategies, cache can be reused in the
mapping or throughout the workflow. Especially with active transformations (transformations where the
number of records is being changed) the use of caching is extremely important. Active transformations
that reduce the number of records should be placed as early as possible in the mapping.
Data type conversions between transformations in the mapping are costly. Be sure to check if all explicit
and implicit conversions really are necessary. When the data from a source is passed directly to a target
without any other actions to be done, connect the source qualifier directly to the target without the use of
other transformations.
Single pass reading allows multiple targets being populated using the data from the same source qualifier.
Consider using single pass reading if there are multiple sessions using the same source: the existing mapping
logic can be combined by using multiple pipelines. Common data manipulations for all pipelines should be
done before splitting out the pipeline.
At times it is better not to create mappings at all: staging mappings could be replaced by snapshots or
replication in the database. Databases are specialized in these types of data transfer and are in general
far more efficient than passing the data through PowerCenter.
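For example, a staging mapping that merely copies a remote table could be replaced by a materialized view that is refreshed on demand; the table name and database link below are hypothetical:

CREATE MATERIALIZED VIEW mv_stg_orders
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT * FROM orders@src_db_link;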
Transformation Mechanics
Every transformation has its specifics related to performance. In the section below the most important items
are discussed.
A joiner transformation should be used to join heterogeneous data sources. Homogeneous sources should
always be joined in the database by using the user defined join in the source qualifier transformation.
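As a sketch (using EMP and DEPT as example tables), such a homogeneous join can be pushed into the source qualifier with a SQL override instead of a joiner transformation:

SELECT e.empno, e.ename, e.sal, d.dname, d.loc
FROM emp e, dept d
WHERE e.deptno = d.deptno;

Alternatively, only the join condition (e.g. EMP.DEPTNO = DEPT.DEPTNO) can be entered in the user-defined join property of the source qualifier.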
If not sorted at database level, always use a sorter transformation to sort the data before entering the joiner
transformation. Make sure the sorter transformation has sufficient cache to enable a 1-pass sort. Not having
sufficient cache will make performance plummet. The data could be sorted in the joiner, but there are three
advantages of using the sorter transformation:
The use of sorted input enables the joiner transformation to start passing data to subsequent
transformations before all data was passed in the joiner transformation. Consequently, the transformations
following the joiner transformation start receiving data nearly immediately and do not have to wait until all
the data was sorted and joined in the joiner transformation. This logic is only valid when the source can be
sorted in the database: for example when joining SQL-Server and Oracle. Both sources can be sorted in the
database, making additional sorting using sorters superfluous. When a sorter is needed, for example when
joining Oracle and a flat file, the sorter will have to wait until all data is read from the flat file before records
to the joiner transformation can be passed.
The sorting algorithm used in the sorter is faster than the algorithm used in joiners or aggregators.
The use of sorted input in the joiner transformation, allows for a smaller cache size, leaving more memory for
other transformations or sessions. Again, when a flat file is used, a sorter will be needed prior to the joiner
transformation. Although the joiner transformation uses less cache, the sorter cache will need to be
sufficiently large to enable sorting all input records.
As outer joins are far more expensive than inner joins, try to avoid them as much as possible. The master
source should be designated as the source containing fewer rows than the detail source. Join as early as
possible in the pipeline as this limits the number of pipes and decreases the amount of data being sent to
other transformations.
Only use a filter transformation for non-relational sources. When using relational sources, filter in the source
qualifier. Filter as early as possible in the data stream. Try filtering by using numeric lookup conditions.
Numeric matching is considerably faster than the matching of strings. Avoid complex logic in the filter
condition. Be creative in rewriting complex expressions to the shortest possible length. When multiple filters
are needed, consider using a router transformation as this will simplify mapping logic.
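As a hypothetical illustration of the numeric-versus-string point, a filter condition such as

DEPT_ID = 10

is far cheaper to evaluate than an equivalent string-based condition such as

UPPER(LTRIM(RTRIM(DEPT_NAME))) = 'ACCOUNTING'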
A lookup transformation is used to lookup values from another table. Clever use of lookup caches can
make a huge difference in performance.
By default, lookup transformations are cached. The selected lookup fields from the lookup table are read
into memory and a lookup cache file is built every time the lookup is called. To minimize the usage of
lookup cache, only retrieve lookup ports that are really needed.
However, whether or not to cache a lookup really depends on the situation. An uncached lookup makes
perfect sense if only a small percentage of the lookup rows will be used, for example if we only need 200 rows
from a 10,000,000-row table. In this particular case, building the lookup cache would require an extensive
amount of time. A direct select against the database for every lookup row will be much faster, on the condition
that the lookup key in the database is indexed.
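For instance, if an uncached lookup selects from a hypothetical table LKP_COUNTRY on EMPLOYEE_ID, an index on that key keeps each per-row query fast:

CREATE INDEX idx_lkp_country_emp ON lkp_country (employee_id);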
Sometimes a lookup is used multiple times in an execution path of a mapping or workflow. Re-caching the
lookup every time would be time consuming and unnecessary, as long as the lookup source table remains
unchanged. The persistent lookup cache property was created to handle this type of situation. The lookup
cache file is refreshed only the first time the lookup is called; all following lookups reuse the persistent
cache file. Using a persistent cache can improve performance considerably because the Integration
Service builds the memory cache from the cache files instead of the database.
Use dynamic lookup cache when the lookup source table is a target in the mapping and updated
dynamically throughout the mapping. Normal lookup caches are static: the records that were inserted in
the session are not available to the lookup cache. When using dynamic lookup cache, newly inserted or
updated records are updated in the lookup cache immediately.
Ensure sufficient memory cache is available for the lookup. If not, the Integration Service will have to write to
disk, slowing the session down.
By using the Additional Concurrent Pipelines property at session level, lookup caches will start building
concurrently at the start of the mapping. Normal behaviour would be that a lookup cache is created only
when the lookup is called. Pre-building caches versus building caches on demand can increase the total
session performance considerably, but only when the pre-built lookups will be used for sure in the session.
Again, the performance gain of setting this property will depend on the particular situation.
An aggregator transformation is an active transformation, used to group and aggregate data. If the input
was not sorted already, always use a sorter transformation in front of the aggregator transformation. As with
the joiner transformation the aggregator transformation will accumulate data until the dataset is complete
and only starts processing and sending output records from there on. When sorted input is used, the
aggregator will process and send output records as soon as the first group of records is complete. This will allow
for much faster processing and smaller caches. Use as few functions and difficult nested conditions as
possible. Especially avoid the use of complex expressions in the group by ports. If needed use an
expression transformation to build these expressions in advance. When using change data capture,
incremental aggregation will enhance performance considerably.
Sometimes, simple aggregations can be done by using an expression transformation that uses variables. In
certain cases this could be a valid alternative for an aggregation transformation.
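A sketch of this pattern, using hypothetical port names in an Expression transformation (the input must be sorted by CUST_ID and the variable ports must be defined in the order shown):

v_RUNNING_TOTAL (variable port) = IIF(CUST_ID = v_PREV_CUST, v_RUNNING_TOTAL + AMOUNT, AMOUNT)
v_PREV_CUST (variable port) = CUST_ID
o_RUNNING_TOTAL (output port) = v_RUNNING_TOTAL

Because v_PREV_CUST is evaluated after v_RUNNING_TOTAL, it still holds the previous row's CUST_ID when the running total is calculated.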
Expression transformations are generally used to calculate variables. Try not to use complex nested
conditions, use decode instead. Functions are more expensive than operators; avoid using functions if the
same can be achieved by using operators. Implicit data type conversion is expensive. Try to convert data
types as little as possible. Working with numbers is generally faster than working with strings. Be creative in
rewriting complex expressions to the shortest possible length.
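For example, a nested IIF such as the following (STATUS is a hypothetical port) can be rewritten with DECODE:

IIF(STATUS = 'A', 'Active', IIF(STATUS = 'I', 'Inactive', IIF(STATUS = 'P', 'Pending', 'Unknown')))

DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'P', 'Pending', 'Unknown')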
The use of a sequence generator transformation versus a database sequence depends on the load
method of the target table. If using bulk loading, database sequences cannot be used. The sequence
generator transformation can overcome this problem. Every row is given a unique sequence number.
Typically a number of values are cached for performance reasons.
There is however a big catch. Unused sequence numbers at the end of the session are lost. The next time
the session is run, the sequence generator will cache a new batch of numbers.
For example: a sequence generator caches 10,000 values. 10,000 rows are loaded, using the cached
values 1 to 10,000. At row 10,001, a new batch of sequence values is cached: 10,001 to 20,000. However, the last
row in the session is row 10,002. All values between 10,003 and 20,000 are lost. The next time the session is run,
the first inserted row will have a key of 20,001.
To avoid these gaps use a sequence generator in combination with an unconnected lookup. First look up
the latest key value in the target table. Then use an expression that adds the value from the sequence
generator to the key value that was just retrieved. The sequence generator should restart numbering at
every run. There are some advantages to this approach:
1. key values remain contiguous across runs, because numbering always continues from the latest key present in the target;
2. if rows are deleted, gaps between key values will be caused by deletes and not by caching issues.
As an added advantage, the use of this method will prevent migration problems with persistent values
between repositories. This method is easy to implement and does not imply a performance penalty while
running the session.
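A minimal sketch of the expression, assuming a hypothetical unconnected lookup LKP_MAX_CUST_KEY that returns the current maximum key from the target (the literal 'X' feeds a dummy input port) and a NEXTVAL port connected from a sequence generator that restarts at 1 every run:

NEXT_KEY = :LKP.LKP_MAX_CUST_KEY('X') + NEXTVAL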
Dealing with transformation errors in advance can save a lot of time while executing a session. By default,
every error row is written into a bad file, which is a text based file containing all error rows. On top of that
the error data is logged into the session log, which, if sufficient transformation errors occur, will explode in size.
Both cause a lot of extra overhead and slow down a session considerably.
In general, it is better to capture all data quality issues that could cause transformation errors in advance
and write flawed records into an error table.
To really get into a session and understand exactly what is happening, even more detailed
performance data than what is available in the session log can be captured. This can be done by
enabling the 'Collect performance data' checkbox at session level.
This option will allow the developer to see detailed transformation based statistics in the Workflow Monitor
while running the session. When finished a performance file is written to the session log folder. For every
source qualifier and target definition performance details are provided, along with counters that show
performance information about each transformation.
1. errorrows: indicates transformation errors occurred; eliminating the cause of the errors avoids the overhead of error logging.
2. readfromdisk/writetodisk: indicates not enough cache memory is available. Increase the cache
size until this counter is no longer shown.
Memory Optimization
Memory plays an important role when the Integration Service is running sessions. Optimizing cache sizes
can really make a huge difference in performance.
Buffer memory is used to hold source and target data while processing and is allocated when the session is
initialized. DTM Buffer is used to create the internal data structures. Buffer blocks are used to bring data in
and out of the Integration Service. Increasing the DTM buffer size will increase the amount of blocks
available to the Integration Service. Ideally a buffer block can contain 100 rows at the same time.
You can configure the amount of buffer memory yourself or you can configure the Integration Service to
automatically calculate buffer settings at run time. Instead of calculating all values manually or by trial
and error, run the session once on auto and retrieve the correct values from the session log:
The Integration Service uses index and data caches mainly for the following transformations: aggregator, joiner,
sorter, lookup, rank, …
Configuring the correct amount of cache is really necessary as the Integration Server will write and read
from disk if not properly sized. The index cache should be about half of the data cache. Cache files should
be stored on a fast drive and surely not on a network share.
The easiest way of calculating the correct cache sizes is by keeping the defaults on auto and examining
the session log. In the session log a line like the following is written for every transformation that uses a cache:
TRANSF_1_1_1> SORT_40427 Sorter Transformation [srt_PRESTATIE] required 4-pass sort (1-pass temp I/O:
19578880 bytes). You may try to set the cache size to 27 MB or higher for 1-pass in-memory sort.
The maximum amount of memory used by transformation caches is set by two session properties: 'Maximum
Memory Allowed for Auto Memory Attributes' and 'Maximum Percentage of Total Memory Allowed for Auto
Memory Attributes'. The smaller of the two is used. When the value is 0, the automatic memory attributes are disabled. If this
value is set too low, an error will occur if a lookup with a manually configured cache wants to allocate more
memory than available. Keep in mind that sessions can run in parallel: every session will try to allocate RAM-
memory.
Ensure plenty of RAM-memory is available for the Integration Service. Do not assume that adding cache
memory will increase performance, at a certain point optimum performance is reached and adding
further memory will not be beneficial.
High precision
The high precision mode will allow using decimals up to a precision of 28 digits. Using this kind of precision
will result in a performance penalty in reading and writing data. It is therefore recommended to disable
high precision when not really needed. When turned off, decimals are converted to doubles that have a
precision up to 15 digits.
Concurrent sessions
Depending on the available hardware, sessions can be run concurrently instead of sequentially. At
Integration Service level the number of concurrent sessions can be set; this value is 10 by default.
Depending on the number of CPUs of the PowerCenter server and of the source and target databases this
value can be increased or decreased. The next step is designing a workflow that launches a number of
sessions concurrently. By trial and error an optimal setting can be found.
Session logging
The amount of detail in a session log is determined by the tracing level. This level ranges from 'Terse' to
'Verbose Data'. For debugging or testing purposes the 'Verbose Data' option will trace every row that
passes through the mapping in the session log. At 'Terse', only initialization information, error messages, and
notification of rejected data are logged. It is quite clear the 'Verbose Data' option causes a severe
performance penalty.
For lookups, use the ‘Additional Concurrent Pipelines for Lookup Creation' to start building lookups as soon
as the session is initialized. By the time the lookups are needed in the session, the cache creation hopefully
is already finished.
Partitioning
If a transformation thread is 100% busy, consider adding a partition point in the segment. Pipeline
partitioning will allow for parallel execution within a single session. A session will have multiple threads for
processing data concurrently. Processing data in pipeline partitions can improve performance, but only if
enough CPU's are available. As a rule of thumb, 1.5 CPU's should be available per partition. Adding a
partition point will increase the number of pipeline stages. This means a transformation will logically be used
a number of times, so remember to multiply the cache memory of the transformations and the session by the
number of partitions. Partitioning can be specified on sources, targets and the mapping transformations.
Pushdown Optimization
Pushdown optimization will push the transformation processing to the database level without extracting the
data. This will reduce the movement of data when source and target are in the same database instance.
Possibly, more optimal database-specific processing can be used to further enhance performance.
The metadata and lineage, however, are kept in PowerCenter.
1. Partial pushdown optimization to source: one or more transformations can be processed in the
source
2. Partial pushdown optimization to target: one or more transformations can be processed in the
target
3. Full pushdown optimization: all transformations can be processed in the database.
A number of transformations are not supported for pushdown: XML, Rank, Router, Normalizer, Update Strategy,
…
Pushdown optimization can be used with sessions with multiple partitions if the partition types are pass-
through or key range partitioning. You can configure a session for pushdown optimization in the session
properties. Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to
the database. Using pushdown requires the ‘Pushdown Optimization Option' in the PowerCenter license.
Architecture
64-Bit PowerCenter versions will allow better memory usage as the 2GB limitation is removed. When
PowerCenter 8 is run on a grid, the workflows can be configured to use resources efficiently and maximize
scalability. Within a grid, tasks are distributed to nodes. To improve performance on a grid, the network
bandwidth between the nodes is of importance as a lot of data is transferred between the nodes. This data
should always be stored on local disks for optimal performance. This includes the caches and any source
and target file. Of course even 64-bit computing will not help if the system is not properly setup. Make sure
plenty of disk space is available at the PowerCenter 8 server.
For optimal performance, consider running the Integration service in ASCII data movement mode when all
sources and targets use 7 or 8-bit ASCII as UNICODE can take up to 16 bits.
The repository database should be located on the PowerCenter machine. If not, the repository database
should be physically separated from any target or source database. This prevents the same database
machine from writing to a target while reading from the repository. Always use native connections over ODBC
connections as they are a lot faster. Maximize the use of parallel operations on the database. The use of
parallelism will cut execution times considerably. Remove any other application from the PowerCenter
server apart from the repository database installation.
Increase the database network packet size to further improve performance. For Oracle this can be done in
the listener.ora and tnsnames.ora. Each database vendor has some specific options that can be beneficial
for performance. For Oracle, the use of the IPC protocol over TCP will result in a performance gain by a
factor of 2 to 6. Inter-Process Communication (IPC) removes the network layer between the client and the
Oracle database server. This can only be used if the database resides on the same machine as the
PowerCenter 8 server. Check the product documentation for further details.
By careful load monitoring of the target/source databases and the servers of PowerCenter and databases
while running a session, potential bottlenecks at database or system level can be identified. Perhaps the
database memory is insufficient? Perhaps too much swapping is occurring on the PowerCenter 8 server?
Perhaps the CPUs are overloaded?
The tuning of servers and databases is just as important as delivering an optimized mapping and should not
be ignored. Tuning a system for a data warehouse poses different challenges than tuning a system for an
OLTP-application. Try to involve DBA's and admins as soon as possible in this process so they fully
understand the sensitivities involved with data warehousing.
To start development on any data mart you should have the following things set up by the Informatica
Load Administrator:
1. Informatica folder. The development team, in consultation with the BI Support Group, can decide
on a three-letter code for the project, which would be used to create the Informatica folder as well
as the Unix directory structure.
Transformation Specifications
Before developing the mappings you need to prepare the specifications document for the mappings you
need to develop. A good template is placed in the templates folder. You can use your own template as
long as it has as much detail as, or more than, this template.
While estimating the time required to develop mappings, the rule of thumb is as follows:
1. Simple Mapping – 1 Person Day
2. Medium Complexity Mapping – 3 Person Days
3. Complex Mapping – 5 Person Days.
Usually the mapping for the fact table is most complex and should be allotted as much time for
development as possible.
Failure Notification
Once in production, your sessions and batches need to send a notification to the Support team when they
fail. You can do this by configuring an email task at the session level.
Output Ports – If new data is created within a transformation and will be mapped to the target, make
sure that it has the same name as the target port that it will be mapped to.
Prefixed with: O_
Quick Reference
Testing regimens:
1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. User Acceptance Testing
Unit testing: The testing, by development, of the application modules to verify each unit (module) itself
meets the accepted user requirements and design and development standards
Functional Testing: The testing of all the application’s modules individually to ensure the modules, as
released from development to QA, work together as designed and meet the accepted user requirements
and system standards
System Integration Testing: Testing of all of the application modules in the same environment, database
instance, network and inter-related applications, as it would function in production. This includes security,
volume and stress testing.
User Acceptance Testing (UAT): The testing of the entire application by the end-users, ensuring the
application functions as set forth in the system requirements documents and that the system meets the
business needs.
UTP Template:
2. Test case: Check all the target columns to verify they are getting populated correctly with source data.
   Test SQL:
   select PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE from T_PRCHG
   MINUS
   select PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE from PRCHG
   Expected result: The query should return zero records (source and target values match).
   Actual result: Same as the expected result.
   Status: Pass
   Tested by: Stev

3. Test case: Check the insert strategy used to load records into the target table.
   Test steps: Identify one record from the source which is not in the target table, then run the session.
   Expected result: It should insert the record into the target table with the source data.
   Actual result: Same as the expected result.
   Status: Pass
   Tested by: Stev
UNIX
How strong are you in UNIX?
2) If we are supposed to process flat files using Informatica but those files exist on a remote server, then
we have to write a script to FTP them to the Informatica server before we start processing those files.
3) File watch: if an indicator file is available in the specified location then we need to start
our Informatica jobs; otherwise we send an email notification using the
mailx command saying that the previous jobs did not complete successfully.
4) Using a shell script, update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes up, I can research it and
implement the solution.
Basic Commands:
cat > file1 (cat can be used to create a non-zero-byte file; cat file1 displays its contents)
cat file1 file2 > all ----- combines file1 and file2 into the file 'all' (the file 'all' is created if it doesn't exist)
cat file1 >> file2 --- appends the contents of file1 to file2
1. > will redirect output from standard out (screen) to file or printer or whatever you like.
ps -A
Crontab command
The crontab command is used to schedule jobs. You must be given permission to run this command by the Unix
administrator. Jobs are scheduled using five fields, as follows.
Minutes (0-59) Hour (0-23) Day of month (1-31) Month (1-12) Day of week (0-6) (0 is Sunday)
For example, to schedule a job which runs the script named backup_jobs in the /usr/local/bin
directory on Sunday (day 0) at 22:25 on the 15th of the month, the entry in the crontab file would be as follows
(* represents all values):
25 22 15 * 0 /usr/local/bin/backup_jobs
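Two related commands that are useful alongside the entry above:

crontab -l     # list the current user's scheduled jobs
crontab -e     # edit the current user's crontab in the default editor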
who | wc -l          (counts the number of users currently logged in)
$ ls -l | grep '^d'   (lists only the directories in the current directory)
Pipes:
The pipe symbol "|" is used to direct the output of one command to the input of another.
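For example, a small hypothetical pipeline that counts the running pmserver processes:

ps -ef | grep pmserver | grep -v grep | wc -l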
mv file1 ~/AAA/ move file1 into sub-directory AAA in your home directory.
ls -a
find command
find . -name aaa.txt      Finds all the files named aaa.txt in the current directory or any directory below it.
find / -name vimrc        Finds all the files named 'vimrc' anywhere on the system.
find / -name '*xpilot*'   Finds all files whose names contain the string 'xpilot' anywhere on the system.
You can find out what shell you are using by the command:
echo $SHELL
#!/usr/bin/sh
Or
#!/bin/ksh
It tells the system which interpreter to use for the script. As you know, the bash shell has some specific features
that other shells do not have, and vice versa; the same goes for perl, python and other languages.
In short, it tells your shell which shell (or interpreter) to use when executing the statements in your shell script.
Interactive History
A feature of bash and tcsh (and sometimes other shells): you can use the up-arrow key to access your previous
commands, edit them, and re-execute them.
Opening a file
vi filename
Creating text
Edit modes: These keys enter editing modes and type in the text
of your document.
r Replace 1 character
R Replace mode
Deletion of text
:w!existing.file Overwrite an existing file with the file currently being edited.
:q Quit.
Solution:
Create a Workflow list file with “.lst” extension. Add all the workflows you might want to run in the
appropriate sequence.
Example: wfl_ProcessName.lst
Folder_name, Workflow_Name1
Folder_name, Workflow_Name2
Folder_name, Workflow_Name3
Create a Data File with the Workflow list and Number of Loops (in other words number of re-runs needed for
the Workflow list) as a comma separated file.
Example: EDW_ETLLOOP.dat
wfl_ProcessName1, 5
wfl_ProcessName2, 10
wfl_ProcessName3, 2
An added feature in the script is an optional termination file which can be created in the given
directory to force full termination of the looping process. The advantage of the optional termination file is
that users can stop the looping process in case other processes are being affected by the looping
jobs.
Processing Steps
1. Read the parameter values and assign them to variables.
2. Validate that the parameter is not null or an empty string. If empty, exit.
3. If the data file exists, read the list file name and the number of loops.
   1. If the job termination file exists, then exit.
   2. Else
      1. Call the W_CallWorkflow.sh script and pass the <workflow list> as a variable.
4. Loop the previous step for 'n' number of loops.
5. Remove the data file.
#!/bin/ksh
###########################################################################
# FileName: W_WorkflowLooping.ksh
# Parameters: Parm $1 = Looping File name (no extension)
###########################################################################
# Defines program variables
###########################################################################
DATA_FILE=$1
DATA_FILE_EXTN='dat'
LOG_FILE_SUFF='Log.log'
TERM_FILE_SUFF='TerminationInd.dat'
###########################################################################
# Check if the Data File Name is passed as a Parameter
###########################################################################
if [ -z $DATA_FILE ]
then
echo "!!! W_WorkflowLooping: $DATE ERROR - Data File Name Parameter not provided..!!!"
exit 1
fi
DATA_FILE_NAME=$DATA_FILE.$DATA_FILE_EXTN
LOG_FILE_NAME=$DATA_FILE$LOG_FILE_SUFF
JOB_TERMINATION_IND_FILE_NAME=$DATA_FILE$TERM_FILE_SUFF
DATA_FILE=/informatica/Loop_Dir/$DATA_FILE_NAME
LOG_FILE=/informatica/Unix_Log/$LOG_FILE_NAME
JOB_TERMINATION_IND_FILE=/informatica/Loop_Dir/$JOB_TERMINATION_IND_FILE_NAME
###########################################################################
# Update the status and log file - script is starting.
###########################################################################
echo "***** Starting script $0 on `date`." >> $LOG_FILE
###########################################################################
# Check whether the data files exists
###########################################################################
if [ -s $DATA_FILE ]
then
while read member
do
wf_list_file_name=`echo $member | awk -F"," '{print $1}'`
loop_count=`echo $member | awk -F"," '{print $2}'`
while [ $loop_count -gt 0 ]
do
if [ -f $JOB_TERMINATION_IND_FILE ]
then
rm $JOB_TERMINATION_IND_FILE
# rm $DATA_FILE
echo "Indicator file for terminating the load found in /informatica/Loop_Dir/ on `date`" >>
$LOG_FILE
exit 0
fi
#############################################################################
# Executing the workflows
#############################################################################
/informatica/Scripts/W_CallWorkflow.sh $wf_list_file_name
PMRETCODE=$?
if [ "$PMRETCODE" -ne 0 ]
then
echo "Error in $wf_name Load on `date`" >> $LOG_FILE
exit 1
fi
loop_count=`expr $loop_count - 1`
done
done<$DATA_FILE
else
echo "Source Parameter file $DATA_FILE is missing on `date`" >> $LOG_FILE
exit 1
fi
###########################################################################
# Updates the status and log file - script is ending.
###########################################################################
echo "***** Ending script $0 with no errors on `date`.\n" >> $LOG_FILE
rm $DATA_FILE
exit 0
This script requires Workflow list file name (without extension) as a parameter.
Processing Steps:
1. Read the parameter values and assign them to variables.
2. Validate that the parameter is not null or an empty string. If empty, exit.
3. Validate that the workflow list file exists and is not zero bytes. If it does not exist or is zero bytes, exit.
4. Assign a name to the restart file and the workflow list log file.
5. Read the folder name and workflow name from the .lst file.
   1. If the restart file is not zero bytes
      1. Then loop until the restarting workflow name matches the workflow list entry, and from there
         execute the workflows with the pmcmd command.
   2. Else
      1. Run the workflow with the pmcmd command.
         1. If any error occurs, create a restart file and exit.
6. Loop the previous step until all the workflows from the .lst file have been executed.
#!/bin/ksh
###########################################################################
# FileName: W_CallWorkFlow.sh
# Parameters: Parm $1 = Workflow List Filename (no extension)
#
# Purpose: Provides the ability to call the PMCMD command from the enterprise
# Scheduler or from Informatica Command Tasks.
#
# Warnings:
#
# Date: 08/28/2007
###########################################################################
########################### MODIFICATIONS LOG #############################
# Changed By Date Description
# ---------- -------- -----------
# Manish Kothari 08/15/2008 Initial Version
###########################################################################
#Include the environment file if any.
#. /scripts/xxx_env.ksh
###########################################################################
# Define Variables.
###########################################################################
DATE=`date '+ %Y-%m-%d %H:%M:%S'`
WORKFLOW_LIST_FILE=$1
WF_LIST_EXTN='lst'
WORKFLOW_LIST_DIR='informatica/WORKFLOW_LISTS/'
UNIXLOG_DIR='informatica/UNIXLOG_DIR'
INFA_REP='infarep:4400'
INFA_USER='USER_NAME'
INFA_PWD='INFA_PWD'
WF_LOG_FILE='informatica/LOGFILES_DIR'
###########################################################################
# Check if the WorkFlow List File Name is Passed as a Parameter
###########################################################################
if [ -z $WORKFLOW_LIST_FILE ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List Parameter not provided..!!!"
exit 1
fi
WORKFLOW_LIST=$WORKFLOW_LIST_DIR/$WORKFLOW_LIST_FILE.$WF_LIST_EXTN
###########################################################################
# Make sure that the WorkFlow List File is a Valid File and is
# Not Zero Bytes
###########################################################################
if [ ! -s $WORKFLOW_LIST ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List File does not exist or is Zero Bytes!!!!"
exit 1
fi
###########################################################################
# Define the Variables that will be used in the Script
###########################################################################
RESTART_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.rst
WF_LOG_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.log
RESTART_WF_FLAG=1
###########################################################################
# Check if a Re-Start File Exists. If it does it means that the script has
# started after failing on a previous run. Be Careful while modifying the
# contents of this file. If restarted, the script will start running WF's from the point of failure (POF)
###########################################################################
if [ -s $RESTART_FILE ]
then
###########################################################################
# If re-start file exists use the WF Name in the Re-start file to determine
# which Failed workflow from a previous run needs to be re-started
# Already completed WF's in a Workstream will be skipped
###########################################################################
fi
fi
fi
echo "W_CallWorkFlow: $DATE STARTING execution of Workflows $WF_NAME in $INFA_FOLDER using
$WORKFLOW_LIST_FILE" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE
#-------------------------------------------------------------------------
# Call Informatica pmcmd command with defined parameters.
#-------------------------------------------------------------------------
pmcmd startworkflow -u $INFA_USER -p $INFA_PWD -s $INFA_REP -f $INFA_FOLDER -wait
$WF_NAME
PMRETCODE=$?
if [ "$PMRETCODE" -ne 0 ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR encountered in $WF_NAME in $INFA_FOLDER \n" >>
$WF_LOG_FILE
echo "!!! W_CallWorkFlow: $DATE Restart file for this workstream is $RESTART_FILE \n" >>
$WF_LOG_FILE
###########################################################################
# In case a WorkFlow fails, the WF Name and the INFA Folder are logged into
# the re-start File. If the script starts again the WF mentioned in
# this file will be started
###########################################################################
echo "$INFA_FOLDER,$WF_NAME" > $RESTART_FILE
exit 1
fi
rm $RESTART_FILE
done< $WORKFLOW_LIST
if [ -f $RESTART_FILE ]
then
echo "!!! Problem either in Restart File or Workflow List. Please make sure WorkFlow Names are
correct in both Places" >> $WF_LOG_FILE
exit 1
fi
sqlplus -s user/password@connection_string <<END
execute UPD_WORKER_ATTR_FLAG;
exit;
END
INFORMATICA TRANSFORMATIONS
New features of INFORMATICA 9 compared to INFORMATICA 8.6
Informatica 9 empowers line-of-business managers and business analysts to identify bad data and fix it
faster. Architecture-wise there are no differences between Informatica 8 and 9, but there are some new
features added in PowerCenter 9.
Informatica Administrator
The PowerCenter Administration Console has been renamed the Informatica Administrator.
The Informatica Administrator is now a core service in the Informatica Domain that is used to configure and
manage all Informatica Services, Security and other domain objects (such as connections) used by the
new services.
The Informatica Administrator has a new interface. Some of the properties and configuration tasks from the
PowerCenter Administration Console have been moved to different locations in the Informatica Administrator.
The Informatica Administrator is expanded to include new services and objects.
Cache Update in Lookup Transformation
You can update the lookup cache based on the results of an expression. When an expression is true, you
can add to or update the lookup cache. You can update the dynamic lookup cache with the results of an
expression.
Database deadlock resilience
In previous releases, when the Integration Service encountered a database deadlock during a lookup, the
session failed. Effective in 9.0, the session will not fail. When a deadlock occurs, the Integration Service
attempts to run the last statement in a lookup. You can configure the number of retry attempts and time
period between attempts.
Multiple rows return
Lookups can now be configured as an active transformation to return multiple rows. We can configure the
Lookup transformation to return all rows that match a lookup condition. A Lookup transformation is an
active transformation when it can return more than one row for any given input row.
Limit the Session Log
You can limit the size of session logs for real-time sessions. You can limit the size by time or by file size. You
can also limit the number of log files for a session.
Auto-commit
We can enable auto-commit for each database connection. Each SQL statement in a query defines a
transaction. A commit occurs when the SQL statement completes or the next statement is executed,
whichever comes first.
Passive transformation
We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL
transformation runs in passive mode, the SQL transformation returns one output row for each input row.
Connection management
Database connections are centralized in the domain. We can create and view database connections in
Informatica Administrator, Informatica Developer, or Informatica Analyst. Create, view, edit, and grant
permissions on database connections in Informatica Administrator.
Monitoring
We can monitor profile jobs, scorecard jobs, preview jobs, mapping jobs, and SQL Data Services for each
Data Integration Service. View the status of each monitored object on the Monitoring tab of Informatica
Administrator.
Deployment
We can deploy, enable, and configure deployment units in the Informatica Administrator. Deploy
Deployment units to one or more Data Integration Services. Create deployment units in Informatica
Developer.
Model Repository Service
Application service that manages the Model repository. The Model repository is a relational database that
stores the metadata for projects created in Informatica Analyst and Informatica Developer. The Model
repository also stores run-time and configuration information for applications deployed to a Data Integration Service.
Data Integration Service
Application service that processes requests from Informatica Analyst and Informatica Developer to preview
or run data profiles and mappings. It also generates data previews for SQL data services and runs SQL
queries against the virtual views in an SQL data service. Create and enable a Data Integration Service on
the Domain tab of Informatica Administrator.
XML Parser
The XML Parser transformation can validate an XML document against a schema. The XML Parser
transformation routes invalid XML to an error port. When the XML is not valid, the XML Parser transformation
routes the XML and the error messages to a separate output group that we can connect to a target.
Enforcement of licensing restrictions
PowerCenter will enforce the licensing restrictions based on the number of CPUs and repositories.
Also Informatica 9 supports data integration for the cloud as well as on premise. You can integrate the
data in cloud applications, as well as run Informatica 9 on cloud infrastructure.
Informatica Transformations
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set
of transformations that perform specific functions. For example, an Aggregator transformation performs
calculations on groups of data.
Transformations can be of two types:
1. Active Transformation
An active transformation can change the number of rows that pass through the transformation,
change the transaction boundary, can change the row type. For example, Filter, Transaction
Control and Update Strategy are active transformations.
The key point is to note that Designer does not allow you to connect multiple active
transformations or an active and a passive transformation to the same downstream transformation
or transformation input group because the Integration Service may not be able to concatenate
the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is
an exception to this rule. A SGT does not receive data. It generates unique numeric values. As a
result, the Integration Service does not encounter problems concatenating rows passed by a SGT
and an active transformation.
2. Passive Transformation.
A passive transformation does not change the number of rows that pass through it, maintains the
transaction boundary, and maintains the row type.
The key point is to note that Designer allows you to connect multiple transformations to the same
downstream transformation or transformation input group only if all transformations in the upstream
branches are passive. The transformation that originates the branch can be active or passive.
Transformations can be Connected or UnConnected to the data flow.
3. Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
4. UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called
within another transformation, and returns a value to that transformation.
Aggregator Transformation
Aggregator transformation performs aggregate functions like average, sum, count, etc. on multiple rows or
groups. The Integration Service performs these calculations as it reads, and stores group and row data
in an aggregate cache. It is an Active & Connected transformation.
Custom Transformation
It works with procedures you create outside the Designer interface to extend PowerCenter functionality and
calls a procedure from a shared library or DLL. It is of active/passive & connected type.
You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.
Custom transformation allows you to develop the transformation logic in a procedure. Some of the
PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom
transformations, such as blocking rules, also apply to transformations built using Custom transformations.
PowerCenter provides two sets of functions called generated and API functions. The Integration Service
uses generated functions to interface with the procedure. When you create a Custom transformation and
generate the source code files, the Designer includes the generated functions in the files. Use the API
functions in the procedure code to develop the transformation logic.
Difference between Custom and External Procedure Transformation? In a Custom transformation, input and output
functions occur separately. The Integration Service passes the input data to the procedure using an input
function. The output function is a separate function that you must enter in the procedure code to pass
output data to the Integration Service. In contrast, in the External Procedure transformation, an external
procedure function does both input and output, and its parameters consist of all the ports of the
transformation.
Expression Transformation
Passive & Connected. are used to perform non-aggregate functions, i.e to calculate values in a single row.
Example: to calculate discount of each product or to concatenate first and last names or to convert date
to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping Designer.
Components: Transformation, Ports, Properties, Metadata Extensions.
External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer
interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the
COM layer of windows and bind it to external procedure transformation. To get this kind of extensibility, use
the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an
experienced programmer to use TX and use multi-threaded code in external procedures.
Filter Transformation
Active & Connected. It allows rows that meet the specified filter condition and removes the rows that do
not meet the condition. For example, to find all the employees who are working in NewYork or to find out
all the faculty member teaching Chemistry in a state. The input ports for the filter must come from a single
transformation. You cannot concatenate ports from more than one transformation into the Filter
transformation. Components: Transformation, Ports, Properties, Metadata Extensions.
HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With
an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve
data from or post data to the target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM. Methods: GET, POST and SIMPLE POST.
Java Transformation
Active or Passive & Connected. It provides a simple native programming interface to define transformation
functionality with the Java programming language. You can use the Java transformation to quickly define
simple or moderately complex transformation functionality without advanced knowledge of the Java
programming language or an external Java development environment.
Joiner Transformation
Active & Connected. It is used to join data from two related heterogeneous sources residing in different
locations or to join data from the same source. In order to join two sources, there must be at least one
pair of matching columns between the sources, and you must specify one source as the master and the
other as the detail. For example: to join a flat file and a relational source, to join two flat files, or to join a
relational source and an XML source.
The Joiner transformation supports the following types of joins:
1. Normal
Normal join discards all the rows of data from the master and detail source that do not match,
based on the condition.
2. Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all the rows
from the detail source and the matching rows from the master source.
3. Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
4. Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy
transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly
before the Joiner transformation.
Lookup Transformation
Default Passive (can be configured active) & Connected or UnConnected. It is used to look up data in a
flat file, relational table, view, or synonym. It compares lookup transformation ports (input ports) to the
source column values based on the lookup condition. Later returned values can be passed to other
transformations. You can create a lookup definition from a source qualifier and can also use multiple
Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example,
the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example,
retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.
Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
Normalizer Transformation
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-
occurring groups of columns in each source row and returns a row for each instance of the multiple-
occurring data. It is used mainly with COBOL sources where most of the time data is stored in de-
normalized format.
You can create the following Normalizer transformations:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation
for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM
mainframe.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational
tables or flat files. This is default when you create a normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.
Rank Transformation
Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest
or smallest numeric value in a port or group or to return the strings at the top or the bottom of a session sort
order. For example, to select top 10 Regions where the sales volume was very high or to select 10 lowest
priced products. As an active transformation, it might change the number of rows passed through it. For example, if
you pass 100 rows to the Rank transformation but select to rank only the top 10 rows, only 10 rows pass from the Rank
transformation to the next transformation. You can connect ports from only one transformation to the Rank
transformation. You can also create local variables and write non-aggregate expressions.
Router Transformation
Active & Connected. It is similar to filter transformation because both allow you to apply a condition to test
data. The only difference is, filter transformation drops the data that do not meet the condition whereas
router has an option to capture the data that do not meet the condition and route it to a default output
group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a
mapping instead of creating multiple Filter transformations to perform the same task. The Router
transformation is more efficient.
Sorter Transformation
Active & Connected transformation. It is used sort data either in ascending or descending order according
to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and
specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping,
you specify one or more ports as a sort key and configure each sort key port to sort in ascending or
descending order.
SQL Transformation
Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a
pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database
connection information to the SQL transformation as input data at run time. The transformation processes
external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the
query and returns rows and database errors.
Union Transformation
Active & Connected. The Union transformation is a multiple input group transformation that you use to
merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from
multiple sources similar to the UNION ALL SQL statement to combine the results from two or more SQL
statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows.
Rules
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale
must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add
another transformation such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union
transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.
Informatica Lookups
Lookup Cache
Problem:
For non-cached lookups, Informatica queries the database once for every record coming from the
source. There is an impact in terms of time and resources: if there are 2
million rows from the source qualifier, Informatica hits the database 2 million times with the same
query.
Solution:
When a lookup is cached, Informatica queries the database once, brings the whole set of rows to the
Informatica server and stores them in a cache file. When this lookup is called the next time, Informatica uses
the cached file. As a result, Informatica saves the time and the resources needed to hit the database
again.
When to cache a lookup?
As a general rule, we will use lookup cache when the following condition is satisfied:
N>>M
N is the number of records from the source
M is the number of records retrieved from the lookup
Note: Remember to implement database index on the columns used in the lookup condition to
provide better performance in non-cached lookups.
When a persistent cache is used, the same cache file is reused in all consecutive runs, saving the time
of building the cache file. However, the lookup data might change and then the cache must be
refreshed by either deleting the cache file or checking the option “Re-cache from lookup source”.
When using a reusable lookup in multiple mappings, we will have one mapping with the “Re-cache”
option enabled while the others keep the “Re-cache” option disabled. Whenever the cache
needs to be refreshed, we just need to run the first mapping.
Note: Take into account that it is necessary to ensure data integrity in long-running ETL processes when
the underlying tables change frequently. Furthermore, Informatica PowerCenter is not able to create
cache files larger than 2 GB. If a cache file exceeds 2 GB, Informatica will create multiple cache files,
and using multiple files will decrease performance. Hence, we might consider joining the lookup
source table in the database instead.
Unconnected lookup
Problem:
Imagine a mapping with 1,000,000 records retrieved from the Source Qualifier, where a country value has to be
looked up only for the rows in which it is missing.
For instance, an Expression transformation will contain a port with the following expression:
IIF (ISNULL (COUNTRY),
:LKP.LKP_COUNTRY (EMPLOYEE_ID), COUNTRY)
If the COUNTRY is null, then the lookup named LKP_COUNTRY is called with the parameter
EMPLOYEE_ID.
The Lookup transformation has EMPLOYEE_ID as its input port and COUNTRY as its return port.
To sum up, it is possible to enhance Informatica lookups by using different sets of configurations in order to increase performance as well as save resources and time. However, before applying any of the mentioned features, the tables and SQL queries involved need to be analyzed.
Thread Statistics
1. Thread statistics reveal important information about how the session ran and how the reader, writer and transformation threads were utilized.
2. The busy percentage of each thread is published by Informatica at the end of the session log.
3. By adding partition points judiciously and re-running the session, we can slowly zero in on the transformation bottleneck.
4. The number of reader and writer threads cannot be increased, but through partition points we can increase the number of transformation threads.
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ] has completed: Total Run Time = [858.151535] secs, Total Idle Time = [842.536136] secs, Busy Percentage = [1.819655].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ] has completed: Total Run Time = [857.485609] secs, Total Idle Time = [0.485609] secs, Busy Percentage = [100].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [Target Tables] has completed: Total Run Time = [1573.351240] secs, Total Idle Time = [1523.193522] secs, Busy Percentage = [3.187954].
Bottleneck
1. If the transformation thread is 100% busy, the performance bottleneck lies in the transformations.
2. If the reader or writer thread shows 100%, there is a bottleneck in reading data from the source or writing data to the target.
3. 100% is only relative. Always focus on the thread with the greatest busy percentage among the reader, transformation and writer threads.
4. If the busy percentage of all threads is less than 50%, there may not be much of a bottleneck in the mapping.
1. Aggregator Transformation
You can use the following guidelines to optimize the performance of an Aggregator transformation.
1. Use Sorted Input to decrease the use of aggregate caches:
The Sorted Input option reduces the amount of data cached during the session and improves session
performance. Use this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to
the Aggregator transformation.
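As an illustration, the SQL override in the Source Qualifier can sort the data on the group-by ports so the Aggregator can use Sorted Input. This is only a sketch; the table and column names below are illustrative, not taken from a specific mapping in this handbook.
-- Feed the Aggregator data already sorted on its group-by port (DEPTNO)
SELECT DEPTNO, SAL
FROM EMP
ORDER BY DEPTNO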
2. Filter Transformation
To maximize session performance, keep the Filter transformation as close as possible to the
sources in the mapping. Rather than passing rows that you plan to discard through the mapping, you
can filter out unwanted data early in the flow of data from sources to targets.
3. Joiner Transformation
3. Designate as the master source the source with the smaller number of records:
For optimal performance and disk storage, designate the master source as the source with the
lower number of rows. With a smaller master source, the data cache is smaller, and the search time is
shorter.
4. LookUp Transformation
Use the following tips when you configure the Lookup transformation:
If a Lookup transformation specifies several conditions, you can improve lookup performance
by placing all the conditions that use the equality operator first in the list of conditions that appear
under the Condition tab.
8. Unselect the lookup caching option in the Lookup transformation if there is no lookup override. This improves the performance of the session.
1. Provide the join condition in the Source Qualifier transformation itself as far as possible. Use a Joiner transformation only if it is compulsory (for example, for heterogeneous sources).
2. Use functions in Source Qualifier transformation itself as far as possible (in SQL Override.)
3. Don’t bring all the columns into the Source Qualifier transformation. Take only the necessary
columns and delete all unwanted columns.
4. Too many joins in Source Qualifier can reduce the performance. Take the base table and the first
level of parents into one join condition, base table and next level of parents into another and so
on. Similarly, there can be multiple data flows, which can either insert or insert as well as update.
5. Better to use the sorted ports in Source Qualifier Transformation to avoid the Sorter transformation.
10. Minimize aggregate function calls: SUM(A+B) will perform better than SUM(A)+SUM(B)
11. If you are using Aggregator & Lookup transformations, try to use Lookup after aggregation.
12. Don’t bring all the columns into the look up transformation. Take only the necessary columns and
delete all unwanted columns.
13. Using more lookups reduces performance. If there are 2 lookups, try to club them into one using a SQL override (see the sketch after this list).
14. Use the Reusable Lookups on Dimensions for getting the Keys
15. Cache lookup rows if the number of rows in the lookup table is significantly less than the typical
number of source rows.
16. Share caches if several lookups are based on the same data set
18. If you use a Filter transformation in mapping, keep it as close as possible to the sources in mapping
and before the Aggregator transformation
20. Try to use proper index for the columns used in where conditions while searching.
22. Be careful while selecting the bulk load option. If bulk load is used, disable all constraints in pre-
session and enable them in post-session. Ensure that the mapping does not allow null, duplicates,
etc...
23. As far as possible try to convert procedures (functions) into informatica transformations.
24. Do not create multiple groups in Router (Like Error, Insert, Update etc), Try to Utilize the Default
Group.
25. Don't take two instances of the target table for insert/Update. Use Update Strategy Transformation
to achieve the same.
26. In case of joiners ensure that smaller tables are used as master tables
27. Configure the sorted input to the Joiner transformation to improve the session performance.
28. Use operators instead of functions since the Informatica Server reads expressions written with
operators faster than those with functions. For example, use the || operator instead of the
CONCAT () function.
29. Make sure data types are consistent across the mapping from source to target.
30. If you are using the bulk load option, increase the commit interval.
31. Check the source queries at the back end while developing the mapping, and store all queries used during development separately so that they can be reused during unit testing, which saves time.
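For tip 13, here is a sketch of what a combined lookup override might look like: the two lookup sources are joined in one query so a single Lookup transformation can return both values. The table and column names (CUSTOMERS, REGIONS, REGION_ID and so on) are hypothetical.
-- One lookup override returning columns from two lookup tables
SELECT C.CUSTOMER_ID,
       C.CUSTOMER_NAME,
       R.REGION_NAME
FROM CUSTOMERS C, REGIONS R
WHERE C.REGION_ID = R.REGION_ID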
SESSION LOGS
Information that resides in a session log:
- Allocation of system shared memory
- Execution of Pre-session commands/ Post-session commands
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Start/End timings for target loading
- Errors encountered during the session
- Load summary of Reader/Writer/ DTM statistics
Other Information
- By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits. The numbers following a thread name indicate the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number
Log File Codes
Error Codes Description
BR - Related to reader process, including ERP, relational and flat file.
CMN - Related to database, memory allocation
DBGR - Related to debugger
EP- External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows shows the no of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (Without
Error)
(c) Rejected rows show the no of rows the writer could not apply to the target
(d) Affected rows shows the no of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping
(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped
Tracing Levels
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped, summarized session details (not at the level of individual rows).
Terse - Initialization information as well as error messages, and notification of rejected data.
Verbose Init - In addition to normal tracing: names of index and data files used, and detailed transformation statistics.
Verbose Data - In addition to Verbose Init: each row that passes into the mapping, plus detailed transformation statistics.
NOTE
When you enter a tracing level in the session property sheet, you override the tracing levels configured for transformations in the mapping.
Session Failures and Recovering Sessions
Two types of errors occur in the server:
- Non-Fatal
- Fatal
(a) Non-Fatal Errors
A non-fatal error does not force the session to stop on its first occurrence. Establish the error threshold in the session property sheet with the 'Stop on' option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into a NOT NULL field, and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.
(b) Fatal Errors
A fatal error occurs when the server cannot access the source, target or repository. This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer or Sequence Generator transformations, the server cannot update the sequence values in the repository, and a fatal error occurs.
(c) Others
Use of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error.
Stopping the server using pmcmd (or) the Server Manager.
Performing Recovery
- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The server then reads all sources again and starts processing from the next row ID.
- By default, Perform Recovery is disabled in the setup, so the server does not make entries in the OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
- The normal reject-loading process can also be done during session recovery.
- The performance of recovery might be low if:
o The mapping contains mapping variables
External loaders can load data from files faster than SQL commands can insert the same data into the database.
Method:
When a session uses an external loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader. The control file has a “.ctl” extension, and you can view the file in $PMTargetFileDir.
For using an External Loader, the following must be done:
- Configure an external loader connection in the Server Manager.
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in the session property sheet.
Issues with the External Loader:
- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging
- Code page requirements
- The server can use multiple external loaders within one session (for example, a session with two target files, one using the Oracle external loader and another using the Sybase external loader).
Other Information:
- External loader performance depends upon the platform of the server.
- The server loads data at different stages of the session.
- The server writes external loader initialization and completion messages in the session log. However, details about external loader performance are written to the external loader log, which is stored in the same target directory.
- If the session contains errors, the server continues the external loader process. If the session fails, the server loads partial target data using the external loader.
- The external loader creates a reject file for data rejected by the database. The reject file has a “.ldr” extension.
- The external loader saves the reject file in the target file directory.
- You can load corrected data from this file using the database reject loader, not through the Informatica reject load utility (for external loader reject files only).
Configuring EL in session
- In the server manager, open the session property sheet
- Select File target, and then click flat file options
Caches
- The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a mapping.
- The server stores key values in the index cache and output values in the data cache; if the server requires more memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most circumstances, deletes the cache files.
Cache storage overflow:
- Normalizer
- Pre/Post session stored procedure
- Target definitions
- XML source definitions
Types of Mapplets:
(a) Active Mapplets - Contain one or more active transformations
(b) Passive Mapplets - Contain only passive transformations
A copied mapplet is not an instance of the original mapplet; if you make changes to the original, the copy does not inherit your changes.
You can use a single mapplet more than once in a mapping.
Ports
Default value for I/P port - NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values
Session Parameters
These parameters represent values you might want to change between sessions, such as a database connection or a source file.
We can use a session parameter in a session property sheet and then define the parameter in a session parameter file.
The user-defined session parameters are:
(a) DB connection
(b) Source file directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like $DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again.
NOTE:
You can use several parameters together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session.
Session Parameter File
- A parameter file is created with a text editor.
- In it, we specify the folder and session name, then list the parameters and variables used in the session and assign each a value.
- Save the parameter file in any directory and load it to the server.
- We can define the following values in a parameter file:
o Mapping parameters
o Mapping variables
o Session parameters
- You can include parameter and variable information for more than one session in a single parameter file by creating separate sections for each session within the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.
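A minimal sketch of a session parameter file, assuming hypothetical folder, workflow and session names and illustrative values:
[MyFolder.WF:wf_daily_load.ST:s_m_load_customers]
$DBConnectionSource=TransDB1
$InputFile1=/data/src/customers.dat
$$LastRunTime=01/01/2011 00:00:00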
Locale
The Informatica server can transform character data in two modes:
(a) ASCII
a. The default mode
b. Passes 7-bit, US-ASCII character data
(b) UNICODE
a. Passes 8-bit and multibyte character data
b. It uses 2 bytes for each character to move data and performs additional checks at the session level to ensure data integrity.
Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page based on the type of character data in the mappings.
Compatibility between code pages is essential for accurate data movement.
The various code page components are:
- Operating system locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System locale - system default
(b) User locale - settings for date, time and display
(c) Input locale
Mapping Parameters and Variables
These represent values in mappings and mapplets.
If we declare mapping parameters and variables in a mapping, we can reuse the mapping by altering the parameter and variable values in the session. This reduces the overhead of creating multiple mappings when only certain attributes of a mapping need to be changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The server saves the value of a mapping variable to the repository at the end of each successful run and uses that value the next time you run the session.
Mapping objects:
Source, Target, Transformation, Cubes, Dimension
Debugger
We can run the Debugger in two situations:
(a) Before a session: after saving the mapping, we can run some initial tests.
(b) After a session: the real debugging process.
Metadata Reporter:
- A web-based application that allows you to run reports against repository metadata.
- Reports include executed sessions, lookup table dependencies, mappings and source/target schemas.
Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets and mappings.
(b) Local Repository
a. A local repository is any repository within a domain that is not the global repository. Use the local repository for development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE:
- Once you create a global repository, you cannot change it to a local repository.
- However, you can promote a local repository to a global repository.
Batches
- Provide a way to group sessions for either serial or parallel execution by the server.
- Batches:
o Sequential (runs sessions one after another)
o Concurrent (runs sessions at the same time)
Nesting Batches
Each batch can contain any number of sessions or batches. We can nest batches several levels deep, defining batches within batches.
Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently.
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default. However, we can configure a batched session to run on its own schedule by selecting the “Use Absolute Time Session” option.
Server Behavior
A server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if “Previous completes” and that previous session fails.
Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch so that the Informatica server runs them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session only if the previous session completes successfully
(b) Always run the session (this is the default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the sessions compared to running them separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running in a particular order, place them, just like sessions, into a sequential batch.
Stopping and aborting a session
- If the session you want to stop is part of a batch, you must stop the batch.
Mapping:
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations/expressions.
Session:
- Use concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging areas.
- Tune session parameters.
System:
- Improve network speed.
- Use multiple Informatica servers on separate systems.
- Reduce paging.
Example Walkthrough
1. Go to the Mappings tab, click the Parameters and Variables tab, and create a new port as below.
Click the Properties tab. In the Source Filter area, enter the following expression:
UPDATEDATETIME (or any date column from the source) >= '$$LastRunTime'
AND
UPDATEDATETIME < '$$$SessStartTime'
iif(isnull(AgedDate),to_date('1/1/1900','MM/DD/YYYY'),trunc(AgedDate,'DAY'))
Second, drag and drop these two columns into the Update Strategy transformation.
Check the value coming from the source (EMPID_IN) against the column in the target table (EMPID). If both are equal, the record already exists in the target, so we update the record (DD_UPDATE); otherwise we insert the record coming from the source into the target (DD_INSERT). See below for the Update Strategy expression.
Note: The Update Strategy expression should always be based on the primary keys of the target table.
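For illustration only, the same insert-versus-update decision can be written as an Oracle MERGE; the table and column names (EMP_TARGET, EMP_SOURCE, EMPID, ENAME, SAL) are hypothetical, and this SQL is not part of the mapping itself.
-- Update when the primary key already exists in the target, otherwise insert
MERGE INTO EMP_TARGET T
USING EMP_SOURCE S
ON (T.EMPID = S.EMPID)
WHEN MATCHED THEN
  UPDATE SET T.ENAME = S.ENAME, T.SAL = S.SAL
WHEN NOT MATCHED THEN
  INSERT (EMPID, ENAME, SAL) VALUES (S.EMPID, S.ENAME, S.SAL);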
6. EXPRESSION TRANSFORMATION
7. FILTER CONDITION
To pass only NOT NULL and non-space values through the transformation:
IIF ( ISNULL(LENGTH(RTRIM(LTRIM(ADSLTN)))),0,LENGTH(RTRIM(LTRIM(ADSLTN))))>0
iif(isnull(USER_NAME),FALSE,TRUE)
SCENARIOS
1. Using the indirect method we can load files with the same structure. How do we load the file name into the database?
Input files
File1.txt
Andrew|PRES|Addline1|NJ|USA
Samy|NPRS|Addline1|NY|USA
File2.txt
Bharti|PRES|Addline1|KAR|INDIA
Ajay|PRES|Addline1|RAJ|INDIA
Bhawna|NPRS|Addline1|TN|INDIA
In the database we want to load the file name as well:
File Name Name Type Address Line State Country
File1.txt Andrew PRES Addline1 NJ USA
File1.txt Samy NPRS Addline1 NY USA
2. How to separate the duplicates into one target and the unique records into another target?
1|Piyush|Patra|
2|Somendra|Mohanthy
3|Santhosh|bishoyi
1|Piyush|Patra|
2|Somendra|Mohanthy
O/P
File1
1|Piyush|Patra|
2|Somendra|Mohanthy
File2
3|Santhosh|bishoyi
Solution:
This can be done with the help of an Aggregator.
Group by the columns on which you want to decide duplicate or unique:
Port Expression Group by
ID Yes
FName Yes
LName Yes
Count count(ID)
In the Router, the condition for duplicates is Count > 1 (and for unique, Count = 1).
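The same split can be sketched in SQL, which is also a handy way to unit-test the mapping. The table name SRC_NAMES is hypothetical; the columns follow the sample data above.
-- Rows whose full combination appears more than once (go to the 'duplicate' target)
SELECT ID, FNAME, LNAME FROM SRC_NAMES GROUP BY ID, FNAME, LNAME HAVING COUNT(*) > 1;
-- Rows that appear exactly once (go to the 'unique' target)
SELECT ID, FNAME, LNAME FROM SRC_NAMES GROUP BY ID, FNAME, LNAME HAVING COUNT(*) = 1;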
Sequence Generator
Use this to generate the record number from 1 to 4
Set following properties
Expression:
Use this to get the next value from Sequence generator
Router:
Use this to redirect output to 4 targets based on the group property
Lalsingh Bharti
Poornaya Cherukmala
Rajeev TK
Solution:
Option 1:
You can assign a serial number to the source data, then group by the serial number and write to the target.
Expression:
Use an Expression transformation to assign the same serial number to first_name and last_name.
In the data mart, when records are loaded on the same date at 4:00 PM, the records should be loaded as below:
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00 01-01-2011 14:00:10
Solution:
If you use a static lookup, then for Srini Reddy only one record will be loaded, because the lookup will return only one value.
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00
Solution1:
Using SQL Query
Select a.Family_ID, a.Person_Id, a.Person_name, (a.Earning/b.Tot_Sal)*100 as cont_To_Family
from (Select sum(Earning) as Tot_Sal, Family_ID from family_income group by Family_ID) b, family_income a
where a.Family_ID = b.Family_ID
Solution2:
This can be done with the help of a Joiner and an Aggregator.
Port Expression Group by
Family_id1 Yes
Sal
Tot_sal Sum(sal)
Port Master/Detail
Family_id Detail
Tot_Sal Detail
Person_id Master
Family_id Master
Person_name Master
Sal Master
Use the join condition:
Family_id1 = Family_id
Use an Expression transformation to get the calculation:
Port Expression
Contribution (Sal/Tot_Sal)*100
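An alternative sketch for Solution1, using an Oracle analytic function on the same family_income table, avoids the explicit self-join:
-- Each person's earning as a percentage of the family total
SELECT FAMILY_ID,
       PERSON_ID,
       PERSON_NAME,
       (EARNING / SUM(EARNING) OVER (PARTITION BY FAMILY_ID)) * 100 AS CONT_TO_FAMILY
FROM FAMILY_INCOME;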
I/P
ID Month Sales
1 Jan 100
1 Feb 120
1 March 135
2 Jan 110
2 Feb 130
2 March 120
O/P
ID Jan Feb March
1 100 120 135
2 110 130 120
Select
ID,
Max (decode (Month, 'Jan', Sales)) as Jan,
Max (decode (Month, 'Feb', Sales)) as Feb,
Max (decode (Month, 'March', Sales)) as March
From (Select ID, Sales, Month from Normalized)
group by ID
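Assuming Oracle 11g or later, the same result can also be sketched with the PIVOT clause over the same Normalized table:
-- Pivot the month rows into columns
SELECT *
FROM (SELECT ID, MONTH, SALES FROM NORMALIZED)
PIVOT (MAX(SALES) FOR MONTH IN ('Jan' AS JAN, 'Feb' AS FEB, 'March' AS MARCH))
ORDER BY ID;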
Solution2:
8. Data Scenario
When multiple records come from the source after joining with another table for a single input.
Sample Logic:
1. The logic has been used to find a single valid crcust number from the source.
3. The crcust number will be pulled from the CUSTMR table by a normal join with the TRAN table based on the fields gtkey, gtcmpy and gtbrch.
4. If multiple records come from the CUSTMR table for a single combination of gtkey, gtcmpy and gtbrch, then we can do a lookup on the CUSTOMER table based on the crcust number from CUSTMR, where outlet_status in the CUSTOMER table should be '1' or spaces; if we get only one crcust number for a single combination, then we can use that valid crcust number.
5. If we get only one crcust number from the CUSTOMER table, then we can process that crcust number; if we get multiple crcust numbers, then we have to use Filter, Sorter and Aggregator transformations to get the valid crcust number, without taking max or min over the multiple records in the CUSTOMER table.
The following query has been used to retrieve the source records from the first source table (AS400 environment):
Select
tran.gtdvsn,
tran.gtamnt,
tran.gtpsdt,
tran.gtpsmt,
tran.gtpsyr,
tran.gtlact,
tran.gtbrch
From
Npc.tran Tran
Where
tran.gtcmpy = 300
Source data:
The following source query has been used to retrieve the second source table's records (AS400 environment). When there is a single input from the TRAN table, the CUSTMR table populates multiple records (normal join).
SELECT
CUSTMR.CRCUST as CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC,
A.COUNT
FROM
NMC.CUSTMR , (SELECT COUNT (*) COUNT,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
GROUP BY
CUSTMR.CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3),
CUSTMR.CRPBRC) A
WHERE
CUSTMR.CRPCUS=A.CRPCUS and
SUBSTR (CUSTMR.CREXT1, 1, 3) =A.CREXT1 AND
CUSTMR.CRPBRC=A.CRPBRC AND
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRCUST IN ('0045907','0014150')
The detail outer join on the CUSTOMER table below has been used to get the valid crcust number when we get multiple crcust numbers after the normal join.
The master table CUSTMR is joined with the detail table CUSTOMER based on the CRCUST field (detail outer join).
SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM
CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0014150','0045907')
AR_NUM
0014150
The valid crcust number '0014150' will be processed and populated to the target.
We will now look at the source data set where multiple records are found in the CUSTOMER table, and where we should not use max or min on the crcust number.
Source data set:
SELECT
TRAN.GTDVSN,
TRAN.GTAMNT,
TRAN.GTPSDT,
TRAN.GTPSMT,
TRAN.GTPSYR,
TRAN.GTLACT,
SUBSTR (DIGITS (TRAN.GTKEY), 1, 7) AS GTKEY,
DIGITS (TRAN.GTCMPY) AS GTCMPY,
TRAN.GTBRCH
FROM
NMP.TRAN TRAN
WHERE
TRAN.GTCMPY = 300
AND TRAN.GTAMNT IN (1030.5, 558.75, 728)
SELECT CUSTMR.CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRPCUS= '0006078'
SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM
CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0001877','0002392','0041271')
AR_NUM
0001877
0041271
The crcust numbers '0001877' and '0041271' are valid among those three crcust numbers, but the mapping should populate only one crcust number among these two valid crcust numbers.
The logic below has been used to select one valid crcust number.
Filter transformation:
The inputs to the Filter transformation come from the Joiner transformation that performed the detail outer join between the master table CUSTMR and the detail table CUSTOMER based on the crcust number.
Filter condition:
COUNT_CRCUST=1 represents the records from the CUSTMR table that have only one valid crcust number for a single combination of the gtkey, gtcmpy and gtbrch fields; that crcust number may not be present in the CUSTOMER table.
COUNT_CRCUST<>1 AND NOT ISNULL (AR_NUM) represents the records that have more than one crcust number for a single combination of gtkey, gtcmpy and gtbrch, where the AR_NUM from the CUSTOMER table is not null (meaning the multiple crcust numbers are also present in the CUSTOMER table).
The Filter transformation thus keeps only the records that have valid crcust numbers.
Sorter Transformation:
The Sorter transformation is used to sort the records in descending order based on crcust number, crpcus, crext1 and crpbrc. In the next Aggregator transformation we are going to take the min of the outlet_status field, which would otherwise lose the multiple records; that is why we sort the records first. Up to this Sorter transformation we have processed the records from the CUSTOMER table with all kinds of outlet_status.
Aggregator transformation:
The Aggregator transformation is used to eliminate the crcust numbers whose outlet_status is not '1' or spaces, and to group the crcust numbers by the source fields crpcus, crext1 and crpbrc.
The outlet_status is transformed to '1' when outlet_status is '1' or spaces; otherwise it is transformed to '9'. From this we take the min outlet_status, so that only the records whose outlet_status is '1' or spaces are considered for the target.
Then we can join these records with the source table TRAN to get the valid crcust number.
This Joiner transformation joins the records from both inputs: the Aggregator transformation and the TRAN Source Qualifier transformation.
Using this logic, we can find a single crcust number for a single combination of inputs from the source TRAN.
6. Data Scenario
When more than two consecutive Joiner transformations have to be used in the mapping, and the source tables for those joins belong to the same database.
Sample Logic:
1. The tables used in the logic are SALES, PURCXREF and SALES_INDEX. All these tables belong to the same database.
2. The fields to the target from the source are historical_business_unit, item_code, order_qty, ship_date, grp_code and event_order_ind.
3. This logic is applicable only for the records where SALES.historical_business_unit is 'BH' or 'BW'.
5. If the join to PURCXREF is successful, then perform the required join to the SALES_INDEX table where PURCXREF.ITEM_OD = SALES_INDEX.ITEM_ID.
6. If the join to PURCXREF is not successful, then perform the required join to SALES where SALES_INDEX.ITEM_ID = SALES.ITEM_ID.
7. The logic used for item_code is the concatenation of style_num, color_code, attribution_code and size_num from SALES_INDEX, whether or not the join to PURCXREF is successful.
8. The logic used for order_qty is SUM(PURCXREF.ZCOMP_QTY * SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is successful, and SUM(SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is not successful.
The SQL query below is used to reduce the number of Joiners in the mapping.
Select
S.HISTORICAL_BUSINESS_UNIT,
S.ORIGINAL_ORDER_UNITS * PCX.ZCOMP_QTY
ELSE
S.ORIGINAL_ORDER_UNITS
END),0) ORDER_QTY,
S. SHIP_DATE,
S.GRP_CDE,
S.EVENT_ORDER_IND
FROM
OPC.SALES S
p.af_grid,
p.zcmp_grid,
p.zcomp_qty,
p.material,
p.component,
p.zitemid,
p.compitemid
FROM OPC.PURCXREF p
GROUP BY p.af_grid,
p.zcmp_grid,
p.zcomp_qty,
p.material,
p.component,
p.zitemid,
p.compitemid) PCX
ON S.ITEM_ID = PCX.ZITEMID
S.ITEM_ID
GROUP BY
S.HISTORICAL_BUSINESS_UNIT,
S.REQUESTED_SHIP_DATE,
S.GRP_CDE,
S.EVENT_ORDER_IND,
9. How to populate the 1st record to the 1st target, the 2nd record to the 2nd target, the 3rd record to the 3rd target and the 4th record back to the 1st target through Informatica?
We can do this using a Sequence Generator by setting End Value = 3 and enabling the Cycle option. Then in the Router take 3 groups:
In the 1st group, specify the condition seq next value = 1 and pass those records to the 1st target; similarly
In the 2nd group, specify the condition seq next value = 2 and pass those records to the 2nd target;
In the 3rd group, specify the condition seq next value = 3 and pass those records to the 3rd target.
Since we have enabled the Cycle option, after reaching the End Value the Sequence Generator starts again from 1, so for the 4th record seq next value is 1 and it goes to the 1st target.
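For comparison, the same 1,2,3,1,2,3 tagging can be sketched in SQL; SRC_TABLE is a hypothetical source, and the Router conditions would then test TARGET_NO instead of the sequence value:
-- Assign target numbers 1,2,3 in a repeating cycle
SELECT S.*, MOD(ROWNUM - 1, 3) + 1 AS TARGET_NO
FROM SRC_TABLE S;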
In order to generate the target file names from the mapping, we should make use of the special "FileName"
port in the target file. You can't create this special port from the usual New port button. There is a special
button with label "F" on it to the right most corner of the target flat file when viewed in "Target Designer".
When you have different sets of input data with different target files created, use the same instance, but
with a Transaction Control transformation which defines the boundary for the source sets.
In the target flat file there is an option on the Columns tab, i.e. "FileName" as a column; when you click it, a non-editable column gets created in the metadata of the target.
In the Transaction Control transformation, give the condition as IIF(NOT ISNULL(emp_no), TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION).
Map the emp_no column to the target's FileName column.
Your mapping will look like this:
Source -> Source Qualifier -> Transaction Control -> Target
Run it, and separate files will be created, named by Ename.
Target:
Ename EmpNo
Stev methew 100
John tom 101
Lookup (Dynamic):
Condition: EMPNO=IN_EMPNO
Click on “Output Old Value On Update “
Filter:
Filter Condition: NewLookupRow=2
Expression:
V_ENAME=LKP_ENAME || ' ' || IN_ENAME
O_ENAME= V_ENAME
Expression:
V_ENAME=IIF (EMPNO = V_PREV_EMPNO, V_PREV_ENAME || ' ' || ENAME)
O_ENAME= V_ENAME
V_PREV_ENAME= ENAME
V_PREV_EMPNO= EMPNO
Filter:
Filter Condition: O_ENAME != ''
One more solution: see mapping M_PG124, but it works only for two records.
12. How to send unique (distinct) records into one target and duplicates into another target?
M_125_12
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2:
Ename EmpNo
Stev 100
Lookup (Dynamic):
Condition: EmpNo= IN_EmpNo
Router:
If it is the first occurrence (new row), send it to Target_1.
Group filter condition: NewLookupRow = 1
Sorter:
Sort EmpNo column as Ascending
Expression:
V_Seq_ID = 1
V_EmpNo = IIF(EmpNo != V_Prev_EmpNo, 1, V_Seq_ID + 1)
V_Prev_EmpNo = EmpNo
O_EmpNo = V_EmpNo
Router:
If it is the first occurrence, send it to Target_1.
Group filter condition: O_EmpNo = 1
13. How to process multiple flat files into a single target table through Informatica if all the files have the same structure?
We can process all flat files through one mapping and one session using a list file.
First we need to create a list file using a Unix script for all the flat files; the extension of the list file is .LST. The list file contains only the flat file names.
At the session level we need to set the source file directory as the list file path, the source file name as the list file name, and the file type as Indirect.
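A list (.LST) file is just a plain text file with one source file name per line, for example (paths are hypothetical):
/data/src/File1.txt
/data/src/File2.txt
/data/src/File3.txt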
14. How to populate the file name into the target while loading multiple files using the list file concept?
In Informatica 8.6, select the "Add Currently Processed Flat File Name" option in the Properties tab of the source definition (after importing the source file definition in the Source Analyzer). It adds a new column, "Currently Processed File Name"; we can map this column to the target to populate the file name.
15. If we want to run 2 workflows one after another, how do we set the dependency between the workflows?
1. If both workflows exist in the same folder, we can create 2 worklets rather than 2 workflows.
2. Then we can call these 2 worklets in one workflow.
3. There we can set the dependency.
4. If the workflows exist in different folders or repositories, then we cannot create worklets.
5. We can set the dependency between these two workflows using a shell script, which is one approach.
6. The other approach is event wait and event raise.
If the workflows exist in different folders or different repositories, we can use the approaches below.
Using a shell script:
1. As soon as the first workflow completes, we create a zero-byte file (indicator file).
2. If the indicator file is available in the particular location, we run the second workflow.
3. If the indicator file is not available, we wait for 5 minutes and check for the indicator again. We continue the loop 5 times, i.e. 30 minutes.
4. After 30 minutes, if the file does not exist, we send out an email notification.
Event wait and event raise approach:
We can put an Event Wait before the actual session run in the workflow to wait for an indicator file; if the file is available, the session runs, otherwise the Event Wait waits indefinitely until the indicator file is available.
6. I have 1000 records in my source table and the same in the target, but a new column "batchno" has been added in the target. This column should hold 10 for the first 100 records, 20 for the next 100 records, 30 for the next 100 records, and so on. How do we achieve this? M_test_seq
First, the target table should have a primary key (unique key); then use an Update Strategy transformation to update the target.
Use either a Sequence Generator or a variable as a counter.
Option 1:
In the Sequence Generator:
Give Start Value = 1
End Value = 100
Check the CYCLE property.
Give the following condition in an Expression transformation:
O_SEQUENCE = NEXTVAL
v_COUNT = IIF(O_SEQUENCE = 1, v_COUNT + 10, v_COUNT)
7. How can we call a stored procedure at the session level under Target Post SQL?
We can call procedures in an anonymous block within Pre SQL / Post SQL using the following syntax:
begin procedure_name()\; end\;
The data is ordered based on applied_date. When the latest record's status is FALSE for a trade_id, the values of the immediately previous record should be passed to the target.
Required output:
200 53.98 TRUE EUR 10/25/2013 19:37
Solution1:
Sorter:
TRADE_ID=Ascending
STATUS= Ascending
Aggregator:
TRADE_ID=GroupBy
O_AMOUNT = IIF(LAST(STATUS) = 'FALSE', LAST(AMOUNT, STATUS != 'FALSE'), LAST(AMOUNT))
Solution2:
SQ:
Source Filter=LATEST_RECORD.STATUS='TRUE'
Aggregator:
TRADE_ID=GroupBy
9. I have an input column of data type string that contains alphanumeric and special characters; I want only the numbers [0-9] from it in my output port. m_130_9
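This is not the mapping m_130_9 referenced above, but as a database-side sketch the same result can be obtained in Oracle with REGEXP_REPLACE (the table and column names are hypothetical); Informatica's REG_REPLACE function can be used the same way in an Expression transformation.
-- Strip everything except the digits 0-9
SELECT REGEXP_REPLACE(IN_COL, '[^0-9]', '') AS DIGITS_ONLY
FROM SRC_TABLE;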
10. Input flat file1 has 5 columns and flat file2 has 3 columns (no common column); the output should contain the merged data (8 columns). How to achieve this?
Take the first file's 5 columns to an Expression transformation and add an output column, say 'A'. Create a mapping variable, say 'countA', of datatype integer. In port 'A' put the expression SETCOUNTVARIABLE(countA).
Take the second file's 3 columns to an Expression transformation and add an output column, say 'B'. Create a mapping variable, say 'countB', of datatype integer. In port 'B' put the expression SETCOUNTVARIABLE(countB).
The above two steps are only meant for creating common fields with common data in the two pipelines.
Now join the two pipelines in a Joiner transformation on the condition A = B.
Then connect the required 8 ports to the target.
11. How will you restrict values to 0-9, A-Z, a-z and special characters? Only these characters are allowed; otherwise we reject the record. What function do we use to restrict them?
IIF(REG_MATCH(in_String, '^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$'), 'Valid', 'Invalid')
This REG_MATCH expression is usually used to validate email addresses.
TARGET
3 SWATHI
4 KEERTHI
5 SANTHOSE
6 VASU
Here, as the source metadata is the same, we can use a Union transformation; after that use an Aggregator transformation where you group by NO and take count(NO); after that place a Filter transformation with the condition count = 1, then connect it to the target.
The flow is as follows:
src ---> sq ---> union ---> agg ---> filter ---> trg
13. Below
Source
name sal
aaaa 2000
bbbb 3000
abcd 5000
Target
name sal
aaaa 2000
bbbb 3000
abcd 5000
total 10000 --- as a new row
SRC ==> EXP ==> AGGR ==> UNION ==> TRG
SRC ==========================> UNION
In the AGGR take one output port for the name field with the constant value 'total' and sum(sal) as the sal field; after that, UNION it with the other pipeline.
14. We have 20 records in the source system. When we run for the 1st time, it should load only 10 records into the target; when we run for the second time it should load the next 10 records which were not loaded. How do we do that? Can we write a SQL query in the Source Qualifier to do it? M_PG130_14
SQ:
NOTE: $$Var1 and $$Var2 are mapping parameters with initial values 1 and 10 respectively.
After completion of the first run, the parameters are increased to 11 and 20 respectively (i.e. the initial values of these parameters are then set to 11 and 20).
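A sketch of a Source Qualifier override that uses these parameters; SRC_TABLE is a hypothetical source table:
-- First run: rows 1-10, second run: rows 11-20
SELECT *
FROM (SELECT S.*, ROWNUM RN FROM SRC_TABLE S)
WHERE RN BETWEEN $$Var1 AND $$Var2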
15. I have a workflow like wf --> s1 --> s2 --> s3 --> s4: first start s1, then s2, then s3; here my session s3 has to run 3 times before s4 starts.
Option 1: wf --> s1 --> s2 --> s3 --> Instance_of_s3 --> Instance_of_s3 --> s4
It can also be achieved through an Assignment task: set a variable '$$Var = $$Var + 1', check 'MOD($$Var, 3) = 0' in the link condition, and make '$$Var' a persistent variable.
Option 2: Create s3 in another workflow and call it with the pmcmd command. Or write a shell script that calls session s1 and, after successful completion, s2; then in a loop call s3 three times and then s4.
Ramya female
Ram male
deesha female
david male
kumar male
male female
ram ramya
david deesha
kumar
Router: Filter on the basis of gender. All records with gender male will go into EXP1 and the records with gender female will go into EXP2.
In EXP1 and EXP2, create a variable dummy port (Seq_Var) with value Seq_Var+1 and an output port
Seq_Out (value-Seq_Var).
Pass the name and sequence ports from EXP1 and EXP2 to SRT1 and SRT2 (sorter) respectively.
In Joiner, use 'Full Outer Join' and keep 'Sorted Input' checked in properties.
Pass the male name values and female name values from joiner to EXP3 (Gather all the data). Pass all the
data to Target.
17. We have 30 workflows in 3 batches: batch 1 = workflows 1-10, batch 2 = workflows 10-20, batch 3 = workflows 20-30. Batch 1 must complete first; after that, batch 2 and batch 3 can run concurrently. How can you do this in Unix?
Write three shell scripts, one for each batch of workflows, using the pmcmd command to invoke the workflows. Write another shell script that calls the first batch script and then, once it completes, schedules the next batch scripts (for example via crontab).
18. How to display session logs for particular dates? If I want to display session logs for one week from a particular date, how can I do it without using Unix?
Open the session properties, go to the Config Object tab, then the Log Options settings; look for 'Save session log by' and select the option 'Session timestamp'.
19. How to add a header with the output field names to the target flat file?
When we use a flat file target, in the Mapping tab of the session click the flat file target; there is a Header Options property where we select 'Output field names'.
20. What is a test load plan? Let us assume 10 records in the source; how many rows will be loaded into the target?
No rows will actually be loaded into the target, because this is only a test.
The Integration Service reads, transforms and processes the number of source records set at the session level (Number of Rows to Test) after the test load option is enabled, but it does not commit them to the target.
21. How do we schedule a workflow in Informatica to run three times a day, e.g. at 3 am, 5 am and 4 pm?
In the Workflow Designer > Workflows > Schedulers, you can set 3 different daily schedulers, each with its own execution time.
22. Can we insert, update and delete in target tables with one Update Strategy transformation?
Yes, we can do insert, update, delete and also reject using a single Update Strategy:
IIF(somevalue = xx, DD_INSERT, IIF(somevalue = yy, DD_UPDATE, IIF(somevalue = zz, DD_DELETE, DD_REJECT)))
23. If our source contains 1 terabyte of data, what should we keep in mind while loading the data into the target?
1 TB is a huge amount of data, so a normal load type takes a lot of time. To overcome this, it is better to use bulk loading; large commit intervals also give good performance. Another technique is external loading for databases like Netezza and Teradata.
Use the partition option: it works for both relational and file source systems.
If the source is a file: instead of loading a huge amount of data at a time, we can split the source file using Unix, extract the data from all the source files using the indirect file type, and load it into the target.
24. Can we insert duplicate data with a dynamic lookup cache? If yes, why, and if no, why?
Duplicate data cannot be inserted using a dynamic lookup cache, because the dynamic lookup cache performs update and insert operations based on the keys it gets from the target table.
25. If there are 3 workflows running and the 1st workflow fails, how could we start the 2nd workflow; or if the 2nd workflow fails, how could we start the 3rd workflow?
Option 1: Use a worklet for each workflow, create a workflow with these worklets and create the dependency.
Option 2: Use a scheduling tool such as Control-M or Autosys to create the dependency among workflows.
tom 8
and I want to load into the target the ram record 3 times, the sam record 5 times and the tom record 8 times.
Option 1:
SQ --> SQL Transformation --> Target
In the SQL transformation: INSERT INTO TABLE (c1) SELECT ?p_name? FROM dual CONNECT BY LEVEL <= ?p_occurs?
Option 2:
SQ --> Stored Procedure Transformation --> Target
Stored procedure: call a stored procedure within the pipeline, SP_INSERT(p_name, p_occurs); within the procedure use p_occurs as the loop counter and insert the record inside the loop.
27. SOURCE
TARGET:
I have a source and a target. How do I validate the data? Tell me the difference in data between the tables above.
There are many ways to check this:
1. Do a column-level check: SRC.ID = TGT.ID but the other columns do not match on data.
2. You can do a MINUS check, (SRC MINUS TGT) and (TGT MINUS SRC), for a particular day's load.
3. You can do a referential integrity check by checking for any ID in SRC but not in TGT, or any ID in TGT but not in SRC.
28. We have 10 records in the source; good records go to the relational target and bad records go to the target bad file. If there are any bad records we need to send an email; if there are no bad records, no email is needed.
Option 1:
Pre-session command: remove the existing bad file from infa_shared/BadFiles/XXXXX.txt
In the post-session command of the session you can call a shell script which checks the bad file for non-zero records and sends the email accordingly.
Option 2:
In the link condition, if $<session_name>.TgtFailedRows > 0 then call an Email task.
30. How do we generate surrogate keys using a dynamic lookup? Can we use it for an SCD Type 2 mapping, and why?
Bring the surrogate key in as a lookup port and set its associated port to Sequence-ID.
When the Integration Service updates rows in the lookup cache, it uses the primary key (PK_PRIMARYKEY) values for rows in the cache and the target table. The Integration Service uses the Sequence-ID to generate a primary key for a customer that it does not find in the cache. The Integration Service inserts the primary key value into the lookup cache and returns the value to the lookup/output port.
src -> sorter -> exp -> agg -> tgt
Sorter:
Select col1 as the sort key.
Exp:
var1 = IIF(var2 = col1, var1 || ',' || col2, col2)
var2 = col1
output_port = var1
Agg:
Group by col1.
Source Qualifier --> Normalizer --> Expr --> Agg --> Target
Take a Normalizer transformation. Create a normalized port named "detail" with occurrence 4. Connect the input ports from the Source Qualifier to each detail port in the Normalizer. Next take an Expression transformation; in it, create an output port named month, and in the expression editor write the logic as
Reason 2: The Union transformation is derived from the Custom transformation, which is an active transformation.
City Gender No
chennai male 40
chennai female 35
bangalore male 25
bangalore female 25
mumbai female 15
chennai 40 35
bangalore 25 25
mumbai 15
Solution3:
Agg:
O_MALE = SUM(DECODE(GENDER, 'male', NO, 0))
O_FEMALE = SUM(DECODE(GENDER, 'female', NO, 0))
month total
jan 1500
feb 900
mar 1200
apr 1500
Solution1:
Using UNION ALL you can achieve it; here is the query:
Solution2:
AMOUNT YEAR QUARTER
1000 2003 first
2000 2003 second
3000 2003 third
4000 2003 fourth
5000 2004 first
6000 2004 second
7000 2004 third
8000 2004 fourth
37. If I have 10 records in my source and we use a Router transformation with the conditions i > 2, i = 5 and i < 2 in different groups, what is the output in the targets?
Router groups are evaluated independently, so a record is passed to every group whose condition it satisfies; for example, a record with i = 5 satisfies both i > 2 and i = 5 and goes to both of those targets.
38. A flat .dat file has 1 lakh columns. Can I have an Excel file format as a target?
No. An Excel (97-2003) worksheet is limited to 256 columns and 65,536 rows, so it cannot hold one lakh columns, whereas a flat file can. The only option is to save it as a .CSV (comma separated values) file.
I want the output with the unique values in one column and the duplicates in another column, in the following format:
Unique Duplicate
1 1
2 2
3 3
4
5
Solution2: Using a mapping
Agg:
DEPTNO = GroupBy
O_Count = Count(DEPTNO)
Sorter:
In the properties, check Distinct.
Filter:
Filter condition = O_Count > 1
Joiner:
Condition: Dup_DEPTNO = Uniq_DEPTNO
Join type = Master Outer Join
In the properties, check Sorted Input.
41. How will you get the 1st, 3rd and 5th records in a table? What is the query in Oracle?
Display the odd records:
Select * from emp where (rowid, 1) in (select rowid, mod(rownum, 2) from emp)
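An alternative sketch using an analytic row number (assuming the emp rows are ordered by EMPNO):
-- Keep every other row, starting with the first
SELECT *
FROM (SELECT E.*, ROW_NUMBER() OVER (ORDER BY EMPNO) RN FROM EMP E)
WHERE MOD(RN, 2) = 1;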
43. To improve the performance of an Aggregator we use the Sorted Input option and put a Sorter transformation before the Aggregator. But here we are adding one more cache to our mapping, i.e. the Sorter's. So how can you argue that you are increasing performance?
Aggregation calculations take time. We can reduce that time by providing sorted input: the time taken to sort the rows and forward them from the Sorter to the Aggregator and then to downstream transformations is less than the time needed to do the aggregation calculations without sorted data, because with sorted input the Aggregator does not have to cache the entire input.
45. I have a flat file as the target in a mapping. When I try to load data into it a second time, the records already in the flat file get overwritten. I don't want to overwrite the existing records.
Note: We could do this by implementing CDC / incremental load logic if the target were relational, but this is a flat file, so even that technique alone would still overwrite it.
So what is the solution?
Double-click the session --> Mapping tab --> Target properties --> check the option 'Append if Exists'.
Theory Questions
1. What is a SQL override?
Overriding the default SQL in a Source Qualifier or Lookup to add additional logic.
10. Connected Lookup If there is no match for the lookup condition, the Informatica Server returns the
default value for all output ports. If you configure dynamic caching, the Informatica Server inserts
rows into the cache.
11. Unconnected Lookup If there is no match for the lookup condition, the Informatica Server returns
NULL.
12. Connected Lookup Pass multiple output values to another transformation. Link lookup/output ports
to another transformation.
13. Unconnected Lookup: passes one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.
14. Connected Lookup Supports user-defined default values.
15. Unconnected Lookup Does not support user-defined default values
16. If you have data coming from different sources, what transformation will you use in your Designer?
Joiner transformation
36. What is a reusable transformation? What is a mapplet? Explain the difference between them.
Reusable transformation: if you want to create a transformation that performs a common task, such as average salary in a department.
Mapplet: a reusable object that represents a set of transformations.
37. What happens when you use the delete or update or reject or insert statement in your Update Strategy?
Insert: treats all records as inserts; while inserting, if a record violates a primary key or foreign key constraint in the database, it rejects the record.
41. When you run the session, does the Debugger load the data into the target?
If you select the 'Discard target data' option, it will not load data into the target.
42. Can you use flat file and table (relational) as source together?
Yes
43. Suppose I need to separate the data for delete and insert to target depending on the condition,
which transformation you use?
Router or filter
44. What is the difference between the lookup data cache and index cache?
Index cache: contains the columns used in the lookup condition.
Data cache: contains the output columns other than the condition columns.
50. While using the Debugger, how will you find out which transformation is currently running?
The transformation that is currently running shows a moving arrow in its top left-hand corner.
52. What is a Filter Transformation? or what options you have in Filter Transformation?
The Filter transformation provides the means for filtering records in a mapping. You pass all the rows
from a source transformation through the Filter transformation, then enter a filter condition for the
transformation. All ports in a Filter transformation are input/output and only records that meet the
condition pass through the Filter transformation.
54. What are the two programs that communicate with the Informatica Server?
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica
Server:
Server Manager. A client application used to create and manage sessions and batches, and to
monitor and stop the Informatica Server. You can use information provided through the Server
Manager to troubleshoot sessions and improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches, stop the
Informatica Server, and verify if the Informatica Server is running.
56. What are the different types of Tracing Levels you have in Transformations?
Tracing levels in transformations:
Terse – Indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the level of individual records.
Normal – Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization – Includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
Verbose data – Includes all information provided with the Verbose initialization setting.
Note: By default, the tracing level for every transformation is Normal. For a slight performance boost, you can set the tracing level to Terse, writing the minimum of detail to the session log when running a session containing the transformation.
transformations to find each dimension key. You can then use the mapplet in each fact table mapping,
rather than recreate the same lookup logic in each mapping.
58. If the data source is in the form of an Excel spreadsheet, how do you use it?
PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file.
Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not
need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
· Install the Microsoft Excel ODBC driver on your system.
· Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
· Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric
data.
· Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges
display as source definitions when you import the source.
59. When do u use connected lookup and when do you use unconnected lookup?
A connected Lookup transformation is part of the mapping data flow. With connected lookups,
you can have multiple return values. That is, you can pass multiple values from the same row in the
lookup table out of the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions
Unconnected Lookups : -
An unconnected Lookup transformation exists separate from the data flow in the mapping. You
write an expression using the :LKP reference qualifier to call the lookup within another
transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly
changing dimension tables)
=> Calling the same lookup multiple times in one mapping
60. How many values does the Informatica Server return when it passes through a Connected Lookup and an Unconnected Lookup?
A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns only one value, the Return value.
61. What kind of modifications can you perform with each Transformation?
Using transformations, you can modify data in the following ways:
Task                                Transformation
Calculate a value                   Expression
Perform aggregate calculations      Aggregator
Modify text                         Expression
Filter records                      Filter, Source Qualifier
63. In the case of flat files (which come through FTP as source), what happens if the file has not arrived?
You get a fatal error, which causes the server to fail/stop the session.
11. I have two sources S1 having 100 records and S2 having 10000 records, I want to join them, using
joiner transformation. Which of these two sources (S1,S2) should be master to improve my
performance? Why?
S1 should be the master as it contains fewer records, so cache usage is reduced; S2 should be the detail.
12. I have a source and I want to generate sequence numbers using mappings in Informatica, but I don't want to use a Sequence Generator transformation. Is there any other way to do it?
Yes. Use an unconnected lookup to get the maximum key value and then increment it by 1 using an expression variable, or write a stored procedure and use a Stored Procedure transformation.
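For the stored procedure route, a minimal PL/SQL sketch (EMP_KEY_SEQ and GET_NEXT_KEY are hypothetical names) of a database function that an unconnected Stored Procedure transformation could call once per row:
CREATE SEQUENCE emp_key_seq START WITH 1 INCREMENT BY 1;

CREATE OR REPLACE FUNCTION get_next_key RETURN NUMBER AS
  v_next NUMBER;
BEGIN
  -- return the next surrogate key from the database sequence
  SELECT emp_key_seq.NEXTVAL INTO v_next FROM dual;
  RETURN v_next;
END;
/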
15. What are the contents of the cache directory in the server?
Index cache files and Data caches
20. If a Sequence Generator (with increment of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port, what value will each target get?
Each target gets values in steps of 3, i.e. the generated values are distributed across the three targets (for example 1, 4, 7...; 2, 5, 8...; 3, 6, 9...).
During a source-based commit session, the Informatica Server commits data to the target based
on the number of rows from an active source in a single pipeline. These rows are referred to as
source rows. A pipeline consists of a source qualifier and all the transformations and targets that
receive data from the source qualifier. An active source can be any of the following active
transformations:
Advanced External Procedure
Source Qualifier
Normalizer
Aggregator
Joiner
Rank
Sorter
Mapplet, if it contains one of the above transformations
Note: Although the Filter, Router, and Update Strategy transformations are active transformations,
the Informatica Server does not use them as active sources in a source-based commit session.
23. What do you know about the Informatica server architecture? Load Manager, DTM, Reader, Writer,
Transformer
o Load Manager is the first process started when the session runs. It checks for validity of mappings,
locks sessions and other objects.
o DTM process is started once the Load Manager has completed its job. It starts a thread for each
pipeline.
o Reader scans data from the specified sources.
o Writer manages the target/output data.
o Transformer performs the task specified in the mapping.
2. Sources
Place a Filter transformation after each Source Qualifier with a condition that passes no records. If the time taken is about the same as the normal run, the source is the problem.
You can also identify a source bottleneck by:
Read Test Session – copy the mapping with only the sources and Source Qualifiers, remove all other transformations, and connect it to a file target. If the performance is the same, there is a source bottleneck.
Using a database query – copy the read query directly from the session log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be tuned using optimizer hints.
Solutions:
Optimize queries using hints.
Use indexes wherever possible.
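As an illustration of the hint-based tuning mentioned above, the read query copied from the session log can be re-run with an index hint (EMP_DEPTNO_IDX is a hypothetical index name):
Ex: SQL> select /*+ INDEX(e emp_deptno_idx) */ e.empno, e.ename, e.sal
         from emp e
         where e.deptno = 10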
3. Mapping
If both source and target are OK, the problem could be in the mapping.
Add a Filter transformation before the target; if the time taken is the same, there is a mapping bottleneck.
(OR) Look at the performance counters via the performance details option in the session property sheet.
High error-row counts and high rows-in-lookup-cache counters indicate a mapping bottleneck.
Solutions:
Optimize single-pass reading.
When the object the shortcut references changes, the shortcut inherits those changes. By using a
shortcut instead of a copy, you ensure each use of the shortcut matches the original object. For
example, if you have a shortcut to a target definition, and you add a column to the definition, the
shortcut inherits the additional column.
· Scenario 1
Here is a table with a single row; in the target table the same row should be populated 10 times.
Using a Normalizer we can do it. Hint: set the Normalizer Occurs property to 10, connect the column to the 10 input ports, and take the single output port; you will get 10 rows.
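If the single-row source happens to be an Oracle table, the same multiplication can also be pushed into a Source Qualifier SQL override with a row-generator query (a sketch; SINGLE_ROW_TAB is a hypothetical table name and the trick assumes the table really contains exactly one row):
Ex: SQL> select t.* from single_row_tab t connect by level <= 10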
30. While importing the relational source definition from database, what are the metadata of source
you import?
Source name
Database location
Column names
Data types
Key constraints
31. How many ways U can update a relational source definition and what are they?
Two ways
1. Edit the definition
2. Re-import the definition
35. Can you use the mapping parameters or variables created in one mapping in another mapping?
No.
Mapping parameters or variables can be used only in the transformations of the same mapping or mapplet in which they were created.
36. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
38. What is the difference between a Joiner transformation and a Source Qualifier transformation?
You can join heterogeneous data sources with a Joiner transformation, which you cannot achieve with a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas the Joiner does not need matching keys to join two sources.
The two relational sources must come from the same database for a Source Qualifier join, whereas the Joiner can join relational sources that come from different databases.
39. In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
You cannot use a Joiner transformation in the following cases:
1. Both input pipelines originate from the same Source Qualifier transformation.
2. Both input pipelines originate from the same Normalizer transformation.
3. Both input pipelines originate from the same Joiner transformation.
4. Either input pipeline contains an Update Strategy transformation.
5. Either input pipeline contains a connected or unconnected Sequence Generator transformation.
However, you can join data from a single pipeline by selecting the Sorted Input option in the Joiner transformation.
6. What are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
8. How the informatica server sorts the string values in Rank transformation?
When the informatica server runs in the ASCII data movement mode it sorts session data using
Binary sort order. If you configure the session to use a binary sort order, the informatica server
calculates the binary value of each string and returns the specified number of rows with the
highest binary values for the string.
A Router transformation is similar to a Filter transformation because both transformations allow you
to use a condition to test data.
However, a Filter transformation tests data for one condition and drops the rows of data that do
not meet the condition. A Router transformation tests data for one or more conditions and gives
you the option to route rows of data that do not meet any of the conditions to a default output
group.
If you need to test the same input data based on multiple conditions, use a Router Transformation
in a mapping instead of creating multiple Filter transformations to perform the same task.
5. What are the types of data that pass between the Informatica Server and a stored procedure?
Three types of data:
Input/Output parameters
Return values
Status code
9. What are the basic needs to join two sources in a source qualifier?
Update Strategy constants and their numeric values:
INSERT  DD_INSERT  0
UPDATE  DD_UPDATE  1
DELETE  DD_DELETE  2
REJECT  DD_REJECT  3
12. What is the default source option for update strategy transformation?
Data driven
14. What are the options in the target session of update strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
15. What are the types of mapping wizards that are to be provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both
wizards are designed to create mappings for loading and maintaining star schemas, a series of
dimensions related to a central fact table.
Getting Started Wizard: Creates mappings to load static fact and dimension tables, as well as
slowly growing dimension tables.
Slowly Changing Dimensions Wizard: Creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.
17. What are the mappings that we use for slowly changing dimension table?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting
the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not
need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the
target. Changes are tracked in the target table by versioning the primary key and creating a
version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table
when you want to keep a full history of dimension data in the table. Version numbers and
versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves the existing data in different columns of the same row and replaces the existing data with the updates.
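For comparison, the Type 1 "overwrite" behaviour corresponds to a simple upsert at the SQL level. A minimal sketch, assuming hypothetical DIM_CUSTOMER and STG_CUSTOMER tables keyed on CUSTOMER_ID:
MERGE INTO dim_customer d
USING stg_customer s
ON (d.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET d.customer_name = s.customer_name,
             d.city          = s.city
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_name, city)
  VALUES (s.customer_id, s.customer_name, s.city);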
19. How can you recognize whether or not the newly added rows in the source get inserted into the target?
In the Type 2 mapping we have three options to recognize the newly added rows:
Version number
Flag value
Effective date range
20. What are the two types of processes that run an Informatica session?
Load Manager process: starts the session, creates the DTM process, and sends post-session email when the session completes.
The DTM process. Creates threads to initialize the session, read, write, and transform data, and
handle pre- and post-session operations.
24. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica Server?
Informatica Workflow Manager and Workflow Monitor.
26. To achieve session partitioning, what are the necessary tasks you have to do?
Configure the session to partition the source data.
Install the Informatica Server on a machine with multiple CPUs.
27. How the informatica server increases the session performance through partitioning the source?
For relational sources informatica server creates multiple connections for each partition of a single
source and extracts separate range of data for each connection. Informatica server reads
multiple partitions of a single source concurrently; each partition is associated to a thread. Similarly
for loading also informatica server creates multiple connections to the target and loads partitions
of data concurrently.
For XML and file sources, informatica server reads multiple files concurrently. For loading the data
informatica server creates a separate file for each partition(of a source file).U can choose to
merge the targets.
29. What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: when you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on that server. When you configure a session, the Load Manager maintains the list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform the validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether the user has the privileges to run the session.
33. What are the output files that the Informatica Server creates while a session is running?
Informatica server log: the Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: the Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping, such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: this file contains session performance details that help you identify where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to targets.
Control file: the Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages, one for a successful session and one for a failed session.
Indicator file: if you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number indicating whether the row was marked for insert, update, delete, or reject.
Output file: if the session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.
Cache files: when the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
42. What are the different options used to configure sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.
43. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option "Always run the session".
We can start a required session on its own only in a sequential batch; in a concurrent batch we cannot do this.
46. How can you stop a batch?
By using the Workflow Monitor or pmcmd, or by forcefully aborting it.
7. What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition; you can configure the session properties to merge these target files.
Session performance can be improved by considering the following:
Network: the performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize the network traffic between components.
Flat files: if your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs.
Fewer connections: minimize the connections to sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server machine may improve session performance.
Staging areas: if you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance; use a staging area only when it is mandatory.
Informatica Servers: you can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.
Data movement mode: running the Informatica Server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte, whereas Unicode mode takes two bytes per character.
Source Qualifier: if a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
Drop constraints: if the target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop (or disable) the constraints and indexes before running the session (while loading facts and dimensions) and rebuild them after the session completes, as shown in the sketch after this list.
Parallel sessions: running parallel sessions by using concurrent batches also reduces the time taken to load the data, so concurrent batches may increase session performance.
Partitioning: partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
Incremental aggregation: in some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Transformation errors: avoid transformation errors to improve session performance. Before saving the mapping, validate it and rectify any transformation errors.
Lookup transformations: if the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. The cache improves speed by saving previously looked-up data so it does not have to be fetched again.
Filter transformations: if your session contains a Filter transformation, create it as close to the sources as possible, or use a filter condition in the Source Qualifier.
Group transformations: Aggregator, Rank, and Joiner transformations may decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports / sorted input option, i.e. sort the data before it reaches the transformation.
Packet size: we can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and configure the database connections.
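A minimal SQL sketch of the drop/rebuild approach mentioned under "Drop constraints" (SALES_FACT, FK_SALES_CUST, and IDX_SALES_CUST are hypothetical names; this assumes a non-unique index and Oracle's default SKIP_UNUSABLE_INDEXES behaviour):
-- before the load
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;
ALTER INDEX idx_sales_cust UNUSABLE;

-- run the Informatica session here

-- after the load
ALTER INDEX idx_sales_cust REBUILD;
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;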
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.
14. What is tracing level and what r the types of tracing level?
Tracing level represents the amount of information that informatica server writes in a log file.
Types of tracing level
Normal
Verbose
Verbose init
Verbose data
15. What is the difference between a Stored Procedure transformation and an External Procedure transformation?
In the case of a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside of the data source, i.e. you need to build it as a DLL to access it in your mapping, and no database connection is needed.
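A minimal PL/SQL sketch of the kind of database-side function a Stored Procedure transformation might call (GET_ANNUAL_SAL is a hypothetical name):
CREATE OR REPLACE FUNCTION get_annual_sal (p_sal IN NUMBER, p_comm IN NUMBER)
RETURN NUMBER AS
BEGIN
  -- annual salary = 12 monthly salaries plus commission (NULL commission treated as 0)
  RETURN (p_sal * 12) + NVL(p_comm, 0);
END;
/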
If you stop a session or if an error causes a session to stop, refer to the session and error logs to
determine the cause of failure. Correct the errors, and then complete the session. The method you
use to complete the session depends on the properties of the mapping, session, and Informatica
Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.
17. If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session the next time?
As explained above, the Informatica Server has three methods for completing a failed session. Use "perform recovery" to load the records from the point where the session failed.
23. What are the circumstances under which the Informatica Server results in an unrecoverable session?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.
24. If I have made modifications to my table in the back end, do they reflect in the Informatica warehouse, Mapping Designer, or Source Analyzer?
No. Informatica does not track back-end database changes; it displays only the information stored in the repository. If you want back-end changes to reflect in the Informatica screens, you have to re-import the definitions from the back end over a valid connection and replace the existing definitions with the imported ones.
25. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to the target?
No. Unless and until you join those three ports in the Source Qualifier, you cannot map them directly.
27. What are the main issues while working with flat files as sources and as targets?
We need to specify the correct path in the session, mention whether the file is 'direct' or 'indirect', and keep the file in the exact path specified in the session.
1. We cannot use SQL override; we have to use transformations for all our requirements.
2. Testing the flat files is a very tedious job.
3. The file format (source/target definition) should match exactly with the format of the data file. Most of the time, erroneous results come when the data file layout is not in sync with the actual file:
(i) Your data file may be fixed-width but the definition is delimited --> truncated data.
(ii) Your data file as well as the definition is delimited, but you specify a wrong delimiter:
(a) a delimiter other than the one present in the actual file, or
(b) a delimiter that also appears as a character in some field of the file --> wrong data again.
(iii) Not specifying the NULL character properly may result in wrong data.
(iv) There are other settings/attributes while creating the file definition about which one should be very careful.
4. If you miss the link to any column of the target, then all the data will be placed in the wrong fields, and the missed column won't exist in the target data file.
28. Explain how the Informatica server process works in relation to mapping variables.
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation, and loading. The Load Manager reads the parameters and variables related to the session, mapping, and server, and passes the mapping parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source to target.
The PowerCenter Server holds two different values for a mapping variable during a session run:
1. Start value of a mapping variable
2. Current value of a mapping variable
Start Value
The start value is the value of the variable at the start of the session. The start value can be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial value for the variable, or the default value based on the variable datatype.
The PowerCenter Server looks for the start value in the following order:
1. Value in the parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value using a variable function that you set for the variable. Unlike the start value, the current value can change as the PowerCenter Server evaluates it for each row that passes through the mapping.
3. A query to retrieve the latest records from the table, sorted by version (SCD):
Select * from dimension a
where a.version = (select max(b.version) from dimension b where a.dno = b.dno);
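An equivalent formulation with an analytic function (same DIMENSION table and DNO/VERSION columns as above) avoids the correlated subquery:
Select * from
  (select d.*, row_number() over (partition by dno order by version desc) rn
   from dimension d)
where rn = 1;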
GOOD LINKS
http://stackoverflow.com/questions/tagged/informatica
http://www.itnirvanas.com/2009/01/informatica-interview-questions-part-1.html
http://gogates.blogspot.in/2011/05/informatica-interview-questions.html
https://community.informatica.com/thread/38970
http://shan-informatica.blogspot.in/
http://www.info-etl.com/course-materials/informatica-powercenter-development-best-practices
http://informaticaramamohanreddy.blogspot.in/2012/08/final-interview-questions-etl.html
http://informaticaconcepts.blogspot.in/search/label/ScenaroBasedQuestions
http://baluinfomaticastar.blogspot.in/2011/06/dwh-material-with-informatica-material.html