
Insert / Update ordering in Informatica mappings

Stephen Barr, ETL-Performance.com


Does the order of inserts and updates to a target make a substantial difference to the overall performance of the mapping, and perhaps more importantly to the overall scalability of the solution? The resounding answer is YES!

Test case

My source and target tables exist in the same database but within different schemas. I've designed the data such that 50% of the rows from the source will be updates and 50% will be inserts.
SOURCE@INFADB>select count(*)
  2  from insert_update_source
  3  /

  COUNT(*)
----------
    202992

Elapsed: 00:00:01.45

SOURCE@INFADB>select action, count(*)
  2  from insert_update_source
  3  group by action
  4  /

ACTION   COUNT(*)
------ ----------
UPDATE     101496
INSERT     101496

Elapsed: 00:00:00.48

Using these sources and targets I created two mappings.

Mapping 1: interleaved inserts / updates

In this mapping, the target will receive an insert, then an update, then an insert, and so on. This has been designed to represent a worst-case scenario: the ACTION flag alternates row by row (via the decode(mod(rownum,2),...) in the source script), so the writer never receives two inserts or two updates in a row.

Mapping 2: inserts / updates routed to separate targets

In this mapping, there are two versions of the target. The inserts are routed to one target, while the updates are routed to the other. We then use the Target Load Plan to choose which one we should load first.

The scripts for creating the source and target tables are available at the bottom of this document.

Results

Overall run times:

Mapping 1    6 minutes 14 seconds
Mapping 2    2 minutes 25 seconds

As you can see, there is a massive difference in the run times between the two mappings. Obviously, something fundamental is happening in the first mapping which is making it perform so poorly, and from looking at the Oracle trace files we can see exactly what the issue is. From the trace of the target we can see the overall statistics for the insert statement from the first mapping:
INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME, OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS, TEMPORARY,GENERATED,SECONDARY,ACTION) VALUES ( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100922     26.60      27.14         46       1990     323326      101496
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100923     26.60      27.14         46       1990     323326      101496

We can see that there were 100923 executions of the insert statement, resulting in 323326 current block gets. However, if we look at the second mapping:
INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME, OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS, TEMPORARY,GENERATED,SECONDARY,ACTION) VALUES ( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    705      3.50       5.50          1       4005      28802      101496
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      706      3.50       5.50          1       4005      28802      101496

You can see that there were only 706 executions of the insert statement, with only 28802 current block gets. If we stack up the figures we can see this more starkly:

                 Map 1 insert   Map 2 insert   Map 1 update   Map 2 update
Executions             100923            706         101497         101497
CPU time                26.60           3.50          37.39          30.68
Elapsed time            27.14           5.50          50.58          42.67
Block gets             323326          28802         110023         107386

As you can see, there is a huge difference in the inserts, especially when it comes to CPU time and the number of block gets. The reason? Array inserts.

Informatica uses the native Oracle Call Interface (OCI) to communicate with the Oracle server. One of the features of the OCI interface is its ability to allow an OCI client to perform array inserts / updates. This means that for a single execution of the statement, multiple rows of data are processed. We can see this is happening because the rows / executions ratio for our insert statement is > 1. In fact, the average array size in this case works out at roughly 144 rows of data (101496 rows over 705 executions). These array operations are much more efficient than ordinary insert operations. (A short PL/SQL sketch after the trace excerpts below illustrates the same one-execution-for-many-rows mechanism.)

So why is one mapping performing array operations while the other is not? Informatica has implemented its OCI interface in a very simple, generic way. If an insert statement is received by the writer process, it will start to build an array. If another insert statement comes through, then this is simply added to the existing array. When the array is full, Informatica sends that array to Oracle for processing as a single message. However, if we are in the middle of building an array of inserts and the writer receives an update, then Informatica will send the insert array as it currently stands, followed by the update. Therefore, if we have interleaved inserts and updates, we are effectively not using arrays at all.

We can see this from the raw trace files. In mapping 1, we can see that the inserts and updates are interleaved almost perfectly:
EXEC #1:c=0,e=287,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087356343
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087356647
WAIT #1: nam='SQL*Net message from client' ela= 873 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087357719
EXEC #2:c=0,e=330,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087358483
WAIT #2: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087358803
WAIT #2: nam='SQL*Net message from client' ela= 877 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087359884
EXEC #1:c=0,e=268,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087360720
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087361027
WAIT #1: nam='SQL*Net message from client' ela= 885 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087362116
EXEC #2:c=0,e=331,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087362877
WAIT #2: nam='SQL*Net message to client' ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087363197
WAIT #2: nam='SQL*Net message from client' ela= 869 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087364264

EXEC #1 is our insert, EXEC #2 is our update. However, looking at the trace file from mapping 2, we can see that the operations are grouped together:
EXEC #1:c=0,e=205,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092846779
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092846876
WAIT #1: nam='SQL*Net message from client' ela= 488 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092847418
EXEC #1:c=0,e=264,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092847781
WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092847887
WAIT #1: nam='SQL*Net message from client' ela= 425 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092848366
EXEC #1:c=0,e=209,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092848672
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092848771
WAIT #1: nam='SQL*Net message from client' ela= 414 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092849238
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092849540
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092849638
WAIT #1: nam='SQL*Net message from client' ela= 403 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092850093
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092850387
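For illustration, here is a minimal PL/SQL sketch of the same one-execution-for-many-rows mechanism mentioned above. This is only an analogy, not what the Informatica writer actually executes: FORALL runs entirely inside the database, so it demonstrates array binding but not the per-row network round trips an OCI client also saves. It assumes the source and target tables created by the scripts at the bottom of this document.

declare
  -- a small batch of rows shaped like the target table
  type t_rows is table of insert_update_target%rowtype;
  l_rows t_rows;
begin
  select *
  bulk collect into l_rows
  from   source.insert_update_source
  where  action = 'INSERT'
  and    rownum <= 1000;

  -- row-by-row: one execution of the INSERT per row
  -- (the inline comments keep the two INSERTs as separate cursors in the trace)
  for i in 1 .. l_rows.count loop
    insert /* row-by-row */ into insert_update_target values l_rows(i);
  end loop;
  rollback;

  -- array DML: a single execution processes all 1000 rows
  forall i in 1 .. l_rows.count
    insert /* array */ into insert_update_target values l_rows(i);
  rollback;
end;
/

Tracing this block and running tkprof over the result shows the first INSERT with roughly 1000 executions and the second with a single execution for the same 1000 rows, which is exactly the executions-versus-rows pattern that separates the two mappings.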

We can see the massive difference in the traffic generated between Informatica and Oracle when comparing both mappings:

Mapping 1
Event                         Times Waited   Max. Wait   Total Waited
SQL*Net message to client           100922        0.02           0.74
SQL*Net message from client         100922        0.08          63.88

Mapping 2
Event                         Times Waited   Max. Wait   Total Waited
SQL*Net message to client              705        0.00           0.00
SQL*Net message from client            705        0.08           5.02

This is a massive reduction in the time the mapping spends communicating with the Oracle database, and if we start to scale these figures up to production volumes you can see that these sorts of issues need to be seriously considered.

Those of you with a good eye will have spotted something a bit strange: the figures for the update statement are effectively the same for both mappings. It actually looks like Informatica does not support array updates. If this is true then it seems like a glaring hole in its OCI implementation. However, if you have evidence to the contrary let me know!

Implications

This was a very contrived test on a small single-CPU box. I was using relatively small volumes and very simple structures. However, the trace files from Oracle reflect the magnitude of the difference between the two approaches, and that difference will hold true even for the biggest of systems or the most complex of mappings. It's very easy to detect whether or not you're experiencing these issues, and if you are, then the performance benefits you could glean from separating your inserts / updates from each other could be fantastic, especially given how little effort is required to make a change like this.
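For example, one rough check (a sketch, assuming Oracle 10g or later and access to V$SQL; adjust the SQL_TEXT filter to match your own target table) is to compare executions with rows processed for the writer's INSERT statement:

-- An average close to 1 row per execution suggests array inserts are not being
-- used; a large average (mapping 2 above works out at roughly 144) suggests they are.
select sql_id,
       executions,
       rows_processed,
       round(rows_processed / nullif(executions, 0), 1) as avg_rows_per_exec
from   v$sql
where  command_type = 2   -- INSERT statements
and    upper(sql_text) like 'INSERT INTO INSERT_UPDATE_TARGET%'
/

Alternatively, capture an extended SQL trace for the writer's database session (for example with dbms_monitor.session_trace_enable, passing the session's SID and SERIAL#) and run it through tkprof, as was done for the traces above.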

Scripts to create source & target tables

SOURCE

create sequence insert_update_seq;

create table insert_update_source as
select insert_update_seq.nextval,
       owner,
       object_name,
       subobject_name,
       object_id,
       data_object_id,
       object_type,
       created,
       last_ddl_time,
       timestamp,
       status,
       temporary,
       generated,
       secondary,
       decode(mod(rownum,2),0,'INSERT','UPDATE') as action
from   dba_objects
/

insert into insert_update_source
(
  select insert_update_seq.nextval,
         owner,
         object_name,
         subobject_name,
         object_id,
         data_object_id,
         object_type,
         created,
         last_ddl_time,
         timestamp,
         status,
         temporary,
         generated,
         secondary,
         decode(mod(rownum,2),0,'INSERT','UPDATE') as action
  from   insert_update_source
)
/
/
/

commit;

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_SOURCE');

TARGET

create table insert_update_target
(
  ID             NUMBER,
  OWNER          VARCHAR2(30),
  OBJECT_NAME    VARCHAR2(128),
  SUBOBJECT_NAME VARCHAR2(30),
  OBJECT_ID      NUMBER,
  DATA_OBJECT_ID NUMBER,
  OBJECT_TYPE    VARCHAR2(19),
  CREATED        DATE,
  LAST_DDL_TIME  DATE,
  TIMESTAMP      VARCHAR2(19),
  STATUS         VARCHAR2(7),
  TEMPORARY      VARCHAR2(1),
  GENERATED      VARCHAR2(1),
  SECONDARY      VARCHAR2(1),
  ACTION         VARCHAR2(6)
)
/

insert into insert_update_target
(
  select *
  from   source.insert_update_source
  where  action = 'UPDATE'
)
/

commit;

create index id_idx on insert_update_target(id);

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_TARGET', cascade => TRUE);
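As a quick sanity check (a sketch, assuming the tables above), the target should contain only the 101496 'UPDATE' rows before the session runs, and an even split of 'INSERT' and 'UPDATE' rows once it has completed:

select action, count(*)
from   insert_update_target
group  by action
/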
