
Insert / Update ordering in Informatica mappings

Stephen Barr, ETL-Performance.com


Does the order of inserts and updates to a target make a substantial difference to the overall performance of the mapping, and perhaps more importantly to the overall scalability of the solution? The resounding answer is YES!

Test case

My source and target tables exist in the same database but within different schemas. I've designed the data such that 50% of the rows from the source will be updates and 50% will be inserts.
SOURCE@INFADB>select count(*)
  2  from insert_update_source
  3  /

  COUNT(*)
----------
    202992

Elapsed: 00:00:01.45

SOURCE@INFADB>select action, count(*)
  2  from insert_update_source
  3  group by action
  4  /

ACTION   COUNT(*)
------ ----------
UPDATE     101496
INSERT     101496

Elapsed: 00:00:00.48

Using these sources and targets I created two mappings.

Mapping 1: interleaved inserts / updates

In this mapping, the target will receive an insert, then an update, then an insert, and so on. This has been designed to represent a worst-case scenario: the ACTION flag alternates row by row (via the decode(mod(rownum,2),...) in the source script), so the writer never receives two inserts or two updates in a row.

Mapping 2: inserts / updates routed to separate targets

In this mapping, there are two versions of the target. The inserts are routed to one target, while the updates are routed to the other. We then use the Target Load Plan to choose which one we should load first.

The scripts for creating the source and target tables are available at the bottom of this document.

Results

Overall run times:

Mapping 1    6 minutes 14 seconds
Mapping 2    2 minutes 25 seconds

As you can see, there is a massive difference in the run times between the two mappings. Obviously, something fundamental is happening in the first mapping which is making it perform so poorly, and from looking at the Oracle trace files we can see exactly what the issue is. From the trace of the target we can see the overall statistics for the insert statement from the first mapping:
INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME, OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS, TEMPORARY,GENERATED,SECONDARY,ACTION) VALUES ( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100922     26.60      27.14         46       1990     323326      101496
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100923     26.60      27.14         46       1990     323326      101496

We can see that there were 100923 executions of the insert statement, resulting in 323326 current block gets. However, if we look at the second mapping:
INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME, OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS, TEMPORARY,GENERATED,SECONDARY,ACTION) VALUES ( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    705      3.50       5.50          1       4005      28802      101496
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      706      3.50       5.50          1       4005      28802      101496

You can see that there were only 706 executions of the insert statement, with only 28802 current block gets. If we stack up the figures we can see this more starkly:

                 Map 1 insert   Map 2 insert   Map 1 update   Map 2 update
Executions             100923            706         101497         101497
CPU time                26.60           3.50          37.39          30.68
Elapsed time            27.14           5.50          50.58          42.67
Block gets             323326          28802         110023         107386

As you can see, there is a huge difference in the inserts, especially when it comes to CPU time and the number of block gets. The reason? Array inserts.

Informatica uses the native Oracle Call Interface (OCI) to communicate with the Oracle server. One of the features of the OCI interface is its ability to allow an OCI client to perform array inserts / updates. This means that for a single execution of the statement, multiple rows of data are processed. We can see this is happening because the rows / executions ratio for our insert statement is > 1. In fact, the average array size in this case works out at roughly 144 rows of data (101496 rows over 705 executions). These array operations are much more efficient than ordinary insert operations. (A short PL/SQL sketch after the trace excerpts below illustrates the same one-execution-for-many-rows mechanism.)

So why is one mapping performing array operations while the other is not? Informatica has implemented its OCI interface in a very simple, generic way. If an insert statement is received by the writer process, it will start to build an array. If another insert statement comes through, then this is simply added to the existing array. When the array is full, Informatica sends that array to Oracle for processing as a single message. However, if we are in the middle of building an array of inserts and the writer receives an update, then Informatica will send the insert array as it currently stands, followed by the update. Therefore, if we have interleaved inserts and updates, we are effectively not using arrays at all.

We can see this from the raw trace files. In mapping 1, we can see that the inserts and updates are interleaved almost perfectly:
EXEC #1:c=0,e=287,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087356343
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087356647
WAIT #1: nam='SQL*Net message from client' ela= 873 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087357719
EXEC #2:c=0,e=330,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087358483
WAIT #2: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087358803
WAIT #2: nam='SQL*Net message from client' ela= 877 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087359884
EXEC #1:c=0,e=268,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087360720
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087361027
WAIT #1: nam='SQL*Net message from client' ela= 885 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087362116
EXEC #2:c=0,e=331,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087362877
WAIT #2: nam='SQL*Net message to client' ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087363197
WAIT #2: nam='SQL*Net message from client' ela= 869 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=26087364264

EXEC #1 is our insert, EXEC #2 is our update. However, looking at the trace file from mapping 2, we can see that the operations are grouped together:
EXEC #1:c=0,e=205,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092846779
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092846876
WAIT #1: nam='SQL*Net message from client' ela= 488 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092847418
EXEC #1:c=0,e=264,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092847781
WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092847887
WAIT #1: nam='SQL*Net message from client' ela= 425 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092848366
EXEC #1:c=0,e=209,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092848672
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092848771
WAIT #1: nam='SQL*Net message from client' ela= 414 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092849238
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092849540
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092849638
WAIT #1: nam='SQL*Net message from client' ela= 403 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=28092850093
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092850387
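For illustration, here is a minimal PL/SQL sketch of the same one-execution-for-many-rows mechanism mentioned above. This is only an analogy, not what the Informatica writer actually executes: FORALL runs entirely inside the database, so it demonstrates array binding but not the per-row network round trips an OCI client also saves. It assumes the source and target tables created by the scripts at the bottom of this document.

declare
  -- a small batch of rows shaped like the target table
  type t_rows is table of insert_update_target%rowtype;
  l_rows t_rows;
begin
  select *
  bulk collect into l_rows
  from   source.insert_update_source
  where  action = 'INSERT'
  and    rownum <= 1000;

  -- row-by-row: one execution of the INSERT per row
  -- (the inline comments keep the two INSERTs as separate cursors in the trace)
  for i in 1 .. l_rows.count loop
    insert /* row-by-row */ into insert_update_target values l_rows(i);
  end loop;
  rollback;

  -- array DML: a single execution processes all 1000 rows
  forall i in 1 .. l_rows.count
    insert /* array */ into insert_update_target values l_rows(i);
  rollback;
end;
/

Tracing this block and running tkprof over the result shows the first INSERT with roughly 1000 executions and the second with a single execution for the same 1000 rows, which is exactly the executions-versus-rows pattern that separates the two mappings.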

We can see the massive difference in the traffic generated between Informatica and Oracle when comparing both mappings:

Mapping 1
Event                         Times Waited   Max. Wait   Total Waited
SQL*Net message to client           100922        0.02           0.74
SQL*Net message from client         100922        0.08          63.88

Mapping 2
Event                         Times Waited   Max. Wait   Total Waited
SQL*Net message to client              705        0.00           0.00
SQL*Net message from client            705        0.08           5.02

This is a massive reduction in the time the mapping spends communicating with the Oracle database, and if we start to scale these figures up to production volumes you can see that these sorts of issues need to be seriously considered.

Those of you with a good eye will have spotted something a bit strange: the figures for the update statement are effectively the same for both mappings. It actually looks like Informatica does not support array updates. If this is true then it seems like a glaring hole in its OCI implementation. However, if you have evidence to the contrary let me know!

Implications

This was a very contrived test on a small single-CPU box. I was using relatively small volumes and very simple structures. However, the trace files from Oracle reflect the magnitude of the difference between the two approaches, and that difference will hold true even for the biggest of systems or the most complex of mappings. It's very easy to detect whether or not you're experiencing these issues, and if you are, then the performance benefits you could glean from separating your inserts / updates from each other could be fantastic, especially given how little effort is required to make a change like this.
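For example, one rough check (a sketch, assuming Oracle 10g or later and access to V$SQL; adjust the SQL_TEXT filter to match your own target table) is to compare executions with rows processed for the writer's INSERT statement:

-- An average close to 1 row per execution suggests array inserts are not being
-- used; a large average (mapping 2 above works out at roughly 144) suggests they are.
select sql_id,
       executions,
       rows_processed,
       round(rows_processed / nullif(executions, 0), 1) as avg_rows_per_exec
from   v$sql
where  command_type = 2   -- INSERT statements
and    upper(sql_text) like 'INSERT INTO INSERT_UPDATE_TARGET%'
/

Alternatively, capture an extended SQL trace for the writer's database session (for example with dbms_monitor.session_trace_enable, passing the session's SID and SERIAL#) and run it through tkprof, as was done for the traces above.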

Scripts to create source & target tables

SOURCE

create sequence insert_update_seq;

create table insert_update_source as
select insert_update_seq.nextval,
       owner,
       object_name,
       subobject_name,
       object_id,
       data_object_id,
       object_type,
       created,
       last_ddl_time,
       timestamp,
       status,
       temporary,
       generated,
       secondary,
       decode(mod(rownum,2),0,'INSERT','UPDATE') as action
from   dba_objects
/

insert into insert_update_source
(
  select insert_update_seq.nextval,
         owner,
         object_name,
         subobject_name,
         object_id,
         data_object_id,
         object_type,
         created,
         last_ddl_time,
         timestamp,
         status,
         temporary,
         generated,
         secondary,
         decode(mod(rownum,2),0,'INSERT','UPDATE') as action
  from   insert_update_source
)
/
/
/

commit;

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_SOURCE');

TARGET

create table insert_update_target
(
  ID             NUMBER,
  OWNER          VARCHAR2(30),
  OBJECT_NAME    VARCHAR2(128),
  SUBOBJECT_NAME VARCHAR2(30),
  OBJECT_ID      NUMBER,
  DATA_OBJECT_ID NUMBER,
  OBJECT_TYPE    VARCHAR2(19),
  CREATED        DATE,
  LAST_DDL_TIME  DATE,
  TIMESTAMP      VARCHAR2(19),
  STATUS         VARCHAR2(7),
  TEMPORARY      VARCHAR2(1),
  GENERATED      VARCHAR2(1),
  SECONDARY      VARCHAR2(1),
  ACTION         VARCHAR2(6)
)
/

insert into insert_update_target
(
  select *
  from   source.insert_update_source
  where  action = 'UPDATE'
)
/

commit;

create index id_idx on insert_update_target(id);

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_TARGET', cascade => TRUE);
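As a quick sanity check (a sketch, assuming the tables above), the target should contain only the 101496 'UPDATE' rows before the session runs, and an even split of 'INSERT' and 'UPDATE' rows once it has completed:

select action, count(*)
from   insert_update_target
group  by action
/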
