Informatica Performance Tuning


Jyotheswar Kuricheti

Agenda:
1. Performance Tuning Overview
2. Identify Bottlenecks
3. Optimizing at different levels:
   Target
   Source
   Mapping
   Session
   System

Performance Tuning Overview:

What is Performance Tuning?


Goal of Performance Tuning

How do you measure performance?


Throughput
Why is it Critical?

Load time is critical to meeting the SLAs for data availability in
the reports.
How do you improve performance?

Identify bottlenecks
Eliminate bottlenecks
Use the Test Load option to check whether performance has improved
Add partitions
Change one variable at a time

Reasons for Session Performance Issues:


CPU: CPU-intensive operations, such as string manipulation inside an
Expression transformation
Memory/Disk access:
   File system read/write issues
   Paging (lookup cache, etc.) when not enough RAM is available
   Buffer blocks not being available
Network: Database and PowerCenter servers connected over a WAN
Input/output operations
Poor or overly complex design
Incorrect load strategies

Optimization can be done at different levels of Informatica:
1) Target level
2) Source level
3) Mapping level
4) Transformation level
5) Session level
6) Grid Level
7) Component level
8) System level

Identify Bottlenecks :

WRT_8165 : TIMEOUT BASED COMMIT POINT


This message commonly appears in the session log when a session has
performance issues.

It signifies that there are not enough rows available in memory to
insert and commit.
It means there is a bottleneck in the source, the target, or one of
the transformations. Identify and remove that bottleneck to improve
session performance.

Methods to identify performance bottlenecks:

1. Run test sessions
2. Analyze thread statistics
3. Analyze performance details
4. Monitor system performance

1. Run test sessions: Run a test load of a few records that reads from
a flat file source or writes to a flat file target to isolate source
and target bottlenecks. Performance is measured by the resulting
throughput.
2. Analyze thread statistics: Use thread statistics to determine the
optimal number of partition points.


3. Analyze performance details: Use performance details, such as
performance counters, to determine where session performance decreases.
Enable Collect Performance Data in the session properties.
Areas to check when repository performance is a concern
User would like to see statistics from the monitor

4. Monitor system performance: Use system monitoring tools to view the
percentage of CPU use, I/O waits, and paging to identify system
bottlenecks. Use the Workflow Monitor to view system resource usage.


2. Using thread statistics: Thread statistics are read from the session
log file. Some background on threads is needed first.
The DTM (Data Transformation Manager) creates a master thread to run a
session. For each target load order group in a mapping, the master
thread can create several threads. The types of threads depend on the
session properties and the transformations in the mapping. The number
of threads depends on the partitioning information for each target load
order group in the mapping. The thread types are:
1. Mapping threads
2. Pre- and post-session threads
3. Reader threads
4. Transformation threads
5. Writer threads
Thread analysis uses these statistics to judge mapping performance and
to identify source, target, or transformation bottlenecks.

The session log contains four entries that give details about
performance.


1. Run time: total time taken by a thread
2. Idle time: time during which the thread is idle
3. Busy percentage: (run time - idle time) / run time × 100
4. Thread work time: percentage of the thread's time taken by each
transformation in the thread
Example :
MANAGER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition
point [SQ_XXXX] has completed: Total Run Time = [576.620988] secs, Total Idle Time =
[535.601729] secs, Busy Percentage = [7.113730].
MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of
partition point [SQ_XXXX] has completed: Total Run Time = [577.301318] secs, Total Idle Time
= [0.000000] secs, Busy Percentage = [99.000000].
LKP_ADDRESS: 20.000000 percent
AGG_ADDRESS: 79.000000 percent
MANAGER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition
point(s) [TGT_XXXX] has completed: Total Run Time = [577.363934] secs, Total Idle Time =
[492.602374] secs, Busy Percentage = [14.680785].
The thread with the highest busy percentage identifies the bottleneck in the session. In this
example it is the transformation thread, where AGG_ADDRESS accounts for 79 percent of the work
time. Do not add partition points blindly to spread the work over more threads: if the CPU is
already busy with other tasks, the extra threads only put more pressure on it. A high busy
percentage can be ignored if the total run time is less than 60 seconds.
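The reading of the thread statistics above can be sketched in Python. This is a minimal illustration, not a PowerCenter utility: the function names are hypothetical, the regular expression assumes each thread summary sits on a single log line in the format shown in the example, and the busy percentage is recomputed from the run and idle times rather than taken from the log.

```python
import re

# Assumed line format, taken from the example log lines above.
THREAD_RE = re.compile(
    r"Thread \[(?P<name>\w+)\].*?"
    r"Total Run Time = \[(?P<run>[\d.]+)\] secs, "
    r"Total Idle Time = \[(?P<idle>[\d.]+)\] secs"
)


def busy_percentage(run_time, idle_time):
    """(run time - idle time) / run time x 100."""
    if run_time <= 0:
        return 0.0
    return (run_time - idle_time) / run_time * 100.0


def thread_stats(log_text):
    """Return {thread name: (run time, busy percentage)} for each summary line."""
    stats = {}
    for line in log_text.splitlines():
        match = THREAD_RE.search(line)
        if match:
            run, idle = float(match["run"]), float(match["idle"])
            stats[match["name"]] = (run, busy_percentage(run, idle))
    return stats


def likely_bottleneck(stats, min_run_time=60.0):
    """Busiest thread, skipping threads that ran for less than min_run_time seconds."""
    candidates = {name: busy for name, (run, busy) in stats.items() if run >= min_run_time}
    return max(candidates, key=candidates.get) if candidates else None


SAMPLE_LOG = (
    "MANAGER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point "
    "[SQ_XXXX] has completed: Total Run Time = [576.620988] secs, "
    "Total Idle Time = [535.601729] secs, Busy Percentage = [7.113730].\n"
    "MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of "
    "partition point [SQ_XXXX] has completed: Total Run Time = [577.301318] secs, "
    "Total Idle Time = [0.000000] secs, Busy Percentage = [99.000000].\n"
    "MANAGER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition "
    "point(s) [TGT_XXXX] has completed: Total Run Time = [577.363934] secs, "
    "Total Idle Time = [492.602374] secs, Busy Percentage = [14.680785].\n"
)

stats = thread_stats(SAMPLE_LOG)
bottleneck = likely_bottleneck(stats)
```

With the sample lines, `bottleneck` names the transformation thread, and the 60-second cutoff is applied through the `min_run_time` parameter.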

Optimizing at different levels :


(1). Target Level Optimization:

There are two target types:
1. Flat file
2. Database
Flat file: If a flat file target is slow, the cause may lie not with
the flat file target itself but with the storage space or the storage
drive it is written to.
Database: When loading into a database, consider the following points.
1. Drop indexes and key constraints
2. Increase checkpoint intervals
3. Use bulk loading
4. Use external loading
5. Minimize deadlocks
6. Increase database network packet size
7. Optimize Oracle target databases

Drop indexes and key constraints: Loading is slow on tables that have
indexes or key constraints defined. Use pre-session commands to drop
the indexes and constraints before the session loads data, and use
post-session commands to rebuild them after the load.
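The drop, load, rebuild sequence can be sketched outside PowerCenter. The example below is only an illustration that uses Python's built-in sqlite3 module as a stand-in for the target database; the table `orders`, the index `idx_orders_customer`, and the function name are hypothetical, and in a real session the DROP and CREATE statements would live in the pre-session and post-session commands.

```python
import sqlite3


def load_with_index_rebuild(conn, rows):
    """Drop the index, load the rows, then rebuild the index."""
    cur = conn.cursor()
    # Pre-session step: remove the index so inserts do not maintain it row by row.
    cur.execute("DROP INDEX IF EXISTS idx_orders_customer")
    # Load step: insert all rows into the target table.
    cur.executemany(
        "INSERT INTO orders (order_id, customer_id) VALUES (?, ?)", rows
    )
    # Post-session step: build the index once over the loaded data.
    cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

load_with_index_rebuild(conn, [(i, i % 100) for i in range(10_000)])

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('orders')")]
```

After the load, the table holds all rows and the index exists again, so queries against the target see the same schema as before the session.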
Increase checkpoint intervals: Load performance improves when the
database performs fewer checkpoints. To reduce their frequency,
increase the checkpoint interval in the database.

Use bulk loading: The Integration Service bypasses the database log,
which speeds up loading. Because the database log is bypassed,
recovery is not possible.
Use external loaders: Most databases ship with their own loader
utility, for example SQL*Loader for Oracle and the Teradata external
loaders. To increase performance further, load separate pipelines for
separate partitions.
Minimize deadlocks: Avoid writing to the same target from multiple
source systems at once; do not populate a single target through
several paths. Use different target connection groups.

Increase database network packet size: If the database is the
bottleneck, consult your DBA and try to increase the network packet
size in listener.ora and tnsnames.ora.
Optimize Oracle target databases: With the help of your DBA, increase
the storage segment sizes or make other database-level changes. Tune
the Oracle redo log in the init.ora file.


(2). Source Level Optimization:

1. Optimize the query
2. Use conditional filters
3. Increase database network packet size
Optimize the query: Join multiple sources in a single Source Qualifier
(SQ), use optimizer hints, and create indexes on the columns used in
GROUP BY and ORDER BY clauses. Configure the source database to run
parallel queries.

Use conditional filters: Filter the source data, but only after a
complete analysis of which rows can safely be excluded. Connect ports
from the SQ only if they are needed in the target.
Increase database network packet size: If the database is the
bottleneck, consult your DBA and try to increase the network packet
size in listener.ora and tnsnames.ora.


(3). Mapping Level Optimization:


Mapping level optimization is a time-consuming process. Eliminate
unwanted transformations, fields, and links. Optimize the mapping only
after source and target level optimization.
1. Optimize the flat file sources
2. Configure single-pass reading
3. Optimize simple pass-through mappings
4. Optimize filters
5. Optimize datatype conversions
6. Optimize expressions
Optimize the flat file sources: For delimited files, avoid single or
double quote characters and escape characters. For fixed-width files,
manage the sequential buffer length.
Configure single-pass reading: Consider single-pass reading if you
have multiple sessions that use the same sources. It lets you populate
multiple targets from one Source Qualifier and avoids using a Joiner
for RDBMS source tables.


Optimize simple pass-through mappings: If the mapping only passes data
from source to target, connect the Source Qualifier directly to the
target. A simple pass-through mapping created with the wizard adds an
Expression transformation between the Source Qualifier and the target.
Optimize filters: If the source is a relational table, filter in the
Source Qualifier so that rows that are not valid for the mapping are
excluded at the source.
For flat file sources, use a Filter transformation after the Source
Qualifier.
Avoid complex filter conditions; use integer or true/false conditions.
Optimize datatype conversions: Eliminate unnecessary datatype
conversions. Use integer values in the conditions of Lookup and Filter
transformations. Know the source and target datatypes before
converting data.
Optimize expressions:
Factor out common logic.
Minimize aggregate function calls. Ex: Use SUM(COLUMN_A + COLUMN_B)
instead of SUM(COLUMN_A) + SUM(COLUMN_B).
Call lookups conditionally.
Use local variables in Expression transformations.
Use operators instead of functions.
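The SUM example can be checked with a small Python sketch. It only illustrates the arithmetic, not the Informatica expression engine, and it assumes neither column contains NULLs, since NULL handling could make the two forms differ. The function names and sample rows are hypothetical.

```python
rows = [
    {"COLUMN_A": 10, "COLUMN_B": 1},
    {"COLUMN_A": 20, "COLUMN_B": 2},
    {"COLUMN_A": 30, "COLUMN_B": 3},
]


def sum_of_row_totals(rows):
    """SUM(COLUMN_A + COLUMN_B): one aggregate call over a per-row addition."""
    return sum(row["COLUMN_A"] + row["COLUMN_B"] for row in rows)


def total_of_column_sums(rows):
    """SUM(COLUMN_A) + SUM(COLUMN_B): two aggregate calls, then one addition."""
    return sum(row["COLUMN_A"] for row in rows) + sum(row["COLUMN_B"] for row in rows)


single_call = sum_of_row_totals(rows)
two_calls = total_of_column_sums(rows)
```

Both forms return the same total for these rows; the first needs one aggregate pass over the data instead of two.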

(4). Transformation Level Optimization:

Transformation level optimization can be considered part of mapping
optimization. This section covers how to handle individual
transformations more effectively.

Optimizing Aggregator transformations: Aggregators often slow
performance because they must group data before processing it.

1. Group on numeric columns instead of string and date columns.
2. Group on indexed columns.
3. Use sorted input: it reduces the amount of data cached, which
improves performance.
4. Reduce complex logic in aggregator expressions.
5. Use incremental aggregation.
6. Filter data before you aggregate.
7. Limit port connections.
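Why sorted input reduces the amount of cached data can be sketched in Python. This is an analogy, not the Aggregator's actual implementation: with unsorted input every group's running total has to be kept until the last row arrives, while with input sorted on the group key each group can be emitted as soon as the key changes. The function names and rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Keep a running total for every group until the last row has been read."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def aggregate_sorted(rows):
    """Yield each group's total when the key changes; one group is held at a time."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


rows = [("east", 5), ("west", 7), ("east", 3), ("west", 1)]
sorted_rows = sorted(rows, key=itemgetter(0))

unsorted_totals = aggregate_unsorted(rows)
sorted_totals = dict(aggregate_sorted(sorted_rows))
```

Both approaches produce the same totals; the sorted version trades a sort upstream for a much smaller working set in the aggregation step.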


Optimizing Joiner transformations: A Joiner joins data from different
sources into a single pipeline.
1. Designate the source with fewer duplicate key values as the master source.
2. Designate the source with fewer rows as the master source, because the
Joiner compares each row of the detail source against the master source.
3. Perform joins in the database where possible. Use the SQ to join relational tables.
4. Join sorted data.
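The reason for choosing the smaller source as master can be illustrated with a simple cached join in Python. This is a sketch of the general idea (cache one side, stream the other), not the Joiner's internals; the function name and the sample data are hypothetical.

```python
def join_with_master_cache(master_rows, detail_rows, key):
    """Cache the master rows by join key, then stream the detail rows past the cache."""
    cache = {}
    for row in master_rows:
        cache.setdefault(row[key], []).append(row)

    joined = []
    for detail in detail_rows:
        # Each detail row is compared only against cached master rows with the same key.
        for master in cache.get(detail[key], []):
            joined.append({**master, **detail})

    cached_row_count = sum(len(group) for group in cache.values())
    return joined, cached_row_count


departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
]
employees = [
    {"dept_id": 1, "emp": "Ann"},
    {"dept_id": 1, "emp": "Bob"},
    {"dept_id": 2, "emp": "Cy"},
    {"dept_id": 3, "emp": "Di"},
]

# The smaller source (departments) is used as the master, so only 2 rows are cached.
joined, cached_rows = join_with_master_cache(departments, employees, "dept_id")
```

Swapping the arguments would give the same matches for this inner join but would cache four rows instead of two, which is the cost the guideline tries to avoid.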

Optimizing Lookup transformations:

Cache lookup tables.
Use the appropriate cache type: static, shared, or persistent.
Enable concurrent caches: set the number of additional concurrent
pipelines to one or more.
Optimize the lookup policy on multiple match: if you choose to use any
matching value, performance can improve because the transformation
does not index on all ports; it still returns the first value that
matches the lookup condition.
Reduce the number of cached rows.
Override the ORDER BY statement.


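Optimize the lookup condition: When a lookup uses more than one
condition, order them as follows: equal to (=) first; then less than,
greater than, less than or equal to, greater than or equal to
(<, >, <=, >=); then not equal to (!=).
Filter lookup rows.
Index the columns used in the lookup condition in the lookup table.
Optimize multiple lookups.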

Optimizing Sequence Generator transformations: Create a reusable
Sequence Generator and use it in multiple mappings simultaneously.
Configure the Number of Cached Values property, which determines how
many sequence values the Integration Service caches at one time, so
that it is not too small.
Optimizing Sorter transformations: If the Integration Service cannot
allocate enough memory to sort data, it fails the session. For best
performance, configure the Sorter cache size with a value less than or
equal to the amount of available physical RAM on the Integration
Service machine. The default size is 16 MB.
Use the following formula to determine the size of incoming data:
number of input rows × (Sum(column size) + 16)
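The sizing formula above is simple arithmetic and can be worked through in Python. The sketch assumes column sizes are given in bytes and reads the formula as rows times (sum of column sizes plus 16); the function names and the sample column sizes are hypothetical.

```python
# Default Sorter cache size from the text above: 16 MB.
DEFAULT_SORTER_CACHE_BYTES = 16 * 1024 * 1024


def sorter_input_size_bytes(input_rows, column_sizes):
    """Number of input rows x (sum of column sizes + 16)."""
    return input_rows * (sum(column_sizes) + 16)


def fits_in_cache(input_rows, column_sizes, cache_bytes=DEFAULT_SORTER_CACHE_BYTES):
    """True if the estimated incoming data fits in the configured Sorter cache."""
    return sorter_input_size_bytes(input_rows, column_sizes) <= cache_bytes


# One million rows with four columns of 4, 8, 50, and 20 bytes.
needed = sorter_input_size_bytes(1_000_000, [4, 8, 50, 20])
```

For this input, `needed` is 98,000,000 bytes, well above the 16 MB default, so the cache size would have to be raised (within available physical RAM) to sort in memory.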

Optimizing Source Qualifier transformations:

Use the Select Distinct option, filter rows at the source, and tune
the query.
Optimizing SQL transformations:
Use query mode instead of script mode.
Do not use transaction statements such as COMMIT and ROLLBACK in an
SQL transformation query.
In query mode, construct a static query by using parameter binding
instead of string substitution in the SQL Editor.
Choose a static connection instead of a dynamic connection.


(5). Session Level Optimization:

1. Grid
2. Pushdown Optimization
3. Concurrent Sessions and Workflows
4. Buffer Memory
5. Caches
6. Target-Based Commit
7. Real-time Processing
8. Staging Areas
9. Log Files
10. Error Tracing
11. Post-Session Emails


1. Grid:
The Load Balancer distributes tasks to nodes without overloading any
node.
A grid can improve performance when the extract and load steps of a
session are a bottleneck, or when memory or temporary storage is a
bottleneck.
Ex: Sorter, Aggregator, Joiner (they store intermediate results)
2. Pushdown Optimization: The Integration Service executes SQL against
the source or target database instead of processing the transformation
logic within the Integration Service.
3. Concurrent Sessions and Workflows:
4. Buffer Memory: Adjust the DTM Buffer Size and the Default Buffer Block Size.


5. Caches:
Limit the Number of Connected Ports
With a 64-bit platform, the Integration Service is not limited to the 2 GB
cache limit of a 32-bit platform.
If the allocated cache is not large enough to store the data, the
Integration Service stores the data in a temporary disk file, a cache file.
Performance slows each time the Integration Service pages to a
temporary file.
The Transformation_readfromdisk or Transformation_writetodisk
counters for any Aggregator, Rank, or Joiner transformation indicate the
number of times the Integration Service pages to disk to process the
transformation.
6. Target-Based Commit:
If the commit interval is too high, the Integration Service may fill the
database log file and cause the session to fail.


7. Real-time Processing:
Increase the flush latency to improve throughput.
The source-based commit interval determines how often the Integration
Service commits real-time data to the target. To obtain the fastest
latency, set the source-based commit to 1.
8. Staging Areas:
The Integration Service can read multiple sources with a single pass,
which can reduce the need for staging areas.
9. Log Files:
Workflows and sessions always create binary logs, which can be
accessed in the Administrator tool.
10. Error Tracing:
Set the tracing level appropriately. Use Verbose to debug. Use Terse
when you do not want to log error messages for rejected data.


11. Post-Session Emails:
When you configure a post-session email to attach the session log,
configure the session to write to a log file by enabling flat file
logging.

(6). Optimizing Grid Deployments:

Add nodes to the grid.
Increase storage capacity and bandwidth.
Use shared file systems.
Use a high-throughput network.

(7). Optimizing PowerCenter Repository Performance:

Ensure the Repository Service process runs on the same machine where
the repository database resides.
Order conditions in object queries.
Use a single-node tablespace for the PowerCenter repository if you
install it on a DB2 database.
Optimize the database schema for the PowerCenter repository if you
install it on a DB2 or Microsoft SQL Server database by enabling the
Optimize Database Schema option for the Repository Service in the
Administration Console.

Optimizing Integration Service Performance:


Use native drivers instead of ODBC drivers for the Integration Service.
Run the Integration Service in ASCII data movement mode if character
data is 7-bit ASCII or EBCDIC. ASCII mode uses 1 byte to store each
character, whereas Unicode mode uses 2 bytes.
Cache PowerCenter metadata for the Repository Service.
Run the Integration Service with high availability: The Integration
Service recovers workflows and sessions that may fail because of
temporary network or machine failures. To recover a workflow or
session, the Integration Service writes the state of each workflow and
session to temporary files in a shared directory, which may decrease
performance.


(8). Optimizing the System:

Improve network speed:
   Minimize the number of network hops between the source and target
   databases and the Integration Service (IS).
   A local disk can move data 5 to 20 times faster than a network. Store
   flat file sources and targets on the IS machine.
   Move the target database to a server system if possible.
   Ask the network engineer to provide enough bandwidth.
Use multiple CPUs to run multiple sessions in parallel.
Reduce paging.
Use processor binding.
Using pipeline partitions:
After you tune the application, databases, and system for maximum
single-partition performance, you may find that the system is
under-utilized. At this point, you can configure the session to have
two or more partitions.
To improve performance, ensure the number of pipeline partitions equals
the number of database partitions.

Use the database partitioning partition type for source and target
databases. Enable parallel queries and parallel inserts.
SQ: pass-through partitioning
Filter: round-robin partitioning
Sorter: hash auto-keys partitioning. Delete the default partition
point at the Aggregator.
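The difference between round-robin and hash-key partitioning can be sketched in Python. This is an illustration of the two distribution strategies only: the hash function PowerCenter uses for hash auto-keys is internal, so `crc32` is a stand-in, and the function names and sample rows are hypothetical.

```python
from zlib import crc32


def round_robin_partition(rows, partitions):
    """Spread rows evenly across partitions regardless of their content."""
    buckets = [[] for _ in range(partitions)]
    for index, row in enumerate(rows):
        buckets[index % partitions].append(row)
    return buckets


def hash_key_partition(rows, partitions, key):
    """Send all rows with the same key value to the same partition."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        # crc32 is an assumed stand-in for the real, internal hash function.
        bucket_index = crc32(str(row[key]).encode("utf-8")) % partitions
        buckets[bucket_index].append(row)
    return buckets


rows = [{"cust": cust, "amt": amt} for amt, cust in enumerate("abcabcabca")]

rr = round_robin_partition(rows, 3)
hp = hash_key_partition(rows, 3, "cust")
```

Round-robin balances row counts, which suits a Filter; hash-key partitioning keeps each key's rows together, which is what a Sorter or Aggregator downstream needs to group correctly.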
Performance Counters:
All transformations have counters. The Integration Service tracks the
number of input rows, output rows, and error rows for each
transformation. Some transformations have additional performance
counters:
Errorrows
Readfromcache and Writetocache
Readfromdisk and Writetodisk
Rowsinlookupcache
To view the counters, right-click the session in the Workflow Monitor
and choose Properties, then click the Properties tab in the details
dialog box.
If the Readfromdisk and Writetodisk counters display any number other
than zero, you can increase the cache sizes to improve session
performance.


2014 by Author (Jyotheswar Kuricheti).


All rights reserved. No part of this document may be reproduced or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without prior written
permission of Author.
