Performance Bottlenecks
Performance bottlenecks can occur in the following areas: 1. Target 2. Source 3. Mapping 4. Session 5. System

Use the following methods to identify a performance bottleneck:

1. Run test sessions. Configure a test session to read from a flat file source and write to a flat file target to identify source and target bottlenecks.

2. Study performance details and thread statistics. Use thread statistics to identify source, target, or transformation bottlenecks. By default, the Integration Service uses one reader thread, one transformation thread, and one writer thread to process a session. The session log provides the following thread statistics:
Run time. Amount of time the thread was running.
Idle time. Amount of time the thread was idle.
Busy. Percentage of the run time the thread was not idle, calculated as (run time - idle time) / run time x 100. For example, a writer thread with a run time of 100 seconds and an idle time of 40 seconds is 60% busy.
If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports; non-string ports require more processing. You can identify which thread the Integration Service uses the most by reading the thread statistics in the session log.

3. Monitor system performance. Use system monitoring tools to view the percentage of CPU usage, I/O waits, and paging to identify system bottlenecks.

Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the Integration Service writes to a target database. To identify a target bottleneck, configure a copy of the session to write to a flat file target. If session performance increases significantly when you write to a flat file, you have a target bottleneck. If a session already writes to a flat file target, you probably do not have a target bottleneck. When the Integration Service spends more time on the writer thread than on the transformation or reader threads, you have a target bottleneck. Causes of target bottlenecks may include small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations.

Identifying Source Bottlenecks:
Performance bottlenecks can occur when the Integration Service reads from a source database. Read the thread statistics: when the Integration Service spends more time on the reader thread than on the transformation or writer threads, you have a source bottleneck. If the session reads from a flat file source, you probably do not have a source bottleneck; you can improve session performance by setting the number of bytes the Integration Service reads per line. If the session reads from a relational source, use the following methods to identify source bottlenecks:

Use a Filter transformation in the mapping to measure the time it takes to read source data. Add a Filter transformation after each Source Qualifier and set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Create a read test mapping to isolate the read query by removing the transformations in the mapping. Complete the following steps to create a read test mapping:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
5. Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.

Execute the database query directly against the source database:
1. Copy the read query directly from the session log and execute it against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX, you can send the result of the query to /dev/null.
2. Measure the query execution time and the time it takes for the query to return the first row. If there is a long delay between the two measurements, you can use an optimizer hint to eliminate the source bottleneck.
Causes of source bottlenecks may include an inefficient query or small database network packet sizes.

Identifying Mapping Bottlenecks:
If the source and target do not have a bottleneck, the mapping might. You can identify mapping bottlenecks in the following ways:
1. Add a Filter transformation before each target and set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck.
2. Use performance details. High Errorrows and Rowsinlookupcache counters indicate a mapping bottleneck.
3. To determine which transformation in the mapping is the bottleneck, add pass-through partition points to all possible transformations and read the thread statistics in the session log. When the Integration Service spends more time on one transformation thread than on the reader, writer, or other transformation threads, that transformation has a bottleneck.

Identifying Session Bottlenecks:
If the source, target, and mapping do not have a bottleneck, you may have a session bottleneck. You can identify a session bottleneck in the following ways:
1. Use performance details. Enable Collect Performance Data in the Performance settings on the session properties to display information about each transformation. All transformations have basic counters that indicate the number of input rows, output rows, and error rows.
2. Check session settings. Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks.

Identifying System Bottlenecks:
After you tune the source, target, mapping, and session, consider tuning the system. You can identify system bottlenecks by using system tools to monitor CPU usage, memory usage, and paging. The Integration Service uses system resources to process transformations, run sessions, and read and write data. The Integration Service also uses system memory for other data such as aggregate, joiner, rank, and cached lookup tables. You can use system performance monitoring tools to monitor the amount of system resources the Integration Service uses and identify system bottlenecks.

On Windows, you can use system tools in the Task Manager or the Performance Monitor (click Start > Programs > Administrative Tools and choose Performance Monitor). Monitor the following measurements:
Percent processor time. If you have more than one CPU, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.
Pages/second. If pages/second is greater than five, you may have excessive memory pressure (thrashing). You may consider adding more physical memory.
Physical disk percent time.
The percent of time that the physical disk is busy performing read or write requests. If this percentage is high, tune the PowerCenter caches so the Integration Service uses in-memory cache instead of writing to disk. If you tune the cache, requests are still in queue, and the disk busy percentage is at least 50%, add another disk device or upgrade to a faster disk device. You can also use a separate disk for each partition in the session.
Physical disk queue length. The number of users waiting for access to the same disk device. If the physical disk queue length is greater than two, you may consider adding another disk device or upgrading the disk device. You can also use separate disks for the reader, writer, and transformation threads.
Server total bytes per second. The number of bytes the server has sent to and received from the network. You can use this information to improve network bandwidth.

On UNIX, use the following tools to identify system bottlenecks:
lsattr -E -l sys0. Use this tool to view current system settings. It shows maxuproc, the maximum level of user background processes. You may consider reducing the number of background processes on the system.
iostat. Use this tool to monitor loading for every disk attached to the database server. iostat displays the percentage of time that the disk was physically active. High disk utilization suggests that you may need to add more disks. If you use disk arrays, use the utilities provided with the disk arrays instead of iostat.
vmstat or sar -w. Use these tools to monitor disk swapping. Swapping should not occur during the session. If it does, you may consider increasing physical memory or reducing the number of memory-intensive applications running on the system.
sar -u. Use this tool to monitor CPU loading. It provides percent usage for user, system, idle, and waiting time. If the percent of time spent waiting on I/O (%wio) is high, consider using other, under-utilized disks. For example, if the source data, target data, lookup, rank, and aggregate cache files are all on the same disk, consider putting them on different disks.

Once you determine the location of a performance bottleneck, use the following guidelines to eliminate it:
Eliminate source and target database bottlenecks. Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.
Eliminate mapping bottlenecks. Fine-tune the pipeline logic and transformation settings and options in mappings.
Eliminate session bottlenecks. Optimize the session strategy and use performance details to help tune the session configuration.
Eliminate system bottlenecks. Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.

Optimizing the Target:
You can optimize the following types of targets:
Flat file. If you use a shared storage directory for flat file targets, you can optimize session performance by ensuring that the shared storage directory is on a machine dedicated to storing and managing files, instead of performing other tasks. If the Integration Service runs on a single node and the session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Integration Service process node.
Relational. If the session writes to a relational target, perform the following tasks to increase performance:

Drop indexes and key constraints. When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running the session and rebuild them after the session completes. If you drop and rebuild indexes and key constraints on a regular basis, you can use the following methods to perform these operations each time you run the session: use pre-load and post-load stored procedures, or use pre-session and post-session SQL commands.
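For example, a minimal sketch of such pre-session and post-session SQL commands might look like the following. The table, index, and constraint names (T_SALES, IDX_SALES_CUST, PK_SALES) are hypothetical, and the exact DDL syntax depends on the target database:

    -- Pre-session SQL command: drop the key constraint and index before the load (hypothetical names)
    ALTER TABLE T_SALES DROP CONSTRAINT PK_SALES;
    DROP INDEX IDX_SALES_CUST;

    -- Post-session SQL command: rebuild the key constraint and index after the load completes
    ALTER TABLE T_SALES ADD CONSTRAINT PK_SALES PRIMARY KEY (SALE_ID);
    CREATE INDEX IDX_SALES_CUST ON T_SALES (CUST_ID);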
Note: To optimize performance, use constraint-based loading only if necessary.

Increase checkpoint intervals. The Integration Service performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.

Use bulk loading. Bulk loading can improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. Configure bulk loading in the session properties. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback; as a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session. When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to increase performance, because Microsoft SQL Server and Oracle start a new bulk load transaction after each commit.

Use external loading. You can use an external loader to increase session performance. If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE external loaders to bulk load target files. The DB2 EE external loader uses the Integration Service db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader utility. If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. To use the Teradata external loader utility, set up attributes such as Error Limit, Tenacity, MaxSessions, and Sleep to optimize performance. If the target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a pipeline with multiple partitions, you can increase performance if you create the Oracle target table with the same number of partitions you use for the pipeline. If the target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If the Sybase IQ database is local to the Integration Service process on the UNIX system, you can increase performance by loading data to target tables directly from named pipes. If you run the Integration Service on a grid, configure the Integration Service to check resources, make Sybase IQ a resource, make the resource available on all nodes of the grid, and then, in the Workflow Manager, assign the Sybase IQ resource to the applicable sessions.

Minimize deadlocks. If the Integration Service encounters a deadlock when it tries to write to a target, the deadlock only affects targets in the same target connection group; the Integration Service still writes to targets in other target connection groups. Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the Integration Service uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. You can specify the same connection information for each connection name.

Increase database network packet size. The default packet size is 4096 bytes. If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve performance by increasing the network packet size, which allows larger packets of data to cross the network at one time. Increase the network packet size based on the database you write to:
Oracle.
You can increase the database server network packet size in listener.ora and tnsnames.ora. Consult your database documentation for additional information about increasing the packet size, if necessary.
Sybase ASE and Microsoft SQL Server. Consult your database documentation for information about how to increase the packet size. For Sybase ASE or Microsoft SQL Server, you must also change the packet size in the relational connection object in the Workflow Manager to reflect the database server packet size.

Optimize Oracle target databases. If the target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback or undo segments. When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks. When you write to Oracle databases, the database uses rollback or undo segments during loads. Ask the Oracle database administrator to ensure that the database stores rollback or undo segments in appropriate tablespaces, preferably on different disks, and that the rollback or undo segments have appropriate storage clauses. You can also optimize the Oracle database by tuning the Oracle redo log. The Oracle database uses the redo log to log loading operations; make sure the redo log size and buffer size are optimal. You can view redo log properties in the init.ora file. If the Integration Service runs on a single node and the Oracle instance is local to the Integration Service process node, you can optimize performance by using the IPC protocol to connect to the Oracle database. You can set up the Oracle database connection in listener.ora and tnsnames.ora.

Optimizing the Source:
If the session reads from a relational source, review the following suggestions for improving performance:

Optimize the query.
1. Tune the query to return rows faster or create indexes for queries that contain ORDER BY or GROUP BY clauses.
2. If a session joins multiple source tables in one Source Qualifier, you can improve performance by optimizing the query with optimizer hints. The DBA can analyze the query and then create optimizer hints and indexes for the source tables to tell the database how to execute the query for a particular set of source tables.
3. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization such as adding indexes.
4. The query that the Integration Service uses to read data appears in the session log; you can also find it in the Source Qualifier transformation. Use optimizer hints if there is a long delay between when the query begins executing and when PowerCenter receives the first row of data. Configure optimizer hints to begin returning rows as quickly as possible, rather than returning all rows at once. This allows the Integration Service to process rows in parallel with the query execution.
5. Once you optimize the query, use the SQL override option to take full advantage of these modifications.
6. You can also configure the source database to run parallel queries to improve performance.

Use conditional filters. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance, because a simple source filter on the source database can sometimes negatively impact performance due to the lack of indexes. Whether this helps depends on the session. For example, if multiple sessions read from the same source simultaneously, the PowerCenter conditional filter may improve performance. However, some sessions may perform faster if you filter the source data on the source database. Test the session with both the database filter and the PowerCenter filter to determine which method improves performance.

Increase database network packet size. You can improve the performance of a source database by increasing the network packet size, which allows larger packets of data to cross the network at one time.

Connect to Oracle databases using the IPC protocol. You can use the IPC protocol to connect to the Oracle database to improve performance if you run the Integration Service on a single node and the Oracle instance is local to the Integration Service process node. You can set up an Oracle database connection in listener.ora and tnsnames.ora.
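To illustrate points 1 and 4 above, a DBA might add an index that supports the ORDER BY or GROUP BY columns and use an optimizer hint so the query begins returning rows quickly. This is only a sketch: the ORDERS table, its columns, and the index name are hypothetical, and the FIRST_ROWS hint shown is Oracle-specific:

    -- Index the columns used in the GROUP BY / ORDER BY of the read query (hypothetical table)
    CREATE INDEX IDX_ORDERS_CUST_DATE ON ORDERS (CUST_ID, ORDER_DATE);

    -- SQL override in the Source Qualifier: hint the optimizer to return the first rows quickly
    SELECT /*+ FIRST_ROWS(100) */ CUST_ID, ORDER_DATE, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM ORDERS
    GROUP BY CUST_ID, ORDER_DATE
    ORDER BY CUST_ID, ORDER_DATE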
Use the FastExport utility to extract Teradata data. FastExport is a utility that uses multiple Teradata sessions to quickly export large amounts of data from a Teradata database. You can create a PowerCenter session that uses FastExport to read Teradata sources quickly. To use FastExport, create a mapping with a Teradata source database. In the session, use the FastExport reader instead of the Relational reader, and use a FastExport connection to the Teradata tables that you want to export.

Create tempdb as an in-memory database to join Sybase ASE or Microsoft SQL Server tables. When you join large tables on a Sybase ASE or Microsoft SQL Server database, you may be able to improve performance by creating the tempdb as an in-memory database.

Optimizing Mappings:
Mapping-level optimization may take time to implement, but it can significantly boost session performance. In general:
1. Reduce the number of transformations in the mapping and delete unnecessary links between transformations to minimize the amount of data moved.
2. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible.

Optimize flat file sources. You can improve session performance if the source flat file does not contain quotes or escape characters.

Optimizing the Line Sequential Buffer Length: If the session reads from a flat file source, you can improve session performance by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the line sequential buffer length in the session properties.

Optimizing Delimited Flat File Sources: If a source is a delimited flat file, you must specify the delimiter character to separate columns of data in the source file. You must also specify the escape character. The Integration Service reads the delimiter character as a regular character if you include the escape character before the delimiter character. You can improve session performance if the source flat file does not contain quotes or escape characters.

Optimizing XML and Flat File Sources: XML files are usually larger than flat files because of the tag information. The size of an XML file depends on the level of tagging in the XML file; more tags result in a larger file size. As a result, the Integration Service may take longer to read and cache XML sources.

Configure single-pass reading. You can use single-pass reading to reduce the number of times the Integration Service reads sources; it allows you to populate multiple targets with one source qualifier. If you have multiple sessions that use the same sources, you can combine the transformation logic for each mapping in one mapping and use one source qualifier for each source. The Integration Service reads each source once and then sends the data into separate pipelines. A particular row can be used by all the pipelines, by any combination of pipelines, or by no pipelines. For example, you have the Purchasing source table, and you use that source daily to perform an aggregation and a ranking. If you place the Aggregator and Rank transformations in separate mappings and sessions, you force the Integration Service to read the same source table twice. However, if you include the aggregation and ranking logic in one mapping with one source qualifier, the Integration Service reads the Purchasing source table once and then sends the appropriate data to the two separate pipelines. Also consider factoring out common functions from mappings. For example, if you need to subtract a percentage from the Price ports for both the Aggregator and Rank transformations, you can minimize work by subtracting the percentage before splitting the pipeline: use an Expression transformation to subtract the percentage, and then split the mapping after the transformation.
Optimize Simple Pass Through mappings. You can use simple pass-through mappings to improve session throughput. To pass data directly from source to target without any other transformations, connect the Source Qualifier transformation directly to the target.

Optimize filters. Use a Source Qualifier transformation to filter rows from relational sources. Use a Filter transformation to filter data within a mapping, from any type of source, early in the data flow. If possible, with a relational source, use a filter in the Source Qualifier transformation to remove the rows at the source. Avoid complex expressions in filter conditions; optimize them by using simple integer or true/false expressions. Note: You can also use a Filter or Router transformation to drop rejected rows from an Update Strategy transformation if you do not need to keep rejected rows. To maximize session performance, only process records that belong in the pipeline.

Optimize datatype conversions. You can increase performance by eliminating unnecessary datatype conversions. Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. ZIP code information as a Char or Varchar datatype. If you convert the ZIP code data to an Integer datatype, the lookup database stores the ZIP code 94303-1234 as 943031234, which helps increase the speed of lookup comparisons based on ZIP code. You can also convert source dates to strings through port-to-port conversions to increase session performance; either leave the ports in the targets as strings or change the ports to Date/Time ports.

Optimize expressions. Minimize aggregate function calls and replace common expressions with local variables; prefer numeric operations, operators, and DECODE where possible (see Optimizing Expressions below).

Evaluating Expressions: If you are not sure which expressions slow performance, evaluate expression performance to isolate the problem by completing the following steps:
1. Time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
Optimizing Expressions: When possible, isolate slow expressions used in the transformations and simplify them. Complete the following tasks to isolate slow expressions:
1. Remove the expressions one by one from the mapping.
2. Run the mapping to determine the time it takes to run the mapping without the transformation.
If there is a significant difference in session run time, look for ways to optimize the slow expression.

Factoring Out Common Logic: If the mapping performs the same task in multiple places, reduce the number of times the mapping performs the task by moving the task earlier in the mapping. For example, you have a mapping with five target tables, and each target requires a Social Security number lookup. Instead of performing the lookup five times, place the Lookup transformation in the mapping before the data flow splits, and pass the lookup results to all five targets.

Minimizing Aggregate Function Calls: When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data. For example, in the expression SUM(COLUMN_A) + SUM(COLUMN_B), the Integration Service reads COLUMN_A, finds the sum, then reads COLUMN_B, finds the sum, and finally finds the sum of the two sums. If you factor out the aggregate function call, as in SUM(COLUMN_A + COLUMN_B), the Integration Service adds COLUMN_A to COLUMN_B and then finds the sum of both.

Replacing Common Expressions with Local Variables: If you use the same expression multiple times in one transformation, you can make that expression a local variable. You can use a local variable only within the transformation, but by calculating the variable only once, you speed performance.

Choosing Numeric Versus String Operations: The Integration Service processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance. When the Integration Service performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. You can use the Treat CHAR as CHAR On Read option when you configure the Integration Service in the Administration Console so that the Integration Service does not trim trailing spaces from the end of Char source fields.

Choosing DECODE Versus LOOKUP: When you use a LOOKUP function, the Integration Service must look up a table in a database. When you use a DECODE function, you incorporate the lookup values into the expression, so the Integration Service does not have to look up a separate table. Therefore, when you want to look up a small set of unchanging values, using DECODE may improve performance.

Using Operators Instead of Functions: The Integration Service reads expressions written with operators faster than expressions with functions, so use operators where possible. For example, rewrite CONCAT( CONCAT( CUST.F_NAME, ' ' ), CUST.L_NAME ) with the || operator: CUST.F_NAME || ' ' || CUST.L_NAME.

Optimizing IIF Expressions: IIF expressions can return a value and an action, which allows for more compact expressions. For example, you have a source with three Y/N flags: FLG_A, FLG_B, and FLG_C, and you want to return values based on the value of each flag. Taking advantage of the IIF function, write the expression as: IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0). This results in three IIFs, three comparisons, two additions, and a faster session.

Optimizing External Procedures: You might want to block input data if the external procedure needs to alternate reading from input groups. Without the blocking functionality, you would need to write
the procedure code to buffer incoming data. You can block input data instead of buffering it, which usually increases session performance. For example, you need to create an external procedure with two input groups. The external procedure reads a row from the first input group and then reads a row from the second input group. If you use blocking, you can write the external procedure code to block the flow of data from one input group while it processes the data from the other input group. When you write the external procedure code to block data, you increase performance because the procedure does not need to copy the source data to a buffer. However, you could write the external procedure to allocate a buffer and copy the data from one input group to the buffer until it is ready to process the data; copying source data to a buffer decreases performance.

Optimizing Transformations:

Optimizing Aggregator Transformations: Aggregator transformations often slow performance because they must group data before processing it. They also need additional memory to hold intermediate group results. Use the following guidelines to optimize the performance of an Aggregator transformation:
Group by simple columns. When possible, use numbers instead of strings and dates in the GROUP BY columns, and avoid complex expressions in the Aggregator expressions.
Use sorted input. Sorted input decreases the use of aggregate caches. When you use the Sorted Input option, the Integration Service assumes all data is sorted by group. As the Integration Service reads rows for a group, it performs aggregate calculations and, when necessary, stores group information in memory.
Use incremental aggregation. Apply captured changes in the source to aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session. You can increase the index and data cache sizes to hold all data in memory without paging to disk.
Filter data before you aggregate it. If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
Limit port connections. Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.

Optimizing Joiner Transformations: Joiner transformations can slow performance because they need additional space at run time to hold intermediate results. You can view Joiner performance counter information to determine whether you need to optimize the Joiner transformations. Use the following tips to improve session performance with the Joiner transformation:
Designate the master source as the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Designate the master source as the source with fewer rows. During a session, the Joiner transformation compares each row of the detail source against the master source. The fewer rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
Perform joins in a database when possible. Performing a join in a database is faster than performing a join in the session.
The type of database join you use can affect performance. Normal joins are faster than outer joins and result in fewer rows. In some cases, you cannot perform the join in the database, such as joining tables from two different databases or flat file systems. To perform a join in a database, use one of the following options:
1. Create a pre-session stored procedure to join the tables in a database.
2. Use the Source Qualifier transformation to perform the join.
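For the Source Qualifier option, you can enter a user-defined join or SQL override so the source database performs the join instead of a Joiner transformation. A minimal sketch, assuming hypothetical CUSTOMERS and ORDERS tables joined on CUST_ID:

    -- SQL override in the Source Qualifier: the source database performs the join
    SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, O.ORDER_DATE, O.AMOUNT
    FROM CUSTOMERS C
    JOIN ORDERS O ON O.CUST_ID = C.CUST_ID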
Join sorted data when possible. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the master source.

Optimizing Lookup Transformations: If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation. If you use a Lookup transformation, perform the following tasks to increase performance:
Use the optimal database driver. You can connect to a lookup table using a native database driver or an ODBC driver. Native database drivers provide better session performance than ODBC drivers.
Cache lookup tables. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis. Using a lookup cache can increase session performance for smaller lookup tables; in general, cache lookup tables that need less than 300 MB.
Also consider the following: optimize the lookup condition, index the lookup table, and optimize multiple lookups.

Complete the following tasks to further enhance performance for Lookup transformations:
1. Use the appropriate cache type.
2. Enable concurrent caches.
3. Optimize lookup condition matching.
4. Reduce the number of cached rows.
5. Override the ORDER BY statement.
6. Use a machine with more memory.

Types of Caches: Use the following types of caches to increase performance:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping, and a named cache between transformations in the same or different mappings.
Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Integration Service builds the memory cache from the cache files instead of from the database.

Enable Concurrent Caches: When the Integration Service processes sessions that contain Lookup transformations, it builds a cache in memory when it processes the first row of data in a cached Lookup transformation. If there are multiple Lookup transformations in a mapping, the Integration Service creates the caches sequentially as the first row of data is processed by each Lookup transformation, which slows Lookup transformation processing. You can enable concurrent caches to improve performance. When the number of additional concurrent pipelines is set to one or more, the Integration Service builds caches concurrently rather than sequentially. Performance improves greatly when the sessions contain a number of active transformations that may take time to complete, such as Aggregator, Joiner, or Sorter transformations. When you enable multiple concurrent pipelines, the Integration Service no longer waits for active sessions to complete before it builds the cache; other Lookup transformations in the pipeline also build caches concurrently.
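Tasks 4 and 5 in the list above can be sketched with a Lookup SQL Override, assuming a hypothetical lookup table CUST_DIM with lookup condition column CUST_ID. The WHERE clause reduces the rows that are cached, the shortened ORDER BY replaces the generated one (the trailing comment marker that suppresses the generated ORDER BY is a common PowerCenter convention; verify it for your version), and an index on the condition column supports the query:

    -- Lookup SQL Override: cache only the rows you need and order by the lookup condition column
    SELECT CUST_ID, CUST_NAME, CUST_SEGMENT
    FROM CUST_DIM
    WHERE STATUS = 'ACTIVE'
    ORDER BY CUST_ID --

    -- Supporting index on the lookup condition column, created in the lookup database
    CREATE INDEX IDX_CUST_DIM_CUST_ID ON CUST_DIM (CUST_ID);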
Optimize Lookup Condition Matching: When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and orders the data to determine the first matching value and the last matching value. You can configure the transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition, and it does not index all ports as it does when you configure the transformation to return the first or last matching value. Performance can improve because the transformation does not index on all ports, which can slow performance.

Reducing the Number of Cached Rows: You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement.

Overriding the ORDER BY Statement: By default, the Integration Service generates an ORDER BY statement for a cached lookup that contains all lookup ports. To increase performance, you can suppress the default ORDER BY statement and enter an override ORDER BY with fewer columns.

Using a Machine with More Memory: You can increase session performance by running the session on an Integration Service machine with a large amount of memory. Increase the index and data cache sizes as high as you can without straining the machine. If the Integration Service machine has enough memory, increase the cache so it can hold all data in memory without paging to disk.

Optimizing the Lookup Condition: If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.

Indexing the Lookup Table: The Integration Service needs to query, sort, and compare values in the lookup condition columns, so the index needs to include every column used in a lookup condition. You can improve performance for the following types of lookups:
Cached lookups. Index the columns in the lookup ORDER BY statement. The session log contains the ORDER BY statement.
Uncached lookups. Index the columns in the lookup condition. The Integration Service issues a SELECT statement for each row that passes into the Lookup transformation.

Optimizing Multiple Lookups: If a mapping contains multiple lookups, the lookups can slow performance even with caching enabled and enough heap memory. Tune the Lookup transformations that query the largest amounts of data to improve overall performance. To determine which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache counters for each Lookup transformation. The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions; if those expressions can be optimized, session performance improves.

Optimizing Sequence Generator Transformations: You can optimize Sequence Generator transformations by creating a reusable Sequence Generator and using it in multiple mappings simultaneously. Consider configuring the Number of Cached Values property to a value greater than 1,000. If you do not have to cache values, set the Number of Cached Values to 0; Sequence Generator transformations that do not use the cache are faster than those that require it.

Optimizing Sorter Transformations:
1. Allocate enough memory to sort the data. Informatica recommends allocating at least 8 MB (8,388,608 bytes) of physical memory to sort data using the Sorter transformation; the Sorter cache size is set to 8,388,608 bytes by default. If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory.
The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory. Use the following formula to determine the size of incoming data: #input rows * ([Sum(column size)] + 16).
2. Specify a different work directory for each partition in the Sorter transformation. When you partition a session with a Sorter transformation, you can specify a different work directory for each partition in the pipeline. To increase session performance, specify work directories on physically separate disks on the Integration Service nodes. The default location is $PMTempDir.

Optimizing Source Qualifier Transformations: Use the Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.

Optimizing SQL Transformations: You can configure the transformation to use external SQL queries or queries that you define in the transformation. When you configure an SQL transformation to run in script mode, the Integration Service processes an external SQL script for each input row. When the transformation runs in query mode, the Integration Service processes an SQL query that you define in the transformation. Each time the Integration Service processes a new query in a session, it calls a function called SQLPrepare to create an SQL procedure and pass it to the database; when the query changes for each input row, this has a performance impact. When the transformation runs in query mode, you can improve performance by constructing a static query in the transformation. A static query statement does not change, although the data in the query clause changes. To create a static query, use parameter binding instead of string substitution in the SQL Editor; when you use parameter binding, you set parameters in the query clause to values in the transformation input ports. When an SQL query contains commit and rollback statements, the Integration Service must recreate the SQL procedure after each commit or rollback, so to optimize performance, do not use transaction statements in an SQL transformation query. When you create the SQL transformation, you configure how the transformation connects to the database. You can choose a static connection or you can pass connection information to the transformation at run time. When you configure the transformation to use a static connection, you choose a connection from the Workflow Manager connections and the SQL transformation connects to the database once during the session. When you pass dynamic connection information, the SQL transformation connects to the database each time the transformation processes an input row.

Eliminating Transformation Errors: In large numbers, transformation errors slow the performance of the Integration Service. With each transformation error, the Integration Service pauses to determine the cause of the error and to remove the row causing the error from the data flow; it then typically writes the row into the session log file. Transformation errors occur when the Integration Service encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.
If you need to run a session that generates a large number of transformation errors, it is possible to improve performance by setting a lower tracing level. However, this is not a recommended long-term solution to transformation errors.

Optimizing Sessions:
Using a Grid: You can use a grid to increase session and workflow performance. A grid is an alias assigned to a group of nodes that allows you to automate the distribution of workflows and sessions across nodes. When you use a grid, the Integration Service distributes workflow tasks and session threads across multiple nodes. Running workflows and sessions on the nodes of a grid provides the following performance gains: it balances the Integration Service workload, processes concurrent sessions faster, and processes partitions faster.

Use pushdown optimization. You can increase session performance by pushing transformation logic to the source or target database. Based on the mapping and session configuration, the Integration Service executes SQL against the source or target database instead of processing the transformation logic within the Integration Service.

Run sessions and workflows concurrently. You can run independent sessions and workflows concurrently to improve session and workflow performance. For example, if you load data into an analytic schema, where you have dimension and fact tables, load the dimensions concurrently.

Allocate buffer memory. You can increase the buffer memory allocation for sources and targets that require additional memory blocks. If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session.

Optimize caches. You can improve session performance by setting the optimal location and size for the caches.

Increase the commit interval. Each time the Integration Service commits changes to the target, performance slows. You can increase session performance by increasing the interval at which the Integration Service commits.

Disable high precision. Performance slows when the Integration Service reads and manipulates data with the high precision datatype. You can disable high precision to improve session performance.

Reduce error tracing. To improve performance, you can reduce the error tracing level, which reduces the number of log events generated by the Integration Service.

Remove staging areas. When you use a staging area, the Integration Service performs multiple passes on the data. You can eliminate staging areas to improve session performance.

Allocating Buffer Memory: When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. The Integration Service allocates at least two blocks for each source and target partition. Sessions that use a large number of sources and targets might require additional memory blocks. If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session. You can configure the amount of buffer memory, or you can configure the Integration Service to automatically calculate buffer settings at run time. You can increase the number of available memory blocks by adjusting the following session parameters:
DTM Buffer Size. Increase the DTM buffer size on the Properties tab in the session properties.
Default Buffer Block Size. Decrease the buffer block size on the Config Object tab in the session properties.
To configure these settings, first determine the number of memory blocks the Integration Service requires to initialize the session. Then, based on default settings, calculate the buffer size and/or the buffer block size to create the required number of session blocks. If you have XML sources or targets in a mapping, use the number of groups in the XML source or target in the calculation for the total number of sources and targets.

For example, you create a session that contains a single partition using a mapping that contains 50 sources and 50 targets. You make the following calculations:
1. You determine that the session requires a minimum of 200 memory blocks: [(total number of sources + total number of targets) * 2] = (session buffer blocks), so (50 + 50) * 2 = 200.
2. Based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
200 = .9 * 14222222 / 64000 * 1, or
200 = .9 * 12000000 / 54000 * 1
Note: For a session that contains n partitions, set the DTM Buffer Size to at least n times the value for the session with one partition. The Log Manager writes a warning message in the session log if the number of memory blocks is so small that it causes performance degradation. The Log Manager writes this warning message even if the number of memory blocks is enough for the session to run successfully, and the warning message also gives a suggestion for the proper value. If you modify the DTM Buffer Size, increase the property by multiples of the buffer block size.

Increasing DTM Buffer Size: The Integration Service uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the Integration Service. When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance during momentary slowdowns. When you increase the DTM buffer memory allocation, consider the total memory available on the Integration Service process system. Note: Reducing the DTM buffer allocation can cause the session to fail early in the process because the Integration Service is unable to allocate memory to the required processes. To increase the DTM buffer size, open the session properties and click the Properties tab. Edit the DTM Buffer Size property in the Performance settings. Increase the property by multiples of the buffer block size, and then run and time the session after each increase.

Optimizing the Buffer Block Size: Depending on the session source data, you might need to increase or decrease the buffer block size. If the machine has limited physical memory and the mapping in the session contains a large number of sources, targets, or partitions, you might need to decrease the buffer block size. If you are manipulating unusually large rows of data, you can increase the buffer block size to improve performance. If you do not know the approximate size of the rows, you can determine the configured row size by completing the following steps:
1. In the Mapping Designer, open the mapping for the session.
2. Open the target instance.
3. Click the Ports tab.
4. Add the precision for all columns in the target.
5. If you have more than one target in the mapping, repeat steps 2 to 4 for each additional target to calculate the precision for each target.
6. Repeat steps 2 to 5 for each source definition in the mapping.
7. Choose the largest precision of all the source and target precisions for the total precision in the buffer block size calculation.
The total precision represents the total bytes needed to move the largest row of data.
For example, if the total precision equals 33,000, then the Integration Service requires 33,000 bytes in the buffers to move that row. If the buffer block size is 64,000 bytes, the Integration Service can move only one row at a time.
Ideally, a buffer accommodates at least 100 rows at a time. So if the total precision is greater than 32,000, increase the size of the buffers to improve performance. To increase the buffer block size, open the session properties and click the Config Object tab. Edit the Default Buffer Block Size property in the Advanced settings. Increase the DTM buffer block setting in relation to the size of the rows. As with DTM buffer memory allocation, increasing the buffer block size should improve performance; if you do not see an increase, buffer block size is not a factor in session performance.

Optimizing Caches: The Integration Service uses the index and data caches for XML targets and for Aggregator, Rank, Lookup, and Joiner transformations. It stores transformed data in the data cache before returning it to the pipeline and stores group information in the index cache. The Integration Service also uses a cache to store data for Sorter transformations. You can configure the amount of cache memory using the cache calculator or by specifying the cache size, or you can configure the Integration Service to automatically calculate cache memory settings at run time. If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file as it processes the session data. Performance slows each time the Integration Service pages to a temporary file. Examine the performance details to determine how often the Integration Service pages to a file. Perform the following tasks to optimize caches:
Limit the number of connected input/output and output only ports.
Select the optimal cache directory location.
Increase the cache sizes.
Use the 64-bit version of PowerCenter to run large cache sessions.

Limiting the Number of Connected Ports: For transformations that use a data cache, limit the number of connected input/output and output only ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.

Cache Directory Location: If you run the Integration Service on a grid and only some Integration Service nodes have fast access to the shared cache file directory, configure each session with a large cache to run on the nodes with fast access to the directory. To configure a session to run on a node with fast access to the directory, complete the following steps:
1. Create a PowerCenter resource.
2. Make the resource available to the nodes with fast access to the directory.
3. Assign the resource to the session.
If all Integration Service processes in a grid have slow access to the cache files, set up a separate, local cache file directory for each Integration Service process. An Integration Service process may have faster access to the cache files if it runs on the same machine that contains the cache directory. Note: You may encounter performance degradation when you cache large quantities of data on a mapped or mounted drive.

Increasing the Cache Sizes: If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file as it processes the session data. Each time the Integration Service pages to the temporary file, performance slows. You can examine the performance details to determine when the Integration Service pages to the temporary file.
The Transformation_readfromdisk or Transformation_writetodisk counters for any Aggregator, Rank, Lookup, or Joiner transformation indicate the number of times the Integration Service must page to disk to process the transformation. Since the data cache is typically larger than the index cache, increase the data cache more than the index cache. If the session contains a transformation that uses a cache and you run the session on a machine with ample memory, increase the cache sizes so all data can fit in memory.
Using the 64-bit version of PowerCenter: If you process large volumes of data or perform memory-intensive transformations, you can use the 64-bit PowerCenter version to increase session performance. The 64-bit version provides a larger memory space that can significantly reduce or eliminate disk input/output. This can improve session performance in the following areas: Caching. On a 64-bit platform, the Integration Service is not limited to the 2 GB cache limit of a 32-bit platform. Data throughput. With a larger available memory space, the reader, writer, and DTM threads can process larger blocks of data. Increasing the Commit Interval: The commit interval setting determines the point at which the Integration Service commits data to the targets. Each time the Integration Service commits, performance slows; therefore, the smaller the commit interval, the more often the Integration Service writes to the target database and the slower the overall performance. If you increase the commit interval, the number of times the Integration Service commits decreases and performance improves. When you increase the commit interval, consider the log file limits in the target database: if the commit interval is too high, the Integration Service may fill the database log file and cause the session to fail, so weigh the benefit of a larger commit interval against the additional time you would spend recovering a failed session. Click the General Options settings in the session properties to review and adjust the commit interval. Disabling High Precision: If a session runs with high precision enabled, disabling high precision might improve session performance. The Decimal datatype is a numeric datatype with a maximum precision of 28. To use a high-precision Decimal datatype in a session, configure the Integration Service to recognize this datatype by selecting Enable High Precision in the session properties. However, because reading and manipulating the high-precision datatype slows the Integration Service, you can improve session performance by disabling high precision. When you disable high precision, the Integration Service converts the data to a double; for example, it reads the Decimal value 3900058411382035317455530282 as 390005841138203 x 10^13 (a short sketch of this follows at the end of this passage). Click the Performance settings in the session properties to enable or disable high precision. Reducing Error Tracing: To improve performance, you can reduce the number of log events the Integration Service generates when it runs the session. If a session contains a large number of transformation errors and you do not need to correct them, set the session tracing level to Terse; at this tracing level, the Integration Service does not write error messages or row-level information for reject data. If you need to debug the mapping and you set the tracing level to Verbose, you may experience significant performance degradation when you run the session, so do not use Verbose tracing when you tune performance. The session tracing level overrides any transformation-specific tracing levels within the mapping. Reducing the tracing level is not recommended as a long-term response to high levels of transformation errors. Removing Staging Areas: When you use a staging area, the Integration Service performs multiple passes on the data. When possible, remove staging areas to improve performance; the Integration Service can read multiple sources with a single pass, which may remove the need for staging areas.
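A short, standalone sketch of the precision loss mentioned above, using the example value from the text; this is plain Python with no Informatica involvement, purely to show what converting a 28-digit decimal to a double does:

```python
from decimal import Decimal

# 28-digit value used as the example in the text.
high_precision_value = Decimal("3900058411382035317455530282")

# With high precision disabled, the value is handled as a double, which keeps
# only about 15-17 significant decimal digits; the low-order digits are lost.
as_double = float(high_precision_value)

print(f"Decimal (high precision enabled):  {high_precision_value}")
print(f"Double  (high precision disabled): {as_double:.0f}")
print(f"Digits lost in the conversion:     {high_precision_value - Decimal(as_double)}")
```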
Performance & Tuning
You can identify the performance bottleneck from the session log (run information).
You can also identify it from the "Collect Performance Data" option on the session's Config Object/Properties tab (PERF.LOG), and in the Workflow Monitor from the Partition Details and Performance tabs. There are five main levels of performance tuning: source, target, session, mapping, and network/hardware. In Informatica tuning, the most vital field is the memory setting at the session and transformation level; always set the memory size as a multiple of the total memory. Setting the memory size in the session: throughput was 40 records per second, and when we set __________________ memory we got a throughput of 65,000 records per second. Pushdown optimization: partial (to the source or target database, each with limitations) and full. What is performance tuning in Informatica? The goal of performance tuning is to optimize session performance so sessions run within the available load window for the Informatica server. Network connections: data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster; network connections therefore often affect session performance, so avoid unnecessary network hops. Flat files: move source flat files to the machine that hosts the Informatica server. Relational data sources: minimize the connections between sources, targets, and the Informatica server to improve session performance; moving the target database onto the server system may improve session performance. Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes, so removing staging areas may improve session performance. You can run multiple Informatica servers against the same repository; distributing the session load across multiple Informatica servers may improve session performance. Running the Informatica server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte while Unicode mode takes two bytes per character (see the sketch after this passage). If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance; single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization such as adding indexes. We can improve session performance by configuring the database network packet size, which controls how much data crosses the network at one time; to do this, go to the Server Manager and choose Server Configure Database Connections. If your target has key constraints and indexes, they slow the loading of data; to improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes. Running parallel sessions by using concurrent batches also reduces load time, so concurrent batches may increase session performance. Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines. If a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance. Avoid transformation errors to improve session performance. If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
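Picking up the ASCII vs. Unicode point above, here is a back-of-the-envelope sketch of the data volume difference; the row count and column widths are made-up examples:

```python
# Estimate the character data a session moves under each data movement mode:
# ASCII mode stores one byte per character, Unicode mode stores two.
rows = 10_000_000       # hypothetical row count
chars_per_row = 400     # hypothetical total width of string columns per row

ascii_bytes   = rows * chars_per_row * 1
unicode_bytes = rows * chars_per_row * 2

print(f"ASCII mode:   ~{ascii_bytes / 1024**3:.1f} GB of character data")
print(f"Unicode mode: ~{unicode_bytes / 1024**3:.1f} GB of character data")
```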
Aggregator, Rank, and Joiner transformations may often decrease session performance because they must group data before processing it; to improve session performance in this case, use the sorted ports option. How do you identify source and target bottlenecks? Identifying Target Bottlenecks: The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target; if session performance increases significantly when you write to a flat file, you have a target bottleneck. Consider performing the following tasks to increase performance: * Drop indexes and key constraints. * Increase checkpoint intervals. * Use bulk loading. * Use external loading. * Increase the database network packet size. * Optimize the target database. Create multiple partitions, tune the indexes, and tune the database. For transformations like Joiner and Aggregator you can pass sorted input, and for a Joiner you can suppress the generated ORDER BY (with a trailing -- comment). Give first preference to integer data types for comparisons. Identifying Source Bottlenecks: Suppose you have a relational source with a timeout limit, and the source qualifier has a query joining three or four tables with a filter condition; in this case Informatica may not be able to retrieve the data within the given time limit, and you will face a performance issue. Move source flat files to the machine that hosts the Informatica server, and increase the database buffer. If you replace the relational source with a flat file source and performance increases significantly, you have a source bottleneck. If the session reads from a relational source, you can use a Filter transformation, a read test mapping, or a database query to identify source bottlenecks: * Filter transformation - measure the time taken to process a given amount of data, then add an always-false Filter transformation in the mapping after each source qualifier so that no data is processed past the filter. You have a source bottleneck if the new session runs in about the same time. * Read test session - compare the time taken to process a given set of data using the session with that of a session based on a copy of the mapping with all transformations after the source qualifiers removed and the source qualifiers connected to file targets. You have a source bottleneck if the new session runs in about the same time. * Database query - extract the query from the session log and run it in a query tool. Measure the time taken to return the first row and the time to return all rows; if there is a significant difference between the two, you can use an optimizer hint to eliminate the source bottleneck (see the timing sketch after this passage). Consider performing the following tasks to increase performance: * Optimize the query. * Use conditional filters. * Increase the database network packet size. * Connect to Oracle databases using the IPC protocol. Identifying Mapping Bottlenecks: If you do not have a source bottleneck, add an always-false Filter transformation in the mapping before each target definition so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck. You can also identify mapping bottlenecks by examining performance counters.
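The "time to first row vs. time to all rows" measurement described above can be scripted against any DB-API driver; the driver, connection string, and query below are placeholders (you would paste in the read query copied from the session log), so treat this as a sketch rather than a prescribed tool:

```python
import time
# Any DB-API 2.0 driver works here (pyodbc, cx_Oracle, psycopg2, ...);
# the import and connection string below are hypothetical placeholders.
import pyodbc

read_query = "SELECT ..."  # paste the read query copied from the session log

conn = pyodbc.connect("DSN=SOURCE_DB;UID=user;PWD=password")  # hypothetical DSN
cursor = conn.cursor()

start = time.monotonic()
cursor.execute(read_query)
first_row = cursor.fetchone()
first_row_time = time.monotonic() - start

remaining = cursor.fetchall()
total_time = time.monotonic() - start

print(f"Time to first row: {first_row_time:.2f} s")
print(f"Time to all rows:  {total_time:.2f} s ({1 + len(remaining)} rows)")
# A long gap between the two timings points at the query plan rather than the
# network; an optimizer hint may then remove the source bottleneck.
```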
* Describe mapping-level and session-level performance tuning measures. Mapping-level optimization may take time to implement, but it can significantly boost session performance. Focus on mapping-level optimization after you optimize the targets and sources. Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the fewest transformations and expressions needed to do the most work possible, and delete unnecessary links between transformations to minimize the amount of data moved. You can also perform the following tasks to optimize the mapping:
Optimize the flat file sources: for example, you can improve session performance if the source flat file does not contain quotes or escape characters. Configure single-pass reading: you can use single-pass reading to reduce the number of times the Integration Service reads sources. Optimize Simple Pass Through mappings to improve session throughput. Optimize filters: use Source Qualifier transformations and Filter transformations to filter data in the mapping; to maximize session performance, only process records that belong in the pipeline. Optimize datatype conversions: you can increase performance by eliminating unnecessary datatype conversions. Optimize expressions: for example, minimize aggregate function calls and replace common expressions with local variables. Optimize external procedures: if an external procedure needs to alternate reading from input groups, you can block input data to increase session performance. Optimizing the session: perform the following tasks to improve overall performance. Use a grid: you can increase performance by using a grid to balance the Integration Service workload. Use pushdown optimization: you can increase session performance by pushing transformation logic to the source or target database. Run sessions and workflows concurrently: you can run independent sessions and workflows concurrently to improve session and workflow performance. Allocate buffer memory: you can increase the buffer memory allocation for sources and targets that require additional memory blocks; if the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session. Optimize caches: you can improve session performance by setting the optimal location and size for the index and data caches. Increase the commit interval: each time the Integration Service commits changes to the target, performance slows; you can increase session performance by increasing the interval at which the Integration Service commits changes. Disable high precision: performance slows when the Integration Service reads and manipulates data with the high-precision datatype; you can disable high precision to improve session performance. Reduce error tracing: you can reduce the error tracing level, which reduces the number of log events generated by the Integration Service. Remove staging areas: when you use a staging area, the Integration Service performs multiple passes on the data; you can eliminate staging areas to improve session performance. How do you view performance counters? Every transformation tracks the number of input rows, output rows, and error rows for each session, and some transformations also have performance counters; you can use the performance counters in the Workflow Monitor to increase session performance. Types of performance counters: Readfromdisk and Writetodisk counters: if a session contains Aggregator, Rank, or Joiner transformations, examine each Transformation_readfromdisk and Transformation_writetodisk counter; if these counters display any number other than zero, you can improve session performance by increasing the index and data cache sizes. Note that if the session uses incremental aggregation, the counters must be examined during the run, because the Informatica server writes to disk when saving historical data at the end of the run. Rowsinlookupcache counter: a high value indicates a larger lookup, which is more likely to be a bottleneck. Errorrows counters: if a session has large numbers in any of the Transformation_errorrows counters, you might improve performance by eliminating the errors.
BufferInput_efficiency and BufferOutput_efficiency counters: any dramatic difference within a given set of BufferInput_efficiency and BufferOutput_efficiency counters indicates inefficiencies that may benefit from tuning. Tuning the Lookup Transformation: To me, the Lookup is the single most important (and difficult) transformation to consider while tuning the performance of Informatica jobs. The choice and use of the correct type of lookup can drastically change session performance. To cache or not to cache?
Scenario #1: There was a big lookup table in the mapping, and it was using a cached lookup. If the number of records coming from the source is much smaller than the number of records in the lookup table, consider using an uncached lookup, because fewer source records means fewer database calls. If the lookup table is small (fewer than about 2 million rows), it is generally a good idea to cache it. Scenario #2: Advice on suppressing the default ORDER BY clause in a Lookup. A redundant column was included in the ORDER BY clause, creating additional processing overhead on the database. The recommendation is to ORDER BY at least the columns used in the join condition, because Informatica builds its own index on the joining columns; if those columns are ordered, Informatica needs less space and time to create the index. When Informatica fires the lookup query against the database, it appends an ORDER BY clause at the end of the query; you can suppress this by appending a comment (--) at the end of the override query (see the override sketch at the end of this section). Consider these factors before changing the default Informatica lookup ORDER BY. Scenario #3: Advice on using a persistent lookup cache. If the source data in the underlying lookup tables does not change between consecutive session runs, you can use a persistent lookup cache. To implement a persistent cache in an Informatica session, check the following options on the Lookup transformation Properties tab: Lookup caching enabled, and Lookup cache persistent. Once you do that, the cache file created by the Informatica session will NOT be deleted from the cache directory, and the same cache file will be used in all consecutive runs. The advantage is that you do not spend time building the same cache every time the session executes. However, if the source data for the lookup changes in the meantime, you must refresh the cache by one of the following two options: delete the cache file manually from the cache directory, or check the Recache from lookup source option on the Properties tab of the Lookup. Scenario #4: A lookup cache grew to more than 5 GB, and Informatica created multiple cache files for one lookup, with a maximum file size of 2 GB each. The data cache files were named .dat1, .dat2, .dat3, and so on, with corresponding index cache files named .idx1, .idx2, .idx3, and so on. The advice was to join the lookup source table at the database level instead of building a lookup cache, because (in my opinion) breaking a single data or index cache file into multiple files may slow down lookup performance. A further recommendation was to create a reusable persistent cache lookup, so the same lookup can be shared across multiple mappings without rebuilding the cache in each one, together with one additional mapping that has the re-cache option enabled for this lookup, which you can run whenever you need to refresh the cache file. A warning: there are some disadvantages to persistent cache lookups. If the cache file size of your lookup table grows beyond 2 GB, Informatica will most likely create multiple cache files for one lookup, with a maximum file size of 2 GB each; the data cache files will be named .dat1, .dat2, .dat3, and so on, and the corresponding index cache files will be named .idx1, .idx2, .idx3, and so on.
Also note that in many flavors of UNIX (e.g. HP-UX 11i), NOLARGEFILES is a default option for the file system and prevents applications or users from creating files larger than 2 GB. You can check whether the LARGEFILE option is enabled on your server by issuing the following command: getconf FILESIZEBITS /mount_point_name. However, note that irrespective of whether the LARGEFILE option is enabled or disabled, Informatica will not create a cache file larger than 2 GB (this is true for both the 32-bit and 64-bit versions of Informatica).
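If you want to script the same getconf check across several cache directories, a small wrapper like the following works; the mount point list is a made-up example:

```python
import subprocess

# FILESIZEBITS reports how many bits the file system uses for file sizes;
# 32 means files are capped near 2 GB, 64 means large files are allowed.
mount_points = ["/infa/cache", "/tmp"]   # hypothetical cache directories

for mp in mount_points:
    result = subprocess.run(
        ["getconf", "FILESIZEBITS", mp],
        capture_output=True, text=True, check=False,
    )
    bits = result.stdout.strip()
    print(f"{mp}: FILESIZEBITS={bits}"
          + ("  (large files supported)" if bits == "64" else ""))
```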
Scenario #5: My personal opinion is that breaking a single data or index cache file into multiple files may slow down lookup performance. Hence, if your lookup cache size is more than 2 GB, consider, where possible, joining the lookup source table at the database level itself instead of building a lookup cache.
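To make Scenario 2 concrete, here is a minimal sketch of a lookup SQL override with the default ORDER BY suppressed; the table and column names are invented for illustration, and the override text is held in a Python string only so it can be printed and inspected:

```python
# Sketch of a lookup SQL override (table and column names are hypothetical).
# Informatica appends its own "ORDER BY <lookup ports>" to the override at
# run time; ordering explicitly on the join/condition columns and ending the
# statement with "--" comments that appended clause out.
lookup_sql_override = """
SELECT CUST_ID, CUST_STATUS, LAST_UPDATE_DT
FROM   CUSTOMER_DIM
ORDER BY CUST_ID --
"""

print(lookup_sql_override)
```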