
Informatica® Multidomain MDM

10.4

Performance Tuning Guide


Informatica Multidomain MDM Performance Tuning Guide
10.4
March 2020
© Copyright Informatica LLC 2016, 2020

This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.

Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A
current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade
names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.

The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at
[email protected].

Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE
INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.

Publication Date: 2020-03-19


Table of Contents
Preface
  Informatica Resources
    Informatica Network
    Informatica Knowledge Base
    Informatica Documentation
    Informatica Product Availability Matrices
    Informatica Velocity
    Informatica Marketplace
    Informatica Global Customer Support

Chapter 1: Introduction to Performance Tuning
  Overview
  Factors that Influence the Performance of the MDM Hub
  Acronyms

Chapter 2: Recommendations
  Overview
  Recommendations for Java
  General Recommendations for Database
  Recommendations for Oracle Database
    INIT.ORA Recommendations for Oracle
  Recommendations for Microsoft SQL Server
  Recommendations for IBM Db2
  Recommendations for the MDM Hub
  Recommendations for Batch Job Optimization
    Multithreaded Batch Job – Process Flow
  Recommendations for the Hub Console Optimization
  Recommendations for Data Director and SIF Optimization
  Recommendations for Environment Validation Tools and Utilities

Appendix A: Glossary

Index

Preface
See the Informatica® Multidomain MDM Performance Tuning Guide to learn how to optimize the overall
performance of Multidomain MDM within the database and the application server environments.

Informatica Resources
Informatica provides you with a range of product resources through the Informatica Network and other online
portals. Use the resources to get the most from your Informatica products and solutions and to learn from
other Informatica users and subject matter experts.

Informatica Network
The Informatica Network is the gateway to many resources, including the Informatica Knowledge Base and
Informatica Global Customer Support. To enter the Informatica Network, visit
https://network.informatica.com.

As an Informatica Network member, you have the following options:

• Search the Knowledge Base for product resources.


• View product availability information.
• Create and review your support cases.
• Find your local Informatica User Group Network and collaborate with your peers.

Informatica Knowledge Base


Use the Informatica Knowledge Base to find product resources such as how-to articles, best practices, video
tutorials, and answers to frequently asked questions.

To search the Knowledge Base, visit https://search.informatica.com. If you have questions, comments, or
ideas about the Knowledge Base, contact the Informatica Knowledge Base team at
[email protected].

Informatica Documentation
Use the Informatica Documentation Portal to explore an extensive library of documentation for current and
recent product releases. To explore the Documentation Portal, visit https://docs.informatica.com.

If you have questions, comments, or ideas about the product documentation, contact the Informatica
Documentation team at [email protected].

Informatica Product Availability Matrices
Product Availability Matrices (PAMs) indicate the versions of the operating systems, databases, and types of
data sources and targets that a product release supports. You can browse the Informatica PAMs at
https://network.informatica.com/community/informatica-network/product-availability-matrices.

Informatica Velocity
Informatica Velocity is a collection of tips and best practices developed by Informatica Professional Services
and based on real-world experiences from hundreds of data management projects. Informatica Velocity
represents the collective knowledge of Informatica consultants who work with organizations around the
world to plan, develop, deploy, and maintain successful data management solutions.

You can find Informatica Velocity resources at http://velocity.informatica.com. If you have questions,
comments, or ideas about Informatica Velocity, contact Informatica Professional Services at
[email protected].

Informatica Marketplace
The Informatica Marketplace is a forum where you can find solutions that extend and enhance your
Informatica implementations. Leverage any of the hundreds of solutions from Informatica developers and
partners on the Marketplace to improve your productivity and speed up time to implementation on your
projects. You can find the Informatica Marketplace at https://marketplace.informatica.com.

Informatica Global Customer Support


You can contact a Global Support Center by telephone or through the Informatica Network.

To find your local Informatica Global Customer Support telephone number, visit the Informatica website at
the following link:
https://www.informatica.com/services-and-training/customer-success-services/contact-us.html.

To find online support resources on the Informatica Network, visit https://network.informatica.com and
select the eSupport option.

Chapter 1

Introduction to Performance Tuning
This chapter includes the following topics:

• Overview
• Factors that Influence the Performance of the MDM Hub
• Acronyms

Overview
You can use the performance tuning recommendations to configure specific parameters that optimize the
performance of the MDM Hub. Experiment with these parameters to arrive at optimum values. You can also
achieve a baseline performance by using these recommendations.

Factors that Influence the Performance of the MDM Hub
The following table lists the different components that you can fine-tune to optimize the performance of the
MDM Hub:

Component: Java
Description: Parameters that you can fine-tune in the Java layer. These parameters are applicable for the MDM Hub and for the Process Servers, including clustered environments.

Component: General Database Recommendations
Description: Parameters such as sizing and storage that you can fine-tune in the database layer.

Component: Oracle
Description: Parameters that you can fine-tune in the Oracle database, including the RAC environment. These parameters are applicable for both the MDM Hub Master Database schema and any Operational Reference Store (ORS) schema.

Component: Microsoft SQL Server
Description: Parameters that you can fine-tune in the Microsoft SQL Server. These parameters are applicable for both the MDM Hub Master Database schema and any ORS schema.

Component: IBM Db2
Description: Parameters that you can fine-tune in the IBM Db2 database. These parameters are applicable for both the MDM Hub Master Database schema and any ORS schema.

Component: The MDM Hub Specific Configuration
Description: Parameters that you can configure in the MDM Hub. These parameters are applicable for both the MDM Hub Master Database settings and any ORS specific settings.

Component: Batch Job Optimization
Description: Parameters that you can configure for the better performance of the batch jobs.

Component: Hub Console Optimization
Description: Parameters that you can configure for the better performance of the Hub Console.

Component: Informatica Data Director and Services Integration Framework Optimization
Description: Parameters that you can configure for the better performance of Informatica Data Director (IDD) and Services Integration Framework (SIF).

Component: Environment Validation Tools and Utilities
Description: List of tools and utilities that you can use to verify the current environment to identify the area of improvement.

Acronyms
The following table lists the different acronyms used in this guide:

Term Definition

BO Base Object

EJB Enterprise JavaBeans

GC Garbage Collection

HM Hierarchy Manager

JVM Java Virtual Machine

MDM Informatica Master Data Management

PAM Product Availability Matrix

PGA Program Global Area (Oracle database)

RAC Real Application Clusters

RAID Redundant Array of Independent Disks

RMAN Recovery Manager

SAN Storage Area Network


SGA System Global Area

SIF Services Integration Framework

SOAP Simple Object Access Protocol

SSD Solid State Drive



Chapter 2

Recommendations
This chapter includes the following topics:

• Overview
• Recommendations for Java
• General Recommendations for Database
• Recommendations for Oracle Database
• Recommendations for Microsoft SQL Server
• Recommendations for IBM Db2
• Recommendations for the MDM Hub
• Recommendations for Batch Job Optimization
• Recommendations for the Hub Console Optimization
• Recommendations for Data Director and SIF Optimization
• Recommendations for Environment Validation Tools and Utilities

Overview
The recommendations assume typical data volumes and standard hardware. For larger systems, adjust the
settings accordingly. You can configure parameters related to Java, Oracle database, Microsoft SQL Server,
IBM Db2, and the MDM Hub to optimize the performance of the MDM Hub.

Recommendations for Java
You can configure the parameters related to the JVM settings and database connection pool. You can
change or fine-tune the Java parameters to improve the MDM Hub performance.

The following table lists the recommendations for the JVM settings:

Parameter: JVM optimization mode
Recommended setting: -server
Description: By default, 64-bit Java runs in the server JVM. Since Java SE 5.0, the server VM is selected automatically on server-class machines, except on 32-bit Windows. The definition of a server-class machine might change from release to release, so check the ergonomics document for your release.
Informatica Knowledge Base article number: 144943

Parameter: Heap Size (64-bit)
Recommended setting: -Xms1024m -Xmx8192m
Description: In a 64-bit environment with heavy SIF usage, keep the maximum heap size at 4 GB or more. If you have a large number of ORSs, whether active or obsolete, allocate more heap memory. Informatica recommends that you set Xmx to 8 GB. If you need more memory, add another JVM as a new cluster node.
Informatica Knowledge Base article number: 120835

Parameter: Code Cache
Recommended setting:
-XX:ReservedCodeCacheSize=256m (Oracle JVM)
-XX:codecachetotal=256m (IBM JVM)
Description: Maximum size limit for the code cache.
Informatica Knowledge Base article number: 352050

Parameter: Stack Size (64-bit)
Recommended setting: -Xss512k
Description: Use the platform default. Adjust if required.
Stack size is the memory that Java uses for each thread that it spawns. This memory is outside the heap.
If the stack size is too low, the server might fail with a StackOverflowError.
If the stack size is too high, the server might fail with an out of memory error.
If you need to fine-tune the value, analyze the heap dump to arrive at an optimal value. For a 32-bit processor, fine-tune the value to free additional memory for the heap and avoid an out of memory error.
Note: If you use Informatica Address Verification, use -Xss2048k.
Informatica Knowledge Base article number: 144477 and 207128

Parameter: Garbage Collection Policy
Recommended setting: Use the default policy, as chosen automatically by the JVM.
Description: If you decide to change the garbage collection (GC) policy, perform a complete analysis first. Garbage collection policies include Parallel GC and Concurrent Mark Sweep. For example, under heavy real-time usage, Parallel GC can cause longer pauses than Concurrent Mark Sweep. You can use a combination of the policies. See the Java Guide to set the exact parameters.
Informatica Knowledge Base article number: 144944

Parameter: Application Server Sizing
Recommended setting: Optimal application server sizing.
Description: Proper analysis is required to determine the application server sizing. Consider factors such as the average number of concurrent users, peak number of concurrent users, current data volume, rate of data growth, future data volume, Hierarchy Manager configuration, Hierarchy Manager relationship cardinality, and number of Data Director applications.
For assistance on server sizing, contact the Informatica Professional Services team.
Informatica Knowledge Base article number: -
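
As a rough illustration, the JVM settings recommended in this section can be collected into a single list of options. The following sketch is a hypothetical helper and not part of the MDM Hub: the function name `build_jvm_options` and its parameters are assumptions made for illustration. How the resulting options reach the JVM depends on your application server and is not shown.

```python
def build_jvm_options(min_heap_mb=1024, max_heap_mb=8192, code_cache_mb=256,
                      stack_kb=512, address_verification=False, ibm_jvm=False):
    """Assemble the JVM flags recommended in this section (hypothetical helper)."""
    if max_heap_mb < min_heap_mb:
        raise ValueError("maximum heap must not be smaller than minimum heap")
    # The guide notes -Xss2048k when Informatica Address Verification is used.
    if address_verification:
        stack_kb = 2048
    # The code cache flag differs between the Oracle JVM and the IBM JVM.
    if ibm_jvm:
        code_cache = f"-XX:codecachetotal={code_cache_mb}m"
    else:
        code_cache = f"-XX:ReservedCodeCacheSize={code_cache_mb}m"
    return [
        "-server",
        f"-Xms{min_heap_mb}m",
        f"-Xmx{max_heap_mb}m",
        code_cache,
        f"-Xss{stack_kb}k",
    ]


# Join the options into one string, for example for a startup script variable.
print(" ".join(build_jvm_options()))
```

The defaults mirror the recommended settings for a 64-bit Oracle JVM. Treat them as a starting point and adjust after you analyze your own workload.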



The following table lists the recommendations for the database connection pool:

Parameter: CMX_SYSTEM Database Connection Pool
Recommended setting:
Max Connection: (N+T) × 1.5, where:
- N is the number of concurrent IDD and SIF API users.
- T is the number of concurrent threads in Batch.
Min Connection: 0
Description: On average, each thread takes 1.5 connections. Therefore, multiply the number of concurrent threads by 1.5.
Informatica Knowledge Base article or documentation: -

Parameter: ORS Database Connection Pool
Recommended setting:
Max Connection: (N+T) × 2.5, where:
- N is the number of concurrent IDD and SIF API users.
- T is the number of concurrent threads in Batch.
Min Connection: 0
Test Connection on Lease: Disable
Statement Caching: 10
Description: On average, each thread takes 2.5 connections. Therefore, multiply the number of concurrent threads by 2.5.
Test Connection on Lease: Disable this property to avoid the additional database cost.
Min Connection: In some instances, application servers leak connections during transaction rollback, where a connection caught in a rollback is not released to the pool. If you find such instances, set Min Connection to 0.
Statement Caching (WebLogic): Initially, set 10 as the minimal number.
Informatica Knowledge Base article or documentation: 121471
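
The Max Connection formula can be worked through with a short calculation. The following sketch is illustrative only: the function name `max_connections`, the example workload of 40 users and 20 batch threads, and the choice to round up to a whole connection are assumptions, not values from this guide.

```python
import math


def max_connections(concurrent_users, batch_threads, connections_per_thread):
    """Return (N + T) x multiplier, rounded up to a whole connection.

    concurrent_users is N, the concurrent IDD and SIF API users.
    batch_threads is T, the concurrent threads in Batch.
    Rounding up is an assumption; the guide does not specify rounding.
    """
    if concurrent_users < 0 or batch_threads < 0:
        raise ValueError("user and thread counts must not be negative")
    return math.ceil((concurrent_users + batch_threads) * connections_per_thread)


# Hypothetical workload: 40 concurrent users and 20 batch threads.
cmx_system_max = max_connections(40, 20, 1.5)  # CMX_SYSTEM pool multiplier
ors_max = max_connections(40, 20, 2.5)         # ORS pool multiplier
print(cmx_system_max, ors_max)
```

For this hypothetical workload, the calculation gives 90 connections for the CMX_SYSTEM pool and 150 for the ORS pool.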

General Recommendations for Database


You can configure the parameters related to the database environment, virtual image environment, and
database tables and data.

The following table lists the recommendations related to the database environment:

Parameter: Environment Sharing
Recommended setting: Always use a non-shared environment:
The Production ORS must have exclusive use of the Oracle instance.
The Production Oracle instance must have exclusive use of the host machine.
Description: The MDM Hub Oracle database instances (both the MDM Hub Master Database and ORS) need not be shared with other MDM Hub installations and must not be shared with other applications.
Note: Each additional level of sharing compromises the best performance possible on a particular hardware.

Parameter: Connectivity: Application Server to Hub Store databases
Recommended setting: The fastest connectivity possible.
Description: Connection latency might have a major performance impact. Enable faster connectivity to the data store by using fiber optic connections.

Parameter: Connectivity: Database server to Data File storage
Recommended setting: The fastest connectivity possible.
Description: Connection latency might have a major performance impact. Enable faster connectivity to the data by using fiber optic connections.
Have a dedicated point to point connection to avoid network contention.

Parameter: Database Server Sizing
Recommended setting: Optimal database server sizing.
Description: You need proper analysis to determine the database server sizing. Different factors including current data volume, rate of data growth, future data volume, SIF calls, and batch volume must be considered for sizing.
For assistance on server sizing, contact the Informatica Professional Services team.

Parameter: MDM Hub Master Database (cmx_system) host name in cluster environments
Recommended setting: Absolute host name or IP address
Description: To avoid caching issues in multi-node or cluster environments, use the absolute host name or IP address in place of the default localhost. The host name is configured during the MDM Hub installation.
You can update the host name in the DATABASE_HOST column of the C_REPOS_DATABASE table.

The following table lists the recommendations for the virtual image environment:

Parameter: Data File Storage
Recommended setting: Use a physical drive instead of a virtualized drive.
Description: Use a physical drive to store the data files instead of a virtualized drive within the image. Use a physical drive to avoid I/O contention because of a virtualized drive and the latency caused by the introduction of another layer with no actual benefit.

Parameter: Hardware and Software Specification
Recommended setting: As good as the equivalent physical or standalone instance.
Description: Must be as good as the equivalent physical machine and must meet all of the PAM or sizing requirements.

Parameter: CPU Cores
Recommended setting: 100% allocated to virtual image.
Description: The physical machine, where the virtual image is hosted, must allocate 100% of the CPU cores to the virtual image. Sharing is not recommended.



The following table lists the recommendations related to the database tables and data:

Parameter: Custom and backup tables created with names starting with C_REPOS%
Recommended setting: Do not give custom tables or backup tables names that start with C_REPOS%.
Description: During Hub Server restart and Met Migration, the performance of the Hub Console degrades if the schema has large-volume tables with names that start with C_REPOS%.
Ensure that backup tables are not created with names starting with C_REPOS%.

Parameter: Fragmentation
Recommended setting: Minimize the likelihood of fragmentation.
Description: Maintain the Oracle schema to ensure that fragmentation is kept to a minimum. Monitor and defragment whenever the degree of fragmentation has an impact on the MDM Hub performance.

Parameter: High volume of data in C_REPOS_TABLES with historical data
Recommended setting: Perform regular maintenance on the following metadata tables:
- C_REPOS_AUDIT
- C_REPOS_MQ_DATA_CHANGE
- C_REPOS_JOB_CONTROL
- C_REPOS_JOB_METRIC
- C_REPOS_MET_VALID_RESULT
- C_REPOS_MET_VALID_MSG
- C_REPOS_TASK_ASSIGNMENT_HIST
Description: If the number of records in the metadata tables is too high, it might cause issues such as slow startup, out of memory errors, and performance issues.
Back up the metadata tables, and then truncate them or reduce their data volume.
If you enable raw retention on any base object, you can purge the C_REPOS_JOB_* tables for any date beyond the maximum raw retention period.
For more information, search the Informatica Knowledge Base for article number 141201.

Recommendations for Oracle Database
You can configure the parameters related to the Oracle database environment, tablespace, Oracle table
statistics, RAC recommendations, and Oracle flashback.

The following table lists the recommendations related to the Oracle database environment:

Parameter: C_REPOS_APPLIED_LOCK
Recommended setting: Enable caching for the C_REPOS_APPLIED_LOCK table.
Description: The application uses this table frequently, so caching it improves performance.

Parameter: RMAN backups
Recommended setting: Do not run RMAN backups during batch processing.
Description: RMAN backups are good for a fast backup and restore. However, performance is noticeably lower while an RMAN backup runs.

Parameter: Archive Logging
Recommended setting: Turn off archive logging during the initial data load. For steady-state operation, you can turn on archive logging after the initial data load ends.
Description: Archive logging is unnecessary during the initial data load and adds overhead. If there is a failure during the initial data load, Oracle rolls back the entire transaction (the current batch cycle of a batch job). The process can be re-run with no data loss.
During the initial data load, back up the ORS schema that is being loaded at regular checkpoints when no jobs are running. For example, back up the ORS schema after major long-running jobs have completed. At an absolute minimum, take backups after completion of each phase: Stage, Load, Match, and Merge. The backups safeguard the work already done before you proceed.
You can enable archive logging for all steady-state operations (post initial data load).
If you use a standby database (database mirroring), disable the standby before you perform the initial data load. When the initial data load is complete, copy the database to the standby site, and then enable the standby.



The following table lists the recommendations for the tablespace:

Parameter: Tablespace
Recommended setting: Use the following recommended settings for tablespace:
- Locally Managed.
- Uniform Extent.
- Auto Segment Management.
- Default Tablespace Block Size of 8 KB or 16 KB.
Description: Applicable for all tablespaces involved in the MDM Hub including CMX_DATA, CMX_INDX, and CMX_TEMP.
A Default Tablespace Block Size of 8 KB is good for high API workload implementations. Use 16 KB as a balanced block size to support a mix of API and batch processing (small transactions and bulk read and write transactions).
If you use a Default Tablespace Block Size of 16 KB and the database was created with a default block size of 8 KB, configure an appropriate DB_CACHE for it.

Parameter: Tablespace Storage
Recommended setting: Use the following Oracle recommendations for storage of tablespace data files:
- If you use RAID: use either RAID 1+0 or RAID 0.
- If you do not use a RAID controller: each tablespace must comprise multiple data files spread across different disks.
Description: Many small disks perform better than a few large disks if everything else remains equal.
Use the following guidelines to improve the performance:
- RAID 1+0 (RAID 10) has a high degree of fault tolerance with mirroring. Use as many disks as possible. Disks must be as fast as possible.
- Avoid RAID 5 because of its write overhead and its poor performance if there is a disk failure.
- If there is no RAID controller: use multiple single disks to split several data files over more disks. Do not dedicate the data files for one tablespace to a single disk. Use the fastest disks possible and stripe the disks. In such cases, each tablespace uses part of each disk instead of depending on a single disk. Similarly, it is better to keep redo and undo logs on a different physical disk.

The following table lists the recommendations for the Oracle table statistics:

Parameter: Table Statistics
Recommended setting: Analyze the ORS schema on a regular (frequent) basis.
Use the following options:
- Analyze the full schema (perform outside business hours).
- Analyze individual tables whenever 10% of the data is changed.
- Perform unplanned table analysis.
- Set the DBMS_STATS.SET_GLOBAL_PREFS (see Environment Sharing in general database recommendations).
- With no sharing: DBMS_STATS.SET_GLOBAL_PREFS('DEGREE', DBMS_STATS.AUTO_DEGREE);
- If forced to share: DBMS_STATS.SET_GLOBAL_PREFS('DEGREE', <(number of CPUs on DB Server) minus 1>);
Description: Analyze the entire ORS schema on a regular basis as a best practice.
Analyze individual tables whenever 10% of the data is changed by using data sampling. This is the Oracle recommendation. See Oracle database documentation for details.
Sampling involves a trade-off. The best execution plans are always chosen when the statistics represent the entire table. When you use sampling, the execution plan is only as appropriate to the table as the data sample is representative of it. For example, you can use 10% of the table as representative of the table as a whole. If you use a sample, be sure to use a large enough sample. A sample of 1 to 2% is too small unless the table is large. A sample of 10% is more representative on smaller tables. You can switch off sampling and perform a full analyze, which provides the most appropriate execution plans. However, a full analyze takes far longer to run. It is best to run it outside of business hours to mitigate the impact.
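
The DEGREE preference and the 10% re-analysis rule described above can be expressed as a small helper. This is a minimal sketch under stated assumptions: the function names, the anonymous PL/SQL block wrapper, and the requirement of at least two CPUs on a shared server are illustrative choices, not part of this guide. The helper only builds text; it does not connect to a database.

```python
def degree_pref_statement(shared_instance, db_server_cpus=None):
    """Build the PL/SQL call that sets the global DEGREE preference.

    With no sharing, the guide uses DBMS_STATS.AUTO_DEGREE.
    If forced to share, it uses the number of CPUs on the DB server minus 1.
    """
    if not shared_instance:
        degree = "DBMS_STATS.AUTO_DEGREE"
    else:
        # Assumption: at least two CPUs, so that the degree is never zero.
        if db_server_cpus is None or db_server_cpus < 2:
            raise ValueError("a shared server needs a CPU count of at least 2")
        degree = str(db_server_cpus - 1)
    return f"BEGIN DBMS_STATS.SET_GLOBAL_PREFS('DEGREE', {degree}); END;"


def needs_reanalysis(changed_rows, total_rows, threshold_pct=10):
    """Return True when at least threshold_pct percent of the rows changed."""
    if total_rows <= 0:
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return changed_rows * 100 >= total_rows * threshold_pct


print(degree_pref_statement(shared_instance=True, db_server_cpus=8))
print(needs_reanalysis(changed_rows=120_000, total_rows=1_000_000))
```

How you count changed rows is outside the scope of this sketch. Consult the Oracle database documentation for the monitoring views that apply to your release.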

The following table lists the recommendations for RAC:

Parameter: Environment
Recommended setting: Each node must satisfy the same recommendations as set for the standalone environment.
Description: -

Parameter: _PKQ Sequence
Recommended setting: Use NOORDER.
Set sequence cache to 20000.
Description: Informatica recommends these settings to increase initial data load performance. These sequences are used to populate the ROWID_OBJECT in the base object tables.
For more information, search the Informatica Knowledge Base for article number 115788.

The following table lists the recommendations for the Oracle flashback:

Parameter: Flash recovery area
Recommended setting: Fast file system.
Description: Use a fast file system for your flash recovery area, preferably without operating system file caching.

Parameter: Disk spindles
Recommended setting: As needed.
Description: Configure enough disk spindles for the file system to hold the flash recovery area.

Parameter: Striped storage volume
Recommended setting: Smaller stripe size.
Description: If flash recovery area does not have non-volatile RAM, opt for striped storage volume with smaller stripe size such as 128k.
This will allow each write to the flashback logs to be spread across multiple spindles, improving performance.

For more information, search the Informatica Knowledge Base for article number 333718.

INIT.ORA Recommendations for Oracle


The INIT.ORA recommendations assume an Oracle server with standard hardware: 24 GB of RAM and an 8-core
CPU.
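
As a rough illustration of the memory-related values recommended in this section, the following sketch works out the fractions for the 24 GB, 8-core reference server. The function name `oracle_memory_plan`, the use of whole megabytes rounded down, and the reading of "CPU / 8" as the number of CPU cores divided by 8 are assumptions made for illustration.

```python
def oracle_memory_plan(physical_mb, cpu_cores):
    """Derive memory-related INIT.ORA values from the fractions in this section.

    Values are whole megabytes, rounded down (an assumption of this sketch).
    """
    if physical_mb <= 0 or cpu_cores <= 0:
        raise ValueError("memory and core count must be positive")
    # Two thirds of physical memory go to Oracle; one third stays with the
    # operating system and other processes.
    memory_target = physical_mb * 2 // 3
    return {
        "memory_target_mb": memory_target,
        "memory_max_target_mb": memory_target,
        # SGA: two thirds of the memory allocated to Oracle.
        "sga_target_mb": memory_target * 2 // 3,
        # PGA: one third, for Manual Memory Management only. With Automatic
        # Memory Management the guide sets pga_aggregate_target to 0.
        "pga_aggregate_target_mb_mmm": memory_target // 3,
        # One writer process per 8 cores, and at least 1 (single core case).
        "db_writer_processes": max(1, cpu_cores // 8),
    }


plan = oracle_memory_plan(physical_mb=24 * 1024, cpu_cores=8)
for name, value in plan.items():
    print(f"{name} = {value}")
```

For the reference server, this gives a memory_target of 16384 MB. Use the figures only as a starting point, and confirm them against the recommendations in this section and an Oracle AWR report taken under typical heavy load.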

The following table lists the basic initialization parameters for the Oracle database:

INIT.ORA Parameter Applicable to version Value Description

cursor_sharing EXACT Only allows statements with


identical text to share the same
cursor.

db_block_checking FALSE To avoid additional overhead.

db_block_size 8192 This parameter affects the


maximum value of the
FREELISTS storage parameter
for tables and indexes. Oracle
uses one database block for each
freelist group.

Recommendations for Oracle Database 17


INIT.ORA Parameter Applicable to version Value Description

db_cache_size 2000M Reduces additional overhead on


dynamic allocation.

db_file_multiblock 0 To be auto-determined by Oracle


_read_count database.

db_writer_processe 1 Oracle guideline.


s (Single Core)

db_writer_processe CPU / 8 Oracle guideline.


s (Multi Core)

disk_async_io TRUE Oracle guideline.

filesystemio_optio SETALL This bypasses file system buffer


ns cache, especially on Linux.

java_pool_size 0 To be auto-determined by Oracle


database.

large_pool_size 400M If undefined, RMAN would use the


SHARED POOL.

log_buffer 10M A value of 10 MB is a reasonable


initial size. Increase based on
Oracle AWR report taken under
typical heavy load. Section:Cache
Sizes > Log Buffer.

memory_target Two thirds of available Two thirds of the available


memory. physical memory for Oracle. One
third is reserved for system and
other processes.

memory_max_target Two thirds of available Two thirds of the available


memory. physical memory for Oracle. One
third is reserved for system and
other processes.

open_cursors 1000 The MDM Hub uses parallel


processing opening up multiple
cursors.

parallel_adaptive_ TRUE The MDM Hub uses multi-


multi_user sessions for the same Oracle
instance.

Processes 1000 Sufficient Oracle processes are


allocated to support connection,
parallel thread, internal process,
and other usage. If set too small,
some processes might fail to run.

18 Chapter 2: Recommendations
INIT.ORA Parameter Applicable to version Value Description

recyclebin | | OFF | Set the value of the recyclebin parameter to OFF. The MDM Hub has many temporary tables that the recycle bin spends time trying to maintain when dropped.

shared_pool_size | | 400M | Use 400 MB initially. Increase this according to the value seen in the Oracle AWR report for this instance taken under typical heavy load. Refer to the AWR report section Cache Sizes > Shared Pool Size.

streams_pool_size | | 0 | The MDM Hub does not use Oracle streams functionality. Disable the functionality to prevent Oracle from reserving memory for it.

utl_file_dir | | Do not set this parameter. | Use an Oracle Directory object instead of this parameter. For more information, see the Informatica Knowledge Base article 90456.

workareas_size_policy | | AUTO | To be auto-determined by Oracle.

pga_aggregate_target | 11g | 0 in case of Automatic Memory Management (AMM), or one third of the memory allocated to Oracle in case of Manual Memory Management (MMM). | Set PGA explicitly for MMM. For AMM, you need not set PGA.

sga_target | 11g | Two thirds of Max_Memory in AMM, or two thirds of the memory allocated to Oracle in MMM. | For SGA, Informatica recommends allocating two thirds of the memory available for Oracle regardless of AMM or MMM. In an AMM configuration, setting SGA explicitly always reserves two thirds of memory for SGA. This makes the Buffer Cache ready for the large data loaded into it during a batch. If you do not set SGA, AMM might allocate memory for something else. The remaining memory might not be enough for a big batch, and the batch might run slower.


optimizer_capture_sql_plan_baselines | | FALSE | Setting this parameter to FALSE prevents SQL plan management from recalculating the execution plan for each repeatable SQL statement.

optimizer_index_caching | | 0 | A value of 0 is the Oracle default. The parameter adjusts how strongly cost-based optimization favors nested loop joins and IN-list iterators.

optimizer_index_cost_adj | | 100 | Uses the default optimization based on table indexes. The value is a percentage. The default value of 100 evaluates index access paths at the regular cost.

optimizer_adaptive_features | 12c | FALSE (Scope = BOTH) | Enables or disables all of the adaptive optimizer features, including adaptive plans.

optimizer_use_sql_plan_baselines | | TRUE | Enables or disables the use of SQL plan baselines.

For more information about the INIT.ORA parameters, see Informatica knowledge base article 90408.
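
The memory and writer-process rules in the table above reduce to simple arithmetic. The following Python sketch works through them for one host. It is an illustration only: the function name `oracle_sizing`, the example host size, and the choice to round CPU / 8 up are assumptions and are not part of the Informatica or Oracle guidance.

```python
import math


def oracle_sizing(physical_mb: int, cpu_count: int) -> dict:
    """Return starting values derived from the sizing rules in the table."""
    # Two thirds of physical memory goes to Oracle; one third stays with the OS.
    oracle_mb = physical_mb * 2 // 3
    return {
        "memory_target_mb": oracle_mb,
        "memory_max_target_mb": oracle_mb,
        # SGA: two thirds of the memory given to Oracle (AMM or MMM).
        "sga_target_mb": oracle_mb * 2 // 3,
        # PGA under manual memory management: one third of Oracle's memory.
        "pga_aggregate_target_mb_mmm": oracle_mb // 3,
        # CPU / 8 for multi-core hosts; rounding up is an assumption.
        "db_writer_processes": 1 if cpu_count <= 1 else math.ceil(cpu_count / 8),
    }


# Hypothetical host: 96 GB of RAM and 16 CPU cores.
plan = oracle_sizing(physical_mb=98304, cpu_count=16)
for name, value in plan.items():
    print(f"{name} = {value}")
```

Treat the results as starting points and adjust them from the AWR report taken under typical heavy load, as the table describes.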

Recommendations for Microsoft SQL Server


You can configure parameters related to the Microsoft SQL Server to optimize the performance of the MDM
Hub.

The following table lists the recommendations related to the Microsoft SQL Server environment:

Database Parameters | Recommended Setting | Description

AUTO_UPDATE_STATISTICS_ASYNC | ON | To enforce statistics update in asynchronous mode.

PARAMETERIZATION | FORCED | To parameterize SQL statements. Parameterized statements reduce the frequency of query compilations and recompilations to improve performance.

READ_COMMITTED_SNAPSHOT | ON | To let read operations use row versioning, so that readers see the last committed version of the data and are not blocked by sessions that hold uncommitted changes.
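
The three database options above are set with `ALTER DATABASE ... SET` statements. The following Python sketch only builds the statement text so that it can be reviewed before a DBA runs it. The helper name `sqlserver_settings_sql` and the database name `CMX_ORS` are hypothetical.

```python
from typing import List

# Option names and values taken from the table above.
RECOMMENDED = [
    ("AUTO_UPDATE_STATISTICS_ASYNC", "ON"),
    ("PARAMETERIZATION", "FORCED"),
    ("READ_COMMITTED_SNAPSHOT", "ON"),
]


def sqlserver_settings_sql(database: str) -> List[str]:
    """Build one ALTER DATABASE statement per recommended option."""
    # Bracket-quote the identifier and escape any closing bracket.
    quoted = "[" + database.replace("]", "]]") + "]"
    return [f"ALTER DATABASE {quoted} SET {name} {value};"
            for name, value in RECOMMENDED]


# CMX_ORS is a placeholder database name.
for statement in sqlserver_settings_sql("CMX_ORS"):
    print(statement)
```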

Recommendations for IBM Db2
You can configure the parameters related to the IBM Db2 environment, registry variables, and database file
configuration parameters.

The following table lists the recommendations related to the IBM Db2 environment:

Parameter | Recommended Setting | Description

Physical Disk Drives (Tablespace) | Different physical drives for different tablespaces. | To reduce the amount of blocked input/output, you can increase input/output parallelism. Achieve input/output parallelism by storing user data tablespaces, temporary tablespaces, and transaction logs on different physical disk drives. Batch operations can access all the paths in parallel, which increases the throughput by reducing the input wait times and output wait times.

Physical Disk Drives (Container) | Different physical drives for different containers. | If more physical disk drives are available, you can increase input/output parallelism by extending parallelism to the container level. To increase input/output parallelism, place all containers for a tablespace on different physical disks. The IBM Db2 prefetchers and input/output cleaners access these containers in parallel without blocking each other, thereby increasing the throughput.

Processing Large Dataset | | When you process a large data set, use the following command to rebind packages. Perform the step every time you process a large data set, preferably after the initial data load.
    db2 bind @db2cli.lst blocking all grant public sqlerror continue CLIPKG 10

Reorganize Match Tables | | Routinely check and reorganize the match key tables (C_<Base Object>_STRP). To improve the performance of the SearchMatch API, reorganize match key tables based on their primary key column, SSA_KEY. To determine if a match key table needs reorganization, perform a reorganization check and analyze the results. The cluster ratio of the primary key index appears in the CLUSTERRATIO column, F4, of the reorganization check result. The cluster ratio must be close to 100% for optimal performance. Determine when to reorganize the match key table by noting the cluster ratio at which you observe degradation in the SearchMatch API performance. Use the IBM Db2 REORGCHK and REORG commands to reorganize tables. Update the match key table statistics so that the IBM Db2 optimizer can use the table layout that the reorganization generates.
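
The reorganization decision for match key tables can be sketched as a threshold check on the cluster ratio reported by REORGCHK. The Python sketch below is a minimal illustration. The table names, the sample ratios, and the 80% threshold are assumptions. The text above asks you to find your own threshold by noting the ratio at which SearchMatch performance degrades.

```python
def tables_to_reorganize(cluster_ratios: dict, threshold_pct: float = 80.0) -> list:
    """Return the match key tables whose cluster ratio is below the threshold.

    cluster_ratios maps a table name to the CLUSTERRATIO (F4) percentage
    copied from a reorganization check. threshold_pct is an assumed value.
    """
    return sorted(
        table for table, ratio in cluster_ratios.items() if ratio < threshold_pct
    )


# Hypothetical readings for two match key tables.
readings = {"C_PARTY_STRP": 97.0, "C_ADDRESS_STRP": 62.5}
print(tables_to_reorganize(readings))
```

After a table is reorganized, update its statistics so that the optimizer sees the new layout.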

The following table lists the recommendations for the registry variables:

Database Parameters | Recommended Setting | Description

DB2_INLIST_TO_NLJN | NO | Configures the optimizer to prefer or not prefer nested loop joins. The Db2 SQL compiler might rewrite an IN list predicate as a join. The rewrite might provide better performance if you define an index on the joined columns. When the optimizer does not have accurate statistics, it might not be able to determine the best join method for the rewritten predicate. Set the variable to NO to prevent the optimizer from favoring nested loop joins in such cases.

DB2_ANTIJOIN | YES | Configures the optimizer to transform subqueries into anti-joins. The MDM Hub has queries that use NOT EXISTS subqueries. Set the registry variable to YES to look for possibilities to transform NOT EXISTS subqueries into anti-joins that IBM Db2 processes more efficiently.

DB2_REDUCED_OPTIMIZATION | REDUCE_LOCKING | Configures reduced optimization features or rigid use of optimization features at the specified optimization level. Set the registry variable to REDUCE_LOCKING to favor NLJOIN over MSJOIN whenever possible to reduce the amount of locking on the outer table.

DB2_EXTENDED_OPTIMIZATION | ON, ENHANCED_MULTIPLE_DISTINCT, IXOR, SNHD | Configures whether or not the query optimizer uses optimization extensions to improve query performance. The ON, ENHANCED_MULTIPLE_DISTINCT, IXOR, and SNHD values specify different optimization extensions.

DB2_HASH_JOIN | YES | Configures hash join as a possible join method when compiling an access plan. Tune hash join to get the best performance. For best performance of a hash join, avoid hash loops and overflows to disk. To tune hash join performance, estimate the maximum amount of memory available for the sheapthres configuration parameter, and then tune the sortheap configuration parameter.
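
Db2 registry variables are set with the `db2set` command in the form `db2set NAME=VALUE`. The following Python sketch generates the commands for the values recommended above so that they can be reviewed or placed in a script. It is an illustration under assumptions: the helper name `db2set_commands` is hypothetical, the hash join variable is written as DB2_HASH_JOIN, and the multi-value setting is written as a comma-separated list without spaces. Registry changes usually take effect only after the instance is restarted.

```python
# Recommended registry values from the table above.
REGISTRY_SETTINGS = {
    "DB2_INLIST_TO_NLJN": "NO",
    "DB2_ANTIJOIN": "YES",
    "DB2_REDUCED_OPTIMIZATION": "REDUCE_LOCKING",
    "DB2_EXTENDED_OPTIMIZATION": "ON,ENHANCED_MULTIPLE_DISTINCT,IXOR,SNHD",
    "DB2_HASH_JOIN": "YES",
}


def db2set_commands(settings: dict) -> list:
    """Build one db2set command line per registry variable."""
    return [f"db2set {name}={value}" for name, value in settings.items()]


for command in db2set_commands(REGISTRY_SETTINGS):
    print(command)
```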

The following table lists the recommendations for the database file configuration parameters:

Database Parameters | Recommended Setting | Description

LOCKLIST | AUTOMATIC | Allocates the amount of storage to the lock list of a database. Multiple MDM Hub processes use row-level locks to complete tasks and to support concurrency. The number of locks that IBM Db2 needs to acquire depends on the number of rows to process. If the incoming volume differs greatly in size, set the parameter to AUTOMATIC to allow the database manager to determine the appropriate value. If you conservatively tune the LOCKLIST parameter value, lock escalations can occur, and some of the MDM Hub operations can fail due to lock timeouts.

MAXLOCKS | AUTOMATIC | Configures the percentage of the lock list that one application can use. Most MDM Hub processes run under the scope of a single application. Such single applications can acquire many row-level locks, consuming most of the available lock list. If the incoming volume differs greatly in size, predicting the MAXLOCKS parameter value is difficult. Set the parameter to AUTOMATIC to allow the database manager to determine the appropriate value. If you conservatively tune the MAXLOCKS parameter value, lock escalations can occur, and some of the MDM Hub operations can fail due to lock timeouts.

CATALOGCACHE_SZ | 25000 or higher | Configures the maximum memory that the catalog cache can use from the shared memory of the database. IBM Db2 stores system catalog information in the catalog cache. The MDM Hub comprises many dynamic SQL queries that reference multiple metadata objects. If the catalog cache is large, IBM Db2 can retain information for some of the metadata objects from the system catalogs in the memory. If subsequent dynamic SQL queries require the same metadata objects, the compilation is quick. The MDM Hub comprises many frequently accessed metadata objects. Therefore, you must set the CATALOGCACHE_SZ parameter value to 25000 or higher.

LOGBUFSZ | 4096 or higher | Configures the amount of the database heap to use as a buffer for log records before writing the records to disk. The MDM Hub creates logs for most of its operations in the IBM Db2 transaction logs. IBM Db2 buffers the log records in the log buffer before writing them to the disk. If the log buffer is large, IBM Db2 writes the log records to the disk less frequently. This makes disk input/output for log records more efficient. The default value for the database configuration parameter is not sufficient for an average MDM Hub environment. Set the parameter to 4096 pages or higher.


LOGFILSIZ | 128000 or higher | Configures the size of each primary and secondary log file, in 4 KB pages. A single MDM Hub transaction can contain many DML queries, resulting in many log records that might span many log files. A large log file size avoids the need to create new log files frequently. If IBM Db2 creates new log files frequently, it adversely influences the performance of input/output bound systems. The total log space for a database is equal to the total number of primary and secondary log files multiplied by the log file size. The database must have adequate log space to ensure that the MDM Hub transactions do not run out of log space and fail. If the MDM Hub transactions fail, the database needs more time to roll back the transactions. Set LOGFILSIZ to 128000 or higher to ensure that the MDM Hub transactions do not fail and need a rollback. Also consider the number of primary and secondary logs to complete the log space equation.

LOGPRIMARY | 100 | Configures the number of primary log files to be pre-allocated. IBM Db2 creates primary log files when you activate the database. If an uncommitted transaction exhausts the primary log space, IBM Db2 creates secondary log files as needed. Set the LOGPRIMARY parameter value to 100 to efficiently handle the MDM Hub processes. Secondary log files act as a backup in cases where long transactions can exhaust the entire primary log space.

LOGSECOND | 100 | Configures the number of secondary log files that IBM Db2 can create and use for recovery log files. Log file creation can adversely impact the performance based on the size of the log file. If you allocate sufficient primary log space, transaction performance increases because the database does not create secondary log files frequently. Set the LOGSECOND parameter value to 100 to cover unexpected long transactions due to large incoming volumes, especially during batch processes. The sum of the values of LOGPRIMARY and LOGSECOND must be 200.

PCKCACHESZ | 128000 | Configures the package cache size, which is allocated out of the database shared memory. The MDM Hub has many dynamic SQL statements. Each dynamic SQL statement has a compiled package associated with it. IBM Db2 caches these packages in the package cache memory. You must configure an appropriate package cache size to avoid package cache overflows, which adversely influence performance. Experiment with the values for the package cache size. Initially, set the parameter value to 50000 and monitor the different phases of the MDM Hub processes. If you observe frequent package overflows, tune the parameter again.

STMTHEAP | AUTOMATIC | Configures the limit of the statement heap, which is used during the compilation of an SQL statement. If the statement heap is not sufficient, it might prevent the optimizer from evaluating all possible access plans for an SQL query. This might result in a suboptimal plan and adversely influence performance. Set the STMTHEAP parameter to AUTOMATIC to allow the optimizer to weigh all possible access plans for the compilation of an SQL query.


SORTHEAP | AUTOMATIC | Configures the sort heap size. The MDM Hub processes perform many sorts. If the sort heap size is not sufficient, large sorts can spill from the memory to disk. Disk input/output is slower than memory, and such sort spills can cause queries to run longer. Sort spills to disk can adversely influence performance, and the effect is more evident with larger spills. Set the parameter value to AUTOMATIC for the memory tuner to dynamically size the memory area as the sort requirements change.

SHEAPTHRES_SHR | AUTOMATIC | Configures the limit on the total amount of database shared memory that the sort memory consumers can use at a time. Set the SHEAPTHRES_SHR parameter to AUTOMATIC if you set the SORTHEAP parameter to AUTOMATIC.

UTIL_HEAP_SZ | 50000 or higher | Configures the maximum amount of memory that the BACKUP, RESTORE, and LOAD utilities can use simultaneously. During some batch operations, the MDM Hub uses the IBM Db2 LOAD utility to move data between tables. The LOAD utility uses the utility heap to complete the data movement process. The size of the utility heap has an impact on the performance of the LOAD operation. Set the UTIL_HEAP_SZ parameter to an appropriate value to provide better throughput for the MDM Hub processes.
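
The log space equation described for LOGFILSIZ, LOGPRIMARY, and LOGSECOND can be worked through with the recommended values. The Python sketch below is an illustration. It assumes that LOGFILSIZ is counted in 4 KB pages, and the function name is hypothetical.

```python
PAGE_BYTES = 4096  # assumption: LOGFILSIZ is expressed in 4 KB pages


def total_log_space_bytes(logprimary: int, logsecond: int, logfilsiz_pages: int) -> int:
    """Total log space = (primary + secondary log files) x log file size."""
    return (logprimary + logsecond) * logfilsiz_pages * PAGE_BYTES


# Recommended values from the table: 100 primary, 100 secondary, 128000 pages.
space = total_log_space_bytes(100, 100, 128000)
print(f"{space} bytes, about {space / 1024**3:.1f} GiB")
# prints "104857600000 bytes, about 97.7 GiB"
```

Under these assumptions the file system that holds the transaction logs needs roughly 100 GB available if long transactions cause all secondary log files to be created.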

Recommendations for the MDM Hub


You might be able to improve performance by changing how the MDM Hub operates within the environment.

MDM Hub Environment


The following table lists the recommendations related to the MDM Hub environment:

Parameter | Recommended Setting | Description

Obsolete Data Director applications and Operational Reference Stores databases | Remove obsolete items. | Obsolete Data Director applications and Operational Reference Stores schemas impact the performance of server startups, run time memory, and Security Access Manager profile caching.

Order of authentication provider in Security Providers | Configure the security providers in order, with the first provider being the provider that authenticates the heaviest user load. | Configuration > Security Providers > Authentication Providers. The MDM Hub authenticates the user based on the order of the security providers configured. If most of the users are authenticated by using the custom security provider (if applicable), move it to the first position. Note: Each authentication request has a cost of a few milliseconds associated with it. The number of authentication requests is reduced significantly by using the User Profile Cache.

Application Server
The following table lists the recommendations for the application server configuration:

Parameter | Recommended Setting | Description

Maximum thread count for the thread pool | 300 or higher | For example, in JBoss set the following property in the standalone-full.xml file:
    <thread-pools>
        <thread-pool name="default">
            <max-threads count="300"/>
        </thread-pool>
    </thread-pools>

Maximum connections in HTTP connection pool | 300 or higher | For example, in JBoss set the following property in the standalone-full.xml file:
    <connector name="http" protocol="HTTP/1.1" scheme="http"
        socket-binding="http" max-connections="300"/>

JDBC logging level | OFF | For example, in JBoss, set the following log level property in the standalone-full.xml file, under <subsystem xmlns="urn:jboss:domain:logging:1.2">:
    <logger category="com.microsoft.sqlserver.jdbc">
        <level name="OFF"/>
    </logger>

Transaction timeout | At least 3600 seconds. | Set the transaction timeout to at least 3600 seconds (1 hour). For example, in JBoss set the following property in the standalone-full.xml file:
    <coordinator-environment default-timeout="3600"/>
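
The JBoss values above can be checked programmatically before a deployment. The following Python sketch parses a simplified fragment and compares each value with the recommended minimum. It is an illustration only: the fragment is a hypothetical, namespace-free excerpt wrapped in a `<server>` element, whereas a real standalone-full.xml file uses XML namespaces that the lookups would need to account for.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free excerpt (an assumption, not a full JBoss file).
FRAGMENT = """
<server>
  <thread-pools>
    <thread-pool name="default">
      <max-threads count="300"/>
    </thread-pool>
  </thread-pools>
  <connector name="http" protocol="HTTP/1.1" scheme="http"
             socket-binding="http" max-connections="300"/>
  <coordinator-environment default-timeout="3600"/>
</server>
"""

# Recommended minimums from the table above.
MINIMUMS = {"max-threads": 300, "max-connections": 300, "default-timeout": 3600}


def check_settings(xml_text: str) -> dict:
    """Return a pass/fail flag for each recommended application server value."""
    root = ET.fromstring(xml_text)
    found = {
        "max-threads": int(root.find(".//max-threads").get("count")),
        "max-connections": int(root.find(".//connector").get("max-connections")),
        "default-timeout": int(
            root.find(".//coordinator-environment").get("default-timeout")
        ),
    }
    return {name: value >= MINIMUMS[name] for name, value in found.items()}


print(check_settings(FRAGMENT))
```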

Operational Reference Store


The following table lists the recommendations for the ORS configuration:

Parameter | Recommended Setting | Description

Production Mode | Enable this property in Production. | [Configuration > Database > Database Properties]. Enable this property to remove the additional overhead of pre-scheduled daemons that refresh the metadata cache.

Batch API Interoperability | Enable if both real time and batches are used. | [Configuration > Database > Database Properties]. Enabling this configuration has an impact on performance. Enable the configuration if batches and real time API calls are used or if Data Director is used. If your application uses neither real time API updates nor Data Director, do not enable Batch API Interoperability. Tip: During Initial Data Load, disable this property for faster loading of data.


Auditing | Disable the auditing completely. | Auditing introduces additional overhead. You must disable auditing completely.

Write lock monitor interval | cmx.server.writelock.monitor.interval=10 | When more than one Hub Console uses the same ORS, a write lock on a Hub Server does not disable caching on the other Hub Servers. The unit is in seconds. For more information, see the Multidomain MDM Configuration Guide.

Schema Design
The following table lists the recommendations for the schema design:

Parameter | Recommended Setting | Description

Child Base Objects | Avoid too many child base objects for a particular parent base object. | The performance of load, tokenize, and automerge batch jobs decreases as the number of child base objects for a base object increases.

Match columns | Avoid too many match columns. | The performance of tokenize and match jobs decreases as the number of match columns increases.

Lookup Indicator | Enable Lookup Indicator only for 'Lookup' tables and not for any other base objects unrelated to lookup. | Schema > [base object] > Advanced > Lookup indicator. Enabling the lookup indicator for non-lookup base objects unnecessarily caches the base object data in the memory. Doing so results in out of memory errors, slow Data Director performance, and a slower rate of lookup cache refresh.

Lookup Display Name | Configure Lookup Display Name to be the same as the lookup column. | For high volume lookup tables: If you set the lookup display name to any column other than the column on which the relationship is built, SIF PUT calls must send the lookup display name values in the SIF call. When inserting data into the base object, the lookup value is validated by querying the lookup table. The order of lookup is predefined: the lookup display column value comes first, followed by the actual column value. In high volume lookup tables, this becomes an overhead.

History | Enable History if you want to retain historical data for the specific base object. Otherwise, disable it. | If you enable History for a base object, the MDM Hub additionally maintains history tables for base objects and for cross-reference tables. The MDM Hub already maintains some system history tables to provide detailed change-tracking options, including merge and unmerge history. The system history tables are always maintained. Over a period of time, history in the database keeps growing. To preserve database access performance, consider keeping months or at most a few years of history in the system.

History | To avoid very large history tables that cause performance issues, you can partition the tables. | For more information, search the Informatica Knowledge Base for article number 306525.


Cross Reference Promotion History | Enable Cross Reference Promotion History if you want to retain historical data for the specific base object. | Schema > [base object] > Advanced > Enable History of Cross Reference Promotion. Enabling history incurs a performance cost for both real time and batch operations. Use the history option cautiously and only if required.

Trust | Configure trust only for required columns. | A higher number of trust columns and validation rules on a single base object incurs higher overhead during the Load process and the Merge process. If more trusted and validated columns are implemented on a particular base object, longer SQL statements (in terms of lines of code) are generated to update the _CTL control table and the _VCT validation control table. Minimize the number of trust and validation columns to preserve good performance.

Case Insensitive Search | Enable Case Insensitive Search only for VARCHAR2 columns. Ensure that you do not include any column with a data type other than VARCHAR in the Search Query. | Enabling Case Insensitive Search for non-VARCHAR2 columns hinders performance.

Message Trigger Setup | Avoid configuring multiple message triggers for different event types. | Do an in-depth analysis before configuring message triggers. There is a performance cost associated with them during the execution of load jobs.

Message Trigger Setup | Tune Message Trigger Query. | The best approach to tuning the query used in the Package Views is to use Explain Plan. Add custom indexes wherever required to avoid full table scans, and analyze tables/schema on a regular basis. When you use Explain Plan, retrieve the plan by wrapping the query in an outer query that contains a "where" clause on a specific rowid_object value. For more information about message triggers, search the Informatica Knowledge Base for article number 142115.

Message Trigger Setup | Throughput can be greatly improved if you increase the Receive Batch Size and reduce the Message Check Interval. | The Message Queue Monitoring settings have a major impact on throughput and message posting time. Configure these settings from the Hub Console in the Master Reference Manager (MRM) Master Database (CMX_SYSTEM) in the Configuration section.

Message Trigger Setup | Avoid unnecessary column selection in message trigger. | Do not select "Trigger message if change on any column" if you do not need to monitor all the columns. Also, try to minimize the selection of columns.


'Read Database' Cleanse Function | Use the following recommended settings: use with caution, and enable 'cache' if used. | The 'Read Database' cleanse function incurs a performance overhead compared to using a similar MDM Cleanse Function to perform the same function. The performance overhead is more pronounced on a high volume table. The overhead is caused by the creation of a new database connection and the corresponding transmission to, processing by, and receipt of the results from the database. These would otherwise be managed within the Process Server application layer. If use of this function cannot be avoided, enable the caching behavior of the Read Database function where applicable. Pass a Boolean 'false' value to the 'clear cache' input field of the Read Database function. Doing so reduces performance lag by enabling future operations to use the cached value rather than creating a new database connection on each access of the function.

Cleanse Functions | Do not make cleanse functions very complex. | The performance of batch jobs improves when you reduce the number and the complexity of cleanse functions.

Timeline (Versioning) | Enable Dynamic Timeline or Versioning on an Entity base object (regular base object, not a Hierarchy Manager Relationship base object) only if strictly required. Versioning has a performance impact on the base object associated with it. For Hierarchy Manager Relationship base objects, versioning is enabled with no option to disable. | To maintain the fastest performance possible, enable Versioning only on those Entity base objects (regular MDM Hub base objects) that strictly need it. With the Versioning functionality, the additional associated metadata and processing add a significant amount of complex processing when you run any process on a version-enabled base object. Enabling versioning on a base object brings an additional performance cost to all processing performed on that base object. For more information, search the Informatica Knowledge Base for article numbers 138458 and 140206.

State Management | Disable state management if you do not require it. | State Management carries an associated performance overhead. If you use Data Director with workflows, you must enable State Management. However, enabling History for State Management Promotion at the cross-reference level is optional.

Delta Detection | Enable it only on the minimum number of columns that strictly need it. | Delta Detection carries a sizable associated overhead on performance. If a Landing Table has only new and updated records in every staging job, you can disable delta detection. If you want to enable Delta Detection, the least impactful approach is to use the last_update_date. If you need additional columns, analyze for each additional column whether its involvement is worth the associated performance overhead. Avoid blindly enabling Delta Detection on all the columns.

Cleanse Mappings | Minimize the complexity of Mappings. | Minimize the complexity of the mapping for better performance. If you use a Cleanse List in a Cleanse mapping, use static data in the Cleanse List. Consider using lookup tables only for dynamic data.


Validation Rules | Optimize Validation Rule SQL code. | The detection piece of each Validation Rule SQL runs against every record during the Load process to determine if it applies. Poorly performing Validation Rule SQL influences performance on every record that is loaded.

User Exits | Optimize user exit code for performance. | User exit code influences performance if not optimized. Applicable to both Data Director and batch user exits.

Packages | Optimize the SQL code written in each MDM Hub Query that is called from an MDM Hub Package. | These MDM Hub Packages are used in Data Director, SIF API calls, Data Manager, Merge Manager, and in search operations. If you do not tune the SQL for performance, the Package results in an expensive operation whenever it is called.

Custom Indexes | Use caution when adding custom indexes. Each index added has an associated cost. Ensure that the gain received outweighs the cost of each additional custom index. | Index management has an associated performance cost. Perform the following steps to limit that cost:
    1. Get a log of the real queries run on the base object (content data) and base object shadow tables (content metadata) on a typical day. Ignore temporary T$% tables and system C_REPOS_% tables.
    2. Identify indexes which exist on these tables to avoid unnecessary overlap.
    3. Before adding any indexes, review a regular day of logs and take an inventory of:
        a. SIF API call duration.
        b. Data Director process durations.
        c. Batch jobs:
            a. Duration of each batch job.
            b. Duration of each cycle within that batch job.
            c. Duration of longest running statements within a batch job.
    4. Consider the longest running process for potential benefit from a custom index.
    5. Consider adding indexes so the longest running SQL query or queries hit the new index in their execution plan. Avoid indexing fields which have many updates or inserts.
    After you add each new custom index, return to Step 3 and assess if there is still potential to improve performance by adding more custom indexes.

Parallel Degree on Base Object | Between one and the number of CPU cores on the database machine. | Parallel degree is an advanced base object property. For optimum performance of batch jobs, set a value between one and the number of processor cores on the database server machine. For more information, search the Informatica Knowledge Base for article number 181313.
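
Steps 3 and 4 of the custom index procedure above amount to building a duration inventory and ranking it. The Python sketch below shows one minimal way to do that. The process names and durations are invented for illustration, and the function name `slowest_processes` is hypothetical.

```python
def slowest_processes(durations_s: dict, top_n: int = 3) -> list:
    """Rank inventoried processes by duration, longest first."""
    ranked = sorted(durations_s.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical one-day inventory, in seconds, gathered from the logs.
inventory = {
    "SIF searchQuery": 1.2,
    "Data Director save": 3.5,
    "Batch load C_PARTY": 840.0,
    "Batch match C_PARTY": 2600.0,
}

# The top entries are the first candidates to examine for a custom index.
for name, seconds in slowest_processes(inventory, top_n=2):
    print(f"{name}: {seconds:.1f} s")
```

Rebuilding the same inventory after each new index makes it possible to compare before and after, which is what the return to Step 3 asks for.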

Match and Merge
The following table lists the recommendations for the match and merge configuration:

Parameter | Recommended Setting | Description

Match Path Filter | Filter on root path instead of at the match rule level. | If you need to exclude records from the match process, filter on the root path instead of at the match rule level. When you filter at the root level, the records are excluded from tokenization and therefore do not participate in the match.

Check for missing children | Use it with caution. | This match path property indicates if parent records must be considered for matching based on the existence of child records. If you need a fuzzy match on a base object, tokenization of a parent base object record must occur. Tokenization of a parent base object record occurs if all child base objects that have the option to check for missing children disabled have a related child base object record. If a parent base object record has a child base object where the option to check for missing children is disabled and that child contains no record, the parent record is not tokenized. The MDM Hub performs an outer join between the parent and the child tables when the option to check for missing children is enabled. This option has an impact on the performance of each match path component on which the option is enabled. Therefore, when not needed, it is more efficient to disable this option.

Match Key | The tighter the key, the better the performance. | The width of the match key determines the number of rows in the tokenization table (the number of tokenized records that are used to match each record to be matched) and the number of records to be considered for each match candidate. Usually the standard key width is enough. Search Level: Use the narrowest possible search level to generate acceptable matches. Usually the typical search level is enough. Match Rules: For each match rule, add one or more exact match columns to act as a filter to improve the performance for each rule.

Dynamic Match Analysis Threshold (DMAT) | Change if required. Default is 0. | Although DMAT helps improve performance, take care when setting this limit. If you set the level too low, it might cause under matching. It is recommended that clients first analyze the data to assess why a particular search range contains a large count. Sometimes the reason might be a noise word or phrase like "do not send" or a valid word or phrase such as "John." When setting this value, identify any large ranges that are causing bottlenecks and then use the "Comparison Max Range" count to set the DMAT. Proper analysis is required to change this value. For more information, search the Informatica Knowledge Base for article number 90740.


STRIP_CTAS_DELETE_RATIO    Change if required. Default is 10%.    C_REPOS_TABLE [base object] > STRIP_CTAS_DELETE_RATIO.
Proper analysis is required if you decide to change the default 10% value.
If the volume of data change in the _STRP table is more than this percentage, tokenization uses 'Create Table
… As Select …' code to recreate the _STRP table with the needed changes instead of delete and insert
operations, which arrives at the same result in less time. The optimal value might vary for each
implementation and depends on the size of the table and the percentage of records that must be updated.

COMPLETE_STRIP_RATIO    Change if required. Default is 60%.    [Model > Schema > [base object] Advanced > Complete Tokenize Ratio].
Proper analysis is required if you decide to change the default 60% value.
If the volume of data change in the _STRP table is more than this percentage, the tokenization process drops
and re-creates the entire _STRP table rather than retokenizing only the updated records.

AUTOMERGE_CTAS_RATIO    Change if required. Default is -1.    C_REPOS_TABLE [base object] > AUTOMERGE_CTAS_RATIO.
Proper analysis is required before you decide to change the default -1 value.
If the volume of records queued for merge is greater than this percentage, automerge uses the 'Create Table As
Select' (CTAS) option for faster merging instead of the regular delete/insert operations.
Note: The default value of -1 indicates that the CTAS feature is off for automerge.

For more information about tuning match and merge, search the Informatica Knowledge Base for article
number 357214.
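
The three ratios in the preceding table act as thresholds on the share of changed or queued records. The following Python sketch illustrates that decision logic only. It assumes that the volume of change is measured as changed rows divided by total rows and that the higher threshold is checked first. The function names and strategy labels are hypothetical and are not part of the MDM Hub.

```python
def choose_strp_strategy(changed_rows, total_rows,
                         strip_ctas_delete_ratio=10.0,
                         complete_strip_ratio=60.0):
    """Pick a tokenization strategy label for a _STRP table (illustrative only)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    changed_pct = 100.0 * changed_rows / total_rows
    if changed_pct > complete_strip_ratio:
        # Above COMPLETE_STRIP_RATIO: drop and re-create the entire _STRP table.
        return "full_retokenize"
    if changed_pct > strip_ctas_delete_ratio:
        # Above STRIP_CTAS_DELETE_RATIO: recreate the table with Create Table As Select.
        return "ctas"
    # Otherwise: regular delete and insert operations.
    return "delete_insert"


def automerge_uses_ctas(queued_rows, total_rows, automerge_ctas_ratio=-1.0):
    """A ratio of -1 (the default) means CTAS is off for automerge."""
    if automerge_ctas_ratio < 0:
        return False
    return 100.0 * queued_rows / total_rows > automerge_ctas_ratio
```

With the default values, a change that touches 11% of the rows selects the CTAS path, and a change that touches 61% selects a full re-tokenization.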

Services Integration Framework (SIF) APIs


The following table lists the recommendations for the SIF API:

Parameter Recommended Setting Description

Protocol    Use the EJB protocol over HTTP or SOAP.    The EJB protocol is faster and more reliable.
For more information, search the Informatica Knowledge Base for article number 138526.

Disable Paging    Set to true if not required.    SIF request parameter: disablePaging
If paging is not required (the results are not expected to return many records), set this flag to true. If set
to false (the default value), each SIF call incurs two database calls.

Return Total    Do not set any value.    SIF request parameter: returnTotal
If you do not require the total count, do not set this flag. If set to true, each SIF call incurs two database
calls.

Hub Server Properties
The following table lists the recommendations for the Hub Server properties:

Parameter Recommended Description


Setting

User Profile Cache    True. Default is true.    cmx.server.provider.userprofile.cacheable
This property is found in the cmxserver.properties file.
When you set this flag to true, a user profile is cached once it is authenticated. Set the flag to true to
avoid explicit user authentication requests for every SIF call.

User Profile Life Span    60000. Default is 60000.    cmx.server.provider.userprofile.lifespan
This property is found in the cmxserver.properties file.
Time, in milliseconds, to retain the cached user profile before refreshing. A few minutes is adequate; avoid
setting this to longer durations.

Security Access Manager (SAM) cache refresh interval    5 clock ticks. Default is 5 clock ticks at a rate of 60,000 milliseconds for 1 clock tick, which is equivalent to 5 minutes.    cmx.server.sam.cache.resources.refresh_interval
This property is found in the cmxserver.properties file.
Refreshes the SAM cache after the specified clock ticks. To specify the number of milliseconds for 1 clock
tick, use the cmx.server.clock.tick_interval property.
For more information, see the Multidomain MDM Configuration Guide.

Cleanse Poller    30 (seconds). Default is 30.    cmx.server.poller.monitor.interval
This property is found in the cmxserver.properties file.
At the configured interval, in seconds, the MDM Hub Server polls the availability of the Process Server and
flags the status of the Process Server as valid or invalid accordingly.
For more information, search the Informatica Knowledge Base for article number 151925.
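
The SAM cache refresh interval is expressed in clock ticks, so the effective refresh time is the product of two properties. The following Python sketch shows the arithmetic with the default values from the table. The function names are hypothetical.

```python
def sam_cache_refresh_ms(refresh_interval_ticks=5, tick_interval_ms=60_000):
    """Effective SAM cache refresh time in milliseconds.

    refresh_interval_ticks mirrors cmx.server.sam.cache.resources.refresh_interval
    tick_interval_ms mirrors cmx.server.clock.tick_interval
    """
    return refresh_interval_ticks * tick_interval_ms


def ms_to_minutes(milliseconds):
    """Convert a millisecond property value to minutes."""
    return milliseconds / 60_000
```

With the defaults, `sam_cache_refresh_ms()` is 300000 milliseconds, which `ms_to_minutes` converts to 5 minutes. The same helper shows that the default user profile life span of 60000 milliseconds is 1 minute.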

Infinispan
The following table lists the recommendations for Infinispan parameters, which are located in the
infinispanConfig.xml file:

Parameter Recommended Description


Setting

expiration lifespan    86400000 (milliseconds)    Maximum lifespan of a cache entry in milliseconds. When a cache entry
exceeds its lifespan, the entry expires within the cluster.
You can increase the lifespan for the following caches: DISABLE_WHEN_LOCK, DATA_OBJECTS, and REPOS_OBJECTS.
For example, you can increase a lifespan from one hour (3600000) to one day (86400000).
Each cache has its own default value for this parameter. To find the default values, open the
infinispanConfig.xml file.

expiration interval    300000 (milliseconds)    Maximum interval for checking the lifespan.
For example, you can increase an interval from five seconds (5000) to five minutes (300000).



For more information about Infinispan parameters, search the Informatica Knowledge Base for article
number 509572.
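
To review the expiration settings before changing them, you can list the caches whose values are below the recommended ones. The following Python sketch does this for a simplified sample document. The sample layout (a `local-cache` element that contains an `expiration` element with `lifespan` and `interval` attributes) is an assumption for illustration; check the actual structure of your configuration file before adapting it.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; the real file may use namespaces and other elements.
SAMPLE = """
<cache-container>
  <local-cache name="DATA_OBJECTS">
    <expiration lifespan="3600000" interval="5000"/>
  </local-cache>
  <local-cache name="REPOS_OBJECTS">
    <expiration lifespan="86400000" interval="300000"/>
  </local-cache>
</cache-container>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{urn:...}expiration'."""
    return tag.rsplit("}", 1)[-1]


def expiration_settings(xml_text):
    """Map each named cache to its (lifespan, interval) pair in milliseconds."""
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for cache in root.iter():
        name = cache.get("name")
        if name is None:
            continue
        for child in cache:
            if local_name(child.tag) == "expiration":
                settings[name] = (int(child.get("lifespan")), int(child.get("interval")))
    return settings


def below_recommendation(settings, lifespan_ms=86_400_000, interval_ms=300_000):
    """Names of caches whose lifespan or interval is below the recommended value."""
    return sorted(name for name, (lifespan, interval) in settings.items()
                  if lifespan < lifespan_ms or interval < interval_ms)
```

For the sample above, only DATA_OBJECTS is reported, because its lifespan is one hour and its interval is five seconds.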

Logging
The following table lists the recommendations for the logging:

Parameter Recommended Description


Setting

Hub Server Set to ERROR mode. Change the log4j.xml file to use ERROR mode. If clustered, update the
Logging log4j.xml file in all nodes. For JBoss, use <JBoss node>/conf/jboss-
log4j.xml. For other application servers, update <INFAHOME>/hub/
server/conf/log4j.xml. Once the log4j configuration file is updated,
changes are reflected in the log within a few minutes.

Process Server Set to ERROR mode. Change the log4j.xml file to use ERROR mode. If clustered, update the
Logging log4j.xml file in all nodes. For JBoss, use <node>/conf/jboss-
log4j.xml. For other application servers, update the <INFAHOME>/hub/
cleanse/conf/log4j.xml file. Once the log4j configuration file is
updated, changes are reflected in the log within a few minutes.

For more information about logging, search the Informatica Knowledge Base for article number 120879.

Search
The following table lists the recommendations for search:

Parameter Recommended Description


Setting

Limit Do as needed. Do not index unnecessary searchable fields. Multiple searchable fields
Searchable increase the indexing and searching time, so configure only the required
Fields fields as searchable fields. Also keep only the required fields and facets.
Facets should only be on the fields with low entropy. Also limit the number
of fuzzy fields.

Task Assignment
The following table lists the recommendations for task assignments:

Parameter Recommended Description


Setting

task.creation.batch.size Default is 1000. In MDM 10.0 and earlier, the default value is 50.
Available in cmxserver.properties.
Sets the maximum number of records to process for each
match table.
If more tasks need to be assigned on the run, you can
increase this value.

Operational Reference Store and SIF APIs
The following table lists the recommendations for ORS-specific SIF API generation:

Parameter Recommended Description


Setting

Required objects As needed. ORS specific API generation depends on the number of objects selected. It
is preferable to add only the required objects to gain performance during
the SIF API generation.

SIF API (Java Doc Generation) Heap Size    Default is 256m.    sif.jvm.heap.size
Available in cmxserver.properties.
Sets the heap size used during the creation of Java Doc. Because Java Doc creation takes a lot of heap memory,
you can increase this value if required. Note that this heap size setting is not connected to the heap size of
the MDM application, which is set during the server startup.

Informatica Data Quality


The following table lists the recommendations for Informatica Data Quality cleansing:

Parameter Recommended Description


Setting

Batch Size Default is 50. cmx.server.cleanse.number_of_recs_batch


Available in cmxcleanse.properties.
If the workflow supports minibatch, then you can set this value to any desirable
value depending on the number of records to be cleansed at a time. If this attribute
is set, then MDM automatically groups the records for cleansing.
Note: This property can be used in other cleansing engines if they support
minibatch.

For more information, search the Informatica Knowledge Base for article number 153419.



Initial Data Load
The following table lists the recommendations for the initial data load:

Parameter Recommended Description


Setting

Database/Environment settings    Distributed Matching: Set the cmx.server.match.distributed_match=1
parameter in the cmxcleanse.properties file. This often improves performance by spreading the match load across
multiple servers. Ensure that your Cleanse Match servers are configured so that match processing is performed
in Batch mode on the servers that you expect to spread the match across.
UNDO Tablespace: The match process uses a lot of UNDO tablespace. Ensure that sufficient UNDO tablespace is
available in the database. Adjust your match batch size accordingly.
Database Archive Log: Switch off or disable to improve performance during the initial data load (IDL).
Batch API Interoperability: Disable during IDL.
Application server performance: JVM, thread counts, block size. Refer to the corresponding sections in the
guide.
Performance parameter: Set the siperian.performance parameter in the log4j.xml file to OFF.
Match Rules Settings: The performance of the match job depends largely on the match rule configuration. Use the
default settings of match configuration. Change them only if required and extensively tested. Make the search
level as exhaustive and key width as extended.
Connection pool size: Ensure that the connection pool is large enough for the sizing. Refer to the connection
pool recommendations in this guide.
Indexes: Disable any custom indexes.
Constraints: Choose the 'Allow constraints to be disabled' option to disable the constraints on NI indexes on
the base object and all indexes on the XREF.
History: Disable history.
Analyze Schema: Analyze the database schema prior to IDL.
Production mode: Enable the production mode flag.

For more information, search the Informatica Knowledge Base for article numbers 158622 and 158822.

Recommendations for Batch Job Optimization


A batch job is a program in the MDM Hub that you can run to complete a discrete unit of work. You can
launch batch jobs individually or as a group from the Hub Console or with the SIF APIs. You can configure
settings to optimize the performance of batch jobs.

The following table lists the different batch job parameters and their recommended settings to achieve a
base-level performance:

Parameter Recommended Description


Setting

Cleanse Thread Count    Start with the number of cores available. Based on CPU utilization, the number of threads can be increased. Default is 1.    Available in “Process Server > Threads for Cleanse Processing”.
Used in the following batch jobs:
- Match Job
- Generate Match Tokens process on Load job
- Stage job
Total number of threads used by the Master or Slave Process Server when executing Generate Match Tokens after
Load, Match, and Stage jobs.

Threads for Batch Processing    Specify a value that is equivalent to four times the number of CPU cores on the system on which the Process Server is deployed. Default is 20.    Available in “Process Server > Threads for Batch Processing”.
Used in the following batch jobs:
- Automerge Job
- Load Job
- Batch Delete
- Batch Unmerge
- Batch Revalidate
Maximum number of threads to use for a batch process.
For example, if the host machine has 16 CPU cores, set the Threads for Batch Processing in the Process Server
registration to 64. Applicable only if the Process Server is marked for batch processing.
Note: From the total number of threads available on the Process Server, dedicate n threads for batch jobs by
setting a value for the Threads for Batch Processing property.

Controller Thread Time Out    300000 (5 minutes). Default is 300000.    com.informatica.mdm.loadbalance.ControllerThread.timeout
Used in the following batch jobs:
- Automerge Job
- Load Job
- Batch Delete
- Batch Unmerge
- Batch Recalculate
This property is found in the cmxcleanse.properties file.
When distributing the load to different slave Process Servers, after the last block is sent to a slave Process
Server, all slave Process Servers that are processing blocks must complete the job within the timeout period.
Note: Blocks that are not completed in time are marked with ‘No Action’ in the batch result. The batch is not
marked as failed because the remaining blocks are successfully loaded.
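
The starting points for the two thread counts in the preceding rows follow directly from the number of CPU cores. The following Python sketch restates those rules of thumb. The function names are hypothetical, and the results are starting values to refine based on observed CPU utilization, not fixed settings.

```python
import os


def recommended_cleanse_threads(cpu_cores):
    """Starting point for Threads for Cleanse Processing: one thread per core."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores


def recommended_batch_threads(cpu_cores):
    """Starting point for Threads for Batch Processing: four times the core count."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 4 * cpu_cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cores on the machine that runs this script
    print("Cleanse threads:", recommended_cleanse_threads(cores))
    print("Batch threads:", recommended_batch_threads(cores))
```

For a host with 16 CPU cores, `recommended_batch_threads(16)` gives 64, which matches the example in the table. Run the script on the Process Server host so that the core count reflects that machine.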



Parameter Recommended Description
Setting

Load analyze threshold rate    Default is 10.    cmx.server.batch.load.analyze_threshold_rate
Used in the following batch jobs:
- Automerge Job
- Load Job
- Batch Delete
- Batch Unmerge
- Batch Recalculate
Available in cmxserver.properties.
For Oracle only. Available from MDM 10.0 HotFix 1.
Specifies how often the MDM Hub gathers analytical statistics for tables affected by a batch Load job. Set to
0 to disable statistics collection. Set to 1 to collect statistics only at the end of a Load job for base
object and cross-reference tables.
For example, if the threshold is 10, statistics are gathered at every 10^n records: new statistics are
gathered whenever the insert record count reaches 100, 1000, 10000, and so on.

Recycler Thread Max Idling    300000 (5 minutes). Default is 300000 (5 minutes).    com.informatica.mdm.batchserver.RecyclerThread.max_idling
Used in the following batch jobs:
- Automerge Job
- Load Job
- Batch Delete
- Batch Unmerge
- Batch Recalculate
This property is found in the cmxcleanse.properties file.
If a slave Process Server is processing a block of a batch job and is idle for the duration specified in this
attribute, the specific thread is marked as 'dead.'
Note: If a slave Process Server times out as noted earlier, the corresponding block is marked with ‘No Action’
in the batch result. The batch is not marked as failed because the remaining blocks are successfully loaded.

Automerge: Automerge Threads Per Job    Default is 1.    cmx.server.automerge.threads_per_job
This property is found in the cmxserver.properties file.
Maximum number of threads distributed across different Process Servers to process the automerge job.
For example, if this value is 20, automerge would be distributed across two Process Servers with 10 threads
each. The distribution depends on factors such as CPU weightage of the Process Server and other jobs running
on the Process Server.
This value must be less than the value in the 'Threads for Batch' attribute specified for the Process Server.
Note: The optimum value for a database server with a 16-core processor and a solid-state drive (SSD) set up in
a RAID is 20. Based on CPU utilization on different Process Servers, you can increase the threads.

Automerge: Automerge Block Size    Default is 250.    cmx.server.automerge.block_size
This property is found in the cmxserver.properties file.
Maximum number of records to be sent for merges to each Process Server in one block.
For example, consider the scenario of two Process Servers with 1000 records to be merged. If this value is 250,
each Process Server first gets 250 records and then another 250 records.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.
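
The 10^n rule for the cmx.server.batch.load.analyze_threshold_rate property described earlier in this table can be worked through in a few lines. The following Python sketch lists the insert counts at which statistics would be gathered. It assumes that the first checkpoint is the square of the threshold, to match the example of 100, 1000, and 10000; the guide does not state whether a count of 10 also triggers a collection. The function name is hypothetical.

```python
def statistics_checkpoints(total_records, threshold_rate=10, first_exponent=2):
    """Insert counts up to total_records at which statistics would be gathered.

    A rate of 0 disables collection and a rate of 1 collects only at the end
    of the Load job, so neither produces intermediate checkpoints.
    """
    if threshold_rate <= 1:
        return []
    checkpoints = []
    value = threshold_rate ** first_exponent
    while value <= total_records:
        checkpoints.append(value)
        value *= threshold_rate
    return checkpoints
```

For a load of 50,000 records with the default rate of 10, the sketch returns the checkpoints 100, 1000, and 10000.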

Parameter Recommended Description
Setting

Load: Batch Threads Per Job    Default is 1.    cmx.server.batch.threads_per_job
This property is found in the cmxserver.properties file.
Maximum number of threads distributed across different Process Servers to process the load job.
For example, if this value is 20, the load process would be distributed across two Process Servers with 10
threads each. The distribution depends on factors such as CPU weightage of the Process Server and other jobs
running on the Process Server.
This value must be less than the value in the 'Threads for Batch' attribute specified for the Process Server.
Note: The optimum value for a database server with a 16-core processor and a solid-state drive (SSD) set up in
a redundant array of independent disks (RAID) is 20. Based on CPU utilization on different Process Servers, you
can increase the threads.

Load: Batch Block Size    Default is 250.    cmx.server.batch.load.block_size
This property is found in the cmxserver.properties file.
Maximum number of records to be sent for load to each Process Server in one block.
For example, consider the scenario of two Process Servers with 1000 records to be loaded. If this value is 250,
each Process Server first gets 250 records and then another 250 records.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.

Load: Threads per job for generate tokens, if 'Generate Match Tokens on Load' attribute is enabled on the base object    Same as "Threads for cleanse processing".    See 'Threads for Cleanse Processing' attribute described earlier.
Note that this thread attribute is different from the core threads per job attribute of the load job described
earlier.
If 'Generate Match Tokens on Load' is not selected, this attribute does not have any impact on the performance
of the Load job.

Batch Recalculate (SIF API Request): Recalculate Threads Per Job    Same property, re-used from LOAD Job. See LOAD Job section for more details.    cmx.server.batch.threads_per_job
This property is found in the cmxserver.properties file.



Parameter Recommended Description
Setting

Batch Recalculate (SIF API Request): Recalculate Block Size    Default is 250.    cmx.server.batch.recalculate.block_size
This property is found in the cmxserver.properties file.
Maximum number of records to be sent to each Process Server in one block to recalculate the BVT.
For example, consider the scenario of two Process Servers with 1000 records to be recalculated. If this value
is 250, each Process Server first gets 250 records and then another 250 records.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.

Batch Recalculate (SIF API Request): Threads Per Job    Same property, re-used from LOAD Job. Refer to LOAD Job section for more details.    cmx.server.batch.threads_per_job
Available in cmxserver.properties.

Batch Unmerge (SIF API Request): Unmerge Block Size    Default is 250.    cmx.server.batch.batchunmerge.block_size
This property is found in the cmxserver.properties file.
Maximum number of records to be sent for unmerges to each Process Server in one block.
For example, consider the scenario of two Process Servers with 1000 records to be unmerged. If this value is
250, each Process Server first gets 250 records and then another 250 records.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.

Batch Delete (SIF API Request): Threads per job    Same property, re-used from LOAD Job. See LOAD Job section for more details.    cmx.server.batch.threads_per_job
This property is found in the cmxserver.properties file.

Batch Delete (SIF API Request): Delete Batch Block Size    Default is 250.    cmx.server.batch.delete.block_size
This property is found in the cmxserver.properties file.
Maximum number of records to be sent for deletion to each Process Server in one block.
For example, consider the scenario of two Process Servers with 1000 records to be deleted. If this value is
250, each Process Server first gets 250 records and then another 250 records.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.
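
The block size examples in the preceding rows all follow the same pattern: the records are cut into blocks, and the blocks are handed out to the Process Servers. The following Python sketch reproduces the example of 1000 records, two Process Servers, and a block size of 250. It uses a simple round-robin assignment as a stand-in for the Hub's own scheduling, which sends each block to the next available Process Server. The function and server names are hypothetical.

```python
def split_into_blocks(record_count, block_size=250):
    """Cut record_count records into (start, end) ranges of at most block_size."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    blocks = []
    start = 0
    while start < record_count:
        end = min(start + block_size, record_count)
        blocks.append((start, end))
        start = end
    return blocks


def assign_round_robin(blocks, server_names):
    """Hand out blocks to servers in turn (a simplification of real scheduling)."""
    assignment = {name: [] for name in server_names}
    for index, block in enumerate(blocks):
        assignment[server_names[index % len(server_names)]].append(block)
    return assignment
```

With `split_into_blocks(1000, 250)` and two servers, each server receives two blocks of 250 records. Doubling the block size to 500 halves the number of blocks that must be dispatched.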

Parameter Recommended Description
Setting

Tokenize: Tokenization File Loader Option    Default is true.    cmx.server.tokenize.file_load
This property is found in the cmxcleanse.properties file.
Applicable for Oracle and Db2.
If true, the Db2 file loader or Oracle SQL Loader is used to load the records during the tokenization job.
If file writing causes a performance issue, you can change this to false so that data is written directly to
the database every time instead of through the file loader. Generally, the file loader is faster than the
direct database write. Choose the option that suits your environment.

Stage: Threads per job    See 'Cleanse Thread Count' attribute described earlier.    See 'Cleanse Thread Count' attribute described earlier.

Stage: Cleanse Minimum Distribution    Default is 1000.    cmx.server.cleanse.min_size_for_distribution
This property is found in the cmxcleanse.properties file.
The MDM Hub distributes the cleanse job across different cleanse servers only if the number of records is
higher than this minimum size.
When distributing the load, each slave Process Server uses the Cleanse Thread Count for the number of worker
threads.

Stage: Stage JDBC Loader    Default is false. File writing is usually faster than direct database writing.    cmx.server.java_jdbc_loader
Applicable for Oracle and Db2.
This property is found in the cmxcleanse.properties file.
If true, Db2 and Oracle use direct database connections during the stage job instead of the Db2 file loader or
Oracle SQL Loader options.
If file writing causes a performance issue, you can change this to true so that data is written directly to
the database every time instead of through the file loader. Note that, generally, the file loader is faster
than the direct database write. Choose the option that suits your environment.

Match: Threads per job    See 'Cleanse Thread Count' attribute described earlier.    See 'Cleanse Thread Count' attribute described earlier.

Match: Match Distribution Flag    Set this flag to 1 if the MDM Hub has to distribute the match job load across different cleanse servers.    cmx.server.match.distributed_match
This property is found in the cmxcleanse.properties file.
The MDM Hub distributes the match job across different cleanse servers only if this value is set to 1.
When distributing the load, each slave Process Server uses the Cleanse Thread Count for the number of worker
threads.



Parameter Recommended Description
Setting

Match: Match File Loader Option    Default is true. File writing is usually faster than direct database writing.    cmx.server.match.file_load
Applicable for Oracle and Db2.
This property is found in the cmxcleanse.properties file.
If true, the Db2 file loader or Oracle SQL Loader is used to load the records during the tokenization job.
If file writing causes a performance issue, you can change this to false so that data is written directly to
the database every time instead of through the file loader. Generally, the file loader is faster than the
direct database write. Choose the option that suits your environment.

Match: Match Loader Batch Size    Default is 250.    cmx.server.match.loader_batch_size
This property is found in the cmxcleanse.properties file.
Applicable if JDBC load is used in match processing instead of the file loader option.
Maximum number of records to be sent for match in each worker thread.
Increasing this value can improve performance, depending on how powerful the application servers and database
servers are.

Match: Match Elapsed Time    Default is 20 (minutes).    Hub Console > Base Object > Max Elapsed Match Minutes.
The execution timeout, in minutes, when executing a match rule. If this time is reached, the match process
exits. Increase this value only if the match rule and the data are very complex. Generally, rules must be able
to complete within 20 minutes.

Match: Match Batch Size    Default is 20000000.    Hub Console > Base Object > Match/Merge Setup > Number of rows per match job batch cycle.
Maximum number of records to be processed by the MDM Hub for matching. This number affects the duration of the
match process. Also, the lower the match batch size, the more times you have to run the match process.
Note: When running large Match jobs with large match batch sizes, if there is a failure of the application
server or the database, you must re-run the entire batch.

Match: Maximum records per ranger node    Default is 5000.    max_records_per_ranger_node
This property is found in the cmxcleanse.properties file.
Number of records per match ranger node (limits memory use). Ranger is an internal component used within the
match process where sorting and merging operations are performed based on this maximum records attribute.
You can optimize this value to get better performance based on the memory available in your application server.
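
The trade-off described for the Match Batch Size row can be estimated up front: a smaller batch size means more runs of the match process, while a larger one means more work to repeat after a failure. The following Python sketch computes the number of runs for a given backlog. It assumes that each run processes up to one full batch of records; the function name is hypothetical.

```python
def match_runs_needed(records_to_match, match_batch_size=20_000_000):
    """Number of match job runs needed to work through records_to_match."""
    if match_batch_size < 1:
        raise ValueError("match_batch_size must be at least 1")
    if records_to_match < 0:
        raise ValueError("records_to_match must not be negative")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-records_to_match // match_batch_size)
```

For example, 50 million records need three runs at the default batch size of 20000000 and ten runs at a batch size of 5000000.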

Parameter Recommended Description
Setting

Initially Index Smart Search Data: Block Size    10000. Default is 250.    cmx.server.batch.smartsearch.initial.block_size
Available in cmxserver.properties.
Maximum number of records that the "Initially Index Smart Search Data" batch job can process in each block.
This property does not apply to regular indexing outside this specific batch job.
When you index a large data set, you can set the value to 10000.
Note: This property is available only from MDM 10.0 HotFix 2.

Initially Index Smart Search Data: Smart search threads    Default is 1. Same property, re-used from LOAD Job.    cmx.server.batch.threads_per_job
Available in cmxserver.properties.
Maximum number of threads distributed across different Process Servers to process the batch job "Initially
Index Smart Search Data". You can increase this value to improve performance during this batch job. This
property does not apply to regular indexing outside this specific batch job.

Multithreaded Batch Job – Process Flow


The following image shows the process flow of Automerge, Load, Batch Delete, Batch Unmerge, and Batch
Recalculate jobs:

Master Process Server


The following list describes the properties to configure for the multithreaded batch jobs:

• Threads allocated in Slave Process Server #1 + Threads allocated in Slave Process Server #2 + … [All
Slave Process Servers] must be equal to the specific “Threads Per Job” parameter in the
cmxserver.properties file.



The following properties are related to threads:
- cmx.server.automerge.threads_per_job

- cmx.server.batch.threads_per_job
• Each Slave Process Server gets the number of records specified in the block size.
The following properties are related to block_size:
- cmx.server.automerge.block_size

- cmx.server.batch.load.block_size

- cmx.server.batch.recalculate.block_size

- cmx.server.batch.batchunmerge.block_size

- cmx.server.batch.delete.block_size
• After the last block is sent to the next available Slave Process Server, all Slave Process Servers that
process the blocks MUST complete the job within the timeout period.
com.informatica.mdm.loadbalance.ControllerThread.timeout

Slave Process Server


The following list describes the properties to configure for multi-threaded batch jobs:

• Threads allocated for Batch Job 1 + Threads allocated for Batch Job 2 + … [All parallel batch jobs] must
not exceed “Threads for Batch Processing” of the specific Process Server.
• The com.informatica.mdm.batchserver.RecyclerThread.max_idling property specifies the idle time for
a Process Server thread. The Process Server recycles the thread when it is idle for more than the
configured value.
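
The two thread rules in the Master and Slave Process Server lists can be checked before you start parallel batch jobs. The following Python sketch expresses both rules as simple checks. The function names, server names, and job names are hypothetical, and the thread counts would come from your own Process Server registration and properties files.

```python
def master_allocation_ok(threads_per_job, threads_by_slave):
    """Threads allocated across all slave Process Servers must equal Threads Per Job."""
    return sum(threads_by_slave.values()) == threads_per_job


def slave_allocation_ok(threads_by_job, threads_for_batch_processing):
    """Threads used by all parallel batch jobs on one Process Server must not
    exceed its Threads for Batch Processing value."""
    return sum(threads_by_job.values()) <= threads_for_batch_processing


if __name__ == "__main__":
    # A job with 20 threads split evenly across two slave Process Servers.
    print(master_allocation_ok(20, {"slave1": 10, "slave2": 10}))
    # Two parallel jobs on a Process Server registered with 20 batch threads.
    print(slave_allocation_ok({"load": 15, "automerge": 10}, 20))
```

The second check fails in this example because 25 threads are requested on a Process Server that is registered with 20 threads for batch processing.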

Recommendations for the Hub Console Optimization
The Hub Console parameters can be optimized for better performance.

The following table lists the recommendations for the Hub Console optimization:

Parameter Recommended Setting Description

Client Java Version (JRE)    As specified in the Product Availability Matrix (PAM).    Ensure that the client Java version is as specified in the PAM document for the product.
The Hub Console does not necessarily pick the latest JRE on the client box. The JRE version selected depends
on the PATH variable. To ensure that the Hub Console uses the correct JRE, temporarily enable the console log
as listed in the Informatica Knowledge Base article.
When enabled, the next launch of the Hub Console opens a Java console window where you can see the JRE version
in use at the top of the console.

Console log    Disable (Default).    Disable the logging by running javaws -viewer from the run command on the client box.
In the javaws window, go to the Advanced tab of the Java Control Panel and perform the following steps:
- Clear the ‘Enable tracing’ and ‘Enable logging’ options in the Debugging section.
- Select the ‘Hide Console’ option in the Java Console section.

High Data Volume    See the parameter "High Volume of Data in C_REPOS_TABLES with historical data" in Database - General Recommendation Section.    A huge number of records in the tables listed in the referenced section might cause multiple issues on the Hub Console.
The issues might include the following:
- The batch groups screen loads too slowly.
- A base object record is saved too slowly.
- A cleanse function is saved too slowly.

Network latency    Good network connection.    The Hub Console communicates with the Hub Server (application server)
frequently and transfers a considerable amount of data to and from the Hub Server. Ensure a good network
connection between the client box and the application server.

For more information, search the Informatica Knowledge Base for article numbers 310913 and 139923.



Recommendations for Data Director and SIF
Optimization
The following table lists the recommendations for Data Director and Services Integration Framework (SIF)
optimization:

Parameter Recommended Setting Description

Cache: Lookup Cache Update Period    Default is 300000 (5 minutes).    lookupCacheUpdatePeriod in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL
The amount of time the lookup data is cached in the server. Change it to a higher value if your lookup data
does not change frequently in real time. A higher value improves performance because the cache is refreshed
less often.

Cache: SAM Cache Update Period    Default is 600000 (10 minutes).    samCacheUpdatePeriod in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL
The amount of time the data about Security Access Manager (SAM) roles (not users) is cached in the server.
Change it to a higher value if your role data does not change very frequently in real time. A higher value
improves performance because the cache is refreshed less often.

Cache: Cache Refresh Daemon Thread Monitor (both Lookup and SAM Cache)    Default is 5000 (5 seconds).    threadSchedulerIdleTime in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL
Time interval of the daemon thread that checks and triggers the lookup cache and the Security Access Manager
cache operations based on their update period. That is, every 5 seconds, the daemon thread checks whether the
update period of the lookup cache or the Security Access Manager cache is reached and, if so, triggers the
refresh.
This period can be increased to improve performance if it is not necessary to check the SAM and lookup cache
time periods every 5 seconds.

Search Operation: Maximum number of records retrieved at a time    Default is 100.    serverPageSize in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL
Number of records to be retrieved from the Hub Server when a search operation is performed. Note that even
though the Hub Server shows 10 records per page, internally it loads the number of records given by this
parameter, and the Hub Server caches the remaining records that are not shown in the view.
Note: This parameter is used in Search Results and in functional areas such as potential matches and hierarchy
relationship records, wherever a list of records is retrieved for the user.
A higher value could improve search results performance, provided the Hub Server can handle the load and cache.

Parameter: Search Operation: Maximum number of records during export
Recommended Setting: Default is 5000.
Description: maxSearchResultsExportedRows in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
Number of rows that are exported when you use the export option in the search
results page. Do not increase the value unless there is a valid use case. The
greater the value, the slower the search results export operation.
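
The interplay between threadSchedulerIdleTime and the two cache update periods described above can be sketched in Python. This is a minimal illustration that assumes the daemon thread wakes at fixed multiples of threadSchedulerIdleTime and refreshes a cache on the first wake-up at or after its update period has elapsed. The function name and this timing model are assumptions made for the example, not the Hub Server implementation.

```python
def first_refresh_ms(update_period_ms: int, idle_time_ms: int) -> int:
    """Return the time (ms after the last refresh) of the first daemon wake-up
    at which the cache is due, assuming wake-ups at multiples of idle_time_ms."""
    if update_period_ms <= 0 or idle_time_ms <= 0:
        raise ValueError("periods must be positive")
    # Integer ceiling division: number of wake-ups needed to reach the period.
    ticks = -(-update_period_ms // idle_time_ms)
    return ticks * idle_time_ms


# Documented defaults: lookup cache 300000 ms, SAM cache 600000 ms, daemon 5000 ms.
print(first_refresh_ms(300000, 5000))  # lookup cache
print(first_refresh_ms(600000, 5000))  # SAM cache
# A hypothetical 7000 ms daemon interval delays the lookup refresh slightly.
print(first_refresh_ms(300000, 7000))
```

Under this model, a longer daemon interval only delays a refresh by less than one interval, which is why the interval can usually be relaxed when the update periods are measured in minutes.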


Parameter: Search Operation: Case Insensitive Search
Recommended Setting: Default is false. Adds performance overhead when enabled.
Description: case.insensitive.search
This property is found in the cmxserver.properties file.
If set to true, users can enable the case-insensitive attribute for individual
columns in the base object, which enables case-insensitive search queries in
Data Director. Enabling this flag creates a new index for each such column.
Because index management has its own performance overhead, use this flag with
caution.
Note: If you want to include a system column in case-insensitive search,
manually create a functional index for that column.
For more information, search the Informatica Knowledge Base for article numbers
154132 and 154139.

Parameter: Search Operation: Default effective date
Recommended Setting: Default is true.
Description: isEffectiveDateIncluded in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
By default, the effective date is automatically populated with the current date.
This can affect performance for normal queries that do not need an effective
date. In such scenarios, you can set this flag to false to remove the default
value.
For more information, search the Informatica Knowledge Base for article number
347953.

Parameter: Search Operation: Remove Duplicates Flag
Recommended Setting: As applicable. Default is true.
Description: cmx.server.remove_duplicates_in_search_query_results
This property is found in the cmxserver.properties file.
By default, Data Director filters duplicate results from a search query that
joins child records. When enabled, this flag can affect performance, depending
on the time it takes to de-duplicate the search results. If duplicates are not
an issue in the search results, you can disable this flag.
For more information, search the Informatica Knowledge Base for article number
122829.

Parameter: Data View: Asynchronous Child Load
Recommended Setting: Default is false. Adds performance overhead when enabled.
Description: asyncChildLoading in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
If true, all child records are opened (expanded) automatically when you open a
parent subject area.

Parameter: Hierarchy: Maximum records in Relationship Flyover
Recommended Setting: Default is 10000.
Description: sif.api.hm.flyover.max.record.count
This property is found in the cmxserver.properties file.
The maximum number of relationships to show in the Relationship Flyover in Data
Director. Internally, the best version of the truth (BVT) calculation is done
for all such relationships. Therefore, a high value affects the performance of
the relationship flyover.
For more information, search the Informatica Knowledge Base for article number
157628.

Parameter: Hierarchy: Inactive relationship records
Recommended Setting: Default is false.
Description: hmInactiveRelationshipsAvailable in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
If true, Hierarchy Manager in Data Director also shows all inactive relationship
records. Do not set this value to true unless there is a valid use case to do so.
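
As a hedged illustration of how the cmxserver.properties entries above might be audited, the following Python sketch compares configured values with the defaults documented in this table. The names DOCUMENTED_DEFAULTS, parse_properties, and non_default_settings are hypothetical, the sample file content is invented for the example, and the parser handles only simple key=value lines rather than the full Java properties syntax.

```python
# Defaults as documented in the table above (cmxserver.properties entries only).
DOCUMENTED_DEFAULTS = {
    "case.insensitive.search": "false",
    "cmx.server.remove_duplicates_in_search_query_results": "true",
    "sif.api.hm.flyover.max.record.count": "10000",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; skips blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def non_default_settings(text: str) -> dict:
    """Return {property: (documented_default, configured_value)} for overrides."""
    props = parse_properties(text)
    return {
        key: (default, props[key])
        for key, default in DOCUMENTED_DEFAULTS.items()
        if key in props and props[key].lower() != default
    }


# Hypothetical file content, for illustration only.
sample = """
# search settings
case.insensitive.search=true
sif.api.hm.flyover.max.record.count=10000
"""
print(non_default_settings(sample))
```

A report like this makes it easier to confirm that every override with a performance cost, such as case-insensitive search, is backed by a valid use case.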


Parameter: Task Context: Parallel Promote Threads
Recommended Setting: Same as the number of Hub Server cores.
Description: maxParallelPromoteThreads in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
Default is 1.
Determines the maximum number of parallel threads when a task is approved
(Promote API).
This is helpful when you promote parent, child, and grandchild subject areas
together.

Parameter: Task Context: Parallel BVT Threads
Recommended Setting: Same as the number of Hub Server cores.
Description: maxParallelBvtThreads in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
Default is 1.
Determines the maximum number of parallel threads when a task is viewed
(PreviewBVT API).

Parameter: Task Context: Maximum tasks assigned to a user
Recommended Setting: Default is 25.
Description: sip.task.maximum.assignment
This property is found in the cmxserver.properties file.
Maximum number of tasks assigned to each user when automatic task assignment is
enabled. A high value affects the performance of the task context screens in
Data Director.
For more information, search the Informatica Knowledge Base for article number
134833.

Parameter: Bulk Import: Bulk Import Threads
Recommended Setting: Same as the number of Hub Server cores.
Description: maxImportThreads in CMX_SYSTEM.C_REPOS_DS_PREF_DETAIL.
Default is 5.
Maximum number of threads to use with the Bulk Import module.

Parameter: Bulk Import: User Exit
Recommended Setting: As applicable.
Description: PUT user exits are called for every record individually. Complex
logic in the PUT user exit degrades performance. Use user exits with caution.

Parameter: Bulk Import: Children, Foreign Keys, and Lookups
Recommended Setting: As applicable.
Description: A larger number of children, foreign key relationships, and lookup
columns in the data import reduces performance.
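
The recommendation to match thread counts to the number of Hub Server cores can be sketched as follows. This is a minimal Python example that assumes it runs on the Hub Server host. Note that os.cpu_count() reports logical CPUs, which may differ from physical cores, and that the function name and the CORE_BOUND_PARAMETERS tuple are illustrative rather than part of the product.

```python
import os

# Parameters that the table recommends setting to the number of Hub Server cores.
CORE_BOUND_PARAMETERS = (
    "maxParallelPromoteThreads",
    "maxParallelBvtThreads",
    "maxImportThreads",
)


def suggested_thread_settings(core_count=None) -> dict:
    """Map each core-bound parameter to the given (or detected) core count."""
    cores = core_count if core_count is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("core_count must be at least 1")
    return {name: cores for name in CORE_BOUND_PARAMETERS}


# Print suggestions for the machine this script runs on.
for name, value in suggested_thread_settings().items():
    print(f"{name} = {value}")
```

Treat the printed values as a starting point and validate them under a realistic load before you change the settings.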


Parameter: Search Operation and HM Hierarchy: BVT Optimization Limit
Recommended Setting: temptableTimeToLive default is 20. MaxRowCount default is
5000.
Description: These properties are found in the cmxserver.properties file.
Applicable for the SearchQuery, SearchHMQuery, GetOneHop, and GetEntityGraph SIF
APIs (and the relevant Data Director modules) when effective dates are
specified.
searchQuery.buildBvtTemp.MaxRowCount: Specifies the maximum number of records to
use in the BVT calculation when processing a search operation with effective
date filters.
sif.search.result.query.temptableTimeToLive.seconds: Specifies the number of
seconds that the temporary table of the specific API must reside in the server.
That is, the APIs mentioned earlier store their full results in their own
temporary tables, which are cleaned up automatically based on this timeout.
Pagination works from the data in this temporary table instead of repeatedly
re-querying the core tables. For search operations that involve a larger number
of records, increase this timeout value.
Example: If a hierarchy entity can have more than 10,000 related records, set
the values as shown:
    sif.search.result.query.temptableTimeToLive.seconds=3600
    searchQuery.buildBvtTemp.MaxRowCount=100000
For more information, see the Multidomain MDM Configuration Guide and the
Multidomain MDM Services Integration Framework Guide.

Parameter: Traffic compression
Recommended Setting: Default is TRUE.
Description: cmx.bdd.server.traffic.compression_enabled
This property is found in the cmxserver.properties file.
Specifies whether Data Director server traffic compression is enabled.
For more information, see the Multidomain MDM Configuration Guide.

Parameter: SearchMatch API: Thread count
Recommended Setting: Default is 1.
Description: cmx.server.match.searcher_thread_count
Available in cmxcleanse.properties.
Configures the thread count for the SearchMatch API.
The optimal value depends on your environment. You might need to test with
incrementally higher thread counts until you see optimal performance.
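
The incremental testing suggested for cmx.server.match.searcher_thread_count can be sketched as a simple search loop. In this Python example, measure is a hypothetical callable that you would supply: it is assumed to apply a thread count, run a representative SearchMatch load, and return the observed throughput. The stopping rule, a gain of less than 5 percent over the best result so far, is also an assumption chosen for illustration.

```python
from typing import Callable


def find_thread_count(measure: Callable[[int], float],
                      max_threads: int = 16,
                      min_gain: float = 0.05) -> int:
    """Increase the thread count one step at a time and stop when throughput
    improves by less than min_gain (as a fraction) over the best seen so far."""
    best_count = 1
    best_throughput = measure(1)
    for count in range(2, max_threads + 1):
        throughput = measure(count)
        if throughput < best_throughput * (1 + min_gain):
            break  # No meaningful improvement: keep the previous best.
        best_count, best_throughput = count, throughput
    return best_count


# Hypothetical stand-in for a real benchmark: throughput rises until 4 threads.
def fake_benchmark(threads: int) -> float:
    return float(min(threads, 4) * 100)


print(find_thread_count(fake_benchmark))
```

In practice each measurement would be a full load test, so a small max_threads and a repeatable workload keep the tuning run manageable.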

Recommendations for Environment Validation Tools and Utilities
You can use different tools and utilities to verify the current environment and
identify areas for improvement.

The following table lists the recommendations related to the database:

Parameter Recommended Setting Description

Parameter: Test IO (utility), applicable for Oracle
Recommended Setting: Available from Informatica Support.
Description: Test the input/output (I/O) capability of the data files of the
database tablespaces by following Informatica Knowledge Base article number
506035. The test result must meet or exceed the standards documented for 'Good
Performance' in that article.

Parameter: Ping/TraceRt
Recommended Setting: Response latency must be less than 10 milliseconds.
Description: Run a ping or tracert (Trace Route) to the MDM database
(CMX_SYSTEM) and to each individual ORS from the application server machine.

Parameter: Test specific disk partition IO
Recommended Setting: Reasonable MB/second.
Description: Follow the instructions in the Informatica Knowledge Base article
number 139805 to test all disk partitions involved in the MDM Hub, including
partitions where tablespaces reside and where the database debug log is written.
If greater than 10 Mb/second, that specific partition needs to be fixed.

The following table lists the recommendations for the Hub Server and the Process Server:

Parameter Recommended Setting Description

Parameter: Ping/TraceRt
Recommended Setting: Response latency must be less than 10 milliseconds.
Description: Run a ping or tracert (Trace Route) to the MDM Hub Server or the
Process Server from the client machine or the Hub Server machine. Repeat this
step for the following machines wherever applicable:
- All nodes in a cluster
- Load balancer
- Web Server
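
The 10 millisecond latency target in the tables above can be checked against captured ping output with a short script. This Python sketch assumes the common "time=0.42 ms" (Linux and macOS) or "time=3ms" and "time<1ms" (Windows) reply formats. The function names are illustrative, and the sample output is invented for the example rather than taken from a real measurement.

```python
import re

LATENCY_LIMIT_MS = 10.0  # Threshold from the tables above.


def latencies_ms(ping_output: str) -> list:
    """Extract round-trip times (ms) from the reply lines of ping output."""
    return [float(m) for m in re.findall(r"time[=<]\s*([\d.]+)\s*ms", ping_output)]


def meets_latency_target(ping_output: str, limit_ms: float = LATENCY_LIMIT_MS) -> bool:
    """True if at least one reply was seen and every reply is under the limit."""
    values = latencies_ms(ping_output)
    return bool(values) and max(values) < limit_ms


# Illustrative sample output, not a real measurement.
sample = """64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.48 ms
64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=12.3 ms"""
print(latencies_ms(sample), meets_latency_target(sample))
```

Checking the maximum rather than the average is a deliberately strict choice here, because a single slow hop between the application server and the database can dominate the response time of chatty operations.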

Appendix A

Glossary
_PKQ sequence
Sequence that is used to populate the ROWID_OBJECT of a base object record. For example, the C_PARTY base
object uses the C_PARTY_PKQ sequence to populate the party records.

<INFAHOME>
Physical location where the Hub Server and the Process Server are installed.

heap size
Amount of memory allocated to Java processes which are created on the same JVM.

Hub Server
The server that manages core and common services for the MDM Hub.

master database
Database instance that stores metadata to manage individual domain schemas called ORS schemas. The
database instance is unique to each MDM Hub environment.

MaxPermGen
A JVM parameter that specifies the maximum size of the memory where class metadata information is loaded.

ORS
Operational Reference Store. A database instance where you store domain data.

PermGen
A JVM parameter that specifies the initial size of the memory where class metadata information is loaded.

Process Server
The server that cleanses and matches data and performs batch jobs such as load, recalculate BVT, and
revalidate.

response latency
The time duration between request and response.

Tracert
Network diagnostic tool that displays the route to a particular destination with the transit delay information.

user profile
An internal object of the MDM Hub that stores the user details including authentication and associated roles.

Xms
A JVM parameter that specifies the initial Java heap size.

Xmx
A JVM parameter that specifies the maximum Java heap size.

XREF
Cross reference. Data that relates the base object data with the relevant source information.

Xss
A JVM parameter that specifies the stack size of each thread created within the Java process.

