Informatica Multidomain MDM
Performance Tuning Guide
10.4
This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.
Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A
current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade
names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.
The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at
[email protected].
Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE
INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.
Chapter 2: Recommendations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Recommendations for Java. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
General Recommendations for Database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Recommendations for Oracle Database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
INIT.ORA Recommendations for Oracle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Recommendations for Microsoft SQL Server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Recommendations for IBM Db2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Recommendations for the MDM Hub. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Recommendations for Batch Job Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Multithreaded Batch Job – Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Recommendations for the Hub Console Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Recommendations for Data Director and SIF Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Recommendations for Environment Validation Tools and Utilities. . . . . . . . . . . . . . . . . . . . . . . 49
Appendix A: Glossary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Table of Contents 3
Preface
See the Informatica® Multidomain MDM Performance Tuning Guide to learn how to optimize the overall
performance of Multidomain MDM within the database and the application server environments.
Informatica Resources
Informatica provides you with a range of product resources through the Informatica Network and other online
portals. Use the resources to get the most from your Informatica products and solutions and to learn from
other Informatica users and subject matter experts.
Informatica Network
The Informatica Network is the gateway to many resources, including the Informatica Knowledge Base and
Informatica Global Customer Support. To enter the Informatica Network, visit https://network.informatica.com.
To search the Knowledge Base, visit https://search.informatica.com. If you have questions, comments, or
ideas about the Knowledge Base, contact the Informatica Knowledge Base team at
[email protected].
Informatica Documentation
Use the Informatica Documentation Portal to explore an extensive library of documentation for current and
recent product releases. To explore the Documentation Portal, visit https://docs.informatica.com.
If you have questions, comments, or ideas about the product documentation, contact the Informatica
Documentation team at [email protected].
Informatica Product Availability Matrices
Product Availability Matrices (PAMs) indicate the versions of the operating systems, databases, and types of
data sources and targets that a product release supports. You can browse the Informatica PAMs at
https://network.informatica.com/community/informatica-network/product-availability-matrices.
Informatica Velocity
Informatica Velocity is a collection of tips and best practices developed by Informatica Professional Services
and based on real-world experiences from hundreds of data management projects. Informatica Velocity
represents the collective knowledge of Informatica consultants who work with organizations around the
world to plan, develop, deploy, and maintain successful data management solutions.
You can find Informatica Velocity resources at http://velocity.informatica.com. If you have questions,
comments, or ideas about Informatica Velocity, contact Informatica Professional Services at
[email protected].
Informatica Marketplace
The Informatica Marketplace is a forum where you can find solutions that extend and enhance your
Informatica implementations. Leverage any of the hundreds of solutions from Informatica developers and
partners on the Marketplace to improve your productivity and speed up time to implementation on your
projects. You can find the Informatica Marketplace at https://marketplace.informatica.com.
Informatica Global Customer Support
To find your local Informatica Global Customer Support telephone number, visit the Informatica website at
the following link:
https://www.informatica.com/services-and-training/customer-success-services/contact-us.html.
To find online support resources on the Informatica Network, visit https://network.informatica.com and
select the eSupport option.
Chapter 1
Introduction to Performance
Tuning
This chapter includes the following topics:
• Overview, 6
• Factors that Influence the Performance of the MDM Hub, 6
• Acronyms, 7
Overview
You can use the performance tuning recommendations to configure specific parameters that optimize the
performance of the MDM Hub. Experiment with these parameters to arrive at optimal values. These
recommendations also help you establish a baseline performance.
The following list describes the components that the recommendations cover:
- Java. Parameters that you can fine-tune in the Java layer. These parameters are applicable for the MDM Hub and for the Process Servers, including clustered environments.
- General Database Recommendations. Parameters such as sizing and storage that you can fine-tune in the database layer.
- Oracle. Parameters that you can fine-tune in the Oracle database, including the RAC environment. These parameters are applicable for both the MDM Hub Master Database schema and any Operational Reference Store (ORS) schema.
- Microsoft SQL Server. Parameters that you can fine-tune in Microsoft SQL Server. These parameters are applicable for both the MDM Hub Master Database schema and any ORS schema.
- IBM Db2. Parameters that you can fine-tune in the IBM Db2 database. These parameters are applicable for both the MDM Hub Master Database schema and any ORS schema.
- The MDM Hub Specific Configuration. Parameters that you can configure in the MDM Hub. These parameters are applicable for both the MDM Hub Master Database settings and any ORS-specific settings.
- Batch Job Optimization. Parameters that you can configure for better performance of the batch jobs.
- Hub Console Optimization. Parameters that you can configure for better performance of the Hub Console.
- Informatica Data Director and Services Integration Framework Optimization. Parameters that you can configure for better performance of Informatica Data Director (IDD) and Services Integration Framework (SIF).
- Environment Validation Tools and Utilities. A list of tools and utilities that you can use to verify the current environment and identify areas of improvement.
Acronyms
The following table lists the different acronyms used in this guide:
- BO: Base Object
- GC: Garbage Collection
- HM: Hierarchy Manager
Chapter 2
Recommendations
This chapter includes the following topics:
• Overview, 9
• Recommendations for Java, 10
• General Recommendations for Database, 12
• Recommendations for Oracle Database, 15
• Recommendations for Microsoft SQL Server, 20
• Recommendations for IBM Db2, 21
• Recommendations for the MDM Hub, 25
• Recommendations for Batch Job Optimization, 36
• Recommendations for the Hub Console Optimization, 45
• Recommendations for Data Director and SIF Optimization, 46
• Recommendations for Environment Validation Tools and Utilities, 49
Overview
The recommendations are based on regular data volumes with standard hardware. For larger systems,
adjust the settings accordingly. You can configure parameters related to Java, the Oracle database,
Microsoft SQL Server, IBM Db2, and the MDM Hub to optimize the performance of the MDM Hub.
Recommendations for Java
You can configure the parameters related to the JVM settings and database connection pool. You can
change or fine-tune the Java parameters to improve the MDM Hub performance.
The following table lists the recommendations for the JVM settings:
Code Cache
- Recommended setting: -XX:ReservedCodeCacheSize=256m (Oracle JVM) or -XX:codecachetotal=256m (IBM JVM)
- Description: Maximum size limit for the code cache.
- Knowledge Base article: 352050

Stack Size (64-bit)
- Recommended setting: -Xss512k
- Description: Use the platform default and adjust if required. Stack size is the memory that Java uses for each thread it spawns; this memory is outside the heap. If the value is too low, the server might fail with a StackOverflowError. If the value is too high, the server might fail with an out-of-memory error. If you need to fine-tune the value, analyze the heap dump to arrive at an optimal value. For a 32-bit processor, fine-tune the value to free additional memory for the heap and avoid an out-of-memory error.
  Note: If you use Informatica Address Verification, use -Xss2048k.
- Knowledge Base articles: 144477 and 207128
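The JVM options above are typically passed to the application server through its startup configuration. The following sketch appends them to JAVA_OPTS, for example in the JBoss bin/standalone.conf file; the variable name and file location depend on your application server and are assumptions here.

```shell
# Hedged sketch: appending the recommended Oracle JVM options to JAVA_OPTS
# (for example, in JBoss bin/standalone.conf). The flag values come from the
# recommendations above; the variable name and file location depend on your
# application server.
JAVA_OPTS="$JAVA_OPTS -XX:ReservedCodeCacheSize=256m"  # code cache limit
JAVA_OPTS="$JAVA_OPTS -Xss512k"                        # per-thread stack size
# With Informatica Address Verification, use a larger stack instead:
# JAVA_OPTS="$JAVA_OPTS -Xss2048k"
echo "$JAVA_OPTS"
```

Restart the application server after changing these options so that the new values take effect.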
10 Chapter 2: Recommendations
Garbage Collection Policy
- Recommended setting: Use the default policy, as chosen automatically by the JVM.
- Description: If you decide to change the garbage collection (GC) policy, perform a complete analysis first. GC policies include Parallel GC and Concurrent Mark Sweep. For example, Parallel GC can cause longer pauses under heavy real-time usage, while Concurrent Mark Sweep does not. You can use a combination of the policies. See the Java Guide to set the exact parameters.
- Knowledge Base article: 144944
ORS Database Connection Pool
- Recommended settings:
  - Max Connection: (N + T) × 2.5, where N is the number of concurrent IDD and SIF API users and T is the number of concurrent threads in batch.
  - Min Connection: 0
  - Test Connection on Lease: Disable
  - Statement Caching: 10
- Description: On average, each thread takes 2.5 connections, so multiply the number of concurrent threads by 2.5. Disable Test Connection on Lease to avoid the additional database cost it incurs. In some instances, application servers have connection leaks around rollback transactions, where a connection caught in a rollback is not released to the pool; if you find such instances, set Min Connection to 0. For Statement Caching on WebLogic, initially set 10 as the minimal number.
- Knowledge Base article: 121471
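The Max Connection formula can be checked with a quick calculation. In this sketch, the values of N and T are illustrative examples, not recommendations.

```shell
# Illustrative calculation of the recommended Max Connection value:
# (N + T) x 2.5, where N = concurrent IDD/SIF API users and
# T = concurrent batch threads. The values below are examples only.
N=50   # example: 50 concurrent IDD and SIF API users
T=30   # example: 30 concurrent batch threads
MAX_CONN=$(awk -v n="$N" -v t="$T" 'BEGIN { printf "%d", (n + t) * 2.5 }')
echo "Recommended Max Connection: $MAX_CONN"
```

For these example values, the calculation yields a pool size of 200 connections.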
The following table lists the recommendations related to the database environment:
Environment Sharing
- Recommended setting: Always use a non-shared environment. The Production ORS must have exclusive use of the Oracle instance, and the Production Oracle instance must have exclusive use of the host machine.
- Description: The MDM Hub Oracle database instances (both the MDM Hub Master Database and ORS) need not be shared with other MDM Hub installations and must not be shared with other applications.
  Note: Each additional level of sharing compromises the best performance possible on particular hardware.

Connectivity: Application Server to Hub Store databases
- Recommended setting: The fastest connectivity possible.
- Description: Connection latency might have a major performance impact. Enable faster connectivity to the data store by using fiber optic connections.

Connectivity: Database server to Data File storage
- Recommended setting: The fastest connectivity possible.
- Description: Connection latency might have a major performance impact. Enable faster connectivity to the data by using fiber optic connections. Have a dedicated point-to-point connection to avoid network contention.

Database Server Sizing
- Recommended setting: Optimal database server sizing.
- Description: You need proper analysis to determine the database server sizing. Consider factors such as current data volume, rate of data growth, future data volume, SIF calls, and batch volume. For assistance with server sizing, contact the Informatica Professional Services team.

MDM Hub Master Database (cmx_system) host name in cluster environments
- Recommended setting: Absolute host name or IP address.
- Description: To avoid caching issues in multi-node or cluster environments, use the absolute host name or IP address in place of the default localhost. The host name is configured during the MDM Hub installation. You can update the host name in the DATABASE_HOST column of the C_REPOS_DATABASE table.
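As a hypothetical sketch of the DATABASE_HOST update described above, the following SQL*Plus session replaces localhost with an absolute host name. The schema user, password, connect string, and host name are all placeholders; only the DATABASE_HOST column and C_REPOS_DATABASE table come from the recommendation. This is not runnable without an MDM Hub database.

```shell
# Hypothetical sketch (placeholder credentials, connect string, and host name):
# update DATABASE_HOST in C_REPOS_DATABASE so cluster nodes do not use localhost.
sqlplus cmx_system/password@orcl <<'SQL'
UPDATE c_repos_database
   SET database_host = 'mdm-db01.example.com'
 WHERE database_host = 'localhost';
COMMIT;
SQL
```

Restart the Hub Server nodes after the update so that cached connection metadata is refreshed.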
The following table lists the recommendations for the virtual image environment:
Data File Storage
- Recommended setting: Use a physical drive instead of a virtualized drive.
- Description: Use a physical drive to store the data files instead of a virtualized drive within the image. A physical drive avoids the I/O contention caused by a virtualized drive and the latency caused by introducing another layer with no actual benefit.

Hardware and Software Specification
- Recommended setting: As good as the equivalent physical or standalone instance.
- Description: Must be as good as the equivalent physical machine and must meet all of the PAM and sizing requirements.

CPU Cores
- Recommended setting: 100% allocated to the virtual image.
- Description: The physical machine where the virtual image is hosted must allocate 100% of the CPU cores to the virtual image. Sharing is not recommended.
Custom and backup tables created with names starting with C_REPOS%
- Recommendation: Do not give custom tables or backup tables names that start with C_REPOS%.
- Description: During Hub Server restart and metadata migration, the performance of the Hub Console degrades if the schema has large-volume tables with names that start with C_REPOS%. Ensure that backup tables are not created with names starting with C_REPOS%.

Fragmentation
- Recommendation: Minimize the likelihood of fragmentation.
- Description: Maintain the Oracle schema to ensure that fragmentation is kept to a minimum. Monitor and defragment whenever the degree of fragmentation has an impact on MDM Hub performance.

High volume of data in C_REPOS tables with historical data
- Recommendation: Perform regular maintenance on the following metadata tables:
  - C_REPOS_AUDIT
  - C_REPOS_MQ_DATA_CHANGE
  - C_REPOS_JOB_CONTROL
  - C_REPOS_JOB_METRIC
  - C_REPOS_MET_VALID_RESULT
  - C_REPOS_MET_VALID_MSG
  - C_REPOS_TASK_ASSIGNMENT_HIST
- Description: If the number of records in the metadata tables is too high, it might cause issues such as slow startup, out-of-memory errors, and performance problems. It is recommended to back up and then truncate or reduce the data volume in the metadata tables. If you enable raw retention on any base object, you can purge the C_REPOS_JOB_* tables for any date beyond the maximum raw retention period. For more information, search the Informatica Knowledge Base for article number 141201.
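As a sketch of the back-up-then-truncate maintenance described above, the following hypothetical SQL*Plus session archives and clears C_REPOS_AUDIT. The backup table name, schema user, and connect string are placeholders; note that the backup table deliberately does not start with C_REPOS%, per the naming recommendation above. Verify your retention requirements (and Knowledge Base article 141201) before truncating any metadata table; this is not runnable without an ORS database.

```shell
# Hypothetical sketch (placeholder credentials and names): back up, then
# truncate one metadata table. The backup table name avoids the C_REPOS% prefix.
sqlplus cmx_ors/password@orcl <<'SQL'
CREATE TABLE bkp_repos_audit AS SELECT * FROM c_repos_audit;
TRUNCATE TABLE c_repos_audit;
SQL
```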
Recommendations for Oracle Database
You can configure the parameters related to the Oracle database environment, tablespace, Oracle table
statistics, RAC recommendations, and Oracle flashback.
The following table lists the recommendations related to the Oracle database environment:
C_REPOS_APPLIED_LOCK
- Recommendation: Enable caching for this table.
- Description: The application uses this table frequently, so you can cache this table to improve performance.

RMAN backups
- Recommendation: Suppress RMAN backups from running during batch processing.
- Description: RMAN backups are good for a fast backup and restore. However, performance is decidedly lower while an RMAN backup is performed.

Archive Logging
- Recommendation: Turn off archive logging during the initial data load. For steady-state operation, you can turn on archive logging after the initial data load ends.
- Description: Archive logging is unnecessary during the initial data load and adds overhead. If there is a failure during the initial data load, Oracle rolls back the entire transaction (the current batch cycle of a batch job), and the process can be re-run with no data loss. During the initial data load, back up the ORS schema at regular checkpoints when no jobs are running. For example, back up the ORS schema after major long-running jobs have completed. At an absolute minimum, take backups after the completion of each phase: Stage, Load, Match, and Merge. You require backups to safeguard the work already done before you proceed. You can enable archive logging for all steady-state operations (after the initial data load). If you use a standby database (database mirroring), disable the standby before the initial data load. When the initial data load is complete, copy the database to the standby site, and then enable the standby.
Tablespace
- Recommendation: Use the following settings for tablespaces:
  - Locally Managed
  - Uniform Extent
  - Auto Segment Management
  - Default Tablespace Block Size of 8 KB or 16 KB
- Description: Applicable to all tablespaces involved in the MDM Hub, including CMX_DATA, CMX_INDX, and CMX_TEMP. A Default Tablespace Block Size of 8 KB is good for high API workload implementations. Use 16 KB as a balanced block size to support a mix of API and batch processing (small transactions and bulk read and write transactions). If you use a Default Tablespace Block Size of 16 KB and the database was created with a default block size of 8 KB, configure an appropriate DB_CACHE for it.

Tablespace Storage
- Recommendation: Use the following Oracle recommendations for storage of tablespace data files:
  - If you use RAID, use either RAID 1+0 or RAID 0.
  - If you do not use a RAID controller, each tablespace must comprise multiple data files spread across different disks.
- Description: Many small disks perform better than a few large disks if everything else remains equal. Use the following guidelines to improve performance:
  - RAID 1+0 (RAID 10) has a high degree of fault tolerance with mirroring. Use as many disks as possible, and use the fastest disks possible.
  - Avoid RAID 5 because of its write overhead and its poor performance if a disk fails.
  - If there is no RAID controller, use multiple single disks to spread the data files over more disks. Do not dedicate the data files for one tablespace to a single disk. Use the fastest disks possible and stripe the disks. Each tablespace then uses part of each disk instead of depending on a single disk. Similarly, it is better to keep the redo and undo logs on a different physical disk.
The following table lists the recommendations for the Oracle table statistics:
Table Statistics
- Recommendation: Analyze the ORS schema on a regular (frequent) basis. Use the following options:
  - Analyze the full schema (perform outside business hours).
  - Analyze individual tables whenever 10% of the data changes.
  - Perform unplanned table analysis as needed.
  - Set the degree of parallelism with DBMS_STATS.SET_GLOBAL_PREFS (see Environment Sharing in the general database recommendations):
    - With no sharing: DBMS_STATS.SET_GLOBAL_PREFS('DEGREE', DBMS_STATS.AUTO_DEGREE);
    - If forced to share: DBMS_STATS.SET_GLOBAL_PREFS('DEGREE', <number of CPUs on the database server minus 1>);
- Description: Analyze the entire ORS schema on a regular basis as a best practice. Analyze individual tables with data sampling whenever 10% of the data changes; this is the Oracle recommendation (see the Oracle database documentation for details). With sampling there is a trade-off: the best execution plans are chosen when the statistics represent the entire table, so when you use sampling, the execution plan is only as appropriate to the tables as the data sample. For example, you can use 10% of the table as representative of the table as a whole. If you use a sample, make sure it is large enough. A sample of 1 to 2% is too small unless the table is large; a sample of 10% is more representative on smaller tables. You can switch off sampling and perform a full analyze, which provides the most appropriate execution plans, but at the cost of far higher overhead in the time the analyze takes. Run a full analyze outside business hours to mitigate the impact.
The following table lists the recommendations for RAC:
_PKQ Sequence
- Recommendation: Use NOORDER and set the sequence cache to 20000.
- Description: Informatica recommends this setting to increase initial data load performance. These sequences are used to populate the ROWID_OBJECT in the base object tables. For more information, search the Informatica Knowledge Base for article number 115788.
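As a hypothetical sketch of this sequence change, the following SQL*Plus session applies NOORDER and CACHE 20000 to one base object key sequence. The sequence name C_PARTY_PKQ, schema user, and connect string are placeholders for your ORS schema; this is not runnable without the database.

```shell
# Hypothetical sketch (placeholder sequence name and credentials): apply the
# NOORDER and CACHE 20000 recommendation to a base object _PKQ sequence.
sqlplus cmx_ors/password@orcl <<'SQL'
ALTER SEQUENCE c_party_pkq CACHE 20000 NOORDER;
SQL
```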
The following table lists the recommendations for the Oracle flashback:
Flash recovery area
- Recommendation: Fast file system.
- Description: Use a fast file system for your flash recovery area, preferably without operating system file caching.

Disk spindles
- Recommendation: As needed.
- Description: Configure enough disk spindles for the file system that holds the flash recovery area.

Striped storage volume
- Recommendation: Smaller stripe size.
- Description: If the flash recovery area does not have non-volatile RAM, opt for a striped storage volume with a smaller stripe size, such as 128 KB. This allows each write to the flashback logs to be spread across multiple spindles, improving performance.

For more information, search the Informatica Knowledge Base for article number 333718.
INIT.ORA Recommendations for Oracle
The following table lists the basic initialization parameters for the Oracle database:
pga_aggregate_target
- Applicable to version: 11g
- Value: 0 if you use Automatic Memory Management (AMM); one third of the memory allocated to Oracle if you use Manual Memory Management (MMM).
- Description: Set the PGA explicitly for MMM. For AMM, you need not set the PGA.

For more information about the INIT.ORA parameters, see Informatica Knowledge Base article 90408.
The following table lists the recommendations related to the Microsoft SQL Server environment:
Recommendations for IBM Db2
You can configure the parameters related to the IBM Db2 environment, registry variables, and database file
configuration parameters.
The following table lists the recommendations related to the IBM Db2 environment:
Physical Disk Drives (Tablespace)
- Recommendation: Use different physical drives for different tablespaces.
- Description: To reduce the amount of blocked input/output, you can increase input/output parallelism. Achieve input/output parallelism by storing user data tablespaces, temporary tablespaces, and transaction logs on different physical disk drives. Batch operations can then access all the paths in parallel, which increases throughput by reducing the input wait times and output wait times.

Physical Disk Drives (Container)
- Recommendation: Use different physical drives for different containers.
- Description: If more physical disk drives are available, you can increase input/output parallelism by extending parallelism to the container level. To do so, place all containers for a tablespace on different physical disks. The IBM Db2 prefetchers and input/output cleaners access these containers in parallel without blocking each other, thereby increasing throughput.
Processing a Large Data Set
- Recommendation: When you process a large data set, use the following command to rebind packages:
  db2 bind @db2cli.lst blocking all grant public sqlerror continue CLIPKG 10
- Description: Perform this step every time you process a large data set, preferably after the initial data load.
Reorganize Match Key Tables
- Recommendation: Routinely determine whether the match key tables (C_<Base Object>_STRP) need reorganization, and reorganize them.
- Description: To improve the performance of the SearchMatch API, reorganize the match key tables based on their primary key column, SSA_KEY. To determine whether a match key table needs reorganization, perform a reorganization check and analyze the results. The cluster ratio of the primary key index appears in the CLUSTERRATIO column, F4, of the reorganization check result. The cluster ratio must be close to 100% for optimal performance. Determine when to reorganize the match key table by noting the cluster ratio at which you observe degradation in SearchMatch API performance. Use the IBM Db2 REORGCHK and REORG commands to reorganize tables. After reorganization, update the match key table statistics so that the IBM Db2 optimizer can use the table layout that the reorganization generates.
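The check-then-reorganize cycle above can be sketched for one match key table. The schema name, table name, and index name below are placeholders, and this is not runnable without an IBM Db2 database; the REORGCHK, REORG, and RUNSTATS commands themselves are standard Db2 CLI commands.

```shell
# Hypothetical sketch (placeholder database, schema, table, and index names).
db2 connect to MDMDB

# Refresh statistics and report reorganization indicators; inspect the F4
# (CLUSTERRATIO) column of the output for the primary key index.
db2 "REORGCHK UPDATE STATISTICS ON TABLE MDM_ORS.C_PARTY_STRP"

# If the cluster ratio is far below 100%, reorganize on the primary key index
# and refresh statistics so the optimizer sees the new layout.
db2 "REORG TABLE MDM_ORS.C_PARTY_STRP INDEX MDM_ORS.PK_C_PARTY_STRP"
db2 "RUNSTATS ON TABLE MDM_ORS.C_PARTY_STRP WITH DISTRIBUTION AND INDEXES ALL"
```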
The following table lists the recommendations for the database file configuration parameters:
LOCKLIST
- Recommended setting: AUTOMATIC
- Description: Allocates the amount of storage for the lock list of a database. Multiple MDM Hub processes use row-level locks to complete tasks and to support concurrency. The number of locks that IBM Db2 needs to acquire depends on the number of rows to process. If the incoming volume differs greatly in size, set the parameter to AUTOMATIC to allow the database manager to determine the appropriate value. If you tune the LOCKLIST parameter value too conservatively, lock escalations can occur, and some of the MDM Hub operations can fail due to lock timeouts.

MAXLOCKS
- Recommended setting: AUTOMATIC
- Description: Configures the percentage of the lock list that one application can use. Most MDM Hub processes run under the scope of a single application. Such single applications can acquire many row-level locks, consuming most of the available lock list. If the incoming volume differs greatly in size, predicting the MAXLOCKS parameter value is difficult. Set the parameter to AUTOMATIC to allow the database manager to determine the appropriate value. If you tune the MAXLOCKS parameter value too conservatively, lock escalations can occur, and some of the MDM Hub operations can fail due to lock timeouts.
CATALOGCACHE_SZ
- Recommended setting: 25000 or higher
- Description: Configures the maximum memory that the catalog cache can use from the shared memory of the database. IBM Db2 stores system catalog information in the catalog cache. The MDM Hub comprises many dynamic SQL queries that reference multiple metadata objects. If the catalog cache is large, IBM Db2 can retain information for some of the metadata objects from the system catalogs in memory, so subsequent dynamic SQL queries that require the same metadata objects compile quickly. The MDM Hub comprises many frequently accessed metadata objects. Therefore, set the CATALOGCACHE_SZ parameter value to 25000 or higher.

LOGBUFSZ
- Recommended setting: 4096 or higher
- Description: Configures the amount of the database heap to use as a buffer for log records before writing the records to disk. The MDM Hub creates logs for most of its operations in the IBM Db2 transaction logs. IBM Db2 buffers the log records in the log buffer before writing them to disk. If the log buffer is large, IBM Db2 writes the log records to disk less frequently, which makes disk input/output for log records more efficient. The default value for this database configuration parameter is not sufficient for an average MDM Hub environment. Set the parameter to 4096 pages or higher.
LOGFILSIZ
- Recommended setting: 128000 or higher
- Description: Configures the number of log records written to the log files. A single MDM Hub transaction can contain many DML queries, resulting in many log records that might span many log files. A large log file size avoids the need to create new log files frequently. If IBM Db2 creates new log files frequently, it adversely influences the performance of input/output bound systems. The total log space for a database equals the total number of primary and secondary log files multiplied by the log file size. The database must have adequate log space to ensure that the MDM Hub transactions do not run out of log space and fail. If the MDM Hub transactions fail, the database needs more time to roll back the transactions. Set LOGFILSIZ to 128000 or higher to ensure that the MDM Hub transactions do not fail and need a rollback. Also consider the number of primary and secondary logs for a complete equation of log space.

LOGPRIMARY
- Recommended setting: 100
- Description: Configures the number of primary log files to be pre-allocated. IBM Db2 creates primary log files when you activate the database. If an uncommitted transaction exhausts the primary log space, IBM Db2 creates secondary log files as needed. Set the LOGPRIMARY parameter value to 100 to efficiently handle the MDM Hub processes. Secondary log files act as a backup in cases where long transactions can exhaust the entire primary log space.

LOGSECOND
- Recommended setting: 100
- Description: Configures the number of secondary log files that IBM Db2 can create and use for recovery log files. Log file creation can adversely impact performance based on the size of the log file. If you allocate sufficient primary log space, transaction performance increases because the database does not create secondary log files frequently. Set the LOGSECOND parameter value to 100 to cover unexpected long transactions due to large incoming volumes, especially during batch processes. The sum of the values of LOGPRIMARY and LOGSECOND must be 200.
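The total log space implied by these settings can be checked with a quick calculation. This sketch assumes the recommended values (LOGPRIMARY = 100, LOGSECOND = 100, LOGFILSIZ = 128000) and the 4 KB page size in which Db2 expresses LOGFILSIZ.

```shell
# Total log space = (LOGPRIMARY + LOGSECOND) x LOGFILSIZ x 4 KB page size.
LOGPRIMARY=100
LOGSECOND=100
LOGFILSIZ=128000          # pages per log file (recommended minimum above)
PAGE_BYTES=4096           # Db2 log file size is expressed in 4 KB pages
TOTAL_GB=$(awk -v p="$LOGPRIMARY" -v s="$LOGSECOND" -v f="$LOGFILSIZ" -v b="$PAGE_BYTES" \
  'BEGIN { printf "%.1f", (p + s) * f * b / (1024 ^ 3) }')
echo "Total log space: ${TOTAL_GB} GiB"
```

Verify that the file system that holds the transaction logs has at least this much free space before applying the settings.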
PCKCACHESZ
- Recommended setting: 128000
- Description: Configures the package cache size, which is allocated out of the database shared memory. The MDM Hub has many dynamic SQL statements, and each dynamic SQL statement has a compiled package associated with it. IBM Db2 caches these packages in the package cache memory. Configure an appropriate package cache size to avoid package cache overflows, which adversely influence performance. Experiment with the values for the package cache size: initially, set the parameter value to 50000 and monitor the different phases of the MDM Hub processes. If you observe frequent package overflows, tune the parameter again.

STMTHEAP
- Recommended setting: AUTOMATIC
- Description: Configures the limit of the statement heap, which is used during the compilation of an SQL statement. If the statement heap is not sufficient, it might prevent the optimizer from evaluating all possible access plans for an SQL query. This might result in a suboptimal plan and adversely influence performance. Set the STMTHEAP parameter to AUTOMATIC to allow the optimizer to evaluate all possible access plans during the compilation of an SQL query.
SHEAPTHRES_SHR AUTOMATIC Configures the limit on the total amount of database shared memory that
the sort memory consumers can use at a time. Set the SHEAPTHRES_SHR
parameter to AUTOMATIC if you set the SORTHEAP parameter to
AUTOMATIC.
UTIL_HEAP_SZ 50000 or higher Configures the maximum amount of memory that the BACKUP, RESTORE,
and LOAD utilities can use simultaneously. During some batch operations,
the MDM Hub uses the IBM Db2 LOAD utility to move data between tables.
The LOAD utility uses the utility heap to complete the data movement
process. The size of the utility heap has an impact on the performance of
the LOAD operation. Set the UTIL_HEAP_SZ parameter to an appropriate
value to provide better throughput for the MDM Hub processes.
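The settings in this table can be applied with the IBM Db2 command line processor. The following is a sketch only; CMX_ORS is a placeholder alias for your ORS database name:

```shell
# Sketch: apply the recommended database configuration values with the
# Db2 CLP. CMX_ORS is a placeholder; substitute your ORS database alias.
db2 connect to CMX_ORS
db2 update db cfg using LOGSECOND 100
db2 update db cfg using PCKCACHESZ 128000
db2 update db cfg using STMTHEAP AUTOMATIC
db2 update db cfg using SHEAPTHRES_SHR AUTOMATIC
db2 update db cfg using UTIL_HEAP_SZ 50000
db2 connect reset
```

Some parameters take effect only after the database is deactivated and reactivated, so plan the change for a maintenance window.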
Obsolete Data Director applications and Operational Reference Stores databases
Remove obsolete items. Obsolete Data Director applications and Operational
Reference Store schemas impact the performance of server startup, run-time
memory, and Security Access Manager profile caching.
Order of authentication providers in Security Providers
Configure the security providers in order, with the first provider being the
provider that authenticates the heaviest user load. [Configuration > Security
Providers > Authentication Providers]. The MDM Hub authenticates the user
based on the order of the security providers configured. If most of the users
are authenticated by using the custom security provider (if applicable), move
it to the first position.
Note: Each authentication request has a cost of a few milliseconds associated
with it. The number of authentication requests is reduced significantly by
using the User Profile Cache.
Maximum thread count for the thread pool
300 or higher. For example, in JBoss set the following property in the
standalone-full.xml file:
<thread-pools>
  <thread-pool name="default">
    <max-threads count="300"/>
  </thread-pool>
</thread-pools>
Maximum connections in HTTP connection pool
300 or higher. For example, in JBoss set the following property in the
standalone-full.xml file:
<connector name="http" protocol="HTTP/1.1" scheme="http"
  socket-binding="http" max-connections="300"/>
JDBC logging level
OFF. For example, in JBoss, set the following log level property in the
standalone-full.xml file:
<subsystem xmlns="urn:jboss:domain:logging:1.2">
  <logger category="com.microsoft.sqlserver.jdbc">
    <level name="OFF"/>
  </logger>
</subsystem>
Transaction timeout
Greater than 3600 seconds. Set the transaction timeout to at least 3600
seconds (1 hour). For example, in JBoss set the following property in the
standalone-full.xml file:
<coordinator-environment default-timeout="3600"/>
Production Mode
Enable this property in production. [Configuration > Database > Database
Properties]. Enable this property to remove the additional overhead of
pre-scheduled daemons that refresh the metadata cache.
Batch API Interoperability
Enable if both real-time and batch interfaces are used. [Configuration >
Database > Database Properties]. Enabling this configuration has an impact
on performance. Enable the configuration if batches and real-time API calls
are used or if Data Director is used. If your application uses neither
real-time API updates nor Data Director, do not enable Batch API
Interoperability.
Tip: During Initial Data Load, disable this property for faster loading of
data.
Auditing
Disable auditing completely. Auditing introduces additional overhead.
Write lock monitor interval
cmx.server.writelock.monitor.interval=10
When more than one Hub Console uses the same ORS, a write lock on a Hub
Server does not disable caching on the other Hub Servers. The unit is in
seconds.
For more information, see the Multidomain MDM Configuration Guide.
Schema Design
The following table lists the recommendations for the schema design:
Child Base Objects
Avoid too many child base objects for a particular parent base object. The
performance of load, tokenize, and automerge batch jobs decreases as the
number of child base objects for a base object increases.
Match columns
Avoid too many match columns. The performance of tokenize and match jobs
decreases as the number of match columns increases.
Lookup Indicator
Enable the lookup indicator only for lookup tables and not for any other base
objects unrelated to lookup. [Schema > [base object] > Advanced > Lookup
indicator]. Enabling the lookup indicator for non-lookup base objects
unnecessarily caches the base object data in memory. Doing so results in
out-of-memory errors, slow Data Director performance, and a slower rate of
lookup cache refresh.
Lookup Display Name
Configure the lookup display name to be the same as the lookup column. For
high volume lookup tables: if you set the lookup display name to any column
other than the column on which the relationship is built, SIF PUT calls must
send the lookup display name values in the SIF call. When inserting data
into the base object, the lookup value is validated by querying the lookup
table. The order of lookup is predefined: the lookup display column value
comes first, followed by the actual column value. In high volume lookup
tables, this becomes an overhead.
History
Enable History only if you want to retain historical data for the specific
base object. Otherwise, disable it. If you enable History for a base object,
the MDM Hub additionally maintains history tables for the base object and
for its cross-reference tables. The MDM Hub already maintains some system
history tables to provide detailed change-tracking options, including merge
and unmerge history. The system history tables are always maintained.
Over a period of time, history in the database keeps growing. Consider
keeping months, or at most a few years, of history in the system to preserve
database access performance.
History table partitioning
To avoid very large history tables that cause performance issues, you can
partition the tables. For more information, search the Informatica Knowledge
Base for article number 306525.
Cross Reference Promotion History
Enable History of Cross Reference Promotion only if you want to retain
historical data for the specific base object. [Schema > [base object] >
Advanced > Enable History of Cross Reference Promotion]. Enabling this
history incurs a performance cost for both real-time and batch operations.
Use the history option cautiously and only if required.
Trust
Configure trust only for required columns. A higher number of trust columns
and validation rules on a single base object incurs higher overhead during
the Load process and the Merge process.
The more trusted and validated columns that are implemented on a particular
base object, the longer the SQL statements (in terms of lines of code) that
are generated to update the _CTL control table and the _VCT validation
control table.
Minimize the number of trust and validation columns to preserve good
performance.
Case Insensitive Search
Enable Case Insensitive Search only for VARCHAR2 columns. Enabling Case
Insensitive Search for non-VARCHAR2 columns hinders performance. Ensure that
you do not include any column with a data type other than VARCHAR in the
Search Query.
Message Trigger Setup
Avoid configuring multiple message triggers for different event types. Do an
in-depth analysis before configuring message triggers. There is a
performance cost associated with them during the execution of load jobs.
Tune the message trigger query. The best approach to tuning the query used
in the Package Views is to use Explain Plan. Add custom indexes wherever
required to avoid full table scans, and analyze tables and schemas on a
regular basis. When you use Explain Plan, retrieve the plan by wrapping the
query in an outer query that contains a "where" clause on a specific
rowid_object value.
For more information about message triggers, search the Informatica
Knowledge Base for article number 142115.
Throughput can be greatly improved if you increase the Receive Batch Size
and reduce the Message Check Interval. The Message Queue Monitoring settings
have a major impact on throughput and message posting time. Configure these
settings from the Hub Console in the Master Reference Manager (MRM) Master
Database (CMX_SYSTEM) in the Configuration section.
Avoid unnecessary column selection in the message trigger. Do not select
"Trigger message if change on any column" if you do not need to monitor all
the columns. Also, try to minimize the selection of columns.
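For example, the Explain Plan wrapping described above might look like the following Oracle sketch. C_CUSTOMER_PKG and the rowid value are hypothetical placeholders, not names from this guide:

```sql
-- Sketch only: wrap the package view query in an outer query with a
-- rowid_object predicate, then retrieve the plan. C_CUSTOMER_PKG is a
-- hypothetical view name; substitute your own package view and a real
-- rowid_object value from your data.
EXPLAIN PLAN FOR
SELECT *
FROM (
  SELECT * FROM C_CUSTOMER_PKG
)
WHERE rowid_object = '1234';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The rowid_object predicate forces the plan to reflect the single-record access pattern that Data Director and SIF calls actually use, rather than a full-view scan.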
'Read Database' Cleanse Function
Use the following recommended settings:
- Use with caution.
- Enable 'cache' if used.
The 'Read Database' cleanse function incurs a performance overhead compared
to using a similar MDM cleanse function to perform the same task. The
performance overhead is more pronounced on a high volume table. The overhead
is caused by the creation of a new database connection and the corresponding
transmission to, processing by, and receipt of the results from the
database. These would otherwise be managed within the Process Server
application layer.
If use of this function cannot be avoided, enable the caching behavior of
the Read Database function where applicable. Pass a Boolean 'false' value to
the 'clear cache' input field of the Read Database function. Doing so
reduces the performance lag by enabling future operations to use the cached
value rather than creating a new database connection on each access of the
function.
Cleanse Functions
Do not make cleanse functions very complex. The performance of batch jobs
improves with a reduced number and reduced complexity of cleanse functions.
Timeline (Versioning)
Enable the Dynamic Timeline, or Versioning, on an Entity base object (a
regular base object, not a Hierarchy Manager Relationship base object) only
if strictly required. For Hierarchy Manager Relationship base objects,
versioning is enabled with no option to disable it.
Versioning adds associated metadata and a significant amount of complex
processing to any process that runs on a version-enabled base object, so
enabling it brings an additional performance cost to all processing
performed on that base object. To maintain the fastest performance possible,
enable Versioning only on the Entity base objects that strictly need it.
For more information, search the Informatica Knowledge Base for article
numbers 138458 and 140206.
State Management
Disable state management if you do not require it. State management carries
an associated performance overhead. If you use Data Director with workflows,
you must enable state management.
However, enabling History for State Management Promotion at the
cross-reference level is optional.
Delta Detection
Enable delta detection only on the minimum number of columns that strictly
need it. Delta detection carries a sizable performance overhead.
If a landing table has only new and updated records in every staging job,
you can disable delta detection. If you want to enable delta detection, the
least impactful approach is to use the last_update_date column. For each
additional column you enable, analyze whether its involvement is worth the
associated performance overhead. Avoid blindly enabling delta detection on
all the columns.
Cleanse Mappings
Minimize the complexity of mappings. Reducing the complexity of a mapping
results in better performance.
If you use a cleanse list in a cleanse mapping, use static data in the
cleanse list. Consider using lookup tables only for dynamic data.
Validation Rules
Optimize the Validation Rule SQL code. The detection piece of each
validation rule SQL runs against every record during the Load process to
determine if it applies. Poorly performing validation rule SQL affects the
performance of every loaded record.
User Exits
Optimize user exit code for performance. Unoptimized user exit code degrades
performance. This applies to both Data Director and batch user exits.
Packages
Optimize the SQL code written in each MDM query that is called from an MDM
Hub package. These MDM Hub packages are used in Data Director, SIF API
calls, Data Manager, Merge Manager, and in search operations. If a package
is not tuned for performance, it results in an expensive operation whenever
it is called.
Custom Indexes
Use caution when adding custom indexes. Each index added has an associated
cost; index management has a performance cost. Ensure that the gain received
outweighs the cost of each additional custom index.
Perform the following steps to manage the performance cost:
1. Get a log of the real queries run on the base object (content data) and
base object shadow tables (content metadata) on a typical day. Ignore
temporary T$% tables and system C_REPOS_% tables.
2. Identify indexes which already exist on these tables to avoid unnecessary
overlap.
3. Before adding any indexes, review a regular day of logs and take an
inventory of:
a. SIF API call durations.
b. Data Director process durations.
c. Batch jobs: the duration of each batch job, the duration of each cycle
within that batch job, and the duration of the longest running statements
within a batch job.
4. Consider the longest running process for potential benefit from a custom
index.
5. Add indexes so that the longest running SQL query or queries hit the new
index in their execution plan. Avoid indexing fields which have many updates
or inserts.
After each new custom index is added, return to step 3 and assess whether
there is still potential to improve performance by adding more custom
indexes.
Parallel Degree on Base Object
Parallel degree is an advanced base object property. For optimum performance
of batch jobs, set a value between one and the number of CPU cores on the
database server machine.
For more information, search the Informatica Knowledge Base for article
number 181313.
Match and Merge
The following table lists the recommendations for the match and merge configuration:
Match Path Filter
If you need to exclude records from the match process, filter on the root
path instead of at the match rule level. When you filter at the root level,
the records are excluded from tokenization and therefore do not participate
in the match.
Check for missing children
Use it with caution. This match path property indicates whether parent
records must be considered for matching based on the existence of child
records.
If you need a fuzzy match on a base object, tokenization of a parent base
object record must occur. Tokenization of a parent base object record occurs
only if every child base object that has the check for missing children
option disabled has a related child base object record. If a parent base
object record has a child base object where the option is disabled and that
contains no related record, the parent record is not tokenized.
The MDM Hub performs an outer join between the parent and the child tables
when the option to check for missing children is enabled. This option has an
impact on the performance of each match path component on which it is
enabled. Therefore, when not needed, it is more efficient to disable this
option.
Match Key
The tighter the key, the better the performance. The width of the match key
determines the number of rows in the tokenization table (the number of
tokenized records used to match each record to be matched) and the number of
records considered as match candidates. Usually the standard key width is
enough.
Search Level: Use the narrowest possible search level that generates
acceptable matches. Usually the typical search level is enough.
Match Rules: For each match rule, add one or more exact match columns to act
as a filter and improve the performance of each rule.
Dynamic Match Analysis Threshold (DMAT)
Change only if required. The default is 0. Although DMAT helps improve
performance, take care when setting this limit. If you set the level too
low, it might cause undermatching. It is recommended that clients first
analyze the data to assess why a particular search range contains a large
count. Sometimes the reason might be a noise word or phrase such as "do not
send" or a valid word or phrase such as "John."
When setting this value, identify any large ranges that are causing
bottlenecks and then use the "Comparison Max Range" count to set the DMAT.
Proper analysis is required before you change this value. For more
information, search the Informatica Knowledge Base for article number 90740.
COMPLETE_STRIP_RATIO
Change only if required. The default is 60%. [Model > Schema > [base object]
> Advanced > Complete Tokenize Ratio]. Proper analysis is required if you
decide to change the default 60% value.
If the volume of data change in the _STRP table is more than this
percentage, the tokenization process drops and re-creates the entire _STRP
table rather than retokenizing only the updated records.
For more information about tuning match and merge, search the Informatica Knowledge Base for article
number 357214.
Protocol
Use the EJB protocol over HTTP or SOAP. The EJB protocol is faster and more
reliable. For more information, search the Informatica Knowledge Base for
article number 138526.
Return Total
Do not set any value. SIF request parameter: returnTotal. If you do not
require the total count, it is better to not set this flag. If set to true,
it incurs two database calls for each SIF call.
Hub Server Properties
The following table lists the recommendations for the Hub Server properties:
Infinispan
The following table lists the recommendations for Infinispan parameters, which are located in the
infinispanConfig.xml file:
expiration lifespan
86400000 (milliseconds). The maximum lifespan of a cache entry in
milliseconds. When a cache entry exceeds its lifespan, the entry expires
within the cluster.
You can increase the lifespan for the following caches: DISABLE_WHEN_LOCK,
DATA_OBJECTS, and REPOS_OBJECTS. For example, you can increase a lifespan
from one hour (3600000) to one day (86400000).
Each cache has its own default value for this parameter. To find the default
values, open the infinispanConfig.xml file.
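As a sketch, a cache entry with an increased lifespan might look like the following. The element names and nesting here follow generic Infinispan conventions and are assumptions; verify the exact structure against your own Infinispan configuration file:

```xml
<!-- Sketch only: the element names follow generic Infinispan
     conventions; the cache name is taken from the list above.
     Verify the exact structure in your Infinispan configuration file. -->
<namedCache name="REPOS_OBJECTS">
  <expiration lifespan="86400000"/>
</namedCache>
```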
Logging
The following table lists the recommendations for the logging:
Hub Server Logging
Set to ERROR mode. Change the log4j.xml file to use ERROR mode. If
clustered, update the log4j.xml file on all nodes. For JBoss, use
<JBoss node>/conf/jboss-log4j.xml. For other application servers, update
<INFAHOME>/hub/server/conf/log4j.xml. Once the log4j configuration file is
updated, changes are reflected in the log within a few minutes.
Process Server Logging
Set to ERROR mode. Change the log4j.xml file to use ERROR mode. If
clustered, update the log4j.xml file on all nodes. For JBoss, use
<node>/conf/jboss-log4j.xml. For other application servers, update the
<INFAHOME>/hub/cleanse/conf/log4j.xml file. Once the log4j configuration
file is updated, changes are reflected in the log within a few minutes.
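For reference, setting ERROR mode in a log4j 1.x configuration file typically means lowering the priority on the root logger. This is a sketch; the appender name "CONSOLE" is an assumption, so keep whatever appenders your log4j.xml already defines:

```xml
<!-- Sketch only: set the root logger to ERROR. The appender name
     "CONSOLE" is an assumption; retain the appenders already defined
     in your log4j.xml. -->
<root>
  <priority value="ERROR"/>
  <appender-ref ref="CONSOLE"/>
</root>
```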
For more information about logging, search the Informatica Knowledge Base for article number 120879.
Search
The following table lists the recommendations for search:
Limit Searchable Fields
Configure as needed. Do not index unnecessary searchable fields. Multiple
searchable fields increase the indexing and searching time, so configure
only the required fields as searchable fields. Also keep only the required
fields and facets. Facets should only be on fields with low entropy. Also
limit the number of fuzzy fields.
Task Assignment
The following table lists the recommendations for task assignments:
task.creation.batch.size
The default is 1000. In MDM 10.0 and earlier, the default value is 50.
Available in cmxserver.properties.
Sets the maximum number of records to process for each match table.
If more tasks need to be assigned at run time, you can increase this value.
Operational Reference Store and SIF APIs
The following table lists the recommendations for ORS-specific SIF API generation:
Required objects
Add as needed. ORS-specific API generation depends on the number of objects
selected. It is preferable to add only the required objects to improve
performance during SIF API generation.
For more information, search the Informatica Knowledge Base for article number 153419.
For more information, search the Informatica Knowledge Base for article numbers 158622 and 158822.
The following table lists the different batch job parameters and their recommended settings to achieve a
base-level performance:
Cleanse Thread Count
Used in the following batch jobs:
- Match job
- Generate Match Tokens process on Load job
- Stage job
Start with the number of cores available. Based on CPU utilization, the
number of threads can be increased. The default is 1.
Available in "Process Server > Threads for Cleanse Processing". The total
number of threads used by the Master or Slave Process Server when executing
Generate Match Tokens after Load, Match, and Stage jobs.
Threads for Batch Processing
Used in the following batch jobs:
- Automerge job
- Load job
- Batch Delete
- Batch Unmerge
- Batch Revalidate
Specify a value that is equivalent to four times the number of CPU cores on
the system on which the Process Server is deployed. The default is 20.
Available in "Process Server > Threads for Batch Processing". The maximum
number of threads to use for a batch process. For example, if the host
machine has 16 CPU cores, set the Threads for Batch Processing in the
Process Server registration to 64. Applicable only if the Process Server is
marked for batch processing.
Note: From the total number of threads available on the Process Server,
dedicate n threads for batch jobs by setting a value for the Threads for
Batch Processing property.
Load: Threads per job for generate tokens, if the 'Generate Match Tokens on
Load' attribute is enabled on the base object
Same as "Threads for Cleanse Processing". See the 'Threads for Cleanse
Processing' attribute described earlier.
Note that this thread attribute is different from the core threads per job
attribute of the load job described earlier.
If 'Generate Match Tokens on Load' is not selected, this attribute does not
have any impact on the performance of the Load job.
Stage: Threads per job
See the 'Cleanse Thread Count' attribute described earlier.
Match: Threads per job
See the 'Cleanse Thread Count' attribute described earlier.
Match: Match Elapsed Time
The default is 20 minutes. [Hub Console > Base Object > Max Elapsed Match
Minutes]. The execution timeout in minutes when executing a match rule. If
this time is reached, the match process exits. Increase this value only if
the match rule and the data are very complex. Generally, rules must be able
to complete within 20 minutes.
Match: Match Batch Size
The default is 20000000. [Hub Console > Base Object > Match/Merge Setup >
Number of rows per match job batch cycle]. The maximum number of records to
be processed by the MDM Hub for matching. This number affects the duration
of the match process. Also, the lower the match batch size, the more times
you have to run the match process.
Note: When running large Match jobs with large match batch sizes, if the
application server or the database fails, you must re-run the entire batch.
• The sum of the threads allocated in Slave Process Server #1, Slave Process Server #2, and so on [all
Slave Process Servers] must be equal to the specific "Threads Per Job" parameter in the
cmxcleanse.properties file:
- cmx.server.batch.threads_per_job
• Each Slave Process Server gets the number of records specified in the block size.
The following properties are related to block_size:
- cmx.server.automerge.block_size
- cmx.server.batch.block_size
- cmx.server.batch.recalculate.block_size
- cmx.server.batch.batchunmerge.block_size
- cmx.server.batch.delete.block_size
• After the last block is sent to the next available Slave Process Server, all Slave Process Servers that
process the blocks MUST complete the job within the timeout period.
com.informatica.mdm.loadbalance.ControllerThread.timeout
• Threads allocated for Batch Job 1 + Threads allocated for Batch Job 2 + … [All parallel batch jobs] must
not exceed “Threads for Batch Processing” of the specific Process Server.
• The com.informatica.mdm.batchserver.RecyclerThread.max_idling property specifies the idle time for
a Process Server thread. The Process Server recycles the thread when it is idle for more than the
configured value.
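As a sketch, the properties named above might appear together in the cmxcleanse.properties file as follows. The values shown are illustrative examples only, not recommendations:

```properties
# Sketch only: illustrative values for the thread and block size
# properties listed above. Tune each value for your own workload.
cmx.server.batch.threads_per_job=10
cmx.server.batch.block_size=250
cmx.server.automerge.block_size=250
cmx.server.batch.recalculate.block_size=250
cmx.server.batch.batchunmerge.block_size=250
cmx.server.batch.delete.block_size=250
```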
Recommendations for the Hub Console Optimization
The Hub Console parameters can be optimized for better performance.
The following table lists the recommendations for the Hub Console optimization:
Client Java Version (JRE)
As specified in the Product Availability Matrix (PAM). Ensure that the
client Java version is as specified in the PAM document for the product.
The Hub Console does not need to pick up the latest JRE on the client
machine. The JRE version selected depends on the PATH variable. To ensure
that the Hub Console uses the correct JRE, temporarily enable the console
log as described in the Informatica Knowledge Base article.
When enabled, the next launch of the Hub Console opens a Java console window
where you can see the JRE version in use at the top of the console.
Console log
Disable (default). Disable the logging by launching the javaws -viewer
option on the client machine from the run command.
In the javaws window, go to the Java Control Panel, open the Advanced tab,
and perform the following steps:
- Clear the 'Enable tracing' and 'Enable logging' options in the Debugging
section.
- Check the 'Hide Console' option in the Java Console section.
High Data Volume
See the parameter "High Volume of Data in C_REPOS_TABLES with historical
data" in the Database - General Recommendation section. Having a huge number
of records in the tables listed in the referenced section might cause
multiple issues in the Hub Console.
The following issues might occur:
- The batch groups screen loads too slowly.
- A base object record is saved too slowly.
- A cleanse function is saved too slowly.
Network latency
Use a good network connection. The Hub Console communicates with the Hub
Server (application server) frequently. Therefore, the Hub Console must
process a considerable amount of data to and from the Hub Server. Have a
good network connection between the client machine and the application
server.
For more information, search the Informatica Knowledge Base for article numbers 310913 and 139923.
Bulk Import: User Exit
As applicable. PUT user exits are called for each and every record
individually. Complex logic in the PUT user exit deteriorates performance.
Use user exits with caution.
Bulk Import: Children, Foreign Keys, and Lookups
As applicable. A higher number of children, foreign key relationships, and
lookup columns in the data import impacts performance.
Test IO (utility), applicable for Oracle
Available from Informatica Support. Test the data file input/output (I/O)
capability of the data files of the database tablespaces by referring to the
Informatica Knowledge Base article number 506035. The test result must meet
or exceed the standards documented for 'Good Performance' in that article.
Ping/TraceRt
Response latency must be less than 10 milliseconds. Run a ping or tracert
(trace route) to the MDM Database (CMX_SYSTEM) and individual ORS from the
application server machine.
Test specific disk partition IO
Reasonable MB/second. Follow the instructions in the Informatica Knowledge
Base article number 139805 to test all disk partitions involved in the MDM
Hub, including partitions where tablespaces reside and where the database
debug log is written. If the throughput is less than 10 MB/second, that
specific partition needs to be fixed.
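If the Knowledge Base utility is not at hand, a rough sequential write check can approximate throughput. The following Python sketch is an illustration only, not the supported test from article 139805, and the example path is a placeholder:

```python
import os
import time

def write_throughput_mb_s(path, total_mb=50, chunk_mb=1):
    """Sequentially write total_mb megabytes to path, fsync, and
    return the observed throughput in MB/second."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force data to disk before timing stops
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the probe file
    return total_mb / elapsed

# Example: probe the partition that holds a tablespace directory.
# The path below is a placeholder; point it at the partition to test.
# rate = write_throughput_mb_s("/u01/oradata/io_probe.tmp")
# print("%.1f MB/s" % rate)
```

Run the probe several times and take the median, because caches and competing workloads skew single measurements.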
The following table lists the recommendations for the Hub Server and the Process Server:
Ping/TraceRt
Response latency must be less than 10 milliseconds. Run a ping or tracert
(trace route) to the MDM Hub Server or the Process Server from the client
machine or the Hub Server machine. Repeat this step for the following
machines wherever applicable:
- All nodes in a cluster
- Load balancer
- Web server
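Besides ping, timing a TCP connection from the client machine gives a comparable latency number. The following Python sketch is illustrative; the host and port in the example comment are placeholders:

```python
import socket
import time

def tcp_latency_ms(host, port, timeout=2.0):
    """Return the time in milliseconds to open a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # only the connection handshake is timed
    return (time.perf_counter() - start) * 1000.0

def meets_latency_target(host, port, threshold_ms=10.0, samples=5):
    """Take several samples and compare the best one to the target."""
    best = min(tcp_latency_ms(host, port) for _ in range(samples))
    return best <= threshold_ms

# Example (placeholder host and port for a Hub Server HTTP listener):
# print(meets_latency_target("mdm-hub.example.com", 8080))
```

Using the best of several samples filters out one-off scheduling delays, which is closer to how ping reports its minimum round-trip time.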
Appendix A
Glossary
_PKQ sequence
Sequence that is used to populate the ROWID_OBJECT of a base object record. For example, the C_PARTY
base object uses the C_PARTY_PKQ sequence to populate the party records.
<INFAHOME>
Physical location where the Hub Server and the Process Server are installed.
heap size
Amount of memory allocated to Java processes which are created on the same JVM.
Hub Server
The server that manages core and common services for the MDM Hub.
master database
Database instance that stores metadata to manage individual domain schemas called ORS schemas. The
database instance is unique to each MDM Hub environment.
MaxPermGen
A JVM parameter that specifies the maximum size of the memory region where class metadata information is loaded.
ORS
Operational Reference Store. A database instance where you store domain data.
PermGen
A JVM parameter that specifies the initial size of the memory region where class metadata information is loaded.
Process Server
The server that cleanses and matches data and performs batch jobs such as load, recalculate BVT, and
revalidate.
response latency
The time duration between request and response.
Tracert
Network diagnostic tool that displays the route to a particular destination with the transit delay information.
user profile
An internal object of the MDM Hub that stores the user details including authentication and associated roles.
Xms
A JVM parameter that specifies the initial Java heap size.
Xmx
A JVM parameter that specifies the maximum Java heap size.
XREF
Cross reference. Data that relates the base object data with the relevant source information.
Xss
A JVM parameter that specifies the stack size of each thread created within the Java process.
Index
G
glossary 51
I
IBM Db2 recommendations
database file configuration parameters 21
J
Java recommendations
database connection pool 10
JVM settings 10
M
MDM Hub recommendations
Hub Server properties 25
Infinispan 25
O
Oracle database recommendations
INIT.ORA recommendations 17
RAC recommendations 15
R
recommendations
batch job optimization 36
Data Director and SIF optimization 46
database 12
environment validation tools and utilities 49
Hub Console optimization 45
IBM Db2 21
Java 10
MDM Hub 25
Microsoft SQL Server 20
Oracle database 15