Informatica MDM Training 2
Management (MDM)
Topic 4: Load Process
Objectives
• Configure Trust
• Configure Relationships
• Configure Lookups
Trust
• A mechanism for measuring the confidence factor associated with each cell based on
its source system, change history, and other business rules
• Ensures that the most reliable data at the cell level is consolidated based on data
characteristics
[Figure: "Winners Survive" example showing trust scores for competing cell values such as "Doug McDougal Grp" and "McDougal Group".]
Base Object cells are only updated where new data has a higher trust weighting:
ROWID_OBJECT  Name        Phone
100           DMcD Group  1-555-901-4670
• If trust is switched off for a column, then the most recently updated value from any
source is the survived value in the base object
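The rule for columns without trust can be sketched in a few lines of Python. This is a minimal illustration, not MDM Hub code: the source names (CRM, ERP), the dates, and the function name are hypothetical.

```python
from datetime import datetime


def survive_without_trust(candidates):
    """Pick the value whose last update is the most recent, whatever its source."""
    return max(candidates, key=lambda c: c["last_update"])["value"]


# Hypothetical sources and dates, for illustration only.
candidates = [
    {"source": "CRM", "value": "Doug McDougal Grp", "last_update": datetime(2023, 1, 10)},
    {"source": "ERP", "value": "McDougal Group", "last_update": datetime(2023, 3, 5)},
]
survivor = survive_without_trust(candidates)
```

Here `survivor` is the ERP value, because it was updated last.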
• Source of the Data – Each trust-enabled column must have a maximum and minimum
trust weighting assigned for each source system
• Decay Period for Data – Each trust-enabled column must have a decay period
assigned for each source system, which specifies how long the trust weighting takes to drop
from maximum trust to minimum trust
• Decay Type – Each trust-enabled column must have a decay type assigned for each
source system, which specifies how the trust value decreases from maximum to minimum
during the decay period
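To make the three settings concrete, the sketch below computes a trust score under a straight-line decay. The slides do not give the product's formulas, so the linear shape, the function name, and the sample numbers are assumptions chosen for illustration.

```python
def linear_trust(max_trust, min_trust, decay_period_days, age_days):
    """Trust score of a value that is age_days old, assuming straight-line decay."""
    if decay_period_days <= 0 or age_days >= decay_period_days:
        return float(min_trust)  # fully decayed
    if age_days <= 0:
        return float(max_trust)  # freshly updated
    fraction = age_days / decay_period_days
    return max_trust - (max_trust - min_trust) * fraction


# Hypothetical settings: maximum 90, minimum 30, decay period 100 days.
halfway = linear_trust(90, 30, 100, 50)
```

With these settings the score sits halfway between maximum and minimum after 50 days and stays at the minimum once the decay period has elapsed.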
Decay Types
Trust Demo
Validation Rules
• Reserve Minimum Trust can be set to avoid having trust scores below the minimum
trust value
• A validation check can be defined on any column in a base object, and the downgrade can be
applied to any other columns in that base object
middle_name is null
• Downgrade trust on Address Line 1, City, State, Zip, and Valid Address Ind if
valid_address_ind = ‘False’
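A validation rule of this kind can be sketched as follows. This is a hedged illustration only: the downgrade percentage, the trust values, and the function name are hypothetical, and the way Reserve Minimum Trust is modelled (a floor at the minimum trust value) is an assumption based on the description above.

```python
def apply_downgrade(trust, downgrade_pct, min_trust, reserve_minimum_trust):
    """Reduce a trust score by a percentage, optionally flooring it at min_trust."""
    downgraded = trust * (1 - downgrade_pct / 100)
    if reserve_minimum_trust:
        return max(downgraded, min_trust)
    return downgraded


# Hypothetical record and trust scores for the address columns.
record = {"address_line_1": "1 Main St", "valid_address_ind": "False"}
trust = {"address_line_1": 80.0, "city": 80.0}

# Validation check on one column, downgrade applied to other columns.
if record["valid_address_ind"] == "False":
    trust = {col: apply_downgrade(t, 75, 30.0, True) for col, t in trust.items()}
```

Without the reserve flag the 75% downgrade would take the score below the minimum; with it, the score stops at the minimum.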
Relationships
• Types of relationships:
• One to Many Relationship
• Many to Many Relationship
One-to-many Relationship
• One table (the child) contains a foreign key column, which matches a unique key
column of another table (the parent)
• One-to-many relationships are always defined from the child table in the relationship
(i.e. the referencing table rather than the referenced table).
Many-to-many Relationship
• A base object acts as an intersection table between another two base objects
• The intersection table has a one-to-many relationship with the other two base objects
Lookups
• A lookup is the translation of a source system's primary or foreign key value into the
corresponding base object key value
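As a rough sketch, a lookup behaves like a dictionary keyed by source system and source key. The in-memory dictionary, the system name "CRM", and the ROWID value below are hypothetical stand-ins for the cross-reference table.

```python
# (source system, source key) -> base object ROWID_OBJECT; hypothetical sample data.
customer_xref = {("CRM", "10810"): "100"}


def lookup_rowid(xref, system, pkey_src_object):
    """Translate a source key into the base object key, or None if it is unknown."""
    return xref.get((system, pkey_src_object))


rowid = lookup_rowid(customer_xref, "CRM", "10810")
missing = lookup_rowid(customer_xref, "CRM", "99999")
```

A source key with no cross-reference entry yields `None`, which is the situation that leads to a reject during the load.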
Automatic Lookups
• MRM automatically handles lookups when loading or updating the primary key of a Base Object
[Figure: lookup example involving the staging table for Address data from the CRM system (C_STG_CRM_ADDR), the Customer Base Object, and the Customer Cross-Reference; visible column names: ROWID_OBJECT, FULL_NAME, PKEY_SRC_OBJECT, CRM_ID.]
• The foreign key value stored on the cross-reference (X-ref) is the same as the value
stored on the base object
• However, this makes it difficult to tie child X-refs back to their original parent X-ref
• A shadow foreign key is an additional column added to the X-ref for every foreign key
defined on the base object
[Figure: shadow foreign key example showing a customer record (10810, JOHN J HANCOCK) and the Address Cross-Reference with columns ROWID_OBJECT, ROWID_SYSTEM, PKEY_SRC_OBJECT, CUST_ID, S_CUST_ID, ADDRESS.]
Load Process
• Apply Updates
• Apply Inserts
[Flowchart: Register LOAD job, then Process Updates; if STRIP_ON_LOAD_IND = 1, Tokenize, and if STRIP_ON_LOAD_IND = 0, skip tokenization; then Process Inserts with the same STRIP_ON_LOAD_IND check; then End LOAD job.]
Updates
• The update process may update the Base Object depending on trust:
• For columns not flagged for trust, the update happens if the incoming data has a newer last update date (LUD)
• For columns flagged for trust, the load job compares the trust weightings of the staging table data
to the trust weightings of the existing data in the base object to determine what can be updated
• If the history flag is switched on for the Base Object, then the update process writes to the
history tables of the Base Object and XREF
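The update decision for a single cell can be sketched like this. It is a simplified illustration: the dictionary layout, the integer dates, and the tie handling (the existing value is kept when scores or dates are equal) are assumptions, not documented behaviour.

```python
def update_cell(existing, incoming, trust_enabled):
    """Return the cell that survives an update."""
    if trust_enabled:
        # Trust-enabled column: the incoming value must carry a higher trust score.
        wins = incoming["trust"] > existing["trust"]
    else:
        # Column without trust: the incoming value must have a newer LUD.
        wins = incoming["lud"] > existing["lud"]
    return incoming if wins else existing


# Hypothetical cells; LUD is written as a YYYYMMDD integer for brevity.
existing = {"value": "DMcD Group", "trust": 75, "lud": 20230101}
incoming = {"value": "McDougal Group", "trust": 50, "lud": 20230301}
```

The same incoming value is rejected when the column is trust-enabled (50 is lower than 75) and accepted when it is not (its LUD is newer).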
Inserts
• Load job applies inserts for records that do not exist in the XREF table
• New records are inserted into base object and XREF with CONSOLIDATION_IND = 4
• If history flag is switched on for the Base Object, then the insert process writes to the
history tables of Base Object and XREF
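The insert step can be sketched as below. The in-memory dictionaries, the integer ROWIDs, and the function name are hypothetical simplifications; only the XREF existence check and CONSOLIDATION_IND = 4 come from the slide.

```python
def apply_inserts(staging_rows, xref, base_object):
    """Insert staging rows whose source key is not yet in the XREF; return new ROWIDs."""
    next_rowid = max(base_object, default=0) + 1
    inserted = []
    for row in staging_rows:
        key = (row["system"], row["pkey_src_object"])
        if key in xref:
            continue  # already known: handled by the update process instead
        xref[key] = next_rowid
        base_object[next_rowid] = {**row["data"], "CONSOLIDATION_IND": 4}
        inserted.append(next_rowid)
        next_rowid += 1
    return inserted


# Hypothetical sample data.
xref = {("CRM", "10810"): 100}
base_object = {100: {"FULL_NAME": "JOHN J HANCOCK", "CONSOLIDATION_IND": 1}}
staging_rows = [
    {"system": "CRM", "pkey_src_object": "10810", "data": {"FULL_NAME": "JOHN HANCOCK"}},
    {"system": "CRM", "pkey_src_object": "20000", "data": {"FULL_NAME": "JANE DOE"}},
]
new_rowids = apply_inserts(staging_rows, xref, base_object)
```

Only the second staging row is inserted; the first already has a cross-reference entry.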
Rejects
• Referential integrity (RI) is maintained among base objects in the consolidated data model
• Rejects occur in the load process if any records violate an RI constraint, for example when:
• Parent records do not exist
• Child records are loaded before the parent records
• A lookup has been defined incorrectly
• Rejected records are inserted into the reject table of the staging table,
staging_table_name_REJ
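The reject behaviour can be sketched as a parent lookup that either succeeds or routes the child row to a reject list. The row layout, the reject reason text, and the function name are hypothetical; the reject list stands in for the staging_table_name_REJ table.

```python
def load_children(child_rows, parent_xref):
    """Split child rows into loadable rows and rejects based on the parent lookup."""
    loaded, rejected = [], []
    for row in child_rows:
        parent_rowid = parent_xref.get((row["system"], row["parent_key"]))
        if parent_rowid is None:
            # RI violation: no parent record was found for this child.
            rejected.append({**row, "reject_reason": "parent not found"})
        else:
            loaded.append({**row, "parent_rowid": parent_rowid})
    return loaded, rejected


# Hypothetical sample data: the second address refers to a customer not yet loaded.
parent_xref = {("CRM", "10810"): 100}
child_rows = [
    {"system": "CRM", "parent_key": "10810", "address": "1 Main St"},
    {"system": "CRM", "parent_key": "55555", "address": "9 Elm Rd"},
]
loaded, rejected = load_children(child_rows, parent_xref)
```

Loading the parent base object first, and then rerunning the child load, would clear the second row.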
Topic 5: Match Process
Objectives
Match & Search Strategy
Match Process
Match & Merge Overview
• Nicknames
• Synonyms
• Abbreviations
• Match rules also tell MRM if two matching records are similar enough to automatically
merge/link, or if they should be reviewed by a data steward
Match/Search Strategy
Exact
• Does not allow for any variations in the data in the match columns
• Very simple match process, therefore fast
Fuzzy
• Allows for variations in spelling, formats, word order, nicknames, synonyms, etc.
• More complex match process, therefore slower
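The difference between the two strategies can be illustrated with a small comparison. The fuzzy side uses Python's `difflib.SequenceMatcher` purely as a stand-in; it is not the algorithm MDM Hub uses, and the 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Exact strategy: the values must be identical."""
    return a == b


def fuzzy_match(a, b, threshold=0.85):
    """Stand-in fuzzy strategy: tolerate small spelling differences."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


name_a = "Doug McDougal Grp"
name_b = "Doug McDougal Group"
```

The two names fail the exact comparison but pass the fuzzy one, at the cost of a more expensive comparison per pair.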
[Flowchart: Register MATCH job; decision "Fuzzy or Exact?"; the Fuzzy branch runs Generate Keys and then Search for Match Candidates; End MATCH job.]
Match Path
• A Match Path identifies the base object(s) that provide data for matching
• It can traverse the hierarchy between records across multiple base objects or within a single
base object
• Foreign key relationships between tables are used to traverse the relationships
• By default, MDM does an inner join between the base objects defined in the Match
Path
• The join therefore excludes rows that don’t have corresponding rows in the joined
tables
• To include those records, switch on “Check for Missing Children” – MDM will then do an
outer join instead of an inner join
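The effect of the option can be sketched with two small lists. The table contents, column names, and function name are hypothetical; the point is only the inner-join versus outer-join behaviour described above.

```python
def match_path_rows(parents, children, check_missing_children=False):
    """Join parents to children; optionally keep parents that have no children."""
    rows = []
    for parent in parents:
        kids = [c for c in children if c["customer_rowid"] == parent["rowid"]]
        if kids:
            rows.extend((parent["rowid"], c["address"]) for c in kids)
        elif check_missing_children:
            rows.append((parent["rowid"], None))  # outer join keeps the parent
    return rows


# Hypothetical data: customer 2 has no address record.
parents = [{"rowid": 1}, {"rowid": 2}]
children = [{"customer_rowid": 1, "address": "1 Main St"}]

inner = match_path_rows(parents, children)
outer = match_path_rows(parents, children, check_missing_children=True)
```

With the option off, customer 2 never reaches the match process; with it on, the customer is included with an empty address.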
Match Column
• Examples:
Full Name
Generation
Address
Phone
• Provider columns are the base object columns that provide the data for the match
column:
• Can be a single column or a concatenation of columns
• Must be VARCHAR or CHAR columns to be concatenated
• Date columns are also supported for matching
[Figure: match column example with columns CUSTOMER_ROWID and Address.]
Exact Match/Search Strategy
Match Columns
• Match rules are flagged either for auto merge/auto link or for manual merge/link
• Matches resulting from auto merge/auto link rules will result in the records being
automatically merged/linked by the system when the auto merge/auto link batch job
runs
• Matches resulting from manual merge/link rules will be queued for review by a data
steward
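The routing of match results by rule type can be sketched as follows. The record layout and names are hypothetical; the sketch only separates pairs found by auto merge rules from pairs that go to the data steward queue.

```python
def route_matches(matches):
    """Split matched pairs into an auto-merge list and a manual review queue."""
    auto_merge, manual_review = [], []
    for match in matches:
        target = auto_merge if match["rule_auto_merge"] else manual_review
        target.append(match["pair"])
    return auto_merge, manual_review


# Hypothetical match results: (ROWID, ROWID) pairs tagged with the rule type.
matches = [
    {"pair": (100, 101), "rule_auto_merge": True},
    {"pair": (100, 205), "rule_auto_merge": False},
]
auto_merge, manual_review = route_matches(matches)
```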
[Figure: match rules 1 OR 2; callout: two records match if they have the same values in Match Col 2 and in Match Col 3.]
• NULL Matches non-NULL: Use this flag to specify the match columns in a match rule that
should be regarded as matches when one of the values being compared is NULL and the other
is not
• In the above example, the effects of Null Matching on the Generation column are shown
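The flag can be sketched as a column comparison. The function name and sample values are hypothetical, and treating two NULL values as a non-match is an assumption; the slide only covers the case where exactly one value is NULL.

```python
def column_matches(a, b, null_matches_non_null=False):
    """Compare one match column for two records; None represents NULL."""
    if a is None or b is None:
        exactly_one_null = (a is None) != (b is None)
        # Assumption: two NULLs are not a match in this sketch.
        return null_matches_non_null and exactly_one_null
    return a == b


# Hypothetical Generation values.
with_flag = column_matches("JR", None, null_matches_non_null=True)
without_flag = column_matches("JR", None)
```

With the flag on, a record with Generation "JR" matches one whose Generation is NULL; two different non-NULL values still do not match.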
Match Rule
Fuzzy Match/Search Strategy
Population
• Each population also has a large number of uncommon names that tend to have the
most error and variability
• Match needs to account for both of these situations in the way that the keys are built,
to give optimal search performance for both
• Defines how to build keys and perform searches on name and address
Match Key
• Match key is used to search for match candidates
• It is a fixed-length, compressed, and encoded value
• Built from a combination of the words and numbers in a name or address
• For one name or address, multiple SSA match keys are generated
• Match Key Properties:
• Key Type
• Key Width
• Path Component
• Match Column Contents
• The match key type describes important characteristics about a column to MDM Hub
• Determines the degree of variance that will be supported in the key values
• Represents a tradeoff between match precision and the space used by match key
records
Key Width   Description
Standard    Aims for a balance between Limited and Extended, i.e. between disk usage/performance and search completeness
• Contains the column that forms the basis for defining the Match Key
• The column(s) from Path Component that provide data to the Match Key
• Match rule sets are logical groupings of match rules that collectively act on a base object to
identify duplicates
• Each rule set has a Search Level and can comprise one or more match rules
• The search level determines how many match candidates are returned in the search phase of the match
process
Search Level   Description
Typical        The appropriate search level for typical data sets
• Determines how precise the match is i.e. how similar a candidate record is to the
queued record to be considered a match
Symbol                  Description
Column_1 (Fuzzy)        Indicates that Column_1 is a fuzzy match column
Column_1 (Fuzzy) (+2)   Indicates that the fuzzy column, Column_1, has had its weighting in the rule manually increased
Column_4 (≠)            Indicates that non-equal match (anti-match) is switched on for Column_4. Can be combined with null match: Column_4 (≠ Ø)
Match Server Architecture
Topic 6: Merge Process
Objectives
Merge Process
Merge
• An immutable source means that the source system is seen as a distinct source
• All records coming from this source always have a consolidation indicator of 1
• If two immutable records must be merged, then a data steward needs to perform a
manual verification in order to allow that change. The data steward will have to choose
the key that remains
Distinct Systems
• Records from a source marked as Distinct will not merge among themselves
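The distinct-system rule can be sketched as a guard applied before two matched records are merged. The system names and the function name are hypothetical, and the sketch covers only the rule stated above: two records from the same distinct source are not merged with each other.

```python
def can_merge(rec_a, rec_b, distinct_systems):
    """Block a merge when both records come from the same distinct source."""
    same_system = rec_a["system"] == rec_b["system"]
    return not (same_system and rec_a["system"] in distinct_systems)


# Hypothetical setup: ERP is marked as a distinct system.
distinct_systems = {"ERP"}
erp_1 = {"system": "ERP", "rowid": 100}
erp_2 = {"system": "ERP", "rowid": 101}
crm_1 = {"system": "CRM", "rowid": 200}
```

Two ERP records are kept apart, while an ERP record can still merge with a CRM record.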
Un-Merge Process
• By default, unmerging parent records does not unmerge associated child records
• Unmerge Child When Parent Unmerges option allows you to specify what happens if
records in the parent base object are unmerged
Topic 7: Batch Process
Objectives
• Scheduling Considerations
Batch Process
Batch Viewer
• Shows job completion status (Success / Failure / Warning) with associated message
• Useful for starting the run of a single job, or running jobs that don’t often need to run
(e.g. Synchronize Trust job after changing Trust settings)
Executing Stored Procedures
Stored Procedures
• All public MRM batch processes can be executed through stored procedures
• Can easily be integrated with any job scheduling software – Tivoli, CA Unicenter etc.
• The full list of public batch processes per user-defined object can be found in
C_REPOS_TABLE_OBJECT_V
Job Status & Job Statistics
• Job status and statistics can be viewed in the Batch Tool or by querying the C_REPOS_JOB*
tables directly
Scheduling Considerations
Stage Jobs
• If the cleanse server machine has enough CPU and memory to handle multiple cleanse
servers, then parallelize stage jobs
Load Jobs
• If a large number of loads must run in a short batch window, load separate
targets in parallel and check all dependencies before each load starts
Match/Merge Jobs
• Determine whether to run match-merge once per object per batch window, or after
every source load
• Consider whether to tokenize after load. Can switch off the STRIP_ON_LOAD indicator
so that the strip process does not run as part of the load
Batch Group
• A batch group is a collection of individual batch jobs (e.g. Stage, Load, Match, etc.) that
can be executed with a single command
• Each batch job in a group can be executed sequentially or in parallel to other jobs
• History logs can be viewed across all Batch Groups, based on their execution status by
clicking on the appropriate node under the “Logs By Status” node
• A batch group that contains stage jobs may encounter rejected records. These can be
viewed by selecting the log record for the stage job that contains the rejected record,
then clicking the “View Rejects” button
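The sequential and parallel execution of jobs in a batch group can be sketched with Python's standard `concurrent.futures` module. This is an analogy only, not the MDM Hub batch engine: the job names and the placeholder jobs are hypothetical, and each "level" here stands for a set of jobs allowed to run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch_group(levels):
    """Run levels one after another; jobs inside a level run in parallel."""
    results = {}
    for level in levels:
        with ThreadPoolExecutor(max_workers=max(1, len(level))) as pool:
            futures = {name: pool.submit(job) for name, job in level.items()}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises if the job failed
    return results


# Hypothetical group: two stage jobs in parallel, then one load job.
levels = [
    {"stage_crm_customer": lambda: "Success", "stage_erp_customer": lambda: "Success"},
    {"load_customer": lambda: "Success"},
]
results = run_batch_group(levels)
```

The load job starts only after both stage jobs have finished, which mirrors the dependency check recommended for load jobs.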