DQ 100 ExceptionManagementGuide en
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License,
http://www.stlport.org/doc/license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html,
http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html,
http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html,
http://fusesource.com/downloads/licenseagreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/;
http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html;
http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html;
http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html,
http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html;
http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html;
http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt;
http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/;
https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js; http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/LICENSE;
http://jdbc.postgresql.org/license.html; http://protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/LICENSE;
http://web.mit.edu/Kerberos/krb5current/doc/mitK5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/LICENSE;
https://github.com/hjiang/jsonxx/blob/master/LICENSE; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/LICENSE;
http://one-jar.sourceforge.net/index.php?page=documents&file=license; https://github.com/EsotericSoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html;
https://github.com/tinkerpop/blueprints/blob/master/LICENSE.txt; http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/;
https://github.com/twbs/bootstrap/blob/master/LICENSE; https://sourceforge.net/p/xmlunit/code/HEAD/tree/trunk/LICENSE.txt;
https://github.com/documentcloud/underscore-contrib/blob/master/LICENSE, and https://github.com/apache/hbase/blob/master/LICENSE.txt.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution
License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License
Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/licenses/BSD-3-Clause),
the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artisticlicense-1.0) and the
Initial Developers Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/).
This product includes software copyright 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this
software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab.
For further information please visit http://www.extreme.indiana.edu/.
This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject
to terms of the MIT license.
See patents at https://www.informatica.com/legal/patents.html.
DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is
subject to change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software
Corporation ("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT
INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT
LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
Part Number: DQ-EXC-USG-10000-0001
Table of Contents
Preface
    Informatica Resources
        Informatica My Support Portal
        Informatica Documentation
        Informatica Product Availability Matrixes
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Support YouTube Channel
        Informatica Marketplace
        Informatica Velocity
        Informatica Global Customer Support
Index
Preface
The Informatica Exception Management Guide describes how to use Exception Management in the Analyst
tool.
Exception Management is an Analyst tool feature that you can use to view and update data quality exception
records in a Human task. Exceptions are records that might contain bad data or duplicate data. Use
Exception Management to resolve data errors and to consolidate clusters of duplicate records into a single
record.
Informatica Resources
Informatica My Support Portal
As an Informatica customer, the first step in reaching out to Informatica is through the Informatica My Support
Portal at https://mysupport.informatica.com. The My Support Portal is the largest online data integration
collaboration platform with over 100,000 Informatica customers and partners worldwide.
As a member, you can:
Search the Knowledge Base, find product documentation, access how-to documents, and watch support
videos.
Find your local Informatica User Group Network and collaborate with your peers.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you
have questions, comments, or ideas about this documentation, contact the Informatica Documentation team
through email at [email protected]. We will use your feedback to improve our
documentation. Let us know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your
product, navigate to Product Documentation from https://mysupport.informatica.com.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment,
extend, or enhance data integration implementations. By leveraging any of the hundreds of solutions
available on the Marketplace, you can improve your productivity and speed up time to implementation on
your projects. You can access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at https://mysupport.informatica.com. Developed from the real-world
experience of hundreds of data management projects, Informatica Velocity represents the collective
knowledge of our consultants who have worked with organizations from around the world to plan, develop,
deploy, and maintain successful data management solutions. If you have questions, comments, or ideas
about Informatica Velocity, contact Informatica Professional Services at [email protected].
CHAPTER 1
Introduction to Exception Management
This chapter includes the following topics:
Task Types
Task Types
The type of task that a workflow assigns to you depends on the type of data quality issues that the source
database tables contain. The records in the tables might contain errors, null values, or values that are
inaccurate in the current data project. The tables might contain records that are redundant because they
contain different versions of the same information.
You can work on the following types of task instances in the Analyst tool:
Correct exceptions task
Contains records that might include errors or null values. Analyze and fix any error that you find. When a
record is free of errors, update the record status to indicate that the record is valid.
Correct duplicates task
Contains records that might contain duplicate information. The task sorts the records into clusters. A
cluster is a group of records that represent the same business entity in the source data set. Analyze
each cluster, and define a preferred version of the record that the cluster represents. Update the cluster
status to indicate that you reviewed the preferred record.
If a record is not a duplicate of any record in the cluster, move the record to another cluster. You can
create a cluster that contains a single unique record.
Review exceptions task
Contains a set of exception records that an earlier user analyzed in a correct exceptions task. Review
the work done by the earlier user, and verify that the record data and the record status are correct.
Review duplicates task
Contains a set of clusters that an earlier user worked on in a correct duplicates task. Review the work
done by the earlier user, and verify that the preferred record data and the cluster status are correct.
Note: Records cannot pass from a task that corrects or reviews exception records to a task that corrects or
reviews duplicate records. The database tables that contain exception records and duplicate records have
different structures.
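The rule in the note can be sketched in code. The following Python example is an illustration only and is not part of any Informatica API. The names TaskType, EXCEPTION_TASKS, DUPLICATE_TASKS, and can_pass_records are hypothetical. The sketch assumes that records can move only between tasks that read the same kind of table.

```python
from enum import Enum


class TaskType(Enum):
    CORRECT_EXCEPTIONS = "correct exceptions"
    REVIEW_EXCEPTIONS = "review exceptions"
    CORRECT_DUPLICATES = "correct duplicates"
    REVIEW_DUPLICATES = "review duplicates"


# Hypothetical grouping: task types that read exception record tables and
# task types that read duplicate record tables. The tables differ in structure.
EXCEPTION_TASKS = {TaskType.CORRECT_EXCEPTIONS, TaskType.REVIEW_EXCEPTIONS}
DUPLICATE_TASKS = {TaskType.CORRECT_DUPLICATES, TaskType.REVIEW_DUPLICATES}


def can_pass_records(source: TaskType, target: TaskType) -> bool:
    """Return True if both tasks work on the same kind of table."""
    pair = {source, target}
    return pair <= EXCEPTION_TASKS or pair <= DUPLICATE_TASKS
```

For example, can_pass_records(TaskType.CORRECT_EXCEPTIONS, TaskType.REVIEW_EXCEPTIONS) returns True, and a call that mixes an exception task with a duplicate task returns False.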
When a Human task contains multiple steps, the exception records pass from one step to another. The
developer who configures the workflow defines the sequence of the steps. Each step identifies a set of
Analyst tool users.
Start Workspace
The Start workspace displays the tasks that you own and any task that you administer. Use the Start workspace
options to open tasks, to perform actions on tasks, and to review the task metadata.
The user role that you use to log in to the Analyst tool determines the tasks that you can view in the Start
workspace. Select the My Tasks view to display the tasks that you own. Select the Task Administration
view to display any task that you administer. You can administer tasks if the workflow that creates the tasks
identifies you as a business administrator.
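The difference between the two views can be sketched as two filters on a task list. The following Python example is an illustration only. The dictionary keys "name", "owner", and "business_admins" and the function names are hypothetical and do not reflect how the Analyst tool stores task data.

```python
from typing import Dict, List


def my_tasks(tasks: List[Dict], user: str) -> List[Dict]:
    """Tasks that the user owns, as in the My Tasks view."""
    return [task for task in tasks if task["owner"] == user]


def task_administration(tasks: List[Dict], user: str) -> List[Dict]:
    """Tasks for which the workflow names the user as a business administrator."""
    return [task for task in tasks if user in task["business_admins"]]
```

A user can appear in both views for the same task when the user owns the task and is also a business administrator for it.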
The following image shows a list of tasks in the My Tasks view:
2. Refresh
   Refreshes the workspace data.
3. Actions menu
   Opens a list of actions that you can perform on the tasks that you select.
4. Column headings
   Lists the names of the columns that describe the task instances.
5. Filter
   Uses the values that you enter to filter the list of records.
6. Task name
   Shows the name of the task. To open a task, select the task name.
7. Comments
   Opens an editor that you can use to enter a comment or to read a comment about the task.
Correct duplicates. Examine a cluster of duplicate records and create a preferred record from the
values in the records.
Review exceptions. Review the work of another user in a task that corrects exceptions.
Review duplicates. Review the work of another user in a task that creates a preferred record from a
cluster of records.
Note: The task types might include Voting tasks. Business Glossary users work on Voting tasks. Voting
tasks are not a part of the exception management process.
Due Date
The deadline for the task. The Human task defines the due date for a correct exceptions task and a
correct duplicates task. The Analyst tool calculates the due date for a correct exceptions task and a
correct duplicates task.
Status
The status of the task.
The task can have one of the following statuses:
Owner
The name of the current task owner.
Created
The date on which the workflow created the task.
Exceptions Workspace
The Exceptions workspace is a temporary workspace that appears when you view or open a task. The
Exceptions workspace contains a Data Editing tab and a Data Audit tab.
The Data Editing tab displays the task data and the options that you can use to update the records or
clusters in the task. The tab also displays the metadata columns that the Analyst tool uses to track the
updates that you and other users make to each record or cluster.
The Data Audit tab displays an audit trail of the changes that you and other users made to the task data.
You can view the fields that users changed, the identity of the user who changed each record, and the date
of the change.
When you finish work on a task, you can close the Exceptions workspace.
might perform a task on a data set and pass the results to a colleague who assumes the data stewardship
duties.
1. You ask a developer to configure one or more mappings to find and fix the errors in the data set.
   The mappings also calculate a numeric score for each record in the data set. The scores represent the
   data quality of the records. Some records have marginal scores that indicate that the mappings cannot
   verify all of the data quality issues that the records contain.
2. The developer configures an additional mapping that reads the numeric scores. The developer adds the
   mapping to a workflow that includes a Mapping task and a Human task.
   The Mapping task runs the mapping. The Mapping task writes the records to different tables based on
   the scores that they contain.
   The Human task distributes the records with marginal scores to tasks that you and other users can
   open in the Analyst tool.
3. You log in to the Analyst tool, and you open a task. The task organizes the exception records in one or
   more tables. Each table can contain 100 records.
   You perform one of the following actions on each record:
      You correct the errors in the record, or you decide that the current record is correct.
         You update the record status to indicate that the record is valid.
      You determine that the record does not contain any valid data.
         You update the record status to indicate that the record is not valid.
      You decide that you cannot verify the accuracy of the record.
         You update the record status to indicate that the record needs further analysis by another user or by
         another Informatica process.
   Note: Before you update a record, verify that the task is open in edit mode. To enter edit mode, click the
   Edit button in the open task.
4. When you finish work on all of the records in the task, you update the task status. The task status
   indicates that the records are ready for the next stage in the data quality process.
   The next stage for the data depends on the configuration of the Human task. For example, the Human
   task might include additional steps that assign the records to other users for review.
   When the Human task completes, the next stage of the workflow begins.
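The routing that the Mapping task performs on the record scores can be sketched as follows. The Python example is an illustration only and does not show how a mapping is configured. The threshold values, the table names "good", "exception", and "bad", and the function names are assumptions. The guide states only that records go to different tables based on their scores and that a task table can contain 100 records.

```python
from typing import Dict, List

# Hypothetical thresholds. The developer who configures the mapping sets the real values.
LOWER_THRESHOLD = 0.4
UPPER_THRESHOLD = 0.8
PAGE_SIZE = 100  # a table in a task can contain 100 records


def route_by_score(records: List[dict]) -> Dict[str, List[dict]]:
    """Sort records into hypothetical target tables based on the score value."""
    routed: Dict[str, List[dict]] = {"good": [], "exception": [], "bad": []}
    for record in records:
        score = record["score"]
        if score >= UPPER_THRESHOLD:
            routed["good"].append(record)
        elif score >= LOWER_THRESHOLD:
            # Marginal score: a user must examine the record in a Human task.
            routed["exception"].append(record)
        else:
            routed["bad"].append(record)
    return routed


def paginate(records: List[dict], page_size: int = PAGE_SIZE) -> List[List[dict]]:
    """Split the exception records into tables of at most page_size records."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]
```

In this sketch, only the records in the "exception" list reach the Analyst tool. A set of 250 exception records would appear as three tables of 100, 100, and 50 records.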
1. You ask a developer to configure one or more mappings to identify the duplicate records.
   The mappings calculate a set of numeric scores that represent the levels of duplication between the data
   values in the records. High scores indicate duplicate records, and low scores indicate unique records.
   Some records have marginal scores that indicate that the duplicate status of the records is uncertain.
2. The developer configures an additional mapping that reads the numeric scores. The developer adds the
   mapping to a workflow that includes a Mapping task and a Human task.
   The Mapping task runs the mapping. The Mapping task writes the records to different tables based on
   the scores that they contain.
   The Human task distributes the records with marginal scores to tasks that you and other users can
   open in the Analyst tool.
3.
4. You examine the data values in each column of record data. You select the most accurate value in
   each column and promote the value to the preferred record.
   You can edit the values that you select, and you can search for records that contain common values
   in other clusters.
   If a record does not belong in the current cluster, you move it to another cluster or you create a
   cluster for the record.
   You update the cluster status to indicate that you reviewed the cluster. You complete the task when
   you verify the current preferred record in every cluster.
   Note: Before you update a record, verify that the task is open in edit mode. To enter edit mode, click the
   Edit button in the open task.
5. When you finish work on all of the clusters in the task, you update the task status. The task status
   indicates that the records are ready for the next stage in the data quality process.
   The next stage for the data depends on the configuration of the Human task. For example, the Human
   task might include additional steps that assign the clusters to other users for review.
   When the Human task completes, the next stage of the workflow begins.
2. In the Address field, enter the URL for the Analyst tool:
   http[s]://<fully qualified host name>:<port number>/analyst/
3. If the domain uses LDAP or native authentication, enter a login name and a password on the login page.
4.
5.
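The URL pattern in step 2 can be assembled from its parts. The following Python example is a minimal sketch. The function name analyst_tool_url is hypothetical, and the host name and port number in the usage note are placeholder values, not defaults taken from this guide.

```python
def analyst_tool_url(host: str, port: int, use_https: bool = False) -> str:
    """Build the Analyst tool URL from a fully qualified host name and a port number."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}/analyst/"
```

For example, analyst_tool_url("dq.example.com", 8085) returns "http://dq.example.com:8085/analyst/". Ask the domain administrator for the host name, the port number, and whether the domain uses HTTPS.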
CHAPTER 2
2.
3.
4. Update the status of each record to reflect the current record data.
   Choose one of the following options:
      Accept. You determine that the current data is acceptable to the business.
      Reject. You determine that the current data is unacceptable to the business.
5. Update the record to indicate that you reviewed the record data. You can set a status value of Reviewed
   independently from the other status values.
   Use the filter options to show or hide records with a common status value.
6. Optionally, add a note to the record. For example, you might add a note to explain why you rejected a
   record.
7. When you finish work on all records in the task, update the task status.
2. Review each record. Examine the record data, and examine the status indicators that the previous user
   set for each record.
      If you agree with the current content of the records, make no change. If you disagree with the content
      of any record, update the record.
      If you agree with the current record status, make no change. If you disagree with the record status,
      update the status.
   Use the filter options to show or hide records with a common status value.
3. Verify the review status of each record. The review status indicates that you approve or reject the record.
   The review status supersedes the status that a previous user applies to the record.
4. Optionally, add a note to the record. If a user added a note to a record in an earlier task, the note that
   you add replaces the older note.
5. When you finish work on all records in the task, update the task status.
2.
3.
4.
Highlighter
View the records that do not use a status indicator.
5.
6.
7.
8.
Record selector
Select the records that a record action applies to.
9.
Note column
Read the note that the task owner added to the record.
10.
11.
12.
13.
Navigation options
Move to different pages in the task.
2. Click Edit.
3. Examine the records. Select a data field that contains an error that you can fix.
   Note: Move the pointer over the field to see the type of data error that the field contains.
4.
5.
Select one of the following status indicators when you finish work on a record:
Accept Record
Indicates that the record is acceptable for storage with the organization data.
You can accept a record in a correct exceptions task and in a review exceptions task.
Reject Record
Indicates that the record is not acceptable for storage with the organization data.
Reject a record that you determine cannot correctly identify a business entity in the database. A
downstream process might drop the records that you reject from the database tables.
You can reject a record in a correct exceptions task and in a review exceptions task.
Reprocess Record
Indicates that the record contains a data quality issue that you cannot verify.
Reprocess a record when you cannot determine the accuracy of the record data. Another user in a
downstream task might verify or update the record data. Or, a downstream process might write the
record to a table for analysis and correction in a mapping.
You can reprocess a record in a correct exceptions task and in a review exceptions task.
Mark as Reviewed
Indicates that you reviewed the record data.
Mark a record as reviewed to indicate to other users that you examined the record. The status does not
describe the data quality of the record or specify any further action for the record data.
You can set other status indicators when you mark a record as reviewed. For example, you can
reprocess a record that you mark as reviewed.
You can mark a record as reviewed in any task.
Approve Record Edit
Indicates that you analyzed the record in a review exceptions task and you determined that the record is
acceptable for storage with the organization data.
You can also approve a record edit in a review duplicates task.
Reject Record Edit
Indicates that you analyzed the record in a review exceptions task and you determined that the record is
not acceptable for storage with the organization data.
You can also reject a record edit in a review duplicates task.
Clear Record Status
Clears any of the following status indicators from the record:
Accept Record
Reject Record
Reprocess Record
Clear the status if you determine that the current status is incorrect for the record.
You can clear the status indicator in a correct exceptions task and in a review exceptions task.
Clear Reviewer Status
Clears any of the following status indicators from the record:
Mark as Reviewed
Clear the status if you determine that the current review status is incorrect. For example, you might clear
the status of a record that another user reviewed in an earlier task.
You can clear the status indicator in any task.
Note: The status indicator must represent the current state of the record. For example, you might reject a
record because you identify an error in the record. If you fix the error, update the record status.
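The status indicators behave as two independent settings on a record: a record status and a reviewer status. The following Python example sketches that model. It is an illustration only and is not an Analyst tool API. The class name, the attribute names, and the method names are hypothetical, and the sketch omits the Approve Record Edit and Reject Record Edit indicators.

```python
from dataclasses import dataclass
from typing import Optional

RECORD_STATUSES = {"Accept Record", "Reject Record", "Reprocess Record"}


@dataclass
class ExceptionRecord:
    data: dict
    record_status: Optional[str] = None  # one of RECORD_STATUSES, or None
    reviewed: bool = False               # set by Mark as Reviewed

    def set_status(self, status: str) -> None:
        """Apply Accept Record, Reject Record, or Reprocess Record."""
        if status not in RECORD_STATUSES:
            raise ValueError(f"unknown record status: {status}")
        self.record_status = status

    def mark_as_reviewed(self) -> None:
        """Mark as Reviewed does not change the record status."""
        self.reviewed = True

    def clear_record_status(self) -> None:
        """Clear Record Status leaves the reviewer status in place."""
        self.record_status = None

    def clear_reviewer_status(self) -> None:
        """Clear Reviewer Status leaves the record status in place."""
        self.reviewed = False
```

Because the two settings are separate, you can reprocess a record that you mark as reviewed, and clearing one indicator does not affect the other.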
2.
Click Edit.
3.
4.
The operation to select all records applies to the records in the current workspace.
If you apply a filter to the task, the operation to select all records applies to the records in the
workspace that meet the filter criteria.
Open the Record Actions menu, and select the status indicator to apply to the records.
You can select the following status indicators in a correct exceptions task and a review exceptions task:
To indicate that a record does not contain valid data, select Reject Record.
To indicate that the record needs further analysis, select Reprocess Record.
To indicate that you examined the record, select Mark as Reviewed. You can select the indicator in
parallel to another indicator.
You can select the following status indicators in a review exceptions task:
5.
To verify that a record is valid for inclusion in the business data, select Mark as Accepted.
To verify that a record is not valid for inclusion in the business data, select Mark as Rejected.
If you disagree with the current status of the record, select one or both of the following options:
To clear an indicator that accepts, rejects, or reprocesses a record, select Clear Record Status.
To clear an indicator that specifies a type of review, select Clear Review Status.
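The effect of a filter on the operation to select all records can be sketched as follows. The Python example is an illustration only. The function names, the "status" key, and the "country" field are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def select_all(
    records: List[Record],
    record_filter: Optional[Callable[[Record], bool]] = None,
) -> List[Record]:
    """Select every record, or only the records that meet the filter criteria."""
    if record_filter is None:
        return list(records)
    return [record for record in records if record_filter(record)]


def apply_status(selected: List[Record], status: str) -> None:
    """Apply one status indicator to all of the selected records."""
    for record in selected:
        record["status"] = status
```

In this sketch, a call such as apply_status(select_all(records, lambda r: r["country"] == ""), "Reject Record") changes only the records that the filter shows. Records outside the filter keep their current status.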
Accepted. Records that are suitable for permanent storage with the organization data.
Rejected. Records that are not suitable for permanent storage with the organization data.
Review
Indicates the review status of the record in the current task.
You can choose from the following review options:
All Records.
Accepted. Records that are suitable for permanent storage with the organization data.
Rejected. Records that are not suitable for permanent storage with the organization data.
Review
Indicates the review status of the record in the current task.
You can choose from the following review options:
All Records.
2.
Click Filter.
The Filter panel opens.
3.
4.
2.
3.
4.
5.
6.
7.
To replace the data value that you enter with another value, enter a value in the Replace with field.
Select or clear the option to search the data fields for the complete string that you enter.
To replace the highlighted value with the value that you entered, click Replace.
Repeat the step to find and replace additional instances of the value in the task.
8.
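The find and replace operation can be sketched in code. The following Python example is an illustration only and makes two assumptions. It treats the complete string option as a match on the whole field value, and it replaces every match in one pass, whereas the Analyst tool replaces the highlighted value each time that you click Replace. The function name and parameter names are hypothetical.

```python
from typing import Dict, List


def find_and_replace(
    records: List[Dict[str, str]],
    find: str,
    replace_with: str,
    complete_string: bool = False,
) -> int:
    """Replace matching values in all records and return the number of fields changed."""
    changed = 0
    for record in records:
        for field_name, value in list(record.items()):
            if not isinstance(value, str):
                continue
            if complete_string:
                # Assumption: the field must equal the search string exactly.
                if value == find:
                    record[field_name] = replace_with
                    changed += 1
            elif find in value:
                record[field_name] = value.replace(find, replace_with)
                changed += 1
    return changed
```

With the complete string option selected, a search for "NYC" changes a field that contains only "NYC" and leaves a field that contains "NYC office" unchanged.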
2.
3.
Click Edit.
4.
5.
6.
2. Verify the record data, and verify the status that earlier users assigned to each record.
3. If you decide to update a record or to change the record status, click Edit.
To fix an error, click the field that contains the error and enter the correct data value.
To update the record status, perform one of the following actions:
To indicate that a record does not contain valid data, select Reject Record.
To indicate that the record needs further analysis, select Reprocess Record.
To indicate that you examined the record, select Mark as Reviewed. You can select the indicator in
parallel to another indicator.
To verify that a record is valid for inclusion in the business data, select Mark as Accepted.
To verify that a record is not valid for inclusion in the business data, select Mark as Rejected.
4.
To clear an indicator that accepts, rejects, or reprocesses a record, select Clear Record Status.
To clear an indicator that specifies a type of review, select Clear Review Status.
When you set the status of a record, you can indicate that you reviewed the record. The review status
does not describe the accuracy or the data quality of the record. For example, you can accept, reject, and
reprocess a series of records, and you can mark each record as reviewed.
As a best practice, mark every record that you examine as reviewed. The status confirms to a user in a
downstream task that another user examined the record. When you update the data in a record, mark the
record as reviewed regardless of the presence or absence of another status indicator on the record.
You can set any status indicator when you update a record. For example, you can update a record that
you reprocess. The update that you make can help the next user or a downstream data process to
analyze and repair the data.
The data values and the status indicators on a record can change independently of one another across
multiple tasks. When you update a record, verify that the data values and the status indicators are current
and accurate. The changes that you make can overrule the decisions that another user made.
The audit trail stores every change that a user makes to the data values and the status indicators on a
record. The audit trail does not store changes to the text in a note that you add to the record.
The data that you work on can pass to a task that corrects data or a task that reviews data. For example,
a developer who configures a Human task in a workflow might specify multiple correct exceptions tasks in
sequence. The developer might follow a review duplicates task with a correct duplicates task.
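The audit trail behavior in the guidelines above can be sketched as follows. The Python example is an illustration only and does not describe how the Analyst tool stores audit data. The class names, attribute names, and method names are hypothetical. The sketch records changes to data values and to the status, and it does not record changes to the note.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass
class AuditEntry:
    user: str
    item: str  # name of the data field, or "status"
    old_value: Any
    new_value: Any
    changed_at: datetime


@dataclass
class AuditedRecord:
    values: dict
    status: Optional[str] = None
    note: str = ""
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def _log(self, user: str, item: str, old: Any, new: Any) -> None:
        self.audit_trail.append(
            AuditEntry(user, item, old, new, datetime.now(timezone.utc))
        )

    def update_value(self, user: str, field_name: str, new_value: Any) -> None:
        """Change a data value and add an entry to the audit trail."""
        self._log(user, field_name, self.values.get(field_name), new_value)
        self.values[field_name] = new_value

    def update_status(self, user: str, new_status: Optional[str]) -> None:
        """Change the status indicator and add an entry to the audit trail."""
        self._log(user, "status", self.status, new_status)
        self.status = new_status

    def set_note(self, text: str) -> None:
        # Notes are not audited in this sketch. A new note replaces the older note.
        self.note = text
```

Each entry keeps the user, the changed item, the old and new values, and the time of the change, which corresponds to the information that the Data Audit tab displays.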
CHAPTER 3
Preferred Records
Editing a Cluster
Preferred Records
When you work on a cluster, you create or verify the most accurate and complete version of the record that
the cluster represents. The record that you create or verify is the preferred record.
The first row in the cluster contains the preferred record data. To update the preferred record, promote data
values from the other records in the cluster to the preferred record.
By default, the Analyst tool populates the preferred data row with data from the first record in the cluster. The
Analyst tool highlights the preferred data row. The Analyst tool also highlights the record that contains the
default data. When you promote a data value to the preferred record, the Analyst tool highlights the value that
you promote. You can edit a value that you add to the preferred record. You cannot edit a value in another
record in the cluster. If the default record is correct, you can accept the default record.
Note: The preferred record is not a member of the cluster. The preferred record is a unique record that the
workflow creates in the duplicate record database.
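As a rough illustration of promotion, the following Python sketch copies selected values from a cluster record into a separate preferred record. The `promote` function and the sample column names are hypothetical. The sketch assumes only what the text states: the preferred data defaults to the first record in the cluster, and the preferred record is distinct from the cluster members.

```python
def promote(preferred, source_record, column):
    """Return a copy of the preferred record with one value promoted."""
    updated = dict(preferred)
    updated[column] = source_record[column]
    return updated


# Hypothetical cluster of two records that describe the same person.
cluster = [
    {"name": "J. Smith", "phone": ""},
    {"name": "John Smith", "phone": "555-0100"},
]

# By default, the preferred data comes from the first record in the cluster.
preferred = dict(cluster[0])

# Promote the more accurate values from the second record.
preferred = promote(preferred, cluster[1], "name")
preferred = promote(preferred, cluster[1], "phone")
print(preferred)
```

Because the preferred record is a separate copy, the records in the cluster keep their original values.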
2.
Examine the preferred record and the other records in the current cluster.
If a field in another record contains a more accurate value than the same field in the preferred record,
promote the value to the preferred record.
3.
Optionally, perform any of the following tasks to verify that the cluster contains the most accurate data:
Create a cluster and move a record to the cluster that you created.
4.
Update the cluster status to indicate that you reviewed the cluster.
5.
When you finish work on all of the clusters in the task, update the task status.
2.
Review the contents of each cluster. Verify or update the status that a user assigned to the cluster in
another task.
Use a filter to find the clusters that the user reviewed or did not review.
3.
If you agree with the content and the status of the cluster, make no change.
If you disagree with the content of any cluster, update the cluster.
If you disagree with the status of any cluster, update the status.
When you finish work on all of the clusters in the task, update the task status.
2.
3.
Filter options
Find clusters in the task that meet the criteria that you specify. When you apply a filter, you close the
current cluster and you open the first cluster that meets the filter criteria.
The filter also applies to the clusters that you discover. When you apply a filter, the Discovered Clusters
list shows the clusters within the discovered list that meet the filter criteria.
4.
Note option
Read a note that a user added to the current cluster on the Task Clusters list.
5.
6.
7.
Promotion option
Updates the data in the preferred record with the data from the row that you select.
8.
9.
10.
11.
12.
13.
14.
Editing a Cluster
Examine the preferred record and any other record in the cluster. Verify that the preferred record contains the
most accurate data for the business entity that the cluster represents.
If the current preferred record is correct, make no change.
1.
2.
3.
Examine the records in the cluster. Determine if the preferred record contains the most accurate data for
the business entity that the cluster represents.
The preferred record is the first record in the cluster. The Analyst tool highlights the preferred record and
the source record for the preferred data.
4.
Click Edit.
5.
Replace the current preferred record with another record in the cluster.
To replace the preferred record, click the promotion tool in the row that contains the record.
6.
When you complete work in a cluster, update the cluster status to Reviewed.
2.
3.
Click Edit.
4.
Optionally, update the preferred record or update the contents of the cluster.
5.
To update the cluster status, perform one or more of the following actions:
6.
After you review all of the clusters in the task, update the task status. The task status indicates that the
cluster records are ready for the next stage in the workflow.
2.
3.
4.
5.
Select the column that contains the data value to search for.
6.
7.
8.
Click Find.
The search operation returns the records in the task data that contain the value that you searched for. If
you searched for multiple values, the search operation returns the records that contain every value.
9.
Select one or more records in the search results, and click Open.
The Data Editing tab displays the clusters that contain the records that you selected.
10.
11.
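The search behavior in the steps above, where a search for multiple values returns only the records that contain every value, can be sketched as follows. The `find_records` function and the sample data are hypothetical, and the sketch assumes an exact match on each selected column, which the guide does not specify.

```python
def find_records(records, criteria):
    """Return the records that match every column/value pair in criteria."""
    return [
        record
        for record in records
        if all(record.get(column) == value for column, value in criteria.items())
    ]


# Hypothetical task data.
records = [
    {"id": 1, "city": "Boston", "state": "MA"},
    {"id": 2, "city": "Boston", "state": "NY"},
    {"id": 3, "city": "Albany", "state": "NY"},
]

single = find_records(records, {"city": "Boston"})
multiple = find_records(records, {"city": "Boston", "state": "NY"})
print([r["id"] for r in single], [r["id"] for r in multiple])
```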
2.
3.
Click Edit.
4.
Find a record in another cluster that matches a record in the current cluster.
Use the Discovered Clusters options to find the record. The search operation might return records in
multiple clusters.
5.
6.
7.
If a record in a cluster is a better match with the records in another cluster, move the record to the
cluster.
Use the Move Records icons to move the records.
8.
2.
3.
4.
Click Edit.
5.
6.
7.
8.
9.
Move any other record that matches the preferred record in the cluster that you created.
2.
Open a cluster.
3.
4.
Accepted. Clusters that contain a preferred record that is suitable for storage with the organization
data.
Rejected. Clusters that do not contain a preferred record that is suitable for storage with the
organization data.
Reviewed. Clusters that you reviewed. The status does not indicate the status of the preferred record.
Moved to cluster. Any cluster that contained a record that a user moved to another cluster.
Moved from cluster. Any cluster that contains a record that a user moved from another cluster.
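The two move filters are easy to confuse, so the following Python sketch shows which cluster receives which label when a record moves. The data structures, the `move_record` function, and the `filter_clusters` function are hypothetical. The labels follow the definitions above: the cluster that loses the record matches Moved to cluster, and the cluster that gains the record matches Moved from cluster.

```python
def move_record(clusters, flags, record_id, source_id, target_id):
    """Move a record between clusters and label both clusters."""
    clusters[source_id].remove(record_id)
    clusters[target_id].append(record_id)
    # The source cluster contained a record that moved to another cluster.
    flags.setdefault(source_id, set()).add("Moved to cluster")
    # The target cluster contains a record that moved from another cluster.
    flags.setdefault(target_id, set()).add("Moved from cluster")


def filter_clusters(clusters, flags, label):
    """Return the IDs of the clusters that carry the given label."""
    return [cid for cid in clusters if label in flags.get(cid, set())]


# Hypothetical clusters keyed by cluster ID.
clusters = {"c1": ["r1", "r2", "r3"], "c2": ["r4"], "c3": ["r5"]}
flags = {}

move_record(clusters, flags, "r3", source_id="c1", target_id="c2")
print(filter_clusters(clusters, flags, "Moved to cluster"))
print(filter_clusters(clusters, flags, "Moved from cluster"))
```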
Review
Returns the clusters that contain the review status that you specify.
You can choose the following review options:
2.
3.
Click Filter.
The Filter panel opens.
Note: The Data Editing tab and the Data Audit tab display different sets of filter options.
4.
5.
Use the Cluster Actions menu options to update the record status. When you finish work on the task, use
the Task Actions menu options to update the task status.
Note: Verify that the other records in the cluster do not include any record that the business might want to
keep. You can move the records to another cluster, and you can create a cluster to store the record.
2.
3.
To update the contents of the cluster or the cluster status, click Edit.
To update the preferred record, perform one or more of the following actions:
Replace the current preferred record with another record in the cluster.
To replace the preferred record, click the promotion tool in the row that contains the record.
You can also move a record to another cluster or import a record from another cluster. To move records
between clusters, use the Discovered Clusters options.
To update the cluster status, perform one of the following actions:
4.
When you finish work on the cluster, save the cluster to the task.
After you review all of the clusters in the task, update the task status. The task status indicates that the
cluster records are ready for the next stage in the workflow.
A cluster is a set of records in a database table that share similar or identical data values. A developer
defines the criteria that sort the records into clusters. If you believe that a record does not belong in the
current cluster, use the Discovered Clusters options to find the correct cluster. If you believe that another
cluster contains a record that belongs in the current cluster, use the Discovered Clusters options to find
the correct cluster.
When you work on a cluster, use the preferred record to define the most accurate version of the business entity
that the cluster represents. The preferred record is not necessarily the final version of the record. Another
user or another data process might work on the cluster after you complete the task.
When you update the preferred record, you update a record in the exception database that represents the
preferred form of the records in the cluster. You do not update the source data in the cluster.
When you set the status of a cluster, you can indicate that you reviewed the cluster. The review status
does not describe the accuracy or the data quality of the preferred record in the cluster.
As a best practice, mark every cluster that you examine as reviewed. The status confirms to a user in a
downstream task that another user examined the cluster. When you update the data in a record, mark the
record as reviewed regardless of the presence or absence of another status indicator on the record.
The audit trail stores every change that a user makes to the preferred record. The audit trail does not
store changes to the cluster data.
The data that you work on can pass to a task that corrects data or a task that reviews data. For example,
a developer who configures a Human task in a workflow might specify multiple correct exceptions tasks in
sequence. The developer might follow a review duplicates task with a correct duplicates task.
CHAPTER 4
Task Management
This chapter includes the following topics:
Release a task
Release a task that you own. The task that you release has no owner until the business administrator
assigns an owner or another user opens the task.
You can release a task at any time. You do not need to complete work on a task to release the task.
The following image shows the My Tasks view in the Start workspace:
2.
You can view, open, and release multiple tasks in a single operation.
Viewing Tasks
When you view a task, you open the task in read-only mode. As a task performer, you can view the contents
of a task that has no owner. A business administrator can view the contents of any task on the Task
Administration view.
You can open multiple tasks in read-only mode in a single operation. Each task opens on a separate tab in
the Exceptions workspace. Use the task check boxes to select the tasks.
1.
2.
3.
Opening a Task
Open a task to work on the task data. When you open a task, you claim ownership of the task. Other task
performers cannot edit a task that you own.
You can open multiple tasks in a single operation. Each task opens on a separate tab in the Exceptions
workspace. Use the task check boxes to select the tasks.
1.
2.
3.
Releasing a Task
When you release a task, you no longer own the task. Another user can claim ownership of the task, or a
business administrator can assign the task to another user. The task saves any work that you performed on
the task data.
You can release multiple tasks in a single operation. Use the task check boxes to select the tasks. A
business administrator can release any task on the Task Administration view.
1.
2.
3.
2.
3.
4.
5.
6.
Click Assign.
The Task Administration view updates the records to show the task owner that you selected.
2.
3.
4.
5.
Task ID
The unique identifier of the task instance within the Human task.
Task title
The name of the task that appears in the Task Administration view.
Task type
The type of task that the Human task generated. A task might be a correct exceptions task, a review
exceptions task, a correct duplicates task, or a review duplicates task.
Task owner
The name of the user who owns the task.
Due date
The date by which the user must complete the task.
Status
The ownership status of the task. Before a user claims a task, the task status is Created. When a
user claims a task, or when you assign a task to a user, the task status is Assigned.
Created
The date on which the workflow that created the tasks ran.
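The relationship between the task owner and the ownership status can be sketched as a small state model. The `Task` class and its method names are hypothetical. The sketch covers only the two transitions that the Status description states, a user claim and an administrator assignment, and it makes no assumption about the status of a task after release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical model of the Task owner and Status properties."""
    task_id: str
    owner: Optional[str] = None
    status: str = "Created"  # The status before any user claims the task.

    def claim(self, user):
        # A task performer opens the task and becomes the owner.
        self.owner = user
        self.status = "Assigned"

    def assign(self, user):
        # A business administrator assigns the task to a user.
        self.owner = user
        self.status = "Assigned"


claimed = Task(task_id="T-1")
claimed.claim("performer1")

assigned = Task(task_id="T-2")
assigned.assign("performer2")

unclaimed = Task(task_id="T-3")
print(claimed.status, assigned.status, unclaimed.status)
```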
Click Cancel.
To move the task data to the next stage in the workflow, click OK.
The number of records that the user reviewed and the total number of records in the task.
2.
The number of records that the user accepted as valid for storage with the business data.
3.
The number of records that the user selected for further analysis.
4.
The number of records that the user rejected as not valid for storage with the business data.
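The counts listed above can be derived from per-record data, as the following Python sketch suggests. The record layout, the `summarize` function, and the decision labels `accepted`, `reprocess`, and `rejected` are hypothetical stand-ins for the status values that the Analyst tool displays.

```python
from collections import Counter


def summarize(records):
    """Count reviewed records and the decisions that users made."""
    decisions = Counter(r["decision"] for r in records if r["decision"])
    return {
        "reviewed": sum(1 for r in records if r["reviewed"]),
        "total": len(records),
        "accepted": decisions["accepted"],
        "further_analysis": decisions["reprocess"],
        "rejected": decisions["rejected"],
    }


# Hypothetical task data: three reviewed records and one untouched record.
records = [
    {"reviewed": True, "decision": "accepted"},
    {"reviewed": True, "decision": "rejected"},
    {"reviewed": True, "decision": "reprocess"},
    {"reviewed": False, "decision": None},
]

summary = summarize(records)
print(summary)
```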
2.
Select a task.
The task opens in the Exceptions workspace.
3.
The workflow failed, and you want to run the workflow again.
2.
3.
4.
5.
Click OK.
The tasks are complete. The task data moves to the next stage in the workflow.
If you open the Inbox after you complete the tasks, the Inbox might not display any change to the task list. To
view the current list of tasks in the Inbox, refresh the Inbox.
Description
ROW_IDENTIFIER
REVIEW_STATUS
WORKFLOW_ID
USER_COMMENT
The most recent note that a user added to the record in the Analyst tool.
Column Name
UPDATED_STATUS
Description
The current status of the record data.
A record can have one of the following status values:
-
RECORD_STATUS
The record status that the workflow sets. The workflow sets the status value when it writes
the record to an exception data table. The default status is INVALID.
Description
ROW_IDENTIFIER
SEQUENTIAL_CLUSTER_ID
An identifier value for the cluster in the database table. The workflow uses the
value to sort the cluster rows in the database.
CLUSTER_ID
The value that identifies the cluster to which the record belongs.
MATCH_SCORE
A value that indicates the degree of similarity between two records in the cluster.
The score is a decimal value between 0 and 1.
IS_MASTER
A value that identifies preferred records in the table. The values are Y for
preferred records and N for other records.
UPDATED_STATUS
USER_COMMENT
The most recent note that a user added to the cluster in the Analyst tool.
REVIEW_STATUS
WORKFLOW_ID
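A downstream script might use the columns above to rebuild the clusters from an exported file. The following Python sketch makes several assumptions: the export is comma-delimited, the column names appear in the first row, and the score range of 0 to 1 is inclusive. The `NAME` column, the sample rows, and the `load_clusters` function are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export data with the column names in the first row.
SAMPLE = """CLUSTER_ID,MATCH_SCORE,IS_MASTER,NAME
1,1,Y,John Smith
1,0.92,N,J. Smith
2,1,Y,Ann Lee
"""


def load_clusters(text):
    """Group exported rows by CLUSTER_ID and separate preferred records."""
    clusters = defaultdict(lambda: {"preferred": [], "others": []})
    for row in csv.DictReader(io.StringIO(text)):
        score = float(row["MATCH_SCORE"])
        # The match score is a decimal value between 0 and 1.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"MATCH_SCORE out of range: {score}")
        # IS_MASTER is Y for preferred records and N for other records.
        key = "preferred" if row["IS_MASTER"] == "Y" else "others"
        clusters[row["CLUSTER_ID"]][key].append(row)
    return dict(clusters)


clusters = load_clusters(SAMPLE)
for cluster_id, members in clusters.items():
    print(cluster_id, len(members["preferred"]), len(members["others"]))
```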
2.
3.
4.
Select or clear the option to export the column names as the first row of the export data.
5.
Click Export.
The Analyst tool exports the data file to the directory structure.
Index
A
audit trail
duplicate record filters 39
exception record filters 26
B
business administrator 12, 43
C
Clear Cluster Status
description 35
cluster
adding a note 38
creating 38
editing 34
finding records 36
status updating 35
clusters
filtering cluster data 39
D
Data Audit tab
duplicate record filters 39
exception record filters 26
Data Editing panel
exception data filters 25
filtering clusters 40
filtering records 26
Data Editing tab
duplicate record tasks 32
filtering cluster data 39
duplicate records
correct duplicates task 30, 31
creating a cluster 38
Discovered Clusters options 36
editing clusters 34
review duplicates task 32
steps to correct 31
steps to review 31
updating cluster status 35
duplicate tasks
export file structure 49
exception records
correct exceptions task 20
editing exception records 22
filtering 26
review exceptions task 20
steps to correct 19
steps to review 19
updating record status 24
exception tasks
export file structure 48
Exceptions workspace
correct exceptions task 21
Data Editing tab 21
Informatica Analyst 15
export file
duplicate tasks 49
exception tasks 48
exporting a task
description 48
F
filter options
exception data 25
filtering clusters
steps 40
filters
cluster data 39
I
Informatica Analyst interface
log in 17
M
My Tasks view 13, 43
N
notes
adding to clusters 38
P
preferred record
changing 34
Exception Management
overview 10
R
review task
review duplicates 41
steps to review clusters 40
roles
business administrator 12, 43
task performer 12, 43
S
Start workspace
columns 14
status indicators
duplicate record tasks 35
exception tasks 22
T
task
correct duplicates 30, 31
correct exceptions 20
exporting task data 48, 50
Human task 11
Mapping task 11
opening 44
review duplicates 32
review exceptions 20
steps in a Human task 12
task instances 11
tasks and workflows 11
types of task 11
task administration
assigning a task to a user 46
task administration options 45
task instance
definition 11
task performer
My Tasks view 43
tasks
release a task 45
viewing 44
W
workflows
description 11