CDV August2024 IntroducingDataValidation en
August 2024
This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.
Informatica, Informatica Cloud, Informatica Intelligent Cloud Services, PowerCenter, PowerExchange, and the Informatica logo are trademarks or registered trademarks
of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at
https://www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.
The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at
[email protected].
Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE
INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.
Preface
Read Introducing Data Validation to learn how to use Data Validation to verify the accuracy and completeness
of data integration operations by comparing two data sets.
Informatica Resources
Informatica provides you with a range of product resources through the Informatica Network and other online
portals. Use the resources to get the most from your Informatica products and solutions and to learn from
other Informatica users and subject matter experts.
Informatica Documentation
Use the Informatica Documentation Portal to explore an extensive library of documentation for current and
recent product releases. To explore the Documentation Portal, visit https://docs.informatica.com.
If you have questions, comments, or ideas about the product documentation, contact the Informatica
Documentation team at [email protected].
https://network.informatica.com/community/informatica-network/products/cloud-integration
Developers can learn more and share tips at the Cloud Developer community:
https://network.informatica.com/community/informatica-network/products/cloud-integration/cloud-developers
https://marketplace.informatica.com/
Data Integration connector documentation
You can access documentation for Data Integration Connectors at the Documentation Portal. To explore the
Documentation Portal, visit https://docs.informatica.com.
To search the Knowledge Base, visit https://search.informatica.com. If you have questions, comments, or
ideas about the Knowledge Base, contact the Informatica Knowledge Base team at
[email protected].
Subscribe to the Informatica Intelligent Cloud Services Trust Center to receive upgrade, maintenance, and
incident notifications. The Informatica Intelligent Cloud Services Status page displays the production status
of all the Informatica cloud products. All maintenance updates are posted to this page, and during an outage,
it will have the most current information. To ensure you are notified of updates and outages, you can
subscribe to receive updates for a single component or all Informatica Intelligent Cloud Services
components. Subscribing to all components is the best way to be certain you never miss an update.
To subscribe, on the Informatica Intelligent Cloud Services Status page, click SUBSCRIBE TO UPDATES. You
can choose to receive notifications sent as emails, SMS text messages, webhooks, RSS feeds, or any
combination of the four.
To find online support resources on the Informatica Network, click Contact Support in the Informatica
Intelligent Cloud Services Help menu to go to the Cloud Support page. The Cloud Support page includes
system status information and community discussions. Log in to Informatica Network and click Need Help to
find additional resources and to contact Informatica Global Customer Support through email.
The telephone numbers for Informatica Global Customer Support are available from the Informatica web site
at https://www.informatica.com/services-and-training/support-services/contact-us.html.
Chapter 1
Create Data Validation test cases to compare the data sets. Use the test case reports to see whether there
is unmatched, missing, or extra data between the data sets.
You must have the following licenses to successfully run Data Validation test cases:
Data Validation supports the Google Chrome and Microsoft Edge browsers. However, Informatica
recommends that you use the Google Chrome browser for better performance.
Data Validation doesn't support sub-organizations, so you can't use Data Validation in a sub-organization.
After you determine the validation requirements, create and run a test case and view test results and reports.
A test case report shows the status of the test case, the overall column matching status, and matching
status of each column that the test compares. For unsuccessful tests, the report provides information about
missing, unmatched, and extra records. If there are mismatches, use the information in the report to correct
the data, and then run the test again. Repeat the process as required until test results are successful.
Note: A data validation test can validate data and identify inconsistencies in data, but it can't identify the
source of the inconsistencies.
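The three result categories in a report can be illustrated with a minimal sketch. This is not how Data Validation is implemented; it only shows the idea of a keyed comparison. It assumes that "missing" means a key that exists in the source but not in the target, that "extra" means a key that exists in the target but not in the source, and that "unmatched" means a key present in both with at least one differing column value. The function name, the key column, and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def compare_data_sets(
    source: List[Row], target: List[Row], key: str, columns: List[str]
) -> Dict[str, list]:
    """Classify records by key as missing, extra, or unmatched (assumed definitions)."""
    source_by_key = {row[key]: row for row in source}
    target_by_key = {row[key]: row for row in target}

    # Assumption: "missing" = key is in the source but not in the target.
    missing = sorted(k for k in source_by_key if k not in target_by_key)
    # Assumption: "extra" = key is in the target but not in the source.
    extra = sorted(k for k in target_by_key if k not in source_by_key)

    # "Unmatched" = key is in both data sets, but a compared column differs.
    unmatched = []
    for k in sorted(source_by_key.keys() & target_by_key.keys()):
        diffs = [
            c for c in columns
            if source_by_key[k].get(c) != target_by_key[k].get(c)
        ]
        if diffs:
            unmatched.append((k, diffs))

    return {"missing": missing, "extra": extra, "unmatched": unmatched}


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    source_rows = [
        {"id": 1, "name": "Ann", "amount": 10},
        {"id": 2, "name": "Bob", "amount": 20},
        {"id": 3, "name": "Cy", "amount": 30},
    ]
    target_rows = [
        {"id": 1, "name": "Ann", "amount": 10},
        {"id": 2, "name": "Bob", "amount": 25},
        {"id": 4, "name": "Dee", "amount": 40},
    ]
    print(compare_data_sets(source_rows, target_rows, "id", ["name", "amount"]))
```

As the note above says, a comparison like this can show which records differ, but not why they differ.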
Data Validation test cases
Data Validation test cases are assets that you create to analyze and compare two data sets. They test the
accuracy and validity of the data by comparing the data sets.
When you create a test case, you select the connections that contain the data to compare, select the data
sources and the columns to compare, and configure the test parameters. The data sources can be database tables,
views, or saved SQL queries. Data Validation maps the columns in the source to the columns in the target.
Data Integration creates a job each time the test case runs. You can monitor the jobs in Data Integration. You
can also monitor the latest status of multiple test cases with test suite reports in Data Validation.
When you run a test case, Data Validation creates Data Integration mappings and tasks that map and process
the data you compare. By default, Data Validation deletes the mappings and tasks after the test case runs.
When you create a test case, you can choose to save the Data Integration mappings and tasks so that you can
use them for debugging.
You can use the following types of connections with Data Validation:
• Amazon Redshift v2
• Amazon S3 v2
• Databricks Delta
• Flat file
• Google BigQuery V2
• Microsoft Azure Data Lake Storage Gen2
• Microsoft Azure Synapse SQL
• MySQL
• Netezza
• ODBC connections with the DB2 subtype
• Oracle
• PostgreSQL
• SAP HANA
• Snowflake Data Cloud
• SQL Server
For more information about configuring an Amazon Redshift v2 connection, see the Data Integration help.
Amazon S3 v2 connection
You can create test cases for Parquet files within an Amazon S3 v2 connection. When you select an Amazon
S3 v2 connection in the test case wizard, you can enter a relative path to the S3 bucket where the Parquet file
is stored. If you do not enter a path, Data Validation lists all the files in the folder path specified in the
connection.
For more information about configuring an Amazon S3 v2 connection, see the Data Integration help.
Before you use a Databricks Delta connection, ensure that you have specified the database name in the
Databricks Delta connection in Administrator.
• Avro
• CSV flat file
• JSON
• ORC
• Parquet
When you select a Microsoft Azure Data Lake Storage Gen2 connection in the test case wizard, you can enter
a relative path to the folder where the file is stored. If you do not enter a path, Data Validation lists all the files
in the folder path specified in the connection.
You can't create test cases for hierarchical objects within a Microsoft Azure Data Lake Storage Gen2
connection.
For more information about configuring a Microsoft Azure Data Lake Storage Gen2 connection, see the Data
Integration help.
For more information about configuring an ODBC connection with the DB2 subtype, see the Data Integration
help.
Note: If you use a Microsoft Azure Data Lake Storage Gen2 connection or a Snowflake Data Cloud
connection, the Secure Agent must have a Java heap size of at least 2048 MB to run test cases. Otherwise, the
test case might fail with an error. For more information, see the Informatica Knowledge Base article 000167312.
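As a rough illustration of the 2048 MB threshold, the following sketch checks a heap setting written in the standard JVM `-Xmx` syntax. It assumes that the maximum heap is expressed as an `-Xmx` option; where and how the Secure Agent heap size is actually configured is not described here, so refer to the Knowledge Base article for the real procedure. The helper names are hypothetical.

```python
import re

MIN_HEAP_MB = 2048  # Minimum heap size stated in the note above.

# Bytes per unit suffix accepted by the JVM -Xmx option (no suffix means bytes).
_BYTES_PER_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_to_mb(option: str) -> float:
    """Convert a JVM option such as '-Xmx2048m' or '-Xmx2g' to megabytes."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", option.strip())
    if match is None:
        raise ValueError(f"Not a valid -Xmx option: {option!r}")
    value = int(match.group(1))
    unit = match.group(2).lower()
    return value * _BYTES_PER_UNIT[unit] / 1024 ** 2


def heap_is_sufficient(option: str) -> bool:
    """Return True if the -Xmx value meets the 2048 MB minimum."""
    return xmx_to_mb(option) >= MIN_HEAP_MB


if __name__ == "__main__":
    for candidate in ("-Xmx1g", "-Xmx2048m", "-Xmx4g"):
        print(candidate, heap_is_sufficient(candidate))
```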
When you create a test suite, you select the test cases for the suite. Then, you generate a test suite report
that displays a report for each successful job.
Reports show detailed mapping results for each successful job run. Use the results to learn where data is
mismatched between the source and target.
You can access the following pages from the navigation bar:
• Explore by projects and folders. View all projects or select a particular project.
• Explore by asset types. View all assets or view assets of a particular type.
• Search for projects, folders, or assets. To search for projects, folders, and assets in the organization, view
the Explore page by All Projects, and then enter a name or description in the Find box. Alternatively, view
the Explore page by Asset Types, select an asset type from the All Assets list to narrow your search, and
then enter all or part of a name or description in the Find box.
• Sort the search results. Sort the Explore page by name, asset type, last update date, create date, or
description. When you sort by type, the Explore page groups assets by asset type. It does not list the
asset types in alphabetical order.
• Filter the objects on the page. To filter objects, click the Filter icon. To apply a filter, click Add Filter,
select the property to filter by, and then enter the property value. The filters that are available depend on
how you view the page. You can specify multiple filters.
If the result set is large, for example, over 1000 objects, all the objects appear on the Explore page, but
the total number of objects displayed at the top and bottom of the page might be approximate for about a
minute.
Tip: Filtering is available on other pages in addition to the Explore page. For example, on the Import
Assets page, you can filter by status to find the assets that imported successfully.
You can see projects, folders, and assets for all of the services that you use. If you select an asset to open it
or perform an action, and the asset is created in a different service than the one you have open, the service
opens in a new browser tab.
Explore page 11
Working with projects and assets on the Explore page
Perform actions on projects, folders, and assets on the Explore page. To see what actions you can perform
on an object, in the row that contains the object, click the Actions icon, as shown in the following image:
You can also perform an action on multiple objects at one time. Select the check box to the left of each
object, or select the Select All check box to select all of the objects that are displayed on the current page.
The following image shows the Select All check box in use:
After you select the objects, click Actions in the row of any of the selected objects.
In the following example, the Updated By, Created On, and Created By columns will be hidden when All
Projects is selected: