ETL Testing Validation and Checklist

Data Warehouse Validations (ETL Testing)


 Verify the data extraction logic.
 Verify that data is transformed correctly according to the business transformation rules.
 Verify that all database fields and field data are loaded into the warehouse without any truncation.
 Verify that data and record counts match between the source and target tables.
 Verify that data is loaded into the data warehouse within the expected time frames; this helps confirm performance and scalability.
 Verify that proper error logs, containing all necessary details, are generated for rejected data.
 Verify data integrity, and ensure duplicate data is not loaded into the data warehouse.
 Verify the fields that contain NULL values.

Basic data validation in ETL Testing:


Metadata testing:
ETL mapping documents contain the mapping rules between source and target columns, the data transformations, and the data types. We need to validate the source and target file or table structures against the corresponding mapping document provided by the client.
Data Type Check: Verification that the data types of source and target columns match is done during the data type check. Sometimes the transformation has to be reviewed as well, since data types can change during transformations. Ensure that all values for a given field are stored the same way in the data warehouse regardless of how they were stored in the source system.
Example 1: The source column data type is number, so the target column data type should also be number.
Example 2: If one source system stores “Off” or “On” in its status field and another source system stores “0” or “1” in its status field, then a data type conversion transformation converts the content of one or both fields to a specified common value such as “OFF” or “ON”.
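As an illustration, the status normalization in Example 2 can be spot-checked on the target with a query like the one below. This is a minimal sketch; the table name member_status_target and the column status are hypothetical:

SELECT count(*) FROM member_status_target
WHERE status NOT IN ('ON', 'OFF');
-- a non-zero count indicates values that escaped the conversion (NULL statuses need a separate IS NULL probe)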
Data Length Check: Verify that the length of the target column's data type is greater than or equal to that of the source column. Transformations need to be considered during the data length check as well.
Example: The source table has First Name and Last Name columns with a length of 100 each. After applying the Expression transformation, the target table should have the column Subscriber Name with a length of 200.
Source Table A:
SSN   First Name   Last Name
001   Nidhi        Sharma
002   Vijay        Kumar

Target Table B:
SSN   Subscriber Name
001   NidhiSharma
002   VijayKumar
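A quick probe for length issues is to compare the longest concatenated source value against the declared target width. A minimal sketch, assuming Oracle-style LENGTH and the ALL_TAB_COLUMNS dictionary view, with table names following the example above:

SELECT max(length(first_name) + length(last_name)) FROM source_table_a;
-- longest combined name actually present in the source

SELECT data_length FROM all_tab_columns
WHERE table_name = 'TARGET_TABLE_B' AND column_name = 'SUBSCRIBER_NAME';
-- declared width of the target column; it must be >= the first result, or truncation is possible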
Index/Constraint Check: Verify that the proper constraints and indexes are defined on the target tables as per the design document specifications. Some of the key checks are UNIQUE, NULL, NOT NULL, Primary Key, Foreign Key, and DEFAULT.
Example 1: Verify that the columns that cannot be null have the 'NOT NULL' constraint.
Example 2: Verify that the Primary Key and Natural Key columns are indexed.
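Rather than inspecting constraints by hand, they can be read from the database dictionary. A minimal sketch, assuming an Oracle dictionary and the MEMBER_PERSISTANT_TABLE used elsewhere in this document:

SELECT column_name, nullable FROM all_tab_columns
WHERE table_name = 'MEMBER_PERSISTANT_TABLE' AND nullable = 'Y';
-- lists columns that are missing a NOT NULL constraint, for review against the design document

SELECT index_name, uniqueness FROM all_indexes
WHERE table_name = 'MEMBER_PERSISTANT_TABLE';
-- lists the indexes defined on the table, for checking Primary Key / Natural Key coverage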
Attribute Check: Verify that all attributes of the source table are present in the target table as per the mapping document.
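The attribute check can be automated by diffing the two column lists. A minimal sketch, assuming an Oracle dictionary; columns that are renamed in the mapping document would need to be translated before comparing:

SELECT column_name FROM all_tab_columns WHERE table_name = 'CAG_LOOK_UP'
MINUS
SELECT column_name FROM all_tab_columns WHERE table_name = 'CAG_LOOK_UP_TARGET';
-- any rows returned are source attributes missing from the target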
Data Completeness Checks:
The main purpose of the data completeness check is to verify that all the expected data is loaded into the target from the source. Performing completeness checks on transformed columns is a bit tricky, but in most cases it can be done by understanding the transformation rules and comparing the counts of the expected results.
Data completeness checks can be done by comparing and validating the record counts and aggregates (min, max, sum, avg) of source and target columns, with or without transformations.
Count Validation: During the count validation the record counts are compared between
source and target to check for any rejected records.
Source:
SELECT count(*) FROM Cag_Look_up
Target:
SELECT count(*) FROM Cag_Look_up_target
If both queries return the same count (i.e. Source Count = Target Count), then there are no rejected records.
Data Profile Validation: Aggregate functions like count, avg, min, sum, and max (where applicable) are compared between source and target columns.
Example:
Source:
SELECT count(MEMBER_ID), count(F_NAME), count(L_NAME), avg(PHARMACY_COST) FROM Cag_Look_up
Target:
SELECT count(MEMBER_ID), count(F_NAME), count(L_NAME), avg(PHARMACY_COST) FROM Cag_Look_up_target
Duplicate Check: As per the business requirements, any column or combination of columns that must be unique is verified during the duplicate check.
Example:
SELECT F_NAME, L_NAME, FAMILY_ID, count(*) FROM MEMBER_PERSISTANT_TABLE GROUP BY F_NAME, L_NAME, FAMILY_ID HAVING count(*) > 1
If the above query returns any results, then there is duplicate data in the columns F_NAME, L_NAME and FAMILY_ID.
Data Accuracy Testing:
Data accuracy testing ensures that data from the source is accurately transferred to the target according to the business logic.
Value Comparison: Columns in the source with minimal or no transformation are compared with the target columns. The Source Qualifier and Expression transformations are used during the value comparison check.
Example: In ETL testing, while performing a value comparison between source and target data, the simple queries below can be used.
Source query:
SELECT count(*) FROM cag_look_up WHERE CARRIER_ID IS NOT NULL
Target query:
SELECT count(*) FROM cag_look_up C, Member_persistant M WHERE C.CARRIER_ID = M.CARRIER_ID AND M.CARRIER_ID IS NOT NULL
If the counts from query 1 and query 2 match, then we can conclude that the data of the CARRIER_ID column has been successfully transferred from source to target.
Data Quality Check:
Number Check:
Example 1: A value in one of the source system columns starts with 0; after loading into the target system, the leading 0 should not be carried into the target. This type of business functionality can be validated with a number check.
Example 2: If the source numbering format for a column is aa_30 but the target expects only 30, then the value has to be loaded without the prefix (aa_).
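Both examples can be checked with simple pattern probes against the target. The table and column names below (member_target, account_no, batch_code) are hypothetical, and each query should return zero:

SELECT count(*) FROM member_target WHERE account_no LIKE '0%';
-- leading zeros that should not have been carried over

SELECT count(*) FROM member_target WHERE batch_code LIKE 'aa\_%' ESCAPE '\';
-- values still carrying the aa_ prefix (the underscore is escaped because it is a LIKE wildcard)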
Date Check: Dates have to follow the defined date format, and the format should be the same across all records.
Example:
The standard or default date format should be like yyyy-mm-dd. Sometimes we can also validate rules such as FROM_DATE should not be greater than TO_DATE.
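A minimal sketch of the FROM_DATE/TO_DATE rule, assuming a hypothetical member_target table; a non-zero count indicates invalid date ranges:

SELECT count(*) FROM member_target WHERE from_date > to_date;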
Precision Check: The precision of some numeric columns in the target should be rounded as per the business logic.
Example:
The value of the price column in the source is 28.123789, but in the target it should be displayed as 28.12 (the rounded value).
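The rounding rule can be verified by joining source and target on a key and re-applying the rounding to the source value. A minimal sketch with hypothetical table and column names; the count should be zero:

SELECT count(*)
FROM price_source s, price_target t
WHERE s.item_id = t.item_id
AND t.price <> round(s.price, 2);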
Data Check: Some of the records from source to target need to be filtered out based on certain business rules.
Example: Only records with date_of_service > 2012 and batch_id != 101 should enter the target table.
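The filter rule can be verified by counting target rows that violate it. A minimal sketch, assuming date_of_service is a date column in a hypothetical target_table; the count should be zero:

SELECT count(*) FROM target_table
WHERE extract(year FROM date_of_service) <= 2012 OR batch_id = 101;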
Null Check: Based on the business logic, some of the columns should have a “NULL” value.
Example: The Termination Date column should display a null value unless the “Active Status” column is “T”.
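A minimal sketch of the Termination Date rule, with hypothetical table and column names; a non-zero count indicates violations:

SELECT count(*) FROM member_target
WHERE active_status <> 'T' AND termination_date IS NOT NULL;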

Unit testing checklist


Some programmers are not well trained as testers. They may prefer to program, deploy the code, and move on to the next development task without a thorough unit test. A checklist will help database programmers systematically test their code before formal QA testing.

 Check the mapping of fields that support data staging and data marts.
 Check for duplication of values generated using sequence generators.
 Check the correctness of surrogate keys that uniquely identify rows of data.
 Check for data-type constraints of the fields present in staging and core levels.
 Check the data loading status and error messages after ETLs (extracts, transformations,
loads).
 Look for string columns that are incorrectly left or right trimmed.
 Make sure all tables and specified fields were loaded from source to staging.
 Verify that not-null fields were populated.
 Verify that no data truncation occurred in each field.
 Make sure data types and formats are as specified during database design.
 Make sure there are no duplicate records in target tables.
 Make sure data transformations are correctly based on business rules.
 Verify that numeric fields are populated precisely.
 Make sure every ETL session completed with only planned exceptions.
 Verify all data cleansing, transformation, and error and exception handling.
 Verify stored procedure calculations and data mappings.
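Several of the items above lend themselves to small SQL probes. Two minimal sketches with hypothetical table and column names, one for incorrectly trimmed strings and one for silent truncation:

SELECT count(*) FROM stage_member WHERE first_name <> trim(first_name);
-- rows with stray leading or trailing spaces

SELECT count(*)
FROM source_member s, stage_member t
WHERE s.member_id = t.member_id
AND length(s.first_name) <> length(t.first_name);
-- rows whose field length changed in flight, a sign of truncation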

Integration testing checklist


An integration test checklist helps ensure that ETL workflows are executed as scheduled with
correct dependencies.

 Look for the successful execution of data-loading workflows.


 Make sure target tables are correctly populated with all expected records, and none
were rejected.
 Verify all dependencies among data-load workflows—including source-to-staging,
staging-to-operational data store (ODS), and staging-to-data marts—have been
properly defined.
 Check all ETL error and exception log messages for correctable issues.
 Verify that data-load jobs start and end at predefined times.

Performance and scalability testing checklist


As the volume of data in a warehouse grows, ETL execution times can be expected to increase, and query performance often degrades. These changes can be mitigated by having a solid
technical architecture and efficient ETL design. The aim of performance testing is to point out
potential weaknesses in the ETL design, such as reading a file multiple times or creating
unnecessary intermediate files. A performance and scalability testing checklist helps discover
performance issues.

 Load the database with peak expected production volumes to help ensure that the
volume of data can be loaded by the ETL process within the agreed-on window.
 Compare ETL loading times to loads performed with a smaller amount of data to
anticipate scalability issues. Compare the ETL processing times component by
component to pinpoint any areas of weakness.
 Monitor the timing of the reject process, and consider how large volumes of rejected
data will be handled.
 Perform simple and multiple join queries to validate query performance on large
database volumes. Work with business users to develop sample queries and acceptable
performance criteria for each query.
System testing checklist

One of the objectives of data warehouse testing is to help ensure that the required business
functions are implemented correctly. This phase includes data verification, which tests the
quality of data populated into target tables. A system-testing checklist can help with this
process.

 Make sure the functionality of the system meets the business specifications.
 Look for the count of records in source tables and compare them with counts in target
tables, followed by analysis of rejected records.
 Check for end-to-end integration of systems and connectivity of the infrastructure—for
example, make sure hardware and network configurations are correct.
 Check all transactions, database updates, and data-flow functions for accuracy.
 Validate the functionality of the business reports.

Common Issues for ETL Testing:


1. Unique constraint violation error.

Issue Details: While running a job, the job fails and a unique-constraint-violation error appears in the workflow session logs.

Troubleshooting:

 Check the records in the corresponding table and delete the violating record.
 Check the run date mentioned in the transfer control table and try incrementing it.
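Before deleting anything, the violating key can be located with a grouped count. A minimal sketch, assuming a hypothetical key column member_id:

SELECT member_id, count(*) FROM member_persistant_table
GROUP BY member_id HAVING count(*) > 1;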

2. Table or view does not exist.

Issue Details: While running a job, the job fails and the session log shows an error such as "table or view does not exist".

Troubleshooting:

 Check the profile file for the corresponding AutoSys job.
 Change the values for the corresponding phase of testing to point to the correct servers.
 Check the connection string in PowerCenter.
 Change the source and target servers/DBs accordingly.
 If the above items are updated, log in to the corresponding server with the BL IDs used for that phase of testing and verify the table or view there.

3. Error related to the Integration Service.

Issue Details: While running a job, the job fails and the session log shows an error related to the Integration Service.

Troubleshooting:
 Check the profile file for the corresponding AutoSys job.
 Check for the correct PowerCenter repository corresponding to the job.
 Verify that the Integration Service is configured correctly.
 Open the PowerCenter Monitor and verify that the workflow folder is available under the same Integration Service.

4. Batch Launcher ID access issue.

Issue Details: While running a job, the job fails and the logs show a Batch Launcher ID access issue.

Troubleshooting:

 Check the access of the BL ID for the corresponding server/database.
 If access is missing, get access to the same server/DB for the Batch Launcher ID.
 If the ID is accessible, check whether the profile file has the right database/server name.

5. Record count mismatch issue.

Issue Details: During the record count check, the source and target counts are mismatched by a large amount.

Troubleshooting:

 Check the load type; if it is a full load, this is an issue, so raise a defect.
 If it is an incremental load, check the source extract time and stop extract time, change the timestamp in the parameter table, and re-run the job.
 Check the number of processed rows and the number of loaded rows in the load summary of the session log.

6. Workflow and session log generation issue.

Issue Details: The AutoSys job fails and no workflow or session logs are generated.

Troubleshooting:

 Check the JIL source of the job: in the command line of the JIL source, the hyphen '-' and the dot '.' should be placed at the appropriate positions.
 Check whether the profile files and connection strings are pointing to the right databases/servers.

7. Data loading issue for full load.

Issue Details: After running the AutoSys job, no data is loaded on the target side, and the load summary section of the logs shows 0 extraction and transformation for a full load.

Troubleshooting:

 Check the source table; there should be data in it. It is also possible that the source has older data than the cut-off date in the control table or the last processed timestamp.
 If data is loading from STG to the DW side, data should be present in the main STG table.
 Check the max load date in the DW table and the process date in the stage table; if they already match, increment the process date.

8. Data loading issue for incremental load.

Issue Details: After running the AutoSys job, no data is loaded on the target side, and the load summary section of the logs shows 0 extraction and transformation for an incremental load.

Troubleshooting:

 Check the transfer parameter entries for the source and stop extract times, then check the same in the logs. The time window for extracting the data load should be corrected in the transfer parameter table.

9. Incremental job failure.

Issue Details: The AutoSys incremental job fails.

Troubleshooting:

 Check the transfer parameter table and verify the parameter values corresponding to the incremental job.

10. Failure of an AutoSys job that has no workflow.

Issue Details: An AutoSys job that does not have a workflow fails.

Troubleshooting:

 Check the AutoSys logs in the .err file. If the job failed, the .err file size becomes non-zero; if the job succeeded, it remains zero.

11. AutoSys job failure during flat-file data loading.

Issue Details: The job fails while loading data from flat files into a table.

Troubleshooting:

 Check the AutoSys logs and catch the error from the .err file.
 If the issue is related to the files (for example, an invalid file or any other issue), run the gunzip -t filename command where the file is placed; it will return the exact error for that file.

12. Large-scale data comparison issue.

Issue Details: During data comparison between source and target, a large number of differences are found in the DB comparator result.

Troubleshooting:

 Check the metadata columns at the target end and remove those columns from the target-end query of the DB comparator.
 Check the ORDER BY in both queries and modify them with a proper ORDER BY clause using the primary or unique key at both the source and target ends.
 Remove the timestamps from the comparison rule, as they are interpreted differently by the Sybase and Oracle databases.

13. Box job failure.

Issue Details: A box job fails.

Troubleshooting:

 Check all the sub-jobs under the box job.
 Pick the failed job, check the session logs for that job, and check for any of the issues above.

14. Box job running for a long time.

Issue Details: The box job keeps running for a long time.

Resolution:

 Verify that no job under the box job is in "on hold" status.
 Change the status of any on-hold sub-job to "off hold" and trigger the box job.
 Put failed sub-jobs "on ice" if they are not mandatory/critical dependent jobs.

15. Workflow Monitor issue.

Issue Details: After running a job, the workflow status is not visible in the Workflow Monitor, and an error appears while opening it.

Troubleshooting:

This is a network-down issue, and the testing team needs to contact the Informatica PowerCenter support team.

ETL Testing Challenges:


ETL testing is quite different from conventional testing. There are many challenges we faced while performing data warehouse testing. Here is a list of a few ETL testing challenges I experienced on my project:
 Incompatible and duplicate data.
 Loss of data during the ETL process.
 Unavailability of an inclusive test bed.
 Testers have no privileges to execute ETL jobs on their own.
 The volume and complexity of the data are very high.
 Faults in business processes and procedures.
 Trouble acquiring and building test data.
 Missing business flow information.
