SAP BW - ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)

This document provides an overview of ETL testing and data warehouse testing. It discusses the ETL process, common ETL testing techniques such as data transformation testing and source-to-target count testing, and outlines the typical ETL/data warehouse testing process from requirement understanding to execution. Key differences between database testing and data warehouse testing are also highlighted.

Tutorial #0: Scenario

Tutorial #1: ETL Testing Data Warehouse Testing Introduction Guide

Tutorial #2: ETL Testing Using Informatica PowerCenter Tool

Tutorial #3: ETL vs. DB Testing

Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data

Tutorial #5: Top 10 ETL Testing Tools

Tutorial #1: ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)
June 16, 2023

ETL Testing / Data Warehouse Process and Challenges:

Today, let me take a moment to explain to my testing fraternity one of the most in-demand and fast-growing skills for my tester friends, i.e. ETL testing (Extract, Transform, and Load).

This tutorial will give you a complete idea of ETL testing and what we do to test the ETL process.

Complete list of tutorials in this series:

• Tutorial #1: ETL Testing Data Warehouse Testing Introduction Guide
• Tutorial #2: ETL Testing Using Informatica PowerCenter Tool
• Tutorial #3: ETL vs. DB Testing
• Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data
• Tutorial #5: Top 10 ETL Testing Tools

It has been observed that Independent Verification and Validation is gaining huge market potential, and many companies now see it as a prospective business gain. Customers are offered a range of products in terms of service offerings, distributed across many areas based on technology, process, and solutions. ETL or data warehouse testing is one of the offerings that is developing rapidly and successfully.

Through the ETL process, data is fetched from the source systems, transformed as per business rules, and finally loaded into the target system (data warehouse). A data warehouse is an enterprise-wide store of integrated data that aids the business decision-making process. It is a part of business intelligence.

Why do Organizations Need Data Warehouse?

Organizations with organized IT practices are looking forward to creating the next level of technology transformation. They are now trying to make their operations much more efficient with easily interoperable data.

That said, data is the most important asset of any organization, whether it is everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken.

Most companies are taking a step forward in constructing a data warehouse to store and monitor real-time as well as historical data. Crafting an efficient data warehouse is not an easy job, as many organizations have distributed departments with different applications running on distributed technology.

An ETL tool is employed to achieve flawless integration between data sources from different departments.

The ETL tool works as an integrator: extracting data from different sources, transforming it into the preferred format based on the business transformation rules, and loading it into a cohesive database known as the data warehouse.

A well-planned, well-defined, and effective testing scope guarantees a smooth conversion of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts, ensuring that the data warehouse is concrete and robust.

ETL or data warehouse testing is categorized into four different engagements, irrespective of the technology or ETL tools used:

• New Data Warehouse Testing: A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and a new data warehouse is built and verified with the help of ETL tools.
• Migration Testing: In this type of project, customers have an existing DW and ETL performing the job, but they are looking to adopt new tools in order to improve efficiency.
• Change Request: In this type of project, new data is added to an existing DW from different sources. Customers may also need to change their existing business rules or integrate new ones.
• Report Testing: Reports are the end result of any data warehouse and the basic purpose for which the DW is built. A report must be tested by validating its layout, the data in the report, and the calculations.
ETL Process
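In essence, the flow is: source systems -> Extract -> Transform (apply business rules) -> Load -> data warehouse (target).

To make the idea concrete, here is a minimal, hedged sketch that collapses the three steps into a single SQL statement. The table names src_orders and dw_orders are hypothetical, not from this tutorial; real ETL tools orchestrate equivalent logic across many tables:

-- Extract rows from the source, transform them per a business rule,
-- and load the result into the warehouse table.
INSERT INTO dw_orders (order_id, order_amount)
SELECT order_id,                  -- extract the business key as-is
       ROUND(order_amount, 2)     -- transform: normalize the amount format
FROM src_orders
WHERE order_amount > 0;           -- transform: reject invalid records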

ETL Testing Techniques

1) Data Transformation Testing: Verify that data is transformed correctly according to the various business requirements and rules.

2) Source to Target Count Testing: Make sure that the count of records loaded into the target matches the expected count.

3) Source to Target Data Testing: Make sure that all projected data is loaded into the data warehouse without any data loss or truncation.

4) Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values where defined, and reports it.

5) Performance Testing: Make sure that data is loaded in the data warehouse within the
prescribed and expected time frames to confirm improved performance and scalability.

6) Production Validation Testing: Validate the data in the production system & compare it
against the source data.

7) Data Integration Testing: Make sure that the data from various sources has been loaded
properly to the target system and all the threshold values are checked.

8) Application Migration Testing: Ensure that the ETL application works fine after moving to a new box or platform.

9) Data & Constraint Check: The datatype, length, index, constraints, etc. are tested in this case.

10) Duplicate Data Check: Test if there is any duplicate data present in the target system.
Duplicate data can lead to incorrect analytical reports.
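Techniques #2 and #10, for example, often reduce to simple SQL checks. A hedged sketch, assuming hypothetical source and target tables src_customer and dw_customer keyed by customer_id:

-- Source to target count testing: both queries should return the same number.
SELECT COUNT(*) FROM src_customer;
SELECT COUNT(*) FROM dw_customer;

-- Duplicate data check: any row returned here is duplicated in the target.
SELECT customer_id, COUNT(*) AS occurrences
FROM dw_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;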

Apart from the above ETL testing methods, other testing methods like system integration
testing, user acceptance testing, incremental testing, regression testing, retesting and
navigation testing are also carried out to make sure that everything is smooth and reliable.

ETL/Data Warehouse Testing Process

Similar to any other testing that falls under Independent Verification and Validation, ETL testing also goes through the same phases.

• Requirement Understanding
• Validating
• Test Estimation is based on a number of tables, the complexity of rules, data volume
and performance of a job.
• Test Planning is based on the inputs from test estimation and business requirements. We need to identify what is in scope and what is out of scope here. We also look out for dependencies, risks, and mitigation plans during this phase.
• Designing Test cases and Test scenarios from all the available inputs. We also need to
design mapping documents and SQL scripts.
• Once all the test cases are ready and approved, the testing team will proceed to
perform pre-execution checks and test data preparation for testing.
• Lastly, execution is performed until exit criteria are met. So, the execution phase
includes running ETL jobs, monitoring job runs, SQL script execution, defect logging,
defect retesting and regression testing.
• Upon successful completion, a summary report is prepared and the closure process is
done. In this phase, sign off is given to promote the job or code to the next phase.

The first two phases, i.e. requirement understanding and validation, can be regarded as pre-steps of the ETL test process.

So, the main process can be represented as: Test Estimation -> Test Planning -> Test Case and Scenario Design -> Pre-execution Checks and Test Data Preparation -> Execution -> Summary Report and Closure.

It is necessary to define a test strategy that is mutually accepted by stakeholders before starting the actual testing. A well-defined test strategy ensures that the correct approach has been followed to meet the testing objectives.

ETL/data warehouse testing might require the testing team to write SQL statements extensively, or to tailor the SQL provided by the development team. In any case, the testing team must understand the results they are trying to get with those SQL statements.

Difference Between Database and Data Warehouse Testing

There is a popular misunderstanding that database testing and data warehouse testing are similar, while the fact is that both take different directions in testing.

• Database testing is done using a smaller scale of data, normally with OLTP (Online Transaction Processing) type databases, while data warehouse testing is done with large data volumes involving OLAP (Online Analytical Processing) databases.
• In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources which are often inconsistent with one another.
• We generally perform CRUD (Create, Read, Update, and Delete) operations during database testing, while in data warehouse testing we use read-only (Select) operations.
• Normalized databases are used in DB testing, while denormalized databases are used in data warehouse testing.

There are a number of universal verifications that have to be carried out for any kind of data
warehouse testing.

Given below is the list of objects that are treated as essential for validation in this
testing:

• Verify that data transformation from source to destination works as expected.
• Verify that the expected data is added to the target system.
• Verify that all DB fields and field data are loaded without any truncation.
• Verify data checksums for record count matches.
• Verify that proper error logs are generated for rejected data, with all the details.
• Verify NULL value fields.
• Verify that duplicate data is not loaded.
• Verify data integrity.
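Several of these verifications map directly to queries. A hedged sketch, again using the hypothetical src_orders/dw_orders tables, for the record count/checksum match and the NULL check:

-- Record count and column checksum: run against source and target and compare.
SELECT COUNT(*) AS row_cnt, SUM(order_amount) AS amount_checksum FROM src_orders;
SELECT COUNT(*) AS row_cnt, SUM(order_amount) AS amount_checksum FROM dw_orders;

-- NULL value check on a column that is mandatory in the target.
SELECT COUNT(*) AS null_rows FROM dw_orders WHERE order_id IS NULL;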

=> Know the difference between ETL/Data warehouse testing & Database Testing.

ETL Testing Challenges

This testing is quite different from conventional testing. Many challenges are faced while
performing data warehouse testing.

Here are a few challenges that I experienced on my project:

• Incompatible and duplicate data.
• Loss of data during the ETL process.
• Unavailability of an inclusive test bed.
• Testers have no privileges to execute ETL jobs on their own.
• The volume and complexity of the data are huge.
• Faults in business processes and procedures.
• Trouble acquiring and building test data.
• Unstable testing environment.
• Missing business flow information.

Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that business information is accurate, consistent, and reliable. It also minimizes the risk of data loss in production.

Hope these tips help you ensure that your ETL process is accurate and that the data warehouse built by it is a competitive advantage for your business.

Complete List of ETL Testing Tutorials:

• Tutorial #1: ETL Testing Data Warehouse Testing Introduction Guide
• Tutorial #2: ETL Testing Using Informatica PowerCenter Tool
• Tutorial #3: ETL vs. DB Testing
• Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data
• Tutorial #5: Top 10 ETL Testing Tools

This is a guest post by Vishal Chhaperia, who works at an MNC in a test management role. He has extensive experience in managing multi-technology QA projects, processes, and teams.

Further Reading =>> Best ETL Test Automation Tools

Have you worked on ETL testing? Please share your ETL/DW testing tips and
challenges below.

Tutorial #2: How to Perform ETL Testing Using Informatica PowerCenter Tool
August 25, 2023

It is a known fact that ETL testing is one of the crucial aspects of any Business Intelligence (BI) based application. In order to get quality assurance and acceptance to go live in business, the BI application should be tested well beforehand.

The primary objective of ETL testing is to ensure that the Extract, Transform & Load
functionality is working as per the business requirements and in sync with the performance
standards.
Before we dig into ETL testing with Informatica, it is essential to know what ETL and Informatica are.


What you will learn in this ETL tutorial:

• Basics of ETL, Informatica & ETL testing.
• Understanding ETL testing specific to Informatica.
• Classification of ETL testing in Informatica.
• Sample test cases for Informatica ETL testing.
• Benefits of using Informatica as an ETL tool.
• Tips & tricks to aid you in testing.

In computing, Extract, Transform, Load (ETL) refers to a process in database usage and
especially in data warehousing that performs:

• Data Extraction – Extracts data from homogeneous or heterogeneous data sources.
• Data Transformation – Formats the data into the required type.
• Data Load – Moves and stores the data in a permanent location for long-term usage.

Informatica PowerCenter ETL Testing Tool:


Informatica PowerCenter is a powerful ETL tool from Informatica Corporation.

Also Read => List of Top Informatica Scheduling Integration Tools

It is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system in any format, and delivering that data throughout the enterprise at any speed. Through Informatica PowerCenter, we create workflows that perform end-to-end ETL operations.

Download and Install Informatica PowerCenter:

To install and configure Informatica PowerCenter 9.x, use the below link for step-by-step instructions:
=> Informatica PowerCenter 9 Installation and Configuration Guide

Understanding ETL testing specific to Informatica:

ETL testers often have pertinent questions about what to test in Informatica and how much test coverage is needed.

Let me take you through a tour of how to perform ETL testing specific to Informatica.

The main aspects which should be essentially covered in Informatica ETL testing are:

• Testing the functionality of the Informatica workflow and its components, including all the transformations used in the underlying mappings.
• Checking data completeness (i.e. ensuring that the projected data gets loaded to the target without any truncation or data loss).
• Verifying that the data gets loaded to the target within the estimated time limits (i.e. evaluating the performance of the workflow).
• Ensuring that the workflow does not allow any invalid or unwanted data to be loaded into the target.

Classification of ETL Testing in Informatica:

For better understanding and ease of the tester, ETL testing in Informatica can be divided into
two main parts –

#1) High-level testing

#2) Detailed testing

Firstly, in the high-level testing:

• You can check if the Informatica workflow and related objects are valid or not.
• Verify if the workflow is getting completed successfully on running.
• Confirm if all the required sessions/tasks are being executed in the workflow.
• Validate if the data is getting loaded to the desired target directory and with the expected
filename (in case the workflow is creating a file), etc.
In a nutshell, you can say that the high-level testing includes all the basic sanity checks.

Coming to the next part, i.e. detailed testing in Informatica, you will go in depth to validate whether the logic implemented in Informatica works as expected in terms of its results and performance.

• You need to do the output data validations at the field level, which will confirm that each transformation is operating fine.
• Verify that the record count at each level of processing, and finally in the target, is as expected.
• Thoroughly monitor elements like the source qualifier and target in the session's source/target statistics.
• Ensure that the run duration of the Informatica workflow is on par with the estimated run time.

To sum up, we can say that the detailed testing includes a rigorous end-to-end validation of the Informatica workflow and the related flow of data.

Let us take an example here:

We have a flat file that contains data about different products. It stores details like the name
of the product, its description, category, date of expiry, price, etc.

My requirement is to fetch each product record from the file, generate a unique product ID corresponding to each record, and load it into the target database table. I also need to suppress those products which either belong to category 'C' or whose expiry date is earlier than the current date.

Say, my flat file (source) looks like this:
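The original screenshot of the source file is not reproduced here. A hypothetical five-row source, consistent with the test cases below (three valid rows, one category 'C' row, and one expired row; the XYZ and LMN rows are invented for illustration), could look like:

Product_name | Prod_description      | Prod_category | Prod_expiry_date | Prod_price
ABC          | This is product ABC.  | M             | 8/14/2017        | 150
DEF          | This is product DEF.  | S             | 6/10/2018        | 700
PQRS         | This is product PQRS. | M             | 5/23/2019        | 1500
XYZ          | This is product XYZ.  | C             | 9/11/2018        | 400
LMN          | This is product LMN.  | M             | 1/15/2014        | 250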


Based on my requirements stated above, my database table (Target) should look like this:

Table name: Tbl_Product

Prod_ID (Primary Key) | Product_name | Prod_description      | Prod_category | Prod_expiry_date | Prod_price
1001                  | ABC          | This is product ABC.  | M             | 8/14/2017        | 150
1002                  | DEF          | This is product DEF.  | S             | 6/10/2018        | 700
1003                  | PQRS         | This is product PQRS. | M             | 5/23/2019        | 1500

Now, say we have developed an Informatica workflow to meet these ETL requirements.

The underlying Informatica mapping will read data from the flat file and pass it through a router transformation that discards rows which either have product category 'C' or an expiry date earlier than the current date. A sequence generator will then create the unique primary key values for the Prod_ID column in the Product table.

Finally, the records will be loaded into the Product table, which is the target of my Informatica mapping.
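This is not how Informatica represents the mapping internally, but under the stated requirements the router-plus-sequence-generator logic is equivalent to the following SQL sketch, assuming the flat file has been staged into a hypothetical table src_product_file (SQL Server syntax):

-- Router: keep only rows that are neither category 'C' nor expired.
-- Sequence generator: assign Prod_ID values starting at 1001.
INSERT INTO Tbl_Product (Prod_ID, Product_name, Prod_description,
                         Prod_category, Prod_expiry_date, Prod_price)
SELECT 1000 + ROW_NUMBER() OVER (ORDER BY Product_name),
       Product_name, Prod_description,
       Prod_category, Prod_expiry_date, Prod_price
FROM src_product_file
WHERE Prod_category <> 'C'
  AND Prod_expiry_date >= GETDATE();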

Examples:

Below are the sample test cases for the scenario explained above.

You can use these test cases as a template in your Informatica testing project and add/remove
similar test cases depending upon the functionality of your workflow.

#1) Test Case ID: T001

Test Case Purpose: Validate workflow – [workflow_name]

Test Procedure:

• Go to workflow manager
• Open workflow
• Workflows menu-> click on validate

Input Value/Test Data: Sources and targets are available and connected
Sources: [all source instances name]
Mappings: [all mappings name]
Targets: [all target instances name]
Session: [all sessions name]

Expected Results: Message in the Workflow Manager status bar: "Workflow [workflow_name] is valid"

Actual Results: Message in the Workflow Manager status bar: "Workflow [workflow_name] is valid"

Remarks: Pass

Tester Comments:
#2) Test Case ID: T002

Test Case Purpose: To ensure if the workflow is running successfully

Test Procedure:

• Go to workflow manager
• Open workflow
• Right click in workflow designer and select Start workflow
• Check status in Workflow Monitor

Input Value/Test Data: Same as test data for T001

Expected Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)

Actual Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)

Remarks: Pass

Tester Comments: Workflow succeeded

Note: You can easily see the workflow run status (failed/succeeded) in the Workflow Monitor. Once the workflow completes, the status reflects automatically in the Workflow Monitor, along with the start time and end time of the workflow.

#3) Test Case ID: T003

Test Case Purpose: To validate if the desired number of records are getting loaded to target

Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the number of rows in target database table

Input Value/Test Data: 5 rows in the source file
Target: database table – [Tbl_Product]
Query to run in SQL Server: Select count(1) from [Tbl_Product]
Expected Results: 3 rows selected

Actual Results: 3 rows selected

Remarks: Pass

Tester Comments:

#4) Test Case ID: T004

Test Case Purpose: To check if sequence generator in Informatica mapping is working fine
for populating [primary_key_column_name e.g. Prod_ID] column

Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the unique sequence generated in column Prod_ID

Input Value/Test Data: value for Prod_ID left blank for every row in source file
Sequence Generator mapped to Prod_ID column in the mapping
Sequence generator start value set as 1001
Target: database table- [Tbl_Product] opened in SQL Server

Expected Results: Value from 1001 to 1003 populated against every row for Prod_ID
column

Actual Results: Value from 1001 to 1003 populated against every row for Prod_ID column

Remarks: Pass

Tester Comments:

#5) Test Case ID: T005

Test Case Purpose: To validate if router transformation is working fine to suppress records
in case the product category is ‘C’ or the product has got expired.

Test Procedure: Once the workflow has run successfully, go to the target table in database
Run the query on the target table to check if the desired records have got suppressed.

Input Value/Test Data: 5 rows in the source file


Target: database table – [Tbl_Product]
Query to run in SQL server: Select * from Product where Prod_category=’C’ or
Prod_expiry_date < sysdate;

Expected Results: no rows selected

Actual Results: no rows selected

Remarks: Pass

Tester Comments: (if any)


#6) Test Case ID: T006

Test Case Purpose: To check the performance of the workflow by recording the workflow
runtime.

Test Procedure:

• Open the Workflow Monitor and go to the run that was done as part of T002.
• Record the start time and end time of the workflow.
• Calculate the total run time by subtracting the start time from the end time.

Input Value/Test Data: Workflow has run successfully
Start time of the workflow in the monitor
End time of the workflow in the monitor

Expected Results: 2 min 30 secs

Actual Results: 2 min 15 secs

Remarks: Pass

Tester Comments: Considering the test as ‘Pass’ in case the actual run duration is +/- 10% of
expected run duration.
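To make the tolerance concrete: 10% of the expected 2 min 30 sec (150 sec) is 15 sec, so any run between 2 min 15 sec and 2 min 45 sec counts as a pass. The recorded 2 min 15 sec sits exactly at the lower bound of that window.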

#7) Test Case ID: T007

Test Case Purpose: To validate data at target table column level in order to ensure that there
is no data loss.

Test Procedure: Once the workflow has run successfully, go to the SQL Server.
Run the query on the target table to check there is no data loss.

Input Value/Test Data: Workflow has run successfully


One sample record from the source flat file.
SQL Query: Select Top 1 * from Tbl_Product;

Expected Results:
1 row returned

Prod_ID (Primary Key) | Product_name | Prod_description     | Prod_category | Prod_expiry_date | Prod_price
1001                  | ABC          | This is product ABC. | M             | 8/14/2017        | 150

Actual Results:
1 row returned

Prod_ID (Primary Key) | Product_name | Prod_description     | Prod_category | Prod_expiry_date | Prod_price
1001                  | ABC          | This is product ABC. | M             | 8/14/2017        | 150

Remarks: Pass

Tester Comments:

Benefits of Using Informatica as an ETL tool:

Informatica is a popular and successful ETL tool because:

• It has a high "go live" success rate (nearly 100%).
• Informatica has the capability of enabling Lean Integration.
• It is a moderately priced tool when compared to other ETL tools.
• It comes with an internal job scheduler, so there is no need to use a third-party scheduler separately, as some other ETL tools require.
• Easy training and tool availability have made Informatica more popular.

Suggested reading =>> Top ETL Test Automation Tools

Some useful Tips to assist you in Informatica ETL testing:

• Generate the test data before executing the test scenarios.
• The test data should be in sync with the test case it is used for.
• Make sure that you have covered all 3 scenarios: no data is submitted, invalid data is submitted, and valid data is submitted as input to the Informatica workflow.
• Make sure to test that all the required data gets loaded to the target completely. For this, you can use test case T003 described above as a sample.
• It is very important to test that the workflow is doing all the data transformations correctly as per the business rules.
• I would suggest that for each transformation applied in your Informatica mapping, you have a checklist to verify the output data against. That way, you can report bugs easily if any transformation is not working fine.

Conclusion:

So, we have seen in detail some of the sample test cases that can be used as a template to cover ETL testing in Informatica. As I mentioned earlier, you can add/remove/modify these test cases depending on the scenario you have in your project.

Informatica PowerCenter is a foundation for any data integration activity. You can easily perform script-free automated testing of data copied to test, dev, or production environments, and that is why PowerCenter is one of the most popular ETL tools today.

Recommended reading => ETL vs. DB Testing – A Closer Look at ETL Testing Need

About the author: This is a guest article by Priya K. She has 4+ years of hands-on experience in developing and supporting Informatica ETL applications.

Feel free to post your queries/comments about this ETL tool.


15 thoughts on "How to Perform ETL Testing Using Informatica PowerCenter Tool"

1. Devanath: Really useful and nice info. The Informatica installation steps are very clear.

2. Sumit: Good tutorial. Which are the other best tools for ETL?

3. Priya Kaushal: Hi Sumit, to answer your question, below is a list of ETL tools:

Informatica – PowerCenter
IBM – WebSphere DataStage (formerly known as Ascential DataStage)
SAP – BusinessObjects Data Integrator
IBM – Cognos Data Manager (formerly known as Cognos DecisionStream)
Microsoft – SQL Server Integration Services
Oracle – Data Integrator (formerly known as Sunopsis Data Conductor)
SAS – Data Integration Studio
Oracle – Warehouse Builder
Ab Initio
Information Builders – Data Migrator
Pentaho – Pentaho Data Integration
Embarcadero Technologies – DT/Studio
IKAN – ETL4ALL
IBM – DB2 Warehouse Edition
Pervasive – Data Integrator
ETL Solutions Ltd. – Transformation Manager
Group 1 Software (Sagent) – DataFlow
Sybase – Data Integrated Suite ETL
Talend – Talend Open Studio
Expressor Software – Expressor Semantic Data Integration System
Elixir – Elixir Repertoire
OpenSys – CloverETL

If you are a software tester, I would add that the Informatica Data Validation Option provides an ETL testing tool that can accelerate and automate ETL testing in both production and development & test environments. This means you can deliver complete, repeatable, and auditable test coverage in less time, with no programming skills required.

o Rani Basava: Hi Priya, I want to learn about the Informatica Data Validation Option tool. Thanks, Rani

4. prasanna: Hi Priya, I want to learn ETL testing and the Informatica tool. Can you please let me know what kind of prior knowledge would help me understand all these concepts sooner?

5. Raj: Thanks, useful information on Informatica testing.

6. Aaradhya: Hi, nice blog about how to perform ETL testing using the Informatica PowerCenter tool. Can you explain how you would define Informatica PowerCenter in a very detailed manner? Thanks, Aaradhya

7. Jinka Venkatesh: Hi team, one thing I need to know: if we have a workflow, do we need to test each step, like the source qualifier, command prompt, etc.? At how many levels do we need to test that workflow, and how can we ensure that each and every step is validated? Could you please help me with that? Thanks in advance, Jinka

8. sanjay bhachand: Hi Priya K., thank you for sharing your experience with us. I need your help: I am facing problems clearing Informatica developer interviews. I am always rejected after the first round and never get the exact reason. Please share how to prepare and what the flow of preparation should be. Thank you in advance.

9. Gowthami: Hi Priya K., I have 3 years of experience in Informatica development. I want to learn a new technology; can you suggest another technology that would help my career growth? Thank you in advance.

10. Gowtham: Hi Priya, the blog is very useful for me. I have just started learning ETL. It would be a great help if you could share more test cases for different transformations. Thanks in advance.

11. R Pradhan: Hi, we are currently upgrading our data warehouses to Oracle. As part of the upgrade testing, we are required to validate the Informatica workflows that ETL the data into the databases. We are new to this kind of work and would therefore like to know what to look out for when testing. Also, any hints on drawing up a test plan around this testing would be helpful, as this has never been done in our organisation. Any help and guidance will be much appreciated.

12. uma: Hi Priya, nice blog. Thanks for sharing a nice article on Informatica testing; this is the first time I have come across such detailed info.

13. Abhishek: Hi Priya, can you suggest a course/website for learning Informatica ETL testing?

Tutorial #3: ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools
June 21, 2023

Software testing has a variety of areas to concentrate on. The major varieties are functional and non-functional testing. Functional testing is the procedural way to ensure that the functionality developed works as expected. Non-functional testing is the approach by which non-functional aspects, like performance, are ensured to be at an acceptable level.

There is another flavor of testing called DB testing. Data is organized in the database in the form of tables. In a business there can be flows where data from multiple tables is merged or processed into a single table, and vice versa.

ETL testing is yet another kind of testing, preferred in business cases where some kind of reporting is sought by the clients. Reporting is sought in order to analyze demands, needs, and supply so that clients, the business, and the end-users are well served and benefited.

What will you learn in this tutorial?

In this tutorial, you will learn what Database Testing is, what ETL Testing is, the difference between DB testing and ETL testing, and more details about the ETL testing need, process, and planning, with real examples.

We have also covered ETL Testing in more detail on the below page. Also, have a look at it.
=> ETL Testing / Data Warehouse Testing Tips and Techniques


DB Testing vs. ETL Testing

Most of us are a little confused, considering database testing and ETL testing to be similar and the same. The fact is that they are similar, but not the same.

DB Testing:

DB testing is usually used extensively in business flows where multiple data flows occur in the application from multiple data sources onto a single table. The data source can be a table, a flat file, an application, or anything else that yields some output data.

In turn, the output data obtained can be used as input for the sequential business flow. Hence, when we perform DB testing, the most important thing to capture is the way the data gets transformed from the source, along with how it gets saved in the destination location.

Synchronization is one major and essential thing that has to be considered when performing DB testing. Due to the positioning of the application in the architectural flow, there might be a few issues with data or DB synchronization. While performing the testing, this has to be taken care of, as it can prevent potentially invalid defects or bugs.

Example #1:

Project "A" has an integrated architecture where a particular application makes use of data from several other heterogeneous data sources. Hence the integrity of this data at the destination location has to be verified, along with validations for the following:
• Primary key–foreign key validation
• Column value integrity
• Null values for any columns

What is ETL Testing?

ETL testing is a special type of testing that clients want done for forecasting and analysis of their business. It is mostly used for reporting purposes. For instance, if clients need a report on the customers who use their product based on the day they purchase, they have to make use of ETL reports.

Post analysis and reporting, this data is moved to a data warehouse, where the old historical business data is kept.

This is multi-level testing, as the data from the source is transformed through multiple environments before it reaches the final destination.

Example #2:

We will consider a group "A" doing retail customer business through a shopping market where customers can purchase any household items required for their day-to-day needs. Here, every visiting customer is provided with a unique membership ID with which they can gain points every time they purchase things from the shopping market.

The regulations provided by the group say that the points gained expire every year, and depending upon their usage, the membership can be upgraded to a higher grade or downgraded to a lower grade, relative to the current grade.

After 5 years of the shopping market's establishment, the management is now looking to scale up the business along with revenue. Hence they require a few business reports so that they can run promotions for their customers.

In Database Testing, we perform the following:

#1) Validations on the target tables, which are created with columns with logical calculations as described in the logical mapping sheet and the data routing document.

#2) Manipulations like inserting, updating, and deleting customer data can be performed on any end-user POS application in an integrated system, along with the back-end database, so that the same changes are reflected in the end system.

#3) DB testing has to ensure that no customer data has been misinterpreted or truncated. This might lead to serious issues like incorrect mapping of customer data with their loyalty points.

In ETL Testing we check for the following:

#1) Assuming there are 100 customers in the source, you check whether all these customers, along with their data from the 100 rows, have been moved from the source system to the target. This is known as a data completeness check.

#2) Checking whether the customer data has been properly manipulated and represented in the 100 rows. This is simply called a data accuracy check.

#3) Reports for the customers who have gained points of more than x value within a particular period.

Comparative Study Of ETL And DB Testing

ETL and DB testing have a few differing aspects that are essential to understand before performing them. This helps us understand the value and significance of the testing, and the way it helps the business.

Following is a tabular comparison of the basic behavior of both testing formats.

Aspect           | DB Testing                                                               | ETL Testing
Primary goal     | Data integration                                                         | BI reporting
Applicable place | In the functional system where the business flow occurs                  | External to the business flow environment; the input is the historical business data
Automation tools | QTP, Selenium                                                            | Informatica, QuerySurge, COGNOS
Business impact  | Severe, as it lies in the integrated architecture of the business flows  | Potential, arising when the clients want forecasting and analysis done
Modelling used   | Entity Relationship                                                      | Dimensional
System           | Online Transaction Processing (OLTP)                                     | Online Analytical Processing (OLAP)
Data nature      | Normalized data is used here                                             | Denormalized data is used here

Why Should The Business Go For ETL?

Plenty of business needs lead companies to consider ETL testing. Every business has its unique mission and line of business. All businesses have a product life cycle, which takes the generic form: introduction, growth, maturity, and decline.

It is very clear that any new product enters the market with tremendous growth in sales up to a stage called maturity, and thereafter declines in sales. This gradual change shows a definite drop in business growth. Hence it is important to analyze customer needs for business growth and the other factors required to make the organization more profitable.

So in reality, the clients want to analyze their historical data and come up with some strategic reports.
ETL Test Planning

One of the main steps in ETL testing is planning the tests to be executed. The plan is similar to the test plan for the system testing that is usually performed, except for a few attributes like requirements and test cases.

Here, the requirements are nothing but a mapping sheet that maps data between different databases. As we know, ETL testing occurs at multiple levels, so various mappings are needed to validate it.

Most of the time, data is not captured from the source databases directly. All the source data is exposed through views on the tables, from which the data can be read.

Example: Following is an example of how the mappings can be provided. The two columns VIEW_NAME and TABLE_NAME represent the view used for reading data from the source and the corresponding table in the ETL environment, respectively.
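The original example image is not reproduced here; a hypothetical mapping of that shape (the names below are invented for illustration) might look like:

VIEW_NAME      | TABLE_NAME
SRC_V_CUSTOMER | STG_CUSTOMER
SRC_V_ORDERS   | STG_ORDERS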

It is advisable to maintain a naming convention that can help while planning for automation. A generic notation is to simply prefix the name of the environment.

The most significant thing in ETL is identifying the essential data and tables from the source. The next essential step is mapping the tables from the source to the ETL environment.

Following is an example of how the mapping between the tables from the various environments can be related for ETL purposes.
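The original image is not reproduced here; a hypothetical mapping of that shape, with invented names, might look like:

SOURCE_TABLE | STAGING_TABLE | EDW_TABLE    | OLAP_TABLE
CUSTOMER     | STG_CUSTOMER  | EDW_CUSTOMER | DIM_CUSTOMER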

The above mapping takes the data from the source table to the staging table, from there to the tables in the EDW, and then to OLAP, which is the final reporting environment. Hence, at any point in time, data synchronization is very important for the ETL's sake.

Critical ETL Needs

As we understand, ETL is needed for forecasting, reporting, and analyzing the business in order to capture customer needs more successfully. This enables the business to meet higher demands than in the past.

Here are few of the critical needs without which ETL testing cannot be achieved:

1. Data and table identification: This is important, as there can be much other irrelevant and unnecessary data that is of least importance when forecasting and analyzing customer needs. Hence the relevant data and tables have to be selected before starting the ETL work.
2. Mapping sheet: This is one of the critical needs while doing ETL work. Mapping the right table from the source to the destination is mandatory, and any problems or incorrect data in this sheet might impact the whole ETL deliverable.
3. Table designs, data, and column types: This is the next major step when considering the mapping of source tables to the destination tables. The column type has to match in the tables at both places, etc.
4. Database access: The main thing is access to the database where the ETL goes on. Any restrictions on access will have an equivalent impact.

ETL Reporting and Testing

Reporting in ETL is all the more important, as it explains and points the clients to the customer needs. With reports, they can forecast and analyze the exact customer needs.

Example #3:

A company which manufactures silk fabric wanted to analyze its annual sales. On reviewing the annual sales with the report they generated, they found that during the months of August and September there was a tremendous fall in sales.

Hence they decided to roll out promotional offers like exchanges, discounts, etc., which enhanced their sales.

Basic Issues In ETL Testing

There can be a number of issues while performing ETL testing like the following:

• Either the access to the source tables or the views will not be valid.
• The column names and data types from the source to the next layer might not match.
• The number of records from the source table to the destination table might not match.

And there might be many more.

Following is a sample of a mapping sheet with columns like VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and TRANSFORMATION LOGIC.

The first 3 columns represent the details of the source database, and the next 3 are the details of the immediately following database. The last column is very important: the transformation logic is the way the data from the source is read and stored in the destination database. This depends on the business and ETL needs.
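A hypothetical row of such a mapping sheet (all names and types invented for illustration) might read:

VIEW_NAME: SRC_V_CUSTOMER | COLUMN_NAME: CUST_NAME | DATA_TYPE: VARCHAR(50) | TABLE_NAME: STG_CUSTOMER | COLUMN_NAME: CUST_NAME | DATA_TYPE: VARCHAR(50) | TRANSFORMATION LOGIC: UPPER(LTRIM(RTRIM(CUST_NAME)))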
Points To Remember While ETL Test Planning And Execution

The most important thing in ETL testing is the loading of data based on the extraction criteria from the source DB. When these criteria are invalid or obsolete, there will be no data in the table to perform ETL testing on, which brings in more issues.

Following are a few of the points to be taken care while ETL Test Planning and
Execution:

#1) Data is extracted from heterogeneous data sources.

#2) ETL processes are handled in integrated environments that have different:

• DBMS
• OS
• Hardware
• Communication protocols

#3) The necessity of having a logical data mapping sheet before the physical data can be transformed
#4) Understanding and examining the data sources
#5) Initial load and incremental load
#6) Audit columns
#7) Loading the facts and the dimensions

ETL Tools And Their Significant Usage

ETL tools are basically used to build and run the transformation logic, taking data from the source and applying the transformation logic to it. You can also map schemas from the source to the destination in unique ways, transform and clean up data before it is moved to the destination, and load it at the destination in an efficient manner.

This can significantly reduce manual effort, as the mapping thus defined can be used for almost all of the ETL validation and verification.

ETL tools:

1. Informatica – PowerCenter: one of the popular ETL tools, introduced by the Informatica Corporation. It has a very good customer base covering wide areas. The major components of the tool are its client tools, repository tools, and servers.
2. IBM – InfoSphere Information Server: IBM, a market leader in computer technology, developed the InfoSphere Information Server for information integration and management in the year 2008.
3. Oracle – Data Integrator: Oracle Corporation developed its ETL tool under the name Oracle Data Integrator. Growing customer demand has made them update the tool across various versions.

More examples of the usage of ETL testing:

Consider an airline which wants to roll out promotions and offers to attract customers strategically. First, they will try to understand the demands and needs per the customers' specifications. To achieve this, they will require historical data, preferably the previous 2 years' data. Using that data, they will analyze and prepare reports that help in understanding the customers' needs.

The reports can be of the following kinds:

1. Customers from region A who travel to region B on certain dates
2. Customers with specific age criteria who travel to city XX

And there can be many other reports.

Analyzing these reports will help the clients identify the kinds of promotions and offers that will benefit the customers and, at the same time, the business, making it a win-win situation. This can be easily achieved with ETL testing and reports.

In parallel, suppose the IT segment faces a serious DB issue that has stopped multiple services and, in turn, has the potential to impact the business. On investigation, it is identified that some invalid data has corrupted a few databases, and it needs to be corrected manually.

In the former case, ETL reports and testing are what is required, whereas the latter case is where DB testing has to be done properly to overcome the issues with invalid data.

Conclusion

Hope the above tutorial has provided a simple and clear overview of what ETL testing is and why it has to be done, along with the business impacts or benefits it yields. It does not stop here; it can extend to providing foresight into business growth.

About the author: This tutorial is written by Nagarajan. He is a Test Lead with over 6 years of software testing experience in various functional areas like banking, airlines, and telecom, in both manual and automation testing.

Please let us know your thoughts/questions in the comments below.


21 thoughts on "ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools"

1. Surbhi: Nice explanation. Do you have ETL testing training as well?

2. Sathish: Nice article on ETL testing.

3. Kunjal Gandhi: Awesome explanation!!

4. Venkateswara Rao Kotta: Very nice article.

5. Njuneki Jayne: I have enjoyed reading this article; very educative.

6. rinku: Thank you for the valuable information.

7. Nagarajan (STH Author): @Surbhi, Sathish, Venkateswara Rao Kotta, Njuneki Jayne: Thanks for your comments. You can always get back to us regarding any clarifications on software testing. – Nagarajan

8. Nagarajan (STH Author): @Surbhi: We are planning to conduct training on ETL testing as well, and will post it on our training calendar. Please keep checking! – Nagarajan

9. Boxfish: Useful article, thank you for sharing. I think your website has a problem with HTML links; I didn't see any images on this page, FYI.

10. Nandhini: Very nicely and neatly explained. Thanks.

11. Manander Singh: Nice post!!! I am confused about the work I am doing for my company. My job is to manually validate Excel reports for correctness and completeness against database entries. I thought it was not testing, but after reading your article I think it is a kind of ETL testing. Can you explain a little?

12. Tester: Hi, I have one suggestion for your website. If one left-clicks on any image on a page, it opens across the entire page and we need to go back (e.g. https://www.softwaretestinghelp.com/etl-testing-vs-db-testing/). It would be much better if the image did not respond to left-click and instead offered this option on right-click.

13. Neelakanta: Nice explanation.

14. Piyush Dubey: Thanks for such a good explanation. Are you also conducting ETL testing training? If yes, please update me.

15. Vasu: Which one of these two, ETL and DB testing, is best? Please suggest.

16. Saritha: This article provided the info required for further progress in ETL testing at the initial stage.

17. mallesh: Fully clarified ETL testing and data warehousing. Thank you very much for your efforts to help others.

18. James: About the automation tools for DB testing and ETL testing you suggested: I completely agree that the automation tools for ETL testing are Informatica, QuerySurge, COGNOS, etc. But among the automation tools for database testing, I don't think QTP and Selenium qualify. If you research the nature of QTP or Selenium, you'll find that the main function of these tools is GUI automation rather than communicating directly with the database.
Example: We built a Framework X to test a website whose displayed data comes from a specific database (SQL Server). Framework X uses the Selenium WebDriver API for interacting with the GUI (click, mouse move, assert text, etc.), and the SqlClient API for communicating directly with the database.
As you see, testing a value shown on the website needs the combination of both the SqlClient API and the Selenium WebDriver API: the SqlClient API helps you get the expected data from the database, and the WebDriver API helps you get the current value on the GUI as the actual result. Based on the expected and actual results, you make a comparison in one of two ways: 1) use the assert functions of the Selenium WebDriver API; 2) use the native assert functions of Framework X.
To get to the core of the discussion: for automated database testing, we can't focus on one specific GUI automation API. We should focus on many things, such as GUI automation APIs, database communication APIs, etc., instead of only a GUI automation API.
Many thanks.

19. Mukesh: Nice post for freshers.

20. Gopi: Hi Nagaraj, do you conduct online training on ETL? If yes, please share the details with me by email. I am very impressed with the way you explained the difference between ETL and DB testing.

21. swetha: If ETL testing training has been conducted or is planned, please keep me informed. I am looking at QuerySurge.
Tutorial #4: The 4 Steps to Business Intelligence (BI) Testing: How to Test Business Data
June 24, 2023

Business Intelligence (BI) is a process of gathering, analyzing, and transforming raw data into
accurate, efficient, and meaningful information which can be used to make wise business
decisions and refine business strategy.

BI gives organizations a sense of clairvoyance. Only, the perception is not fueled by extra-sensory ability but by facts.

Business intelligence testing initiatives help companies gain deeper and better insights so they can manage or make decisions based on hard facts or data.

The way this is done has changed considerably in today's market. What used to be offline reports and the like is now live business integration.

This is great news for both businesses and users because:

• Businesses easily know what is working and what is not
• Users get a better experience with the software

Recommended read => Business Process Testing (BPT)

BI is not achieved with one tool or via one system. It is a collection of applications, technologies, and components that make up the entire implementation.

To simplify and show you the flow of events:

User transactional data (relational/OLTP databases, flat files, records, or other formats of data) -> ETL processes -> Data Warehouse -> Data Mart -> OLAP (additional sorting, categorizing, filtering, etc.) -> meaningful insights (BI).

Business integration is when these analytics affect the way a certain application works.

For example, your credit card might not work at a new location because BI alerts the application that it is an unusual transaction. This has happened to me once. I was at an art exhibition where there were artisans from different parts of the US. I used my credit card to buy a few things, but it would not go through because the seller was registered in a part of the US where my credit card had never been used. This is an example of BI integration to prevent fraud.

Recommended products on Amazon or other retail sites, related videos on video sites, etc. are other examples of business integration of BI.

From the above flow, it is also apparent that ETL and storage systems are important to a successful BI implementation, which is why BI testing is never an independent event. It involves ETL and data warehouse testing as integral elements, and as testers it is important to understand and know how to test these.
STH has you covered there: we have articles that talk about these concepts. I will provide the links below so we can get those out of the way and focus on BI alone.

• ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and Challenges
• ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools

One more thing that business intelligence testing experts almost always recommend: test the entire flow, right from the time data gets taken from the source all the way to the end. Do not test only for the reports and analytics at the end.

Therefore, the sequence should be:


Business Intelligence testing Sequence:

#1) Check the Data at the source:

Business data usually does not come from one source and in one format alone. Make sure that
the source and the type of data it sends match. Also, do a basic validation right here.

Let us say a student's details are sent from a source for subsequent processing and storage.
Make sure that the details are correct right at this point. If the GPA shows as 7, it is
clearly beyond the 5-point scale, so such data can be discarded or corrected right here
without taking it further for processing.

This is usually the “Extract” stage of the ETL.
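
For instance, such an out-of-range check can be written as a simple query against the incoming data. A minimal sketch, assuming a hypothetical staging table student_source with a gpa column:

-- Flag records whose GPA falls outside the 5-point scale,
-- so they can be corrected or discarded before further processing.
SELECT student_id, gpa
FROM student_source
WHERE gpa < 0 OR gpa > 5;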

#2) Check the data transformation:

This is where the raw data gets processed into business-targeted information. Check at least the following (a sample query follows the list):

• The source and destination data types should match. E.g.: You can’t store the date as
text.
• Primary key, foreign key, null, default value constraints, etc. should be intact.
• The ACID properties of source and destination should be validated, etc.
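
As a sketch of how such checks can be automated, the constraint validations above often reduce to plain queries against the transformed data. The table and column names here (stg_orders, stg_customers, order_date) are hypothetical:

-- Orphaned foreign keys: orders whose customer_id has no
-- matching customer record after transformation.
SELECT o.order_id
FROM stg_orders o
LEFT JOIN stg_customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

-- Type-conversion failures: dates that ended up as NULL.
SELECT order_id
FROM stg_orders
WHERE order_date IS NULL;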

#3) Check the data Loading

(Into a data warehouse or Data mart or anywhere it is going to be permanently located):

Testing the actual scripts that load the data would definitely be included in your
ETL testing. The data storage system, however, has to be validated for the following:
• Performance: As systems become more intricate, relationships are formed
between multiple entities, creating several correlations. This is great news for data
analytics; however, this kind of complexity often results in queries taking too long to
retrieve results. Therefore, performance testing plays an important role here.
• Scalability: Data is only going to increase, not decrease. Therefore, tests have to be
done to make sure that the current implementation can handle the growing business
and data volumes. This also includes testing the archival strategy. Basically, you are
trying to test the decision: “What happens to older data and what if I need it?”

It is also a good idea to test other aspects such as its computational abilities, recovery from
failure, error logging, exception handling, etc.
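
A basic post-load sanity check is reconciling record counts between the source and the warehouse. A minimal sketch, with hypothetical table names:

-- Source and warehouse counts should match, or differ only
-- by the records intentionally rejected during the load.
SELECT
  (SELECT COUNT(*) FROM source_sales)  AS source_count,
  (SELECT COUNT(*) FROM dw_fact_sales) AS warehouse_count;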

#4) BI Report Testing:

Finally, the reports, the last layer of the entire flow.

This is what is considered Business Intelligence. But as you can see from the above, the
reports are never going to be correct, consistent, and fast if the preceding layers are
malfunctioning.

At this point, look for:

• The reports generated and their applicability to the business


• The ability to customize and personalize the parameters included in the reports:
sorting, categorizing, grouping, etc.
• The appearance of the report itself. In other words, the readability.
• If the BI elements are BI integrated, then the corresponding functionality of the
application is to be included in an end-to-end test.

BI Testing Strategy:

Now that we know what to test and have resources for ETL and Data Warehouse testing, let's look
at the process testers need to follow.

Simply put, a BI testing project is a testing project too. That means the typical stages of testing
apply here as well, whether you are testing performance or doing functional end-to-end
testing:

• Test planning
• Test strategy
• Test design (Your test cases will be query-intensive rather than plain-text based. This
is the ONE major difference between a typical test project and an ETL/Data
Warehouse/BI testing project.)
• Test execution (Once again, you are going to need some querying interface such as
TOAD to run your queries)
• Defect reporting, closure etc.

Conclusion:
BI is an integral element of all business areas. E-Commerce, Health Care, Education,
Entertainment and every other business relies on BI to know their business better and to
provide a killer experience to their users.

We hope this article gave you the necessary information to explore the Business Intelligence
testing area much further.

About the author: This post is written by STH team member Swati.

Have you been a BI tester? Please do share your experiences, comments and questions
below.

Recommended Reading

• ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)


• Best Software Testing Tools 2023 [QA Test Automation Tools]
• ETL Testing Interview Questions and Answers
• Testing Primer eBook Download
• 10 Best ETL Testing Tools in 2023 [TOP SELECTIVE]
• B2B (Business to Business) Gateway Testing Process
• Global Software Testing Business To Reach $34.5 Billion Soon
• Business Process Testing (BPT) - How to Simplify and Speed Up the Testing Process
Using BPT


20 thoughts on “The 4 Steps to Business Intelligence (BI) Testing: How to Test Business Data”

1. Naga Sekhar

Simply super article about the BI/ETL testing concepts overview.

2. Sheetal patil

Hi, this is a good article on BI testing. Very few (and the best) testers get the opportunity of business data testing.

3. Suman prabha

Good article. Is there any tool for this testing, or is it manual only?

4. Swati Seela

@Naga Sekhar: Thank you! Glad you found it useful.

5. Swati Seela

@Sheetal Patil: Thank you! I agree, the opportunities are few.

6. Swati Seela

@Suman prabha: All the tools used for BI can be used for testing. However, whether the BI insights are on point or not is only decided by people. So yes and no! Thanks for stopping by.

7. JonesRay

As a software tester, this is useful to know about the way to test business-related issues. Let me implement your useful tips.

8. Ratna Reddy

It's good and valuable information. It's very useful for me also.

9. bee

Hi, good article. Has anyone worked with infrastructure testing? How is it done? Can anyone highlight this?

10. Deepa

Great. A very good and simple-to-understand article.

11. Mario

It's good information and simple to understand. Thanks a lot.

12. Shilpa

It's a wonderful and practical article.

BI report testing is very different and very challenging. Every single data point on the report has to be “PERFECT”, else it's a BIG BLUNDER.

Mostly, manual testing works better, with limited automation.

13. shivansh

Is there any way to switch to BI testing after having experience of a year or two in performance testing? I got trained in BI but was put into PT. I'd like to switch if possible in the future.

14. Ramkumar

Very good overview of BI.

15. Chandra

Good knowledge share.

16. Shilpa

BI testing is very important since major decisions are made using these reports. Testing the output/reports against the raw data is a “HUGE RESPONSIBLE TASK”. Manual testing is best for reports; it requires “different skills” than normal testing. Well said about “testing at the early stage”.

17. Rashmi Gupta

It is good and practical to implement.

18. Sivaprasad R

I got an opportunity to do SAP BI 3 testing. It's not at all easy. It was purely manual end-to-end testing. After completing the testing, you will be an expert in all domains like SD, MM, Inventory, etc. The ultimate goal of SAP BI is BEx and reporting, i.e. Business Explorer query generation. So if you need a query, you must clearly code the tables from the various modules. E.g., for one day's sales, we need the SD module and its corresponding tables. DSOs and InfoCubes are the data stagers in SAP BI. T-codes play major roles in SAP.

    o Roma

    Very well explained, Siva.

19. Emilia Jazz

I have actually used the ways mentioned and they work. Thanks for sharing.
5. Tuto 5: 10 Best ETL Testing Tools in 2023 [TOP SELECTIVE]
September 4, 2023

List and Comparison of the Best ETL Testing Tools:

Almost all IT companies today depend heavily on data flows, as a large amount of
information is made available for access and everything that is required can be obtained from it.

This is where the concepts of ETL and ETL Testing come into the picture. ETL is an
abbreviation of Extraction, Transformation, and Loading. Traditionally, ETL Testing has been
performed using SQL scripting or spreadsheets, which can be a time-consuming and error-prone
approach.

In this article, we will have detailed discussions on several concepts viz. ETL, ETL Process,
ETL testing, and different approaches used for it along with the most popular ETL testing
tools.

Also read => ETL Testing Tips

What You Will Learn:

• What is ETL Testing?


• Most Popular ETL Testing Tools
o #1) RightData
o #2) Integrate.io
o #3) iCEDQ
o #4) BiG EVAL
o #5) Informatica Data Validation
o #6) QuerySurge
o #7) Datagaps ETL Validator
o #8) QualiDI
o #9) Talend Open Studio for Data Integration
o #10) Codoid’s ETL Testing Services
o #11) Data-Centric Testing
o #12) SSISTester
o #13) TestBench
o #14) DataQ
o Points to Remember
o ETL Testing Process
o Types of ETL Testing
o How to Create Test Cases in ETL Testing
• Conclusion
o Recommended Reading
What is ETL Testing?

#1) As mentioned previously, ETL stands for Extraction, Transformation, and Loading, which are
considered the three prime database functions.

• Extraction: Reading data from the database.


• Transformation: Converting the extracted data into the required form to store in another
database.
• Loading: Writing the data into the target database.

#2) ETL is used to transfer or migrate the data from one database to another, to prepare data
marts or data warehouses.

[Diagram: the ETL process — extraction from the sources, transformation, and loading into the target]

=>> Contact us to suggest a listing here.

Most Popular ETL Testing Tools

Like automation testing, ETL Testing can also be automated. Automated ETL Testing reduces
the time consumed during the testing process and helps to maintain accuracy.

A few ETL test automation tools are used to perform ETL Testing more effectively and
rapidly.

Given below is the list of the top ETL Testing Tools:

1. RightData
2. Integrate.io
3. iCEDQ
4. BiG EVAL
5. Informatica Data Validation
6. QuerySurge
7. Datagaps ETL Validator
8. QualiDI
9. Talend Open Studio for Data Integration
10. Codoid’s ETL Testing Services
11. Data Centric Testing
12. SSISTester
13. TestBench
14. DataQ

#1) RightData
RDt is a self-service ETL/Data Integrations testing tool designed to help business and
technology teams with the automation of data quality assurance and data quality control
processes.

RDt’s intuitive interface allows users to validate and reconcile data between datasets
regardless of the differences in the data model or the data source type. It is designed to work
efficiently for data platforms with high complexity and huge volumes.
Key Features:

• Powerful universal query studio where users can perform queries on any data source
(RDBMS, SAP, Files, Bigdata, Dashboards, Reports, Rest APIs, etc.), explore metadata, analyze
data, discover data by data profiling, prepare by performing transformations and cleansing,
and snapshot data to assist with data reconciliation, business rules, and transformations
validation.
• Using RDt, users can perform field-to-field data comparisons regardless of the differences in
the data model and structure between source and target.
• It comes with a pre-delivered set of validation rules along with a custom business rule
builder.
• RDt has bulk comparison capabilities to facilitate technical data reconciliation across the
project landscape (e.g. compare production environment data with UAT, etc.)
• Robust alerting and notification capabilities, ranging from emails to automatic defect/incident
creation in the management tool of your choice.
• RDt’s data quality metrics and data quality dimension dashboard allow data platform owners
an insight into the health of their data platform with drill-down capabilities into the scenarios
and exact records and fields causing the validation failures.
• RDt can be used for testing analytics/BI tools like Tableau, Power BI, Qlik, SSRS, Business
Objects Webi, SAP Bex, etc.
• RDt’s two-way integration with CICD tools (Jenkins, Jira, BitBucket, etc.) assists your data
team’s journey of DevOps enablement through DataOps.

=> Visit RDt Website

#2) Integrate.io
Integrate.io is a data integration, ETL, and ELT platform. This cloud-based platform will
streamline data processing. It provides an intuitive graphic interface to implement an ETL,
ELT, or a replication solution. With Integrate.io you will be able to perform out-of-the-box
data transformations.

Key Features:

• Integrate.io’s workflow engine will help you to orchestrate and schedule data pipelines.
• You will be able to implement complex data preparation functions by using rich expression
language.
• It has the functionalities to schedule jobs, monitor job progress, and status as well as sample
data outputs, and ensure correctness and validity.
• Integrate.io’s platform will let you integrate data from more than 100 data stores and SaaS
applications.
• Integrate.io offers both low-code and no-code options.

=> Visit Integrate.io Website

#3) iCEDQ

iCEDQ enables a shift-left approach, which is central to DataOps. It is recommended to start
testing data early in the non-production phase and to continuously monitor production data.

iCEDQ’s rules-based approach empowers users to automate ETL Testing, Cloud Data
Migration Testing, Big Data Testing, and Product Data Monitoring.
Key Features:

• An in-memory engine that can evaluate billions of records at scale.


• Enables users to do transformation testing, duplicate data testing, schema testing, Type II
dimension testing, and much more.
• Advanced Groovy scripting for data prep, cleansing, triggering APIs, shell scripts, or any
external process.
• Import custom Java libraries or create reusable test functions.
• Implement DataOps by integrating with any Scheduling, Orchestration, GIT, or DevOps tool.
• Push results to Slack, Jira, ServiceNow, Alation, Manta, or any enterprise product.
• Single Sign-On, Advanced role-based access control, and Encryption features.
• Use the inbuilt Dashboard module or enterprise reporting tools like Tableau, Power BI, and
Qlik to generate reports for more insight.
• Deploy anywhere. On-Prem or in AWS, Azure, GCP, IBM Cloud, Oracle Cloud, or other
platforms.

=> Visit iCEDQ Website

#4) BiG EVAL

BiG EVAL is a comprehensive suite of software tools aimed at leveraging the value of
enterprise data by continuously validating and monitoring quality. It automates testing tasks
during ETL and DWH development and provides quality metrics in production.
Features:

• Autopilot testing for agile development, driven by metadata from your database or metadata
repository.
• Data Quality Measuring and Assisted Problem Solving.
• High-performance in-memory scripting and rules engine.
• Abstraction of any kind of data (RDBMS, APIs, Flatfiles, Business applications cloud / on-
premises).
• Clear modern dashboards and alerting processes.
• Embeddable into DevOps CI/CD flows, ticket systems, and more.
• BiG EVAL checks data against your very own and scenario-specific quality criteria.
• User-defined test cases give great flexibility when you need your own testing algorithms.

=> Visit BiG EVAL Website

#5) Informatica Data Validation


Informatica Data Validation is a GUI-based ETL Testing tool used to validate data that is
extracted, transformed, and loaded (ETL). The testing includes a comparison of tables before and after data
migration.

This type of testing ensures data integrity, i.e. the volume of data is correctly loaded and is in
the expected format into the destination system.

Key Features:

• Informatica Validation tool is a comprehensive ETL Testing tool which does not require any
programming skill.
• It provides automation during ETL testing which ensures if the data is delivered correctly and
is in the expected format into the destination system.
• It helps to complete data validation and reconciliation in the testing and production
environment.
• It reduces the risk of introducing errors during transformation and avoids bad data being
transformed into the destination system.
• Informatica Data Validation is useful in the Development, Testing and Production
environment where it is necessary to validate the data integrity before moving into the
production system.
• 50 to 90% of cost and effort can be saved using the Informatica Data Validation tool.
• Informatica Data Validation provides a complete solution for data validation along with data
integrity.
• Reduces programming efforts and business risks due to an intuitive user interface and built-
in operators.
• Identifies and prevents data quality issues and provides greater business productivity.
• Offers a free trial alongside the paid service, which reduces the time and cost required for data
validation.

Visit the official site here: Informatica Data Validation

#6) QuerySurge

QuerySurge tool is specifically built for testing of Big Data and Data warehouse. It ensures
that the data extracted and loaded from the source system to the destination system is correct
and is as per the expected format. Any issues or differences are identified very quickly by
QuerySurge.

Key Features:

• QuerySurge is an automated tool for Big Data Testing and ETL Testing.
• It improves the data quality and accelerates testing cycles.
• It validates data using the Query Wizard.
• It saves time & cost by automating manual efforts and schedules tests for a specific time.
• QuerySurge supports ETL Testing across various platforms like IBM, Oracle, Microsoft, SAP.
• It helps to build test scenarios and test suites along with configurable reports without specific
knowledge of SQL.
• It generates email reports through an automated process.
• Reusable query snippet to generate reusable code.
• It provides a collaborative view of data health.
• QuerySurge can be integrated with HP ALM, TFS, IBM Rational Quality Manager.
• Verifies, converts, and upgrades data through the ETL process.
• It is a commercial tool that connects source and target data and also supports real-time
progress of test scenarios.

Visit the official site here: QuerySurge

#7) Datagaps ETL Validator

ETL Validator tool is designed for ETL Testing and Big Data Testing. It is a solution for data
integration projects. The testing of such data integration project includes various data types,
huge volume, and various source platforms.

ETL Validator helps to overcome such challenges using automation which further helps to
reduce the cost and to minimize efforts.

• ETL Validator has an inbuilt ETL engine which compares millions of records from various
databases or flat files.
• ETL Validator is a data testing tool specifically designed for automated data warehouse testing.
• Visual Test Case Builder with drag and drop capability.
• ETL Validator has features of Query Builder which writes the test cases without manually
typing any queries.
• Compare aggregate data such as count, sum, distinct count etc.
• Simplifies the comparison of database schemas across various environments, including
data type, index, length, etc.
• ETL Validator supports various platforms such as Hadoop, XML, Flat files etc.
• It supports email notification, web reporting etc.
• It can be integrated with HP ALM which results in sharing of test results across various
platforms.
• ETL Validator is used to check Data Validity, Data Accuracy and also to perform Metadata
Testing.
• Checks Referential Integrity, Data Integrity, Data Completeness and Data Transformation.
• It is a commercial tool with a 30-day trial; it requires zero custom programming and
improves business productivity.

Visit the official site here: Datagaps ETL Validator

#8) QualiDI

QualiDI is an automated testing platform which offers end-to-end testing and ETL Testing. It
automates ETL Testing and improves its effectiveness. It also shortens the testing cycle and
improves data quality.

QualiDI identifies bad data and non-compliant data very easily. It reduces the
regression cycle and the data validation effort.

Key Features:

• QualiDI creates automated test cases and it also provides support for automated data
comparison.
• It offers data traceability and test case traceability.
• It has a centralized repository for requirements, test cases, and test results.
• It can be integrated with HPQC, Hadoop, etc.
• QualiDI identifies a defect in the early stage which in turn reduces the cost.
• It supports email notifications.
• It supports the continuous integration process.
• It supports Agile development and the rapid delivery of sprints.
• QualiDI manages complex BI Testing cycles, eliminates human error, and maintains data
quality.

Visit the official site: QualiDi

#9) Talend Open Studio for Data Integration

Talend Open Studio for Data Integration is an open-source tool that makes ETL Testing
easier. It includes all ETL Testing functionality along with additional continuous delivery
mechanisms. With the help of the Talend Data Integration tool, a user can run ETL jobs on
remote servers with a variety of operating systems.

ETL Testing ensures that data is transformed from the source system to the target without any
data loss and thereby adhering to transformation rules.

Key Features:

• Talend Data Integration supports any type of relational database, Flat files, etc.
• Integrated GUI which simplifies the design and development of ETL processes.
• Talend Data Integration has inbuilt data connectors with more than 900 components.
• It detects business ambiguity and inconsistency in transformation rules quickly.
• It supports remote job execution.
• Identifies defects at an early stage to reduce costs.
• It provides quantitative and qualitative metrics based on ETL best practices.
• Context switching is possible between the ETL development, ETL testing, and ETL
production environments.
• Real-time data flow tracking along with detailed execution statistics.
Visit the official site here: Talend ETL Testing

#10) Codoid’s ETL Testing Services

Codoid’s ETL and data warehouse testing service includes data migration and data validation
from the source to the target system. ETL Testing ensures that there is no data error, no bad
data or data loss while loading data from the source to the target system.

It quickly identifies any data errors or any other general errors that occurred during the ETL
process.

Key Features:

• Codoid’s ETL Testing service ensures data quality in the data warehouse and data
completeness validation from the source to the target system.
• ETL Testing and data validation ensure that the business information transformed from
source to target system is accurate and reliable.
• The automated testing process performs data validation during and post data migration and
prevents any data corruption.
• Data validation includes count, aggregates, and spot checks between the target and actual
data.
• The automated testing process verifies if data type, data length, indexes are accurately
transformed and loaded into the target system.
• Data quality Testing prevents data errors, bad data or any syntax issues.

Visit the official site here: Codoid’s ETL Testing


#11) Data-Centric Testing

Data-Centric testing tool performs robust data validation to avoid any glitches such as data
loss or data inconsistency during data transformation. It compares data between systems and
ensures that the data loaded into the target system is exactly matching with the source system
in terms of data volume, data type, format, etc.

Key Features:

• Data-Centric Testing is built to perform ETL Testing and Data warehouse testing.
• Data-Centric Testing is the largest and oldest testing practice.
• It offers ETL Testing, data migration, and reconciliation.
• It supports various relational databases, Flat files, etc.
• Efficient Data validation with 100% data coverage.
• Data-Centric Testing also supports comprehensive reporting.
• The automated process of data validation generates SQL queries which result in the
reduction of cost and efforts.
• It offers a comparison between heterogeneous databases like Oracle & SQL Server and
ensures that the data in both systems is in the correct format.

#12) SSISTester

SSISTester is a framework that helps in the unit and integration testing of SSIS packages. It
also helps to create ETL processes in a test-driven environment, which in turn helps to
identify errors early in the development process.

A number of packages are created while implementing ETL processes, and these need to
be tested during unit testing. An integration test is also a “Live test”.

Key Features:

• Unit tests are created and verified, and once execution is complete, a clean-up job is
performed.
• Integration test verifies that all packages are satisfied post-execution of the unit test.
• Tests are created in a simple way, just as the user creates them in Visual Studio.
• Real-time debugging of a test is possible using SSISTester.
• Monitoring of test execution with user-friendly GUI.
• Test results are exported in HTML format.
• It removes external dependencies by using fake source and destination addresses.
• For the creation of tests, it supports any .NET language.

#13) TestBench

TestBench is a database management and verification tool. It is a unique solution that
addresses all issues related to the database. User-managed data rollback improves testing
productivity and accuracy.

It also helps to reduce environment downtime. TestBench reports all inserted, updated, and
deleted transactions performed in a test environment and captures the status of the
data before and after each transaction.

Key Features:

• It always maintains data confidentiality to protect data.


• It has a restoration point for an application when a user wants to return to a specific
point.
• It improves decision-making knowledge.
• It customizes data sets to improve test efficiency.
• It helps with maximum test coverage and helps reduce time and money.
• Data privacy rules ensure that live data is not available in the test environment.
• Results are compared with various databases. The results include differences in tables &
operation performed on tables.
• TestBench analyzes the relationship between the tables and maintains the referential
integrity between tables.

#14) DataQ

DataQ provides various tools for quickly identifying data issues. The platform is very
intuitive and designed for both developers and testers. It is built from the ground up for high
volumes of data, so whether you have hundreds of records or billions, it has you covered.
• Automate ETL Testing and Monitoring.
• Data Migration Testing with auto-detection of keys.
• Data Quality Monitoring – Freshness, Distribution, Volume, Schema, Completeness,
Accuracy.
• Auto Suggestion of Data Quality rules.
• Cross-reference data validation across multiple data sources.
• Can connect to over 40 different data sources, various file formats, Kafka, and API out of the
box.
• Ability to create a library of custom functions.
• Schema Validation
• Data Profile comparison
• Compute resources are initialized and terminated on demand.
• On-prem and cloud-agnostic solution.
• Jira, Slack, Teams integration.

Points to Remember

While performing ETL testing, several factors are to be kept in mind by the testers.

Some of them are listed below:


• Apply suitable business transformation logic.
• Execute backend data-driven tests.
• Create and execute absolute test cases, test plans, and test harnesses.
• Assure the accuracy of data transformation, scalability, and performance.
• Make sure the ETL application reports invalid values.
• Unit tests should be created as targeted standards.
ETL Testing Process

ETL Testing Process is similar to other testing processes and includes some stages.

They are:

• Identifying business requirements


• Test Planning
• Designing test cases and test data
• Test execution and bug reporting
• Summarizing reports
• Test closure

Types of ETL Testing

ETL Testing can be classified into the following categories according to the testing process
that is being followed.

#1) Production Validation Testing:

It is also called table balancing or production reconciliation. It is performed on data before or
while it is being moved into the production system, in the correct order.

#2) Source To Target Testing:

This type of ETL Testing is performed to validate the data values after data transformation.

#3) Application Upgrade:

It checks whether the data extracted from an older application or repository is the same as the
data in the new application or repository.

#4) Data Transformation Testing:

Multiple SQL queries are required to be run for each and every row to verify data
transformation standards.
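
For example, if the rule is that the target's full_name is the concatenation of the source's first and last names, the rule can be recomputed and compared. A hedged sketch with hypothetical tables (|| is ANSI string concatenation; SQL Server uses +):

-- Recompute the transformation from the source and report
-- rows where the target value disagrees with it.
SELECT s.customer_id
FROM src_customer s
JOIN tgt_customer t ON t.customer_id = s.customer_id
WHERE t.full_name <> s.first_name || ' ' || s.last_name;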

#5) Data Completeness Testing:

This type of testing is performed to verify if the expected data is loaded at the appropriate
destination as per the predefined standards.
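
Completeness checks are commonly written as set-difference queries. A minimal sketch using ANSI EXCEPT (MINUS in Oracle), with hypothetical table names:

-- Rows present in the source but missing from the target.
SELECT customer_id, customer_name FROM src_customer
EXCEPT
SELECT customer_id, customer_name FROM tgt_customer;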

I would also like to compare ETL Testing with Database Testing but before that let us have a
look at the types of ETL Testing with respect to database testing.

Given below are the Types of ETL Testing with respect to Database Testing:

#1) Constraint Testing:


Testers should verify that the data is mapped accurately from source to destination. While
checking this, testers need to focus on some key checks (constraints), illustrated by the
sample queries after this list.

They are:

• NOT NULL
• UNIQUE
• Primary Key
• Foreign Key
• Check
• NULL
• Default
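
As an illustration, here are two of these checks written as queries; the table tgt_orders and its columns are hypothetical:

-- NOT NULL: a mandatory column must contain no NULLs.
SELECT COUNT(*) AS null_violations
FROM tgt_orders
WHERE order_id IS NULL;

-- UNIQUE / Primary Key: key values must not repeat.
SELECT order_id, COUNT(*) AS occurrences
FROM tgt_orders
GROUP BY order_id
HAVING COUNT(*) > 1;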

#2) Duplicate Check Testing:

Source and target tables contain huge amounts of data with frequently repeated values; in
such cases, testers use database queries to find such duplication.
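
A typical duplication query groups on the business columns that should be unique. A sketch with a hypothetical tgt_customer table:

-- Combinations of name and email that occur more than once.
SELECT customer_name, email, COUNT(*) AS dup_count
FROM tgt_customer
GROUP BY customer_name, email
HAVING COUNT(*) > 1;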

#3) Navigation Testing:

Navigation concerns the GUI of an application. A user finds an application friendly
when navigation throughout the system is easy and relevant. The tester must focus
on avoiding irrelevant navigation from the user's point of view.

#4) Initialization Testing:

Initialization Testing is performed to check the combination of hardware and software
requirements along with the platform it is installed on.

#5) Attribute Check Testing:

This testing is performed to verify that all the attributes of both the source and target systems
are the same.

From the above listing, one may consider ETL Testing to be quite similar to Database Testing,
but the fact is that ETL Testing is concerned with Data Warehouse Testing, not Database
Testing.

There are several other facts due to which ETL Testing differs from Database Testing.

Let’s have a quick look at what they are:

• The primary goal of Database Testing is to check if the data follows the rules and standards
of the data model, on the other hand, ETL Testing checks if data is moved or mapped as
expected.
• Database Testing focuses on maintaining a primary key-foreign key relationship, while ETL
Testing verifies that data is transformed as per the requirement or expectation and is the
same at the source and target systems.
• Database Testing recognizes missing data, whereas ETL Testing determines duplicate data.
• Database Testing is used for data integration and ETL Testing for enterprise business
intelligence reporting.

These are some major differences that make ETL Testing different from Database Testing.

Given below is the table showing the list of ETL Bugs:

Type of bug          Description
Calculation bugs     Final output is wrong due to a mathematical error
Input/output bugs    Accepts invalid values and rejects valid values
H/W bugs             Device not responding due to hardware issues
User interface bugs  Related to the GUI of an application
Load condition bugs  Denies multiple users

How to Create Test Cases in ETL Testing

The primary goal of ETL testing is to ensure that the extracted and transformed data is
loaded accurately from the source to the destination system. ETL testing relies on two
documents:

#1) ETL Mapping Sheets: This document contains information about the source and
destination tables and their references. The mapping sheet helps in creating the big SQL queries
used while performing ETL Testing.

#2) Database schema for the source and destination tables: The database schemas should be kept
up to date in the mapping sheet to perform data validation.
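
To illustrate, suppose one row of the mapping sheet maps SRC_EMP.ANNUAL_SALARY to DW_EMP.MONTHLY_SALARY with the rule “divide by 12” (all names here are hypothetical). That row translates directly into a test query:

-- Test case derived from a single mapping-sheet row:
-- DW_EMP.MONTHLY_SALARY = SRC_EMP.ANNUAL_SALARY / 12.
SELECT s.emp_id,
       s.annual_salary / 12 AS expected_salary,
       t.monthly_salary     AS actual_salary
FROM src_emp s
JOIN dw_emp t ON t.emp_id = s.emp_id
WHERE t.monthly_salary <> s.annual_salary / 12;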

Conclusion

ETL Testing is not only a tester's duty; it also involves developers, business analysts,
database administrators (DBAs), and even the users. The ETL Testing process has become
vital as it is required for making strategic decisions at regular time intervals.

Suggested reading =>> Best ETL Automation Tools

ETL Testing is considered Enterprise Testing as it requires good knowledge of the
SDLC, SQL queries, ETL procedures, etc.

=>> Contact us to suggest a listing here.

Let us know if we have missed any tools in the above list, and also suggest the ones
that you use for ETL Testing in your daily routine.
