SAP BW - ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)
Today, let me take a moment to explain to my testing fraternity one of the most in-demand and upcoming skills for testers, i.e. ETL testing (Extract, Transform, and Load).
This tutorial will give you a complete idea of ETL testing and what we do to test the ETL process.
It has been observed that Independent Verification and Validation is gaining huge market potential and many companies now see it as a prospective business gain. Customers are offered a range of products in terms of service offerings, distributed across many areas based on technology, process, and solutions. ETL or data warehouse testing is one of the offerings that is developing rapidly and successfully.
Through the ETL process, data is fetched from the source systems, transformed as per business
rules and finally loaded to the target system (data warehouse). A data warehouse is an
enterprise-wide store which contains integrated data that aids in the business decision-making
process. It is a part of business intelligence.
Organizations with organized IT practices are looking forward to creating the next level of
technology transformation. They are now trying to make themselves much more operational
with easy-to-interoperate data.
Data is the most important part of any organization, be it everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken.
Most companies are taking a step forward in constructing their data warehouse to store and
monitor real-time data as well as historical data. Crafting an efficient data warehouse is not
an easy job. Many organizations have distributed departments with different applications
running on distributed technology.
An ETL tool is employed to achieve flawless integration between different data sources from different departments.
The ETL tool works as an integrator, extracting data from different sources, transforming it into the preferred format based on the business transformation rules, and loading it into a cohesive database known as the Data Warehouse.
A well-planned, well-defined and effective testing scope guarantees a smooth transition of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts, making sure that the data warehouse is concrete and robust.
• New Data Warehouse Testing: New DW is built and verified from scratch. Data
input is taken from customer requirements and different data sources and a new data
warehouse is built and verified with the help of ETL tools.
• Migration Testing: In this type of project, customers already have an existing DW and ETL performing the job, but they are looking to adopt new tools in order to improve efficiency.
• Change Request: In this type of project new data is added from different sources to
an existing DW. Also, there might be a condition where customers need to change
their existing business rules or they might integrate the new rules.
• Report Testing: Reports are the end result of any Data Warehouse and the basic purpose for which the DW is built. A report must be tested by validating its layout, the data in the report, and the calculations.
ETL Testing Techniques
1) Data Transformation Testing: Verify that data is transformed correctly according to the various business requirements and rules.
2) Source to Target Count Testing: Make sure that the count of records loaded into the target matches the expected count (see the SQL sketch after this list).
3) Source to Target Data Testing: Make sure that all projected data is loaded into the data
warehouse without any data loss or truncation.
4) Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
5) Performance Testing: Make sure that data is loaded in the data warehouse within the
prescribed and expected time frames to confirm improved performance and scalability.
6) Production Validation Testing: Validate the data in the production system & compare it
against the source data.
7) Data Integration Testing: Make sure that the data from various sources has been loaded
properly to the target system and all the threshold values are checked.
8) Application Migration Testing: In this testing, ensure that the ETL application is working
fine on moving to a new box or platform.
9) Data & constraint Check: The datatype, length, index, constraints, etc. are tested in this
case.
10) Duplicate Data Check: Test if there is any duplicate data present in the target system.
Duplicate data can lead to incorrect analytical reports.
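As an illustration of the count and duplicate checks above, here is a minimal SQL sketch. The table names Stg_Customer (source staging) and Dim_Customer (target) are hypothetical:

-- Count check: both queries should return the same number of records
SELECT COUNT(*) FROM Stg_Customer;
SELECT COUNT(*) FROM Dim_Customer;

-- Duplicate check: any row returned here is duplicate data in the target
SELECT Customer_ID, COUNT(*) AS occurrences
FROM Dim_Customer
GROUP BY Customer_ID
HAVING COUNT(*) > 1;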
Apart from the above ETL testing methods, other testing methods like system integration
testing, user acceptance testing, incremental testing, regression testing, retesting and
navigation testing are also carried out to make sure that everything is smooth and reliable.
Like any other testing that falls under Independent Verification and Validation, ETL testing also goes through the same phases:
• Requirement Understanding
• Validating
• Test Estimation is based on a number of tables, the complexity of rules, data volume
and performance of a job.
• Test Planning is based on the inputs from test estimation and business requirements. We need to identify what is in scope and what is out of scope. We will also look out for dependencies, risks and mitigation plans during this phase.
• Designing Test cases and Test scenarios from all the available inputs. We also need to
design mapping documents and SQL scripts.
• Once all the test cases are ready and approved, the testing team will proceed to
perform pre-execution checks and test data preparation for testing.
• Lastly, execution is performed until exit criteria are met. So, the execution phase
includes running ETL jobs, monitoring job runs, SQL script execution, defect logging,
defect retesting and regression testing.
• Upon successful completion, a summary report is prepared and the closure process is
done. In this phase, sign off is given to promote the job or code to the next phase.
The first two phases i.e., requirement understanding and validation can be regarded as pre-
steps of ETL test process.
There is a popular misconception that database testing and data warehouse testing are similar, while the fact is that they take different directions in testing.
• Database testing is done using a smaller scale of data, normally with OLTP (Online Transaction Processing) type databases, while data warehouse testing is done with a large volume of data involving OLAP (Online Analytical Processing) databases.
• In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources that are inconsistent with each other.
• We generally perform CRUD (Create, Read, Update and Delete) operations during database testing, while in data warehouse testing we use read-only (Select) operations.
• Normalized databases are used in DB testing, while denormalized databases are used in data warehouse testing.
There are a number of universal verifications that have to be carried out for any kind of data
warehouse testing.
=> Know the difference between ETL/Data warehouse testing & Database Testing.
This testing is quite different from conventional testing. Many challenges are faced while
performing data warehouse testing.
Data is important for businesses to make critical business decisions. ETL testing plays a
significant role in validating and ensuring that the business information is accurate, consistent
and reliable. It also minimizes the hazard of data loss in production.
Hope these tips will help you ensure that your ETL process is accurate and the data
warehouse built by this is a competitive advantage for your business.
This is a guest post by Vishal Chhaperia who is working in an MNC in a test management
role. He has extensive experience in managing multi-technology QA projects, Processes and
teams.
Have you worked on ETL testing? Please share your ETL/DW testing tips and
challenges below.
It is a known fact that ETL testing is one of the crucial aspects of any Business Intelligence
(BI) based application. In order to get the quality assurance and acceptance to go live in
business, the BI application should be tested well beforehand.
The primary objective of ETL testing is to ensure that the Extract, Transform & Load
functionality is working as per the business requirements and in sync with the performance
standards.
Before we dig into ETL Testing with Informatica, it is essential to know what ETL and
Informatica are.
In computing, Extract, Transform, Load (ETL) refers to a process in database usage, and especially in data warehousing, that performs:
• Data extraction – extracts data from homogeneous or heterogeneous data sources
• Data transformation – transforms the data into the proper format or structure for the purposes of querying and analysis
• Data loading – loads the data into the final target system, such as a data mart or a data warehouse
Informatica PowerCenter is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed. Through Informatica PowerCenter, we create workflows that perform end-to-end ETL operations.
To install and configure Informatica PowerCenter 9.x use the below link that has step by step
instructions:
=> Informatica PowerCenter 9 Installation and Configuration Guide
ETL testers often have pertinent questions about what to test in Informatica and how much test coverage is needed.
Let me take you through a tour on how to perform ETL testing specific to Informatica.
The main aspects which should be essentially covered in Informatica ETL testing are:
• Testing the functionality of Informatica workflow and its components; all the transformations
used in the underlying mappings.
• To check the data completeness (i.e. ensuring if the projected data is getting loaded to the
target without any truncation and data loss),
• Verifying if the data is getting loaded to the target within estimated time limits (i.e.
evaluating performance of the workflow),
• Ensuring that the workflow does not allow any invalid or unwanted data to be loaded in the
target.
For better understanding and ease of the tester, ETL testing in Informatica can be divided into two main parts – high-level testing and detailed testing.
As part of high-level testing:
• You can check if the Informatica workflow and related objects are valid or not.
• Verify if the workflow is getting completed successfully on running.
• Confirm if all the required sessions/tasks are being executed in the workflow.
• Validate if the data is getting loaded to the desired target directory and with the expected
filename (in case the workflow is creating a file), etc.
In a nutshell, you can say that the high-level testing includes all the basic sanity checks.
Coming to the next part i.e. detailed testing in Informatica, you will be going in depth to
validate if the logic implemented in Informatica is working as expected in terms of its results
and performance.
• You need to do the output data validations at the field level which will confirm that each
transformation is operating fine
• Verify the record count at each level of processing and, finally, that the target count is as expected.
• Thoroughly monitor elements like the source qualifier and target in the session's source/target statistics.
• Ensure that the run duration of the Informatica workflow is at par with the estimated run
time.
To sum up, we can say that the detailed testing includes a rigorous end to end validation of
Informatica workflow and the related flow of data.
We have a flat file that contains data about different products. It stores details like the name
of the product, its description, category, date of expiry, price, etc.
My requirement is to fetch each product record from the file, generate a unique product id
corresponding to each record and load it into the target database table. I also need to
suppress those products which either belong to the category ‘C’ or whose expiry date is less
than the current date.
Based on my requirements stated above, my database table (Target) should look like this:
Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price
1003 | PQRS | This is product PQRS. | M | 5/23/2019 | 1500
Now, say, we have developed an Informatica workflow to get the solution for my ETL
requirements.
The underlying Informatica mapping will read data from the flat file and pass it through a router transformation that will discard rows which either have the product category 'C' or an expiry date earlier than the current date. Then a Sequence Generator will be used to create the unique primary key values for the Prod_ID column in the Product table.
Finally, the records will be loaded to the Product table, which is the target for my Informatica mapping.
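For reference, the filter the router implements is equivalent to the following SQL predicate. This is a sketch only, assuming the flat file has been loaded into a hypothetical staging table Stg_Product; the actual logic lives in the Informatica mapping:

-- Rows that should reach the target: neither category 'C' nor expired
SELECT *
FROM Stg_Product
WHERE Prod_category <> 'C'
  AND Prod_expiry_date >= GETDATE();  -- GETDATE() is SQL Server's current date/time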
Examples:
Below are the sample test cases for the scenario explained above.
You can use these test cases as a template in your Informatica testing project and add/remove
similar test cases depending upon the functionality of your workflow.
#1) Test Case ID: T001
Test Case Purpose: To check if the workflow and its objects are valid
Test Procedure:
• Go to workflow manager
• Open workflow
• Workflows menu-> click on validate
Input Value/Test Data: Sources and targets are available and connected
Sources: [all source instances name]
Mappings: [all mappings name]
Targets: [all target instances name]
Session: [all sessions name]
Remarks: Pass
Tester Comments:
#2) Test Case ID: T002
Test Procedure:
• Go to workflow manager
• Open workflow
• Right click in workflow designer and select Start workflow
• Check status in Workflow Monitor
Expected Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)
Actual Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)
Remarks: Pass
Note: You can easily see the workflow run status (failed/succeeded) in the Workflow Monitor, as shown in the example below. Once the workflow completes, the status is reflected automatically in the Workflow Monitor.
In the above screenshot, you can see the start time and end time of workflow as well as the
status as succeeded.
#3) Test Case ID: T003
Test Case Purpose: To validate if the desired number of records are getting loaded to the target
Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the number of rows in target database table
Remarks: Pass
Tester Comments:
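A hedged sketch of the query this test case relies on, using the Tbl_Product target table from above:

-- Record count loaded into the target
SELECT COUNT(*) AS target_count FROM Tbl_Product;
-- Compare the result against the number of source-file records that
-- survive the router filter (category <> 'C' and not expired).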
#4) Test Case ID: T004
Test Case Purpose: To check if the sequence generator in the Informatica mapping is working fine for populating the [primary_key_column_name e.g. Prod_ID] column
Test Procedure: Once the workflow has run successfully, go to the target table in database
Check the unique sequence generated in column Prod_ID
Input Value/Test Data: value for Prod_ID left blank for every row in source file
Sequence Generator mapped to Prod_ID column in the mapping
Sequence generator start value set as 1001
Target: database table- [Tbl_Product] opened in SQL Server
Expected Results: Value from 1001 to 1003 populated against every row for Prod_ID
column
Actual Results: Value from 1001 to 1003 populated against every row for Prod_ID column
Remarks: Pass
Tester Comments:
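The uniqueness and range of the generated key can be verified with a quick sketch like this (again using the Tbl_Product target from above):

-- Duplicate keys: any row returned here means the sequence is not unique
SELECT Prod_ID, COUNT(*) AS occurrences
FROM Tbl_Product
GROUP BY Prod_ID
HAVING COUNT(*) > 1;

-- Range check: with a start value of 1001 and 3 loaded records,
-- min/max should come back as 1001 and 1003
SELECT MIN(Prod_ID) AS min_id, MAX(Prod_ID) AS max_id, COUNT(*) AS row_count
FROM Tbl_Product;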
#5) Test Case ID: T005
Test Case Purpose: To validate if the router transformation is working fine to suppress records in case the product category is 'C' or the product has expired
Test Procedure: Once the workflow has run successfully, go to the target table in database
Run the query on the target table to check if the desired records have got suppressed.
Remarks: Pass
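The suppression query could look like the following sketch; any row returned indicates a record the router should have discarded:

-- Records that violate the suppression rules (expected result: 0 rows)
SELECT *
FROM Tbl_Product
WHERE Prod_category = 'C'
   OR Prod_expiry_date < GETDATE();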
#6) Test Case ID: T006
Test Case Purpose: To check the performance of the workflow by recording the workflow runtime
Test Procedure:
• Open the Workflow Monitor and go to the run that was done as part of T002.
• Record the start time and end time of workflow.
• Calculate total run time by subtracting start time from end time.
Remarks: Pass
Tester Comments: Considering the test as 'Pass' in case the actual run duration is within +/- 10% of the expected run duration.
#7) Test Case ID: T007
Test Case Purpose: To validate data at the target table column level in order to ensure that there is no data loss
Test Procedure: Once the workflow has run successfully, go to the SQL Server.
Run the query on the target table to check there is no data loss.
Expected Results:
1 row returned
Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price
1001 | ABC | This is product ABC. | M | 8/14/2017 | 150
Actual Results:
1 row returned.
Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price
1001 | ABC | This is product ABC. | M | 8/14/2017 | 150
Remarks: Pass
Tester Comments:
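A field-level comparison for this test case can be sketched with a set operation, again assuming the flat file is loaded into a hypothetical Stg_Product staging table (Prod_ID is excluded because it is generated in the target):

-- Filtered source rows that are missing or altered in the target
-- (expected result: 0 rows, i.e. no data loss or truncation)
SELECT Product_name, Prod_description, Prod_category, Prod_expiry_date, Prod_price
FROM Stg_Product
WHERE Prod_category <> 'C' AND Prod_expiry_date >= GETDATE()
EXCEPT
SELECT Product_name, Prod_description, Prod_category, Prod_expiry_date, Prod_price
FROM Tbl_Product;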
Conclusion:
So, we have seen in detail, some of the sample test cases that can be used as a template to
cover ETL testing in Informatica. As I mentioned earlier, you can add/remove/modify these
test cases depending on the scenario you have in your project.
Recommended reading => ETL vs. DB Testing – A Closer Look at ETL Testing Need
About the author: This is a guest article by Priya K. She has 4+ years of hands-on experience in developing and supporting Informatica ETL applications.
Comments:

1. Devanath: Really useful and nice info. The Informatica installation steps are very clear.

4. Priya Kaushal: Hi Sumit, to answer your question: if you are a software tester, I would like to add that Informatica Data Validation Option provides an ETL testing tool that can accelerate and automate ETL testing in both production and development & test environments. This means that you can deliver complete, repeatable and auditable test coverage in less time with no programming skills required.

– Rani Basava: Hi Priya, I want to learn about the Informatica Data Validation Option tool. Thanks, Rani

5. prasanna: Hi Priya, I want to learn ETL testing and the Informatica tool. Can you please let me know what kind of prior knowledge would help me understand all these concepts sooner?

7. Aaradhya: Hi, nice blog about how to perform ETL testing using the Informatica PowerCenter tool. Can you explain how Informatica PowerCenter can be defined in a very detailed manner? Thanks, Aaradhya

9. sanjay bhachand: Hi Priya K., thank you for sharing your experience with us. I need your help: I keep failing to clear Informatica Developer interviews. I am always rejected after the first round and never get the exact reason. Please share how to prepare and what the flow of preparation should be: [email protected]

12. R Pradhan: Hi, we are currently upgrading our data warehouses to Oracle. As part of the upgrade testing, we are required to validate the Informatica workflows that ETL the data into the databases. We are new to this kind of thing and would therefore like to know what to look out for when testing. Also, any hints on drawing up a test plan around this testing will be helpful, as this has never been done in our organisation. Any help and guidance will be much appreciated.

13. uma: Hi Priya, nice blog. Thanks for sharing a nice article on Informatica testing; this is the first time I came across such detailed info.

14. Abhishek: Hi Priya, can you suggest a course/website for learning Informatica ETL testing? Can you please share the details here: [email protected]
Tutorial #3: ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools
Software testing has a variety of areas to concentrate on. The major varieties are functional and non-functional testing. Functional testing is the procedural way to ensure that the functionality developed works as expected. Non-functional testing is the approach by which non-functional aspects, like performance, can be ensured to be at an acceptable level.
There is another flavor of testing called DB testing. Data is organized in the database in the
form of tables. For business, there can be flows where the data from the multiple tables can be
merged or processed on to a single table and vice versa.
ETL testing is another kind of testing, preferred in business cases where a reporting need is sought by the clients. Reporting is sought in order to analyze demand, needs, and supply so that clients, the business, and the end-users are well served and benefited.
In this tutorial, you will learn what is Database Testing, what is ETL Testing, a difference
between DB Testing and ETL Testing, and more details about ETL testing need, process, and
planning with real examples.
We have also covered ETL Testing in more detail on the below page. Also, have a look at it.
=> ETL Testing / Data Warehouse Testing Tips and Techniques
Most of us are a little confused about whether database testing and ETL testing are similar and the same. The fact is that they are similar, but not the same.
DB Testing:
DB Testing is usually used extensively in the business flows where there are multiple data
flows occurring in the application from multiple data sources on to a single table. The data
source can be a table, flat file, application or anything else that can yield some output data.
In turn, the output data obtained can still be used as input for a subsequent business flow. Hence, when we perform DB testing, the most important thing to capture is the way the data gets transformed from the source, along with how it gets saved in the destination location.
Synchronization is one major and essential thing that has to be considered when performing DB testing. Due to the positioning of the application in the architectural flow, there might be a few issues with data or DB synchronization. While performing the testing, this has to be taken care of, as it can prevent potentially invalid defects or bugs.
Example #1:
Project "A" has an integrated architecture where the application makes use of data from several other heterogeneous data sources. Hence, the integrity of this data at the destination location has to be verified, along with validations for the following (a SQL sketch of such checks appears after this list):
• Primary foreign key validation
• Column values integrity
• Null values for any columns
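A minimal SQL sketch of such validations, assuming hypothetical Customers (parent) and Orders (child) tables:

-- Orphan foreign keys: child rows with no matching parent row
SELECT o.Order_ID, o.Customer_ID
FROM Orders o
LEFT JOIN Customers c ON c.Customer_ID = o.Customer_ID
WHERE c.Customer_ID IS NULL;

-- Null check on a mandatory column (expected result: 0)
SELECT COUNT(*) AS null_names FROM Customers WHERE Customer_Name IS NULL;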
ETL testing is a special type of testing that the client wants done for forecasting and analysis of their business. It is mostly used for reporting purposes. For instance, if the clients need a report on the customers who use or go for their product based on the day they purchase, they have to make use of ETL reports.
Post analysis and reporting, this data is loaded into a data warehouse, to which the old historical business data has to be moved.
This is a multiple level testing as the data from the source is transformed into multiple
environments before it reaches the final destined location.
Example #2:
We will consider a group “A” doing retail customer business through a shopping market
where the customer can purchase any household items required for their day to day
survival. Here all the customers visiting are provided with a unique membership id with
which they can gain points every time they come to purchase things from the shopping
market.
The regulations provided by the group say that the points gained expire every year. And, depending upon their usage, the membership can be either upgraded to a higher-grade membership or downgraded to a lower-grade membership relative to the current grade.
After 5 years of the shopping market's establishment, the management is now looking to scale up the business along with revenue.
Hence they require a few business reports so that they can promote their customers.
#2) Manipulations like Inserting, Updating and Deletion of the customer data can be
performed on any end-user POS application in an integrated system along with the back-end
database so that the same changes are reflected in the end system.
#3) DB testing has to ensure that no customer data has been misinterpreted or even truncated. This might lead to serious issues like incorrect mapping of customer data with their loyalty points.
#1) Assuming there are 100 customers in the source, you will check whether all these
customers along with their data from the 100 rows have been moved from the source system
to the target. This is known as verification of Data completeness check.
#2) Checking if the customer data has been properly manipulated and demonstrated in the 100
rows. This is simply called verification of Data Accuracy check.
#3) Reports for the customers who have gained points more than x values within a particular
period.
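The third report could be produced by a query along these lines; the table name and the threshold (x = 500) are purely illustrative:

-- Customers who gained more than x points within a particular period
SELECT Membership_ID, SUM(Points) AS total_points
FROM Customer_Points
WHERE Purchase_Date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY Membership_ID
HAVING SUM(Points) > 500;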
ETL and DB testing have a few differing aspects that are essential to understand before performing them. This helps us understand the value and significance of the testing and the way it helps the business.
Following is a tabular form that describes the basic behavior of both the testing formats.
Aspect | DB Testing | ETL Testing
Applicable place | In the functional system where the business flow occurs | External to the business flow environment; the input is the historical business data
Automation tool | QTP, Selenium | Informatica, QuerySurge, COGNOS
Business impact | Severe, as it is the integrated architecture of the business flows | Potential, as and when the clients want forecasting and analysis done
Modelling used | Entity Relationship | Dimensional
Data nature | Normalized data is used here | Denormalized data is used here
Plenty of business needs make ETL testing worth considering. Every business has its unique mission and line of business, and every business's products have a life cycle that takes the generic form: introduction, growth, maturity, and decline.
It is very clear that any new product enters the market with tremendous growth in sales, up to a stage called maturity, and thereafter it declines in sales. This gradual change witnesses a definite drop in business growth. Hence it is important to analyze customer needs for business growth, and the other factors required to make the organization more profitable.
So in reality, the clients want to analyze the historical data and come up with some reports
strategically.
ETL Test Planning
One of the main steps in ETL testing is planning the tests that are going to be executed. The plan will be similar to the test plan for the system testing that is usually performed, except for a few attributes like requirements and test cases.
Here the requirements are nothing but a mapping sheet that holds the mapping between data in different databases. As we are aware that ETL testing occurs at multiple levels, various mappings are needed for validating this.
Most of the time, data is not captured directly from the source databases. Instead, views are defined over the source tables, and data is read from those views.
Example: Following is an example of how the mappings can be provided. The two columns
VIEW_NAME and TABLE_NAME can be used to represent the views for reading data from
the source and the table in the ETL environment respectively.
It is advisable to maintain the naming convention that can help us while planning for
automation. Generic notation that can be used is just prefixing the name of the environment.
The most significant thing in ETL is about identifying the essential data and the tables from
the source. The next essential step is the mapping of tables from the source to the ETL
environment.
Following is an example of how the mapping between the tables from the various
environments can be related to the ETL purpose.
The above mapping moves the data from the source table to the staging table, from there on to the tables in the EDW, and then to OLAP, which is the final reporting environment. Hence, at any point in time, data synchronization is very important for ETL's sake.
As we understand, ETL is needed for forecasting, reporting and analyzing the business in order to capture customer needs in a more successful manner. This will enable the business to meet higher demands than in the past.
Here are few of the critical needs without which ETL testing cannot be achieved:
1. Data and tables identification: This is important, as there can be much irrelevant and unnecessary data that is of least importance when forecasting and analyzing customer needs. Hence the relevant data and tables have to be selected before starting the ETL work.
2. Mapping sheet: This is one of the critical needs while doing ETL works. Mapping of the right
table from the source to the destination is mandatory and any problems or incorrect data in
this sheet might impact the whole ETL deliverable.
3. Table designs and data, column type: This is the next major step when considering the
mapping of source tables into the destined tables. The column type has to match with the
tables at both the places etc.
4. Database access: The main thing is access to the database where ETL goes on. Any
restrictions on the access will have an equivalent impact.
Reporting in ETL is very important, as it explains and shows the clients their customers' needs. With this, they can forecast and analyze exact customer needs.
Example #3:
A company which manufactures silk fabric wanted to analyze its annual sales. On reviewing the report it generated, it found that during the months of August and September there was a tremendous fall in sales.
Hence they decided to roll out promotional offers like exchanges, discounts, etc., which enhanced their sales.
There can be a number of issues while performing ETL testing like the following:
• Either the access to the source tables or the views will not be valid.
• The column name and the data type from the source to the next layer might not match.
• The number of records from the source table to the destination table might not match.
Following is a sample of mapping sheet where there are columns like VIEW_NAME,
COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and
TRANSFORMATION LOGIC present.
The first 3 columns represent the details of the source database and the next 3 are the details of the immediate destination database. The last column is very important: the transformation logic is the way the data from the source is read and stored in the destination database. This depends on the business and ETL needs.
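A hedged sketch of how one row of such a mapping sheet can be verified, assuming a hypothetical rule that FULL_NAME in the destination is the concatenation of FIRST_NAME and LAST_NAME from the source view:

-- Rows whose transformed value does not match the mapping-sheet logic
-- (expected result: 0 rows)
SELECT s.Cust_ID
FROM SRC_V_CUSTOMER s
JOIN ETL_T_CUSTOMER t ON t.Cust_ID = s.Cust_ID
WHERE t.FULL_NAME <> s.FIRST_NAME + ' ' + s.LAST_NAME;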
Points To Remember While ETL Test Planning And Execution
The most important thing in ETL testing is the loading of data based on the extraction criteria from the source DB. When this criterion is invalid or obsolete, there will be no data in the table to perform ETL testing on, which brings in more issues.
Following are a few of the points to be taken care while ETL Test Planning and
Execution:
• DBMS
• OS
• Hardware
• Communication protocols
#3) Necessity in having a logical data mapping sheet before the physical data can be
transformed
#4) Understanding and examining of the data sources
#5) Initial load and the incremental load
#6) Audit columns
#7) Loading the facts and the dimensions
ETL tools are basically used to extract data from the source, apply the transformation logic, and load the data into the destination. You can also map schemas from the source to the destination, transform and clean up data before it is moved to the destination, and load it at the destination in an efficient manner.
This can significantly reduce manual effort, as the mapping, once done, is used for almost all of the ETL validation and verification.
ETL tools:
1. Informatica – PowerCenter – one of the popular ETL tools, introduced by Informatica Corporation. It has a very good customer base covering wide areas. The major components of the tool are its client tools, repository tools, and servers.
2. IBM – InfoSphere Information Server – IBM, a market leader in computer technology, developed the InfoSphere Information Server, used for information integration and management, in the year 2008.
3. Oracle – Data Integrator – Oracle Corporation developed its ETL tool under the name Oracle Data Integrator. Their increasing customer support has made them update their ETL tool across various versions.
Consider an airline that wants to roll out promotions and offers to attract customers strategically. First, it will try to understand the demands and needs of its customers. In order to achieve this, it will require historical data, preferably the previous 2 years' data. Using that data, it will analyze and prepare some reports that will be helpful in understanding the customers' needs.
Analyzing these reports will help the clients in identifying the kind of promotions and offers
that will benefit the customers and at the same time can benefit businesses where this can
become a Win-Win situation. This can be easily achieved by ETL testing and reports.
In parallel, the IT segment faces a serious DB issue that has stopped multiple services and, in turn, has the potential to impact the business. On investigation, it was identified that some invalid data had corrupted a few databases, which needed to be corrected manually.
In the former case, it is ETL reports and testing that will be required.
Whereas the latter case is where the DB testing has to be done properly to overcome issues
with invalid data.
Conclusion
Hope the above tutorial has provided a simple and clear overview of what ETL testing is and why it has to be done, along with the business impacts or benefits it yields. It does not stop here; it can extend to setting foresight for growth in business.
About the author: This tutorial is written by Nagarajan. He is a Test Lead with over 6 years
of Software Testing experience in various functional areas like Banking, Airlines, and
Telecom in terms of both manual and automation.
Comments:

3. Kunjal Gandhi: Awesome explanation!!

Nagarajan (author): Thanks for your comments. You can always get back to us regarding any clarifications on software testing.

Nagarajan (author): @Surbhi, we are planning to conduct training on ETL testing as well. We will keep you posted on our training calendar. Please keep checking!

(unnamed): Nice post!!! But after reading your article I thought it was a kind of ETL testing. Can you explain a little?

13. Neelakanta: Nice explanation.

(unnamed): Thanks for such a good explanation. Are you conducting ETL testing training as well? If yes, then please update me at [email protected].

15. Vasu: Which one is best of these two, ETL or DB testing? Just suggest one.

16. Saritha: This article provided the required info for further progress in ETL testing at the initial stage.

17. mallesh: Fully clarified ETL testing and data warehousing. Thank you very much for your efforts to help others.

18. James: About the automation tools for DB testing & ETL testing as you suggested: I completely agree that automation tools for ETL testing include Informatica, QuerySurge, COGNOS, etc. But for database testing, I don't think QTP and Selenium are the right automation tools, because if you research the nature of QTP or Selenium, you'll find that the main function of these tools is GUI automation (rather than communicating directly with the database).
Example: we built a Framework X to test a website whose displayed data comes from a specific database (SQL Server). Framework X uses the Selenium WebDriver API for interacting with the GUI (click, mouse move, assert text, etc.), and the SqlClient API for communicating directly with the database. So testing a value shown on the website needs both: the SqlClient API to fetch the expected data from the database, and the WebDriver API to get the current value on the GUI as the actual result. Based on the expected and actual results, you make the comparison either with the assert functions of the Selenium WebDriver API or with the native assert functions of Framework X.
The core of the discussion: for automated database testing, we can't focus on one specific GUI automation API; we should consider many things, such as the GUI automation API, the database communication API, etc., instead of only focusing on the GUI automation API. Many thanks.

20. Gopi: Hi Nagaraj, do you conduct online training on ETL? If yes, please share the details to my email: [email protected]. I am very much impressed with the way you explained the difference between ETL and DB testing.

21. swetha: If ETL testing training has been conducted or is planned, please keep me informed. I am looking at QuerySurge.
Tutorial #4: The 4 Steps to Business Intelligence (BI) Testing: How to Test Business Data
June 24, 2023
Business Intelligence (BI) is a process of gathering, analyzing, and transforming raw data into
accurate, efficient, and meaningful information which can be used to make wise business
decisions and refine business strategy.
BI gives organizations a sense of clairvoyance. Only, the perception is fueled not by extra-sensory ability, but by facts.
Business Intelligence testing initiatives help companies gain deeper and better insights so they
can manage or make decisions based on hard facts or data.
The way this is done has changed considerably in the current day’s market. What used to be
offline reports and such is now live business integration.
BI is not achieved with one tool or via one system. It is a collection of applications,
technologies, and components that make up the entire implementation.
To simplify and show you the flow of events:
User transactional data (relational database/OLTP), flat files, records, or other formats of data -> ETL processes -> Data Warehouse -> Data Mart -> OLAP (additional sorting, categorizing, filtering, etc.) -> meaningful insights – BI.
Business Integration is when these analytics affect the way a certain application works.
For example, your credit card might not work at a new location because BI alerts the application that it is an unusual transaction. This has happened to me once. I was at an art exhibition where there were artisans from different parts of the US. I used my credit card to buy a few things, but it would not go through because the seller was registered in a part of the US where my credit card had never been used. This is an example of BI integration to prevent fraud.
Recommended product on Amazon or other retail sites, related videos on video sites etc. are
other examples of Business Integration of BI.
From the above flow, it is also apparent that ETL and storage systems are important to
successful BI implementation. Which is why, BI testing is never an independent event. It
involves ETL and Data warehouse testing as integral elements. And as testers, it is important
to understand and know more about how to test these.
STH has you covered there. We have articles that talk about these concepts. I will provide the
links below so we can get those out of the way and focus on BI alone.
• ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and Challenges
• ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools
One more thing that Business Intelligence testing experts almost always recommend is: test the entire flow, right from the time data gets taken from the source, all the way to the end. Do not just test the reports and analytics at the end alone.
Business Data usually does not come from one source and in one format alone. Make sure that
the source and the type of data that it sends matches. Also, do a basic validation right here.
Let us say a student's details are sent from a source for subsequent processing and storage. Make sure that the details are correct right at this point. If the GPA shows as 7, this is clearly beyond the 5-point system, so such data can be discarded or corrected right here, without taking it forward for further processing.
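A sanity check of this kind can be sketched in SQL, assuming a hypothetical staging table Stg_Student_Details:

-- GPA must fit the 5-point scale; rows returned here are discarded
-- or corrected before further processing
SELECT Student_ID, GPA
FROM Stg_Student_Details
WHERE GPA < 0 OR GPA > 5;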
This is where the raw data gets processed into business-targeted information. The checks include the following (a SQL sketch of such checks appears after this list):
• The source and destination data types should match. E.g.: You can’t store the date as
text.
• Primary key, foreign key, null, default value constraints, etc. should be intact.
• The ACID properties of source and destination should be validated, etc.
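A minimal sketch of such schema-level checks, using the SQL Server INFORMATION_SCHEMA views against a hypothetical destination table Student_Details:

-- Declared constraints (primary key, foreign key, etc.) on the destination table
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_NAME = 'Student_Details';

-- Column data types and nullability should match the agreed design
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Student_Details';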
The actual scripts that load the data and testing them would be definitely included in your
ETL testing. The data storage system, however, has to be validated for the following:
• Performance: As systems become more intricate, there are relationships formed
between multiple entities to make several co-relations. This is great news for data
analytics, however, this kind of complexity often results in queries taking too long to
retrieve results. Therefore, performance testing plays an important role here.
• Scalability: Data is only going to increase not decrease. Therefore, tests have to be
done to make sure that the size of the growing business and data volumes can be
handled by the current implementation or not. This also includes testing the archival
strategy too. Basically, you are trying to test the decision- “What happens to older data
and what if I need it?”
It is also a good idea to test the other aspects such as its computational abilities, recovery from
failure, error logging, exception handling, etc.
This is what is considered Business Intelligence. But, as you can see from the above, the
reports are never going to be correct, consistent and fast if your preceding layers were
malfunctioning.
BI Testing Strategy:
Now that we know what to test and have resources for ETL and Data Warehouse testing, let's look at the process testers need to follow.
Simple, a BI testing project is a testing project too. That means the typical stages of testing are
applicable here too, whether it is the performance you are testing or functional end to end
testing:
• Test planning
• Test strategy
• Test design (Your test cases will be query-intensive rather than plain-text based. This is the ONE major difference between your typical test projects and an ETL/Data Warehouse/BI testing project.)
• Test execution (Once again, you are going to need some querying interface such as
TOAD to run your queries)
• Defect reporting, closure etc.
Conclusion:
BI is an integral element of all business areas. E-Commerce, Health Care, Education,
Entertainment and every other business relies on BI to know their business better and to
provide a killer experience to their users.
We hope this article gave you the necessary information to explore Business Intelligence
testing area much further.
About the author: This post is written by STH team member Swati.
Have you been a BI tester? Please do share your experiences, comments and questions
below.
Comments:

2. Sheetal patil: Hi, this is a good article on BI testing. Very few, and the best, testers get the opportunity of business data testing.

3. Suman prabha: Good article. Is there any tool for this testing, or is it manual only?

6. Swati Seela: @Suman prabha: All the tools used for BI can be used for testing. However, whether the BI insights are on point or not is only decided by people. So, yes and no! Thanks for stopping by.

7. JonesRay: As a software tester, this is useful for knowing the way to test business-related issues; let me implement your useful tips.

8. Ratna Reddy: It's good and valuable information. It's very useful for me too.

9. bee: Hi, good article. Has anyone worked with infrastructure testing? How is it done? Can anyone highlight this?

12. Shilpa: BI report testing is very different and very challenging... every single data point on the report has to be "PERFECT", else it's a BIG BLUNDER.

13. shivansh: Is there any way to switch to BI testing after having experience of a year or two in performance testing? I got trained in BI but was put into PT. I'd like to switch if possible in the future.

16. Shilpa: BI testing is very important since major decisions are made using these reports. Testing the output/reports against the raw data is a "HUGE RESPONSIBLE TASK"... Manual testing is best for reports; it requires "different skills" than normal testing. Well said about "testing at the early stage".

18. Sivaprasad R: I got an opportunity to do SAP BI 3 testing. It's not at all easy. It was purely manual end-to-end testing. After completing the test, you will be an expert in all domains like SD, MM, Inventory, etc. The ultimate goal of SAP BI is BEx and reporting, i.e. Business Explorer query generation. So if you need a query, you must clearly code the tables from the various modules. E.g., for one day's sales we need the SD module and its corresponding tables. DSOs and InfoCubes are data stagers in SAP BI. T-codes play major roles in SAP.

(unnamed): I have actually used the ways mentioned and they work. Thanks for sharing.
Tutorial #5: 10 Best ETL Testing Tools in 2023 [TOP SELECTIVE]
September 4, 2023
Almost all IT companies today highly depend on data flow, as a large amount of information is made available for access, and one can get everything that is required.
This is where the concepts of ETL and ETL testing come into the picture. Basically, ETL is abbreviated as Extraction, Transformation, and Loading. Presently, ETL testing is performed using SQL scripting or spreadsheets, which may be a time-consuming and error-prone approach.
In this article, we will have detailed discussions on several concepts viz. ETL, ETL Process,
ETL testing, and different approaches used for it along with the most popular ETL testing
tools.
#1) As mentioned previously, ETL stands for Extraction, Transformation, and Loading, which are considered to be the three prime database functions.
#2) ETL is used to transfer or migrate the data from one database to another, to prepare data
marts or data warehouses.
Like automation testing, ETL Testing can also be automated. Automated ETL Testing reduces
time consumption during the testing process and helps to maintain accuracy.
Few ETL Testing Automation Tools are used to perform ETL Testing more effectively and
rapidly.
1. RightData
2. Integrate.io
3. iCEDQ
4. BiG EVAL
5. Informatica Data Validation
6. QuerySurge
7. Datagaps ETL Validator
8. QualiDI
9. Talend Open Studio for Data Integration
10. Codoid’s ETL Testing Services
11. Data Centric Testing
12. SSISTester
13. TestBench
14. DataQ
#1) RightData
RDt is a self-service ETL/Data Integrations testing tool designed to help business and
technology teams with the automation of data quality assurance and data quality control
processes.
RDt’s intuitive interface allows users to validate and reconcile data between datasets
regardless of the differences in the data model or the data source type. It is designed to work
efficiently for data platforms with high complexity and huge volumes.
Key Features:
• Powerful universal query studio where users can perform queries on any data source
(RDBMS, SAP, Files, Bigdata, Dashboards, Reports, Rest APIs, etc.), explore metadata, analyze
data, discover data by data profiling, prepare by performing transformations and cleansing,
and snapshot data to assist with data reconciliation, business rules, and transformations
validation.
• Using RDt, users can perform field-to-field data comparisons regardless of the differences in
the data model and structure between source and target.
• It comes with a pre-delivered set of validation rules along with a custom business rule
builder.
• RDt has bulk comparison capacities to facilitate technical data reconciliation across the
project landscape (e.g. compare production environment data with UAT, etc.)
• Robust alerting and notification capabilities, ranging from emails to automatic creation of defects/incidents in the management tool of your choice.
• RDt’s data quality metrics and data quality dimension dashboard allow data platform owners
an insight into the health of their data platform with drill-down capabilities into the scenarios
and exact records and fields causing the validation failures.
• RDt can be used for testing analytics/BI tools like Tableau, Power BI, Qlik, SSRS, Business
Objects Webi, SAP Bex, etc.
• RDt’s two-way integration with CICD tools (Jenkins, Jira, BitBucket, etc.) assists your data
team’s journey of DevOps enablement through DataOps.
#2) Integrate.io
Integrate.io is a data integration, ETL, and ELT platform. This cloud-based platform will
streamline data processing. It provides an intuitive graphic interface to implement an ETL,
ELT, or a replication solution. With Integrate.io you will be able to perform out-of-the-box
data transformations.
Key Features:
• Integrate.io’s workflow engine will help you to orchestrate and schedule data pipelines.
• You will be able to implement complex data preparation functions by using rich expression
language.
• It has the functionalities to schedule jobs, monitor job progress, and status as well as sample
data outputs, and ensure correctness and validity.
• Integrate.io’s platform will let you integrate data from more than 100 data stores and SaaS
applications.
• Integrate.io offers both low-code or no-code options.
#3) iCEDQ
iCEDQ enables Left Shift Approach, which is central to DataOps. We recommend starting
early in the non-production phase to test data and continuously monitor the production data.
iCEDQ’s rules-based approach empowers users to automate ETL Testing, Cloud Data
Migration Testing, Big Data Testing, and Product Data Monitoring.
#4) BiG EVAL
BiG EVAL is a comprehensive suite of software tools aimed at leveraging the value of
enterprise data by continuously validating and monitoring quality. It automates testing tasks
during ETL and DWH development and provides quality metrics in production.
Features:
• Autopilot testing for agile development, driven by metadata from your database or metadata
repository.
• Data Quality Measuring and Assisted Problem Solving.
• High-performance in-memory scripting and rules engine.
• Abstraction of any kind of data (RDBMS, APIs, Flatfiles, Business applications cloud / on-
premises).
• Clear modern dashboards and alerting processes.
• Embeddable into DevOps CI/CD flows, ticket systems, and more.
• BiG EVAL checks data against your very own and scenario-specific quality criteria.
• User-defined test cases give great flexibility when you need your own testing algorithms.
#5) Informatica Data Validation
This tool ensures data integrity, i.e. that the volume of data is correctly loaded and is in the expected format in the destination system.
Key Features:
• Informatica Validation tool is a comprehensive ETL Testing tool which does not require any
programming skill.
• It provides automation during ETL testing which ensures if the data is delivered correctly and
is in the expected format into the destination system.
• It helps to complete data validation and reconciliation in the testing and production
environment.
• It reduces the risk of introducing errors during transformation and avoids bad data being
transformed into the destination system.
• Informatica Data Validation is useful in the Development, Testing and Production
environment where it is necessary to validate the data integrity before moving into the
production system.
• 50 to 90% of cost and effort can be saved using the Informatica Data Validation tool.
• Informatica Data Validation provides a complete solution for data validation along with data
integrity.
• Reduces programming efforts and business risks due to an intuitive user interface and built-
in operators.
• Identifies and prevents data quality issues and provide greater business productivity.
• Offers a free trial as well as a paid service, and reduces the time and cost required for data validation.
#6) QuerySurge
QuerySurge tool is specifically built for testing of Big Data and Data warehouse. It ensures
that the data extracted and loaded from the source system to the destination system is correct
and is as per the expected format. Any issues or differences are identified very quickly by
QuerySurge.
Key Features:
• QuerySurge is an automated tool for Big Data Testing and ETL Testing.
• It improves the data quality and accelerates testing cycles.
• It validates data using the Query Wizard.
• It saves time & cost by automating manual efforts and schedules tests for a specific time.
• QuerySurge supports ETL Testing across various platforms like IBM, Oracle, Microsoft, SAP.
• It helps to build test scenarios and test suit along with configurable reports without specific
knowledge of SQL.
• It generates email reports through an automated process.
• Reusable query snippet to generate reusable code.
• It provides a collaborative view of data health.
• QuerySurge can be integrated with HP ALM, TFS, IBM Rational Quality Manager.
• Verifies, converts, and upgrades data through the ETL process.
• It is a commercial tool that connects source and target data and also supports real-time
progress of test scenarios.
#7) Datagaps ETL Validator
ETL Validator tool is designed for ETL Testing and Big Data Testing. It is a solution for data integration projects. The testing of such data integration projects includes various data types, huge volumes, and various source platforms.
ETL Validator helps to overcome such challenges using automation, which further helps to reduce costs and minimize effort.
Key Features:
• ETL Validator has an inbuilt ETL engine which compares millions of records from various
databases or flat files.
• ETL Validator is data testing tool specifically designed for automated data warehouse testing.
• Visual Test Case Builder with drag and drop capability.
• ETL Validator has features of Query Builder which writes the test cases without manually
typing any queries.
• Compare aggregate data such as count, sum, distinct count etc.
• Simplifies the comparison of database schema across various environment which includes
data type, index, length, etc.
• ETL Validator supports various platforms such as Hadoop, XML, Flat files etc.
• It supports email notification, web reporting etc.
• It can be integrated with HP ALM which results in sharing of test results across various
platforms.
• ETL Validator is used to check Data Validity, Data Accuracy and also to perform Metadata
Testing.
• Checks Referential Integrity, Data Integrity, Data Completeness and Data Transformation.
• It is a commercial tool with 30 days trial and requires zero custom programming and
improves business productivity.
#8) QualiDI
QualiDI is an automated testing platform which offers end-to-end testing and ETL Testing. It automates ETL Testing and improves its effectiveness. It also reduces the testing cycle and improves data quality.
QualiDI identifies bad data and non-compliant data very easily. It shortens the regression cycle and the effort spent on data validation.
Key Features:
• QualiDI creates automated test cases and it also provides support for automated data
comparison.
• It offers data traceability and test case traceability.
• It has a centralized repository for requirements, test cases, and test results.
• It can be integrated with HPQC, Hadoop, etc.
• QualiDI identifies defects at an early stage, which in turn reduces cost.
• It supports email notifications.
• It supports the continuous integration process.
• It supports Agile development and the rapid delivery of sprints.
• QualiDI manages complex BI Testing cycles, eliminates human error, and maintains data quality.
#9) Talend Open Studio for Data Integration
Talend Open Studio for Data Integration is an open-source tool that makes ETL Testing easier. It includes all ETL Testing functionality along with additional continuous delivery mechanisms. With the help of the Talend Data Integration tool, a user can run ETL jobs on remote servers across a variety of operating systems.
ETL Testing ensures that data is transformed from the source system to the target without any data loss, thereby adhering to the transformation rules.
Key Features:
• Talend Data Integration supports any type of relational database, Flat files, etc.
• Integrated GUI which simplifies the design and development of ETL processes.
• Talend Data Integration has inbuilt data connectors with more than 900 components.
• It detects business ambiguity and inconsistency in transformation rules quickly.
• It supports remote job execution.
• Identifies defects at an early stage to reduce costs.
• It provides quantitative and qualitative metrics based on ETL best practices.
• Context switching is possible between the ETL development, ETL testing, and ETL production environments.
• Real-time data flow tracking along with detailed execution statistics.
Visit the official site here: Talend ETL Testing
#10) Codoid’s ETL Testing Services
Codoid’s ETL and data warehouse testing service includes data migration and data validation from the source to the target system. ETL Testing ensures that there is no data error, no bad data, and no data loss while loading data from the source to the target system.
It quickly identifies any data errors or any other general errors that occurred during the ETL
process.
Key Features:
• Codoid’s ETL Testing service ensures data quality in the data warehouse and data
completeness validation from the source to the target system.
• ETL Testing and data validation ensure that the business information transformed from
source to target system is accurate and reliable.
• The automated testing process performs data validation during and post data migration and
prevents any data corruption.
• Data validation includes count, aggregates, and spot checks between the target and actual
data.
• The automated testing process verifies whether data type, data length, and indexes are accurately transformed and loaded into the target system (a minimal schema-comparison sketch follows this list).
• Data Quality Testing prevents data errors, bad data, or any syntax issues.
#11) Data-Centric Testing
The Data-Centric testing tool performs robust data validation to avoid glitches such as data loss or data inconsistency during data transformation. It compares data between systems and ensures that the data loaded into the target system exactly matches the source system in terms of data volume, data type, format, etc.
Key Features:
• Data-Centric Testing is built to perform ETL Testing and data warehouse testing.
• It is one of the largest and oldest testing practices.
• It offers ETL Testing, data migration, and reconciliation.
• It supports various relational databases, Flat files, etc.
• Efficient Data validation with 100% data coverage.
• Data-Centric Testing also supports comprehensive reporting.
• The automated process of data validation generates SQL queries, which results in a reduction of cost and effort.
• It offers a comparison between heterogeneous databases like Oracle & SQL Server and
ensures that the data in both systems is in the correct format.
#12) SSISTester
SSISTester is a framework that helps in the unit and integration testing of SSIS packages. It also helps to create ETL processes in a test-driven environment, thereby helping to identify errors in the development process.
A number of packages are created while implementing ETL processes, and these need to be tested during unit testing. An integration test is also a “Live test”.
Key Features:
• The unit test creates and verifies tests, and once execution completes it performs a clean-up job.
• The integration test verifies that all packages are satisfied after execution of the unit tests.
• Tests are created as simply as a user would create them in Visual Studio.
• Real-time debugging of a test is possible using SSISTester.
• Monitoring of test execution with user-friendly GUI.
• Test results are exported in HTML format.
• It removes external dependencies by using fake sources and destinations (the pattern is sketched after this list).
• For the creation of tests, it supports any .NET language.
#13) TestBench
TestBench helps to reduce environment downtime. It reports all inserted, updated, and deleted transactions performed in a test environment and captures the status of the data before and after each transaction.
#14) DataQ
DataQ provides various tools for quickly identifying data issues. The platform is very intuitive and designed for both developers and testers. It is built from the ground up for high volumes of data, so whether you have hundreds of records or billions, it has you covered.
Key Features:
• Automate ETL Testing and Monitoring.
• Data Migration Testing with auto-detection of keys.
• Data Quality Monitoring – Freshness, Distribution, Volume, Schema, Completeness, Accuracy (a minimal sketch of such checks follows this list).
• Auto Suggestion of Data Quality rules.
• Cross-reference data validation across multiple data sources.
• Can connect to over 40 different data sources, various file formats, Kafka, and API out of the
box.
• Ability to create a library of custom functions.
• Schema Validation
• Data Profile comparison
• Compute resources are initialized and terminated on demand.
• On-prem and cloud-agnostic solution.
• Jira, Slack, Teams integration.
Points to Remember
While performing ETL Testing, several factors are to be kept in mind by the testers:
• Apply suitable business transformation logic.
• Execute backend data-driven tests.
• Create and execute absolute test cases, test plans, and test harnesses.
• Assure the accuracy of data transformation, scalability, and performance.
• Make sure that the ETL application appropriately rejects invalid data, replaces it with default values where applicable, and reports it.
The ETL Testing process is similar to other testing processes and includes several stages.
ETL Testing can be classified into the following categories according to the testing process that is being followed:
• Testing performed to validate the data values after data transformation.
• Testing used to check whether the data is extracted correctly from an older application, a new application, or a repository.
• Data transformation testing, in which multiple SQL queries are run for each and every row to verify the data transformation standards (a minimal sketch follows this list).
• Testing performed to verify that the expected data is loaded at the appropriate destination as per the predefined standards.
I would also like to compare ETL Testing with Database Testing, but before that, let us have a look at the types of ETL Testing with respect to Database Testing.
Given below are the types of ETL Testing with respect to Database Testing:
The first is constraint testing, in which testers validate that the following constraints are correctly defined on the source and target tables (a sketch of such checks follows this list):
• NOT NULL
• UNIQUE
• Primary Key
• Foreign Key
• Check
• NULL
• Default
Source and target tables contain huge amounts of data with frequently repeated values; in such cases, testers run database queries to find such duplication (see the sketch below).
Navigation concerns the GUI of an application. A user finds an application friendly when navigation throughout the entire system is easy and relevant. The tester must focus on avoiding irrelevant navigation from the user’s point of view.
This testing is performed to verify whether all the attributes of both the source and target systems are the same.
From the above listing, one may consider that ETL Testing is quite similar to Database Testing, but the fact is that ETL Testing is concerned with Data Warehouse Testing and not Database Testing.
There are several other facts due to which ETL Testing differs from Database Testing.
• The primary goal of Database Testing is to check if the data follows the rules and standards
of the data model, on the other hand, ETL Testing checks if data is moved or mapped as
expected.
• Database Testing focuses on maintaining the primary key-foreign key relationship, while ETL Testing verifies that data is transformed as per the requirement or expectation and is the same at the source and target systems.
• Database Testing recognizes missing data whereas ETL Testing determines duplicate data.
• Database Testing is used for data integration and ETL Testing for enterprise business intelligence reporting.
These are some major differences that make ETL Testing different from Database Testing.
The primary goal of ETL Testing is to ensure that the extracted and transformed data is loaded accurately from the source to the destination system. ETL Testing involves two documents:
#1) ETL Mapping Sheets: This document contains information about the source & destination tables and their references. The mapping sheet helps in creating the big SQL queries needed while performing ETL Testing (a sketch of how a mapping sheet can drive query generation follows below).
#2) Database schema for the Source and Destination tables: This should be kept updated in the mapping sheet so that data validation can be performed against the current database schema.
Conclusion
ETL Testing is not only a tester’s duty but it also involves developers, business analysts,
database administrators (DBA), and even the users. The ETL Testing process has become
vital as it is required to make strategic decisions at regular time intervals.
Let us know if we have missed any tools in the above list, and also suggest the ones that you use for ETL Testing in your daily routine.