ETL Process


ETL – Tester's Roles

An ETL tester is primarily responsible for validating the data sources, the
extraction of data, the application of transformation logic, and the loading of
data into the target tables.
The key responsibilities of an ETL tester are listed below.

Verify the Tables in the Source System


It involves the following operations −

 Count check
 Reconcile records with the source data
 Data type check
 Ensure no spam data is loaded
 Remove duplicate data
 Check all the keys are in place
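
As a rough illustration of the data type check and the key check, the sketch below inspects a table definition with Python's built-in sqlite3 module. The table `src_customer`, its columns, and the `expected_types` mapping are hypothetical names invented for this example; a real source system would expose its metadata through its own catalog views.

```python
import sqlite3

# Hypothetical source table, created in memory only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " cust_name TEXT NOT NULL,"
    " age INTEGER)"
)

# Expected column types, as they might be listed in a mapping document (assumed).
expected_types = {"cust_id": "INTEGER", "cust_name": "TEXT", "age": "INTEGER"}

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = conn.execute("PRAGMA table_info(src_customer)").fetchall()
actual_types = {name: col_type for _, name, col_type, _, _, _ in columns}
key_columns = [name for _, name, _, _, _, pk in columns if pk > 0]

# Data type check: collect every column whose declared type differs.
type_mismatches = {
    name: (expected, actual_types.get(name))
    for name, expected in expected_types.items()
    if actual_types.get(name) != expected
}

print("type mismatches:", type_mismatches)
print("key columns:", key_columns)
```

An empty `type_mismatches` dictionary and a non-empty `key_columns` list would indicate that both checks passed for this table.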

Apply Transformation Logic


Transformation logic is applied before loading the data. It involves the
following operations −
 Data threshold validation check, for example, age value shouldn’t be
more than 100.
 Record count check, before and after the transformation logic is applied.
 Data flow validation from the staging area to the intermediate tables.
 Surrogate key check.
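
A surrogate key check usually confirms that every row has a key and that no key is reused. The following is a minimal sketch under assumed names: the dimension table `dim_customer` and its surrogate key column `customer_sk` are hypothetical, and the sample rows deliberately contain one missing and one reused key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; customer_sk plays the role of the surrogate key.
conn.execute("CREATE TABLE dim_customer (customer_sk INTEGER, cust_id TEXT)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "C-100"), (2, "C-101"), (2, "C-102"), (None, "C-103")],
)

# Rows whose surrogate key is missing.
null_keys = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE customer_sk IS NULL"
).fetchone()[0]

# Surrogate key values assigned to more than one row.
duplicate_keys = conn.execute(
    "SELECT customer_sk FROM dim_customer"
    " WHERE customer_sk IS NOT NULL"
    " GROUP BY customer_sk HAVING COUNT(*) > 1"
).fetchall()

print("rows without a surrogate key:", null_keys)
print("reused surrogate keys:", duplicate_keys)
```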

Data Loading
Data is loaded from the staging area to the target system. It involves the
following operations −
 Record count check from the intermediate table to the target system.
 Ensure the key field data is not missing or Null.
 Check if the aggregate values and calculated measures are loaded in
the fact tables.
 Check modeling views based on the target tables.
 Check if CDC has been applied on the incremental load table.
 Check the data in the dimension table and in the history table.
 Check that the BI reports based on the loaded fact and dimension tables
match the expected results.
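
One way to check the aggregate values in a fact table is to recompute them from the staging data and compare. The sketch below assumes a hypothetical staging table `stg_sales` and fact table `fact_sales` with a simple SUM measure; the sample data contains one deliberately wrong total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging and fact tables with sample data.
conn.executescript("""
CREATE TABLE stg_sales (product_id INTEGER, amount INTEGER);
CREATE TABLE fact_sales (product_id INTEGER, total_amount INTEGER);
INSERT INTO stg_sales VALUES (1, 100), (1, 50), (2, 70);
INSERT INTO fact_sales VALUES (1, 150), (2, 75);
""")

# Recompute each total from staging and keep the products whose loaded
# total is missing or differs from the recomputed value.
mismatched = conn.execute("""
    SELECT s.product_id, s.expected_total, f.total_amount
    FROM (SELECT product_id, SUM(amount) AS expected_total
          FROM stg_sales GROUP BY product_id) AS s
    LEFT JOIN fact_sales AS f ON f.product_id = s.product_id
    WHERE f.total_amount IS NULL OR f.total_amount <> s.expected_total
    ORDER BY s.product_id
""").fetchall()

print("aggregate mismatches:", mismatched)
```

Each returned tuple holds the product, the total recomputed from staging, and the total found in the fact table.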

Testing the ETL Tools


ETL testers are required to test the tools and the test-cases as well. It involves
the following operations −

 Test the ETL tool and its functions
 Test the ETL Data Warehouse system
 Create, design, and execute the test plans and test cases.
 Test the flat file data transfers.

ETL Testing – Techniques

It is important that you define the correct ETL testing technique before
starting the testing process. You should obtain sign-off from all the
stakeholders and ensure that a correct technique is selected to perform ETL
testing. The testing team should know this technique well and be aware of
the steps involved in the testing process.
There are various types of testing techniques that can be used. In this
chapter, we will discuss the testing techniques in brief.

Production Validation Testing


To perform Analytical Reporting and Analysis, the data in your production
system should be correct. This testing is done on the data that is moved to
the production system. It involves validating the data in the production
system and comparing it with the source data.

Source-to-target Count Testing


This type of testing is done when the tester has limited time to perform the
testing operation. It involves checking the count of records in the source and
the target systems. It doesn’t involve checking the values of data in the target
system, nor does it check whether the data is in ascending or descending order
after mapping.
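
A count comparison can be sketched in a few lines. The tables `src_orders` and `tgt_orders` and the helper `row_count` are hypothetical names used only for this illustration; here both tables live in one in-memory SQLite database, whereas real source and target systems would normally need separate connections.

```python
import sqlite3

def row_count(conn, table):
    # Table names cannot be bound as parameters, so pass trusted names only.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
# Hypothetical source and target tables; the target is missing one row.
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER);
CREATE TABLE tgt_orders (order_id INTEGER);
INSERT INTO src_orders VALUES (1), (2), (3);
INSERT INTO tgt_orders VALUES (1), (2);
""")

source_count = row_count(conn, "src_orders")
target_count = row_count(conn, "tgt_orders")
counts_match = source_count == target_count

print(source_count, target_count, counts_match)
```

Matching counts do not prove that the values are correct, which is why this technique is only a quick first check.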

Source-to-target Data Testing


In this type of testing, a tester validates data values from the source to the
target system. It checks the data values in the source system and the
corresponding values in the target system after transformation. This type of
testing is time-consuming and is normally performed in financial and banking
projects.
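
A value-level comparison can be approximated with the SQL EXCEPT operator, which returns the rows of one query that are absent from another. The sketch below assumes hypothetical tables `src_account` and `tgt_account` and a straight copy with no transformation; where a transformation applies, the source-side query would have to apply the expected rule first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source and target tables with sample data.
conn.executescript("""
CREATE TABLE src_account (account_id INTEGER, balance INTEGER);
CREATE TABLE tgt_account (account_id INTEGER, balance INTEGER);
INSERT INTO src_account VALUES (1, 500), (2, 250), (3, 90);
INSERT INTO tgt_account VALUES (1, 500), (2, 205);
""")

# Source rows that are missing from the target or differ in some value.
missing_or_changed = conn.execute(
    "SELECT account_id, balance FROM src_account"
    " EXCEPT SELECT account_id, balance FROM tgt_account"
    " ORDER BY account_id"
).fetchall()

# Target rows that have no identical counterpart in the source.
unexpected_in_target = conn.execute(
    "SELECT account_id, balance FROM tgt_account"
    " EXCEPT SELECT account_id, balance FROM src_account"
    " ORDER BY account_id"
).fetchall()

print("missing or changed:", missing_or_changed)
print("unexpected in target:", unexpected_in_target)
```

Running the comparison in both directions catches rows that were dropped as well as rows that were altered or added during the load.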

Data Integration / Threshold Value Validation Testing


In this type of testing, a tester validates the range of data. All the threshold
values in the target system are checked against the expected results. It also
involves checking the integration of data in the target system from multiple
source systems after transformation and loading.
Example − Age attribute shouldn’t have a value greater than 100. In the date
column DD/MM/YY, the month field shouldn’t have a value greater than 12.
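
The two example rules above could be checked roughly as follows. The table `tgt_person` and its columns are hypothetical, and the date is assumed to be stored as DD/MM/YY text, so the month is read from character positions 4 and 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table; birth_date is stored as DD/MM/YY text (assumed).
conn.executescript("""
CREATE TABLE tgt_person (person_id INTEGER, age INTEGER, birth_date TEXT);
INSERT INTO tgt_person VALUES
    (1, 34, '15/06/89'),
    (2, 130, '01/12/90'),
    (3, 51, '07/13/72');
""")

# Rule 1: age must not be greater than 100.
age_violations = conn.execute(
    "SELECT person_id FROM tgt_person WHERE age > 100 ORDER BY person_id"
).fetchall()

# Rule 2: the month part of the date must not be greater than 12.
month_violations = conn.execute(
    "SELECT person_id FROM tgt_person"
    " WHERE CAST(substr(birth_date, 4, 2) AS INTEGER) > 12"
    " ORDER BY person_id"
).fetchall()

print("age violations:", age_violations)
print("month violations:", month_violations)
```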

Application Migration Testing


Application migration testing is normally performed automatically when you
move from an old application to a new application system. This testing saves
a lot of time. It checks whether the data extracted from the old application is
the same as the data in the new application system.

Data Check and Constraint Testing


It includes performing various checks such as data type check, data length
check, and index check. Here a Test Engineer tests the following
constraints − Primary Key, Foreign Key, NOT NULL, NULL, and UNIQUE.
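
Constraint testing often takes the form of negative test cases: the tester attempts to load rows that should be rejected and confirms that the database refuses them. The sketch below is one possible way to do this in SQLite; the tables `dept` and `emp` and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
# Hypothetical tables carrying the constraints under test.
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    email   TEXT UNIQUE,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES dept(dept_id)
);
INSERT INTO dept VALUES (10);
INSERT INTO emp VALUES (1, 'a@example.com', 'Asha', 10);
""")

# Each statement violates exactly one constraint and should be rejected.
negative_cases = {
    "primary key": "INSERT INTO emp VALUES (1, 'b@example.com', 'Bo', 10)",
    "unique": "INSERT INTO emp VALUES (2, 'a@example.com', 'Bo', 10)",
    "not null": "INSERT INTO emp VALUES (3, 'c@example.com', NULL, 10)",
    "foreign key": "INSERT INTO emp VALUES (4, 'd@example.com', 'Di', 99)",
}

rejected = {}
for name, statement in negative_cases.items():
    try:
        conn.execute(statement)
        rejected[name] = False  # the constraint did not fire
    except sqlite3.IntegrityError:
        rejected[name] = True   # the database rejected the bad row

print(rejected)
```

Any entry that comes back as `False` points to a constraint that is missing or not enforced in the target schema.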

Duplicate Data Check Testing


This testing involves checking for duplicate data in the target system. When
the target system holds a huge amount of data, the production system may
contain duplicate records, which can lead to incorrect figures in Analytical
Reports.
Duplicate values can be checked with an SQL statement like −
SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
FROM Customer
GROUP BY Cust_Id, Cust_NAME, Quantity
HAVING COUNT(*) > 1;
Duplicate data appears in the target system due to the following reasons −
 If no primary key is defined, duplicate values may be loaded.
 Due to incorrect mapping or environmental issues.
 Manual errors while transferring data from the source to the target
system.

Data Transformation Testing


Data transformation testing cannot be performed by running a single SQL
statement. It is time-consuming because the tester needs to run multiple SQL
queries for each row to verify the transformation rules, and then compare the
output with the target data.
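
The row-by-row comparison can be sketched as follows. The transformation rule used here (the target `full_name` is the upper-cased first and last name joined by a space) is purely an assumption for illustration, as are the tables `src_customer` and `tgt_customer` and the helper `expected_full_name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source and target tables with sample data.
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE tgt_customer (cust_id INTEGER, full_name TEXT);
INSERT INTO src_customer VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO tgt_customer VALUES (1, 'ADA LOVELACE'), (2, 'Alan Turing');
""")

def expected_full_name(first_name, last_name):
    # Assumed transformation rule: join the names and convert to upper case.
    return f"{first_name} {last_name}".upper()

# Re-apply the rule to every source row and compare with the loaded value.
failures = []
rows = conn.execute("""
    SELECT s.cust_id, s.first_name, s.last_name, t.full_name
    FROM src_customer AS s
    LEFT JOIN tgt_customer AS t ON t.cust_id = s.cust_id
    ORDER BY s.cust_id
""")
for cust_id, first_name, last_name, actual in rows:
    expected = expected_full_name(first_name, last_name)
    if actual != expected:
        failures.append((cust_id, expected, actual))

print("transformation failures:", failures)
```

Keeping the rule in a separate function means the test expresses the mapping specification independently of the ETL code it verifies.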

Data Quality Testing


Data quality testing involves performing number checks, date checks, null
checks, precision checks, etc. A tester performs Syntax Tests to report invalid
characters, incorrect upper/lower case order, etc. and Reference Tests to
check if the data is according to the data model.
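
Several of these checks can be expressed as small validation rules. The sketch below is one possible arrangement in plain Python; the record layout (`name`, `joined`, `amount`), the DD/MM/YYYY date format, the allowed name characters, and the two-decimal precision limit are all assumptions made for this example.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

def quality_issues(record):
    """Return a list of data quality problems found in one record."""
    issues = []

    # Null check and syntax test: name must exist and use allowed characters.
    name = record.get("name")
    if name is None:
        issues.append("name: null")
    elif not re.fullmatch(r"[A-Za-z][A-Za-z .'-]*", name):
        issues.append("name: invalid characters")

    # Date check: value must be a real calendar date in DD/MM/YYYY format.
    try:
        datetime.strptime(record.get("joined"), "%d/%m/%Y")
    except (TypeError, ValueError):
        issues.append("joined: invalid date")

    # Number check and precision check: at most two decimal places.
    try:
        amount = Decimal(record.get("amount"))
    except (InvalidOperation, TypeError):
        issues.append("amount: not a number")
    else:
        if not amount.is_finite() or amount.as_tuple().exponent < -2:
            issues.append("amount: more than 2 decimal places")

    return issues

good = {"name": "Ada", "joined": "15/06/2020", "amount": "10.50"}
bad = {"name": "B0b#", "joined": "31/02/2020", "amount": "12.345"}
print(quality_issues(good))
print(quality_issues(bad))
```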

Incremental Testing
Incremental testing is performed to verify that Insert and Update statements
produce the expected results. This testing is performed step-by-step
with old and new data.
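
An incremental test can compare the target before and after a load of new data. In the sketch below, the tables `tgt_product` and `delta_product` are hypothetical, and the `INSERT OR REPLACE` statement is only a stand-in for whatever incremental load job is actually under test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target (old data) and delta (new data) tables.
conn.executescript("""
CREATE TABLE tgt_product (product_id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE delta_product (product_id INTEGER, price INTEGER);
INSERT INTO tgt_product VALUES (1, 10), (2, 20);
INSERT INTO delta_product VALUES (2, 25), (3, 30);
""")

def snapshot(table):
    # Map product_id to price; table names must come from trusted code.
    return dict(conn.execute(f"SELECT product_id, price FROM {table}").fetchall())

before = snapshot("tgt_product")
delta = snapshot("delta_product")

# Stand-in for the incremental load under test.
conn.execute(
    "INSERT OR REPLACE INTO tgt_product"
    " SELECT product_id, price FROM delta_product"
)
after = snapshot("tgt_product")

inserted = sorted(k for k in after if k not in before)
updated = sorted(k for k in after if k in before and after[k] != before[k])
# Every delta row must be present, and rows outside the delta must be unchanged.
delta_applied = all(after.get(k) == v for k, v in delta.items())
untouched_ok = all(after[k] == before[k] for k in before if k not in delta)

print(inserted, updated, delta_applied, untouched_ok)
```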

Regression Testing
Regression testing is performed after changes are made to the data
transformation and aggregation rules to add new functionality. It helps the
tester find new errors introduced by those changes. The data bugs that
surface during regression testing are called regressions.

Retesting
When you run the tests again after fixing the code, it is called retesting.

System Integration Testing


System integration testing involves testing the components of a system
individually and later integrating the modules. There are three ways a system
integration can be done: top-down, bottom-up, and hybrid.

Navigation Testing
Navigation testing is also known as testing the front-end of the system. It
involves testing from the end user's point of view by checking all aspects of
the front-end report, including the data in various fields, calculations,
aggregates, etc.

ETL Testing – Process


ETL testing covers all the steps involved in an ETL lifecycle. It starts with
understanding the business requirements and ends with the generation of a
summary report.
The common steps in the ETL testing lifecycle are listed below −
 Understanding the business requirement.
 Validation of the business requirement.
 Test Estimation is used to provide the estimated time to run test-cases
and to complete the summary report.
 Test Planning involves selecting the testing technique based on the
business requirements.
 Creating test scenarios and test cases.
 Once the test-cases are ready and approved, the next step is to perform
the pre-execution check.
 Execute all the test-cases.
 The last step is to generate a complete summary report and file a
closure process.
