ETL Testing or Data Warehouse Testing Tutorial

ETL testing ensures accuracy of data loaded from a source to a destination after transformation. It involves verifying data at stages between source and destination. ETL stands for Extract-Transform-Load and is the process of loading data from source systems like OLTP databases to a data warehouse. It includes extracting, transforming, and loading data. ETL testing helps validate mappings, data integrity, transformations and quality from source to target.

Uploaded by

SWAPNIL4UMATTERS
Copyright
© All Rights Reserved
guru99.com/utlimate-guide-etl-datawarehouse-testing.html

Before we learn anything about ETL testing, it's important to learn about Business
Intelligence and data warehousing. Let's get started.

What is BI?
Business Intelligence is the process of collecting raw business data and turning it into
information that is useful and meaningful. The raw data is the record of an organization's
daily transactions, such as interactions with customers, administration of finance,
management of employees, and so on. This data is used for reporting, analysis, data mining,
data quality and interpretation, and predictive analysis.

What is Data Warehouse?


A data warehouse is a database that is designed for query and analysis rather than for
transaction processing. It is constructed by integrating data from multiple heterogeneous
sources. It enables a company or organization to consolidate data from several sources and
separates the analysis workload from the transaction workload. Data is turned into
high-quality information to meet the enterprise reporting requirements of all levels of
users.

What is ETL?

ETL stands for Extract-Transform-Load. It is the process by which data is loaded from the
source system into the data warehouse. Data is extracted from an OLTP database,
transformed to match the data warehouse schema, and loaded into the data warehouse
database. Many data warehouses also incorporate data from non-OLTP systems such as
text files, legacy systems, and spreadsheets.

Let's see how it works

For example, consider a retail store with different departments such as sales, marketing,
and logistics. Each department handles customer information independently, and the way
each stores that data is quite different. The sales department stores it by customer name,
while the marketing department stores it by customer ID.

Now, if they want to check a customer's history and find out which products he or she
bought as a result of different marketing campaigns, it would be very tedious.

The solution is to use a data warehouse to store information from the different sources in
a uniform structure using ETL. ETL can transform dissimilar data sets into a unified
structure. BI tools can then be used to derive meaningful insights and reports from this
data.

The following diagram gives you the ROAD MAP of the ETL process

1. Extract

Extract relevant data

2. Transform

Transform data to DW (Data Warehouse) format


Build keys - A key is one or more data attributes that uniquely identify an entity.
The various types of keys are primary key, alternate key, foreign key, composite key, and
surrogate key. The data warehouse owns these keys and never allows any other entity
to assign them.

Cleansing of data - After the data is extracted, it moves into the next phase: cleaning
and conforming. Cleaning means identifying and fixing the errors and omissions in the
data. Conforming means resolving the conflicts between data that is incompatible, so that
it can be used in an enterprise data warehouse. In addition, this phase creates metadata
that is used to diagnose source system problems and improve data quality.

3. Load

Load data into DW (Data Warehouse)

Build aggregates - Creating an aggregate means summarizing and storing data from the fact
table in order to improve the performance of end-user queries.
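The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the in-memory records, the table names `fact_sales` and `agg_sales_by_customer`, and the use of SQLite as the "warehouse" are all hypothetical choices made for the retail example.

```python
import sqlite3

# Hypothetical source extracts: sales stores customers by name,
# marketing stores them by customer id (as in the retail example).
sales_rows = [
    {"customer_name": "Alice", "product": "laptop", "amount": 1200.0},
    {"customer_name": "Bob", "product": "phone", "amount": 600.0},
    {"customer_name": "Alice", "product": "mouse", "amount": 25.0},
]
marketing_rows = [
    {"customer_id": 101, "customer_name": "Alice", "campaign": "spring_sale"},
    {"customer_id": 102, "customer_name": "Bob", "campaign": "newsletter"},
]


def extract():
    """Extract: return the relevant records from each source system."""
    return sales_rows, marketing_rows


def transform(sales, marketing):
    """Transform: conform both sources to one structure keyed by customer id."""
    id_by_name = {m["customer_name"]: m["customer_id"] for m in marketing}
    campaign_by_id = {m["customer_id"]: m["campaign"] for m in marketing}
    facts = []
    for s in sales:
        customer_id = id_by_name[s["customer_name"]]
        facts.append((customer_id, s["product"], s["amount"], campaign_by_id[customer_id]))
    return facts


def load(conn, facts):
    """Load: write the fact rows, then build an aggregate for faster queries."""
    conn.execute(
        "CREATE TABLE fact_sales ("
        "sale_key INTEGER PRIMARY KEY AUTOINCREMENT, "  # surrogate key assigned by the warehouse
        "customer_id INTEGER, product TEXT, amount REAL, campaign TEXT)"
    )
    conn.executemany(
        "INSERT INTO fact_sales (customer_id, product, amount, campaign) VALUES (?, ?, ?, ?)",
        facts,
    )
    conn.execute(
        "CREATE TABLE agg_sales_by_customer AS "
        "SELECT customer_id, SUM(amount) AS total_amount, COUNT(*) AS sale_count "
        "FROM fact_sales GROUP BY customer_id"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
aggregate = conn.execute(
    "SELECT customer_id, total_amount, sale_count "
    "FROM agg_sales_by_customer ORDER BY customer_id"
).fetchall()
print(aggregate)
```

The aggregate table holds one summarized row per customer, so a report on customer totals does not need to scan every fact row.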

What is ETL Testing?


ETL testing is done to ensure that the data loaded from a source to the destination after
business transformation is accurate. It also involves verifying the data at the various
intermediate stages between source and destination. ETL stands for
Extract-Transform-Load.

ETL Testing Process


Like other testing processes, ETL testing goes through different phases. ETL testing is
performed in five stages:

1. Identifying data sources and requirements
2. Data acquisition
3. Implementing business logic and dimensional modelling
4. Building and populating data
5. Building reports

Types of ETL Testing

Types of Testing: Testing Process

Production Validation Testing: Also known as “table balancing” or “production
reconciliation”, this type of ETL testing is done on data as it is being moved into
production systems. To support your business decisions, the data in your production
systems has to be in the correct order. Informatica Data Validation Option provides ETL
testing automation and management capabilities to ensure that production systems are
not compromised by the data.

Source to Target Testing (Validation Testing): This type of testing is carried out to
validate whether the transformed data values are the expected data values.

Application Upgrades: This type of ETL testing can be automatically generated, saving
substantial test development time. It checks whether the data extracted from an older
application or repository is exactly the same as the data in a new repository or
application.

Metadata Testing: Metadata testing includes data type checks, data length checks, and
index/constraint checks.

Data Completeness Testing: Data completeness testing is done to verify that all the
expected data is loaded into the target from the source. Some of the tests that can be run
are comparing and validating counts, aggregates, and actual data between the source and
target for columns with simple or no transformation.

Data Accuracy Testing: This testing is done to ensure that the data is accurately loaded
and transformed as expected.

Data Transformation Testing: Data transformation needs its own testing because in many
cases it cannot be verified by writing one source SQL query and comparing the output with
the target. Multiple SQL queries may need to be run for each row to verify the
transformation rules.

Data Quality Testing: Data quality tests include syntax tests and reference tests. Data
quality testing is done to avoid errors caused by, for example, a date or an order number
during a business process. Syntax tests report dirty data based on invalid characters,
character patterns, incorrect upper or lower case order, and so on. Reference tests check
the data against the data model, for example the Customer ID. Data quality testing
includes number check, date check, precision check, data check, null check, and so on.

Incremental ETL Testing: This testing is done to check the data integrity of old and new
data when new data is added. Incremental testing verifies that inserts and updates are
processed as expected during the incremental ETL process.

GUI/Navigation Testing: This testing is done to check the navigation or GUI aspects of the
front-end reports.
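As a rough illustration of data quality testing, the sketch below runs a syntax test (character pattern and date format), a null check, and a reference test on a handful of rows. The rows, the `ORD-` order number pattern, and the set of valid customer IDs are assumptions made up for this example; a real project would take them from its own data model.

```python
import re
from datetime import datetime

# Hypothetical reference data and target rows, invented for illustration.
valid_customer_ids = {101, 102, 103}
rows = [
    {"order_no": "ORD-0001", "customer_id": 101, "order_date": "2023-01-15", "amount": 19.99},
    {"order_no": "ord_2", "customer_id": 102, "order_date": "2023-02-30", "amount": 5.5},
    {"order_no": "ORD-0003", "customer_id": 999, "order_date": "2023-03-01", "amount": None},
]

# Assumed order number format: "ORD-" followed by exactly four digits.
ORDER_NO_PATTERN = re.compile(r"ORD-\d{4}")


def is_valid_date(text):
    """Date check: the value must be a real calendar date in YYYY-MM-DD format."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def check_row(row):
    """Return a list of data quality issues found in one row."""
    issues = []
    # Syntax tests: character pattern and date format.
    if not ORDER_NO_PATTERN.fullmatch(row["order_no"]):
        issues.append("order_no: invalid pattern")
    if not is_valid_date(row["order_date"]):
        issues.append("order_date: invalid date")
    # Null check.
    if row["amount"] is None:
        issues.append("amount: null")
    # Reference test: the customer must exist in the reference data.
    if row["customer_id"] not in valid_customer_ids:
        issues.append("customer_id: not in reference data")
    return issues


report = {row["order_no"]: check_row(row) for row in rows}
for order_no, issues in report.items():
    print(order_no, issues)
```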

How to create ETL Test Case


ETL testing is a concept that can be applied to different tools and databases in the
information management industry. The objective of ETL testing is to assure that the data
that has been loaded from a source to a destination after business transformation is
accurate. It also involves verifying the data at the various intermediate stages between
source and destination.

While performing ETL testing, two documents that an ETL tester will always use are:

1. ETL mapping sheets: An ETL mapping sheet contains all the information about the source
and destination tables, including each column and its look-up in reference tables. ETL
testers need to be comfortable with SQL queries, as ETL testing may involve writing big
queries with multiple joins to validate data at any stage of ETL. ETL mapping sheets
provide significant help when writing queries for data verification.
2. DB schema of source and target: It should be kept handy to verify any detail in the
mapping sheets.
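To make the link between a mapping sheet and verification queries concrete, here is a small sketch that generates one check query per mapped column. The mapping sheet structure, the table and column names, and the helper `build_check_query` are hypothetical; SQLite is used only so the example is self-contained, and it covers direct (untransformed) column mappings only.

```python
import sqlite3

# Hypothetical mapping sheet: one entry per source-to-target column mapping.
# Table and column names are illustrative and not taken from the tutorial.
mapping_sheet = [
    {"source_table": "src_customer", "source_column": "cust_id",
     "target_table": "dim_customer", "target_column": "customer_id"},
    {"source_table": "src_customer", "source_column": "cust_name",
     "target_table": "dim_customer", "target_column": "customer_name"},
]


def build_check_query(entry):
    """Build a query listing source values that are missing from the target.

    Identifiers are interpolated directly, so the mapping sheet must come
    from a trusted document, never from user input.
    """
    return (
        f"SELECT {entry['source_column']} FROM {entry['source_table']} "
        f"EXCEPT "
        f"SELECT {entry['target_column']} FROM {entry['target_table']}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, cust_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER, customer_name TEXT);
INSERT INTO src_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bobb');
""")

# Run one generated check per mapped column and collect the mismatches.
mismatches = {}
for entry in mapping_sheet:
    missing = conn.execute(build_check_query(entry)).fetchall()
    mismatches[entry["target_column"]] = missing

print(mismatches)
```

Here the misspelled name in the target shows up under `customer_name`, while the ID column comes back clean.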

ETL Test Scenarios and Test Cases

Test Scenario: Test Cases

Mapping doc validation: Verify whether the corresponding ETL information is provided in
the mapping doc. A change log should be maintained in every mapping doc.

Validation:
1. Validate the source and target table structure against the corresponding mapping doc.
2. Source data type and target data type should be the same.
3. Length of data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. Source data type length should not be less than the target data type length.
6. Validate the names of columns in the table against the mapping doc.

Constraint Validation: Ensure the constraints are defined for each specific table as
expected.

Data consistency issues:
1. The data type and length for a particular attribute may vary in files or tables though
the semantic definition is the same.
2. Misuse of integrity constraints.

Completeness Issues:
1. Ensure that all expected data is loaded into the target table.
2. Compare record counts between source and target.
3. Check for any rejected records.
4. Check that data is not truncated in the columns of the target tables.
5. Check boundary value analysis.
6. Compare unique values of key fields between the data loaded to the warehouse and the
source data.

Correctness Issues:
1. Data that is misspelled or inaccurately recorded
2. Null, non-unique, or out-of-range data

Transformation: Transformation

Data Quality:
1. Number check: Numbers need to be checked and validated.
2. Date check: Dates have to follow the date format, and the format should be the same
across all records.
3. Precision check
4. Data check
5. Null check

Null Validate: Verify the null values where “Not Null” is specified for a specific column.

Duplicate Check:
1. Validate that the unique key, the primary key, and any other column that should be
unique as per the business requirements do not have any duplicate rows.
2. Check whether any duplicate values exist in any column that is extracted from multiple
columns in the source and combined into one column.
3. As per the client requirements, ensure that there are no duplicates in combinations of
multiple columns within the target only.

Date Validation: Date values are used in many areas of ETL development:
1. To know the row creation date
2. To identify active records from the ETL development perspective
3. To identify active records from the business requirements perspective
4. Sometimes updates and inserts are generated based on the date values.

Complete Data Validation:
1. To validate the complete data set in the source and target tables, a minus query is the
best solution.
2. Run both source minus target and target minus source.
3. If a minus query returns any rows, those should be considered mismatching rows.
4. Match rows between source and target using an intersect statement.
5. The count returned by intersect should match the individual counts of the source and
target tables.
6. If the minus query returns 0 rows and the intersect count is less than the source count
or the target table count, then duplicate rows can be considered to exist.

Data Cleanness: Unnecessary columns should be deleted before loading into the staging
area.
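The complete data validation and duplicate check scenarios can be sketched with a few set-based queries. This is an illustration under assumptions: the tables `source_orders` and `target_orders` and their contents are invented, and SQLite is used for convenience. SQLite spells the minus operator `EXCEPT`; Oracle uses `MINUS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (order_id INTEGER, amount REAL);
CREATE TABLE target_orders (order_id INTEGER, amount REAL);
INSERT INTO source_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
-- Target has one changed amount (order 2) and one duplicated row (order 3).
INSERT INTO target_orders VALUES (1, 10.0), (2, 25.0), (3, 30.0), (3, 30.0);
""")


def fetch(sql):
    """Run a query and return all rows."""
    return conn.execute(sql).fetchall()


# Minus queries in both directions: any returned row is a mismatch.
source_minus_target = fetch(
    "SELECT * FROM source_orders EXCEPT SELECT * FROM target_orders"
)
target_minus_source = fetch(
    "SELECT * FROM target_orders EXCEPT SELECT * FROM source_orders"
)

# Intersect returns the distinct rows present in both tables.
matching = fetch(
    "SELECT * FROM source_orders INTERSECT SELECT * FROM target_orders ORDER BY 1"
)

# Record counts for the completeness comparison.
source_count = fetch("SELECT COUNT(*) FROM source_orders")[0][0]
target_count = fetch("SELECT COUNT(*) FROM target_orders")[0][0]

# Duplicate check: rows that occur more than once in the target.
duplicates = fetch(
    "SELECT order_id, amount, COUNT(*) FROM target_orders "
    "GROUP BY order_id, amount HAVING COUNT(*) > 1"
)

print("source minus target:", source_minus_target)
print("target minus source:", target_minus_source)
print("matching rows:", len(matching), "source:", source_count, "target:", target_count)
print("duplicates in target:", duplicates)
```

Because `EXCEPT` and `INTERSECT` work on distinct rows, the duplicated order is invisible to both; it only surfaces through the count comparison and the `GROUP BY ... HAVING` check.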

Types of ETL Bugs

Type of Bug: Description

User interface bugs/cosmetic bugs: Related to the GUI of the application, such as font
style, font size, colors, alignment, spelling mistakes, navigation, and so on.

Boundary Value Analysis (BVA) related bug: Minimum and maximum values.

Equivalence Class Partitioning (ECP) related bug: Valid and invalid type.

Input/Output bugs: Valid values not accepted; invalid values accepted.

Calculation bugs: Mathematical errors; final output is wrong.

Load Condition bugs: Does not allow multiple users; does not allow the customer-expected
load.

Race Condition bugs: System crash and hang; system cannot run on client platforms.

Version control bugs: No logo matching; no version information available. This usually
occurs in regression testing.

H/W bugs: Device is not responding to the application.

Help Source bugs: Mistakes in help documents.

Difference between Database testing and ETL testing

ETL Testing | Database Testing

Verifies whether data is moved as expected | The primary goal is to check whether the data follows the rules/standards defined in the data model

Verifies whether counts in the source and target match, and whether the transformed data is as expected | Verifies that there are no orphan records and that foreign-primary key relations are maintained

Verifies that the foreign-primary key relations are preserved during the ETL | Verifies that there are no redundant tables and that the database is optimally normalized

Verifies for duplication in loaded data | Verifies whether data is missing in columns where it is required
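As one example from the database testing column, an orphan record check looks for child rows whose foreign key has no matching parent row. The sketch below is an assumption-laden illustration: the tables `dim_customer` and `fact_sales` and their data are invented, and the check uses a plain `LEFT JOIN` in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
-- Sale 12 points at customer 3, which does not exist in dim_customer.
INSERT INTO fact_sales VALUES (10, 1, 50.0), (11, 2, 75.0), (12, 3, 20.0);
""")

# Orphan check: keep only fact rows for which the join found no parent row.
orphans = conn.execute("""
SELECT f.sale_id, f.customer_id
FROM fact_sales AS f
LEFT JOIN dim_customer AS d ON d.customer_id = f.customer_id
WHERE d.customer_id IS NULL
ORDER BY f.sale_id
""").fetchall()

print(orphans)
```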

Responsibilities of an ETL tester


The key responsibilities of an ETL tester fall into three categories:

Stage table / SFS or MFS
Business transformation logic applied
Target table loading from the stage file or table after applying a transformation

Some of the responsibilities of an ETL tester are:

Test ETL software
Test components of the ETL data warehouse
Execute backend data-driven tests
Create, design, and execute test cases, test plans, and test harnesses
Identify problems and provide solutions for potential issues
Approve requirements and design specifications
Test data transfers and flat files
Write SQL queries for various scenarios, such as a count test

ETL Performance Testing and Tuning


ETL performance testing is a confirmation test to ensure that an ETL system can handle the
load of multiple users and transactions. The goal of performance tuning is to optimize
session performance by eliminating performance bottlenecks. To tune or improve the
performance of a session, you have to identify the performance bottlenecks and eliminate
them. Performance bottlenecks can be found in the source and target databases, the
mapping, the session, and the system. One of the best tools used for performance testing is
Informatica.
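One simple way to start locating a bottleneck is to time each stage of a run separately. The sketch below is not how Informatica does it; it is a generic illustration with placeholder `extract`, `transform`, and `load` functions and a hypothetical `timed` helper.

```python
import time


def timed(stage_name, func, timings, *args):
    """Run one stage and record its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(*args)
    timings[stage_name] = time.perf_counter() - start
    return result


# Placeholder stages standing in for real source reads, mappings, and target writes.
def extract():
    return list(range(100_000))


def transform(rows):
    return [r * 2 for r in rows]


def load(rows):
    return len(rows)


timings = {}
rows = timed("extract", extract, timings)
rows = timed("transform", transform, timings, rows)
loaded = timed("load", load, timings, rows)

# The slowest stage is the first candidate for tuning.
slowest = max(timings, key=timings.get)
print(timings, "slowest stage:", slowest)
```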

Automation of ETL Testing


The general methodology of ETL testing is to use SQL scripting or to “eyeball” the data.
These approaches are time-consuming, error-prone, and seldom provide complete test
coverage. To accelerate testing, improve coverage, reduce costs, and improve the defect
detection ratio of ETL testing in production and development environments, automation is
needed. One such tool is Informatica.

Best Practices for ETL Testing


1. Make sure data is transformed correctly
2. Projected data should be loaded into the data warehouse without any data loss or
truncation
3. Ensure that the ETL application appropriately rejects invalid data, replaces it with
default values, and reports it
4. Ensure that data is loaded into the data warehouse within the prescribed and expected
time frames to confirm scalability and performance
5. All methods should have appropriate unit tests regardless of visibility
6. All unit tests should use appropriate coverage techniques to measure their
effectiveness
7. Strive for one assertion per test case
8. Create unit tests that target exceptions
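The last two practices can be illustrated with Python's built-in `unittest` module. The function `transform_amount` is a hypothetical transformation invented for this sketch; each test makes a single assertion, and two of them target the exception path.

```python
import unittest


def transform_amount(raw):
    """Hypothetical transformation: parse a currency string into whole cents."""
    if raw is None or raw.strip() == "":
        raise ValueError("amount is required")
    return round(float(raw) * 100)


class TransformAmountTest(unittest.TestCase):
    def test_converts_dollars_to_cents(self):
        # One assertion per test case.
        self.assertEqual(transform_amount("19.99"), 1999)

    def test_rejects_blank_amount(self):
        # A unit test that targets an exception.
        with self.assertRaises(ValueError):
            transform_amount("  ")

    def test_rejects_non_numeric_amount(self):
        with self.assertRaises(ValueError):
            transform_amount("abc")


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TransformAmountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```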

