ETL Testing Process
ETL testing verifies that an ETL process accurately extracts, transforms, and
loads data according to the specifications. ETL testing is done by validating and/or
comparing the input and output data transformed by the ETL process.
– Make sure the row counts of the source and the target table
match.
– Compare all the customer data in the source and the target
table to ensure that the ETL loaded the data into the destination
table as per the mapping rules.
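The two checks above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a definitive implementation: the table names src_customer and tgt_customer and their columns are hypothetical, and the comparison assumes the mapping rules copy the columns unchanged.

```python
import sqlite3


def row_counts_match(conn, source_table, target_table):
    """Return True when both tables hold the same number of rows."""
    # Table names are trusted identifiers here, not user input.
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src == tgt


def rows_missing_in_target(conn, source_table, target_table):
    """Return source rows that have no identical row in the target."""
    # EXCEPT compares whole rows, so any changed column makes a row show up.
    query = f"SELECT * FROM {source_table} EXCEPT SELECT * FROM {target_table}"
    return conn.execute(query).fetchall()


def build_customer_demo():
    """Create hypothetical source and target tables with one bad row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_customer (id INTEGER, name TEXT);
        CREATE TABLE tgt_customer (id INTEGER, name TEXT);
        INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bobby');
        """
    )
    return conn
```

In the demo data the row counts match even though one name differs, which is why the row count test alone is not enough.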
3. ETL Data Transformation Testing
Data transformation tests ensure that every row has been
transformed successfully based on the mapping document.
Testing data transformations involves reconciling the data
between source and destination to verify that the ETL is
transforming the data as expected. For example, if the gender
table has M, F and Others, ETL testing reconciles the data so
that the Gender attribute in the customer table only has one of
those three values.
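The gender example can be sketched as a query that lists every customer row whose value is not in the reference table. The table and column names (gender.code, customer.gender) are assumptions made for illustration only.

```python
import sqlite3


def invalid_gender_rows(conn):
    """Return (id, gender) for customers whose gender is not in the gender table."""
    return conn.execute(
        """
        SELECT c.id, c.gender
        FROM customer AS c
        WHERE NOT EXISTS (SELECT 1 FROM gender AS g WHERE g.code = c.gender)
        ORDER BY c.id
        """
    ).fetchall()


def build_gender_demo():
    """Create a hypothetical gender table and a customer table with one bad value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gender (code TEXT PRIMARY KEY);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, gender TEXT);
        INSERT INTO gender VALUES ('M'), ('F'), ('Others');
        INSERT INTO customer VALUES (1, 'M'), (2, 'F'), (3, 'Male'), (4, 'Others');
        """
    )
    return conn
```

An empty result means the transformation produced only the allowed values. A NULL gender is also reported, because it matches no reference code.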
6. ETL Integration Testing
ETL integration testing is done to verify that the ETL process
has integrated the data correctly. One of the key purposes of
an ETL process is to integrate data from multiple data sources
or multiple subject areas.
Vertical Integration Testing: In this case data is brought in from
multiple data sources and integrated into a table. An example of this
type of integration is a customer list from a CRM system and an
accounting system integrated into a single unified list. The
integration must ensure that:
– Attributes from multiple sources are mapped correctly to the
destination.
– No duplicate records exist.
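The duplicate check for a unified list can be sketched as a GROUP BY query. The table unified_customer and the choice of email as the business key are hypothetical; a real test would use whatever key the mapping document defines.

```python
import sqlite3


def duplicate_keys(conn, table, key_column):
    """Return (key, count) for every key value that occurs more than once."""
    # Identifiers are trusted test configuration, not user input.
    query = (
        f"SELECT {key_column}, COUNT(*) FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1 ORDER BY {key_column}"
    )
    return conn.execute(query).fetchall()


def build_unified_demo():
    """Create a hypothetical unified list where one customer was loaded twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE unified_customer (email TEXT, name TEXT, source TEXT);
        INSERT INTO unified_customer VALUES
            ('ann@example.com', 'Ann', 'CRM'),
            ('ann@example.com', 'Ann', 'ACCOUNTING'),
            ('bob@example.com', 'Bob', 'CRM');
        """
    )
    return conn
```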
Horizontal Integration Testing: In this scenario data from multiple
subject areas and sources is linked together to form meaningful
relationships. A typical example is linking the salesperson data with
the sales data to calculate the commission. Mostly, referential
integrity constraints (foreign keys) are created and different tables
are linked together.
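One way to test such links is to look for child rows that point to a missing parent. The sketch below assumes hypothetical sales and salesperson tables joined on salesperson_id; it is an illustration rather than a prescribed method.

```python
import sqlite3


def orphan_sales(conn):
    """Return (sale_id, salesperson_id) for sales with no matching salesperson."""
    return conn.execute(
        """
        SELECT s.sale_id, s.salesperson_id
        FROM sales AS s
        LEFT JOIN salesperson AS p ON p.salesperson_id = s.salesperson_id
        WHERE p.salesperson_id IS NULL
        ORDER BY s.sale_id
        """
    ).fetchall()


def build_sales_demo():
    """Create hypothetical tables where sale 12 references an unknown salesperson."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, salesperson_id INTEGER, amount INTEGER);
        INSERT INTO salesperson VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO sales VALUES (10, 1, 500), (11, 2, 300), (12, 3, 200);
        """
    )
    return conn
```

An orphaned sale would silently drop out of any commission calculation that joins the two tables, so the test should return no rows.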
ETL integration testing involves creating multiple ETL
testing rules to verify that the data integration is done correctly.
This is necessary because even though there might be only one ETL
process that integrates the data, it nevertheless contains
multiple business rules for data transformation. ETL testing
must ensure that each of those integration rules is
implemented correctly. This testing includes all the above
types of testing.
– Ensure the data is going to the respective attributes.
– No duplicate entities exist and, at the same time, no
unrelated entities are unified.
– Ensure the entities are linked correctly.
7. ETL Performance Testing
Even if the ETL process is coded correctly, it is possible that,
when executed, it takes an unreasonably long time to finish the
job. ETL performance testing measures the time taken to
finish processing a certain number of records against user
expectations. ETL performance metrics are usually
measured in the number of rows processed per second.
To measure performance, three metrics are needed: the ETL
process start time, the ETL process end time and the number of
records processed. The sources for the above metrics are:
ETL Test Scenarios and Test Descriptions
Record Level Scenarios: These are record-level ETL tests.
– Record Count Testing: This is a primary test to check that all the
available records are populated, nothing more and nothing less. It
ensures that the ETL process has loaded all the records, but it cannot
tell whether the data in the records is correct.
– Duplicate Records Testing: Duplicate records happen if primary key or
unique key constraints are not implemented in the database. In such
cases specific ETL tests are needed to ensure duplicate records are not
generated by the ETL process.
– Record Aggregation Test: In many scenarios transaction-level records
are aggregated by time or other dimensions. Tests are needed to ensure
that the dimensions chosen for the aggregation of records are correct.
– Row Filter Testing: ETL developers often forget to add filters, or
forget to remove filters that were added during testing. Create ETL
tests to ensure that the proper data filters are implemented as per
requirements.
– Type II Dimension Testing: The Type II dimension ETL logic retires old
records and inserts new records. This test ensures that only one valid
record is present and that the expiry dates don’t overlap.
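The Type II dimension check can be sketched in plain Python. The row layout (key, valid_from, valid_to) is hypothetical, with valid_to set to None for the current record, and the sketch assumes half-open validity ranges and that every key should have exactly one current record.

```python
from collections import defaultdict
from datetime import date

OPEN_END = date.max  # stands in for the open-ended current record


def scd2_violations(rows):
    """Return (key, problem) pairs for keys that break the Type II rules.

    rows: iterable of (key, valid_from, valid_to); valid_to is None for
    the current record. Ranges are treated as [valid_from, valid_to).
    """
    by_key = defaultdict(list)
    for key, start, end in rows:
        by_key[key].append((start, end or OPEN_END, end is None))

    problems = []
    for key, versions in sorted(by_key.items()):
        versions.sort()  # order versions by start date
        current = sum(1 for _, _, is_current in versions if is_current)
        if current != 1:
            problems.append((key, f"{current} current records"))
        # After sorting, any overlap shows up between neighbouring versions.
        for (_, prev_end, _), (next_start, _, _) in zip(versions, versions[1:]):
            if next_start < prev_end:
                problems.append((key, "overlapping validity ranges"))
                break
    return problems


demo_rows = [
    ("C1", date(2020, 1, 1), date(2021, 1, 1)),
    ("C1", date(2021, 1, 1), None),
    ("C2", date(2020, 1, 1), date(2021, 6, 1)),
    ("C2", date(2021, 1, 1), None),  # starts before the old version expires
]
```

If a dimension legitimately allows keys with no current record, for example deleted entities, the count check would need to be relaxed.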
Attribute Level Scenarios: These are attribute-level tests.
– Data Mapping Testing: During the development of the ETL process the
developer might make a mistake in mapping the source and target
attributes. This ETL test ensures that the data is populated in the
correct target attributes.
– Calculations (Numeric and Date): Many mathematical calculations are
used to populate calculated fields. This ETL test ensures that the
calculations are done correctly by the ETL process.
– Expressions (String): Various string operations such as CONCAT,
SUBSTRING and TRIM are performed on strings. This test ensures string
transformations are done correctly by the ETL process.
– Data Truncation: The ETL process itself can truncate the data, or the
data can get truncated at load time if the target column is shorter
than the source. This ETL test ensures string data is not truncated by
the ETL process or during the load.
– Data Rounding (Numbers and Dates): This can happen if the datatype is
not chosen correctly in either the ETL process variables or the target
table. Numbers can get rounded; dates can lose time or second
components. Ensure decimal data is not rounded incorrectly.
– Formatting Issues (Date and Strings): This mostly happens with string
datatypes, as they accept data in almost any format. Ensure the date or
string data is formatted correctly.
– Reference Data or Dimension Lookup: Ensure that the child or
transaction attributes have reference data that is present in the
master.
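The data truncation test can be sketched by comparing source and target strings for the same key. The dictionaries below are hypothetical stand-ins for values read from the two systems, and the sketch assumes the mapping copies the string unchanged; a legitimate TRIM or SUBSTRING transformation would need its own expected value.

```python
def find_truncated(source, target):
    """Return keys whose target value is a shorter prefix of the source value.

    source, target: dicts mapping a record key to its string value.
    """
    return sorted(
        key
        for key, src_value in source.items()
        if key in target
        and target[key] != src_value
        and src_value.startswith(target[key])
    )


# Hypothetical data: the target column was too short for record 1.
source_names = {1: "Alexander", 2: "Bob", 3: "Christina"}
target_names = {1: "Alexan", 2: "Bob", 3: "Christina"}
```

Keys missing from the target are ignored here, since the record count test already covers them.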
Aggregate Scenarios: This involves testing of summarized data
(balances, snapshots, aggregates).
– Aggregate Calculation: Ensure the aggregation of data is done
correctly.
– Simple Row Counts: Ensure the number of records populated is neither
more nor less than the expected number of records. The row count in the
destination matches the source system.
– Simple Sums: Match the sums of numeric values between source and
target to ensure the numbers are correct.
– Grouped Row Count: Reconcile counts for different groups between
source and target.
– Group Sums: Reconcile aggregate sums for different groups between
source and target.
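The grouped row count and group sum reconciliations can be combined in one sketch. The tables src_sales and tgt_sales, the region grouping and the integer amount column are hypothetical; integer amounts are used so that sums compare exactly.

```python
import sqlite3


def grouped_mismatches(conn, source_table, target_table, group_col, amount_col):
    """Return {group: (source_summary, target_summary)} for groups that differ.

    Each summary is a (row_count, sum) tuple, or None if the group is absent.
    """
    def summary(table):
        # Identifiers are trusted test configuration, not user input.
        rows = conn.execute(
            f"SELECT {group_col}, COUNT(*), SUM({amount_col}) "
            f"FROM {table} GROUP BY {group_col}"
        ).fetchall()
        return {group: (count, total) for group, count, total in rows}

    src, tgt = summary(source_table), summary(target_table)
    return {
        group: (src.get(group), tgt.get(group))
        for group in set(src) | set(tgt)
        if src.get(group) != tgt.get(group)
    }


def build_sales_totals_demo():
    """Create hypothetical tables where the West total differs in the target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_sales (region TEXT, amount INTEGER);
        CREATE TABLE tgt_sales (region TEXT, amount INTEGER);
        INSERT INTO src_sales VALUES ('East', 100), ('East', 50), ('West', 70);
        INSERT INTO tgt_sales VALUES ('East', 100), ('East', 50), ('West', 60);
        """
    )
    return conn
```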
Execution Scenarios: These tests cover how the ETL processes behave
when they are executed.
– Incremental Load: Often data is loaded in increments based on delta
logic. This ETL test ensures the incremental loads reconcile correctly
with the source and that no gaps or overlaps are generated.
– Multi Execution Tests: Normally you won’t expect the same data to be
processed again. But in many situations the data is reprocessed or the
process is accidentally executed again. This test ensures multiple
reruns of the ETL process with the same data do not generate extra
records.
– ETL Performance Test: The data processing must finish within the
required timeframe. The ETL performance test ensures that the ETL
processing time is acceptable by checking the run logs.
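The performance test reduces to simple arithmetic on the three metrics named earlier: start time, end time and number of records processed. The sketch below assumes these values have already been read from a run log; the function names and the threshold are hypothetical.

```python
from datetime import datetime


def rows_per_second(start, end, records):
    """Return throughput as records processed per second of run time."""
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end time must be after start time")
    return records / elapsed


def meets_expectation(start, end, records, min_rows_per_second):
    """Return True when the run was at least as fast as the agreed minimum."""
    return rows_per_second(start, end, records) >= min_rows_per_second


# Hypothetical run log entry: 1,200,000 records in 10 minutes.
run_start = datetime(2024, 1, 1, 2, 0, 0)
run_end = datetime(2024, 1, 1, 2, 10, 0)
run_records = 1_200_000
```

For this entry the run took 600 seconds, so the throughput is 2000 rows per second.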