Top 25 ETL Testing Interview Questions & Answers
The following are frequently asked interview questions for both freshers and experienced ETL testers and developers.
1) What is ETL?
ETL stands for Extract, Transform and Load, and it is an important component of data warehousing architecture: it manages the data for a business process. Extract reads data from a source database, Transform converts that data into a format suitable for reporting and analysis, and Load writes the data into the target database.
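As a rough illustration of the three steps, here is a minimal sketch in Python. The sqlite3 engine, the table names (raw_sales, sales_usd) and the currency-conversion rule are assumptions for the example, not part of any particular ETL tool.

```python
# A minimal, illustrative ETL pipeline. Table names, the transformation
# rule and sqlite3 as source/target engine are assumptions for the sketch.
import sqlite3

def extract(conn):
    """Extract: read raw rows from the source table."""
    return conn.execute("SELECT id, amount, currency FROM raw_sales").fetchall()

def transform(rows):
    """Transform: normalize amounts into a single reporting currency."""
    rates = {"USD": 1.0, "EUR": 1.1}  # assumed static rates for the sketch
    return [(rid, amount * rates.get(cur, 1.0)) for rid, amount, cur in rows]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.executemany("INSERT INTO sales_usd (id, amount_usd) VALUES (?, ?)", rows)
    conn.commit()
```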
2) Explain what ETL testing operations include?

ETL testing includes the following operations:

Verify that the projected data is loaded into the data warehouse without any truncation or data loss
Make sure that the ETL application reports invalid data and replaces it with default values
Make sure that data loads within the expected time frame, to confirm scalability and performance (a minimal check of the first two operations is sketched below)
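The sketch below shows two such checks, assuming the invented table names from the earlier sketch and plain DB-API connections:

```python
# A minimal sketch of two ETL checks: row-count reconciliation and
# default-value substitution for invalid data. Connections and table
# names are assumptions for illustration.
def check_no_data_loss(src_conn, tgt_conn):
    """Verify target row count matches source (no truncation/data loss)."""
    src = src_conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
    tgt = tgt_conn.execute("SELECT COUNT(*) FROM sales_usd").fetchone()[0]
    assert src == tgt, f"row count mismatch: source={src}, target={tgt}"

DEFAULT_REGION = "UNKNOWN"

def apply_default(value):
    """Replace invalid (empty/None) values with a default, as the ETL should."""
    return value if value not in (None, "") else DEFAULT_REGION
```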
3) Mention what are the types of data warehouse applications and what is the difference between
data mining and data warehousing?
The types of data warehouse applications are:

Info Processing
Analytical Processing
Data Mining
Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting it, whereas data warehousing is the process of aggregating data from multiple sources into one common repository. A data warehouse may make use of a data mine to perform analytical processing of the data in a faster way.
4) Mention what are the various ETL tools?

Some commonly used ETL tools are:

Business Objects XI
SAS Business Warehouse
5) What is a fact and what are the types of facts?

A fact is a central, numeric measure in a data warehouse, such as a sales amount. The types of facts are:

Additive Facts, which can be summed across all dimensions
Semi-additive Facts, which can be summed across some dimensions but not others
Non-additive Facts, such as ratios, which cannot be summed across any dimension (the sketch after this list contrasts additive and semi-additive behavior)
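A small sketch of the difference, using invented daily rows: a sales amount is additive across days, while an account balance is semi-additive, so summing it across days is misleading.

```python
# Contrasting additive and semi-additive facts with invented data.
daily = [
    {"day": "Mon", "sales_amount": 100, "account_balance": 500},
    {"day": "Tue", "sales_amount": 150, "account_balance": 520},
]

total_sales = sum(r["sales_amount"] for r in daily)      # meaningful: 250
end_balance = daily[-1]["account_balance"]               # meaningful: 520
bad_balance = sum(r["account_balance"] for r in daily)   # misleading: 1020
```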
6) Explain what are cubes and OLAP cubes?

Cubes are data processing units composed of fact tables and dimensions from the data warehouse; they provide multi-dimensional analysis.

OLAP stands for Online Analytical Processing, and an OLAP cube stores large amounts of data in multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
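As a toy illustration of multi-dimensional aggregation (not any OLAP engine's actual storage), the sketch below rolls a fact table up along two invented dimensions:

```python
# Aggregating a 'sales' measure along (region, product) dimensions.
# The rows and dimension names are invented for the sketch.
from collections import defaultdict

facts = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 7},
    {"region": "US", "product": "A", "sales": 12},
]

cube = defaultdict(int)
for row in facts:
    cube[(row["region"], row["product"])] += row["sales"]

print(cube[("EU", "A")])  # 10
```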
7) Explain what is tracing level and what are the types of tracing level?

Tracing level determines the amount of data stored in the log files. Tracing levels can be classified into two types: Normal and Verbose. The Normal level logs session information in a summarized manner, while the Verbose level logs information for each and every row that is processed.
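As a loose analogy (not Informatica's actual mechanism), the difference resembles log levels in a generic pipeline:

```python
# INFO as a "normal" tracing level (run-level summaries) and DEBUG as a
# "verbose" level (row-level detail). Illustrative only.
import logging

logging.basicConfig(level=logging.INFO)  # switch to DEBUG for verbose tracing
log = logging.getLogger("etl")

rows = [{"id": 1}, {"id": 2}]
for row in rows:
    log.debug("processing row %s", row)   # emitted only at verbose level
log.info("processed %d rows", len(rows))  # emitted at normal level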
8) Explain what is Grain of Fact?

The grain of a fact can be defined as the level of detail at which the fact information is stored. It is also known as Fact Granularity.
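A small sketch of grain, with invented rows: the same sales facts stored at a daily grain versus a coarser monthly grain.

```python
# Daily grain: one fact row per store per day.
daily_grain = [("store1", "2024-01-01", 100), ("store1", "2024-01-02", 120)]

# Monthly grain: the same facts rolled up to one row per store per month,
# a coarser grain that loses day-level detail.
monthly_grain = [("store1", "2024-01", 220)]
```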
9) Explain what is a factless fact table?

A fact table without measures is known as a factless fact table. It can be used to view the number of occurring events; for example, it can record an event such as an employee count in a company.
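A sketch with an invented attendance table: each row simply records that an event occurred, and counting rows answers the business question.

```python
# A factless fact table: rows record that an event happened (a student
# attended a class) with no numeric measure. Contents are invented.
attendance = [
    ("student1", "math", "2024-01-10"),
    ("student2", "math", "2024-01-10"),
    ("student1", "math", "2024-01-11"),
]

math_attendance = sum(1 for _, course, _ in attendance if course == "math")  # 3
```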
10) Explain what is transformation?

A transformation is a repository object that generates, modifies or passes data. Transformations are of two types: Active and Passive. An active transformation can change the number of rows that pass through it, while a passive transformation cannot.
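A sketch of the distinction, with invented rows: a filter behaves like an active transformation (the row count may change), while an expression behaves like a passive one (the row count is preserved).

```python
rows = [{"amount": 50}, {"amount": -5}, {"amount": 120}]

def filter_rows(rows):
    """Active-style transformation: may change the number of rows."""
    return [r for r in rows if r["amount"] > 0]

def add_tax(rows):
    """Passive-style transformation: modifies values, row count unchanged."""
    return [{**r, "amount_with_tax": r["amount"] * 1.2} for r in rows]
```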
12) Explain what is partitioning, hash partitioning and round robin partitioning?
To improve performance, transactions are subdivided; this is called partitioning. Partitioning enables the Informatica Server to create multiple connections to various sources. Both schemes are sketched in code after the definitions below.

Round-Robin Partitioning:

The Informatica server distributes rows evenly across all partitions; it is applicable when the number of rows to process in each partition is approximately the same.

Hash Partitioning:

The Informatica server applies a hash function to the partitioning keys to group data among partitions. It is used when you need to ensure that groups of rows with the same partitioning key are processed in the same partition.
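A sketch of both schemes with invented rows; the hash function here is Python's built-in, which is stable within a single run, which is all a partitioner needs.

```python
def round_robin_partition(rows, n):
    """Distribute rows evenly: row i goes to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

rows = [{"cust": "a"}, {"cust": "b"}, {"cust": "a"}, {"cust": "c"}]
print(round_robin_partition(rows, 2))
print(hash_partition(rows, 2, "cust"))
```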
13) Mention what is the advantage of using a DataReader Destination Adapter?

The advantage of using the DataReader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory, and exposes the data from the DataFlow task by implementing the DataReader interface, so that other applications can consume the data.
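As a loose Python analogy (the real adapter is an SSIS/.NET component), the idea is an in-memory recordset exposed through a reader interface that a downstream consumer can iterate:

```python
# Illustrative only: an in-memory "recordset" behind a reader interface.
class RecordReader:
    def __init__(self, records):
        self._records = records  # the in-memory recordset

    def __iter__(self):
        return iter(self._records)

reader = RecordReader([("1", "alice"), ("2", "bob")])
for record in reader:  # another component consumes the data
    print(record)
```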
14) Using SSIS (SQL Server Integration Service), what are the possible ways to update a table?

Possible ways to update a table using SSIS are:

Use a SQL command
Use a staging table
Use Cache
Use the Script Task
15) In case you have a non-OLEDB (Object Linking and Embedding Database) source for the lookup, what would you do?
In case you have a non-OLEDB source for the lookup, you have to use a cache to load the data and use it as the source.
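A sketch of that approach, assuming an invented flat file customers.csv as the non-OLEDB source:

```python
# Load a flat-file source into an in-memory cache and use it for lookups.
import csv

def build_lookup_cache(path):
    """Load a flat file into a dict keyed by the lookup column."""
    with open(path, newline="") as f:
        return {row["id"]: row["name"] for row in csv.DictReader(f)}

cache = build_lookup_cache("customers.csv")  # assumed file for the sketch
name = cache.get("42", "UNKNOWN")            # lookup with a default value
```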
16) In what case do you use dynamic cache and static cache in connected and unconnected
transformations?
A dynamic cache is used when you have to update the master table and for slowly changing dimensions (SCD) type 1. A static cache is used when the lookup source does not change during the session, for example with flat files.
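A sketch of the difference, with invented keys: the static cache stays fixed for the run, while the dynamic cache is updated as rows arrive (type 1 keeps no history).

```python
static_cache = {"k1": "v1"}  # read-only during the session

dynamic_cache = {"k1": "v1"}
incoming = [("k1", "v1-updated"), ("k2", "v2")]
for key, value in incoming:
    dynamic_cache[key] = value  # insert-or-overwrite: type 1 keeps no history
```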
17) Explain what are the differences between Unconnected and Connected lookup?
Connected lookup supports user-defined default values, whereas unconnected lookup does not support user-defined default values.
18) Explain what is a data source view?

A data source view allows you to define the relational schema that will be used in the Analysis Services databases. Dimensions and cubes are created from data source views rather than directly from data source objects.
19) Explain what is the difference between OLAP tools and ETL tools?

ETL tools are meant for extracting data from source systems, transforming it and loading it into a data warehouse, while OLAP tools are meant for reporting on that data, which is available in a multi-dimensional model.
20) How can you extract SAP data using Informatica?

With the Power Connect option, you can extract SAP data using Informatica:

Power Connect is used to connect to and import sources from external systems; it acts as a gateway between Informatica and SAP.
Import the source into the Source Analyzer.
The next step is to generate the ABAP code for the mapping; only then can Informatica pull data from SAP.
21) Mention what is the difference between Power Mart and Power Center?
Power Center is designed to process huge volumes of data, while Power Mart is designed to process low volumes of data.
22) Explain what staging area is and what is the purpose of a staging area?
A staging area is a place where you hold data temporarily on the data warehouse server. Data staging includes the following steps, sketched in code after the list:

Source data extraction and restructuring
Data transformation (data cleansing, value transformations)
Surrogate key assignments
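A sketch of those steps, with invented row and column names:

```python
# Land extracted rows in a staging structure, clean values, and assign
# surrogate keys before loading. All names are invented for the sketch.
from itertools import count

surrogate_key = count(start=1)

def stage(rows):
    staged = []
    for row in rows:
        staged.append({
            "sk": next(surrogate_key),            # surrogate key assignment
            "name": row["name"].strip().upper(),  # data cleansing
            "natural_id": row["id"],              # keep the business key
        })
    return staged

print(stage([{"id": "c-9", "name": "  alice "}]))
```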
23) What is a BUS schema?

A BUS schema is used to identify the dimensions that are common across the various business processes. It comes with conformed dimensions along with a standardized definition of information.
24) Explain what is data purging?

Data purging is the process of deleting data from a data warehouse. It deletes junk data such as rows with null values or extra spaces.
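A sketch of such a purge rule, with invented rows:

```python
# Drop rows whose values are null or blank after trimming.
rows = [{"name": "alice"}, {"name": "   "}, {"name": None}]

def purge(rows):
    return [r for r in rows if r["name"] and r["name"].strip()]

print(purge(rows))  # [{'name': 'alice'}]
```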
25) Explain what are Schema Objects?

Schema objects are the logical structures that directly refer to the database's data. Schema objects include tables, views, sequences, synonyms, indexes, clusters, functions, packages and database links.
26) Explain the terms Workflow and Session?

Workflow: a set of instructions that tells the server how to execute tasks.
Session: a set of parameters that tells the server how to move data from sources to targets.