
ETL Testing Interview Questions And Answers

Q) ETL Testing Vs DB Testing


Compare ETL Testing and DB Testing:

ETL Testing | DB Testing
Used for Business Intelligence reporting | Goal is to integrate data
Applied to data warehouse environments based on historical data | Applicable to business flow systems
Informatica, Cognos and QuerySurge can be used | QTP and Selenium tools are used for automation
Analysing data may have a potential impact | Architectural implementation involves a high impact
Dimensional model | Entity relationship model
Analytics are processed | Transactions are processed
Denormalized data is used | Normalized data is used

Q1) What exactly do you mean by ETL?


ETL stands for Extract, Transform, Load and is widely regarded as one of the essential processes in a data
warehousing architecture. Its main task is to handle data management for complex business processes.
Extracting simply means reading the data from a source database. Transformation means converting the
data into a form suitable for analysis and reporting. Loading handles the process of writing the data into
the database that the user wants to target.
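
As a rough illustration, here is a minimal ETL sketch in SQL; the table and column names (src.orders, dw.orders_fact, order_ts, amount_cents) are hypothetical:

-- Minimal ETL sketch (all names are hypothetical).
-- Extract and transform in one SELECT, then load into the target table.
INSERT INTO dw.orders_fact (order_id, order_date, amount_usd)
SELECT o.order_id,
       CAST(o.order_ts AS DATE),   -- transform: timestamp to date
       o.amount_cents / 100.0      -- transform: cents to dollars
FROM   src.orders o                -- extract: read from the source
WHERE  o.status = 'COMPLETED';     -- apply a business rule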

Q2) There is a group of parameters that directs the server regarding the movement of the data
from the source to the target. What is it called?
It is called a Session.

Q3) Do you have any idea about ETL testing and the operations that are a part of it?
There are certain important tasks involved. ETL testing means verifying that the data is transformed in
the correct manner, as per the needs of the business. It also includes verification of the projected data,
and checking whether the data has been loaded into the warehouse successfully and without any loss of
data. Improvements in scalability as well as performance can also be assured through it. In addition,
ETL testing verifies that default values are substituted where expected.
Q4) How is Power Center different from Power Mart?
Power Mart is worth considering only when the data processing requirements are low. Power Center, on
the other side, can process bulk data in a short span of time. Power Center can easily support ERP
sources such as SAP, while Power Mart cannot. Power Mart supports only a local repository, while
Power Center supports both global and local repositories.

Q5) What is partitioning in ETL?


Transactions always need to be divided for better performance, and this process is known as
partitioning. It ensures that the server can directly access the sources through multiple connections in
parallel.

Q6) Name a few tools that you can easily use with ETL
There are many tools that can be considered; however, a user rarely needs all of them at the same time,
and which tool is used depends on preference and the task to be accomplished. Some of the commonly
used ones are Oracle Warehouse Builder, Cognos Decision Stream, SAS Enterprise ETL Server and SAS
Business Warehouse.

Q7) What do you understand by the term fact in ETL, and what are its types?
A fact is the central component of a multi-dimensional model and holds the measures to be analyzed. It
relates the dimensions that largely matter in ETL. The commonly used types of facts in ETL are additive
facts, semi-additive facts and non-additive facts.

Q8) What exactly do you know about the tracing level and its types?
Log files have a limit on how much data can be stored in them. The tracing level is the amount of detail
that is written to the log files. There are two types:

1. Verbose
2. Normal
Q9) Do you think data warehousing and data mining are different from one another? How are
they associated with warehousing applications?
The important warehousing applications generally include analytical processing, information
processing and data mining. There is a very large amount of predictive information that needs to be
extracted from databases holding a lot of data, and warehousing sometimes depends on mining for the
operations involved. Data mining serves the analytical process, while data can be aggregated from
different sources through the warehousing approach, which is not possible with mining alone.
Q10) Do you have any information regarding the Grain of Fact?
The grain of fact is the level at which the fact information is stored; its other name is fact granularity.
Users can redefine this level when the need for it is realized, and the fact tables associated with it are
then adjusted accordingly.

Q11) Is it possible to load the data and use it as a source?


Yes, in ETL this is possible. The task can be accomplished by using the cache. Users must make sure
that the cache is free and optimized before it is used for this purpose, so that the desired outcome is
achieved without much effort.

Q12) What is a Factless fact table in ETL?


It is defined as a fact table without measures. A number of events can be managed directly with it; for
example, it can record events related to employees or to management, and this task can be accomplished
in a very reliable manner.
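
As an illustration, a factless fact table recording employee training attendance might look like the sketch below (all names are hypothetical):

-- A factless fact table: only foreign keys, no numeric measures.
CREATE TABLE fact_training_attendance (
    employee_key  INT NOT NULL REFERENCES dim_employee (employee_key),
    course_key    INT NOT NULL REFERENCES dim_course   (course_key),
    date_key      INT NOT NULL REFERENCES dim_date     (date_key),
    PRIMARY KEY (employee_key, course_key, date_key)
);

-- Counting rows answers "how many attendances?" even with no measure column.
SELECT course_key, COUNT(*) AS attendances
FROM   fact_training_attendance
GROUP  BY course_key;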

Q13) What exactly do you mean by a Transformation? What are its types?
It is regarded as a repository object which is capable of generating data, modifying it and passing it on
in a reliable manner. The two commonly used types of transformations are Active and Passive.

Q14) What is the exact purpose of ETL according to you?


It is very beneficial for extracting data, particularly from systems that are based on legacy technology.

Q15) Can you define measures in a simple statement?


Measures are the numeric data based on columns, generally present in a fact table by default.

Q16) When will you make use of the Lookup Transformation?


It is one of the finest and most useful approaches in ETL. It lets users get a related value from a table
with the help of a common column value. In addition, it helps boost the performance of a slowly
changing dimension table. There are also situations when records are already present in the table;
dealing with such cases is made possible with the help of the Lookup transformation.

Q17) What do you understand by Data Purging?


There are situations when data needs to be deleted from the data warehouse, and deleting data in bulk
is a daunting task. Purging is an approach that can delete multiple records at the same time while
letting users maintain speed as well as efficiency. A lot of extra space can be reclaimed simply with
this.
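
A minimal purge sketch, assuming a hypothetical dw.web_logs table and a seven-year retention window (date arithmetic syntax varies by database):

-- Purge warehouse rows older than the retention period.
DELETE FROM dw.web_logs
WHERE  log_date < CURRENT_DATE - INTERVAL '7' YEAR;

-- When an entire table must be emptied, TRUNCATE reclaims space faster
-- than a row-by-row DELETE because it is minimally logged.
TRUNCATE TABLE dw.web_logs_archive;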

Q18) Can you tell something about the Bus Schema?


Dimension identification is very important in ETL, and it is largely handled by the Bus Schema.

Q19) Are you familiar with the Dynamic and the Static Cache?
When it comes to updating the master table, the dynamic cache can be used; users are also free to use it
for changing dimensions. On the other side, users can manage flat files through the static cache. It is
possible to deploy both the dynamic and the static cache at the same time, depending on the task and
the overall complexity of the final outcome.

Q20) What do you mean by staging area?


It is an area used to hold information or data temporarily on the server that supports the data
warehouse. Certain steps are carried out there, a prime one among them being surrogate key
assignment.

Q21) What are the types of Partitioning you are familiar with?
There are two types of partitioning that are common in ETL, as sketched below:

1. Hash
2. Round-robin
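
As a rough sketch, hash partitioning can be declared at the table level; the DDL below uses Oracle-style syntax and hypothetical names:

-- Rows are spread across 4 partitions by a hash of customer_id, so the
-- server can read and load them over multiple parallel connections.
CREATE TABLE sales_fact (
    sale_id     NUMBER,
    customer_id NUMBER,
    amount      NUMBER(12,2)
)
PARTITION BY HASH (customer_id)
PARTITIONS 4;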
Q22) Can you tell a few benefits of using the Data Reader Destination Adapter?
ADO recordsets generally consist of columns and records. The Data Reader Destination Adapter is very
useful for populating them in a simple manner. It exposes the data flow and lets users impose various
restrictions on the data, which is required in many cases.

Q23) Is it possible to extract SAP data with the help of Informatica?
Yes. The PowerConnect option lets users perform this task in a very reliable manner. It is necessary to
import the source definition into the analyzer before accomplishing this task.

Q24) What do you mean by the term Mapplet?


A mapplet is an approach useful for creating or arranging different sets of transformations. It also lets
the user accomplish other tasks that largely matter and are related to the data warehouse.

Q25) What are commercial ETL tools?

1. Ab Initio
2. Adeptia ETL
3. Business Objects Data Services
4. Informatica PowerCenter
5. Business Objects Data Integrator (BODI)
6. Confluent
7. DBSoftLab
Q26) Can you explain the fact table?
In a data warehouse, the fact table is the central table of a star schema, containing the measures of the
business and foreign keys to the dimension tables.

Q27) What are the types of measures?


There are 3 types of measures (an SQL illustration follows the list):

 Additive measures - can be aggregated across all dimensions of the fact table.
 Semi-additive measures - can be aggregated across only some dimensions of the fact table.
 Non-additive measures - cannot be aggregated across any dimension of the fact table.
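
As an illustration of semi-additivity, consider a hypothetical account_balance_fact table: balances sum sensibly across accounts for a single day, but not across days:

-- Summing balances across accounts for one day is meaningful:
SELECT date_key, SUM(balance) AS total_balance
FROM   account_balance_fact
WHERE  date_key = 20240131        -- example day (hypothetical key)
GROUP  BY date_key;

-- Summing the same balance across days is NOT meaningful;
-- an average (or last value) per account is used instead:
SELECT account_key, AVG(balance) AS avg_balance
FROM   account_balance_fact
GROUP  BY account_key;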

Q27) Give a brief on Grain of Fact?


The grain of fact is defined as the level or stage at which the fact information is stored. It is also called
fact granularity.

Q28) Define Transformation?


In ETL, transformation involves data cleansing, sorting the data, combining or merging it, and applying
the business rules to the data, improving its quality and accuracy for the ETL process.

Q29) What is Lookup Transformation?


The Lookup transformation performs lookups by joining information in input columns with columns in
a reference dataset. You use the lookup to access additional data in a related table, based on values in
common columns.
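
In SQL terms, a lookup behaves much like a join to a reference table. A sketch with hypothetical names:

-- Lookup sketch: enrich input rows with a reference (lookup) table.
SELECT s.order_id,
       s.customer_id,
       c.customer_name,                   -- value fetched by the lookup
       c.region
FROM   staging_orders s
LEFT JOIN dim_customer c                  -- reference dataset
       ON c.customer_id = s.customer_id;  -- common column condition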

Q30) Is it possible to update the table using the SQL service?


Yes, it is possible, and users can perform this task without worrying about anything. There are
generally several options to accomplish it easily: using a staging table, using a SQL command, using
MSSQL, as well as using the cache.
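
For instance, the staging-table option can be sketched with an ANSI-style MERGE (hypothetical names; exact syntax varies slightly between MSSQL and other databases):

-- Update the target from a staging table.
MERGE INTO dim_customer AS t
USING stg_customer AS s
      ON t.customer_id = s.customer_id
WHEN MATCHED THEN
     UPDATE SET t.customer_name = s.customer_name,
                t.region        = s.region
WHEN NOT MATCHED THEN
     INSERT (customer_id, customer_name, region)
     VALUES (s.customer_id, s.customer_name, s.region);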

Q31) How can you define a Workflow?


It is basically a group of instructions that lets the server perform execution-related tasks.

Q32) Tell one basic difference between a Connected Lookup and an Unconnected one?
A connected lookup receives input directly from the mapping pipeline, while an unconnected lookup is
used only where its function is explicitly called. Several values can be returned from a connected
lookup, while an unconnected lookup has a strict upper limit of one return value.

Q33) Tell something about the Data Source View and how it is significant?
Several Analysis Services databases largely depend on a relational schema, and the prime task of the
data source view is to define that schema. Data source views are also helpful in creating cubes and
dimensions, with the help of which users can set up dimensions in a very easy manner.

Q34) What are objects in the Schema?


These are basically considered the logical structures related to the data. They generally contain tables,
views, synonyms and clusters, as well as function packages. In addition, there are several database
links present in them.

Q35) What are Cubes in ETL and how are they different from OLAP?
Cubes are among the units on which data processing largely depends; they provide useful information
regarding the dimensions and fact tables, and multi-dimensional analysis can be assured through
them. On the other side, Online Analytical Processing (OLAP) stores large amounts of data in more
than one dimension, generally to make the reporting process smoother and more reliable. All the facts
in it are categorized, which is exactly what makes them easy to understand.

Q36) Unconnected Vs Connected Lookups


Connected Lookup | Unconnected Lookup
Either a dynamic or a static cache can be used. | Can use only a static cache.
Can return multiple columns from the same row. | Can return only one output port.
Supports user-defined default values. | Does not support user-defined default values.
Can pass any number of values to another transformation. | Can pass one output value to one transformation.
Cache includes all lookup columns used in the mapping. | Cache includes only the lookup/output ports of the lookup condition and the return port.

Q37) Define Bus Schema?


In a data warehouse, the Bus Schema is used for identifying the most common dimensions across
business processes. In one word, it defines the conformed dimensions and a standardized definition of
facts.

Q38) What does data purging mean?


Data purging is a common term used in data warehousing for deleting or erasing data from storage.

Q39) What do Schema Objects mean?


Schema objects can be defined as logical structures; the database stores each schema object logically
within a database tablespace. Schema objects can be tables, clusters or views, sequences or indexes,
function packages and DB links.

Q40) Can you brief the terms Mapplet, Session, Workflow and Worklet?
 Mapplet : a reusable object that contains a set of transformations.
 Worklet : represents a set of workflow tasks.
 Workflow : a set of instructions that tells the server how to execute the tasks.
 Session : a set of instructions that describes how to move the data to the target.

Question 2 : Explain the concepts of Extraction, Transformation and Loading.


Answer :
Extraction :
Take data from an external source and move it to the warehouse pre-processor database.

Transformation :
The transform data task allows point-to-point generation, modification and transformation of data.

Loading :
The load data task adds records to a database table in the warehouse.

Question 3 : What is the difference between Manual Testing and ETL Testing? (90% asked ETL
Testing Interview Questions)
Answer :
1. The main difference is that manual testing is concerned with the functionality of the program,
while ETL testing is concerned with the databases and their data counts.

2. ETL testing is an automated testing process where you don't need any technical knowledge
other than the software. It is also much faster and more systematic, and it assures the top
results needed by businesses.

3. Manual testing is highly time-consuming, and you need technical knowledge to write the test
cases and the scripts. It is slow, needs effort, and is highly prone to errors.
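
A typical ETL test automates checks such as a source-to-target row count reconciliation. A minimal sketch, assuming hypothetical tables src.orders and dw.orders_fact:

-- Row count reconciliation between source and target.
SELECT (SELECT COUNT(*) FROM src.orders)     AS source_count,
       (SELECT COUNT(*) FROM dw.orders_fact) AS target_count;

-- A difference of zero means the load is complete at the row-count level.
SELECT (SELECT COUNT(*) FROM src.orders)
     - (SELECT COUNT(*) FROM dw.orders_fact) AS row_count_diff;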

Question 4 : Explain the need for ETL Testing. (100% asked ETL Testing Interview Questions)
Answer :
Nowadays we are migrating tons of systems from old technology to new technology. During migration
activities, the user also needs to migrate the data from the old DBMS to the latest DBMS, so there is a
huge need to test that the data is correct on the target side. The following bullet points explain the
necessity of ETL testing:

 To keep a check on the data being transferred from one system (the old system) to the other
(the new system).
 To keep track of the efficiency and speed of the process.
 To be well acquainted with the ETL process before it gets implemented into your business
and production.
Question 5 : Where can a user apply ETL concepts? Give some examples.
Answer :
 Before ETL tools, users needed to write long code for data transformation and data loading.
 ETL makes life simple: one tool manages all the scenarios of transformation and loading of
the data.
 The following are examples of where ETL is used:
Example 1 : Data warehousing :
ETL is used in data warehousing concepts. The user needs to fetch data from multiple
heterogeneous systems and load it into the data warehouse database. The ETL concept is mainly
used here to extract the data from the source, transform the data and load it into the target systems.

Example 2 : Data migrations :

Data migrations take a lot of effort if you do them with PL/SQL or T-SQL development. If you
want to migrate the data in a simple way, use one of the various ETL tools.

Example 3 : Mergers and Acquisitions :

Nowadays a lot of companies are merging into different MNCs. ETL concepts are used to move
the data from one company to another.

Question 6 : Explain how ETL is used in third-party data management. (100% asked ETL
Testing Interview Questions)
Answer :
Big organizations always give different application development work to different vendors; no
single vendor manages everything. Take the example of a telecommunication project where billing
is managed by one company and CRM by another. If the CRM company needs some data from the
company managing the billing, it will receive a data feed from that company. The ETL process is
used to load the data from the feed.

Question 7 : Explain how ETL is used in Data warehousing?


Answer :
The most common example of ETL is its use in data warehousing. The user needs to fetch
historical data as well as current data for developing the data warehouse; data warehouse data is
nothing but a combination of historical and transactional data, and its data sources may differ. The
user needs to fetch data from multiple heterogeneous systems and load it into a single target
system, which is also called the data warehouse.

As the ETL definition suggests, ETL is nothing but Extract, Transform and Load of the data, and
this process is used widely in data warehousing. A simple example is managing sales data in a
shopping mall: if the user wants the historical as well as current data of the shopping mall, the first
step is always to follow the ETL process. Then that data is used for reporting purposes.

Question 8 : Explain the difference between ETL and BI tools?


Answer :
An ETL tool is used to extract data from different data sources, transform the data, and load it
into a DW system. In contrast, a BI tool is used to generate interactive and ad-hoc reports for
end-users, dashboards for senior management, and data visualizations for monthly, quarterly, and
annual board meetings.

Most common ETL tools include: SAP BO Data Services (BODS), Informatica, Microsoft SSIS,
Oracle Data Integrator (ODI), Talend Open Studio, CloverETL (open source), etc.

Most common BI tools include: SAP Business Objects, SAP Lumira, IBM Cognos, JasperSoft,
Microsoft BI Platform, Tableau, Oracle Business Intelligence Enterprise Edition, etc.
Question 9 : What is the difference between ETL Testing and Database Testing? (80%
asked ETL Testing Interview Questions)
Answer :
The following are the differences between ETL testing and database testing:

ETL Testing | DB Testing
Used for Business Intelligence reporting | Goal is to integrate data
Applied to data warehouse environments based on historical data | Applicable to business flow systems
Informatica, Cognos and QuerySurge can be used | QTP and Selenium tools are used for automation
Analysing data may have a potential impact | Architectural implementation involves a high impact
Dimensional model | Entity relationship model
Analytics are processed | Transactions are processed
Denormalized data is used | Normalized data is used

Question 10 : What are different characteristics of a Data Warehouse? (100% asked ETL
Testing Interview Questions)
Answer :
1. A data warehouse is a database, separate from the operational database, which also stores
historical information.
2. The data warehouse database contains transactional as well as analytical data.
3. The data warehouse helps higher management take strategic as well as tactical decisions
using historical or current data.
4. The data warehouse supports consolidated historical data analysis.
5. The data warehouse helps business users see the current trends to run the business.
6. The data warehouse is used for reporting and data analysis purposes.
Question 11 : What are the different types of Data Warehouse systems?
Answer :
1. Data Mart
2. Online Analytical Processing (OLAP)
3. Online Transactional Processing (OLTP)
4. Predictive Analysis
Question 12 : What are the different steps of the ETL testing process? (100% asked ETL Testing
Interview Questions)
Answer :
The following steps are included in ETL testing:

Step 1 : Analyzing the requirement:

Understanding the business structure and its particular requirements.
Step 2 : Validation and test estimation:
An estimation of the time and expertise required to carry out the procedure.

Step 3 : Test planning and designing the testing environment:

Based on the inputs from the estimation, an ETL test environment is planned and worked out.
Step 4 : Test data preparation and execution:
Data for the test is prepared and the tests are executed as per the requirement.
Step 5 : Summary report:
Upon completion of the test run, a brief summary report is prepared for drawing improvements
and conclusions.

These are the different steps in the ETL testing process.

Question 13 : How is ETL used in data migration projects? Explain with an example. (60%
asked ETL Testing Interview Questions)
Answer :
ETL tools are widely used in data migration projects. If an organization previously managed its
data in Oracle 10g and now wants to move to a SQL Server cloud database, the data needs to be
migrated from source to target. ETL tools are very useful for this kind of migration: writing the
ETL code by hand is a very time-consuming process, and the coding in an ETL tool is simple
compared to PL/SQL or T-SQL code. So the ETL process is very useful in data migration projects.

Question 14 : Explain the multiple steps to choose an ETL tool. (90% asked ETL Testing
Interview Questions)
Answer :
Choosing an ETL tool is a very difficult task. You need to consider a lot of factors while choosing
the correct ETL tool for the project. Choosing the ETL tool for a specific project is a very strategic
move, even if you need it for a small project, and ETL tool migrations are no small effort. In this
section I would like to give you some bullet points to consider while choosing your ETL tool.

1. Data Connectivity :
The ETL tool should be able to communicate with any source of data, no matter where it comes
from. This is very critical.

2. Performance :
Moving and changing data requires some serious processing power, so you need to check the
performance factors.

3. Transformation Flexibility :
Matching, merging and changing the data is very critical. The ETL tool should provide these and
many more transformation packages which allow modifications to the data in the transformation
phase with simple drag and drop.

4. Data Quality :
Your data is not clean. The only way to leverage your data is when it is consistent and clean.

5. Flexible Data Acquisition Options :

Once the ETL is ready, you need to check that the ETL will work on previous data as well as
newly arriving data.

6. Committed ETL Vendor :

You are playing with the organization's data while doing the ETL process, so choose a vendor
who is well known in the industry and whose support is really great.
Question 16 : Name some important ETL bugs. (70% asked ETL Testing Interview
Questions)
Answer :
The following are popular ETL bugs:

1. Source bugs

2. Calculation bugs

3. ECP (Equivalence Class Partitioning) related bugs

4. Load condition bugs

5. User interface bugs

Question 17 : What is an Operational Data Store?


Answer :
 ODS stands for Operational Data Store.
 The ODS sits between the staging area and the data warehouse. The data in the ODS is at a
low level of granularity.
 Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.

Question 18 : Explain the importance of the mapping sheet. Who is responsible for creating the
mapping sheet?
Answer :
The ETL mapping sheet contains all the necessary information about the source files and stores
the details in rows and columns. The mapping sheet helps experts write the SQL queries that speed
up the testing process.
The database designer is responsible for creating the mapping sheet.

Question 19 : What is a fact and what are its types? (100% asked ETL Testing Interview
Questions)
Answer :
A fact is a central component of a multi-dimensional model which contains the measures to be
analyzed. Facts are related to dimensions.

The types of facts are:


Additive : a measure that can participate in arithmetic calculations using all or any dimensions.
Ex: sales profit

Semi-additive : a measure that can participate in arithmetic calculations using only some dimensions.
Ex: sales amount

Non-additive : a measure that cannot participate in arithmetic calculations using dimensions.


Ex: temperature

Question 20 : Explain the data extraction phase in ETL, with its types. (90% asked ETL Testing
Interview Questions)
Answer :
Data extraction is nothing but extracting the data from multiple heterogeneous sources using
ETL tools.
There are 2 types of data extraction, sketched below:

1. Full extraction : all the data from the source or operational systems gets extracted to the
staging area (initial load).
2. Partial extraction : sometimes we get a notification from the source system to update specific
data; this is called a delta load.
Source system performance : the extraction strategies should not affect source system
performance.
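
Both extraction types can be sketched in SQL as below (hypothetical table src.customers; the last_updated change-tracking column is an assumption about the source system):

-- Full extraction: pull everything from the source (initial load).
SELECT * FROM src.customers;

-- Partial (delta) extraction: pull only rows changed since the last run.
SELECT *
FROM   src.customers
WHERE  last_updated > TIMESTAMP '2024-01-31 00:00:00';  -- last extract time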
Question 21 : What is a Dimension? Explain with examples.
Answer :
A dimension table is a table which describes the business entities of an enterprise, i.e. the objects
in a fact table. A dimension table has a primary key which uniquely identifies each dimension
row. The dimension table is sometimes called a lookup or reference table. The primary key of the
dimension table is used to associate it with the fact table, which contains the corresponding
foreign key. Dimension tables are normally in de-normalized form, because these tables are only
used to analyse the data and not to execute transactions.

The fields in a dimension table are used to fulfil the following three important requirements:

1. Query constraining
2. Grouping / filtering
3. Report labeling
The following are different examples of dimensions (a table sketch follows below):

1. Time

2. Location

3. Item

4. Branch
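
A small sketch of an Item dimension with a surrogate primary key referenced by a fact table (all names are hypothetical):

-- Denormalized dimension with a surrogate key.
CREATE TABLE dim_item (
    item_key    INT PRIMARY KEY,     -- surrogate key
    item_name   VARCHAR(100),
    brand       VARCHAR(50),
    category    VARCHAR(50)
);

-- Fact table holds the matching foreign key.
CREATE TABLE sales_fact (
    item_key    INT REFERENCES dim_item (item_key),
    date_key    INT,
    amount      DECIMAL(12,2)
);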

Question 22 : Explain data transformation in ETL.


Answer :
 Data extracted from a source system is in raw format. We need to transform it before loading
it into the target server.
 Data has to be cleaned, mapped and transformed.
 There are the following steps in transformation:
1. Selection : select the data to load into the target.
2. Matching : match the data with the target system.
3. Data transforming : change the data as per the target table structures.
Question 23 : What are different examples of data transformation in ETL?
Answer :
The following are examples of data transformation (SQL sketches for items 3 and 4 follow the list):

1. Standardizing data : data is fetched from multiple sources, so it needs to be standardized as
per the target system.
2. Character set conversion : the character sets need to be transformed as per the target systems
(first name and last name example).
3. Calculated and derived values : the source system has a first value and a second value, and in
the target we need a calculation over the two.
4. Data conversion into different formats : if in the source system the date is in DDMMYY format
and in the target the date is in DDMONYYYY format, this conversion needs to be done in the
transformation phase.
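
Examples 3 and 4 could be sketched in SQL as follows (hypothetical column names; TO_DATE and TO_CHAR are Oracle-style functions):

-- Calculated/derived value: target needs first_val + second_val.
SELECT first_val + second_val AS total_val
FROM   src.measures;

-- Date format conversion: DDMMYY in source, DDMONYYYY in target.
SELECT TO_CHAR(TO_DATE(src_date, 'DDMMYY'), 'DDMONYYYY') AS target_date
FROM   src.events;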
Question 24 : Explain partitioning in ETL?
Answer :
Transactions always need to be divided for better performance, and this process is known as
partitioning. It ensures that the server can directly access the sources through multiple
connections.

Question 25 : What is data loading? Explain its types.


Answer :
The data loading phase loads the prepared data from the staging tables into the main tables.

There are the following types of data loading (a sketch follows below):

 Initial load : populating all the data from the source system and loading it into the data
warehouse tables for the first time.
 Incremental load : applying ongoing changes periodically, as necessary.
 Full refresh : completely erasing the data from one or more tables and reloading fresh data.
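
A minimal sketch of the three load types (hypothetical staging and warehouse table names):

-- Initial load: copy everything from staging into the warehouse table.
INSERT INTO dw.customers_dim
SELECT * FROM stg.customers;

-- Incremental load: apply only the delta, assuming staging holds just
-- the rows that changed since the last run.
INSERT INTO dw.customers_dim
SELECT * FROM stg.customers_delta;

-- Full refresh: erase the table and reload fresh data.
TRUNCATE TABLE dw.customers_dim;
INSERT INTO dw.customers_dim
SELECT * FROM stg.customers;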
Question 26 : What are different types of ETL tools?
Answer :
The following are types of ETL tools:

1. Enterprise ETL tools :

Informatica

IBM DataStage

Ab Initio

MS SQL Server Integration Services

CloverETL

2. Open source ETL tools :

1. Pentaho

2. Kettle
