FAQ for ETL TESTING
Data arrives from the source system partly from tables and partly from flat files. The files are
moved from the source system to the landing area via FTP. The landing area has three layers: the
Root layer, the Reject layer and the Archive layer.
The files land in the root folder, and the ETL mechanism extracts the data from there. Before
extracting, the ETL mechanism performs pre-validations such as checking the file formats, file
naming conventions, delimiters, data formats, headers and footers. If all validations pass, the
ETL mechanism extracts the data and loads it into the staging area. In the staging area the
transformations are applied to bring the data into the required business format/logic/rules, and
then the data is loaded into the data warehouse database. The reports are prepared from there.
After this process completes, the file is removed from the root folder and moved to the archive
folder, where it is kept for up to 30 days.
If the file format is not correct, the ETL mechanism does not extract the data, the job fails
and the file is moved to the reject folder.
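The landing-area flow above can be sketched roughly as follows. This is a minimal illustration only: the file naming rule, the header line, the "TRAILER|<count>" footer and the function names are all assumptions made for the example, and the actual load into staging is left out.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical conventions, assumed only for this illustration.
EXPECTED_HEADER = "id|name|amount"
DELIMITER = "|"


def is_valid(path: Path) -> bool:
    """Pre-validations: file name, header, footer record count, delimiter."""
    if not (path.name.startswith("sales_") and path.suffix == ".txt"):
        return False
    lines = path.read_text().splitlines()
    if len(lines) < 2 or lines[0] != EXPECTED_HEADER:
        return False
    body, footer = lines[1:-1], lines[-1]
    # Assumed footer format: TRAILER|<number of data rows>
    if footer != f"TRAILER{DELIMITER}{len(body)}":
        return False
    columns = len(EXPECTED_HEADER.split(DELIMITER))
    return all(len(row.split(DELIMITER)) == columns for row in body)


def process(path: Path, archive: Path, reject: Path) -> str:
    """Move a valid file to archive (the load step is omitted) or to reject."""
    target = archive if is_valid(path) else reject
    shutil.move(str(path), str(target / path.name))
    return target.name


def demo() -> list:
    """Create root/archive/reject folders and route one good and one bad file."""
    base = Path(tempfile.mkdtemp())
    root, archive, reject = base / "root", base / "archive", base / "reject"
    for folder in (root, archive, reject):
        folder.mkdir()
    good = root / "sales_20240101.txt"
    good.write_text("id|name|amount\n1|A|10\n2|B|20\nTRAILER|2\n")
    bad = root / "sales_20240102.txt"
    bad.write_text("id,name,amount\n1,A,10\nTRAILER,1\n")  # wrong delimiter
    return [process(good, archive, reject), process(bad, archive, reject)]
```

Calling demo() routes the well-formed file to the archive folder and the comma-delimited file to the reject folder.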
• Tell me some basic validations which you have done in your last project.
• I check the table structure and data types.
• I check the source and target record counts.
• I check for duplicate records in the target table.
• I compare the source and target column mapping.
• Apart from these, what other validations have you done?
• I have validated surrogate key values.
• Validated lookup table data.
• Validated whether fact values are calculated correctly as per the requirement.
• I have validated initial and incremental load data.
• What are the conditions for the column mapping (minus query)?
• Both queries should return the same number of columns.
• The corresponding columns should have the same (or compatible) data types.
• The columns should be in the same order.
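As a rough sketch of the record count and duplicate checks listed above, the following uses an in-memory SQLite database. The table names src_customer and tgt_customer and their columns are assumptions made up for the example.

```python
import sqlite3


def basic_checks(conn):
    """Return (source_count, target_count, duplicate_keys) for the demo tables."""
    src = conn.execute("SELECT COUNT(*) FROM src_customer").fetchone()[0]
    tgt = conn.execute("SELECT COUNT(*) FROM tgt_customer").fetchone()[0]
    # Any key that appears more than once in the target is a duplicate.
    dups = conn.execute(
        "SELECT cust_id, COUNT(*) FROM tgt_customer "
        "GROUP BY cust_id HAVING COUNT(*) > 1"
    ).fetchall()
    return src, tgt, dups


# Hypothetical source and target tables with a deliberately bad target load.
count_conn = sqlite3.connect(":memory:")
count_conn.executescript("""
    CREATE TABLE src_customer (cust_id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (cust_id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob');
""")
check_result = basic_checks(count_conn)
```

Here the counts match (3 and 3) even though the target is wrong, which is why the duplicate check is run in addition to the count check.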
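A minimal sketch of a minus query that respects these three conditions is shown below. It runs on SQLite, which spells the operator EXCEPT; in Oracle the same query would use MINUS. The tables src_emp and tgt_emp are hypothetical.

```python
import sqlite3

minus_conn = sqlite3.connect(":memory:")
minus_conn.executescript("""
    CREATE TABLE src_emp (emp_id INTEGER, name TEXT, sal INTEGER);
    CREATE TABLE tgt_emp (emp_id INTEGER, name TEXT, sal INTEGER);
    INSERT INTO src_emp VALUES (1, 'Ann', 100), (2, 'Bob', 200), (3, 'Cid', 300);
    INSERT INTO tgt_emp VALUES (1, 'Ann', 100), (2, 'Bob', 250);
""")

# Both SELECTs list the same number of columns, with the same types,
# in the same order. Rows returned exist in source but not in target.
missing_in_target = minus_conn.execute("""
    SELECT emp_id, name, sal FROM src_emp
    EXCEPT
    SELECT emp_id, name, sal FROM tgt_emp
    ORDER BY emp_id
""").fetchall()
```

The result contains the row whose salary was loaded incorrectly and the row that was never loaded. Running the query in the opposite direction (target minus source) would show rows that exist only in the target.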
• What is the difference between delete and truncate?
Delete:
• Deletes data from the table; the table structure remains the same.
• The deleted data can be rolled back.
• Specific rows can be deleted by using a WHERE clause.
• Deletes the data row by row.
• Low performance.
• Does not release the storage space allocated to the table.
Truncate:
• Deletes the data from the table; the table structure remains the same.
• In Oracle the data cannot be rolled back, because TRUNCATE is a DDL statement and commits
implicitly.
• Cannot delete specific rows, because the TRUNCATE command does not support a WHERE
clause.
• Deletes all the data at once.
• High performance.
• Releases the storage space allocated to the table.
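The DELETE behaviour described above (selective delete with WHERE, followed by a rollback) can be sketched as follows. SQLite has no TRUNCATE statement, so only the DELETE side is shown; the emp table is an assumption for the example.

```python
import sqlite3

del_conn = sqlite3.connect(":memory:")
del_conn.execute("CREATE TABLE emp (emp_id INTEGER, dept TEXT)")
del_conn.executemany(
    "INSERT INTO emp VALUES (?, ?)", [(1, "HR"), (2, "IT"), (3, "IT")]
)
del_conn.commit()

# DELETE accepts a WHERE clause, so only the IT rows are removed.
del_conn.execute("DELETE FROM emp WHERE dept = 'IT'")
after_delete = del_conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

# The delete has not been committed yet, so it can still be rolled back.
del_conn.rollback()
after_rollback = del_conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

After the rollback the table again holds all three rows, and its structure was never affected.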
• What is the difference between a primary key and a surrogate key?
Primary key:
• A PK is used to maintain unique records in the OLTP database.
• PK values are entered by the user.
• PK values can be alphabetic or numeric.
• PK values belong to the business data/table data.
• PK values need not follow any sequence.
Surrogate key:
• An SK is used to maintain unique records in the data warehouse database.
• SK values are generated by the ETL mechanism/system.
• SK values are numeric only.
• SK values do not belong to the business data/table data.
• SK values are generated in sequential order.
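To illustrate the contrast, here is a small sketch in which the system generates a sequential numeric surrogate key while the business key stays alphanumeric and unordered. The dim_customer table and the key values are made up for the example, and SQLite's AUTOINCREMENT stands in for whatever sequence generator the ETL tool actually uses.

```python
import sqlite3

sk_conn = sqlite3.connect(":memory:")
sk_conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id TEXT NOT NULL                       -- business key from OLTP
    )
""")

# Business keys arrive in no particular order; the SK is assigned by the system.
for business_key in ["CUST-Z9", "CUST-A1", "CUST-M5"]:
    sk_conn.execute(
        "INSERT INTO dim_customer (customer_id) VALUES (?)", (business_key,)
    )

sk_rows = sk_conn.execute(
    "SELECT customer_sk, customer_id FROM dim_customer ORDER BY customer_sk"
).fetchall()
```

A typical surrogate key validation then checks that the generated values are unique, not NULL and follow the expected sequence.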
Procedure:
• A procedure may or may not return a value.
• A procedure cannot be called from a SQL statement.
• A procedure is normally used to execute business logic.
• A procedure can return more than one value.
Note: stored procedures can improve the performance of the application by reducing network
traffic and reducing the CPU load.
• What are the primary key and unique key constraints?
Unique: The unique constraint uniquely identifies each record in a database table.
Primary key: The primary key constraint uniquely identifies each record in a database table.
A primary key must contain unique values.
A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only one primary key.
Note:
• The unique and primary key constraints both guarantee uniqueness for a
column or set of columns.
• A primary key constraint automatically has a unique and NOT NULL constraint defined
on it.
• You can have many unique constraints per table, but only one primary key
constraint per table.
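The points above can be tried out with a small sketch: one primary key, two unique constraints, and three inserts that should each be rejected. The emp table is hypothetical. One assumption to note: SQLite, used here for convenience, does not imply NOT NULL on every primary key the way Oracle does, so NOT NULL is written out explicitly.

```python
import sqlite3

pk_conn = sqlite3.connect(":memory:")
pk_conn.execute("""
    CREATE TABLE emp (
        emp_code TEXT NOT NULL PRIMARY KEY,  -- only one primary key per table
        email    TEXT UNIQUE,                -- several unique constraints allowed
        phone    TEXT UNIQUE
    )
""")
pk_conn.execute("INSERT INTO emp VALUES ('E1', 'a@x.com', '111')")


def rejected(sql: str) -> bool:
    """Return True if the statement violates a constraint."""
    try:
        pk_conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_pk = rejected("INSERT INTO emp VALUES ('E1', 'b@x.com', '222')")
dup_unique = rejected("INSERT INTO emp VALUES ('E2', 'a@x.com', '333')")
null_pk = rejected("INSERT INTO emp VALUES (NULL, 'c@x.com', '444')")
```

All three statements fail: a duplicate primary key, a duplicate value in a unique column, and a NULL primary key.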
• What is a join?
A SQL join is a query that combines rows from two or more tables, based on a relationship
between certain columns in the tables.
Join condition:
• Most join queries contain at least one join condition, either in the FROM clause or in
the WHERE clause.
• The join condition compares two columns, each from a different table.
• What is a cross join?
A cross join returns the Cartesian product of the rows from the tables in the join. In other
words, it produces rows which combine each row from the first table with each row from the
second table.
Syntax: select * from emp, dept;
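A quick way to see the Cartesian product is to count the rows. In this sketch emp has 3 rows and dept has 2 rows (sample data assumed for the example), so the cross join should return 3 x 2 = 6 rows.

```python
import sqlite3

cj_conn = sqlite3.connect(":memory:")
cj_conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER, name TEXT);
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    INSERT INTO emp VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO dept VALUES (10, 'HR'), (20, 'IT');
""")

# Comma syntax, as in the notes above.
comma_count = cj_conn.execute("SELECT COUNT(*) FROM emp, dept").fetchone()[0]
# Equivalent explicit syntax.
explicit_count = cj_conn.execute(
    "SELECT COUNT(*) FROM emp CROSS JOIN dept"
).fetchone()[0]
```

Both forms return the same number of rows, since neither has a join condition.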
UNION ALL is faster than UNION, because UNION's duplicate elimination requires a sorting
operation, which takes time.
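The difference in output can be seen with a small sketch; the tables city_a and city_b are assumptions for the example. It shows only the result difference, not the performance difference.

```python
import sqlite3

un_conn = sqlite3.connect(":memory:")
un_conn.executescript("""
    CREATE TABLE city_a (city TEXT);
    CREATE TABLE city_b (city TEXT);
    INSERT INTO city_a VALUES ('Pune'), ('Delhi');
    INSERT INTO city_b VALUES ('Delhi'), ('Goa');
""")

# UNION ALL keeps every row, including the repeated 'Delhi'.
union_all_rows = [r[0] for r in un_conn.execute(
    "SELECT city FROM city_a UNION ALL SELECT city FROM city_b ORDER BY city"
)]
# UNION removes duplicates, which costs extra work.
union_rows = [r[0] for r in un_conn.execute(
    "SELECT city FROM city_a UNION SELECT city FROM city_b ORDER BY city"
)]
```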
Wildcards:
% matches any string of any length (including zero characters).
_ matches exactly one character.
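Both wildcards are used with the LIKE operator. A short sketch with assumed sample names:

```python
import sqlite3

wc_conn = sqlite3.connect(":memory:")
wc_conn.executescript("""
    CREATE TABLE emp (name TEXT);
    INSERT INTO emp VALUES ('Sam'), ('Sara'), ('Tom'), ('Tim');
""")

# % : 'Sa' followed by any number of characters.
starts_with_sa = [r[0] for r in wc_conn.execute(
    "SELECT name FROM emp WHERE name LIKE 'Sa%' ORDER BY name"
)]
# _ : 'T', exactly one character, then 'm'.
t_any_m = [r[0] for r in wc_conn.execute(
    "SELECT name FROM emp WHERE name LIKE 'T_m' ORDER BY name"
)]
```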
DENSE RANK: DENSE_RANK works like RANK and also generates rank values, but it does not skip
rank values after a tie.
NULL values are treated as equal to each other for ranking, so they all receive the same rank.
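The difference between the two functions shows up only when there is a tie. The sketch below uses made-up salaries with one tie at the top; it assumes a SQLite build with window function support (version 3.25 or later).

```python
import sqlite3

rk_conn = sqlite3.connect(":memory:")
rk_conn.executescript("""
    CREATE TABLE emp (sal INTEGER);
    INSERT INTO emp VALUES (300), (300), (200), (100);
""")

# Columns: salary, RANK, DENSE_RANK (highest salary ranked first).
rank_rows = rk_conn.execute("""
    SELECT sal,
           RANK()       OVER (ORDER BY sal DESC),
           DENSE_RANK() OVER (ORDER BY sal DESC)
    FROM emp
    ORDER BY sal DESC
""").fetchall()
```

After the tie on 300, RANK jumps to 3 while DENSE_RANK continues with 2.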
• What is an inline query?
An inline query (inline view) is an inner query written in the FROM clause of a SQL statement.
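As a minimal sketch, the inner query below sits in the FROM clause and the outer query filters its result as if it were a table. The emp table and the 150 threshold are assumptions for the example.

```python
import sqlite3

iv_conn = sqlite3.connect(":memory:")
iv_conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER, dept TEXT, sal INTEGER);
    INSERT INTO emp VALUES (1, 'HR', 100), (2, 'IT', 200), (3, 'IT', 300);
""")

# The parenthesised SELECT in the FROM clause is the inline query.
inline_rows = iv_conn.execute("""
    SELECT dept, avg_sal
    FROM (SELECT dept, AVG(sal) AS avg_sal FROM emp GROUP BY dept) AS dept_avg
    WHERE avg_sal > 150
    ORDER BY dept
""").fetchall()
```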
• What is a sub query?
• A sub query (also called an inner query or nested query) is a query within a query. A sub
query is usually added in the WHERE clause of the SQL statement. Most of the time, a sub
query is used when you know how to search for a value using a SELECT statement but don’t
know the exact value.
• Sub queries are an alternative way of returning data from multiple tables.
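A typical case is finding the highest paid employee: the maximum salary is not known in advance, but a SELECT can find it. The emp table below is sample data assumed for the sketch.

```python
import sqlite3

sq_conn = sqlite3.connect(":memory:")
sq_conn.executescript("""
    CREATE TABLE emp (name TEXT, sal INTEGER);
    INSERT INTO emp VALUES ('Ann', 100), ('Bob', 200), ('Cid', 300);
""")

# The sub query in the WHERE clause supplies the value to compare against.
top_earner = sq_conn.execute(
    "SELECT name FROM emp WHERE sal = (SELECT MAX(sal) FROM emp)"
).fetchall()
```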
• What is a view?
A view is a virtual table which provides access to a subset of columns from one or more tables.
A view can derive its data from one or more tables; the output of a query can be exposed as a
view. A view acts like a table, but it does not physically store any data. A view is a good way
to present data to another user instead of letting that user access the table directly. A view
in Oracle is essentially a stored SQL query. The view itself contains no data.
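The following sketch shows that a view stores only the query: a row inserted into the base table after the view was created is still visible through the view. The emp table and the emp_public view name are assumptions for the example.

```python
import sqlite3

vw_conn = sqlite3.connect(":memory:")
vw_conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER, name TEXT, sal INTEGER);
    INSERT INTO emp VALUES (1, 'Ann', 100);
    -- The view exposes only a subset of columns (no salary).
    CREATE VIEW emp_public AS SELECT emp_id, name FROM emp;
    INSERT INTO emp VALUES (2, 'Bob', 200);
""")

view_rows = vw_conn.execute(
    "SELECT * FROM emp_public ORDER BY emp_id"
).fetchall()
```

The view returns both rows but hides the sal column, which is how a view can give another user restricted access to a table.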