
IMPORTANT QUERIES - PART 1

Database testing basically includes the following:

1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions
For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have a good idea of the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.
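For example, a simple data-integrity check is to look for orphaned child rows. The table and column names below are purely illustrative, not from any particular schema:

SQL> select c.* from child_table c
     where not exists (select 1 from parent_table p where p.id = c.parent_id);

Any rows returned are child records whose parent is missing, i.e. a referential-integrity violation.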
What do we normally check for in Database Testing?
In DB testing we need to check for:
1. The field size validation
2. Check constraints
3. Whether indexes are created or not (for performance-related issues)
4. Stored procedures
5. Whether the field size defined in the application matches the one in the DB
How to Test a database Manually? Explain with an example.
Observe whether operations performed on the front end are reflected on the back end.
The approach is as follows:
While adding a record through the front end, check on the back end whether the record was actually added. Do the same for delete, update, and so on. Example: enter an employee record in the database through the front end and check manually whether the record was added to the back end.
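A minimal sketch of such a back-end check (the table name and the entered values are hypothetical):

SQL> select * from employees where emp_name = 'John' and hire_date = '01-JAN-23';

If the row just entered on the front end is returned, the insert reached the database.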
How to check whether a trigger is fired or not while doing database testing?
It can be verified by querying the common audit log, where the fired triggers are recorded.
Is "a fast database retrieval rate" a testable requirement?
No, because the requirement is ambiguous. The SRS should clearly state the performance or transaction requirement, e.g. "a DB retrieval rate of 5 microseconds".
How to test a SQL query in WinRunner without using database checkpoints?
By writing a scripted procedure in TSL we can connect to the database and test the database and queries.
The process is:
1) Connect to the database
db_connect("query1", "DRIVER={drivername};SERVER=server_name;UID=uidname;PWD=password;DBQ=database_name");
2) Execute the query
db_execute_query("query1", "write the query you want to execute", rec_count);
-- add the conditions/validations to be checked here --
3) Disconnect the connection
db_disconnect("query1");
How do you test whether the database is updated when information is added in the front end? Give an example.
It depends on what level of testing you are doing. When you save something from the front end, it obviously has to be stored somewhere in the database.
You will need to find out the relevant tables involved in saving the records.
Map the data from the front end to the tables. Then enter the data from the front end and save it.
Go to the database and fire queries to fetch the same data from the back end.
What are the different stages involved in Database Testing?
Verify field-level data in the database with respect to front-end transactions.
Verify the constraints (primary key, foreign key, ...).
Verify the performance of the procedures.
Verify the triggers (execution of triggers).
Verify the transactions (begin, commit, rollback).
What is database testing and what do we test in database testing?
An1:
Database testing is all about testing joins, views, imports and exports, testing the procedures, checking locks, indexing, etc. It is not about testing the data in the database.
Usually database testing is performed by a DBA.

An2:
Database testing involves some in-depth knowledge of the given application and requires a more defined plan of approach to test the data.
Key issues include:
1) Data integrity
2) Data validity
3) Data manipulation and updates
The tester must be aware of database design concepts and implementation rules.

An3:
Database testing basically includes the following:
1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions
For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have a good idea of the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.
How to test data loading in database testing?
You have to do the following things while you are involved in data load testing:
1. You have to know about the source data (table(s), columns, datatypes and constraints).
2. You have to know about the target data (table(s), columns, datatypes and constraints).
3. You have to check the compatibility of source and target.
4. You have to open the corresponding DTS package in SQL Enterprise Manager and run the DTS package (if you are using SQL Server).
5. Then you should compare the column data of source and target.
6. You have to check the number of rows in source and target.
7. Then you have to update the data in the source and see whether the change is reflected in the target or not.
8. You have to check for junk characters and NULLs.
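A minimal sketch of steps 5 and 6 in plain SQL (source_table and target_table are placeholder names, and their structures are assumed to be identical; MINUS is Oracle syntax, the SQL Server equivalent is EXCEPT):

SQL> select count(*) from source_table;
SQL> select count(*) from target_table;

SQL> select * from source_table
     minus
     select * from target_table;

A non-empty result of the MINUS query lists rows that were loaded incorrectly or not loaded at all.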
Explain Database testing?
An1:
Here database testing means the test engineer should test data integrity, data access, query retrieval, modifications, updates, deletions, etc.

An2:
Database tests are supported via ODBC using the following functions:
SQLOpen, SQLClose, SQLError, SQLRetrieve, SQLRetrieveToFile, SQLExecQuery, SQLGetSchema and SQLRequest.
You can carry out cursor-type operations by incrementing arrays of returned datasets.

All SQL queries are supplied as a string. You can execute stored procedures; for instance, on SQL Server you could use 'Exec MyStoredProcedure', and as long as that stored procedure is registered on the SQL Server database it should execute. However, you cannot interact as much as you may like by supplying, say, in/out variables, but for most cases it will cover your database test requirements.

An3:
Database testing basically includes the following:
1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions

For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have a good idea of the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.

An4:
Database testing generally deals with the following:
a) Checking the integrity of UI data against the database data
b) Checking whether any junk data is displayed in the UI other than what is stored in the database
c) Checking the execution of stored procedures with input values taken from the database tables
d) Checking the data migration
e) Execution of jobs, if any
What is a data driven test?
An1:
A data driven test is used to test with multiple sets of data from a data table; using this we can easily replace the parameters at the same time from different locations.
e.g.: using .xls sheets.

An2:
Re-execution of our test with different input values is called re-testing. To validate project calculations, the test engineer follows a re-testing approach through an automation tool. Re-testing is also called data driven testing. There are 4 types of data driven tests:
1) Dynamic input submission (key driven test): Sometimes a test engineer conducts re-testing with different input values to validate the calculations through dynamic submission. For this input submission, the test engineer uses this function in the TSL script: create_input_dialog ("label");
2) Data driven tests through FLAT FILES (.txt, .doc): Sometimes the test engineer conducts re-testing based on flat file contents. He collects these files from old version databases or from the customer side.
3) Data driven tests from FRONT-END OBJECTS: Sometimes a test engineer creates automation scripts based on front-end object values such as (a) list (b) menu (c) table (d) data window (e) ocx, etc.
4) Data driven tests from EXCEL SHEETS: Sometimes a test engineer follows this type of data driven test to execute the script for multiple inputs. These multiple inputs are kept in excel sheet columns. We have to collect this test data from back-end tables.
How to Test Database Procedures and Triggers?
Before testing database procedures and triggers, the tester should know what the input and output of the procedures/triggers are, then execute the procedures and triggers. If you get the expected result, the test case passes; otherwise it fails.
These requirements should be obtained from the DEVELOPER.
How to test a DTS package created for data insert, update and delete? What should be considered while testing it? What conditions are to be checked if the data is inserted, updated or deleted using a text file?
Data integrity checks should be performed. If the database schema is in third normal form, then that should be maintained. Check to see if any of the constraints have thrown an error. The most important command will have to be the DELETE command; that is where things can go really wrong.
Most of all, maintain a backup of the previous database.
How do you test whether a database is updated when information is entered in the front end?
It depends on your application interface.

1. If your application provides view functionality for the entered data, then you can verify it from the front end only. Most of the time black box test engineers verify the functionality in this way.

2. If your application has only data entry from the front end and there is no view from the front end, then you have to go to the database and run the relevant SQL query.

3. You can also use the database checkpoint function in WinRunner.


What steps does a tester take in testing Stored Procedures?
First the tester should go through the requirement, to understand why the particular stored procedure was written.
Then check whether all the required indexes, joins, updates and deletions are correct by comparing with the tables mentioned in the stored procedure. Also ensure that the stored procedure follows the standard format (comments, updated by, etc.).
Then check the procedure's calling name, calling parameters, and expected responses for different sets of input parameters.
Then run the procedure yourself with database client programs like TOAD, mysql, or Query Analyzer.
Rerun the procedure with different parameters, and check results against expected values.
Finally, automate the tests with WinRunner.
How to use SQL queries in WinRunner/QTP?
In QTP, using the output database checkpoint and the database checkpoint, select the "SQL manual queries" option and enter the SELECT queries to retrieve data from the database, then compare the expected and actual results.
What SQL statements have you used in Database Testing?
The most important statement for database testing is the SELECT statement, which returns data rows from one or more tables that satisfy a given set of criteria.
You may need to use other DML (Data Manipulation Language) statements like INSERT, UPDATE and DELETE to manage your test data.
You may also need to use DDL (Data Definition Language) statements like CREATE TABLE, ALTER TABLE, and DROP TABLE to manage your test tables.
You may also need some other commands to view table structures, column definitions, indexes, constraints and stored procedures.
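A few representative statements of each kind (Oracle syntax; the test_data table is purely illustrative):

SQL> create table test_data (id number(4) primary key, name varchar2(20));
SQL> insert into test_data values (1, 'sample');
SQL> update test_data set name = 'changed' where id = 1;
SQL> select * from test_data where id = 1;
SQL> delete from test_data where id = 1;
SQL> drop table test_data;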
What is the way of writing testcases for database testing?
An1:
You have to do the following for writing database testcases:
1. First of all, you have to understand the functional requirement of the application thoroughly.
2. Then you have to find out the back-end tables used, the joins used between the tables, cursors used (if any), triggers used (if any), stored procedures used (if any), and the input and output parameters used for developing that requirement.
3. After knowing all these things, you have to write the testcases with different input values for checking all the paths of the stored procedure.
One thing: writing testcases for back-end testing is not like functional testing. You have to use white box testing techniques.

An2:
Writing a testcase for the database is just like functional testing:
1. Objective: Write the objective that you would like to test, e.g. to check that a shipment loaded through XML gets inserted for a particular customer.
2. Write the method of input or the action that you perform, e.g. load an XML file with all data which can be added to a customer.
3. Expected: the input should be visible in the database, e.g. the shipment should be loaded successfully for that customer, and it should also be visible in the application.
4. You can write such testcases for any functionality like update, delete, etc.

An3:
At first we need to go through the documents provided.
We need to know what tables and stored procedures are mentioned in the doc.
Then test the functionality of the application.

Simultaneously, start writing the DB testcases with the queries you have used at the back end while testing, the tables and stored procedures you have used in order to get the desired results, and the triggers that were fired. Based on the stored procedure we can know the functionality for a specific piece of the application, so we can write queries related to that. From that we make the DB testcases as well.
What do we normally check for in Database Testing?
An1:
In DB testing we need to check for:
1. The field size validation
2. Check constraints
3. Whether indexes are created or not (for performance-related issues)
4. Stored procedures
5. Whether the field size defined in the application matches the one in the DB

An2:
Database testing involves some in-depth knowledge of the given application and requires a more defined plan of approach to test the data. Key issues include:
1) Data integrity
2) Data validity
3) Data manipulation and updates

The tester must be aware of database design concepts and implementation rules.

What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?

If a query is taking a long time, first run it through EXPLAIN PLAN. The explain plan process stores its data in the PLAN_TABLE.
It gives us the execution plan of the query, e.g. whether the query is using the relevant indexes on the joining columns or whether indexes to support the query are missing.
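A small sketch of generating and viewing an execution plan (the query itself is just an example over the standard emp/dept demo tables):

SQL> explain plan for
     select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno;

SQL> select * from table(dbms_xplan.display);

The second statement formats the rows that EXPLAIN PLAN wrote into the PLAN_TABLE.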
If the joining columns don't have an index, the query will do a full table scan; a full table scan makes the cost higher, so we create indexes on the joining columns and run the query again, which should give better performance. We also need to analyze the tables if they were last analyzed long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster, for example:
ANALYZE TABLE employees COMPUTE STATISTICS;
If there is still a performance issue, we use HINTS. A hint is nothing but a clue to the optimizer. We can use hints like:

ALL_ROWS
One of the hints that invokes the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)

FIRST_ROWS
One of the hints that invokes the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)

CHOOSE
One of the hints that invokes the cost-based optimizer.
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.

HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find the corresponding records. Therefore it is not suitable for < or > join conditions.
(/*+ USE_HASH */)

Hints are most useful to optimize query performance.
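For example, a hint is placed right after the SELECT keyword (the query is illustrative, over the standard emp demo table):

SQL> select /*+ FIRST_ROWS */ empno, ename from emp where deptno = 10;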



What is the difference between a sub-query & a co-related sub-query?


A sub-query is executed once for the parent statement, whereas a correlated sub-query is executed once for each row of the parent query.
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade='A' or sal_grade='B');
Co-Related Sub query:
Example:
Find all employees who earn more than the average salary in their department.
SELECT last_name, salary, department_id FROM employees A
WHERE salary > (SELECT AVG(salary)
                FROM employees B
                WHERE B.department_id = A.department_id
                GROUP BY B.department_id);

Sub-query vs. co-related sub-query:

A sub-query is executed once for the parent query, whereas a co-related sub-query is executed once for each row of the parent query.

Sub-query example:
Select * from emp where deptno in (select deptno from dept);

Co-related sub-query example:
Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno group by a.deptno);

View & Materialized View & Inline View


View:

Why Use Views?


• To restrict data access
• To make complex queries easy
• To provide data independence
  A simple view is one that:
– Derives data from only one table
– Contains no functions or groups of data
– Can perform DML operations through the view.

 A complex view is one that:


– Derives data from many tables
– Contains functions or groups of data
– Does not always allow DML operations through the view
 
 
A view has a logical existence, but a materialized view has a physical existence. Moreover, a materialized view can be indexed, analysed and so on; everything that we can do with a table can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule the MV to refresh, but a table can't be scheduled. An MV can be created based on multiple tables.
 
Materialized View:

 
In a DWH, materialized views are essential because on the reporting side, if we do aggregate calculations as per the business requirement, report performance would be degraded. So, to improve report performance, rather than doing the calculations and joins on the reporting side, we put the same logic in the MV and then select the data directly from the MV without any joins and aggregations. We can also schedule the MV (materialized view) refresh.
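A minimal sketch of such an aggregate MV (the aggregation over the standard emp demo table is just an example):

SQL> create materialized view mv_dept_sal
     refresh complete on demand
     as select deptno, sum(sal) total_sal from emp group by deptno;

Reports can then select from mv_dept_sal directly instead of re-aggregating emp on every run.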
Inline view:

If we write a select statement in the FROM clause, that is nothing but an inline view.
 
Ex:
Get dept wise max sal along with empname and emp no.
 
Select a.empname, a.empno, b.sal, b.deptno
From EMP a, (Select max (sal) sal, deptno from EMP group by deptno) b
Where
a.sal=b.sal and

a.deptno=b.deptno

Differences between where clause and having clause

Both the WHERE clause and the HAVING clause can be used to filter data.

Where clause:
– GROUP BY is not mandatory with WHERE.
– WHERE applies to individual rows.
– WHERE is used to restrict rows; it restricts a normal query.
– In the WHERE clause every record is filtered based on the WHERE condition.

Having clause:
– HAVING must be used with GROUP BY.
– HAVING tests a condition on the group rather than on individual rows.
– HAVING is used to restrict groups; it restricts group by functions.
– The HAVING clause works with aggregated records (group by functions).

Order of where and having:


SELECT column, group_function
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups. You use the HAVING clause to restrict groups.
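A short example combining both clauses (using the standard emp demo table):

SQL> select deptno, sum(sal)
     from emp
     where job <> 'CLERK'
     group by deptno
     having sum(sal) > 5000;

WHERE removes the clerk rows before grouping; HAVING then removes any group whose total salary is 5000 or less.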

Difference between Rowid and Rownum?


ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is the file number.

ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a
number indicating the order in which Oracle selects the row from a table or set
of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and
so on.

You can use ROWNUM to limit the number of rows returned by a query, as in this
example:

SELECT * FROM employees WHERE ROWNUM < 10;

Rowid vs. Row-num:

Rowid is an Oracle internal ID that is allocated every time a new record is inserted in a table. This ID is unique and cannot be changed by the user. Rowid is permanent. It is a globally unique identifier for a row in a database, created at the time the row is inserted into the table and destroyed when it is removed from the table.

Row-num is a row number returned by a SELECT statement. Row-num is temporary. The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.

What is the Difference between Delete, Truncate and Drop?


DELETE
The DELETE command is used to remove rows from a table. A WHERE clause can
be used to only remove some rows. If no WHERE condition is specified, all rows
will be removed. After performing a DELETE operation you need to COMMIT or
ROLLBACK the transaction to make the change permanent or to undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back.
As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All of the table's rows,
indexes and privileges will also be removed. The operation cannot be rolled
back.
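The three commands side by side (emp is the usual demo table; these are shown only to contrast the commands, you would not normally run them together):

SQL> delete from emp where deptno = 10;   -- removes only dept 10 rows, can be rolled back
SQL> truncate table emp;                  -- removes all rows, cannot be rolled back
SQL> drop table emp;                      -- removes the table itself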


What is the difference between view and materialized view?

View:
– A view has a logical existence. It does not contain data.
– It is not a physical database object; only its definition is stored.
– We cannot perform DML operations on a view (in general).
– When we do select * from the view, it fetches the data from the base table.
– A view cannot be scheduled to refresh.

Materialized view:
– A materialized view has a physical existence.
– It is a database object.
– We can perform DML operations on a materialized view.
– When we do select * from the materialized view, it fetches the data from the materialized view itself.
– A materialized view can be scheduled to refresh.
– We can keep aggregated data in a materialized view, and it can be created based on multiple tables.

NORMALIZATION and NORMAL FORM'S


Some Oracle databases were modeled according to the rules of normalization, which are intended to eliminate redundancy.
Obviously, applying the rules of normalization requires understanding your relationships and functional dependencies.
First Normal Form:

A row is in first normal form (1NF) if all underlying domains contain atomic
values only.
 Eliminate duplicative columns from the same table.
 Create separate tables for each group of related data and identify each
row with a unique column or set of columns (the primary key).
Second Normal Form:

An entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal Form (1NF) and additionally:
– It has no partial dependency on a composite primary key; no non-key column depends on only part of the primary key.
– All the non-key columns are functionally dependent on the entire primary key.
– A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key.
– 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key. An example is resolving many-to-many relationships using an intersecting entity.
Third Normal Form:

An entity is in Third Normal Form (3NF) when it meets the requirement of being
in Second Normal Form (2NF) and additionally:
 Functional dependencies on non-key fields are eliminated by putting them
in a separate table. At this level, all non-key fields are dependent on the primary
key.
 A row is in third normal form if and only if it is in second normal form and
if attributes that do not contribute to a description of the primary key are moved
into a separate table. An example is creating look-up tables.
Boyce-Codd Normal Form:

Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later
writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if, and
only if, every determinant is a candidate key. Most entities in 3NF are already in
BCNF.
Fourth Normal Form:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of


being in Third Normal Form (3NF) and additionally:
Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship.
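A small sketch of the 2NF step described above, using a hypothetical order_items table whose composite key is (order_id, product_id) and where product_name depends only on product_id. The two versions are shown side by side for illustration only; you would create just one of them:

-- Not in 2NF: product_name depends on only part of the key
SQL> create table order_items (order_id number, product_id number, product_name varchar2(30),
     qty number, primary key (order_id, product_id));

-- In 2NF: the partially dependent column is moved to its own table
SQL> create table products (product_id number primary key, product_name varchar2(30));
SQL> create table order_items (order_id number, product_id number references products,
     qty number, primary key (order_id, product_id));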

SUBQUERIES AND EXISTS


SUBQUERIES

Nesting of queries, one within the other is termed as a subquery.


A statement containing a subquery is called a parent query.
Subqueries are used to retrieve data from tables that depend on the
values in the table itself.

TYPES

Ø  Single row subqueries


Ø  Multi row subqueries
Ø  Multiple subqueries
Ø  Correlated subqueries

SINGLE ROW SUBQUERIES

In single row subquery, it will return one value.

Ex:
     SQL> select * from emp where sal > (select sal from emp where empno = 7566);

     EMPNO  ENAME   JOB        MGR   HIREDATE    SAL  COMM  DEPTNO
     -----  ------  ---------  ----  ---------  ----  ----  ------
      7788  SCOTT   ANALYST    7566  19-APR-87  3000            20
      7839  KING    PRESIDENT        17-NOV-81  5000            10
      7902  FORD    ANALYST    7566  03-DEC-81  3000            20
MULTI ROW SUBQUERIES

A multi row subquery returns more than one value. In such cases we should include operators like ANY, ALL, IN or NOT IN between the comparison operator and the subquery.

Ex:
     SQL> select * from emp where sal > any (select sal from emp where sal between 2500 and 4000);

     EMPNO  ENAME   JOB        MGR   HIREDATE    SAL  COMM  DEPTNO
     -----  ------  ---------  ----  ---------  ----  ----  ------
      7566  JONES   MANAGER    7839  02-APR-81  2975            20
      7788  SCOTT   ANALYST    7566  19-APR-87  3000            20
      7839  KING    PRESIDENT        17-NOV-81  5000            10
      7902  FORD    ANALYST    7566  03-DEC-81  3000            20

     SQL> select * from emp where sal > all (select sal from emp where sal between 2500 and 4000);

     EMPNO  ENAME   JOB        MGR   HIREDATE    SAL  COMM  DEPTNO
     -----  ------  ---------  ----  ---------  ----  ----  ------
      7839  KING    PRESIDENT        17-NOV-81  5000            10

MULTIPLE SUBQUERIES
There is no limit on the number of subqueries included in a where clause. It allows nesting of a query within a subquery.

Ex:
     SQL> select * from emp where sal = (select max(sal) from emp where sal < (select max(sal) from emp));

     EMPNO  ENAME   JOB        MGR   HIREDATE    SAL  COMM  DEPTNO
     -----  ------  ---------  ----  ---------  ----  ----  ------
      7788  SCOTT   ANALYST    7566  19-APR-87  3000            20
      7902  FORD    ANALYST    7566  03-DEC-81  3000            20

CORRELATED SUBQUERIES

A subquery is evaluated once for the entire parent statement, whereas a correlated subquery is evaluated once for every row processed by the parent statement.
Ex:
     SQL> select distinct deptno from emp e where 5 <= (select count(ename) from emp where e.deptno = deptno);

    DEPTNO
    ------
        20
        30

EXISTS

The EXISTS function is a test for existence. It is a logical test for the return of rows from a query.

Ex:
     Suppose we want to display the department numbers which have more than 4 employees.
     SQL> select deptno, count(*) from emp group by deptno having count(*) > 4;

   DEPTNO   COUNT(*)
   ------   --------
       20          5
       30          6

     From the above query, how can we also display the names of those employees?
      SQL> select deptno, ename, count(*) from emp group by deptno, ename having count(*) > 4;

     no rows selected

     The above query returns nothing because the combination of deptno and ename never returns more than one count.
     The solution is to use EXISTS, as follows:

      SQL> select deptno, ename from emp e1 where exists (select * from emp e2
             where e1.deptno = e2.deptno group by e2.deptno having count(e2.ename) > 4)
             order by deptno, ename;
 

    DEPTNO   ENAME
     ---------- ----------
        20            ADAMS
        20            FORD
        20            JONES
        20            SCOTT
        20            SMITH
        30            ALLEN
        30            BLAKE
        30            JAMES
        30            MARTIN
        30            TURNER
        30            WARD

NOT EXISTS

SQL> select deptno, ename from emp e1 where not exists (select * from emp e2
        where e1.deptno = e2.deptno group by e2.deptno having count(e2.ename) > 4)
        order by deptno, ename;

   DEPTNO  ENAME
   ------  ------
       10  CLARK
       10  KING
       10  MILLER

JOINS
The purpose of a join is to combine the data across tables.
A join is actually performed by the where clause which combines the
specified rows of tables.
If a join involves more than two tables, then Oracle first joins two tables based on the join condition, then compares the result with the next table, and so on.

TYPES
       Equi join
       Non-equi join
       Self join
       Natural join
       Cross join
       Outer join
Ø  Left outer
Ø  Right outer
Ø  Full outer
       Inner join
       Using clause
       On clause

Assume that we have the following tables.


SQL> select * from dept;
    DEPTNO DNAME      LOC
     ------ ---------- ----------
        10            mkt        hyd
        20            fin        bang
        30            hr         bombay

SQL> select * from emp;

       EMPNO   ENAME      JOB       MGR     DEPTNO


      ---------- ---------- ---------- ---------- ----------
       111         saketh     analyst           444         10
       222         sudha     clerk                333         20
       333         jagan      manager         111         10
       444         madhu    engineer         222         40

EQUI JOIN

A join which contains an '=' operator in the join condition.

Ex:
     SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno=d.deptno;

          EMPNO     ENAME      JOB    DNAME      LOC


          ---------- ---------- ---------- ---------- ----------
            111           saketh    analyst    mkt        hyd
            333           jagan      manager  mkt        hyd
            222           sudha      clerk        fin        bang

USING CLAUSE

SQL> select empno,ename,job ,dname,loc from emp e join dept d


using(deptno);

          EMPNO     ENAME      JOB    DNAME      LOC


          ---------- ---------- ---------- ---------- ----------
            111           saketh    analyst    mkt        hyd
            333           jagan      manager  mkt        hyd
            222           sudha      clerk        fin        bang
ON CLAUSE

SQL>  select empno,ename,job,dname,loc from emp e join dept d


on(e.deptno=d.deptno);

          EMPNO     ENAME      JOB    DNAME      LOC


          ---------- ---------- ---------- ---------- ----------
            111           saketh    analyst    mkt        hyd
            333           jagan      manager  mkt        hyd
            222           sudha      clerk        fin        bang

NON-EQUI JOIN

A join which contains an operator other than '=' in the join condition.

Ex:
     SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno > d.deptno;

          EMPNO     ENAME    JOB      DNAME      LOC


          ---------- ---------- ---------- ---------- ----------
       222    sudha      clerk          mkt        hyd
       444    madhu     engineer   mkt        hyd
       444    madhu     engineer   fin          bang
       444    madhu     engineer   hr           bombay

SELF JOIN

Joining the table itself is called self join.

Ex:
     SQL> select e1.empno,e2.ename,e1.job,e2.deptno from emp e1,emp
e2 where
             e1.empno=e2.mgr;

     EMPNO     ENAME    JOB      DEPTNO


     ---------- ---------- ---------- ----------
       111          jagan      analyst         10
       222          madhu      clerk           40
       333          sudha      manager      20
       444          saketh     engineer      10
NATURAL JOIN

Natural join compares all the common columns.

Ex:
     SQL> select empno,ename,job,dname,loc from emp natural join dept;

     EMPNO   ENAME      JOB      DNAME    LOC


    ---------- ---------- ---------- ---------- ----------
       111          saketh     analyst     mkt        hyd
       333          jagan      manager   mkt        hyd
       222          sudha      clerk         fin          bang

CROSS JOIN

This gives the cross product.

Ex:
     SQL> select empno,ename,job,dname,loc from emp cross join dept;

 EMPNO  ENAME    JOB        DNAME      LOC


---------- ---------- ---------- ---------- ----------
       111     saketh   analyst      mkt        hyd
       222     sudha    clerk          mkt        hyd
       333     jagan     manager   mkt        hyd
       444     madhu   engineer   mkt        hyd
       111     saketh   analyst      fin          bang
       222     sudha    clerk          fin          bang
       333     jagan     manager   fin          bang
       444     madhu   engineer   fin          bang
       111     saketh   analyst      hr           bombay
       222     sudha    clerk          hr           bombay
       333     jagan     manager   hr           bombay
       444     madhu   engineer   hr           bombay

OUTER JOIN
Outer join gives the non-matching records along with matching records.

LEFT OUTER JOIN

This will display all matching records, plus the records in the left-hand side table that have no match in the right-hand side table.

Ex:
     SQL> select empno,ename,job,dname,loc from emp e left outer join
dept d
             on(e.deptno=d.deptno);
Or
      SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno=d.deptno(+);
    

     EMPNO     ENAME   JOB       DNAME      LOC


     ---------- ---------- ---------- ---------- ----------
       111          saketh    analyst       mkt        hyd
       333          jagan      manager    mkt        hyd
       222          sudha     clerk           fin          bang
       444          madhu    engineer

RIGHT OUTER JOIN

This will display all matching records, plus the records in the right-hand side table that have no match in the left-hand side table.

Ex:
     SQL> select empno,ename,job,dname,loc from emp e right outer join
dept d
              on(e.deptno=d.deptno);
Or
      SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno(+) = d.deptno;

     EMPNO    ENAME     JOB      DNAME      LOC


     ---------- ---------- ---------- ---------- ----------
       111          saketh     analyst      mkt        hyd
       333          jagan       manager   mkt        hyd
       222          sudha      clerk          fin          bang
                                                       hr           bombay

FULL OUTER JOIN

This will display all matching records and the non-matching records from both tables.

Ex:
     SQL> select empno,ename,job,dname,loc from emp e full outer join
dept d
              on(e.deptno=d.deptno);

 EMPNO   ENAME    JOB        DNAME      LOC


---------- ---------- ---------- ---------- ----------
       333     jagan     manager    mkt        hyd
       111     saketh   analyst       mkt        hyd
       222     sudha    clerk           fin        bang
       444     madhu   engineer
                                                   hr         bombay

INNER JOIN

This will display all the records that have matched.

Ex:
     SQL> select empno,ename,job,dname,loc from emp inner join dept
using(deptno);

     EMPNO     ENAME   JOB        DNAME    LOC


     ---------- ---------- ---------- ---------- ----------
       111          saketh     analyst      mkt       hyd
       333          jagan       manager   mkt       hyd
       222          sudha      clerk          fin         bang

SYNONYM AND SEQUENCE


SYNONYM

A synonym is a database object, which is used as an alias for a table,


view or sequence.
TYPES
Ø  Private
Ø  Public
A private synonym is available only to the user who creates it.
A public synonym is created by the DBA and is available to all users.

ADVANTAGES

Ø  Hide the name and owner of the object.


Ø  Provides location transparency for remote objects of a distributed
database.

CREATE AND DROP

SQL> create synonym s1 for emp;


SQL> create public synonym s2 for emp;
SQL> drop synonym s1;
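Once created, a synonym is used exactly like the object it points to. For example, with the public synonym s2 created above:

SQL> select * from s2;   -- same result as select * from emp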

SEQUENCE

A sequence is a database object, which can generate unique, sequential


integer values.
It can be used to automatically generate primary key or unique key
values.
A sequence can be either in an ascending or descending order.

Syntax:
      Create sequence <seq_name> [increment by n] [start with n] [maxvalue n] [minvalue n]
                                 [cycle/nocycle] [cache/nocache];

By default the sequence starts with 1, increments by 1, with a minvalue of 1, and with nocycle and nocache.
The cache option pre-allocates a set of sequence numbers and retains them in memory for faster access.

Ex:
     SQL> create sequence s;
     SQL> create sequence s increment by 10 start with 100 minvalue 5
maxvalue 200 cycle  
             cache 20;

USING SEQUENCE

SQL> create table student(no number(2),name varchar(10));


SQL> insert into student values(s.nextval, ‘saketh’);

Ø  Initially currval is not defined and nextval is starting value.


Ø  After that nextval and currval are always equal.

CREATING ALPHA-NUMERIC SEQUENCE

SQL> create sequence s start with 111234;


SQL> Insert into student values (s.nextval || translate   
         (s.nextval,’1234567890’,’abcdefghij’));

ALTERING SEQUENCE

We can alter the sequence to perform the following.


Ø  Set or eliminate minvalue or maxvalue.
Ø  Change the increment value.
Ø  Change the number of cached sequence numbers.

Ex:
     SQL> alter sequence s minvalue 5;
     SQL> alter sequence s increment by 2;
     SQL> alter sequence s cache 10;

DROPPING SEQUENCE

SQL> drop sequence s;

SET OPERATORS
TYPES

Ø  Union
Ø  Union all
Ø  Intersect
Ø  Minus

UNION

This will combine the records of multiple tables having the same
structure.

Ex:
     SQL> select * from student1 union select * from student2;

UNION ALL

This will combine the records of multiple tables having the same
structure but including duplicates.

Ex:
     SQL> select * from student1 union all select * from student2;

INTERSECT

This will give the common records of multiple tables having the same
structure.

Ex:
     SQL> select * from student1 intersect select * from student2;

MINUS

This will give the records of a table whose records are not in other
tables having the same structure.
Ex:
     SQL> select * from student1 minus select * from student2;

ROLLUP GROUPING CUBE


These are the enhancements to the group by feature.

USING ROLLUP

This will give the salaries in each department in each job category, along with the total salary for individual departments and the total salary of all the departments.

SQL> Select deptno, job, sum(sal) from emp group by rollup(deptno, job);

    DEPTNO  JOB         SUM(SAL)
    ------  ---------   --------
        10  CLERK           1300
        10  MANAGER         2450
        10  PRESIDENT       5000
        10                  8750
        20  ANALYST         6000
        20  CLERK           1900
        20  MANAGER         2975
        20                 10875
        30  CLERK            950
        30  MANAGER         2850
        30  SALESMAN        5600
        30                  9400
                           29025

USING GROUPING

The above query gives the total salary of the individual departments but with a blank in the job column, and gives the total salary of all the departments with blanks in the deptno and job columns.

To replace these blanks with a string of your choice, GROUPING is used:
  SQL> select decode(grouping(deptno),1,'All Depts',deptno), decode(grouping(job),1,'All jobs',job), sum(sal)
       from emp group by rollup(deptno,job);

DECODE(GROUPING(DEPTNO),1,'ALLDEPTS',DEP  DECODE(GR   SUM(SAL)
----------------------------------------  ---------   --------
10                                         CLERK           1300
10                                         MANAGER         2450
10                                         PRESIDENT       5000
10                                         All jobs        8750
20                                         ANALYST         6000
20                                         CLERK           1900
20                                         MANAGER         2975
20                                         All jobs       10875
30                                         CLERK            950
30                                         MANAGER         2850
30                                         SALESMAN        5600
30                                         All jobs        9400
All Depts                                  All jobs       29025

   GROUPING returns 1 if the column specified in the grouping function has been used in the rollup.
   GROUPING is used in association with DECODE.

USING CUBE

This will give the salaries in each department in each job category, the
total salary for individual departments, the total salary of all the
departments and the salaries in each job category.

SQL> select decode(grouping(deptno),1,'All Depts',deptno), decode(grouping(job),1,'All Jobs',job), sum(sal)
     from emp group by cube(deptno,job);

DECODE(GROUPING(DEPTNO),1,'ALLDEPTS',DEP  DECODE(GR   SUM(SAL)
----------------------------------------  ---------   --------
10                                         CLERK           1300
10                                         MANAGER         2450
10                                         PRESIDENT       5000
10                                         All Jobs        8750
20                                         ANALYST         6000
20                                         CLERK           1900
20                                         MANAGER         2975
20                                         All Jobs       10875
30                                         CLERK            950
30                                         MANAGER         2850
30                                         SALESMAN        5600
30                                         All Jobs        9400
All Depts                                  ANALYST         6000
All Depts                                  CLERK           4150
All Depts                                  MANAGER         8275
All Depts                                  PRESIDENT       5000
All Depts                                  SALESMAN        5600
All Depts                                  All Jobs       29025

GROUP BY AND HAVING -Order of Execution


GROUP BY

Using group by, we can create groups of related information.


Columns used in the select list must also appear in the group by clause; otherwise it is not a valid group by expression.

Ex:
     SQL> select deptno, sum(sal) from emp group by deptno;

                        DEPTNO   SUM(SAL)


---------- ----------
        10       8750
        20      10875
        30       9400

     SQL> select deptno,job,sum(sal) from emp group by deptno,job;

                        DEPTNO  JOB         SUM(SAL)


---------- ---------   ----------
        10   CLERK            1300
        10   MANAGER      2450
        10   PRESIDENT   5000
        20   ANALYST       6000
        20   CLERK           1900
        20   MANAGER     2975
        30   CLERK             950
        30   MANAGER      2850
        30   SALESMAN    5600

HAVING

This works like the WHERE clause, but for groups; it is used with GROUP BY because the WHERE clause cannot filter on group functions.
Ex:
     SQL> select deptno,job,sum(sal) tsal from emp group by deptno,job
having sum(sal) > 3000;

              DEPTNO   JOB              TSAL


   ----------  ---------      ----------
        10    PRESIDENT    5000
        20    ANALYST        6000
        30    SALESMAN     5600

SQL> select deptno,job,sum(sal) tsal from emp group by deptno,job


having sum(sal) > 3000 
         order by job;

                   DEPTNO    JOB          TSAL


 ----------  ---------    ----------
          20          ANALYST       6000
                       10        PRESIDENT   5000
     30        SALESMAN    5600

ORDER OF EXECUTION

Ø  Group the rows together based on group by clause.


Ø  Calculate the group functions for each group.
Ø  Choose and eliminate the groups based on the having clause.

Ø  Order the groups based on the specified column.

PARTITIONS (which will help for tuning queries)


A single logical table can be split into a number of physically separate
pieces based on ranges of key values. Each of the parts of the table is
called a partition.
A non-partitioned table can not be partitioned later.

TYPES

Ø  Range partitions
Ø  List partitions
Ø  Hash partitions
Ø  Sub partitions

ADVANTAGES

Ø  Reducing downtime for scheduled maintenance, which allows


maintenance operations to be carried out on selected partitions while
other partitions are available to users.
Ø  Reducing downtime due to data failure, failure of a particular partition
will no way affect other partitions.
Ø  Partition independence allows for concurrent use of the various
partitions for various purposes.

ADVANTAGES OF PARTITIONS BY STORING THEM IN DIFFERENT


TABLESPACES

Ø  Reduces the possibility of data corruption in multiple partitions.


Ø  Back up and recovery of each partition can be done independently.

DISADVANTAGES

Ø  Partitioned tables cannot contain any columns with long or long raw
datatypes, LOB types or object types.

RANGE PARTITIONS

a) Creating range partitioned table


     SQL> Create table student(no number(2),name varchar(2)) partition
by range(no) (partition 
             p1 values less than(10), partition p2 values less than(20),
partition p3 values less     
             than(30),partition p4 values less than(maxvalue));

    ** if you are using maxvalue for the last partition, you can not add a
partition.
b) Inserting records into range partitioned table
     SQL> Insert into student values(1,’a’);          -- this will go to p1
     SQL> Insert into student values(11,’b’);        -- this will go to p2
     SQL> Insert into student values(21,’c’);        -- this will go to p3
     SQL> Insert into student values(31,’d’);        -- this will go to p4
c) Retrieving records from range partitioned table
     SQL> Select *from student;
     SQL> Select *from student partition(p1);
d) Possible operations with range partitions
v  Add
v  Drop
v  Truncate
v  Rename        
v  Split
v  Move
v  Exchange
e) Adding a partition
     SQL> Alter table student add partition p5 values less than(40);
f) Dropping a partition
    SQL> Alter table student drop partition p4;
g) Renaming a partition
     SQL> Alter table student rename partition p3 to p6;
h) Truncate a partition
     SQL> Alter table student truncate partition p6;
i) Splitting a partition
    SQL> Alter table student split partition p2 at(15) into (partition
p21,partition p22);
j) Exchanging a partition
    SQL> Alter table student exchange partition p1 with table student2;
k) Moving a partition
     SQL> Alter table student move partition p21 tablespace saketh_ts;
LIST PARTITIONS

a) Creating list partitioned table


     SQL> Create table student(no number(2),name varchar(2)) partition
by list(no) (partition p1     
            values(1,2,3,4,5), partition p2 values(6,7,8,9,10),partition p3
values(11,12,13,14,15),
            partition p4 values(16,17,18,19,20));
 b) Inserting records into list partitioned table
      SQL> Insert into student values(1,’a’);         -- this will go to p1
      SQL> Insert into student values(6,’b’);         -- this will go to p2
      SQL> Insert into student values(11,’c’);       -- this will go to p3
      SQL> Insert into student values(16,’d’);       -- this will go to p4
c) Retrieving records from list partitioned table
     SQL> Select *from student;
     SQL> Select *from student partition(p1);
d) Possible operations with list partitions
v  Add
v  Drop
v  Truncate
v  Rename        
v  Move
v  Exchange
e) Adding a partition
     SQL> Alter table student add partition p5 values(21,22,23,24,25);
f) Dropping a partition
     SQL> Alter table student drop partition p4;
g) Renaming a partition
     SQL> Alter table student rename partition p3 to p6;
h) Truncate a partition
     SQL> Alter table student truncate partition p6;
i) Exchanging a partition
    SQL> Alter table student exchange partition p1 with table student2;
j) Moving a partition
    SQL> Alter table student move partition p2 tablespace saketh_ts;

HASH PARTITIONS

a) Creating hash partitioned table


     SQL> Create table student(no number(2),name varchar(2)) partition
by hash(no) partitions  
             5;
Here oracle automatically gives partition names like
                                    SYS_P1
                                    SYS_P2
                                    SYS_P3
                                    SYS_P4
                                    SYS_P5
b) Inserting records into hash partitioned table
     it will insert the records based on hash function calculated by taking
the partition key
     SQL> Insert into student values(1,’a’);         
     SQL> Insert into student values(6,’b’);         
     SQL> Insert into student values(11,’c’);       
     SQL> Insert into student values(16,’d’);       
c) Retrieving records from hash partitioned table
     SQL> Select *from student;
     SQL> Select *from student partition(sys_p1);
d) Possible operations with hash partitions
v  Add
v  Truncate
v  Rename        
v  Move
v  Exchange
e) Adding a partition
     SQL> Alter table student add partition p6 ;
f) Renaming a partition
    SQL> Alter table student rename partition p6 to p7;
g) Truncate a partition
     SQL> Alter table student truncate partition p7;
h) Exchanging a partition
     SQL> Alter table student exchange partition sys_p1 with table
student2;
i) Moving a partition
    SQL> Alter table student move partition sys_p2 tablespace saketh_ts;

SUB-PARTITIONS WITH RANGE AND HASH


The subpartition clause uses hash only; we cannot create subpartitions under list or hash partitions (the top-level partitioning must be by range).

a) Creating subpartitioned table


     SQL> Create table student(no number(2),name varchar(2),marks
number(3))
             Partition by range(no) subpartition by hash(name) subpartitions
3
             (Partition p1 values less than(10),partition p2 values less
than(20));
    
This will create two partitions p1 and p2 with three subpartitions for
each partition
                        P1 –   SYS_SUBP1
                                    SYS_SUBP2
                                    SYS_SUBP3
                        P2 –   SYS_SUBP4
                                    SYS_SUBP5
                                    SYS_SUBP6
     ** if you are using maxvalue for the last partition, you can not add a
partition.
b) Inserting records into subpartitioned table
     SQL> Insert into student values(1,’a’);          -- this will go to p1
     SQL> Insert into student values(11,’b’);        -- this will go to p2
c) Retrieving records from subpartitioned table
     SQL> Select *from student;
     SQL> Select *from student partition(p1);
     SQL> Select *from student subpartition(sys_subp1);
d) Possible operations with subpartitions
v  Add
v  Drop
v  Truncate
v  Rename        
v  Split
e) Adding a partition
     SQL> Alter table student add partition p3 values less than(30);
f) Dropping a partition
     SQL> Alter table student drop partition p3;
g) Renaming a partition
     SQL> Alter table student rename partition p2 to p3;
h) Truncate a partition
     SQL> Alter table student truncate partition p1;
i) Splitting a partition

     SQL> Alter table student split partition p3 at(15) into (partition


p31,partition p32);

VARRAYS AND NESTED TABLES


VARRAYS

A varying array allows you to store repeating attributes of a record in a single row, but with a limit.

Ex:
    1) We can create varrays using oracle types as well as user defined
types.
         a) Varray using pre-defined types
              SQL> Create type va as varray(5) of varchar(10);/
         b) Varrays using user defined types
              SQL> Create type addr as object(hno number(3),city
varchar(10));/
              SQL> Create type va as varray(5) of addr;/
    2) Using varray in table
         SQL> Create table student(no number(2),name
varchar(10),address va);
    3) Inserting values into varray table
         SQL> Insert into student values(1,’sudha’,va(addr(111,’hyd’)));
         SQL> Insert into student
values(2,’jagan’,va(addr(111,’hyd’),addr(222,’bang’)));
    4) Selecting data from varray table
         SQL> Select * from student;
         -- This will display varray column data along with varray and
adt;
         SQL> Select no,name, s.* from student s1, table(s1.address)
s;
         -- This will display in general format
    5) Instead of s.* you can specify the columns in varray
         SQL> Select no,name, s.hno,s.city from student
s1,table(s1.address) s;

    -- Update and delete not possible in varrays.


    -- Here we used table function which will take the varray column as
input for producing
        output excluding varray and types.

      
       

       NESTED TABLES

A nested table is, as its name implies, a table within a table. In this case
it is a table that is represented as a column within another table.
A nested table has the same effect as a varray but has no limit.

Ex:
    1) We can create nested tables using oracle types and user defined
types which has no limit
         a) Nested tables using pre-defined types
              SQL> Create type nt as table of varchar(10);/
         b) Nested tables using user defined types
              SQL> Create type addr as object(hno number(3),city
varchar(10));/
              SQL> Create type nt as table of addr;/
    2) Using nested table in table
         SQL> Create table student(no number(2),name
varchar(10),address nt) nested table  
                  address store as student_temp;
    3) Inserting values into table which has nested table
         SQL> Insert into student values (1,’sudha’,nt(addr(111,’hyd’)));
         SQL> Insert into student values
(2,’jagan’,nt(addr(111,’hyd’),addr(222,’bang’)));
    4) Selecting data from table which has nested table
         SQL> Select * from student;
         -- This will display nested table column data along with nested
table and adt;
         SQL> Select no,name, s.* from student s1, table(s1.address)
s;
         -- This will display in general format
    5) Instead of s.* you can specify the columns in nested table
         SQL> Select no,name, s.hno,s.city from student
s1,table(s1.address) s;
    6) Inserting nested table data to the existing row
         SQL> Insert into table(select address from student where
no=1)
                  values(addr(555,’chennai’));
    7) Update in nested tables
         SQL> Update table(select address from student where no=2) s
set s.city=’bombay’ where
                 s.hno = 222;
    8) Delete in nested table

         SQL> Delete table(select address from student where no=3) s where s.hno=333;

CONSTRAINTS & OPERATIONS WITH CONSTRAINTS & DEFERRABLE


CONSTRAINTS
Constraints are categorized as follows.

Domain integrity constraints


ü  Not null
ü  Check

Entity integrity constraints


ü  Unique
ü  Primary key

Referential integrity constraints


ü  Foreign key

Constraints are always attached to a column not a table.


We can add constraints in three ways.

ü  Column level  -- along with the column definition
ü  Table level   -- after the table definition
ü  Alter level   -- using alter command

While adding constraints you need not specify a name, only the type; Oracle will name the constraint internally.
If you want to give a name to the constraint, you have to use the constraint clause.
NOT NULL

This is used to avoid null values.


We can add this constraint in column level only.

Ex:
     SQL> create table student(no number(2) not null, name varchar(10),
marks number(3));
     SQL> create table student(no number(2) constraint nn not null,
name varchar(10), marks
             number(3));
CHECK

This is used to allow inserts only when the values satisfy the specified condition.


We can add this constraint in all three levels.

Ex:
     COLUMN LEVEL

     SQL> create table student(no number(2) , name varchar(10), marks


number(3) check
             (marks > 300));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3) constraint ch 
             check(marks > 300));

      TABLE LEVEL

      SQL> create table student(no number(2) , name varchar(10), marks


number(3), check
             (marks > 300));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3), constraint
             ch check(marks > 300));

      ALTER LEVEL

      SQL> alter table student add check(marks>300);


      SQL> alter table student add constraint ch check(marks>300);
UNIQUE

This is used to avoid duplicates, but it allows nulls.


We can add this constraint in all three levels.

Ex:
      COLUMN LEVEL

     SQL> create table student(no number(2) unique, name varchar(10),


marks number(3));
      SQL> create table student(no number(2)  constraint un unique,
name varchar(10), marks
             number(3));
       
      TABLE LEVEL
     
      SQL> create table student(no number(2) , name varchar(10), marks
number(3),
             unique(no));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3), constraint
             un unique(no));

      ALTER LEVEL

      SQL> alter table student add unique(no);


      SQL> alter table student add constraint un unique(no);

PRIMARY KEY

This is used to avoid duplicates and nulls. It works as a combination of unique and not null.
The primary key is always attached to the parent table.
We can add this constraint in all three levels.

Ex:
      COLUMN LEVEL
      SQL> create table student(no number(2) primary key, name
varchar(10), marks number(3));
      SQL> create table student(no number(2)  constraint pk primary key,
name varchar(10),
             marks number(3));
     
      TABLE LEVEL

      SQL> create table student(no number(2) , name varchar(10), marks


number(3),
             primary key(no));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3), constraint
             pk primary key(no));
     
      ALTER LEVEL

      SQL> alter table student add primary key(no);


      SQL> alter table student add constraint pk primary key(no);

FOREIGN KEY

This is used to reference the parent table's primary key column; the foreign key column itself allows duplicates.
A foreign key is always attached to the child table.
We can add this constraint in table and alter levels only.

Ex:
      TABLE LEVEL

     SQL> create table emp(empno number(2), ename varchar(10),


deptno number(2),
             primary key(empno), foreign key(deptno) references
dept(deptno));
      SQL> create table emp(empno number(2), ename varchar(10),
deptno number(2),
             constraint pk primary key(empno), constraint fk foreign
key(deptno) references
             dept(deptno));
      ALTER LEVEL
    
      SQL> alter table emp add foreign key(deptno) references
dept(deptno);
                SQL> alter table emp add constraint fk foreign key(deptno)
references dept(deptno);

Once the primary key and foreign key relationship has been created, you cannot remove a parent record if dependent child records exist.
       
USING ON DELETE CASCADE

By using this clause you can remove a parent record even if child records exist,
because whenever you remove a parent record Oracle automatically removes all its
dependent records from the child table, provided this clause was present when the
foreign key constraint was created.

Ex:
      TABLE LEVEL

     SQL> create table emp(empno number(2), ename varchar(10),


deptno number(2),
             primary key(empno), foreign key(deptno) references
dept(deptno) on delete cascade);
      SQL> create table emp(empno number(2), ename varchar(10),
deptno number(2),
             constraint pk primary key(empno), constraint fk foreign
key(deptno) references
             dept(deptno) on delete cascade);
     
      ALTER LEVEL

      SQL> alter table emp add foreign key(deptno) references


dept(deptno) on delete cascade;
                SQL> alter table emp add constraint fk foreign key(deptno)
references dept(deptno) on
                     delete cascade;
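For example, assuming the dept and emp tables above are populated, deleting a
department row also removes its employees (a small illustration of the cascade):
      SQL> delete from dept where deptno = 10;
      -- the emp rows with deptno = 10 are removed automatically by the cascade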
COMPOSITE KEYS

A composite key can be defined on a combination of columns.


We can define composite keys on entity integrity and referential
integrity constraints.
Composite key can be defined in table and alter levels only.

Ex:
      UNIQUE (TABLE LEVEL)
     
      SQL> create table student(no number(2) , name varchar(10), marks
number(3),
             unique(no,name));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3), constraint
             un unique(no,name));

      UNIQUE (ALTER LEVEL)

      SQL> alter table student add unique(no,name);


      SQL> alter table student add constraint un unique(no,name);

     PRIMARY KEY (TABLE LEVEL)

      SQL> create table student(no number(2) , name varchar(10), marks


number(3),
             primary key(no,name));
      SQL> create table student(no number(2) , name varchar(10), marks
number(3), constraint
             pk primary key(no,name));

      PRIMARY KEY (ALTER LEVEL)

      SQL> alter table student add primary key(no,name);


      SQL> alter table student add constraint pk primary key(no,name);

      FOREIGN KEY (TABLE LEVEL)


     SQL> create table emp(empno number(2), ename varchar(10),
deptno number(2), dname 
             varchar(10), primary key(empno), foreign key(deptno,dname)
references
             dept(deptno,dname));

      SQL> create table emp(empno number(2), ename varchar(10),


deptno number(2), dname 
             varchar(10), constraint pk primary key(empno), constraint fk
foreign
             key(deptno,dname) references dept(deptno,dname));

      FOREIGN KEY (ALTER LEVEL)


    
      SQL> alter table emp add foreign key(deptno,dname) references
dept(deptno,dname);
                SQL> alter table emp add constraint fk foreign
key(deptno,dname) references
                     dept(deptno,dname);

DEFERRABLE CONSTRAINTS

Each constraint has two additional attributes to support deferred checking of constraints.

Ø  Deferrable initially immediate
Ø  Deferrable initially deferred

Deferrable initially immediate checks for constraint violations at the time of insert.
Deferrable initially deferred checks for constraint violations at the time of commit.

Ex:
     SQL> create table student(no number(2), name varchar(10), marks number(3), constraint
             un unique(no) deferrable initially immediate);

     SQL> create table student(no number(2), name varchar(10), marks number(3), constraint
             un unique(no) deferrable initially deferred);

     SQL> alter table student add constraint un unique(no) deferrable initially deferred;
     
      SQL> set constraints all immediate;
     This will check all deferrable constraint violations at the time of insert.
      SQL> set constraints all deferred;
     This will check all deferrable constraint violations at the time of commit.
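     A small illustration, assuming the student table was created with the deferrable
unique constraint above: with checking deferred, the duplicate insert is accepted
and the violation is raised only when the transaction commits.
      SQL> set constraints all deferred;
      SQL> insert into student values(1,'sudha',50);
      SQL> insert into student values(1,'jagan',60);   -- accepted for now
      SQL> commit;    -- the unique constraint violation is reported here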

OPERATIONS WITH CONSTRAINTS

Possible operations with constraints as follows.

Ø  Enable
Ø  Disable
Ø  Enforce
Ø  Drop

ENABLE

This will enable the constraint. Before enabling it, Oracle checks the existing data against the constraint.

Ex:
     SQL> alter table student enable constraint un;

DISABLE

This will disable the constraint.

Ex:
     SQL> alter table student disable constraint un;

ENFORCE

This will enforce the constraint for future inserts or updates, rather than enabling it fully.
Existing data is not checked while the constraint is being enforced.
Ex:
     SQL> alter table student enforce constraint un;

DROP

This will remove the constraint.


Ex:
     SQL> alter table student drop constraint un;
Once the table is dropped, its constraints are dropped automatically.

To find the nth row of a table

SQL> Select * from emp where rowid = (select max(rowid) from emp where rownum <= 4);
Or
SQL> Select * from emp where rownum <= 4 minus select * from emp where rownum <= 3;

To find duplicate rows

SQL> Select * from emp where rowid in (select max(rowid) from emp group by empno,
ename, mgr, job, hiredate, comm, deptno, sal);
Or
SQL> Select empno, ename, sal, job, hiredate, comm, count(*) from emp group by
empno, ename, sal, job, hiredate, comm having count(*) > 1;

To delete duplicate rows

SQL> Delete from emp where rowid not in (select max(rowid) from emp group by
empno, ename, mgr, job, hiredate, sal, comm, deptno);

To find the count of duplicate rows

SQL> Select ename, count(*) from emp group by ename having count(*) > 1;

How to display alternate rows in a table?

SQL> select * from emp where (rowid, 0) in (select rowid, mod(rownum, 2) from emp);

Getting employee details of each department who is drawing the maximum sal:
SQL> select * from emp where (deptno, sal) in (select deptno, max(sal) from emp group by deptno);

How to get the number of employees in each department, for departments having more than 2500 employees?
SQL> Select deptno, count(*) from emp group by deptno having count(*) > 2500;

To reset the time to the beginning of the day

SQL> Select to_char(trunc(sysdate),’dd-mon-yyyy
hh:mi:ss am’) from dual;

To find the nth maximum sal

SQL> Select * from emp where sal in (select min(sal) from (select * from emp order by sal desc)
where rownum <= 5);



INDEXES - Detailed Indexes (End to End)

An index is typically a listing of keywords accompanied by the location of information on a subject. We can create indexes explicitly to speed up SQL statement execution on a table. The index points directly to the location of the rows containing the value.

Some Imp Notes:


Bitmap indexes are most appropriate for columns having low distinct
values—such as GENDER, MARITAL_STATUS, and RELATION. This
assumption is not completely accurate, however. In reality, a bitmap
index is always advisable for systems in which data is not frequently
updated by many concurrent systems. In fact, as I'll demonstrate here,
a bitmap index on a column with 100-percent unique values (a column
candidate for primary key) is as efficient as a B-tree index.
When to Create an Index
You should create an index if:
Ø  A column contains a wide range of values
Ø  A column contains a large number of null values
Ø  One or more columns are frequently used together in a WHERE clause or a join condition
Ø  The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows
By default, if you create an index it is a B-tree index.

WHY INDEXES?

Indexes are most useful on larger tables, on columns that are likely
to appear in where clauses as simple equality.

TYPES

Ø  Unique index
Ø  Non-unique index
Ø  Btree index
Ø  Bitmap index
Ø  Composite index
Ø  Reverse key index
Ø  Function-based index
Ø  Descending index
Ø  Domain index
Ø  Object index
Ø  Cluster index
Ø  Text index
Ø  Index organized table
Ø  Partition index
v  Local index
ü  Local prefixed
ü  Local non-prefixed
v       Global index
ü       Global prefixed
ü       Global non-prefixed

UNIQUE INDEX
Unique indexes guarantee that no two rows of a table have duplicate
values in the columns that define the index. Unique index is
automatically created when primary key or unique constraint is
created.

Ex:
SQL> create unique index stud_ind on student(sno);

NON-UNIQUE INDEX

Non-unique indexes do not impose the above restriction on the column values.

Ex:
SQL> create index stud_ind on student(sno);

BTREE INDEX or ASCENDING INDEX

The default type of index used in an Oracle database is the B-tree index. A B-tree index is designed to provide both rapid access to individual rows and quick access to groups of rows within a range. The B-tree index does this by performing a succession of value comparisons. Each comparison eliminates many of the rows.

Ex:
SQL> create index stud_ind on student(sno);

BITMAP INDEX

This can be used for low-cardinality columns: that is, columns in which the number of distinct values is small when compared to the number of rows in the table.

Ex:
SQL> create bitmap index stud_ind on student(sex);

COMPOSITE INDEX
A composite index, also called a concatenated index, is an index created on multiple columns of a table. Columns in a composite index can appear in any order and need not be adjacent columns of the table.

Ex:
SQL> create index stud_ind on student(sno, sname);

REVERSE KEY INDEX

A reverse key index, when compared to a standard index, reverses each byte of the column being indexed while keeping the column order. When the column is indexed in reverse mode, the column values will be stored in the index in different blocks as the starting value differs. Such an arrangement can help avoid performance degradations in indexes where modifications to the index are concentrated on a small set of blocks.

Ex:
SQL> create index stud_ind on student(sno) reverse;

We can rebuild a reverse key index into normal index using the
noreverse keyword.

Ex:
SQL> alter index stud_ind rebuild noreverse;

FUNCTION BASED INDEX

This uses the result of a function as the key instead of the column value itself.

Ex:
SQL> create index stud_ind on student(upper(sname));

DESCENDING INDEX
The order used by B-tree indexes has been ascending order. You can
categorize data in B-tree index in descending order as well. This
feature can be useful in applications where sorting operations are
required.

Ex:
SQL> create index stud_ind on student(sno desc);

TEXT INDEX

Querying text is different from querying data because words have shades of meaning, relationships to other words, and opposites. You may want to search for words that are near each other, or words that are related to others. These queries would be extremely difficult if all you had available was the standard relational operators. By extending SQL to include text indexes, Oracle Text permits you to ask very complex questions about the text.

To use oracle text, you need to create a text index on the column in
which the text is stored. Text index is a collection of tables and
indexes that store information about the text stored in the column.

TYPES

There are several different types of text indexes available in Oracle 9i. The first, CONTEXT, is supported in Oracle 8i as well as Oracle 9i. As of Oracle 9i, you can use the CTXCAT text index to further enhance your text index management and query capabilities.

Ø  CONTEXT
Ø  CTXCAT
Ø  CTXRULE

The CTXCAT index type supports the transactional synchronization of data between the base table and its text index. With CONTEXT indexes, you need to manually tell Oracle to update the values in the text index after data changes in the base table. CTXCAT index types do not generate score values during text queries.
HOW TO CREATE TEXT INDEX?

You can create a text index via a special version of the create index command. For a CONTEXT index, specify the ctxsys.context index type, and for a CTXCAT index, specify the ctxsys.ctxcat index type.

Ex:
Suppose you have a table called BOOKS with the following columns
Title, Author, Info.

SQL> create index book_index on books(info) indextype is


ctxsys.context;
SQL> create index book_index on books(info) indextype is
ctxsys.ctxcat;

TEXT QUERIES

Once a text index is created on the info column of the BOOKS table, text-searching capabilities increase dramatically.

CONTAINS & CATSEARCH

The CONTAINS function takes two parameters – the column name and the search string.

Syntax:
Contains(indexed_column, search_str);

If you create a CTXCAT index, use the CATSEARCH function in place of CONTAINS. CATSEARCH takes three parameters – the column name, the search string, and the index set.

Syntax:
Catsearch(indexed_column, search_str, index_set);

HOW A TEXT QUERY WORKS?

When a function such as CONTAINS or CATSEARCH is used in a query, the text portion of the query is processed by Oracle Text. The remainder of the query is processed just like a regular query within the database. The results of the text query processing and the regular query processing are merged to return a single set of records to the user.
SEARCHING FOR AN EXACT MATCH OF A WORD

The following queries will search for the word 'property' with a score greater than zero.

SQL> select * from books where contains(info, ‘property’) > 0;


SQL> select * from books where catsearch(info, ‘property’, null) > 0;

Suppose you want to know the score of 'property' in each book. Score values for individual searches range from 0 to 10 for each occurrence of the string within the text; use the SCORE function with the label given in CONTAINS.

SQL> select title, score(10) from books where contains(info, 'property', 10) > 0;

SEARCHING FOR AN EXACT MATCH OF MULTIPLE WORDS

The following queries will search for two words.

SQL> select * from books where contains(info, 'property AND harvests') > 0;
SQL> select * from books where catsearch(info, 'property AND harvests', null) > 0;

Instead of using AND you could have used an ampersand (&). Before using this method, set define off so the & character will not be seen as part of a variable name.

SQL> set define off

SQL> select * from books where contains(info, 'property & harvests') > 0;
SQL> select * from books where catsearch(info, 'property harvests', null) > 0;

The following queries will search for more than two words.
SQL> select * from books where contains(info,
‘property AND harvests AND workers’) > 0;
SQL> select * from books where catsearch(info, ‘property harvests
workers’, null) > 0;

The following queries will search for either of the two words.

SQL> select * from books where contains(info, 'property OR harvests') > 0;

Instead of OR you can use a vertical line (|).

SQL> select * from books where contains(info, 'property | harvests') > 0;
SQL> select * from books where catsearch(info, 'property | harvests', null) > 0;

In the following queries the ACCUM (accumulate) operator adds together the scores of the individual searches and compares the accumulated score to the threshold value.

SQL> select * from books where contains(info, 'property ACCUM harvests') > 0;
SQL> select * from books where catsearch(info, 'property ACCUM harvests', null) > 0;

Instead of ACCUM you can use a comma (,).

SQL> select * from books where contains(info, 'property , harvests') > 0;
SQL> select * from books where catsearch(info, 'property , harvests', null) > 0;

In the following queries the MINUS operator subtracts the score of the second term's search from the score of the first term's search.

SQL> select * from books where contains(info, 'property MINUS harvests') > 0;
SQL> select * from books where catsearch(info, 'property NOT harvests', null) > 0;

Instead of MINUS you can use - and instead of NOT you can use ~.

SQL> select * from books where contains(info, 'property - harvests') > 0;
SQL> select * from books where catsearch(info, 'property ~ harvests', null) > 0;

SEARCHING FOR AN EXACT MATCH OF A PHRASE

The following queries will search for a phrase. If the search phrase includes a reserved word within Oracle Text, then you must use curly braces ({}) to enclose the text.

SQL> select * from books where contains(info, 'transactions {and} finances') > 0;
SQL> select * from books where catsearch(info, 'transactions {and} finances', null) > 0;

You can enclose the entire phrase within curly braces, in which case any reserved words within the phrase will be treated as part of the search criteria.

SQL> select * from books where contains(info, '{transactions and finances}') > 0;
SQL> select * from books where catsearch(info, '{transactions and finances}', null) > 0;

SEARCHING FOR WORDS THAT ARE NEAR EACH OTHER

The following queries will search for words that occur near each other.

SQL> select * from books where contains(info, 'workers NEAR harvests') > 0;

Instead of NEAR you can use ;.

SQL> select * from books where contains(info, 'workers ; harvests') > 0;

In CONTEXT index queries, you can specify the maximum number of words between the search terms.

SQL> select * from books where contains(info, 'NEAR((workers, harvests),10)') > 0;

USING WILDCARDS DURING SEARCHES

You can use wildcards to expand the list of valid search terms used
during your query. Just as in regular text-string wildcard processing,
two wildcards are available.

%        -           percent sign; multiple-character wildcard


_          -           underscore; single-character wildcard

SQL> select * from books where contains(info, ‘worker%’) > 0;


SQL> select * from books where contains(info, ‘work___’) > 0;

SEARCHING FOR WORDS THAT SHARE THE SAME STEM

Rather than using wildcards, you can use stem-expansion capabilities to expand the list of text strings. Given the 'stem' of a word, Oracle will expand the list of words to search for to include all words having the same stem. A sample expansion is shown here.

Play    -           plays playing played playful

SQL> select * from books where contains(info, '$manage') > 0;

SEARCHING FOR FUZZY MATCHES

A fuzzy match expands the specified search term to include words that are spelled similarly but that do not necessarily have the same word stem. Fuzzy matches are most helpful when the text contains misspellings. The misspellings can be either in the searched text or in the search string specified by the user during the query.

The following query will not return anything because the searched text does not contain the word 'hardest'.
SQL> select * from books where contains(info, 'hardest') > 0;

It does, however, contain the word 'harvest'. A fuzzy match will return the books containing the word 'harvest' even though 'harvest' has a different word stem than the word used as the search term.

To use a fuzzy match, precede the search term with a question mark, with no space between the question mark and the beginning of the search term.

SQL> select * from books where contains(info, '?hardest') > 0;

SEARCHING FOR WORDS THAT SOUND LIKE OTHER WORDS

SOUNDEX expands search terms based on how the word sounds. The SOUNDEX expansion method uses the same text-matching logic available via the SOUNDEX function in SQL.

To use the SOUNDEX option, you must precede the search term with
an exclamation mark(!).

SQL> select * from books where contains(info, ‘!grate’) > 0;

INDEX SYNCHRONIZATION

When using CONTEXT indexes, you need to manage the text index contents; the text indexes are not updated when the base table is updated. When the table is updated, its text index is out of sync with the base table. To sync the index, execute the SYNC_INDEX procedure of the CTX_DDL package.

SQL> exec CTX_DDL.SYNC_INDEX('book_index');

INDEX SETS

Historically, problems with queries of text indexes have occurred when other criteria are used alongside text searches as part of the where clause. To improve the mixed query capability, Oracle features index sets. The indexes within the index set may be on structured relational columns or on text columns.
To create an index set, use the CTX_DDL package to create the index set and add indexes to it. When you create a text index, you can then specify the index set it belongs to.

SQL> exec CTX_DDL.CREATE_INDEX_SET('books_index_set');

Then add the non-text indexes.

SQL> exec CTX_DDL.ADD_INDEX('books_index_set', 'title_index');

Now create a CTXCAT text index. Specify ctxsys.ctxcat as the index type, and list the index set in the parameters clause.

SQL> create index book_index on books(info) indextype is ctxsys.ctxcat parameters('index set books_index_set');

INDEX-ORGANIZED TABLE

An index-organized table keeps its data sorted according to the primary key column values of the table. Index-organized tables store their data as if the entire table was stored in an index. An index-organized table allows you to store the entire table's data in an index.

Ex:
SQL> create table student (sno number(2), sname varchar(10), smarks number(3),
constraint pk primary key(sno)) organization index;

PARTITION INDEX

Similar to partitioning tables, Oracle allows you to partition indexes too. Like table partitions, index partitions can be in different tablespaces.

LOCAL INDEXES

Ø  The local keyword tells Oracle to create a separate index for each partition.
Ø  In a local prefixed index the partition key is specified on the left prefix. When the underlying table is partitioned based on, say, two columns, then the index can be prefixed on the first column specified.
Ø  Local prefixed indexes can be unique or non-unique.
Ø  Local indexes may be easier to manage than global indexes.

Ex:
SQL> create index stud_index on student(sno) local;

GLOBAL INDEXES

Ø  A global index may contain values from multiple partitions.
Ø  An index is global prefixed if it is partitioned on the left prefix of the index columns.
Ø  The global clause allows you to create a non-partitioned index.
Ø  Global indexes may perform uniqueness checks faster than local (partitioned) indexes.
Ø  You cannot create global indexes for hash partitions or subpartitions.

Ex:
SQL> create index stud_index on student(sno) global;

Similar to table partitions, it is possible to move index partitions from one device to another. But unlike table partitions, movement of index partitions requires individual reconstruction of the index for each partition (only in the case of a global index).

Ex:
SQL> alter index stud_ind rebuild partition p2;

Ø  Index partitions cannot be dropped manually.
Ø  They are dropped implicitly when the data they refer to is dropped from the partitioned table.

MONITORING USE OF INDEXES

Once you have turned on monitoring of index usage, you can check whether queries on the table are hitting the index or not.
To monitor the use of an index, use the following syntax.

Syntax:
alter index index_name monitoring usage;

Then check for the details in the V$OBJECT_USAGE view.

If you want to stop monitoring, use the following.

Syntax:

alter index index_name nomonitoring usage;
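For example, using the stud_ind index created earlier (the USED column of V$OBJECT_USAGE shows YES once a query has actually used the index):

SQL> alter index stud_ind monitoring usage;
SQL> select * from student where sno = 1;
SQL> select index_name, used from v$object_usage where index_name = 'STUD_IND';
SQL> alter index stud_ind nomonitoring usage;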

IMPORTANT QUERIES - PART 2

Get duplicate rows from the table:


Select empno, count (*) from EMP group by empno having count (*)>1;
Remove duplicates in the table:
Delete from EMP where rowid not in (select max (rowid) from EMP group by
empno);
The query below transposes columns into rows.

Name No Add1 Add2


abc 100 Hyd bang
xyz 200 Mysore pune

Select name, no, add1 from A


UNION
Select name, no, add2 from A;
The query below transposes rows into columns.
select
emp_id,
max(decode(row_id,0,address))as address1,
max(decode(row_id,1,address)) as address2,
max(decode(row_id,2,address)) as address3
from (select emp_id,address,mod(rownum,3) row_id from temp order by
emp_id )
group by emp_id
Other query:
select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
(select emp_id,address,rank() over (partition by emp_id order by
emp_id,address )rank_id from temp )
group by
emp_id
Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over
(order by sal desc) r from EMP);
Dense rank query:
The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank ()
over (order by sal desc) r from emp);
Top 5 salaries by using rank:
Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over
(order by sal desc) r from emp) where r<=5;
Or
Select * from (select * from EMP order by sal desc) where
rownum<=5;
2nd highest sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank ()
over (order by sal desc) r from EMP) where r=2;
Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
How to display alternative rows in a table?
SQL> select *from emp where (rowid, 0) in (select rowid,mod(rownum,2)
from emp);

Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins from the result, but process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;

What is Hints and Why hints Require?

It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.
Sometimes, however, the characteristics of the data in the database are changing rapidly, so that the optimizer (or more accurately, its statistics) is out of date. In this case, a hint could help.
You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.
Analyzing tables: the ANALYZE statement
The ANALYZE statement can be used to gather statistics for a specific table,
index or cluster. The statistics can be computed exactly, or estimated based
on a specific number of rows, or a percentage of rows:
ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

Automatic Optimizer Statistics Collection


By default Oracle 10g automatically gathers optimizer statistics using a
scheduled job called GATHER_STATS_JOB. By default this job runs within
maintenance windows between 10 P.M. to 6 A.M. week nights and all day on
weekends. The job calls the
DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure
which gathers statistics for tables with either empty or stale statistics, similar
to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the
GATHER AUTO option. The main difference is that the internal job prioritizes
the work such that tables most urgently requiring statistics updates are
processed first.
Hint categories:
Hints can be categorized as follows:
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer 
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */) 
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer 
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */) 
CHOOSE
One of the hints that 'invokes' the Cost based optimizer 
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
Hints for Join Orders,
Hints for Join Operations,
Hints for Parallel Execution, (/*+ parallel(a,4) */) -- specify a degree such as 2, 4 or 16
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
Use Hint to force using index

SELECT /*+ INDEX (TABLE_NAME INDEX_NAME) */ COL1, COL2 FROM TABLE_NAME;

Select /*+ use_hash */ empno from emp;

ORDERED --> This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.
PARALLEL (table, instances) --> This specifies the operation is to be done in parallel.

If an index cannot be created, then we go for /*+ parallel(table, 8) */ for select and update statements, for example when the where clause uses conditions such as LIKE, NOT IN, >, < or <>.

Explain plan
Explain plan tells us whether the query is properly using indexes or not, what the cost of the query is, and whether it is doing a full table scan or not; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created in SQL*Plus as follows:
SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;

SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
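Once the PLAN_TABLE is in place, a typical way to generate and view a plan is through the standard DBMS_XPLAN package (a minimal sketch; the emp and dept tables are just placeholders):

SQL> explain plan for
     select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno;

SQL> select * from table(dbms_xplan.display);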



What is your tuning approach if SQL query taking long time? Or how do u
tune SQL query?

If a query is taking a long time, then first we will run the query with Explain Plan; the explain plan process stores data in the PLAN_TABLE.
It gives us the execution plan of the query, such as whether the query is using the relevant indexes on the joining columns or whether indexes to support the query are missing.
If the joining columns don't have an index, the query will do a full table scan; if it is a full table scan the cost will be more, so we will create indexes on the joining columns and run the query again, which should give better performance. We also need to analyze the tables if the last analysis happened long back. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If still have performance issue then will use HINTS, hint is nothing but a clue.
We can use hints like
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer 
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */) 
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer 
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */) 
CHOOSE
One of the hints that 'invokes' the Cost based optimizer 
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */

Hints are most useful to optimize the query performance.



What is the difference between sub-query & co-related sub query?


A sub query is executed once for the parent statement
whereas the correlated sub query is executed once for each
row of the parent query.
Sub Query:
Example:
Select deptno, ename, sal from emp a  where sal  in (select sal from Grade 
where sal_grade=’A’ or  sal_grade=’B’)
Co-Related Sun query:
Example:
Find all employees who earn more than the average salary in their department.
SELECT last_name, salary, department_id  FROM employees A
WHERE salary > (SELECT AVG (salary)
FROM employees B WHERE B.department_id = A.department_id
Group by B.department_id);

Sub-query:
A sub-query is executed once for the parent query.
Example:
Select * from emp where deptno in (select deptno from dept);

Co-related sub-query:
A co-related sub-query is executed once for each row of the parent query.
Example:
Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno group by a.deptno);

View & Materialized View & Inline View


View:

Why Use Views?


To restrict data access
To make complex queries easy
To provide data independence
A simple view is one that:
Derives data from only one table
Contains no functions or groups of data
Can perform DML operations through the view.

A complex view is one that:


Derives data from many tables
Contains functions or groups of data
Does not always allow DML operations through the
view
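For illustration, here is one view of each kind over the familiar emp and dept tables (a sketch; the tables are assumed to exist as in the earlier examples):

Create or replace view emp_v as
select empno, ename, sal, deptno from emp;            -- simple view: one table, no functions

Create or replace view dept_sal_v as
select d.dname, sum(e.sal) total_sal
from emp e, dept d
where e.deptno = d.deptno
group by d.dname;                                      -- complex view: join plus a group function

Select * from dept_sal_v;                              -- queried like any table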

A view has a logical existence but a materialized view has a physical existence. Moreover, a materialized view can be indexed, analyzed and so on; that is, all the things that we can do with a table can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule the MV to refresh, but a table can't be scheduled. An MV can be created based on multiple tables.

Materialized View:

In a DWH, materialized views are very essential because on the reporting side, if we do aggregate calculations as per the business requirement, report performance would be degraded. So to improve report performance, rather than doing report calculations and joins at the reporting side, we put the same logic in the MV; then we can directly select the data from the MV without any joins and aggregations. We can also schedule the MV (Materialized View) refresh.
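A minimal sketch of creating such an aggregate MV (the refresh schedule and table names are illustrative assumptions):

Create materialized view dept_sal_mv
build immediate
refresh complete
start with sysdate next sysdate + 1        -- assumed schedule: refresh once a day
as
select deptno, sum(sal) total_sal, count(*) emp_count
from emp
group by deptno;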
Inline view:

If we write a select statement in the from clause, that is nothing but an inline view.

Ex:
Get dept wise max sal along with empname and emp no.

Select a.empname, a.empno, b.sal, b.deptno


From EMP a, (Select max (sal) sal, deptno from EMP group by deptno) b
Where
a.sal=b.sal and

a.deptno=b.deptno
Differences between where clause and having clause

Both the where clause and the having clause can be used to filter data.

ü  The where clause is not tied to group by, whereas the having clause must be used with group by.
ü  The where clause applies to individual rows, whereas the having clause tests a condition on the group rather than on individual rows.
ü  The where clause is used to restrict rows, whereas the having clause is used to restrict groups.
ü  Where restricts a normal query; having restricts group by functions.
ü  In the where clause every record is filtered individually, whereas the having clause works with aggregate records (group by functions).

Order of where and having:


SELECT column, group_function
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups; you use the HAVING clause to restrict groups.
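For example, the query below filters rows with WHERE before grouping and then filters the resulting groups with HAVING (a small sketch against the usual emp table):

SELECT deptno, SUM(sal) total_sal
FROM emp
WHERE job <> 'CLERK'            -- row-level filter, applied before grouping
GROUP BY deptno
HAVING SUM(sal) > 10000         -- group-level filter, applied after grouping
ORDER BY deptno;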

Data Transformation Test

Data transformation test:

1. It's a process of converting, cleansing, scrubbing and merging the data into the required business format.

2. Validating that the data is transformed correctly or not, based on the business rules/business requirements. Validating business rules/requirements can be the most complex and important part of ETL testing.

3. For an ETL application (i.e. Informatica, DataStage, etc.) with significant transformation logic between source and target, the test should make sure that the datatype of each column of each table is as per the functional and mapping specifications. If no specific details are mentioned in the functional and mapping specifications about the tables/schemas, then the test should make sure of the below concerns:

a) The datatype of the source column and the destination (target) column are the same or not.

b) The destination column length is equal to or greater than the source column length.

c) Validation should be done that all the data specified gets extracted.

d) The test should include checks to see that the transformation and cleansing processes are working correctly.

e) Make sure that all the types of data transformations are working and meeting the FS/MS and business requirements/rules (a sample check query is sketched after this list).
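A minimal sketch of such a transformation check, assuming a hypothetical source table SRC_CUSTOMER with FIRST_NAME and LAST_NAME columns and a target table TGT_CUSTOMER where FULL_NAME should be their upper-cased concatenation; any rows returned are potential defects:

-- apply the expected transformation rule to the source and compare with the target
Select cust_id, upper(first_name || ' ' || last_name) as expected_full_name
from src_customer
minus
Select cust_id, full_name
from tgt_customer;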

The following types of data transformation take place in staging:

Data Cleansing
Data Scrubbing
Data Aggregation
Data Merging


Data Completeness Test

1. Data completeness tests are designed to verify that all the expected data loads into the DWH.

2. It includes running detailed tests to verify that all records, all fields and the full contents of each field are loaded correctly or not.

3. Strategies to consider include (a sample set of check queries is sketched after this list):

a) Record counts must be compared between the source and the target data.

b) Comparing record counts between the source and the data loaded to the warehouse (target), and also the rejected records in the warehouse (target).

c) Comparing unique values of key fields between the source data and the data loaded to the warehouse (target), using the column mapping from the source or stage.

4. Populate the full contents of each field to validate that no truncation occurs at any step in the process; for example, if the source data field is string(30), make sure to test it with 30 characters.
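A minimal sketch of such completeness checks, assuming hypothetical SRC_ORDERS (source) and TGT_ORDERS (target) tables keyed on ORDER_ID:

-- 1) record count comparison between source and target
Select (select count(*) from src_orders) as src_count,
       (select count(*) from tgt_orders) as tgt_count
from dual;

-- 2) keys present in the source but missing in the target
Select order_id from src_orders
minus
Select order_id from tgt_orders;

-- 3) truncation check: target value shorter than the matching source value
Select s.order_id
from src_orders s, tgt_orders t
where s.order_id = t.order_id
and length(s.customer_name) > length(t.customer_name);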

Tuesday, August 27, 2013

Test Strategy for ETL Testing / Standard Tests for ETL Testing
There will be some standard tests for DWH that should be carried out as part
of testing for every DWH Project.

These are Strategies for testing ETL Applications are Identified as below:

Data Completeness Testing


Data Transformation Testing
Data Quality Testing
Initial Load / Full Load Testing
Incremental Load Testing
Presentation Layer Testing /  BI Testing / Report Testing
Integration Testing / System Integration Testing / SIT
Load and Performance Testing
UAT Testing / User Acceptance Testing
Regression Testing

High Level Description

Data Completeness Testing             -  


Ensures that all expected data is loaded

Data Transformation Testing            - 


Ensures that all data is transformed correctly according to business rules

Data Quality Testing -


Ensures that the ETL application correctly rejects, substitutes default values for, and reports invalid data

Initial Load / Full Load Testing -


Ensures that all data loads correctly the very first time, and also verifies the truncate-and-load process

Incremental Load Testing -


Ensures that after the first load, data updates, versioning and insertion of new records happen properly

Presentation Layer Testing /  BI Testing / Report Testing  -


Testing BI reports in DWH testing and comparing the reports against the DWH data for correctness

Integration Testing / System Integration Testing / SIT  -


Ensures that the ETL process functions correctly with other upstream and downstream processes

Load and Performance Testing -


Ensure that the data loads and queries perform within expected Timeframes
UAT Testing / User Acceptance Testing  -
Ensures that the solution meets current expectations and anticipates future expectations

Regression Testing -
Ensures the new data updates have not broken any existing functionality or
process.


Sunday, May 26, 2013

ETL Testing Online Training & Course Content

ETL Testing Course Content by

Sandeep Manem  @ +91-9885320101, +91-8237320101

Data Warehousing Concepts:


What is a Data Warehouse?
Difference between OLTP and Data Warehousing
Data Acquisition
Data Extraction
Data Transformation
Data Loading
Data Marts
Dependent Data Mart
Independent Data Mart
Data Base Design
Star Schema
Snow Flake Schema
Fact constellation Schema
SCD(slowly changing dimension)
Type-1 SCD
Type-2 SCD
Type-3 SCD
Basic Concepts in SQL
Overview of ETL Tool Architecture
White Box and Black BOX Testing
Functionality on Different Transformation
Rules
Data Ware House Life Cycle
Different Types of Testing Techniques in ETL
Minus Querying
Count Querying
ETL Testing Concepts
1.Introduction
What is use of testing
What is quality & standards
Responsibilities of a ETL Tester
2.Software development life cycle
Waterfall model
V-model
Agile model & methodology
Prototype model
Spiral model
3.Testing methodologies
White box testing
Black box testing
Grey box testing
ETL Testing Work Flow Process
How to Prepare the  ETL Test Plan
How to design the Test cases in ETL Testing.
How to report the Bugs in ETL Testing?
ETL Testing Responsibilities in DataStage,
Informatica, Abinitio etc;
How to detect the bugs through database
queries
ETL Performance Testing & Performance Tuning
Projects
Projects on Different Domains(Banking , Health Care, Telecom , Insurance)

OR

ETL Testing Course Content by

Sandeep Manem  @ +91-9885320101, +91-8237320101

Oracle Basics & Concepts


DBMS
RDBMS
DBMS vs RDBMS
Why Database Required
Different types of databases and difference
ASCII vs UNICODE
PL/SQL Basics
Oracle Architecture
Diff B/W Database & Files
OLTP
OLAP, ROLAP, MOLAP
METADATA
DDL,DML,DCL
BASIC ADMIN ACTIVITIES
DATATYPES
TABLES
SQL ,SUB QUERIES,CORELATED SUB QUERY
INNER QUERY,OUTER QUERY
FUNCTIONS AND TYPES AND IMPORTANCE
JOINS AND DIFFERENT TYPES
VIEWS n MATERIALIZED VIEWS
INDEX
CONSTRAINTS
REFERENTIAL INTEGRITY
PARTITIONING
PERFORMANCE TUNING
DIFFERENT TYPES OF TECHNIQUES
DATABASE SCHEMA
DWH Concepts
WHAT IS DWH ?WHY WE REQUIRE THAT ?
DWH ARCHITECTURE
DATA MART and TYPES
DM vs DWH
DATA CLEANSING
DATA SCRUBBING
DATA MASKING
NORMALIZATION
ODS
STG AREA
DSS
Diff B/w OLTP vs ODS,OLAP vs DSS
DIMENSION MODELING
DIMENSIONS
FACTS
AGGREGATES
DWH SCHEMA designing
STAR SCHEMA,SNOWFLAKE SCHEMA,GALAXY SCHEMA,FCS
SLOWLY CHANGING DIMENSIONS
SCD TYPE1,TYPE2,TYPE3
INITIAL LOAD
INCREMENTAL LOAD
FULL LOAD
CDC- change data capture
FAQ’S
ETL Testing Concepts
Introduction of ETL-Extract,Transform,Load
ETL TOOLS and Diff Types of ETL Tools
ETL ARCHITECTURE
ETL TESTING AND WHY WE REQUIRE
DIFFERENT ETL TOOLS ARCHITECTURES
SDLC and Methods/Models
STLC and Methods/Models
SDLC vs STLC
Reverse Engineering
QC(Quality Center and BugZilla)
Roles and Responsibilities
Minus,Duplicate,Count,Intersection,etc…
Detect Defects
Defects Logging and Reporting
How to prepare Queries very quickly with the help of mapping
Performance Tuning and Performance Testing,Report Testing,UI Testing
Quality and different standards that tester should follow,Why?
Testplan Preparation
Testcases Preparation
Preparation of Test data
Process Of ETL Testing
Testing Concepts
Whitebox Testing
Blackbox Testing
Gray Box Testing
Regression Testing
Smoke Testing vs Sanity Testing
User Testing
Unit testing
Integration testing
Module testing
System testing
UAT
ETL Tool and Testing
Data Extract
Data Transform
Data Load
Import Source
Import Target
Mappings,Maplets
Workflows,Worklets
Transformations,Functionalities,Rules and Techniques
Import and Export
Coping and Rules
Queries Preparation based on Transformations
Importance of ETL Testing
Creating of Mappings,Sessions,Workflows
Running of Mappings,Sessions,Workflows
Analyzing of Mappings,Sessions,Workflows
Tasks and Types
Practice:
Testing scenarios, creation of test cases and scripts
Test case execution and defect tracking and reporting
Preparation of Test data
Practice ETL Testing with Real Time Scenarios and FAQ’s
Resume preparation.


ETL Testing Use Cases & Benefits


ETL Testing

ETL Testing in Less Time, With Greater Coverage, to Deliver Trusted Data
Much ETL testing today is done by SQL scripting or “eyeballing” of data on
spreadsheets. These approaches to ETL testing are very time-consuming,
error-prone, and seldom provide complete test coverage. Informatica Data
Validation Option provides an ETL testing tool that can accelerate and
automate ETL testing in both production environments and development &
test. This means that you can deliver complete, repeatable and auditable test
coverage in less time with no programming skills required. 

ETL Testing Use Cases


Production Validation Testing (testing data before moving into
production). Sometimes called “table balancing” or “production
reconciliation,” this type of ETL testing is done on data as it is being moved
into production systems. The data in your production systems has to be right
in order to support your business decision making.  Informatica Data
Validation Option provides the ETL testing automation and management
capabilities to ensure that your production systems are not compromised by
the data update process.
Source to Target Testing (data is transformed). This type of ETL testing
validates that the data values after a transformation are the expected data
values. The Informatica Data Validation Option has a large set of pre-built
operators to build this type of ETL testing with no programming skills
required.
Application Upgrades (same-to-same ETL testing). This type of ETL testing
validates that the data coming from an older application or repository is
exactly the same as the data in the new application or repository. Much of this type of ETL testing can be automatically generated, saving substantial test development time.
Benefits of ETL Testing with Data Validation Option
Production Reconciliation. Informatica Data Validation Option provides
automation and visibility for ETL testing, to ensure that you deliver trusted
data in your production system updates.
IT Developer Productivity.  50% to 90% less time and resources required
to do ETL testing
Data Integrity.  Comprehensive ETL testing coverage means lower business
risk and greater confidence in the data.


Wednesday, January 30, 2013

DBMS
A database is a collection of occurrence of multiple record types containing
the relationship between records, data aggregate and data items. A database
may be defined as
      A database is a collection of interrelated data stored together without harmful and unnecessary redundancy (duplicate data) to serve multiple applications.
      The data is stored so that it is independent of the programs which use the data. A common and controlled approach is used in adding new data, and in modifying, retrieving or deleting existing data within the database.
A database functions in corporations, factories, government departments and other organizations. A database is used for searching the data to answer queries. A database may be designed for batch processing, real-time processing or online processing.

DATABASE SYSTEM
      A Database System is an integrated collection of related files along with the details of their definition, interpretation, manipulation and maintenance. It is a system which satisfies the data needs of various applications in an organization without unnecessary redundancy. A database system is based on the data. A database system can be run or executed by using software called a DBMS (Database Management System). A database system also protects the data from unauthorized access.

Foundation Data Concept:


A hierarchy of several levels of data has been devised that differentiates
between different groupings, or elements, of data. Data are logically
organized into:

Character
      It is the most basic logical data element. It consists of a single
alphabetic, numeric, or other symbol.

Field
      It consists of a grouping of characters. A data field represents an
attribute (a characteristic or quality) of some entity (object, person, place, or
event).

Record
      The related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths.

File
      A group of related records is known as a data file, or table. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a master file versus a transaction file. A transaction file would contain records of all transactions occurring during a period, whereas a master file contains all the permanent records. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage called archival storage.

Database

      It is an integrated collection of logically related records or objects. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data stored in a database is independent of the application programs using it and of the type of secondary storage devices on which it is stored.

ETL Testing Online Training Course Content

We need to be aware of the below topics

Oracle Basics & Concepts


1. DBMS
2. RDBMS
3. DBMS vs RDBMS
4. Why Database Required
5. Different types of databases and difference
6. ASCII vs UNICODE
7. PL/SQL Basics
8. Oracle Architecture
9. Diff B/W Database & Files
10. OLTP
11. OLAP, ROLAP, MOLAP
12. METADATA
13. DDL,DML,DCL
14. BASIC ADMIN ACTIVITIES
15. DATATYPES
16. TABLES
17. SQL ,SUB QUERIES,CORELATED SUB QUERY
18. INNER QUERY,OUTER QUERY
19. FUNCTIONS AND TYPES AND IMPORTANCE
20. JOINS AND DIFFERENT TYPES
21. VIEWS n MATERIALIZED VIEWS
22. INDEX
23. CONSTRAINTS
24. REFERENTIAL INTEGRITY
25. PARTITIONING
26. PERFORMANCE TUNING
27. DIFFERENT TYPES OF TECHNIQUES
28. DATABASE SCHEMA

DWH Concepts
1. WHAT IS DWH ?WHY WE REQUIRE THAT ?
2. DWH ARCHITECTURE
3. DATA MART and TYPES
4. DM vs DWH
5. DATA CLEANSING
6. DATA SCRUBBING
7. DATA MASKING
8. NORMALIZATION
9. ODS
10. STG AREA
11. DSS

ETL Testing Course Contents

12. Diff B/w OLTP vs ODS,OLAP vs DSS


13. DIMENSION MODELING
14. DIMENSIONS
15. FACTS
16. AGGREGATES
17. DWH SCHEMA designing
18. STAR SCHEMA,SNOWFLAKE SCHEMA,GALAXY SCHEMA,FCS
19. SLOWLY CHANGING DIMENSIONS
20. SCD TYPE1,TYPE2,TYPE3
21. INITIAL LOAD
22. INCREMENTAL LOAD
23. FULL LOAD
24. CDC- change data capture
25. FAQ’S

ETL Testing Concepts


1. Introduction of ETL-Extract,Transform,Load
2. ETL TOOLS and Diff Types of ETL Tools
3. ETL ARCHITECTURE
4. ETL TESTING AND WHY WE REQUIRE
5. DIFFERENT ETL TOOLS ARCHITECTURES
6. SDLC and Methods/Models
7. STLC and Methods/Models
8. SDLC vs STLC
9. Reverse Engineering
10. QC(Quality Center and BugZilla)
11. Roles and Responsibilities
12. Minus,Duplicate,Count,Intersection,etc…
13. Detect Defects
14. Defects Logging and Reporting
15. How to prepare Queries very quickly with the help of mapping
16. Performance Tuning and Performance Testing,Report Testing,UI Testing
17. Quality and different standards that tester should follow,Why?
18. Testplan Preparation
19. Testcases Preparation
20. Preparation of Test data
21. Process Of ETL Testing

Testing Concepts
1. Whitebox Testing
2. Blackbox Testing
3. Gray Box Testing
4. Regression Testing
5. Smoke Testing vs Sanity Testing
6. User Testing
7. Unit testing
8. Integration testing
9. Module testing
10. System testing
11. UAT

ETL Tool and Testing

1. Data Extract
2. Data Transform
3. Data Load
4. Import Source
5. Import Target
6. Mappings, Mapplets
7. Workflows,Worklets
8. Transformations,Functionalities,Rules and Techniques
9. Import and Export
10. Copying and Rules
11. Queries Preparation based on Transformations
12. Importance of ETL Testing
13. Creating of Mappings,Sessions,Workflows
14. Running of Mappings,Sessions,Workflows
15. Analyzing of Mappings,Sessions,Workflows
16. Tasks and Types

Practice:

Testing scenarios, creation of test cases and scripts


Test case execution and defect tracking and reporting
Preparation of Test data
Practice ETL Testing with Real Time Scenarios and FAQ’s

Check the below link

The best Online Training for ETL Testing is as follows

 Call:  Sandeep Manem


          +91-8237320101
          +91-9885320101
Trained around 100+ students and placed around 90+ in Top MNC's with
90% success rate.


ETL Testing Challenges


ETL testing is quite different from conventional testing. There are many
challenges we faced while performing data warehouse testing. Here is the list
of a few ETL testing challenges I experienced on my project:
- Incompatible and duplicate data.
- Loss of data during the ETL process.
- Unavailability of an inclusive test bed.
- Testers have no privileges to execute ETL jobs on their own.
- Very large volume and complexity of data.
- Faults in business processes and procedures.
- Trouble acquiring and building test data.
- Missing business flow information.

Data is important for businesses to make critical business decisions. ETL
testing plays a significant role in validating and ensuring that the business
information is exact, consistent and reliable. It also minimizes the risk of data
loss in production.

Hope these tips will help ensure your ETL process is accurate and that the data
warehouse built from it is a competitive advantage for your business.


Difference between Database and Data Warehouse Testing

There is a popular misunderstanding that database testing and data warehouse
testing are similar, while the fact is that both take different directions in
testing.

 Database testing is done using a smaller scale of data, normally with OLTP
(Online Transaction Processing) type databases, while data warehouse
testing is done with a large volume of data involving OLAP (Online Analytical
Processing) databases.
 In database testing, data is normally injected consistently from uniform
sources, while in data warehouse testing most of the data comes from
different kinds of data sources which are often inconsistent with one another.
We generally perform the full set of CRUD (Create, Read, Update and Delete) operations
in database testing, while in data warehouse testing we mostly use read-only (Select)
operations.
Normalized databases are used in DB testing, while denormalized databases are used in
data warehouse testing.
There are a number of universal verifications that have to be carried out for
any kind of data warehouse testing. Below is the list of checks that are
treated as essential for validation in ETL testing (example queries follow this list):
- Verify that data transformation from source to destination works as
expected
- Verify that expected data is added in target system
- Verify that all DB fields and field data is loaded without any truncation
- Verify data checksum for record count match
- Verify that for rejected data proper error logs are generated with all details
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
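
As a concrete illustration of the record-count and data-integrity checks listed above, a tester would typically compare source and target with a couple of simple queries. This is only a minimal sketch; the table and column names (src_customer, tgt_customer_dim) are placeholders, not from any specific project:

-- Record counts should match between source and target
SELECT COUNT(*) FROM src_customer;
SELECT COUNT(*) FROM tgt_customer_dim;

-- Rows present in the source but missing in the target (completeness / integrity gap)
SELECT customer_id, customer_name FROM src_customer
MINUS
SELECT customer_id, customer_name FROM tgt_customer_dim;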


Saturday, January 26, 2013

ETL Testing Process

ETL Testing Process:


Similar to any other testing that lies under Independent Verification and
Validation, ETL testing also goes through the same phases:

- Business and requirement understanding
- Validating the requirements
- Test estimation
- Test planning based on the inputs from test estimation and business requirements
- Designing test cases and test scenarios from all the available inputs
- Once all the test cases are ready and approved, the testing team proceeds to
perform the pre-execution check and test data preparation for testing
- Lastly, execution is performed till the exit criteria are met
- Upon successful completion, a summary report is prepared and the closure process
is done.

It is necessary to define a test strategy, which should be mutually accepted by
stakeholders before starting the actual testing. A well defined test strategy will
make sure that the correct approach has been followed, meeting the testing
aspirations. ETL testing might require the testing team to write SQL statements
extensively, or to tailor the SQL provided by the development team. In
any case, the testing team must be aware of the results they are trying to get
using those SQL statements.


ETL Testing Techniques

ETL Testing Techniques:


1) Verify that data is transformed correctly according to various
business requirements and rules.
2) Make sure that all projected data is loaded into the data
warehouse without any data loss and truncation.
3) Make sure that the ETL application appropriately rejects, replaces
with default values and reports invalid data.
4) Make sure that data is loaded in the data warehouse within
prescribed and expected time frames to confirm improved
performance and scalability.

Apart from these 4 main ETL testing methods, other testing methods like
integration testing and user acceptance testing are also carried out to make
sure everything is smooth and reliable.


why ETL Testing require for Data Warehouse

Why do organizations need Data Warehouse?


Organizations with organized IT practices are looking forward to creating the next
level of technology transformation. They are now trying to make themselves
much more operational with easy-to-interoperate data. Data is the most
important part of any organization, whether it is everyday data or
historical data. Data is the backbone of any report, and reports are the baseline
on which all the vital management decisions are taken.

Most companies are taking a step forward in constructing their data
warehouse to store and monitor real time data as well as historical data.
Crafting an efficient data warehouse is not an easy job. Many organizations
have distributed departments with different applications running on
distributed technology. An ETL tool is employed in order to make a flawless
integration between different data sources from different departments. The ETL
tool works as an integrator, extracting data from different sources,
transforming it into the preferred format based on the business transformation
rules, and loading it into a cohesive DB known as the Data Warehouse.

A well planned, well defined and effective testing scope guarantees smooth
conversion of the project to production. A business gains real confidence
once the ETL processes are verified and validated by an independent
group of experts to make sure that the data warehouse is concrete and robust.

ETL or Data warehouse testing is categorized into four different


engagements irrespective of technology or ETL tools used:

New Data Warehouse Testing – A new DW is built and verified from scratch.
Data input is taken from customer requirements and different data sources,
and the new data warehouse is built and verified with the help of ETL tools.
Migration Testing – In this type of project the customer will have an existing
DW and ETL performing the job, but they are looking to adopt a new tool in order
to improve efficiency.
Change Request – In this type of project new data is added from different
sources to an existing DW. Also, there might be a condition where the customer
needs to change their existing business rules or they might integrate a new
rule.
Report Testing – Reports are the end result of any Data Warehouse and the
basic purpose for which the DW is built. Reports must be tested by validating the
layout, the data in the report and the calculations.

Data Completeness Test

1. Data completeness tests are designed to verify that all the expected data
loads into the DWH.

2. This includes running detailed tests to verify that all records, all fields and the full
contents of each field are loaded correctly.

3. Strategies to consider include:

a) Record counts must be compared between the source and the target data.

b) Comparing record counts between the source and the data loaded to the
warehouse (target), and also the rejected records in the warehouse (target).

c) Comparing unique values of key fields between the source data and the data
loaded to the warehouse (target), using the column mapping from the source or stage.

4. Populating the full contents of each field to validate that no truncation
occurs at any step in the process; for example, if a source data field is
string(30), make sure to test it with 30 characters (see the example query below).
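
A minimal sketch of the truncation check in point 4, assuming an Oracle target and a placeholder column tgt_table.customer_name declared as string(30) (VARCHAR2(30)):

-- Longest value actually loaded; test data should be able to reach the full declared size (30)
SELECT MAX(LENGTH(customer_name)) FROM tgt_table;

-- Declared size of the column in the target, taken from the Oracle data dictionary
SELECT data_type, data_length
FROM   user_tab_columns
WHERE  table_name = 'TGT_TABLE' AND column_name = 'CUSTOMER_NAME';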

Data Transformation Test


Data transformation test:

1. It is the process of converting, cleansing, scrubbing and merging the data into the
required business format.

2. Validating that the data is transformed correctly, based on the business
rules/business requirements.
Validating business rules/requirements can be the most complex and important
part of ETL testing.

3. For an ETL application (i.e. Informatica, DataStage, etc.) with significant
transformation logic between source and target, the test should make sure
that the datatype of each column of each table is as per the functional and
mapping specifications. If no specific details are mentioned in the functional and
mapping specifications about the tables/schemas, then the test should cover
the below concerns (a sample query follows these checks):

a) The datatype of the source column and the destination column (target column) are
the same.

b) The destination column length is equal to or greater than the source column
length.

c) Validation should be done that all the data specified gets extracted.
d) Tests should include checks to see that the transformation and cleansing
processes are working correctly.

e) Make sure that all the types of data transformations are working and meeting
the FS/MS and business requirements/rules.
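
A hedged sketch of how checks (a) and (b) above can be automated when both the source and target tables are visible in the same Oracle schema (SRC_ORDERS and TGT_ORDERS are placeholder names only):

-- Columns whose datatype differs, or whose target length is smaller than the source length
SELECT s.column_name,
       s.data_type AS src_type, s.data_length AS src_length,
       t.data_type AS tgt_type, t.data_length AS tgt_length
FROM   user_tab_columns s
JOIN   user_tab_columns t ON t.column_name = s.column_name
WHERE  s.table_name = 'SRC_ORDERS'
AND    t.table_name = 'TGT_ORDERS'
AND   (s.data_type <> t.data_type OR t.data_length < s.data_length);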

The following types of data transformation take place in staging:


1. Data Cleansing 
2. Data Scrubbing  
3. Data Aggregation  
4. Data Merging 

Test Strategy for ETL Testing / Standard Tests for ETL Testing
There will be some standard tests for DWH that should be carried out as part of
testing for every DWH Project.

These are Strategies for testing ETL Applications are Identified as below:

1)      Data Completeness Testing


2)      Data Transformation Testing
3)      Data Quality Testing
4)      Initial Load / Full Load Testing
5)      Incremental Load Testing
6)      Presentation Layer Testing /  BI Testing / Report Testing
7)      Integration Testing / System Integration Testing / SIT
8)      Load and Performance Testing
9)      UAT Testing / User Acceptance Testing
10)   Regression Testing

                     High Level Description

Data Completeness Testing - Ensures that all expected data is loaded.

Data Transformation Testing - Ensures that all data is transformed correctly according to business rules.

Data Quality Testing - Ensures that the ETL application correctly rejects, substitutes default values for, and reports invalid data.

Initial Load / Full Load Testing - Ensures that all data is loaded correctly the very first time, and also verifies the truncate-and-reload process.

Incremental Load Testing - Ensures that after the first load, data is updated correctly, maintaining versioning and inserting new records properly.

Presentation Layer Testing / BI Testing / Report Testing - Tests the BI reports in DWH testing, comparing data correctness between the DWH data and the reports.

Integration Testing / System Integration Testing / SIT - Ensures that the ETL process functions with other upstream and downstream processes.

Load and Performance Testing - Ensures that the data loads and queries perform within expected timeframes.

UAT Testing / User Acceptance Testing - Ensures that the solution meets current user expectations and anticipates future expectations.

Regression Testing - Ensures that new data updates have not broken any existing functionality or process.

What is BI?
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods
and technologies that focus on counts, statistics and business objectives to
improve business performance.
The objective of Business Intelligence is to better understand customers and
improve customer service, make the supply and distribution chain more efficient,
and to identify and address business problems and opportunities quickly.
A warehouse is used for high level data analysis purposes. It
is used for predictions, time series analysis, financial
analysis, what-if simulations, etc. Basically, it is used
for better decision making.

What is a Data Warehouse(DWH)?


Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant Nonvolatile
collection of data in support of decision making".
   In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Marts is
used on a business division/department level.  
1.1.                     Subject Oriented
Data that gives information about a particular subject instead of about a
company's ongoing operations.
1.2.                     Integrated
Data that is gathered into the data warehouse from a variety of sources and
merged into a coherent whole.
1.3.                     Time-variant
All data in the data warehouse is identified with a particular time period.
1.4.                     Non-volatile
Data is stable in a data warehouse. More data is added but data is never
removed. 

What is a DataMart(DM)?
A Datamart is usually sponsored at the department level and developed with a
specific subject in mind; a Data Mart is a subset of a data warehouse
with a focused objective.

What is the difference between a data warehouse and a data mart?

 In terms of design, a data warehouse and a data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Mart is
used on a business division/department level.
A data mart only contains data specific to a particular subject area.

1.1.                     Difference between data mart and data warehouse

Data Mart:
- A data mart is usually sponsored at the department level and developed with a
specific issue or subject in mind; a data mart is a data warehouse with a
focused objective.
- A data mart is used on a business division/department level.
- A Data Mart is a subset of data from a Data Warehouse. Data Marts are built
for specific user groups.
- By providing decision makers with only a subset of data from the Data
Warehouse, privacy, performance and clarity objectives can be attained.

Data Warehouse:
- A data warehouse is a “Subject-Oriented, Integrated, Time-Variant, Nonvolatile
collection of data in support of decision making”.
- A data warehouse is used on an enterprise level.
- A Data Warehouse is simply an integrated consolidation of data from a variety
of sources that is specially designed to support strategic and tactical decision
making.
- The main objective of the Data Warehouse is to provide an integrated
environment and coherent picture of the business at a point in time.

What are the most important features of a data warehouse?


   DRILL DOWN, DRILL ACROSS, graphs, pie charts, dashboards and TIME
HANDLING
To be able to drill down/drill across is the most basic requirement of an end user
in a data warehouse. Drilling down most directly addresses the natural end-user
need to see more detail in a result. Drill down should be as generic as possible
because there is absolutely no good way to predict users' drill-down paths.

What is a Schema? & Types of Schemas?


Graphical representation of the data structure.
First phase in the implementation of a Universe.

There are three types of schema.

Out of the three schemas, two will be used in real time and one more is for
knowledge purposes:
1. Star Schema               - Real Time Usage
2. Snowflake Schema          - Real Time Usage
3. Fact Constellation Schema - Knowledge Purpose

  What is a star schema? 


   A star schema is a data warehouse schema where there is only one "fact table"
and many denormalized dimension tables.
The fact table contains primary keys from all the dimension tables and other
numeric columns of additive, numeric facts.

     

 What is a snowflake schema? 

   Unlike the star schema, a snowflake schema contains normalized dimension tables
in a tree-like structure with many nesting levels.
A snowflake schema is easier to maintain, but queries require more joins.

What is a Fact Constellation/Galaxy schema? 

A fact constellation is also known as a galaxy schema. It is nothing but a schema
which contains multiple fact tables that share dimensions. It is a collection of star
schemas which share their dimensions, so it is called a galaxy schema.
Or
We can also say that the combination of both star and snowflake schemas is nothing but
a galaxy schema or fact constellation.
Interview FAQ's:
What is meant by the grain of the star schema?
   In data warehousing, grain refers to the level of detail available in a given fact
table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general,
the grain of the fact table is the grain of the star schema.

What is the difference between snow flake and star schema

Star Schema:
- The star schema is the simplest data warehouse schema.
- In a star schema each of the dimensions is represented in a single table; there
should not be any hierarchies between dimensions.
- It contains a fact table surrounded by dimension tables. If the dimensions are
de-normalized, we say it is a star schema design.
- In a star schema only one join establishes the relationship between the fact
table and any one of the dimension tables.
- A star schema optimizes performance by keeping queries simple and providing
fast response time. All the information about each level is stored in one row.
- It is called a star schema because the diagram resembles a star.

Snow Flake Schema:
- A snowflake schema is a more complex data warehouse model than a star schema.
- In a snowflake schema at least one hierarchy should exist between dimension
tables.
- It contains a fact table surrounded by dimension tables. If a dimension is
normalized, we say it is a snowflaked design.
- In a snowflake schema, since there are relationships between the dimension
tables, many joins are needed to fetch the data.
- Snowflake schemas normalize dimensions to eliminate redundancy. The result
is more complex queries and reduced query performance.
- It is called a snowflake schema because the diagram resembles a snowflake.

What is Fact and Dimension?

A "fact" is a numeric value that a business wishes to count or sum.  


A "dimension" is essentially an entry point for getting at the facts. Dimensions
are things of interest to the business.

A set of level properties that describe a specific aspect of a business, used for
analyzing the factual measures.

What is Fact Table?


A Fact Table in a dimensional model consists of one or more numeric facts of
importance to a business.  Examples of facts are as follows:
        the number of products sold
        the value of products sold
        the number of products produced
the number of service calls received

What is Factless Fact Table?


Factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to
record events or coverage information.
Common examples of factless fact tables include:
 Identifying product promotion events (to determine promoted products
that didn't sell)
 Tracking student attendance or registration events
 Tracking insurance-related accident events 

Types of facts?

 Additive: Additive facts are facts that can be summed up through all of
the dimensions in the fact table.
 Semi-Additive: Semi-additive facts are facts that can be summed up for
some of the dimensions in the fact table, but not the others.
 Non-Additive: Non-additive facts are facts that cannot be summed up for
any of the dimensions present in the fact table.
 Cumulative: This type of fact table describes what has happened over a
period of time. For example, this fact table may describe the total sales by
product by store by day. The facts for this type of fact tables are mostly additive
facts. The first example presented here is a cumulative fact table.
 Snapshot: This type of fact table describes the state of things in a
particular instance of time, and usually includes more semi-additive and non-
additive facts. The second example presented here is a snapshot fact table.
Fact Table Example:
Time ID | Product ID | Customer ID | Unit Sold
      1 |         17 |           2 |         1
      3 |         21 |           3 |         2
      1 |          4 |           1 |         1

What is Dimension Table?

Dimension tables contain details about each instance of an object. For example,
the items dimension table would contain a record for each item sold in the store.
It might include information such as the cost of the item, the supplier, color,
sizes, and similar data. 

 Types of Dimensions?

 SCD(Slowly Changing Dimension)


 Conformed Dimension
 Junk Dimension/Dirty Dimension
 De-Generated Dimension
 Bridge Dimension

1.                       What is Conformed Dimension?


Conformed Dimensions (CD): these dimensions are something that is built once
in your model and can be reused multiple times with different fact tables.   For
example, consider a model containing multiple fact tables, representing different
data marts.  Now look for a dimension that is common to these fact tables.  In
this example let’s consider that the product dimension is common and hence can
be reused by creating short cuts and joining the different fact tables.Some of the
examples are time dimension, customer dimensions, product dimension.
2.                       What is Junk Dimension?
A "junk" dimension is a collection of random transactional codes, flags and/or
text attributes that are unrelated to any particular dimension. The junk
dimension is simply a structure that provides a convenient place to store the
junk attributes. A good example would be a trade fact in a company that brokers
equity trades.
When you consolidate lots of small dimensions, instead of having hundreds of
small dimensions with only a few records in them cluttering your database
with these mini ‘identifier’ tables, all records from all these small dimension
tables are loaded into ONE dimension table, and we call this dimension table the junk
dimension table (since we are storing all the junk in this one table). For
example, a company might have a handful of manufacturing plants, a handful of order
types, and so on, and we can consolidate them in one dimension table
called the junk dimension table.
It is a dimension table which is used to keep junk attributes.
3.                       What is a Degenerated Dimension?
An item that is in the fact table but is stripped of its description, because the
description belongs in a dimension table, is referred to as a Degenerated
Dimension. Since it looks like a dimension, but is really in the fact table and has
been stripped of its description, it is called a degenerated dimension.
Degenerated Dimension: a dimension which is located in the fact table is known as a
degenerated dimension.
4.                       What is slowly Changing Dimension? 
   Slowly changing dimensions refers to the change in dimensional attributes
over time.
An example of slowly changing dimension is a Resource dimension where
attributes of a particular employee  change over time like their designation
changes or dept changes etc.

Types of SCD Implementation:


Type 1 Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information simply overwrites
the original information. In other words, no history is kept.
In our example, recall we originally have the following table:
Customer Key Name State
1001 Christina Illinois
After Christina moved from Illinois to California, the new information replaces
the original record, and we have the following table:
Customer Key Name State
1001 Christina California
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem,
since there is no need to keep track of the old information.
Disadvantages:
-      All history is lost. By applying this methodology, it is not possible to trace back
in history. For example, in this case, the company would not be able to know
that Christina lived in Illinois before.
-      Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for
the data warehouse to keep track of historical changes.
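A minimal sketch of a Type 1 overwrite and its verification, using the customer example above (the table and column names are illustrative only):

-- Type 1: overwrite in place, no history kept
UPDATE customer_dim
SET    state = 'California'
WHERE  customer_key = 1001;

-- Verification: still exactly one row for the customer, now holding the new value
SELECT customer_key, name, state FROM customer_dim WHERE customer_key = 1001;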
Type 2 Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is added to the table to
represent the new information. Therefore, both the original and the new record
will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key Name State
1001 Christina Illinois
After Christina moved from Illinois to California, we add the new information as
a new row into the table:
Customer Key Name State
1001 Christina Illinois
1005 Christina California
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of
rows for the table is very high to start with, storage and performance can
become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the
data warehouse to track historical changes.
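A sketch of a typical Type 2 validation query, assuming the dimension carries a natural key column (customer_id) and a current-record flag (current_flag = 'Y'); both column names are assumptions, as designs vary (some use effective/end dates instead):

-- After the load, each natural key must have exactly one current row
SELECT customer_id, COUNT(*)
FROM   customer_dim
WHERE  current_flag = 'Y'
GROUP  BY customer_id
HAVING COUNT(*) > 1;   -- any row returned here indicates a defect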
Type 3 Slowly Changing Dimension 
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value. There will also be a column that indicates when the
current value becomes active.
In our example, recall we originally have the following table:
Customer Key Name State
1001 Christina Illinois
To accommodate Type 3 Slowly Changing Dimension, we will now have the
following columns:
 Customer Key
 Name
 Original State
 Current State
 Effective Date
After Christina moved from Illinois to California, the original information gets
updated, and we have the following table (assuming the effective date of change
is January 15, 2003):
Customer Key Name Original State Current State Effective Date
1001 Christina Illinois California 15-JAN-2003
Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.

Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more
than once. For example, if Christina later moves to Texas on December 15,
2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used when it is necessary for
the data warehouse to track historical changes, and when such changes will only
occur for a finite number of times.

Dimension Table Examples:

Customer Dimension

Customer ID | Name        | Gender | Income | Education | Region
          1 | Brian Edge  | M      | 2      | 3         | 4
          2 | Fred Smith  | M      | 3      | 5         | 1
          3 | Sally Jones | F      | 1      | 7         | 3

 Date Dimension

Time ID | DateKey  | Date_UK    | Date_USA   | DayofMonth | DayName
      1 | 20130101 | 01/01/2013 | 01/01/2013 | 1          | Tuesday
      2 | 20130102 | 02/01/2013 | 02/01/2013 | 2          | Wednesday
      3 | 20130103 | 03/01/2013 | 03/01/2013 | 3          | Thursday

  Product Dimension

Product ID | Product Name | Product Business Key | Batch ID | Category | Group
         1 | Aero Milk    | AC-3AA               | 1        | Dairy    | Dairy
         2 | Bikky Rice   | BK-B34               | 2        | Food     | Cereals
         3 | Bikky Bics   | BZ-CG5               | 2        | Biscuits | Cookies

What is a surrogate key? & difference between primary key and surrogate key?
A surrogate key is a substitution for the natural primary key. It is a unique
identifier or number (normally created by a database sequence generator) for
each record of a dimension table that can be used as the primary key of the
table.

A surrogate key is useful because natural keys may change.

   What is the difference between a primary key and a surrogate key?


A primary key is a special constraint on a column or set of columns. A primary
key constraint ensures that the column(s) so designated have no NULL values,
and that every value is unique. 
Physically, a primary key is implemented by the database system using a unique
index, and all the columns in the primary key must have been declared NOT
NULL. A table may have only one primary key, but it may be composite (consist
of more than one column).
A surrogate key is any column or set of columns that can be declared as the
primary key instead of a "real" or natural key. 
Sometimes there can be several natural keys that could be declared as the
primary key, and these are all called candidate keys. So a surrogate is a
candidate key.
 A table could actually have more than one surrogate key, although this would
be unusual. 
The most common type of surrogate key is an incrementing integer, such as an
auto increment column in MySQL, or a sequence in Oracle, or an identity column
in SQL Server.
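For example, in Oracle a surrogate key is commonly populated from a sequence; the object names below are purely illustrative:

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (customer_key, customer_id, name, state)
VALUES (customer_dim_seq.NEXTVAL, 'C1001', 'Christina', 'Illinois');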

ODS(Operational Data Sources)


Its a replica of OLTP system and so the need of this, is to reduce the burden on
production system (OLTP) while fetching data for loading targets. Hence its a
mandate Requirement for every Warehouse.
So every day do we transfer data to ODS from OLTP to keep it up to date?
OLTP is a sensitive database they should not allow multiple select statements it
may impact the performance as well as if something goes wrong while fetching
data from OLTP to data warehouse it will directly impact the business.
ODS is the replication of OLTP.
ODS is usually getting refreshed through some oracle jobs.

enables management to gain a consistent picture of the business. 

Dimensional Model
A type of data modeling suited for data warehousing. In a dimensional model,
there are two types of tables: dimensional tables and fact tables. Dimensional
table records information on each dimension, and fact table records all the
"fact", or measures.
1.                       Data modeling
There are three levels of data modeling. They are conceptual, logical, and
physical. This section will explain the difference among the three, the order with
which each one is created, and how to go from one level to the other.
2.                       Conceptual Data Model
Features of conceptual data model include:
 Includes the important entities and the relationships among them.
 No attribute is specified.
 No primary key is specified.
At this level, the data modeler attempts to identify the highest-level
relationships among the different entities.
3.                       Logical Data Model
Features of logical data model include:
 Includes all entities and relationships among them.
 All attributes for each entity are specified.
 The primary key for each entity is specified.
 Foreign keys (keys identifying the relationship between different entities)
are specified.
 Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as
possible, without regard to how they will be physically implemented in the
database.
In data warehousing, it is common for the conceptual data model and the logical
data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1.     Identify all entities.
2.     Specify primary keys for all entities.
3.     Find the relationships between different entities.
4.     Find all attributes for each entity.
5.     Resolve many-to-many relationships.
6.     Normalization.
4.                       Physical Data Model
Features of physical data model include:
 Specification of all tables and columns.
 Foreign keys are used to identify relationships between tables.
 Denormalization may occur based on user requirements.
 Physical considerations may cause the physical data model to be quite
different from the logical data model.
At this level, the data modeler will specify how the logical data model will be
realized in the database schema.
The steps for physical data model design are as follows:
1.     Convert entities into tables.
2.     Convert relationships into foreign keys.
3.     Convert attributes into columns.
1.     http://www.learndatamodeling.com/dm_standard.htm
2.     Modeling is an efficient and effective way to represent the organization’s
needs; It provides information in a graphical way to the members of an
organization to understand and communicate the business rules and processes.
Business Modeling and Data Modeling are the two important types of modeling.

The differences between a logical data model and physical data model is
shown below.
Logical vs Physical Data Modeling

Logical Data Model                                          | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity                                                      | Table
Attribute                                                   | Column
Primary Key                                                 | Primary Key Constraint
Alternate Key                                               | Unique Constraint or Unique Index
Inversion Key Entry                                         | Non Unique Index
Rule                                                        | Check Constraint, Default Value
Relationship                                                | Foreign Key
Definition                                                  | Comment

What is Granularity?
Principle: create fact tables with the most granular data possible to support
analysis of the business process.
In Data warehousing grain refers to the level of detail available in a given fact
table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general,
the grain of the fact table is the grain of the star schema.

Facts: Facts must be consistent with the grain; all facts are at a uniform grain.
 Watch for facts of mixed granularity
 Total sales for the day & monthly total
Dimensions: each dimension associated with fact table must take on a single
value for each fact row.

 Each dimension attribute must take on one value.


 Outriggers are the exception, not the rule.


Fact and Dimension Examples


A fact table works with dimension tables. A fact table holds the data to be
analyzed, and a dimension table stores data about the ways in which the data in
the fact table can be analyzed. Thus, the fact table consists of two types of
columns. The foreign keys column allows joins with dimension tables, and the
measures columns contain the data that is being analyzed.

Suppose that a company sells products to customers. Every sale is a fact that
happens, and the fact table is used to record these facts. For example:

 Fact Table Example:

Time ID | Product ID | Customer ID | Unit Sold
      4 |         17 |           2 |         1
      8 |         21 |           3 |         2
      8 |          4 |           1 |         1

 Now we can add a dimension table about customers:

 Dimension Table Examples:

Customer Dimension

Customer ID | Name        | Gender | Income | Education | Region
          1 | Brian Edge  | M      | 2      | 3         | 4
          2 | Fred Smith  | M      | 3      | 5         | 1
          3 | Sally Jones | F      | 1      | 7         | 3

 Date Dimension

Time ID | DateKey  | Date_UK    | Date_USA   | DayofMonth | DayName
      1 | 20130101 | 01/01/2013 | 01/01/2013 | 1          | Tuesday
      2 | 20130102 | 02/01/2013 | 02/01/2013 | 2          | Wednesday
      3 | 20130103 | 03/01/2013 | 03/01/2013 | 3          | Thursday

  Product Dimension

Product ID | Product Name | Product Business Key | Batch ID | Category | Group
         1 | Aero Milk    | AC-3AA               | 1        | Dairy    | Dairy
         2 | Bikky Rice   | BK-B34               | 2        | Food     | Cereals
         3 | Bikky Bics   | BZ-CG5               | 2        | Biscuits | Cookies

In this example, the customer ID column in the fact table is the foreign key that
joins with the dimension table. By following the links, you can see that row 2 of
the fact table records the fact that customer 3, Sally Jones, bought two items on
day 8. The company would also have a product table and a time table to
determine what Sally bought and exactly when.

When building fact tables, there are physical and data limits. The ultimate size of
the object as well as access paths should be considered. Adding indexes can help
with both. However, from a logical design perspective, there should be no
restrictions. Tables should be built based on current and future requirements,
ensuring that there is as much flexibility as possible built into the design to allow
for future enhancements without having to rebuild the data.



What is a Data Warehouse?


            A Data Warehouse is a collection of data marts representing historical data
from different operational data sources (OLTP). The data from these OLTP systems are
structured and optimized for querying and data analysis in a Data Warehouse.
2. What is a Data mart?
            A Data Mart is a subset of a data warehouse that can provide data for
reporting and analysis on a section, unit or a department like Sales Dept, HR Dept,
etc. Data Marts are sometimes also called HPQS (Higher Performance Query
Structures).
3. What is OLAP?
            OLAP stands for Online Analytical Processing. It uses database tables (Fact
and Dimension tables) to enable multidimensional viewing, analysis and querying of
large amount of data.
4. What is OLTP?
            OLTP stands for Online Transaction Processing. Except for data warehouse
databases, the other databases are OLTPs. These OLTPs use a normalized schema
structure. These OLTP databases are designed for recording the daily operations and
transactions of a business.
5. What are Dimensions?
            Dimensions are categories by which summarized data can be viewed. For
example a profit Fact table can be viewed by a time dimension.
6. What are Confirmed Dimensions?
            The dimensions which are reusable and fixed in nature. Examples: customer,
time and geography dimensions.
7. What are Fact Tables?
            A Fact Table is a table that contains summarized numerical (facts) and
historical data. This Fact Table has a foreign key-primary key relation with a
dimension table. The Fact Table maintains the information in 3rd normal form.
            A star schema is defined as a logical database design in which there
will be a centrally located fact table which is surrounded by at least one or more
dimension tables. This design is best suited for Data Warehouse or Data Mart.
8.  What are the types of Facts?
            The types of Facts are as follows.
1.     Additive Facts: A Fact which can be summed up for any of the dimension
available in the fact table.
2.     Semi-Additive Facts: A Fact which can be summed up to a few dimensions and
not for all dimensions available in the fact table.
3.     Non-Additive Fact: A Fact which cannot be summed up for any of the
dimensions available in the fact table.
9. What are the types of Fact Tables?
            The types of Fact Tables are:
1.     Cumulative Fact Table: This type of fact table generally describes what has
happened over a period of time. They contain additive facts.
2.     Snapshot Fact Table: This type of fact table deals with a particular period of
time. They contain non-additive and semi-additive facts.
10. What is Grain of Fact?
            The Grain of Fact is defined as the level at which the fact information is stored
in a fact table. This is also called as Fact Granularity or Fact Event Level.
11. What is Factless Fact table?
            A Fact Table which does not contain facts is called a Factless Fact Table. Generally
when we need to combine two data marts, one data mart will have a factless
fact table and the other one a common fact table.
12. What are Measures?
            Measures are numeric data based on columns in a fact table.
13. What are Cubes?
            Cubes are data processing units composed of fact tables and dimensions from
the data warehouse. They provided multidimensional analysis.
14. What are Virtual Cubes?
            These are combination of one or more real cubes and require no disk space to
store them. They store only definition and not the data.
15. What is a Star schema design?
            A Star schema is defined as a logical database design in which there will be a
centrally located fact table which is surrounded by at least one or more dimension
tables. This design is best suited for Data Warehouse or Data Mart.
16. What is Snow Flake schema Design?
            In a Snow Flake design the dimension table (de-normalized table) will be
further divided into one or more dimensions (normalized tables) to organize the
information in a better structural format. To design snow flake we should first design
star schema design.
17. What is Operational Data Store [ODS] ?
            It is a collection of integrated databases designed to support operational
monitoring. Unlike the OLTP databases, the data in the ODS are integrated, subject
oriented and enterprise wide data.
18. What is Denormalization?
            Denormalization means a table with multiple duplicate keys (redundant data). The
dimension table follows the denormalization method with the technique of the surrogate key.
19. What is Surrogate Key?
            A Surrogate Key is a sequence generated key which is assigned to be a primary
key in the system (table).
20. What are the client components of Informatica 7.1.1?
Informatica 7.1.1 Client Components:
1.     Informatica Designer
2.     Informatica Work Flow  Manager
3.     Informatica Work Flow Monitor
4.     Informatica Repository Manager
5.     Informatica Repository Server Administration Console.
21. What are the server components of Informatica 7.1.1?
Informatica 7.1.1 Server Components:
1.     Informatica Server
2.     Informatica Repository Server.
22. What is Metadata?
            Data about data is called as Metadata. The Metadata contains the definition of
a data.
23. What is a Repository?
            A Repository is a centrally stored container which stores the metadata used
by the Informatica PowerCenter server and PowerCenter client tools.
Informatica stores the Repository in relational database format.
            Informatica 7.1.1 Repository has 247 database objects
            Informatica 6.1.1 Repository has 172 database objects
            Informatica 5.1.1 Repository has 145 database objects
            Informatica 4.1.1 Repository has 111 database objects
24. What is Data Acquisition Process?
            The process of extracting the data from different source (operational
databases) systems, integrating the data and transforming the data into a
homogenous format and loading it into the target warehouse database. This is simply called
ETL (Extraction, Transformation and Loading). The Data Acquisition process
designs are called in different manners by different ETL vendors.
            Informatica   —->  Mapping
            Data Stage  —->  Job
            Abinitio        —->  Graph
25. What are the GUI based ETL tools?
            The following are the GUI based ETL tools:
1.     Informatica
2.     DataStage
3.     Data Junction
4.     Oracle Warehouse Builder
5.     Abinitio
6.     Business Object Data Integrator
7.     Cognos Decision Stream.
26. What are programmatic based ETL tools?
            1. Pl/Sql
            2. SAS BASE
            3. SAS ACCESS
            4. Tera Data Utilities
                        a. BTEQ
                        b. Fast Load
                        c. Multi Load
                        d. Fast Export
                        e. T (Trickle) Pump
27.  What is a Transformation?
            A transformation is a repository object that generates, modifies, or passes
data. Transformations in a mapping represent the operations the PowerCenter
Server performs on the data. Data passes into and out of transformations through
ports that you link in a mapping or mapplet. Transformations can be active or
passive. An active transformation can change the number of rows that pass through
it. A passive transformation does not change the number of rows that pass through it.
 

29. What are features of Informatica Repository Server?


            Features of Informatica Repository Server.
1.    Informatica client application and Informatica server access the repository
database tables through the Repository Server.
2.     Informatica client connects to the repository server through the host name/ IP
address and its port number.
3.     The Repository Server can manage multiple repositories on different machines
on the network.
4.     For each repository database registered with the Repository Server it configures
and manages a Repository Agent process.
5.     The Repository Agent is a multi-threaded process that performs the action
needed to retrieve, insert and updated metadata in the repository database tables.
30. What is a Work Flow?
            A Work Flow is a set of instructions on how to execute tasks such as sessions,
emails and shell commands. A WorkFlow is created from Workflow Manager.
31. What are the uses of the Lookup Transformation?
              The Lookup Transformation is useful for:
1.     Getting a related value from a table using a key column value
2.     Update slowly changing dimension table
3.     To check whether records already exists in the table.
.
32. What are the different sources of Source systems of Data Warehouse?
1. RDBMS
2. Flat Files
3. XML Files
4. SAP R/3
5. PeopleSoft
6. SAP BW
7. Web Methods
8. Web Services
9. Seibel
10. Cobol Files
11. Legacy Systems.
33. Types of Slowly Changing Dimensions:
1. Type 1 (Recent updates)
2. Type 2 (Full historical information)
3. Type 3 (Partial historical information)
34. What are Update Strategy’s target table options?
1. Update as Update: Updates each row flagged for update if it exists in the table.
2. Update as Insert: Inserts a new row for each update.
3. Update else Insert: Updates if row exists, else inserts.
35. What does a Mapping document contains?
The Mapping document contains the following information :
1. Source Definition – from where the database has to be loaded
2. Target Definition – to where the database has to be loaded
3. Business Logic – what logic has to be implemented in staging area.
36. What does the Top Down Approach says?
The Top Down Approach is coined by Bill Inmon. According to his approach he says
“First we need to implement the Enterprise data warehouse by extracting the data
from individual departments, and from the Enterprise data warehouse develop
subject oriented databases called Data Marts”.
37. What does the Bottom Up Approach or Ralph Kimball Approach says?
The Bottom Up Approach is coined by Ralph Kimball. According to his approach
he says “First we need to develop subject oriented databases called Data Marts,
then integrate all the Data Marts to develop the Enterprise data warehouse”.
38. Who is the first person in the organization to start the Data Warehouse project?
The first person to start the Data Warehouse project in an organization is the Business
Analyst.
39. What is Dimensional Modeling?
Dimensional Modeling is a high level methodology used to implement the star
schema structure, which is done by the Data Modeler.
40. What are the types of OLAPs ?
1. DOLAP: The OLAP tools which work with desktop databases are called
DOLAP. Example: Cognos EP 7 Series and Business Objects, Micro strategy.
2. ROLAP: The OLAP which works with Relational databases are called as
ROLAP. Example: Business Object, Micro strategy, Cognos ReportNet and
BRIO.
3. MOLAP: The OLAP which is responsible for creating multidimensional
structures called cubes are called as MOLAP. Example: Cognos ReportNet.
4. HOLAP: The OLAP which uses the combined features of ROLAP and MOLAP
are called as HOLAP. Example Cognos ReportNet.
41. What is worklet?
The worklet is a group of sessions. To execute the worklet we have to create the
workflow.
42. Why we use lookup transformation?
Lookup Transformations can access data from relational tables that are not sources
in the mapping. With the Lookup transformation, we can accomplish the following tasks.
43. What is a Power Center Repository?
The Power Center Repository allows you to share metadata across repositories to
create a data mart domain. In a data mart domain, you can create a single global
repository to store metadata used across an enterprise and a number of local
repositories to share the global metadata as needed.

Q1. Which basic tasks primarily done by ETL tester?

A1. An ETL tester primarily tests source data extraction, business transformation logic and target table loading.
There are many tasks involved in doing the same, which are given below (a couple of sample queries follow the list) -

1. Stage table / SFS or MFS file created from source upstream system - below checks come under this :

a) Record count Check

b) Reconcile records with source data


c) No Junk data loaded

d) Key or Mandatory Field not missing

e) duplicate data not there

f) Data type and size check

2) Business transformation logic applied - below checks come under this :

a) Business data checks, e.g. a telephone number can't be more than 10 digits or contain character data

b) Record count check after active and passive transformation logic is applied

c) Derived Field from the source data is proper

d) Check Data flow from stage to intermediate table

e) Surrogate key generation check if any

3. Target table loading from the stage file or table after applying transformations - the below checks come under this :

a) Record count check from intermediate table or file to target table

b) Mandatory or key field data not missing or Null

c) Aggregate or derived value loaded in Fact table

d) Check view created based on target table

e) Truncate and load table check

f) CDC applied on incremental load table

g) dimension table check & history table check

h) Business rule validation on loaded table

i) Check reports based on loaded fact and dimension table
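
As mentioned above, a couple of the stage checks (e: no duplicate data, d: key or mandatory fields not missing) can be expressed as quick SQL probes; stg_orders, order_id and customer_id are placeholder names only:

-- e) Duplicate check on the business key
SELECT order_id, COUNT(*)
FROM   stg_orders
GROUP  BY order_id
HAVING COUNT(*) > 1;

-- d) Mandatory or key fields must not be NULL
SELECT COUNT(*) FROM stg_orders WHERE order_id IS NULL OR customer_id IS NULL;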


========================================================================

Q2. Generally, how are environment variables and paths set in Unix?

A2. In the .profile file; normally this is executed at login, or we can execute it manually as ". .profile"

========================================================================

Q3. If a column is added to a table, tell me the test cases you will write for this.

A3. Following test cases you can write for this -

1. Check that particular column data type and size is as per the data model.

2. Check data is getting loaded into that column as per the DEM (data element mapping)

3. Check the valid values , null check and boundary value check for that column

========================================================================
Q4. Let's suppose you are working on a project where the requirement keeps changing. How would you tackle this?

A4. If the requirement is getting changed frequently, then we need to do a lot of regression testing for the same
functionality which has already been tested. So you need to be ready with all your input test data and expected
results; after checking the changed part, you can run all the test cases and check the results in no time.

========================================================================

Q5. How do you modify your test data while doing the testing?

A5. If the input test data is an ASCII file, then you can easily prepare it in Notepad++ based on the interface and then FTP
it to the unix server; if it's a table, then you can insert the rows into the table as per the data model. If the file is in a
format other than ASCII, then we can use an Ab Initio graph to convert the excel sheet into the required format, or use
other tools available for doing the same.

========================================================================

Q6. A table has partitions by range on data_dt; suppose it has already defined a monthly partition PTN_01
(values less than (TO_DATE('01-Feb-2014', 'dd-mon-yyyy'))) for January 2014 data only, and we are trying to
load data for Feb 2014. What will happen? If you find any error, then how do you solve it?

A6. It will throw the error “Inserted partition key does not map to any partition” (ORA-14400). It means a partition
is not there for the Feb data which we are trying to load, so add a new partition to the table for the Feb month data as
below:
Alter table table_name add partition partition_name values less than (TO_DATE('01-MAR-2014', 'dd-mon-yyyy'))

Note: Remember we can only create a new partition for a higher value than the previously created partition (it means
here we can't add a partition for Dec 2013, as the higher value is Feb 2014 here).

========================================================================

Q7. How will you connect oracle database from unix server?

A7. sqlplus username/password@dbserver

========================================================================

Q8. If one of the Oracle procedure fetches error - “No data found” then what is the issue here?

A8. That procedure is definitely retrieving data from a table into a variable (SELECT ... INTO). If that select
statement does not fetch any row, PL/SQL raises the NO_DATA_FOUND exception, which is the error reported here.
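
A minimal PL/SQL sketch (a hypothetical emp lookup) showing where the error comes from and one way to handle it:

DECLARE
   v_name emp.ename%TYPE;
BEGIN
   SELECT ename INTO v_name FROM emp WHERE empno = 9999;   -- no such row
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      DBMS_OUTPUT.PUT_LINE('No row found for the given key');
END;
/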

========================================================================

Q9. If one of your wrapper Unix scripts is throwing the error - "not enough memory" - then what will you do?
A9. First we can check the disk usage with the command df -h, then we can clean up accordingly and run the script
again.

========================================================================

Q10. Let's suppose we have two tables, item (primary key: item_id) and order (primary key: order_id, foreign
key: item_id). If we try to delete items that are still referenced from the order table, will we be able to
delete them? If not, then how can we do that?

A10. If we attempt to truncate a table whose unique or primary keys are referenced by enabled foreign keys in
another table, we get the error "ORA-02266: unique/primary keys in table referenced by enabled foreign keys";
deleting a referenced parent row fails with "ORA-02292: integrity constraint violated - child record found".

So, before deleting or truncating the parent table, either disable the foreign key constraints in the other
tables, or delete the data from the child table (order) first and then from the parent table (item).
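
A small sketch of the working delete order (the item_id value and the constraint name are hypothetical; the
child table is written as orders below because ORDER is a reserved word in Oracle):

DELETE FROM orders WHERE item_id = 101;   -- child rows first
DELETE FROM item   WHERE item_id = 101;   -- then the parent row

-- or, before a truncate, disable the referencing constraint:
ALTER TABLE orders DISABLE CONSTRAINT fk_orders_item;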

========================================================================

Q11. Why do we create an index on a table? Please explain.

A11. In a nutshell - we use indexes for faster retrieval of data. Let's suppose I created an order table which
will contain billions of rows, and I know that most of the time I will be querying this table using order_id;
then I should create an index on the order table for faster results.
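
A minimal sketch, reusing the order table described above (written as orders, since ORDER is reserved in Oracle):

CREATE INDEX idx_orders_order_id ON orders (order_id);

-- queries filtering on the indexed column can then use the index instead of a full table scan
SELECT * FROM orders WHERE order_id = 12345;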

========================================================================
Q12. What will be the default permission of a file created in Unix? How can we give all access to all?

A12. When a file is created, the permission flags are set according to the file mode creation mask, which can be
set using the "umask" command. If the umask value is set to 002, then new files get permission 664 (-rw-rw-r--).
We can change the permission of a file as below:

chmod 777 filename (4: read, 2: write, 1: execute)
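
A quick illustration on a typical shell (the file name demo.txt is just an example):

umask               # show the current mask, e.g. 0002
touch demo.txt
ls -l demo.txt      # -rw-rw-r--  (666 minus 002 = 664 for a new file)
chmod 777 demo.txt
ls -l demo.txt      # -rwxrwxrwx  (read/write/execute for owner, group and others)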

========================================================================

Q13. How we can link a defect with a test script in QC?

A13. First we should fail the test case step in the Test Lab, then click on New Defect (the red symbol), enter the
defect details there and raise it. That defect is then linked with that particular step of the test case. One more
thing: whatever issue we mention in the actual result comes into the defect description automatically (no need to
enter the issue details again).

========================================================================

Q14. What are the different methods to load table from files in Oracle? Also tell me methods for teradata.

A14. SQL*Loader, external table loading, and loading through the JDBC driver. In Teradata we usually use
MultiLoad or FastLoad.
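
A minimal SQL*Loader sketch (the file, table and column names are hypothetical):

-- orders.ctl
LOAD DATA
INFILE 'orders.dat'
APPEND INTO TABLE orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(order_id, item_id, order_dt DATE "YYYY-MM-DD", amount)

-- invoked from the Unix prompt:
-- sqlldr userid=scott/tiger@dbserver control=orders.ctl log=orders.log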
========================================================================

Q15. What are the things you will check before you start testing? What will be your deliverables?

A15. Before starting the testing, at least the requirement document, functional spec, technical spec, interface/DEM
and unit test results should be available. My deliverables will be - test plan, test spec, test script, defect
summary with root cause analysis, test execution or result report, and automation scripts (if created).

========================================================================

Q16. What do you understand by active and passive transformation in informatica?

A16. Active transformation -

No. of records in input != no. of records in output (e.g. Filter, Router, Source Qualifier)

Passive transformation -
No. of records in input = no. of records in output (e.g. Expression, Lookup, Stored Procedure)
========================================================================

Q17. Let's suppose we have an order table which has duplicate order_id values. How can we delete the
duplicate rows from the table? Tell at least two methods you know.

A17. First we can do the below :

create table order_new as select distinct * from order ;

drop table order ;

rename order_new to order ;


Note : This method is faster , but we need to recreate index, partitions, constraints....

Second method

delete from order a where rowid > (select min(rowid) from order b where a.order_id = b.order_id);

Note : here we are deleting the duplicate rows based on rowid, which is uniquely assigned to each row by Oracle.

========================================================================

Q18. How will you find the second highest salary from the employee table? Tell me at least two methods.

A18. First method - we can use a sub query to find this as below:

select max(sal)
from emp
where sal not in (select max(sal) from emp);

Note : first we exclude the highest salary, so the maximum of what remains is the second highest salary.

Second method - we can use ROW_NUMBER for the same as below:

SELECT empno, sal
FROM
(
  select empno, sal, ROW_NUMBER() OVER (order by sal desc) RN
  from emp
)
WHERE RN = 2;

========================================================================

Q19. How we can find out the last two modified file for a particular file mask abc* in unix?

A19. We can do this using very simple command : ls -lrt abc* | tail -2

Note: To check the last command was successful or not – we can use echo $?

========================================================================

20. How will you delete the last line of a file? Tell at least two methods.

A20. First, we can do it using sed :  sed -i '$d' file_name

Second method -

cp file_main file_bkp

sed '$d' file_bkp > file_main

rm -f file_bkp

Note : In redirection, the > operator overwrites the target file with the output, while >> appends the output
to an existing file.

========================================================================

Q21. Let's suppose we are migrating the data from a legacy file system to Oracle database tables. How would you
validate that the data is migrated properly? Tell me the important test scenarios and test cases you will write
to test this requirement, and how you will test those scenarios.

A21. The following scenarios can be written for the same:

1) Check that all table DDL is as per the data model (desc table_name)

2) Check that all records are loaded from source to target (match record count)

3) Check that null values or junk data are not loaded into the table (check record count by putting not null conditions)

4) Check valid values for columns (based on group by, find the counts for the particular values of a field)

5) Check the same data is loaded (put a join between the source table (if there), stage table and target table and check the
result - see the SQL sketch after this list)
6) Check key fields (add up number fields for target and source and match them)

7) Check business logic (create a sheet with input and output)

8) After the initial load, check incremental loading (insert/delete/update - check all the CDC cases)

9) Check the output file layout (if any)
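
A minimal SQL sketch for scenarios 2 and 5 above (the source/target table and column names are hypothetical):

-- record count match
SELECT COUNT(*) FROM src_customer;
SELECT COUNT(*) FROM tgt_customer;

-- same data loaded: rows present in the source but missing (or different) in the target
SELECT cust_id, cust_name FROM src_customer
MINUS
SELECT cust_id, cust_name FROM tgt_customer;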

21 Basic ETL Testing Interview Questions and Answers:

1. What's the difference between OLTP and OLAP systems? Please provide one example for both.

================================================================================

2. What's the advantage with Snowflake compared to Star Schema?

================================================================================

3. What is dimensional modelling?

================================================================================

4. Let's suppose you are working on a tech refresh project (i.e. from VB code to .NET code). Which
type of testing will you be doing in this project?

================================================================================

5. Why are OLAP systems not normalized, while OLTP systems are highly normalized?

================================================================================

6. Which SCD type is used to store all historical data along with current data?

================================================================================

7.What's the difference between active and passive transformation?

================================================================================

8. How ETL testing is different from conventional testing?

================================================================================

9. What do you understand by ETL ? Why we need ETL ?

================================================================================

10. What is the difference between data mart and data warehouse?

================================================================================

11. What are the different architectures of data warehouse systems?


================================================================================

12. Let's suppose in a test scenario one column is added in a table. What kind of test cases you will
write?

================================================================================

13. If a requirement keeps changing, then how will you test?

14. What do you understand by business process testing ?

================================================================================

15. Which are the important steps carried out in ETL after extracting the data from source system
and before transforming the data??

================================================================================

16. What do you understand by fact and dimension table?

================================================================================

17. What do you understand by slowly changing dimensions?

================================================================================

18. How would you validate that junk data is not loaded into the target table ?

================================================================================

19. How do you check the logs if any workflow (Informatica) failed while executing?

================================================================================

20. Have you worked on multi file systems? How does an Ab Initio graph process those files?

================================================================================

21. Let's suppose we are migrating the data from legacy file system to oracle database tables. How
would you validate that the data is migrated properly? Tell me the test scenario and test cases you
will write to test this requirement. ??

51 Oracle Interview Questions and Answers -

1. Reference Cursor ?

A reference cursor is a pointer to a memory location that can be passed


between different PL/SQL clients, thus allowing query result sets to be
passed back and forth between clients.
A reference cursor is a variable type defined using the PL/SQL TYPE
statement within an Oracle package, much like a PL/SQL table:

TYPE ref_type_name IS REF CURSOR RETURN return_type;

Here, ref_type_name is the name given to the type and return_type


represents
a record in the database. You do not have to specify the return type as
this
could be used as a general catch-all reference cursor. Such non-restrictive
types are known as weak, whereas specifying the return type is restrictive,
or strong. The following example uses %ROWTYPE to define a strong return
type that represents the record structure of the emp table:

DECLARE TYPE EmpCurType IS REF CURSOR RETURN emp%ROWTYPE;

http://www.vbip.com/books/1861003927/chapter_3927_16.asp

===========================================================================
=====
2. Anonymous Block ?
An unnamed PL/SQL block is called an anonymous block.

A PL/SQL block contains three parts or sections. They are:


• The optional Declaration section.
• The mandatory Execution section.
• The optional Exception (or Error) Handling
section.

===========================================================================
=====
3. Explain how cursor is implemented in PL/SQL

http://www.unix.org.ua/orelly/oracle/prog2/ch06_02.htm

A cursor is a variable that runs through the rows of some table or answer
to
some query.

CURSOR employee_cur IS SELECT * FROM employee;

Once I have declared the cursor, I can open it:


OPEN employee_cur;

And then I can fetch rows from it:


FETCH employee_cur INTO employee_rec;

and, finally, I can close the cursor:


CLOSE employee_cur;

***********
DECLARE
CURSOR joke_feedback_cur
IS
SELECT J.name, R.laugh_volume, C.name
FROM joke J, response R, comedian C
WHERE J.joke_id = R.joke_id
AND J.joker_id = C.joker_id;
BEGIN
...
END;
===========================================================================
=====
4. Constraints ?
refer :
http://storacle.princeton.edu:9001/oracle8-doc/server.805/a58241/ch6.htm#1309

Integrity constraints are used to enforce Business logic.


Enforcing rules with integrity constraints is less costly than enforcing
the
equivalent rules by issuing SQL statements in the application.

Only define NOT NULL constraints for columns of a table that absolutely
require values at all times.

Use the combination of NOT NULL and UNIQUE key integrity constraints to
force the input of values in the UNIQUE key; this combination of data
integrity rules eliminates the possibility that any new row's data will
ever
attempt to conflict with an existing row's data.

Choosing a Table's Primary Key :


Each table can have one primary key. A primary key allows each row in a
table to be uniquely identified and ensures that no duplicate rows exist.
Use the following guidelines when selecting a primary key:

Choose a column whose data values are unique.


The purpose of a table's primary key is to uniquely identify each row of
the
table. Therefore, the column or set of columns in the primary key must
contain unique values for each row.

Choose a column whose data values are never changed.


A primary key value is only used to identify a row in the table; primary
key
values should never contain any data that is used for any other purpose.
Therefore, primary key values should rarely need to be changed.

Choose a column that does not contain any nulls.


A PRIMARY KEY constraint, by definition, does not allow the input of any
row
with a null in any column that is part of the primary key.

Choose a column that is short and numeric.


Short primary keys are easy to type. You can use sequence numbers to easily
generate numeric primary keys.

Avoid choosing composite primary keys.


Although composite primary keys are allowed, they do not satisfy the
previous recommendations. For example, composite primary key values are
long
and cannot be assigned by sequence numbers.

Foreign Key(Referential Integrity): Whenever two tables are related by a


common column (or set of columns), define a PRIMARY or UNIQUE key
constraint
on the column in the parent table, and define a FOREIGN KEY constraint on
the column in the child table, to maintain the relationship between the two
tables.
Using CHECK Integrity Constraints
Use CHECK constraints when you need to enforce integrity rules that can be
evaluated based on logical expressions. Never use CHECK constraints when
any
of the other types of integrity constraints can provide the necessary
checking.

*The condition must be a Boolean expression that can be evaluated using the
values in the row being inserted or updated.
*The condition cannot contain subqueries or sequences.
*The condition cannot include the SYSDATE, UID, USER, or USERENV SQL
functions.
*The condition cannot contain the pseudocolumns LEVEL, PRIOR, or ROWNUM;
*The condition cannot contain a user-defined SQL function.

Eg.
CREATE TABLE dept (

deptno NUMBER(3) PRIMARY KEY,


dname VARCHAR2(15),
loc VARCHAR2(15),
CONSTRAINT dname_ukey UNIQUE (dname, loc),
CONSTRAINT loc_check1
CHECK (loc IN ('NEW YORK', 'BOSTON', 'CHICAGO')));

===========================================================================
=====
5. Difference between Primary Key and Unique key ?

* Primary keys are used to identify each row of the table uniquely. Unique keys should not have the purpose of
identifying rows in the table.
* A primary key field cannot be Null, whereas a Unique column can have a Null value.
* There can be only 1 Primary Key per table, but there can be any number of Unique Key columns.
* A primary key should be a unique column, but every Unique Key column need not be a Primary Key.
===========================================================================
=====
6. What are the different joins? Explain. Outer join? How would you write an outer join (+)?
If you specify + to the right of the equals sign, which table's data will be returned in full?

A join is a query that combines rows from two or more tables.


An equijoin is a join with a join condition containing an equality
operator.
( where A.b = C.d)
A self join is a join of a table to itself.
Cartesian Products
If two tables in a join query have no join condition, then Oracle returns
their Cartesian product. Oracle combines each row of one table with each
row
of the other. A Cartesian product always generates many rows and is rarely
useful. For example, the Cartesian product of two tables, each with 100
rows, has 10,000 rows. Always include a join condition unless you
specifically need a Cartesian product. If a query joins three or more
tables
and you do not specify a join condition for a specific pair, then the
optimizer may choose a join order that avoids producing an intermediate
Cartesian product.

An inner join (sometimes called a "simple join") is a join of two or more


tables that returns only those rows that satisfy the join condition.

An outer join returns all rows that satisfy the join condition and also
returns some or all of those rows from one table for which no rows from the
other satisfy the join condition.

To write a query that performs an outer join of tables A and B and returns
all rows from A( left outer join), apply the outer join operator (+) to all
columns of B in the join condition.
For example,
SELECT ename, job, dept.deptno, dname
FROM emp, dept
WHERE emp.deptno (+) = dept.deptno;

will select all departments even if there is no employee for a particular


dept.

Similarly to get all rows from B form a join of A & B , apply the outer
join
operator(+) to A.
Example,
SELECT ename, job, dept.deptno, dname
FROM emp, dept
WHERE emp.deptno = dept.deptno (+)

===========================================================================
=====
7. Is a function in SELECT possible ?
Yes. For example, Select ltrim(price) from Item.
User defined functions work only from Oracle 8i onwards.
===========================================================================
=====
8. EXCEPTION handling in PL/SQL

An exception is a runtime error or warning condition, which can be


predefined or user-defined. Predefined exceptions are raised implicitly
(automatically) by the runtime system. User-defined exceptions must be
raised explicitly by RAISE statements. To handle raised exceptions, you
write separate routines called exception handlers.

Example of an exception handler:


DECLARE
pe_ratio NUMBER(3,1);
BEGIN
SELECT price / earnings INTO pe_ratio FROM stocks
WHERE symbol = 'XYZ'; -- might cause division-by-zero error
INSERT INTO stats (symbol, ratio) VALUES ('XYZ', pe_ratio);
COMMIT;
EXCEPTION -- exception handlers begin
WHEN ZERO_DIVIDE THEN -- handles 'division by zero' error
INSERT INTO stats (symbol, ratio) VALUES ('XYZ', NULL);
COMMIT;
...
WHEN OTHERS THEN -- handles all other errors
ROLLBACK;
END; -- exception handlers and block end here
Unlike predefined exceptions, user-defined exceptions must be declared and
must be raised explicitly by RAISE statements.
Example:
DECLARE
past_due EXCEPTION;
acct_num NUMBER;
BEGIN
...
...
IF ... THEN
RAISE past_due; -- this is not handled
END IF;

EXCEPTION
WHEN past_due THEN -- does not handle RAISEd exception
WHEN OTHERS THEN -- handle all other errors

END;

The procedure RAISE_APPLICATION_ERROR lets you issue user-defined ORA-


error
messages from stored subprograms. That way, you can report errors to your
application and avoid returning unhandled exceptions.

To call RAISE_APPLICATION_ERROR, use the syntax

raise_application_error(error_number, message[, {TRUE | FALSE}]);

where error_number is a negative integer in the range -20000 .. -20999 and


message is a character string up to 2048 bytes long. If the optional third
parameter is TRUE, the error is placed on the stack of previous errors. If
the parameter is FALSE (the default), the error replaces all previous
errors. RAISE_APPLICATION_ERROR is part of package DBMS_STANDARD, and as
with package STANDARD, you do not need to qualify references to it.

An application can call raise_application_error only from an executing


stored subprogram (or method). When called, raise_application_error ends
the
subprogram and returns a user-defined error number and message to the
application. The error number and message can be trapped like any Oracle
error.

Another way to handle exception is to use DECODE statements. For example:


INSERT INTO stats (symbol, ratio)
SELECT symbol, DECODE(earnings, 0, NULL, price / earnings)
FROM stocks WHERE symbol = 'XYZ';

Here the DECODE function checks whether earnings is 0; if it is zero, it returns NULL, else price/earnings.
===========================================================================
=====
9. When an exception is triggered in a loop, how do you
continue to the next iteration?
By having an exception handling block within the loop.
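
A minimal PL/SQL sketch (the emp_stats target table is hypothetical): the inner block traps the error, so the
FOR loop moves on to the next record instead of terminating.

BEGIN
   FOR rec IN (SELECT empno, sal FROM emp) LOOP
      BEGIN
         INSERT INTO emp_stats (empno, ratio) VALUES (rec.empno, 100 / rec.sal);
      EXCEPTION
         WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('Skipped empno ' || rec.empno || ': ' || SQLERRM);
      END;
   END LOOP;
END;
/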

===========================================================================
=====
10. Decode ? Using Decode map this logic,
If A>B Display 'A is Big', If A=B Display 'A equals B'
Else 'B is Big'

DECODE(SIGN(A - B), 1, 'A is Big',
                    0, 'A equals B',
                       'B is Big')

(Using GREATEST(A,B) alone cannot distinguish A = B from A > B, since GREATEST returns A in both cases.)

===========================================================================
=====
11. What is an index? what are diff type of indices? what is Clustered and
Non Clustered Indeces ?

Indexing is typically a listing of keywords along with their locations, and is done to speed up access to the
database.

Create index WORKERSKILL_NAME_SKILL on WORKERSKILL(Name,Skill);

A clustered index is a special type of index that reorders the way records
in the table are physically stored. Therefore table can have only one
clustered index. The leaf nodes of a clustered index contain the data
pages.
A nonclustered index is a special type of index in which the logical order
of the index does not match the physical stored order of the rows on disk.
The leaf node of a nonclustered index does not consist of the data pages.
Instead, the leaf nodes contain index rows.
===========================================================================
=====
12. Mutating Error?
When we use a row level trigger on a table, and at the same time we query/insert/delete/update the same table
from within that trigger, it will give the mutating table error.

===========================================================================
=====
13. How to avoid the mutating error?
Two possible solutions.
1. Change the design so as to avoid querying or updating the table on which the row level trigger is defined.
2. Create a package which holds a PL/SQL table with the RowIDs of the rows that need to be inserted/updated and
a counter of the rows:
a. In a BEFORE STATEMENT trigger, set the counter to zero.
b. In every ROW trigger, call a procedure from the package which registers the RowId and increments the counter.
c. In an AFTER STATEMENT trigger, call a procedure which makes the needed checks and updates the particular rows
of the table.
If you are not familiar with PL/SQL tables, you can alternatively use temporary Oracle tables. It is also a good idea.

===========================================================================
=====
14. Difference between Delete and Truncate.
Truncate permanently deletes the records and no rollback capability exists.
The delete command removes records from a table. Delete moves records to the rollback segment, enabling rollback
capability.
Excessive use of triggers can result in complex interdependencies, which can be difficult to maintain in a large
application.
===========================================================================
=====
15. How many types of triggers are there and what are they? (Before Insert, After Insert, Update,
Delete etc.)

Triggers are similar to stored procedures. A trigger stored in the database can include SQL and PL/SQL or Java
statements to run as a unit and can invoke stored procedures. However, procedures and triggers differ in the way
that they are invoked. A procedure is explicitly run by a user, application, or trigger. Triggers are implicitly
fired by Oracle when a triggering event occurs, no matter which user is connected or which application is being
used.
Different Types
---------------
Row Triggers and Statement Triggers
BEFORE and AFTER Triggers
INSTEAD OF Triggers (on views, for INSERT etc.; instead of performing the insert, Oracle carries out the trigger
statements)
Triggers on System Events (shutdown) and User Events (logon)
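
A minimal sketch of a BEFORE INSERT row-level trigger (it assumes the emp table has a created_dt column, which is
an assumption for illustration only):

CREATE OR REPLACE TRIGGER emp_bi_trg
BEFORE INSERT ON emp
FOR EACH ROW
BEGIN
   :NEW.created_dt := SYSDATE;   -- stamp each new row with the load date
END;
/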
===========================================================================
=====
16. What all composite datatypes in oracle
Composite Data types:

Table

1. Is similar but not the same as a database table


2. Must contain only one column of any scalar datatype
3. Is like a one-dimensional array of any size
4. Has its elements indexed with a binary integer column called the primary
key of the table

Record

1. Contains uniquely defined columns of different data types,


2. Enables us to treat dissimilar columns that are logically related as a
single unit.

===========================================================================
=====
17. What are PL/SQL tables?
A PL/SQL table (also called an index-by table or associative array) is an in-memory collection of elements of a
single type, indexed by a BINARY_INTEGER key; it grows dynamically and is declared as
TYPE t IS TABLE OF <datatype> INDEX BY BINARY_INTEGER;

===========================================================================
=====
18. What are the default packages provided by oracle
The ones with "DBMS_" prefix. Eg. DBMS_Output, DBMS_ALERT

===========================================================================
=====
19. What is the difference between Where and Having?

The WHERE clause filters individual rows before they are grouped; the HAVING clause is used along with GROUP BY
to restrict the returned groups to those for which the specified condition is TRUE.
The HAVING condition cannot contain a scalar subquery expression.
eg:
SELECT department_id, MIN(salary), MAX (salary)
FROM employees
GROUP BY department_id
HAVING MIN(salary) < 5000;

===========================================================================
=====
20. How is error Handling done?

===========================================================================
=====
21. I have a table that contains state codes;
I might have more than 1 row with the same state code. How can I find out?
select state_code, count(*) from table_name group by state_code having count(*) > 1;
any state_code returned by this query has duplicates.

===========================================================================
=====
22. How do you copy rows from Schema a , table p to
table p in schema b ?
To copy between tables on a remote database, include the same username,
password, and service name in the FROM and TO clauses:

COPY FROM HR/your_password@SchemaA -


TO HR/your_password@SchemaB -
INSERT tableP-
USING SELECT * FROM tableP

===========================================================================
=====
23. What is %Rowtype and %Type

PROCEDURE Get_emp_names(Dept_num IN Emp_tab.Deptno%TYPE)

...means the input parameter Dept_num should have the same data type as Emp_tab.Deptno.

Use the %ROWTYPE attribute to create a record that contains all the columns
of the specified table. The following example defines the Get_emp_rec
procedure, which returns all the columns of the Emp_tab table in a PL/SQL
record for the given empno:

PROCEDURE Get_emp_rec (Emp_number IN Emp_tab.Empno%TYPE,


Emp_ret OUT Emp_tab%ROWTYPE) IS
BEGIN
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal, Comm, Deptno
INTO Emp_ret
FROM Emp_tab
WHERE Empno = Emp_number;
END;

You could call this procedure from a PL/SQL block as follows:

DECLARE
Emp_row Emp_tab%ROWTYPE; -- declare a record matching a
-- row in the Emp_tab table
BEGIN
Get_emp_rec(7499, Emp_row); -- call for Emp_tab# 7499
DBMS_OUTPUT.PUT(Emp_row.Ename || ' ' || Emp_row.Empno);
DBMS_OUTPUT.PUT(' ' || Emp_row.Job || ' ' || Emp_row.Mgr);
DBMS_OUTPUT.PUT(' ' || Emp_row.Hiredate || ' ' ||
Emp_row.Sal);
DBMS_OUTPUT.PUT(' ' || Emp_row.Comm || ' '|| Emp_row.Deptno);
DBMS_OUTPUT.NEW_LINE;
END;

===========================================================================
=====
24. How to use the cursor with using open, fetch and close.?

===========================================================================
=====
25. using select Statement how you will retrieve the user who is logged in?

SELECT SYS_CONTEXT ('USERENV', 'SESSION_USER')


FROM DUAL;

===========================================================================
=====
26. When Exception occurs, i want to see the error generated by Oracle. How
to see it?
SQLERRM
===========================================================================
=====
27. How to send the output of a select statement to a file?
SPOOL

===========================================================================
=====
28. how to find the duplicate records?

select count(*), job from emp group by job having count(*) > 1

then there are duplicates.


===========================================================================
=====
29. What is Procedure, function, package and anonymous block?

Procedures are named groups of SQl statements.


Functions are like Procedures, but can return a value.
Packages are groups of procedures, functions, variables and SQL statements.
An unnamed PL/SQL block is called an anonymous block.
===========================================================================
=====
30. What is group by funcion and where it is used.?
Specify the GROUP BY clause if you want Oracle to group the selected rows
based on the value of expr(s) for each row and return a single row of
summary information for each group.
Expressions in the GROUP BY clause can contain any columns of the tables,
views, or materialized views in the FROM clause, regardless of whether the
columns appear in the select list.

It is used with aggregate functions.

===========================================================================
=====
31. What is the difference between RDBMS and DBMS? Sol: Codd's 12 rules.
===========================================================================
=====
32. What is normalised and denormalised data?
Normalisation is the process of removing redundancy in data by separating the data into multiple tables;
denormalised data deliberately keeps some redundancy (fewer joins) to speed up reads.

===========================================================================
=====
33. What is a VIEW?
A view is a custom-tailored presentation of the data in one or more tables.
A view can also be thought of as a "stored query." Views do not actually
contain or store data; they derive their data from the tables on which they
are based.

===========================================================================
=====
34. Can a view update a table?

Can, but with restrictions.


Like tables, views can be queried, updated, inserted into, and deleted
from,
with some restrictions.
All operations performed on a view affect the base tables of the view.
===========================================================================
=====
35. What happens if there is an exception in the cursor?
How do we ensure that the execution for other records in the cursor does
not
stop.

===========================================================================
=====
36. What are cursors ? After retrieving the records into the cursor can we
update
the record in the table for the retrieved record. What effect will it
have on the cursor?

Cursors
Oracle uses work areas to execute SQL statements and store processing
information. A PL/SQL construct called a cursor lets you name a work area
and access its stored information. There are two kinds of cursors: implicit
and explicit. PL/SQL implicitly declares a cursor for all SQL data
manipulation statements, including queries that return only one row. For
queries that return more than one row, you can explicitly declare a cursor
to process the rows individually. An example follows:

DECLARE
CURSOR c1 IS
SELECT empno, ename, job FROM emp WHERE deptno = 20;

The set of rows returned by a multi-row query is called the result set. Its
size is the number of rows that meet your search criteria
Multi-row query processing is somewhat like file processing. For example, a
COBOL program opens a file, processes records, then closes the file.
Likewise, a PL/SQL program opens a cursor, processes rows returned by a
query, then closes the cursor. Just as a file pointer marks the current
position in an open file, a cursor marks the current position in a result
set.

You use the OPEN, FETCH, and CLOSE statements to control a cursor. The OPEN
statement executes the query associated with the cursor, identifies the
result set, and positions the cursor before the first row. The FETCH
statement retrieves the current row and advances the cursor to the next
row.
When the last row has been processed, the CLOSE statement disables the
cursor.
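
A minimal sketch tying OPEN, FETCH and CLOSE together with a fetch loop (reusing the c1 cursor declared above):

DECLARE
   CURSOR c1 IS
      SELECT empno, ename, job FROM emp WHERE deptno = 20;
   emp_rec c1%ROWTYPE;
BEGIN
   OPEN c1;
   LOOP
      FETCH c1 INTO emp_rec;
      EXIT WHEN c1%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE(emp_rec.empno || ' ' || emp_rec.ename);
   END LOOP;
   CLOSE c1;
END;
/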

===========================================================================
=====
37. What are user defined data types ?

SUBTYPE subtype_name IS base_type;


Eg. SUBTYPE EmpDate IS DATE; -- based on DATE type
SUBTYPE ID_Num IS emp.empno%TYPE; -- based on column type

CURSOR c1 IS SELECT * FROM dept;


SUBTYPE DeptFile IS c1%ROWTYPE; -- based on cursor rowtype

To specify base_type, you can use %TYPE, which provides the datatype of a
variable or database column. Also, you can use %ROWTYPE, which provides the
rowtype of a cursor, cursor variable, or database table.

A subtype does not introduce a new type; it merely places an optional


constraint on its base type.

===========================================================================
=====
38. How to copy the structure and data from one table to another in one SQL statement?
CREATE TABLE new_table AS SELECT * FROM old_table;   -- single SQL statement
Alternatively, the SQL*Plus COPY command can be used:
COPY {FROM database | TO database | FROM database TO database}
{APPEND|CREATE|INSERT|REPLACE} destination_table [(column, column, column, ...)]
USING query

where database has the following syntax:

username[/password]@connect_identifier
===========================================================================
=====

39. Describe UNION and UNION ALL. UNION returns distinct rows selected by
both queries while UNION ALL returns all the rows. Therefore, if the table
has duplicates, UNION will remove them. If the table has no duplicates,
UNION will force a sort and cause performance degradation as compared to
UNION ALL.
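
A quick illustration using the emp table (assuming some job values repeat):

SELECT job FROM emp UNION     SELECT job FROM emp;   -- distinct job values only, sorted
SELECT job FROM emp UNION ALL SELECT job FROM emp;   -- every row from both queries, duplicates kept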
===========================================================================
=====
40. What is 1st normal form? Each cell must be one and only one value, and
that value must be atomic: there can be no repeating groups in a table that
satisfies first normal form.
===========================================================================
=====
41. What is 2nd normal form? Every nonkey column must depend on the entire
primary key.
===========================================================================
=====
42. What is 3rd normal form? (another explanation than #1) No nonkey
column
depends on another nonkey column.
===========================================================================
=====

43. What is 4th normal form? Fourth normal form forbids independent
one-to-many relationships between primary key columns and nonkey columns.
===========================================================================
=====
44. What is 5th normal form? Fifth normal form breaks tables into the
smallest possible pieces in order to eliminate all redundancy within a
table. Tables normalized to this extent consist of little more than the
primary key.

45. How to find out top n salary from a employee table?


===========================================================================
=====

46. If we modify a function which is defined in a package, will the stored package still be valid?

===========================================================================
=====

47.How do you define public and private procedure in oracle?

===========================================================================
=====

48.If same kind of logic we put in function as well as procedure then which
one will be faster?

===========================================================================
=====

49. If we deleted some rows in a table and committed, is it possible to retrieve the deleted data?

===========================================================================
=====

50. While handling exceptions, if we have written "WHEN OTHERS THEN" first, before the other exception handlers,
then what will happen?

===========================================================================
=====
51. What is sql injection attack???
<form method="post" action="http://testasp.vulnweb.com/login.asp">
<input name="tfUName" type="text" id="tfUName">
<input name="tfUPass" type="password" id="tfUPass">
</form>
The easiest way for the login.asp to work is by building a database
query that looks like this:
SELECT id
FROM logins
WHERE username = '$username'
AND password = '$password'
If the variables $username and $password are requested directly from
the user's input, this can easily be compromised. Suppose that we
gave "Joe" as a username and that the following string was provided
as a password: anything' OR 'x'='x
SELECT id
FROM logins
WHERE username = 'Joe'
AND password = 'anything' OR 'x'='x'
As the inputs of the web application are not properly sanitised, the
use of the single quotes has turned the WHERE SQL command into a
two-component clause.
The 'x'='x' part guarantees to be true regardless of what the first
part contains.
This will allow the attacker to bypass the login form without
actually knowing a valid username / password combination!

101 UNIX INTERVIEW QUESTIONS AND ANSWERS -

1.      How do you change ownership of a file?    chown


2.      How do you change the ownership of files recursively?  chown -R
3.      How do you change access of a file? chmod 777 filename
4.      How do you remove duplicate entries from a file? uniq filename (input should be sorted: sort filename | uniq)
5.      How do you restrict the file size to a particular value? ulimit size
6.      How to truncate a file (say you want only the last 20 lines of a file)? tail -20 filename
7.      What are wild characters? These are the characters which give special meaning to the shell. Ex:
*, ?, ^ etc.
8.      What is sleep? What does it do?
     If you want some process to be deactivated for a while then we can use sleep. It deactivates the
process for the specified number of seconds.
9.      Regular expression to fit the US phone number of format 999-999?
10.  How do you view the .a files? dump -H filename.a
11.  What is a library? What is the extension of a library file?
      A library is an archive file where functions provided by Unix for programmers are stored. The
extension is .a.
12.  How to get information from a library file? dump -H filename.a | grep
13.  What is the command used to output to both the console and a file?  tee, or tee with a filename
14.  How to count the no. of files in the current directory? ls | wc -l
15.  Replace the blank characters with ~ (tilde) in a file? sed 's/ /~/g' filename
16.  How to print the contents of a file? cat filename.
17.  How to list the processes of a particular user? (And everything about ps)  ps -u <username>
18.  What is a pipe? What are named pipes?
A pipe connects two commands: the output of the first command acts as the input to the second command, and so on.
A named pipe (FIFO) is a special file that provides inter-process communication between unrelated processes: one
process opens it for writing while another opens it for reading. (With an unnamed pipe created by pipe(fd),
fd[0] is the read end and fd[1] is the write end.)
19.  How to mail a file to a user? mail username < filename
20.  How to read the telephone number from a file (say the no. is in the format ### - ####)?
21.  How to execute a process with lower priority? nice (option -n)
22.  How do you see the process consuming more CPU time? top (or ps)
23.  What do these operators do: $, $$, #, |, ?, >, <, >>, [], `, $?, $* ?
24.  What is awk?
25.  How to start a background process? What happens when you exit a shell before the process
completes? By appending &. If the shell exits, the job may be killed (SIGHUP) unless it was started with nohup.
26.  Which is the first process that is started in Unix? init (PID 1); /etc/inittab is its configuration file.
27.  How to get the values of control characters (say ctrl-c)? stty -a
28.  How to change the date and time of a file? touch filename
29.  What is a semaphore?       
      A semaphore is a non-negative integer count and is generally used to co-ordinate access to
resources.  The initial semaphore count is set to the number of free resources, then threads slowly
increment and decrement the count as resources are added and removed. If the semaphore count
drops to zero, which means no available resources, threads attempting to decrement the
semaphore will block until the count is greater than zero.
30.  How do you remove the semaphores and shared memory? ipcrm (use ipcs to list them first)
31.  What are .b files? Batch files.
32.  How do you take a process already started as a foreground process to background? Press Ctrl-Z to suspend it, then run bg.
33.  What is a daemon? How do you write a daemon program? A daemon is a background process running all the time.
It is started only once. A simple way to keep a program running after logout is to start it with nohup ... &.
34.  What is a cron daemon? Job scheduler
35.  How do you list the semaphores and shared memory?
      Semaphores are listed with ipcs -s. Shared memory segments are listed with ipcs -m.
36.  How to connect to another terminal? rlogin or telnet.
37.  Difference between http and https?
       HTTP is a non-secured protocol. HTTPS is a secured protocol (HTTP over SSL/TLS).
38.  What is SSL? Secure Sockets Layer.
39.  What is the standard port no of HTTP? 80
40.  What is the standard port no of HTTPS?  443
41.  What are plugins?
       Additional features to a browser/server. You can call your own C++ code here.
42.  What is CGI? Common Gateway Interface.
43.  What is $?  ?
       '$?' returns the exit code of the previously executed command.
44.  Everything about sed and awk.
45.  All daemons, especially cron. How will you schedule your command using the cron daemon?
      Store the command in a command file (cmdfile).  crontab cmdfile.
46.  What are the first 3 fields in the crontab file? The time fields: minute, hour and day of the month (the full
schedule also includes month of the year and day of the week).
47.  What will cat<file1>file2 do? Give another command for it.
      Copies the content of file1 to file2.            cp file1 file2
48.  How do you check if the process running is taking a long time? top
49.  What are background and foreground processes?
     Background processes run without holding the terminal (and can keep running after you exit the shell if
started with nohup). Foreground processes are interactive.

50.  How do you choose a column from a file? cut -c (by character position) or cut -f (by field)


51.  What are the different types of shell? Bourne shell, Korn shell, C-shell, Bash shell
52.  How do you invert the match of a regular expression? grep -v
53.  How would you swap fields in a file with ' ' (blank) as a delimiter? awk '{print $3, $2}' filename
54.  How do you compress a file? What is the syntax? What does pack -f do?
      By giving the command called compress: compress filename.
55.  What is nohup? What is the syntax? Where does the output go if you do not mention the output
filename? nohup command. Ex: nohup sort emp &.  If we are not redirecting the output of our
background process, then the result is stored in nohup.out.
56.  What is mail? How do you automate to send a mail after the end of a process? 
      Mail is a way to circulate memos and other written information to  your co-workers if you
and they use networked computers.
57.  What is the internal value associated with the standard error device?  2
58.  How do you execute a command on a different machine without logging on to it? rsh host command (or ssh host command)
59.  What does this mean: ${variable-value}?  It expands to the variable's value, or to "value" if the variable is unset.
60.  What does this mean: ${variable?value}?  It expands to the variable if set; otherwise the shell prints "value" as an error and aborts.
61.  How will you archive a file?  tar cvf archive.tar filename
62.  Which signal is sent when you kill a process?  kill sends SIGTERM (15) by default; kill -9 sends SIGKILL.
63.  What is the difference between TCP/IP and OSI model?
OSI contains 7 layers
TCP/IP has 5 layers
64.  What is ARP (Address Resolution Protocol)? It resolves an IP address to a MAC (hardware) address.
65.  How do you find out which version of UNIX you are working with?  uname -v (or uname -a)
66.  What are environment variables? $HOME, $PATH
67.  What is the significance of $HOME and $PATH?
       $PATH is the list of directories the shell searches for the given command.
       $HOME is the directory you are taken to when you log in (your login directory).
68.  How do you display the environment variables? env
69.  How will you know what version of a program is being picked up from among the paths described
in the $PATH variable?  which program_name (or type program_name)
70.  What is crontab?
    It is a command which is used to carry out jobs every day, for years together, without needing
any prompting from us. It goes beyond what at and batch offer.
71.  What is inittab?
      This file is read when we start the machine; it in turn starts the necessary processes.
72.  What is memory leakage?
      Acquiring memory and not releasing it, so the process keeps using more and more memory.
73.  How do you replace the occurrence of a string with some other value using sed?
      sed 's/string/value/g' filename.
74.  What would you do if you want zero mapped to a and one mapped to b in a string? tr '01' 'ab'
75.  What is archiving? Bundling files together (e.g. with tar), usually combined with compression.
76.  What are signals?  Software interrupts sent to a process (e.g. SIGINT, SIGTERM).
77.  What are the five basic IPCs?
        Inter-process communication mechanisms:
        pipes, named pipes (FIFOs), message queues, semaphores and shared memory.
78.  What is shared segment memory? A memory segment that several processes attach to and use to exchange data.
79.  What is a web server?  A server where browser (HTTP) requests can be met.
80.  How will you determine the path of travel between two hosts in a n/w?  traceroute (tracert on Windows)
81.  What are the seven layers of  TCP/IP?
82.  What is a mac address?
83.  Given a table containing several columns, write a shell program to sum up 2 nd and 3rd column data.
84.  What is the difference between TCP/IP & UDP sockets? Write the series of steps showing the
lifecycle of a socket.
85.  Which shell are you working in, and how do you find out which shell you are in?
86.  How do you see the process  consuming more CPU time?
87.  How do you find out which version of UNIX you are working with?
88.  How do you find out the Average Disk used by the all the files in the current directory?
89.  Difference between thread and process
90.  How to find process-> CPU consumption on SMP environment(multi processor environment)
91.  List various options of netstat
92.  How to find swap space consumption for each process
93.  Difference between vmstat and top
94.  Why threads are listed as a separate processes,  when we do ps –el
95.  What are the different types of shells that you have used?
96.  Where the major configuration files stored in UNIX ?
97.  Explain some of the shell scripts that you have written?
98.  Explain sed command?
99.  What is the regular expression for a number of length 15 digits?
100.  What is the regular expression for a String of length 15 which does not have digits.
101.    How to get no of records and no of fields in awk ?

Types of ETL Testing :-

   1)       Constraint Testing:
In the phase of constraint testing, the test engineer identifies whether the data is
mapped from source to target or not.
The Test Engineer follows the below scenarios in ETL Testing process.
a)      NOT NULL
b)      UNIQUE
c)       Primary Key
d)      Foreign key
e)      Check
f)       Default
g)       NULL
   2)      Source to Target Count Testing:
Here the tester checks whether the record count in the source matches the count in the target.
Whether the data is in ascending or descending order doesn't matter; only the count is compared.
When there is a lack of time, a tester can fall back on this type of testing.

   3)      Source to Target Data Validation Testing:
In this testing, the tester validates each and every data point from source to
target.
In most financial projects, the tester also has to verify the decimal precision.

   4)      Threshold/Data Integrated Testing:

In this testing, the test engineer checks the ranges of the data; it is usually applied to
population calculations and sales/marketing and business finance analysis (quarterly,
half-yearly, yearly).

MIN       MAX     RANGE
4              10           6

   5)      Field to Field Testing:
In field to field testing, the test engineer checks how much space each field occupies
in the database and that the data fits the table's datatypes.

NOTE: Also check the order of the columns and the mapping of source column to target column.

   6)      Duplicate Check Testing:

In this phase of ETL Testing, a tester comes across duplicate values very frequently, so the
tester relies on database queries, because a huge amount of data is present in the
source and target tables.
Select ENO, ENAME, SAL, COUNT (*) FROM EMP GROUP BY ENO, ENAME, SAL
HAVING COUNT (*) >1;

Note:
1)      If there are mistakes in the Primary Key, or no Primary Key is allotted, then duplicates
may arise.
2)      Sometimes a developer makes mistakes while transferring the data from source to
target, and at that time duplicates may arise.
3)      Duplicates also arise due to environment mistakes (due to improper plugins in the tool).
   7)      Error/Exception Logical Testing:
1)      Delimiter is available in Valid Tables
2)      Delimiter is not available in invalid tables(Exception Tables)

   8)      Incremental and Historical Process Testing:


During incremental loads, the historical data must not get corrupted. If the historical data is
corrupted, that is the condition where bugs are raised.

   9)      Control Columns and Defect Values Testing:


This is introduced by IBM
   10)   Navigation Testing:
Navigation testing is testing from the end user's point of view. If an end user cannot move through the
application easily, that navigation is called bad or poor navigation.
                At the time of testing, a tester identifies this type of navigation scenario to
avoid unnecessary navigation.

   11)   Initialization testing:
Checking the application on the combination of hardware and software installed on the platform is called
initialization testing.

   12)    Transformation Testing:
If, at the time of mapping from source table to target table, the transformation does not match the
mapping condition, then the test engineer raises bugs.

   13)   Regression Testing:
Code is modified to fix a bug or to implement new functionality, and this can introduce new
errors.
 These introduced errors are called regressions. Checking for the regression effect is called
regression testing.

   14)   Retesting:
Re executing the failed test cases after fixing the bug.

   15)     System Integration Testing:

Integration testing: after the completion of the programming process, the developers
integrate the modules. There are 3 approaches:
a)      Top Down
b)      Bottom Up
c)       Hybrid

What is Metadata?
Metadata is defined as data that describes other data. Metadata can be divided into two main types:
structural and descriptive.

Structural metadata describes the design structure and their specifications. This type of metadata
describes the containers of data within a database.

Descriptive metadata describes instances of application data. This is the type of metadata that is
traditionally spoken of and described as “data about the data.”

A third type is sometime identified called Administrative metadata. Administrative metadata


provides information that helps to manage other information, such as when and how a resource was
created, file types and other technical information.

Metadata makes it easier to retrieve, use, or manage information resources by providing users with
information that adds context to the data they’re working with. Metadata can describe information at
any level of aggregation, including collections, single resources, or component part of a single
resource. Metadata can be embedded into a digital object or can be stored separately. Web pages
contain metadata called metatags.

Metadata at the most basic level is simply defined as “data about data”. An item of metadata
describes the specific characteristics about an individual data item. In the database realm, metadata
is defined as, “data about data, through which the end-user data are integrated and managed.”  
Metadata in a database typically store the relationships that link up numerous pieces of data. 
“Metadata names these fields, describes the size of the fields, and may put restrictions on what can
go in the field (for example, numbers only).”

“Therefore, metadata is information about how data is extracted, and how it may be transformed. It
is also about indexing and creating pointers into data. Database design is all about defining metadata
schemas.”  Meta data can be stored either internally, in the same file as the data, or externally, in a
separate area.  If the data is stored internally, the metadata is together with the data, making it more
easily accessible to view or change. However, this method creates high redundancy. If metadata is
stored externally, the searches can become more efficient. There is no redundancy but getting to this
metadata may be a little more technical.

All the metadata is stored in a data dictionary or a system catalog. The data dictionary is most
typically an external document that is created in a spreadsheet type of document that stores the
conceptual design ideas for the database schema.  The data dictionary also contains the general
format that the data, and in effect the metadata, should be.  Metadata is an essential aspect to
database design, it allows for increased processing power, due to the fact that it can help create
pointers and indexes.

What is Database Testing?


Testing the backend databases like comparing   the actual results   with expected results.
Data base testing basically include the following.
1) Data validity testing.
2) Data Integrity testing
3) Performances related to database.
4) Testing of Procedure, triggers and functions.
       For doing data validity testing you should be good in SQL queries
       For data integrity testing you should know about referential integrity and different constraint.
       For performance related things you should have idea about the table structure and design.
       For testing Procedure triggers and functions you should be able to understand the same.

*******************************************************************************

Difference between Database Testing and Data Warehouse Testing?


There is a popular misunderstanding that database testing and data warehouse is similar while the
fact is that both hold different direction in testing.

 Database testing is done using smaller scale of data normally with OLTP (Online transaction
processing) type of databases while data warehouse testing is done with large volume with data
involving OLAP (online analytical processing) databases.

 In database testing normally data is consistently injected from uniform sources while in data
warehouse testing most of the data comes from different kind of data sources which are sequentially
inconsistent.

We generally perform only CRUD (Create, read, update and delete) operation in database testing
while in data warehouse testing we use read-only (Select) operation.

Normalized databases are used in DB testing while denormalized DBs are used in data warehouse testing.
There are a number of universal verifications that have to be carried out for any kind of data
warehouse testing.

Below is the list of objects that are treated as essential for validation in ETL testing:
  Verify that data transformation from source to destination works as expected
  Verify that expected data is added in target system
  Verify that all DB fields and field data is loaded without any truncation
  Verify data checksum for record count match
  Verify that for rejected data proper error logs are generated with all details
  Verify NULL value fields
  Verify that duplicate data is not loaded
  Verify data integrity

***************************************************************************

What is ETL TESTING?

ETL stands for Extract, Transform, Load - the process where you extract data from source tables,
transform it into the desired format based on certain rules, and finally load it into target tables.
There are numerous tools that help with the ETL process - Informatica being a notable one, with
schedulers such as Control-M often used to orchestrate the ETL jobs.

So ETL testing means testing this entire process, either using a tool or at the table level, with the help
of test cases and the rules/mapping document.

In ETL Testing, the following are validated -


1) Data file loads from the source system into the source tables.
2) The ETL job that is designed to extract data from the source tables and then move it to the staging
tables (transform process).
3) Data validation within the staging tables to check that all mapping rules / transformation rules are
followed.
4) Data validation within the target tables to ensure data is present in the required format and there is
no data loss from the source to the target tables.

Extract
In this step we extract data from different internal and external sources, structured and/or
unstructured. Plain queries are sent to the source systems, using native connections, message
queuing, ODBC or OLE-DB middleware. The data is put in a so-called Staging Area (SA), usually
with the same structure as the source. In some cases we want only the data that is new or has been
changed; in that case the queries will only return the changes. Some tools can do this automatically,
providing a changed data capture (CDC) mechanism.
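
A minimal extract sketch, assuming a hypothetical ORDERS source table with a LAST_UPDATED column, a
staging table STG_ORDERS with the same structure, and a bind variable holding the timestamp of the
previous extract run:

INSERT INTO stg_orders (order_id, customer_id, order_date, order_amount, last_updated)
SELECT order_id, customer_id, order_date, order_amount, last_updated
FROM   orders
WHERE  last_updated > :last_extract_time;   -- simple change data capture by timestamp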

Transform
Once the data is available in the Staging Area, it is all on one platform and in one database. So we can
easily join and union tables, filter and sort the data using specific attributes, pivot to another
structure and make business calculations. In this step of the ETL process, we can check on data
quality and cleanse the data if necessary. After having all the data prepared, we can choose to
implement slowly changing dimensions. In that case we want to keep track, in our analysis and
reports, of when attributes change over time, for example when a customer moves from one region to
another.
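
As an illustration, a transformation in the staging area might join staged tables, filter out invalid
rows and derive a business calculation. A sketch using hypothetical STG_ORDERS and STG_CUSTOMER tables
loaded in the extract step:

INSERT INTO stg_sales_enriched (order_id, order_date, customer_id, region, net_amount)
SELECT o.order_id,
       o.order_date,
       c.customer_id,
       c.region,
       o.order_amount * 0.90 AS net_amount   -- example business rule: apply a 10% discount
FROM   stg_orders o
JOIN   stg_customer c ON o.customer_id = c.customer_id
WHERE  o.order_amount > 0;                   -- filter out invalid or empty orders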

Load
Finally, data is loaded into a central warehouse, usually into fact and dimension tables. From there
the data can be combined, aggregated and loaded into datamarts or cubes as is deemed necessary.
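
A minimal load sketch, assuming a hypothetical FACT_SALES table keyed by surrogate keys from DIM_DATE
and DIM_CUSTOMER dimension tables:

INSERT INTO fact_sales (date_key, customer_key, sales_amount)
SELECT d.date_key,
       c.customer_key,
       s.net_amount
FROM   stg_sales_enriched s
JOIN   dim_date d     ON d.calendar_date = s.order_date
JOIN   dim_customer c ON c.customer_id   = s.customer_id;

From the fact table, further aggregation into data marts or cubes is typically done with GROUP BY
queries or the warehouse's own cube-building tools.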

*******************************************************************************

What are cubes?

Multi-dimensional data is logically represented by cubes in data warehousing. The dimensions and the
data are represented by the edges and the body of the cube, respectively. OLAP environments view the
data in the form of a hierarchical cube. A cube typically includes the aggregations that are needed for
business intelligence queries.
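
In SQL terms, many databases can compute these aggregations with the CUBE grouping operator. A sketch
over a hypothetical SALES table with REGION, PRODUCT and SALES_YEAR dimensions:

SELECT region, product, sales_year, SUM(sales_amount) AS total_sales
FROM   sales
GROUP BY CUBE (region, product, sales_year);   -- one aggregate row for every combination of the dimensions, including grand totals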

*******************************************************************************

OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume
that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze
it.
- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line
transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast
query processing and maintaining data integrity in multi-access environments, with effectiveness
measured by the number of transactions per second. An OLTP database holds detailed, current
data, and the schema used to store transactional data is the entity model (usually 3NF).

- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions.
Queries are often very complex and involve aggregations. For OLAP systems, response time is the
effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP
database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star
schema).
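
To make the contrast concrete, a typical OLTP statement touches a single row, while a typical OLAP
query aggregates over many rows (the table and column names below are hypothetical):

-- OLTP: short transaction on a single record
UPDATE account
SET    balance = balance - 100
WHERE  account_id = 12345;

-- OLAP: analytical query with aggregation over a star schema
SELECT c.region, SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_customer c ON f.customer_key = c.customer_key
GROUP BY c.region;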

OLTP System (Operational System) vs. OLAP System (Data Warehouse):

Source of data:
  OLTP - Operational data; OLTP systems are the original source of the data.
  OLAP - Consolidated data; OLAP data comes from the various OLTP databases.

Purpose of data:
  OLTP - To control and run fundamental business tasks.
  OLAP - To help with planning, problem solving and decision support.

What the data reveals:
  OLTP - A snapshot of ongoing business processes.
  OLAP - Multi-dimensional views of various kinds of business activities.

Inserts and updates:
  OLTP - Short and fast inserts and updates initiated by end users.
  OLAP - Periodic long-running batch jobs refresh the data.

Queries:
  OLTP - Relatively standardized and simple queries returning relatively few records.
  OLAP - Often complex queries involving aggregations.

Processing speed:
  OLTP - Typically very fast.
  OLAP - Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements:
  OLTP - Can be relatively small if historical data is archived.
  OLAP - Larger, due to the existence of aggregation structures and history data; requires more indexes than OLTP.

Database design:
  OLTP - Highly normalized with many tables.
  OLAP - Typically de-normalized with fewer tables; uses star and/or snowflake schemas.

Backup and recovery:
  OLTP - Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP - Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
*******************************************************************************  

What is Business Intelligence?

Business intelligence, or BI for short, is an umbrella term that refers to competencies, processes,
technologies, applications and practices used to support evidence-based decision making in
organizations. In the widest sense it can be defined as a collection of approaches for gathering,
storing, analyzing and providing access to data that helps users to gain insights and make better fact-
based business decisions.

What is BI used for?
Organizations use Business Intelligence to gain data-driven insights on anything related to business
performance. It is used to understand and improve performance, to cut costs and to identify new
business opportunities. This can include, among many other things:

  - Analyzing customer behaviors, buying patterns and sales trends
  - Measuring, tracking and predicting sales and financial performance
  - Budgeting, financial planning and forecasting
  - Tracking the performance of marketing campaigns
  - Optimizing processes and operational performance
  - Improving delivery and supply chain effectiveness
  - Web and e-commerce analytics
  - Customer relationship management
  - Risk analysis
  - Strategic value driver analysis

Basics of Business Intelligence

Gathering Data
Gathering data is concerned with collecting or accessing data which can then be used to inform
decision making. Gathering data can come in many formats and basically refers to the automated
measurement and collection of performance data. For example, these can come from transactional
systems that keep logs of past transactions, point-of-sale systems, web site software, production
systems that measure and track quality, etc. A major challenge of gathering data is making sure that
the relevant data is collected in the right way at the right time. If data quality is not controlled at
the data-gathering stage, it can harm all the BI efforts that follow; always remember the old adage:
garbage in, garbage out.

Storing Data
Storing Data is concerned with making sure the data is filed and stored in appropriate ways to ensure
it can be found and used for analysis and reporting. When storing data the same basic principles
apply that you would use to store physical goods – say books in a library – you are trying to find the
most logical structure that will allow you to easily find and use the data. The advantage of modern
databases (often called data warehouses because of the large volumes of data) is that they allow
multi-dimensional formats, so you can store the same data under different categories, also called
data marts or data warehouse access layers. As in the physical world, good data storage starts with
the needs and requirements of the end users and a clear understanding of what they want to use the
data for.

Analyzing Data
The next component of BI is analyzing the data. Here we take the data that has been gathered and
inspect, transform or model it in order to gain new insights that will support our business decision
making. Data analysis comes in many different formats and approaches, both quantitative and
qualitative. Analysis techniques include the use of statistical tools and data mining approaches, as well
as visual analytics or even analysis of unstructured data such as text or pictures.

Providing Access
In order to support decision making the decision makers need to have access to the data. Access is
needed to perform analysis or to view the results of the analysis. The former is provided by the latest
software tools that allow end-users to perform data analysis while the latter is provided through
reporting, dashboard and scorecard applications.

*******************************************************************************
