ETL Imp
An2:
Database testing involves some in-depth knowledge of the given application and requires a more defined plan of approach to test the data.
Key issues include:
1) Data Integrity
2) Data Validity
3) Data Manipulation and updates
The tester must be aware of database design concepts and implementation rules.
An3:
Database testing basically includes the following:
1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions
For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have an idea of the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.
How to test data loading in database testing?
You have to do the following things while involved in data load testing:
1. Know the source data (tables, columns, datatypes and constraints).
2. Know the target data (tables, columns, datatypes and constraints).
3. Check the compatibility of source and target.
4. Open the corresponding DTS package in SQL Enterprise Manager and run the DTS package (if you are using SQL Server).
5. Compare the column data of source and target, as shown in the sketch below.
6. Check the number of rows in source and target.
7. Update the data in the source and see whether the change is reflected in the target or not.
8. Check for junk characters and NULLs.
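For example, a minimal sketch of steps 5 and 6, assuming hypothetical SRC_CUSTOMER (source) and TGT_CUSTOMER (target) tables with the same columns:
SQL> select count(*) from src_customer;
SQL> select count(*) from tgt_customer;
-- rows present in the source but missing or different in the target
SQL> select cust_id, cust_name from src_customer
     minus
     select cust_id, cust_name from tgt_customer;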
Explain Database testing?
An1:
Here database testing means the test engineer should test data integrity, data access, query retrieval, modifications, updates and deletion, etc.
An2:
Database tests are supported via ODBC using the following functions:
SQLOpen, SQLClose, SQLError, SQLRetrieve, SQLRetrieveToFile, SQLExecQuery, SQLGetSchema and SQLRequest.
You can carry out cursor-type operations by incrementing arrays of returned datasets.
All SQL queries are supplied as a string. You can execute stored procedures; for instance, on SQL Server you could use 'Exec MyStoredProcedure', and as long as that stored procedure is registered on the SQL Server database it should execute. However, you cannot interact as much as you may like by supplying, say, in/out variables, but for most instances it will cover your database test requirements.
An3:
Database testing basically includes the following:
1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions
For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have an idea of the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.
An4:
Database testing generally deals with the following:
a) Checking the integrity of UI data against database data
b) Checking whether any junk data is displayed in the UI other than what is stored in the database
c) Checking execution of stored procedures with input values taken from the database tables
d) Checking the data migration
e) Execution of jobs, if any
What is data driven test?
An1:
A data-driven test is used to test multiple sets of data held in a data table; using this we can easily replace the parameters at the same time from different locations, e.g. using .xls sheets.
An2:
Re-execution of our test with different input values is called re-testing. To validate our project calculations, the test engineer follows a re-testing approach through an automation tool. Re-testing is also called a data-driven test. There are 4 types of data-driven tests:
1) Dynamic input submission (key-driven test): Sometimes a test engineer conducts re-testing with different input values to validate the calculations through dynamic submission. For this input submission, the test engineer uses this function in a TSL script -- create_input_dialog ("label");
2) Data-driven files through FLAT FILES (.txt, .doc): Sometimes the test engineer conducts re-testing depending on flat file contents. He collects these files from old version databases or from the customer side.
3) Data-driven tests from FRONT-END OBJECTS: Sometimes a test engineer creates automation scripts depending on front-end object values such as (a) list (b) menu (c) table (d) data window (e) ocx, etc.
4) Data-driven tests from EXCEL SHEET: Sometimes a test engineer follows this type of data-driven test to execute a script for multiple inputs. These multiple inputs are kept in excel sheet columns. We have to collect this test data from back-end tables.
How to Test Database Procedures and Triggers?
Before testing database procedures and triggers, the tester should know what the inputs and outputs of the procedures/triggers are, then execute the procedures and triggers; if you get the expected result the test case will pass, otherwise it will fail.
These requirements should be obtained from the DEVELOPER.
How to test a DTS package created for data insert, update and delete? What should be considered in the above case while testing it? What conditions are to be checked if the data is inserted, updated or deleted using text files?
Data integrity checks should be performed. If the database schema is in 3rd normal form, then that should be maintained. Check to see if any of the constraints have thrown an error. The most important command will have to be the DELETE command; that is where things can go really wrong.
Most of all, maintain a backup of the previous database.
How do you test whether a database is updated when information is entered in the front end?
It depends on your application interface:
1. If your application provides view functionality for the entered data, then you can verify that from the front end only. Most of the time black-box test engineers verify the functionality in this way.
2. If your application has only data entry from the front end and there is no view from the front end, then you have to go to the database and run the relevant SQL query.
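For example, a minimal back-end check, assuming a hypothetical CUSTOMER table and a record entered from the front end for 'John Smith':
SQL> select * from customer where cust_name = 'John Smith';
-- verify the row exists and every column matches what was entered in the UI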
An2:
Writing test cases for a database is just like functional testing.
1. Objective: Write the objective that you would like to test, e.g. to check that the shipment I load through XML is inserted for a particular customer.
2. Write the method of input or the action that you do, e.g. load an XML with all data which can be added to a customer.
3. Expected: the input should be visible in the database, e.g. the shipment should be loaded successfully for that customer, and it should also be seen in the application.
4. You can write such test cases for any functionality like update, delete, etc.
An3:
At first we need to go through the documents provided.
We need to know which tables and stored procedures are mentioned in the doc.
Then test the functionality of the application.
Simultaneously, start writing the DB test cases with the queries you used at the back end while testing, the tables and stored procedures you used to get the desired results, and the triggers that were fired. Based on the stored procedures we can know the functionality for a specific piece of the application, so we can write queries related to that. From that we make DB test cases as well.
What we normally check for the Database Testing?
An1:
In DB testing we need to check for:
1. Field size validation
2. Check constraints
3. Whether indexes are created or not (for performance-related issues)
4. Stored procedures
5. Whether the field size defined in the application matches that in the DB
An2:
Database testing involves some in-depth knowledge of the given application and requires a more defined plan of approach to test the data. Key issues include:
1) Data integrity
2) Data validity
3) Data manipulation and updates
The tester must be aware of database design concepts and implementation rules.
What is your tuning approach if an SQL query is taking a long time? Or how do you tune an SQL query?
If a query is taking a long time, first run the query through EXPLAIN PLAN; the explain plan process stores data in the PLAN_TABLE.
It will give us the execution plan of the query, i.e. whether the query is using the relevant indexes on the joining columns, or whether indexes to support the query are missing.
If the joining columns don't have an index then it will do a full table scan; if it is a full table scan the cost will be higher, so create the indexes on the joining columns and run the query again; it should give better performance. We also need to analyze the tables if the last analysis happened long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If we still have a performance issue then we will use HINTS; a hint is nothing but a clue.
We can use hints like:
ALL_ROWS
One of the hints that 'invokes' the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
FIRST_ROWS
One of the hints that 'invokes' the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
CHOOSE
One of the hints that 'invokes' the cost-based optimizer.
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then it hashes the other table and uses the hash index to find the corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */
In a DWH, materialized views are essential because on the reporting side, if we do aggregate calculations as per the business requirements, report performance would be degraded. So to improve report performance, rather than doing the calculations and joins on the reporting side, if we put the same logic in the MV then we can directly select the data from the MV without any joins and aggregations. We can also schedule the MV (materialized view) to refresh.
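A minimal sketch of such a materialized view, assuming the standard EMP table:
SQL> create materialized view emp_sal_mv
     build immediate
     refresh complete on demand
     as
     select deptno, sum(sal) total_sal from emp group by deptno;
-- reports can now select the pre-aggregated totals directly from EMP_SAL_MV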
Inline view:
If we write a select statement in the from clause, that is nothing but an inline view.
Ex:
Get department-wise max sal along with the ename and empno.
Select a.ename, a.empno, b.sal, b.deptno
From emp a, (select max(sal) sal, deptno from emp group by deptno) b
Where a.sal = b.sal and a.deptno = b.deptno;
ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a
number indicating the order in which Oracle selects the row from a table or set
of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and
so on.
You can use ROWNUM to limit the number of rows returned by a query, as in this
example:
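(The example itself does not appear in these notes; one using the standard EMP table would be:)
SQL> select * from emp where rownum <= 5;
-- returns only the first 5 rows fetched by the query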
Rowid vs Rownum
Rowid is an Oracle internal ID that is allocated every time a new record is inserted in a table. This ID is unique and cannot be changed by the user. Rownum is a row number returned by a select statement.
Rowid is permanent; rownum is temporary.
Rowid is a globally unique identifier for a row in a database. It is created at the time the row is inserted into the table, and destroyed when it is removed from the table. The rownum pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
A row is in first normal form (1NF) if all underlying domains contain atomic
values only.
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each
row with a unique column or set of columns (the primary key).
Second Normal Form:
A row is in second normal form (2NF) if it is in first normal form and every non-key field depends on the entire primary key.
Third Normal Form:
An entity is in Third Normal Form (3NF) when it meets the requirement of being
in Second Normal Form (2NF) and additionally:
Functional dependencies on non-key fields are eliminated by putting them
in a separate table. At this level, all non-key fields are dependent on the primary
key.
A row is in third normal form if and only if it is in second normal form and if attributes that do not contribute to a description of the primary key are moved into a separate table. An example is creating look-up tables.
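A minimal sketch of the look-up table idea, assuming a hypothetical CUSTOMER table whose state description is moved out to its own table:
SQL> create table state_lookup(state_code varchar2(2) primary key, state_name varchar2(30));
SQL> create table customer(cust_id number primary key, cust_name varchar2(30),
     state_code varchar2(2) references state_lookup(state_code));
-- the customer row now carries only the key; the descriptive state name lives in the look-up table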
Boyce-Codd Normal Form:
Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later
writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if, and
only if, every determinant is a candidate key. Most entities in 3NF are already in
BCNF.
Fourth Normal Form:
A row is in fourth normal form (4NF) if it is in BCNF and contains no multi-valued dependencies.
SUBQUERIES
TYPES
In a single-row subquery, the subquery returns only one value.
Ex:
SQL> select * from emp where sal > (select sal from emp where empno = 7566);
In a multi-row subquery, the subquery returns more than one value. In such cases we should include operators like ANY, ALL, IN or NOT IN between the comparison operator and the subquery.
Ex:
SQL> select * from emp where sal > any (select sal from emp where sal between 2500 and 4000);
SQL> select * from emp where sal > all (select sal from emp where sal between 2500 and 4000);
EMPNO ENAME JOB        MGR  HIREDATE    SAL   COMM  DEPTNO
----- ----- ---------- ---- ----------- ----- ----- ------
 7839 KING  PRESIDENT       17-NOV-81    5000          10
MULTIPLE SUBQUERIES
There is no limit on the number of subqueries included in a where
clause. It allows nesting of a query within a subquery.
Ex:
SQL> select * from emp where sal = (select max(sal) from emp where sal < (select max(sal) from emp));
CORRELATED SUBQUERIES
A correlated subquery references one or more columns of the outer query and is evaluated once for each row processed by the outer query.
EXISTS
Exists function is a test for existence. This is a logical test for the return
of rows from a query.
Ex:
Suppose we want to display the department numbers which has
more than 4 employees.
SQL> select deptno, count(*) from emp group by deptno having count(*) > 4;
DEPTNO COUNT(*)
--------- ----------
20 5
30 6
From the above query, what if you want to display the names of the employees?
SQL> select deptno, ename, count(*) from emp group by deptno, ename having count(*) > 4;
The above query returns nothing because the combination of deptno and ename never returns more than one count.
The solution is to use EXISTS, as follows:
SQL> select deptno, ename from emp e1 where exists (select * from emp e2 where e1.deptno = e2.deptno group by e2.deptno having count(e2.ename) > 4) order by deptno, ename;
DEPTNO ENAME
---------- ----------
20 ADAMS
20 FORD
20 JONES
20 SCOTT
20 SMITH
30 ALLEN
30 BLAKE
30 JAMES
30 MARTIN
30 TURNER
30 WARD
NOT EXISTS
NOT EXISTS returns the rows for which the subquery finds no match, for example the employees in the remaining departments:
SQL> select deptno, ename from emp e1 where not exists (select * from emp e2 where e1.deptno = e2.deptno group by e2.deptno having count(e2.ename) > 4) order by deptno, ename;
DEPTNO ENAME
--------- ----------
10 CLARK
10 KING
JOINS
The purpose of a join is to combine the data across tables.
A join is actually performed by the where clause which combines the
specified rows of tables.
If a join involves more than two tables, then Oracle joins the first two tables based on the join condition, then compares the result with the next table, and so on.
TYPES
Equi join
Non-equi join
Self join
Natural join
Cross join
Outer join
Ø Left outer
Ø Right outer
Ø Full outer
Inner join
Using clause
On clause
EQUI JOIN
A join which contains an '=' operator in the join condition.
Ex:
SQL> select empno, ename, job, dname, loc from emp e, dept d where e.deptno = d.deptno;
USING CLAUSE
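No example is given here for the using clause; a minimal sketch in line with the inner join example later in these notes:
SQL> select empno, ename, job, dname, loc from emp join dept using(deptno);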
NON-EQUI JOIN
A join which contains an operator other than '=' in the join condition.
Ex:
SQL> select empno, ename, job, dname, loc from emp e, dept d where e.deptno > d.deptno;
SELF JOIN
Joining a table to itself is called a self join.
Ex:
SQL> select e1.empno, e2.ename, e1.job, e2.deptno from emp e1, emp e2 where e1.empno = e2.mgr;
NATURAL JOIN
Natural join joins two tables based on the columns that have the same name and datatype, without specifying the join condition explicitly.
Ex:
SQL> select empno, ename, job, dname, loc from emp natural join dept;
CROSS JOIN
This will give the cartesian product, i.e. each row of one table joined with every row of the other table.
Ex:
SQL> select empno, ename, job, dname, loc from emp cross join dept;
OUTER JOIN
Outer join gives the non-matching records along with matching records.
This will display all the matching records, plus the records in the left-hand side table that are not in the right-hand side table.
Ex:
SQL> select empno, ename, job, dname, loc from emp e left outer join dept d on (e.deptno = d.deptno);
Or
SQL> select empno, ename, job, dname, loc from emp e, dept d where e.deptno = d.deptno(+);
This will display all the matching records, plus the records in the right-hand side table that are not in the left-hand side table.
Ex:
SQL> select empno, ename, job, dname, loc from emp e right outer join dept d on (e.deptno = d.deptno);
Or
SQL> select empno, ename, job, dname, loc from emp e, dept d where e.deptno(+) = d.deptno;
This will display all the matching records and the non-matching records from both tables.
Ex:
SQL> select empno, ename, job, dname, loc from emp e full outer join dept d on (e.deptno = d.deptno);
INNER JOIN
This will display only the matching records of both tables.
Ex:
SQL> select empno, ename, job, dname, loc from emp inner join dept using(deptno);
ADVANTAGES
SEQUENCE
Syntax:
Create sequence <seq_name> [increment by n] [start with n] [maxvalue n] [minvalue n] [cycle/nocycle] [cache/nocache];
Ex:
SQL> create sequence s;
SQL> create sequence s increment by 10 start with 100 minvalue 5 maxvalue 200 cycle cache 20;
USING SEQUENCE
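No example is shown here; a minimal sketch using the NEXTVAL and CURRVAL pseudocolumns (a two-column STUDENT table is assumed):
SQL> insert into student values(s.nextval, 'a');
SQL> select s.currval from dual;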
ALTERING SEQUENCE
Ex:
SQL> alter sequence s minvalue 5;
SQL> alter sequence s increment by 2;
SQL> alter sequence s cache 10;
DROPPING SEQUENCE
SQL> drop sequence s;
SET OPERATORS
TYPES
Ø Union
Ø Union all
Ø Intersect
Ø Minus
UNION
This will combine the records of multiple tables having the same
structure.
Ex:
SQL> select * from student1 union select * from student2;
UNION ALL
This will combine the records of multiple tables having the same
structure but including duplicates.
Ex:
SQL> select * from student1 union all select * from student2;
INTERSECT
This will give the common records of multiple tables having the same
structure.
Ex:
SQL> select * from student1 intersect select * from student2;
MINUS
This will give the records of a table whose records are not in other
tables having the same structure.
Ex:
SQL> select * from student1 minus select * from student2;
USING ROLLUP
This will give the salaries in each department in each job category, along with the total salary for individual departments and the total salary of all the departments, as in the query below.
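The rollup query itself is not spelled out in these notes; it would look like this, matching the grouping example that follows:
SQL> select deptno, job, sum(sal) from emp group by rollup(deptno, job);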
USING GROUPING
In the above query it will give the total salary of the individual departments but with a blank in the job column, and it gives the total salary of all the departments with blanks in the deptno and job columns.
To replace these blanks with your desired string, GROUPING will be used:
SQL> select decode(grouping(deptno),1,'All Depts',deptno), decode(grouping(job),1,'All jobs',job), sum(sal) from emp group by rollup(deptno,job);
DECODE(GROUPING(DEPTNO),1,'ALLDEPTS',DEPTNO) DECODE(GROUPING(JOB),1,'ALLJOBS',JOB) SUM(SAL)
-------------------------------------------- ------------------------------------- --------
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
10 All jobs 8750
20 ANALYST 6000
20 CLERK 1900
20 MANAGER 2975
20 All jobs 10875
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
30 All jobs 9400
All Depts All jobs  29025
USING CUBE
This will give the salaries in each department in each job category, the
total salary for individual departments, the total salary of all the
departments and the salaries in each job category.
SQL> select decode(grouping(deptno),1,'All Depts',deptno), decode(grouping(job),1,'All Jobs',job), sum(sal) from emp group by cube(deptno,job);
DECODE(GROUPING(DEPTNO),1,'ALLDEPTS',DEPTNO) DECODE(GROUPING(JOB),1,'ALLJOBS',JOB) SUM(SAL)
-------------------------------------------- ------------------------------------- --------
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
10 All Jobs 8750
20 ANALYST 6000
20 CLERK 1900
20 MANAGER 2975
20 All Jobs 10875
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
30 All Jobs 9400
All Depts ANALYST   6000
All Depts CLERK     4150
All Depts MANAGER   8275
All Depts PRESIDENT 5000
All Depts SALESMAN  5600
Ex:
SQL> select deptno, sum(sal) from emp group by deptno;
HAVING
This works like a where clause for groups: it restricts the groups returned by group by, which the where clause cannot do.
Ex:
SQL> select deptno, job, sum(sal) tsal from emp group by deptno, job having sum(sal) > 3000;
ORDER OF EXECUTION
1. Group the rows together based on the group by clause.
2. Calculate the group functions for each group.
3. Choose and eliminate the groups based on the having clause.
4. Order the groups based on the specified column (order by).
TYPES
Ø Range partitions
Ø List partitions
Ø Hash partitions
Ø Sub partitions
ADVANTAGES
DISADVANTAGES
Ø Partitioned tables cannot contain any columns with long or long raw
datatypes, LOB types or object types.
RANGE PARTITIONS
** If you are using maxvalue for the last partition, you cannot add a partition.
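a) The creation step is missing in these notes; a sketch consistent with the inserts below:
SQL> create table student(no number(2), name varchar(10))
     partition by range(no)
     (partition p1 values less than(10),
      partition p2 values less than(20),
      partition p3 values less than(30),
      partition p4 values less than(40));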
b) Inserting records into range partitioned table
SQL> Insert into student values(1,’a’); -- this will go to p1
SQL> Insert into student values(11,’b’); -- this will go to p2
SQL> Insert into student values(21,’c’); -- this will go to p3
SQL> Insert into student values(31,’d’); -- this will go to p4
c) Retrieving records from range partitioned table
SQL> Select * from student;
SQL> Select * from student partition(p1);
d) Possible operations with range partitions
v Add
v Drop
v Truncate
v Rename
v Split
v Move
v Exchange
e) Adding a partition
SQL> Alter table student add partition p5 values less than(40);
f) Dropping a partition
SQL> Alter table student drop partition p4;
g) Renaming a partition
SQL> Alter table student rename partition p3 to p6;
h) Truncate a partition
SQL> Alter table student truncate partition p6;
i) Splitting a partition
SQL> Alter table student split partition p2 at(15) into (partition
p21,partition p22);
j) Exchanging a partition
SQL> Alter table student exchange partition p1 with table student2;
k) Moving a partition
SQL> Alter table student move partition p21 tablespace saketh_ts;
LIST PARTITIONS
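No example is given for list partitions here; a minimal hypothetical sketch:
SQL> create table student(no number(2), name varchar(10))
     partition by list(no)
     (partition p1 values(1,2,3,4,5),
      partition p2 values(6,7,8,9,10));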
HASH PARTITIONS
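Likewise, a minimal hypothetical sketch for hash partitions:
SQL> create table student(no number(2), name varchar(10))
     partition by hash(no)
     partitions 4;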
VARRAYS
Ex:
1) We can create varrays using oracle types as well as user defined
types.
a) Varray using pre-defined types
SQL> Create type va as varray(5) of varchar(10);/
b) Varrays using user-defined types
SQL> Create type addr as object(hno number(3), city varchar(10));/
SQL> Create type va as varray(5) of addr;/
2) Using the varray in a table
SQL> Create table student(no number(2), name varchar(10), address va);
3) Inserting values into the varray table
SQL> Insert into student values(1,'sudha',va(addr(111,'hyd')));
SQL> Insert into student values(2,'jagan',va(addr(111,'hyd'),addr(222,'bang')));
4) Selecting data from the varray table
SQL> Select * from student;
-- This will display the varray column data along with the varray and adt
SQL> Select no, name, s.* from student s1, table(s1.address) s;
-- This will display in general format
5) Instead of s.* you can specify the columns in the varray
SQL> Select no, name, s.hno, s.city from student s1, table(s1.address) s;
A nested table is, as its name implies, a table within a table. In this case
it is a table that is represented as a column within another table.
Nested table has the same effect of varrays but has no limit.
Ex:
1) We can create nested tables using oracle types and user-defined types, and they have no limit.
a) Nested tables using pre-defined types
SQL> Create type nt as table of varchar(10);/
b) Nested tables using user-defined types
SQL> Create type addr as object(hno number(3), city varchar(10));/
SQL> Create type nt as table of addr;/
2) Using the nested table in a table
SQL> Create table student(no number(2), name varchar(10), address nt) nested table address store as student_temp;
3) Inserting values into a table which has a nested table
SQL> Insert into student values (1,'sudha',nt(addr(111,'hyd')));
SQL> Insert into student values (2,'jagan',nt(addr(111,'hyd'),addr(222,'bang')));
4) Selecting data from a table which has a nested table
SQL> Select * from student;
-- This will display the nested table column data along with the nested table and adt
SQL> Select no, name, s.* from student s1, table(s1.address) s;
-- This will display in general format
5) Instead of s.* you can specify the columns in the nested table
SQL> Select no, name, s.hno, s.city from student s1, table(s1.address) s;
6) Inserting nested table data into an existing row
SQL> Insert into table(select address from student where no=1) values(addr(555,'chennai'));
7) Update in nested tables
SQL> Update table(select address from student where no=2) s set s.city='bombay' where s.hno = 222;
8) Delete in nested table
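The delete example is missing from these notes; a sketch in the same style as the update above:
SQL> Delete table(select address from student where no=2) s where s.hno = 222;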
While adding constraints you need not specify the name but the type
only, oracle will internally name the constraint.
If you want to give a name to the constraint, you have to use the
constraint clause.
NOT NULL
Ex:
SQL> create table student(no number(2) not null, name varchar(10), marks number(3));
SQL> create table student(no number(2) constraint nn not null, name varchar(10), marks number(3));
CHECK
Ex:
COLUMN LEVEL
ALTER LEVEL
Ex:
COLUMN LEVEL
ALTER LEVEL
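The check constraint examples are missing here; a minimal column-level and alter-level sketch:
SQL> create table student(no number(2), name varchar(10), marks number(3) check(marks > 300));
SQL> alter table student add constraint ck check(marks > 300);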
PRIMARY KEY
This is used to avoid duplicates and nulls. This will work as combination
of unique and not null.
Primary key always attached to the parent table.
We can add this constraint in all three levels.
Ex:
COLUMN LEVEL
SQL> create table student(no number(2) primary key, name varchar(10), marks number(3));
SQL> create table student(no number(2) constraint pk primary key, name varchar(10), marks number(3));
TABLE LEVEL
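The table-level example is missing here; a minimal sketch:
SQL> create table student(no number(2), name varchar(10), marks number(3), primary key(no));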
FOREIGN KEY
This is used to reference the parent table primary key column which
allows duplicates.
Foreign key always attached to the child table.
We can add this constraint in table and alter levels only.
Ex:
TABLE LEVEL
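The example itself is absent here; a minimal hypothetical parent/child sketch:
SQL> create table dept1(deptno number(2) primary key, dname varchar(10));
SQL> create table emp1(empno number(2), ename varchar(10), deptno number(2),
     foreign key(deptno) references dept1(deptno));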
Once the primary key and foreign key relationship has been created, you cannot remove any parent record if dependent child records exist.
USING ON DELETE CASCADE
By using this clause you can remove the parent record even if child records exist, because whenever you remove a parent record Oracle automatically removes all its dependent records from the child table, provided this clause was present while creating the foreign key constraint.
Ex:
TABLE LEVEL
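Again the example is missing; the same hypothetical sketch with the cascade clause added:
SQL> create table emp1(empno number(2), ename varchar(10), deptno number(2),
     foreign key(deptno) references dept1(deptno) on delete cascade);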
Ex:
UNIQUE (TABLE LEVEL)
SQL> create table student(no number(2), name varchar(10), marks number(3), unique(no,name));
SQL> create table student(no number(2), name varchar(10), marks number(3), constraint un unique(no,name));
DEFERRABLE CONSTRAINTS
Ex:
SQL> create table student(no number(2), name varchar(10), marks number(3), constraint un unique(no) deferrable initially immediate);
Ø Enable
Ø Disable
Ø Enforce
Ø Drop
ENABLE
This will enable the constraint. Before enable, the constraint will check
the existing data.
Ex:
SQL> alter table student enable constraint un;
DISABLE
This will disable the constraint.
Ex:
SQL> alter table student disable constraint un;
ENFORCE
This will enforce the constraint rather than enable it for future inserts or updates.
This will not check existing data while enforcing the constraint.
Ex:
SQL> alter table student enforce constraint un;
DROP
This will drop the constraint.
Ex:
SQL> alter table student drop constraint un;
SQL> Select to_char(trunc(sysdate),'dd-mon-yyyy hh:mi:ss am') from dual;
WHY INDEXES?
Indexes are most useful on larger tables, on columns that are likely
to appear in where clauses as simple equality.
TYPES
Ø Unique index
Ø Non-unique index
Ø Btree index
Ø Bitmap index
Ø Composite index
Ø Reverse key index
Ø Function-based index
Ø Descending index
Ø Domain index
Ø Object index
Ø Cluster index
Ø Text index
Ø Index organized table
Ø Partition index
v Local index
ü Local prefixed
ü Local non-prefixed
v Global index
ü Global prefixed
ü Global non-prefixed
UNIQUE INDEX
Unique indexes guarantee that no two rows of a table have duplicate
values in the columns that define the index. Unique index is
automatically created when primary key or unique constraint is
created.
Ex:
SQL> create unique index stud_ind on student(sno);
NON-UNIQUE INDEX
Ex:
SQL> create index stud_ind on student(sno);
BTREE INDEX
This is the default index type, which stores the key values in a balanced tree structure.
Ex:
SQL> create index stud_ind on student(sno);
BITMAP INDEX
Ex:
SQL> create bitmap index stud_ind on student(sex);
COMPOSITE INDEX
A composite index also called a concatenated index is an index
created on multiple columns of a table. Columns in a composite index
can appear in any order and need not be adjacent columns of the
table.
Ex:
SQL> create index stud_ind on student(sno, sname);
REVERSE KEY INDEX
A reverse key index stores the bytes of the indexed column value in reverse order.
Ex:
SQL> create index stud_ind on student(sno) reverse;
We can rebuild a reverse key index into normal index using the
noreverse keyword.
Ex:
SQL> alter index stud_ind rebuild noreverse;
FUNCTION-BASED INDEX
This will use the result of a function as the key instead of using the column value itself as the key.
Ex:
SQL> create index stud_ind on student(upper(sname));
DESCENDING INDEX
The order used by B-tree indexes has been ascending order. You can
categorize data in B-tree index in descending order as well. This
feature can be useful in applications where sorting operations are
required.
Ex:
SQL> create index stud_ind on student(sno desc);
TEXT INDEX
To use oracle text, you need to create a text index on the column in
which the text is stored. Text index is a collection of tables and
indexes that store information about the text stored in the column.
TYPES
There are several different types of text indexes available in Oracle 9i. The first, CONTEXT, is supported in Oracle 8i as well as Oracle 9i. As of Oracle 9i, you can use the CTXCAT text index to further enhance your text index management and query capabilities.
Ø CONTEXT
Ø CTXCAT
Ø CTXRULE
You can create a text index via a special version of the create index command. For a CONTEXT index, specify the ctxsys.context index type, and for a CTXCAT index, specify the ctxsys.ctxcat index type.
Ex:
Suppose you have a table called BOOKS with the following columns
Title, Author, Info.
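The create statements themselves are not shown here; they would look roughly like this:
SQL> create index book_context_ind on books(info) indextype is ctxsys.context;
SQL> create index book_ctxcat_ind on books(info) indextype is ctxsys.ctxcat;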
TEXT QUERIES
Syntax:
Contains(indexed_column, search_str);
Syntax:
Contains(indexed_column, search_str, index_set);
The following queries will search for a word called 'property' whose score is greater than zero.
Suppose if you want to know the score of the ‘property’ in each book,
if score values for individual searches range from 0 to 10 for each
occurrence of the string within the text then use the score function.
The following queries will search for more than two words.
SQL> select * from books where contains(info, 'property AND harvests AND workers') > 0;
SQL> select * from books where catsearch(info, 'property harvests workers', null) > 0;
The following queries will search for either of the two words.
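The examples are absent in these notes; they would presumably use OR, for instance:
SQL> select * from books where contains(info, 'property OR harvests') > 0;
SQL> select * from books where catsearch(info, 'property | harvests', null) > 0;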
The following queries will search for the phrase. If the search phrase
includes a reserved word within oracle text, the you must use curly
braces ({}) to enclose text.
You can enclose the entire phrase within curly braces, in which case
any reserved words within the phrase will be treated as part of the
search criteria.
The following queries will search for the words that are in between
the search terms.
You can use wildcards to expand the list of valid search terms used
during your query. Just as in regular text-string wildcard processing,
two wildcards are available.
The following queries will not return anything because its search
does not contain the word ‘hardest’.
SQL> select * from books where contains(info, 'hardest') > 0;
To use a fuzzy match, precede the search term with a question mark,
with no space between the question mark and the beginning of the
search term.
SOUNDEX, expands search terms based on how the word sounds. The
SOUNDEX expansion method uses the same text-matching logic
available via the SOUNDEX function in SQL.
To use the SOUNDEX option, you must precede the search term with
an exclamation mark(!).
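Minimal sketches of both, assuming the same BOOKS table:
SQL> select * from books where contains(info, '?hardest') > 0;   -- fuzzy match
SQL> select * from books where contains(info, '!great') > 0;     -- SOUNDEX match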
INDEX SYNCHRONIZATION
SQL> exec CTX_DDL.SYNC_INDEX('book_index');
INDEX SETS
SQL> exec CTX_DDL.CREATE_INDEX_SET('books_index_set');
SQL> exec CTX_DDL.ADD_INDEX('books_index_set', 'title_index');
INDEX-ORGANIZED TABLE
PARTITION INDEX
LOCAL INDEXES
Ex:
SQL> create index stud_index on student(sno) local;
GLOBAL INDEXES
Ex:
SQL> create index stud_index on student(sno) global;
Ex:
SQL> alter index stud_ind rebuild partition p2;
Once you have turned on monitoring of index usage, you can check whether the table is hitting the index or not.
To monitor the use of an index, use the following syntax:
Syntax:
alter index index_name monitoring usage;
To stop monitoring, use:
Syntax:
alter index index_name nomonitoring usage;
Hierarchical queries
Starting at the root, walk from the top down, eliminate employee Higgins from the result, but process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.
Sometimes, however, the characteristics of the data in the database are changing rapidly, so that the optimizer (or more accurately, its statistics) is out of date. In this case, a hint could help.
You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.
Table analysis and update - the ANALYZE statement
The ANALYZE statement can be used to gather statistics for a specific table,
index or cluster. The statistics can be computed exactly, or estimated based
on a specific number of rows, or a percentage of rows:
ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');
If an index cannot be created, then we will go for the /*+ parallel(table, 8) */ hint, for example on select and update statements whose where clause uses conditions like NOT IN, >, < or <>.
Explain plan
Explain plan will tell us whether the query is using the indexes properly or not, and what the cost of the query is, i.e. whether it is doing a full table scan or not; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema, and it is created in SQL*Plus as follows:
SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;
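Once the plan table exists, a typical usage sketch (the standard EMP table is assumed) is:
SQL> EXPLAIN PLAN FOR select * from emp where deptno = 10;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);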
What is your tuning approach if an SQL query is taking a long time? Or how do you tune an SQL query?
If a query is taking a long time, first run the query through EXPLAIN PLAN; the explain plan process stores data in the PLAN_TABLE.
It will give us the execution plan of the query, i.e. whether the query is using the relevant indexes on the joining columns, or whether indexes to support the query are missing.
If the joining columns don't have an index then it will do a full table scan; if it is a full table scan the cost will be higher, so create the indexes on the joining columns and run the query again; it should give better performance. We also need to analyze the tables if the last analysis happened long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If we still have a performance issue then we will use HINTS; a hint is nothing but a clue.
We can use hints like:
ALL_ROWS
One of the hints that 'invokes' the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
FIRST_ROWS
One of the hints that 'invokes' the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
CHOOSE
One of the hints that 'invokes' the cost-based optimizer.
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then it hashes the other table and uses the hash index to find the corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */
Materialized View:
We can keep aggregated data in a materialized view. We can schedule the MV to refresh, which a table can't, and an MV can be created based on multiple tables.
Inline view:
If we write a select statement in the from clause, that is nothing but an inline view.
Ex:
Get department-wise max sal along with the ename and empno.
Select a.ename, a.empno, b.sal, b.deptno
From emp a, (select max(sal) sal, deptno from emp group by deptno) b
Where a.sal = b.sal and a.deptno = b.deptno;
Differences between where clause and having clause
The WHERE clause cannot be used to restrict groups; you use the HAVING clause to restrict groups.
c)Validation should be done that all the data specified gets extracted.
d)Test should include the check to see that the transformation and cleansing
process are working correctly.
e)Make sure that all the types of data transformations are working and
meeting the FS/MS and Business requirements/rules.
1. Data completeness tests are designed to verify that all the expected data loads into the DWH.
2. This includes running detailed tests to verify that all records, all fields and the full contents of each field are loaded correctly.
c) Comparing unique values of key fields between the source data and the data loaded to the warehouse (target), using the column mapping from the source or stage.
4. Populating the full contents of each field to validate that no truncation occurs at any step in the process; for example, if the source data field is string(30), make sure to test it with 30 characters.
Test Strategy for ETL Testing / Standard Tests for ETL Testing
There are some standard tests for a DWH that should be carried out as part of testing for every DWH project.
The strategies for testing ETL applications are identified below:
Regression Testing -
Ensures that new data updates have not broken any existing functionality or process.
OR
ETL Testing in Less Time, With Greater Coverage, to Deliver Trusted Data
Much ETL testing today is done by SQL scripting or “eyeballing” of data on
spreadsheets. These approaches to ETL testing are very time-consuming,
error-prone, and seldom provide complete test coverage. Informatica Data
Validation Option provides an ETL testing tool that can accelerate and
automate ETL testing in both production environments and development &
test. This means that you can deliver complete, repeatable and auditable test
coverage in less time with no programming skills required.
DBMS
A database is a collection of occurrence of multiple record types containing
the relationship between records, data aggregate and data items. A database
may be defined as
A database is a collection of interrelated data stored together without harmful or unnecessary redundancy (duplicate data) to serve multiple applications.
The data is stored so that it is independent of the programs which use the data. A common and controlled approach is used when adding new data and modifying, retrieving or deleting existing data within the database.
A database serves a function in a corporation, factory, government department or other organization. A database is used for searching the data to answer queries. A database may be designed for batch processing, real-time processing or online processing.
DATABASE SYSTEM
Database System is an integrated collection of related files along with the
detail about their definition, interpretation, manipulation and maintenance. It
is a system, which satisfied the data need for various applications in an
organization without unnecessary redundancy. A database system is based
on the data. Also a database system can be run or executed by using
software called DBMS (Database Management System). A database system
controls the data from unauthorized access.
Character
It is the most basic logical data element. It consists of a single
alphabetic, numeric, or other symbol.
Field
It consists of a grouping of characters. A data field represents an
attribute (a characteristic or quality) of some entity (object, person, place, or
event).
Record
The related fields of data are grouped to form a record. Thus, a record
represents a collection of attributes that describe an entity. Fixed-length
records contain, a fixed number of fixed-length data fields. Variable-length
records contain a variable number of fields and field lengths.
File
A group of related records is known as a data file, or table. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as
a document file or a graphical image file. Files are also classified by their
permanence, for example, a master file versus a transaction file. A
transaction file would contain records of all transactions occurring during a
period, whereas a master file contains all the permanent records. A history
file is an obsolete transaction or master file retained for backup purposes or
for long-term historical storage called archival storage.
Database
DWH Concepts
1. WHAT IS DWH ?WHY WE REQUIRE THAT ?
2. DWH ARCHITECTURE
3. DATA MART and TYPES
4. DM vs DWH
5. DATA CLEANSING
6. DATA SCRUBBING
7. DATA MASKING
8. NORMALIZATION
9. ODS
10. STG AREA
11. DSS
Testing Concepts
1. Whitebox Testing
2. Blackbox Testing
3. Gray Box Testing
4. Regression Testing
5. Smoke Testing vs Sanity Testing
6. User Testing
7. Unit testing
8. Integration testing
9. Module testing
10. System testing
11. UAT
1. Data Extract
2. Data Transform
3. Data Load
4. Import Source
5. Import Target
6. Mappings, Mapplets
7. Workflows, Worklets
8. Transformations, Functionalities, Rules and Techniques
9. Import and Export
10. Copying and Rules
11. Queries Preparation based on Transformations
12. Importance of ETL Testing
13. Creating Mappings, Sessions, Workflows
14. Running Mappings, Sessions, Workflows
15. Analyzing Mappings, Sessions, Workflows
16. Tasks and Types
Practice:
Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that the business information is exact, consistent and reliable. It also minimizes the hazard of data loss in production.
Hopefully these tips will help ensure your ETL process is accurate and that the data warehouse built by it is a competitive advantage for your business.
Database testing is done using smaller scale of data normally with OLTP
(Online transaction processing) type of databases while data warehouse
testing is done with large volume with data involving OLAP (online analytical
processing) databases.
In database testing normally data is consistently injected from uniform
sources while in data warehouse testing most of the data comes from
different kind of data sources which are sequentially inconsistent.
We generally perform only CRUD (Create, read, update and delete) operation
in database testing while in data warehouse testing we use read-only (Select)
operation.
Normalized databases are used in DB testing, while denormalized DBs are used in data warehouse testing.
There are number of universal verifications that have to be carried out for
any kind of data warehouse testing. Below is the list of objects that are
treated as essential for validation in ETL testing:
- Verify that data transformation from source to destination works as
expected
- Verify that expected data is added in target system
- Verify that all DB fields and field data is loaded without any truncation
- Verify data checksum for record count match
- Verify that for rejected data proper error logs are generated with all details
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
Apart from these 4 main ETL testing methods, other testing methods like integration testing and user acceptance testing are also carried out to make sure everything is smooth and reliable.
Most of the companies are taking a step forward for constructing their data
warehouse to store and monitor real time data as well as historical data.
Crafting an efficient data warehouse is not an easy job. Many organizations
have distributed departments with different applications running on
distributed technology. An ETL tool is employed in order to make a flawless integration between the different data sources from different departments. The ETL tool works as an integrator, extracting data from different sources, transforming it into the preferred format based on the business transformation rules and loading it into a cohesive DB known as a Data Warehouse.
Well planned, well defined and effective testing scope guarantees smooth
conversion of the project to the production. A business gains the real
buoyancy once the ETL processes are verified and validated by independent
group of experts to make sure that data warehouse is concrete and robust.
New Data Warehouse Testing – A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
Migration Testing – In this type of project the customer will have an existing DW and ETL performing the job, but they are looking to adopt a new tool in order to improve efficiency.
Change Request – In this type of project new data is added from different
sources to an existing DW. Also, there might be a condition where customer
needs to change their existing business rule or they might integrate the new
rule.
Report Testing – Reports are the end result of any data warehouse and the basic purpose for which the DW is built. Reports must be tested by validating the layout, the data in the report and the calculations.
b) The destination column length should be equal to or greater than the source column length.
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods
and technologies that focus on counts, statistics and business objectives to
improve business performance.
The objective of Business Intelligence is to better understand customers and
improve customer service, make the supply and distribution chain more efficient,
and to identify and address business problems and opportunities quickly.
A warehouse is used for high-level data analysis purposes: predictions, time series analysis, financial analysis, what-if simulations, etc.
What is a DataMart (DM)?
A data mart is usually sponsored at the department level and developed with specific details or a subject in mind; a data mart is a subset of a data warehouse with a focused objective.
In terms of design, a data warehouse and a data mart are almost the same.
In general, a data warehouse is used at an enterprise level and a data mart is used at a business division/department level.
A data mart only contains data specific to a particular subject area.
A set of level properties that describe a specific aspect of a business, used for
analyzing the factual measures.
Types of facts?
Additive: Additive facts are facts that can be summed up through all of
the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for
some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for
any of the dimensions present in the fact table.
Cumulative: This type of fact table describes what has happened over a
period of time. For example, this fact table may describe the total sales by
product by store by day. The facts for this type of fact tables are mostly additive
facts. The first example presented here is a cumulative fact table.
Snapshot: This type of fact table describes the state of things in a
particular instance of time, and usually includes more semi-additive and non-
additive facts. The second example presented here is a snapshot fact table.
Fact Table Example:
Time ID   Product ID   Customer ID   Unit Sold
1         17           2             1
3         21           3             2
1         4             1            1
Dimension tables contain details about each instance of an object. For example,
the items dimension table would contain a record for each item sold in the store.
It might include information such as the cost of the item, the supplier, color,
sizes, and similar data.
Types of Dimensions?
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more
than once. For example, if Christina later moves to Texas on December 15,
2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used when it is necessary for
the data warehouse to track historical changes, and when such changes will only
occur for a finite number of time.
Customer Dimension
1 Brian Edge M 2 3 4
2 Fred Smith M 3 5 1
3 Sally Jones F 1 7 3
Date Dimension
Product Dimension
What is a surrogate key, and what is the difference between a primary key and a surrogate key?
A surrogate key is a substitution for the natural primary key. It is a unique identifier or number (normally created by a database sequence generator) for each record of a dimension table that can be used as the primary key of the table.
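A minimal sketch, assuming a hypothetical CUSTOMER_DIM table:
SQL> create sequence customer_dim_seq;
SQL> create table customer_dim(
       customer_key  number primary key,   -- surrogate key
       customer_id   varchar2(10),         -- natural/business key
       customer_name varchar2(50));
SQL> insert into customer_dim values(customer_dim_seq.nextval, 'C001', 'Brian Edge');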
Dimensional Model
A type of data modeling suited for data warehousing. In a dimensional model,
there are two types of tables: dimensional tables and fact tables. Dimensional
table records information on each dimension, and fact table records all the
"fact", or measures.
1. Data modeling
There are three levels of data modeling. They are conceptual, logical, and
physical. This section will explain the difference among the three, the order with
which each one is created, and how to go from one level to the other.
2. Conceptual Data Model
Features of conceptual data model include:
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
At this level, the data modeler attempts to identify the highest-level
relationships among the different entities.
3. Logical Data Model
Features of logical data model include:
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity specified.
Foreign keys (keys identifying the relationship between different entities)
are specified.
Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as
possible, without regard to how they will be physically implemented in the
database.
In data warehousing, it is common for the conceptual data model and the logical
data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
4. Physical Data Model
Features of physical data model include:
Specification all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite
different from the logical data model.
At this level, the data modeler will specify how the logical data model will be
realized in the database schema.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
1. http://www.learndatamodeling.com/dm_standard.htm
2. Modeling is an efficient and effective way to represent the organization’s
needs; It provides information in a graphical way to the members of an
organization to understand and communicate the business rules and processes.
Business Modeling and Data Modeling are the two important types of modeling.
The differences between a logical data model and physical data model is
shown below.
Logical vs Physical Data Modeling
Logical Data Model -> Physical Data Model
Represents business information and defines business rules -> Represents the physical implementation of the model in a database
Entity -> Table
Attribute -> Column
Primary Key -> Primary Key Constraint
Alternate Key -> Unique Constraint or Unique Index
Inversion Key Entry -> Non Unique Index
Rule -> Check Constraint, Default Value
Relationship -> Foreign Key
Definition -> Comment
What is Granularity?
Principle: create fact tables with the most granular data possible to support
analysis of the business process.
In Data warehousing grain refers to the level of detail available in a given fact
table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general,
the grain of the fact table is the grain of the star schema.
Facts: Facts must be consistent with the grain; all facts are at a uniform grain.
Watch for facts of mixed granularity, e.g. total sales for the day versus the monthly total.
Dimensions: each dimension associated with fact table must take on a single
value for each fact row.
Suppose that a company sells products to customers. Every sale is a fact that
happens, and the fact table is used to record these facts. For example:
Time ID   Product ID   Customer ID   Unit Sold
4         17           2             1
8         21           3             2
8         4             1            1
Customer Dimension
1 Brian Edge M 2 3 4
2 Fred Smith M 3 5 1
3 Sally Jones F 1 7 3
Date Dimension
Product Dimension
In this example, the customer ID column in the fact table is the foreign key that
joins with the dimension table. By following the links, you can see that row 2 of
the fact table records the fact that customer 3, Sally Jones, bought two items on
day 8. The company would also have a product table and a time table to
determine what Sally bought and exactly when.
When building fact tables, there are physical and data limits. The ultimate size of
the object as well as access paths should be considered. Adding indexes can help
with both. However, from a logical design perspective, there should be no
restrictions. Tables should be built based on current and future requirements,
ensuring that there is as much flexibility as possible built into the design to allow
for future enhancements without having to rebuild the data.
A1. An ETL tester primarily tests source data extraction, business transformation logic and target table loading.
There are many tasks involved in doing this, which are given below -
1. Stage table / SFS or MFS file created from the source upstream system - the checks below come under this:
a) Business data checks, e.g. a telephone number cannot be more than 10 digits or contain character data
b) Record count check for the active records to which the transformation logic is applied
3. Target table loading from the stage file or table after applying the transformation - similar checks come under this (a sketch of such checks in SQL follows below)
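For example, a couple of the stage checks above can be expressed directly in SQL. This is only a sketch; the stg_customer table and its columns are assumed for illustration:
-- a) Business data check: telephone numbers longer than 10 digits or containing non-numeric characters
SELECT cust_id, telephone_no
  FROM stg_customer
 WHERE LENGTH(telephone_no) > 10
    OR REGEXP_LIKE(telephone_no, '[^0-9]');

-- b) Record count of active records that should pass the transformation logic
SELECT COUNT(*) AS active_record_count
  FROM stg_customer
 WHERE active_flag = 'Y';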
Q2. Generally, how are environment variables and paths set in Unix?
A2. In the .profile file; normally it is executed at login, or we can source it manually as:  . .profile
========================================================================
Q3. If a column is added into a table, tell me the test cases you will write for this.
1. Check that the particular column's data type and size are as per the data model.
2. Check that data is getting loaded into that column as per the DEM (data element mapping).
3. Check the valid values, null check and boundary value check for that column (see the sketch below).
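A sketch of such checks in SQL, assuming the new column is ORDER_STATUS on an ORDERS table (both names are illustrative only):
-- 1. Data type and size as per the data model (Oracle data dictionary)
SELECT data_type, data_length, nullable
  FROM user_tab_columns
 WHERE table_name = 'ORDERS'
   AND column_name = 'ORDER_STATUS';

-- 2./3. Null check and valid-value check on the new column
SELECT COUNT(*) AS null_count FROM orders WHERE order_status IS NULL;
SELECT order_status, COUNT(*) FROM orders GROUP BY order_status;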
========================================================================
Q4. Let's suppose you are working on a project where the requirement keeps changing. How would you tackle this?
A4. If the requirement is getting changed frequently, then we need to do a lot of regression testing on the functionality
which has already been tested. You need to be ready with all your input test data and expected results, so that after
checking the changed part, you can run all the test cases and check the results in no time.
========================================================================
Q5. How do you modify your test data while doing the testing?
A5. If the input test data is an ASCII file, then you can easily prepare it in Notepad++ based on the interface and then FTP
it to the Unix server; if it's a table, then you can insert the rows into the table as per the data model. If the file is in a format other than
ASCII, then we can use an Ab Initio graph to convert an Excel sheet into the required format, or use other tools
available for doing the same.
========================================================================
Q6. A table is partitioned by range on data_dt; suppose it has already defined a monthly partition PTN_01
(VALUES LESS THAN (TO_DATE('01-Feb-2014', 'dd-mon-yyyy'))) for January 2014 data only, and we are trying to
load data for Feb 2014. What will happen? If you find any error, then how do you solve it?
A6. It will throw the error “ORA-14400: inserted partition key does not map to any partition”. It means a partition
is not there for the Feb data which we are trying to load, so add a new partition to the table for the Feb month data as
below:
ALTER TABLE table_name ADD PARTITION partition_name VALUES LESS THAN (TO_DATE('01-MAR-2014', 'dd-mon-yyyy'));
Note: Remember we can only create a new partition for a higher value than the previously created partition (it means
here we can't add a partition for Dec 2013, as the highest existing value here is Feb 2014).
========================================================================
Q7. How will you connect to an Oracle database from a Unix server?
A7. By invoking the SQL*Plus client from the shell, e.g. sqlplus username/password@TNS_alias (the alias being defined in tnsnames.ora), typically from inside a wrapper shell script.
========================================================================
Q8. If one of the Oracle procedures fetches the error “No data found”, then what is the issue here?
A8. In that procedure we are definitely retrieving data from a table and passing it to a variable (SELECT ... INTO). If that select
statement does not fetch any row, then we get this ORA-01403 “no data found” exception.
========================================================================
Q9. If one of your wrapper Unix scripts fetches the error “not enough memory”, then what will you do?
A9. First we can check the disk usage with the command df -h, then we can clean up accordingly and run the script
again.
========================================================================
Q10. Let's suppose we have two tables, item (primary key: item_id) and order (primary key: order_id, foreign
key: item_id). If we try to delete rows from the item table, will we be able to delete them? If not, then how can we do
that?
A10. If we make an attempt to delete or truncate a table whose unique or primary keys are referenced by enabled foreign keys
in another table, then we get the error: “ORA-02266 unique/primary keys in table referenced by enabled
foreign keys”.
So, before deleting or truncating the parent table, disable the foreign key constraints in the other tables, or delete the data
from the child table (order, which holds the foreign key) first and then from the parent table (item).
========================================================================
Q11. What is an index and why do we create one?
A11. In a nutshell I can say: for faster retrieval of data we use indexes. Let's suppose I created an order table which
will contain billions of rows, and I know that most of the time I will be querying this table using order_id; then I
should create an index on the order table's order_id column for faster results (see the sketch below).
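A minimal sketch (the orders table and column names are illustrative; ORDER itself is a reserved word in Oracle):
CREATE INDEX idx_orders_order_id ON orders (order_id);

-- the optimizer can now use the index for look-ups such as
SELECT * FROM orders WHERE order_id = 1234567;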
========================================================================
Q12. What will be the default permission of a file created in Unix? How can we provide all access to all?
A12. When a file is created, the permission flags are set according to the file mode creation mask, which can be
set using the "umask" command. If the umask value is set as 002, then it means the file permission is 664 (-rw-rw-r--).
We can change the permission of the file to give full access to everyone as below:
chmod 777 file_name
========================================================================
Q13. How do you raise a defect against a failed test case step in Quality Center?
A13. First we should fail the test case step in the Test Lab, then we can click on New Defect (the red-colour symbol), enter
the defect details there and raise it. Then that defect is linked with that particular step of the test case. One
more thing: the issues we mention in the actual result will come into the defect description automatically (no need to
put the issue details again).
========================================================================
Q14. What are the different methods to load a table from files in Oracle? Also tell me the methods for Teradata.
A14. SQL*Loader, external table loading, and loading through JDBC (see the external table sketch below). In Teradata we usually use MultiLoad or
FastLoad.
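A sketch of the external table method, assuming an Oracle DIRECTORY object named data_dir already points at the folder containing a comma-separated file orders.csv (all names here are illustrative):
-- expose the flat file as a read-only table
CREATE TABLE orders_ext (
  order_id NUMBER,
  order_dt VARCHAR2(11),
  amount   NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('orders.csv')
)
REJECT LIMIT UNLIMITED;

-- then load the real target table from it
INSERT INTO orders
SELECT order_id, TO_DATE(order_dt, 'DD-MON-YYYY'), amount
  FROM orders_ext;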
========================================================================
Q15. What are the things you will check before you start testing? What will be your deliverables?
A15. Before starting the testing, at least the requirement document, functional spec, technical spec, interface, DEM and
unit test results should be available. My deliverables will be: test plan, test spec, test script, defect
summary with root cause analysis, test execution or result report, and automation scripts (if created).
========================================================================
Q16. What is the difference between active and passive transformations?
Active transformation - the number of records in the input may differ from the number in the output (e.g. Filter, Aggregator, Router).
Passive transformation - the number of records in the input equals the number of records in the output (e.g. Expression, Lookup, Stored Procedure).
========================================================================
Q17. Let's suppose we have an order table which has duplicate order_id values. How can we delete the
duplicate rows from the table? Tell me at least two methods you know.
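A17. One common first method (a sketch; it assumes the duplicate rows are full duplicates, and the table is named orders here because ORDER is a reserved word) is to rebuild the table using DISTINCT:
-- keep one copy of each distinct row
CREATE TABLE orders_tmp AS SELECT DISTINCT * FROM orders;
TRUNCATE TABLE orders;
INSERT INTO orders SELECT * FROM orders_tmp;
DROP TABLE orders_tmp;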
Second method - delete all but the row with the lowest ROWID for each order_id:
delete from order a where rowid > (select min(rowid) from order b where a.order_id = b.order_id);
Note: here we are deleting the duplicate rows based on ROWID, which is uniquely assigned to each row by Oracle.
========================================================================
Q18. How will you find the second highest salary from the employee table? Tell me at least two methods.
A18. First method - we can use a sub-query to find this, as below:
select max(sal)
from emp
where sal < (select max(sal) from emp);
Note: first we find the highest salary; the next highest below it is the second highest salary.
Second method - use an analytic function and pick the row ranked 2:
SELECT sal
FROM
(
  SELECT sal, DENSE_RANK() OVER (ORDER BY sal DESC) AS RN
  FROM emp
)
WHERE RN = 2;
========================================================================
Q19. How can we find the last two modified files for a particular file mask abc* in Unix?
A19. We can do this using very simple command : ls -lrt abc* | tail -2
Note: To check the last command was successful or not – we can use echo $?
========================================================================
20. How will you delete the last line of a file? Tell at least two methods.
A20. First method - use sed to drop the last line into a backup copy and move it back:
sed '$d' file_main > file_bkp ; cp file_bkp file_main ; rm -f file_bkp
Second method - use head to keep everything except the last line:
head -n -1 file_main > file_bkp ; cp file_bkp file_main ; rm -f file_bkp
Note: the > operator is used to redirect (overwrite) all the contents of one file into another, while >> is used to append one file's
data to another file.
========================================================================
Q21. Let's suppose we are migrating the data from a legacy file system to Oracle database tables. How would you
validate that the data is migrated properly? Tell me the important test scenarios and test cases you will write to test this
requirement, and how you will test each scenario.
A21.
1) Check that all table DDL is as per the data model (desc table_name).
2) Check that all records are loaded from source to target (match the record counts).
3) Check that no null values or junk data are loaded into the table (check the record count after putting not-null conditions).
4) Check valid values for columns (using group by, find the counts for the particular values of a field).
5) Check that the same data is loaded (put a join between the source/stage table and the target table and check the result; see the sketch below).
6) Check key fields (add up the number fields for target and source and match them).
8) After the initial load, check incremental loading (insert/delete/update - check all the CDC cases).
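A minimal sketch of checks 2) and 5), assuming a stage table stg_orders and a target table orders with matching columns (the names are illustrative):
-- 2) record counts match
SELECT (SELECT COUNT(*) FROM stg_orders) AS src_count,
       (SELECT COUNT(*) FROM orders)     AS tgt_count
  FROM dual;

-- 5) same data loaded: rows present in the source but missing from the target
SELECT order_id, order_dt, amount FROM stg_orders
MINUS
SELECT order_id, order_dt, amount FROM orders;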
1. What's the difference between OLTP and OLAP systems? Please provide one example of each.
================================================================================
4. Let's suppose you are working on a tech refresh project, i.e. migrating from VB code to .NET code. Which
type of testing will you be doing in this project?
================================================================================
5. Why are OLAP systems not normalized while OLTP systems are highly normalized?
================================================================================
6. Which SCD type is used to store all historical data along with current data?
================================================================================
10. What is the difference between data mart and data warehouse?
================================================================================
12. Let's suppose in a test scenario one column is added to a table. What kind of test cases will you
write?
================================================================================
15. Which important steps are carried out in ETL after extracting the data from the source system
and before transforming the data?
================================================================================
18. How would you validate that junk data is not loaded into the target table ?
================================================================================
19. How do you check the logs if any workflow (Informatica) failed while executing?
================================================================================
20. Have you worked on multi-file systems? How does an Ab Initio graph process those files?
================================================================================
21. Let's suppose we are migrating the data from a legacy file system to Oracle database tables. How
would you validate that the data is migrated properly? Tell me the test scenarios and test cases you
will write to test this requirement.
1. Reference Cursor ?
A REF CURSOR is a PL/SQL data type whose value points to a cursor work area (a result set); it lets a cursor be opened dynamically and passed between programs (see the sketch below).
http://www.vbip.com/books/1861003927/chapter_3927_16.asp
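A minimal sketch of a REF CURSOR, reusing the emp table that appears elsewhere in this document:
DECLARE
   TYPE emp_rc IS REF CURSOR;      -- weak REF CURSOR type
   rc      emp_rc;
   v_ename emp.ename%TYPE;
BEGIN
   OPEN rc FOR SELECT ename FROM emp WHERE deptno = 20;
   LOOP
      FETCH rc INTO v_ename;
      EXIT WHEN rc%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE(v_ename);
   END LOOP;
   CLOSE rc;
END;
/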
===========================================================================
=====
2. Anonymous Block ?
An unnamed PL/SQL block is called an anonymous block.
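For example, a minimal anonymous block (assuming SERVEROUTPUT is enabled in SQL*Plus):
BEGIN
   DBMS_OUTPUT.PUT_LINE('Hello from an anonymous block');
END;
/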
===========================================================================
=====
3. Explain how a cursor is implemented in PL/SQL
http://www.unix.org.ua/orelly/oracle/prog2/ch06_02.htm
A cursor is a variable that runs through the rows of some table or of the answer to some query.
***********
DECLARE
CURSOR joke_feedback_cur
IS
SELECT J.name, R.laugh_volume, C.name
FROM joke J, response R, comedian C
WHERE J.joke_id = R.joke_id
AND J.joker_id = C.joker_id;
BEGIN
...
END;
===========================================================================
=====
4. Constraints ?
refer: http://storacle.princeton.edu:9001/oracle8-doc/server.805/a58241/ch6.htm#1309
Only define NOT NULL constraints for columns of a table that absolutely
require values at all times.
Use the combination of NOT NULL and UNIQUE key integrity constraints to
force the input of values in the UNIQUE key; this combination of data
integrity rules eliminates the possibility that any new row's data will
ever
attempt to conflict with an existing row's data.
*The condition of a CHECK constraint must be a Boolean expression that can be evaluated using the
values in the row being inserted or updated.
*The condition cannot contain subqueries or sequences.
*The condition cannot include the SYSDATE, UID, USER, or USERENV SQL
functions.
*The condition cannot contain the pseudocolumns LEVEL, PRIOR, or ROWNUM;
*The condition cannot contain a user-defined SQL function.
Eg. (completing the truncated example with illustrative columns)
CREATE TABLE dept (
  deptno NUMBER(2)    CONSTRAINT dept_pk  PRIMARY KEY,
  dname  VARCHAR2(14) CONSTRAINT dname_nn NOT NULL,
  loc    VARCHAR2(13) CONSTRAINT loc_ck   CHECK (loc IN ('DALLAS', 'BOSTON', 'NEW YORK')));
===========================================================================
=====
5. Difference between Primary Key and Unique key ?
* Primary keys are used to identify each row of the table uniquely. Unique
keys should not have the purpose of identifying rows in the table.
* A primary key field cannot be Null, whereas a Unique column can have a
Null value.
* There could be only 1 Primary Key per table, but there could be any
number
of Unique Key columns.
* A primary key should be a unique column, but every unique key column need
not be a primary key.
===========================================================================
=====
6. What are the different joins? Explain outer joins. How would you write an outer
join using (+)? If you put (+) to the right of the equals sign, which table's data will be
returned in full?
An outer join returns all rows that satisfy the join condition and also
returns some or all of those rows from one table for which no rows from the
other satisfy the join condition.
To write a query that performs an outer join of tables A and B and returns
all rows from A( left outer join), apply the outer join operator (+) to all
columns of B in the join condition.
For example,
SELECT ename, job, dept.deptno, dname
FROM emp, dept
WHERE emp.deptno (+) = dept.deptno;
Similarly, to get all rows from B from a join of A & B (right outer join), apply the outer
join operator (+) to A.
Example,
SELECT ename, job, dept.deptno, dname
FROM emp, dept
WHERE emp.deptno = dept.deptno (+)
===========================================================================
=====
7. Is a function in SELECT possible ?
Yes. For example, Select ltrim(price) from Item.
User defined functions work only from Oracle 8i onwards.
===========================================================================
=====
8. EXCEPTION handling in PL/SQL
EXCEPTION
   WHEN past_due THEN   -- does not handle RAISEd exception
      ...
   WHEN OTHERS THEN     -- handle all other errors
      ...
END;
===========================================================================
=====
10. Decode ? Using Decode map this logic,
If A>B Display 'A is Big', If A=B Display 'A equals B'
Else 'B is Big'
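A possible answer, as a sketch: SIGN turns the comparison into -1/0/1 so DECODE can branch on it (the inline view just supplies sample values for A and B):
SELECT DECODE(SIGN(a - b),
              1, 'A is Big',
              0, 'A equals B',
                 'B is Big') AS result
  FROM (SELECT 7 AS a, 5 AS b FROM dual);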
===========================================================================
=====
11. What is an index? What are the different types of indexes? What are clustered and
non-clustered indexes?
A clustered index is a special type of index that reorders the way records
in the table are physically stored. Therefore table can have only one
clustered index. The leaf nodes of a clustered index contain the data
pages.
A nonclustered index is a special type of index in which the logical order
of the index does not match the physical stored order of the rows on disk.
The leaf node of a nonclustered index does not consist of the data pages.
Instead, the leaf nodes contain index rows.
===========================================================================
=====
12. Mutating Error ?
When we use a row-level trigger on a table and, at the same time, query/insert/delete/update
the same table from within that trigger, it will give the mutating table error.
===========================================================================
=====
13. How to avoid mutating error?
Two possible solutions.
1. Change the design of the table so as to avoid querying or any update on
the table on which the row level trigger is holding on.
2. Create a package which holds a PL/SQL table with the RowID of the rows
need to insert/update and a counter of the rows
a. In BEFORE STATEMENT trigger set the counter to zero.
b. In every ROW trigger call a procedure from the package which will
register the RowId and modify the counter
c. In AFTER STATEMENT trigger call a procedure which will make the needed
checks and update the particular rows of the table.
If you are not familiar with PL/SQL tables alternatively you can use
temporary Oracle tables. It is also a good idea.
===========================================================================
=====
14. Difference between Delete and Truncate.
Truncate permanently deletes the records, and no rollback capability exists (it is a DDL operation).
The delete command removes records from a table. Delete moves records to the
rollback segment, enabling rollback capability.
Excessive use of triggers can result in complex interdependencies, which
can
be difficult to maintain in a large application.
===========================================================================
=====
15. How many types of triggers are there and what are they? (BEFORE INSERT, AFTER INSERT, UPDATE, DELETE etc.)
They can be defined at two levels:
Table (statement) level
Record (row) level
===========================================================================
=====
17. What are PL/SQL tables?
PL/SQL tables (index-by tables, now called associative arrays) are in-memory collections of key/value pairs, indexed by BINARY_INTEGER or VARCHAR2, declared with TYPE ... IS TABLE OF ... INDEX BY ...
===========================================================================
=====
18. What are the default packages provided by oracle
The ones with "DBMS_" prefix. Eg. DBMS_Output, DBMS_ALERT
===========================================================================
=====
19. What is the diff between Where and Having
WHERE filters individual rows before grouping; the HAVING clause is used where a GROUP BY clause is used,
to restrict the groups of returned rows to those groups for which the specified condition is TRUE.
The HAVING condition cannot contain a scalar subquery expression.
eg:
SELECT department_id, MIN(salary), MAX (salary)
FROM employees
GROUP BY department_id
HAVING MIN(salary) < 5000;
===========================================================================
=====
20. How is error Handling done?
===========================================================================
=====
21. I have a table that contains state codes; I might have more than one row with the same state code. How can I find out?
select state_code, count(*) from table_name group by state_code having count(*) > 1;
If this returns any rows, we can be sure that there are duplicates.
===========================================================================
=====
22. How do you copy rows from table p in schema a to table p in schema b?
To copy between tables on a remote database, include the same username,
password, and service name in the FROM and TO clauses of the SQL*Plus COPY command (see question 38 below).
For tables in the same database, a simpler sketch follows:
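(This assumes the connected user has SELECT privilege on a.p and INSERT privilege on b.p.)
-- copy all rows between schemas in one statement
INSERT INTO b.p SELECT * FROM a.p;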
===========================================================================
=====
23. What is %Rowtype and %Type
%TYPE declares a variable with the same data type as a previously declared variable or a database column.
Use the %ROWTYPE attribute to create a record that contains all the columns
of the specified table. The following example calls the Get_emp_rec
procedure, which returns all the columns of the Emp_tab table in a PL/SQL
record for the given empno:
DECLARE
Emp_row Emp_tab%ROWTYPE; -- declare a record matching a
-- row in the Emp_tab table
BEGIN
Get_emp_rec(7499, Emp_row); -- call for Emp_tab# 7499
DBMS_OUTPUT.PUT(Emp_row.Ename || ' ' || Emp_row.Empno);
DBMS_OUTPUT.PUT(' ' || Emp_row.Job || ' ' || Emp_row.Mgr);
DBMS_OUTPUT.PUT(' ' || Emp_row.Hiredate || ' ' ||
Emp_row.Sal);
DBMS_OUTPUT.PUT(' ' || Emp_row.Comm || ' '|| Emp_row.Deptno);
DBMS_OUTPUT.NEW_LINE;
END;
===========================================================================
=====
24. How to use the cursor with using open, fetch and close.?
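A minimal sketch of the OPEN/FETCH/CLOSE pattern, reusing the emp table from the examples above:
DECLARE
   CURSOR c1 IS
      SELECT empno, ename FROM emp WHERE deptno = 20;
   v_empno emp.empno%TYPE;
   v_ename emp.ename%TYPE;
BEGIN
   OPEN c1;                             -- executes the query and identifies the result set
   LOOP
      FETCH c1 INTO v_empno, v_ename;   -- retrieves the current row and advances the cursor
      EXIT WHEN c1%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE(v_empno || ' ' || v_ename);
   END LOOP;
   CLOSE c1;                            -- releases the cursor
END;
/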
===========================================================================
=====
25. Using a SELECT statement, how will you retrieve the user who is logged in?
SELECT USER FROM dual;
===========================================================================
=====
26. When an exception occurs, I want to see the error generated by Oracle. How do I see it?
SQLERRM (returns the error message for the most recent exception; SQLCODE returns the error number)
===========================================================================
=====
27. How to generate the output of a select statement in a file?
SPOOL (in SQL*Plus: SPOOL file_name, run the query, then SPOOL OFF)
===========================================================================
=====
28. how to find the duplicate records?
select count(*), job from emp group by job having count(*) > 1
===========================================================================
=====
31. What is the difference between RDBMS and DBMS? Sol: an RDBMS is a DBMS that follows the relational model (Codd's 12 rules).
===========================================================================
=====
32. What is normalised and denormalised data?
Normalisation is the process of removing redundancy in data by separating the data into multiple
tables; denormalised data deliberately reintroduces redundancy (fewer, wider tables) to reduce joins and speed up reads.
===========================================================================
=====
33. What is a VIEW?
A view is a custom-tailored presentation of the data in one or more tables.
A view can also be thought of as a "stored query." Views do not actually
contain or store data; they derive their data from the tables on which they
are based.
===========================================================================
=====
34. Can a view update a table?
Yes: DML issued against a simple view (one base table, no DISTINCT, GROUP BY, aggregates, etc.) is applied to the underlying table; complex views need INSTEAD OF triggers to be updatable.
===========================================================================
=====
36. What are cursors? After retrieving the records into the cursor, can we update
the record in the table for the retrieved record? What effect will it
have on the cursor?
Cursors
Oracle uses work areas to execute SQL statements and store processing
information. A PL/SQL construct called a cursor lets you name a work area
and access its stored information. There are two kinds of cursors: implicit
and explicit. PL/SQL implicitly declares a cursor for all SQL data
manipulation statements, including queries that return only one row. For
queries that return more than one row, you can explicitly declare a cursor
to process the rows individually. An example follows:
DECLARE
CURSOR c1 IS
SELECT empno, ename, job FROM emp WHERE deptno = 20;
The set of rows returned by a multi-row query is called the result set. Its
size is the number of rows that meet your search criteria
Multi-row query processing is somewhat like file processing. For example, a
COBOL program opens a file, processes records, then closes the file.
Likewise, a PL/SQL program opens a cursor, processes rows returned by a
query, then closes the cursor. Just as a file pointer marks the current
position in an open file, a cursor marks the current position in a result
set.
You use the OPEN, FETCH, and CLOSE statements to control a cursor. The OPEN
statement executes the query associated with the cursor, identifies the
result set, and positions the cursor before the first row. The FETCH
statement retrieves the current row and advances the cursor to the next
row.
When the last row has been processed, the CLOSE statement disables the
cursor.
===========================================================================
=====
37. What are user defined data types ?
To specify base_type, you can use %TYPE, which provides the datatype of a
variable or database column. Also, you can use %ROWTYPE, which provides the
rowtype of a cursor, cursor variable, or database table.
===========================================================================
=====
38. How to copy the Structure and data from one table to another in one SQL
stmt?
COPY {FROM database | TO database | FROM database TO database}
{APPEND|CREATE|INSERT|REPLACE} destination_table [(column, column, column,
...)]
USING query
username[/password]@connect_identifier
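Note that COPY is a SQL*Plus command rather than a single SQL statement; within one database, a single-statement sketch is CREATE TABLE ... AS SELECT:
-- copies structure and data; add WHERE 1 = 0 to copy the structure only
CREATE TABLE emp_copy AS SELECT * FROM emp;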
===========================================================================
=====
39. Describe UNION and UNION ALL. UNION returns distinct rows selected by
both queries while UNION ALL returns all the rows. Therefore, if the table
has duplicates, UNION will remove them. If the table has no duplicates,
UNION will force a sort and cause performance degradation as compared to
UNION ALL.
===========================================================================
=====
40. What is 1st normal form? Each cell must be one and only one value, and
that value must be atomic: there can be no repeating groups in a table that
satisfies first normal form.
===========================================================================
=====
41. What is 2nd normal form? Every nonkey column must depend on the entire
primary key.
===========================================================================
=====
42. What is 3rd normal form? (another explanation than #1) No nonkey
column
depends on another nonkey column.
===========================================================================
=====
43. What is 4th normal form? Fourth normal form forbids independent
one-to-many relationships between primary key columns and nonkey columns.
===========================================================================
=====
44. What is 5th normal form? Fifth normal form breaks tables into the
smallest possible pieces in order to eliminate all redundancy within a
table. Tables normalized to this extent consist of little more than the
primary key.
===========================================================================
=====
48.If same kind of logic we put in function as well as procedure then which
one will be faster?
===========================================================================
=====
50. While handling exceptions, if we have written “WHEN OTHERS THEN” first, before the other
exception handlers, then what will happen?
The block will not compile; Oracle raises PLS-00370, because the OTHERS handler must be the last handler in the exception section.
===========================================================================
=====
51. What is an SQL injection attack?
<form method="post" action="http://testasp.vulnweb.com/login.asp">
<input name="tfUName" type="text" id="tfUName">
<input name="tfUPass" type="password" id="tfUPass">
</form>
The easiest way for the login.asp to work is by building a database
query that looks like this:
SELECT id
FROM logins
WHERE username = '$username'
AND password = '$password'
If the variables $username and $password are requested directly from
the user's input, this can easily be compromised. Suppose that we
gave "Joe" as a username and that the following string was provided
as a password: anything' OR 'x'='x
SELECT id
FROM logins
WHERE username = 'Joe'
AND password = 'anything' OR 'x'='x'
As the inputs of the web application are not properly sanitised, the
use of the single quotes has turned the WHERE SQL command into a
two-component clause.
The 'x'='x' part guarantees to be true regardless of what the first
part contains.
This will allow the attacker to bypass the login form without
actually knowing a valid username / password combination!
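The usual defence is to pass user input as bind variables rather than concatenating it into the SQL string. A minimal PL/SQL sketch, reusing the logins table from the example above (the variable names are illustrative):
DECLARE
   v_id       logins.id%TYPE;
   p_username VARCHAR2(100) := 'Joe';                       -- would come from the tfUName form field
   p_password VARCHAR2(100) := 'anything'' OR ''x''=''x';   -- the injection attempt is now just a literal value
BEGIN
   EXECUTE IMMEDIATE
      'SELECT id FROM logins WHERE username = :u AND password = :p'
      INTO v_id
      USING p_username, p_password;     -- values are bound, never spliced into the SQL text
   DBMS_OUTPUT.PUT_LINE('Login accepted, id = ' || v_id);
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      DBMS_OUTPUT.PUT_LINE('Login rejected');
END;
/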
1) Constraint Testing:
In the phase of constraint testing, the test engineer identifies whether the data is
mapped from source to target or not.
The test engineer checks the following scenarios in the ETL testing process:
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
2) Source to Target Count Testing:
Check whether the record count in the source matches the count in the target. Whether the data is in
ascending or descending order does not matter; only the count is required.
A tester can fall back on this type of testing when there is a lack of time.
3) Source to Target Data Validation Testing:
In this testing, the tester validates each and every data point from source to target.
In most financial projects, the tester has to verify the decimal factors (precision), for example:
MIN   MAX   RANGE
 4    10     6
5) Field to Field Testing:
In field to field testing, the test engineer checks how much space is occupied in the database and
that the data is integrated into the table with the correct data types.
NOTE: Also check the order of the columns and that each source column maps to the correct target column.
Note:
1) If there are mistakes in the primary key, or no primary key is allotted, then duplicates may arise.
2) Sometimes a developer may make mistakes while transferring the data from source to
target, and at that time duplicates may arise.
3) Duplicates can also arise due to environment mistakes (improper plug-ins in the tool).
7) Error/Exception Logical Testing:
1) Delimiter is available in Valid Tables
2) Delimiter is not available in invalid tables(Exception Tables)
11) Initialization testing:
Testing the application against the combination of hardware and software installed on the platform is called initialization
testing.
12) Transformation Testing:
At the time of mapping from the source table to the target table, if the transformation is not as per the
mapping condition, then the test engineer raises a bug.
13) Regression Testing:
Code modification to fix a bug or to implement new functionality can introduce new errors into
functionality that was already working.
These introduced errors are called regressions. Checking for regression effects is called
regression testing.
14) Retesting:
Re-executing the failed test cases after the bug has been fixed.
What is Metadata?
Metadata is defined as data that describes other data. Metadata can be divided into two main types:
structural and descriptive.
Structural metadata describes the design structure and their specifications. This type of metadata
describes the containers of data within a database.
Descriptive metadata describes instances of application data. This is the type of metadata that is
traditionally spoken of and described as “data about the data.”
Metadata makes it easier to retrieve, use, or manage information resources by providing users with
information that adds context to the data they’re working with. Metadata can describe information at
any level of aggregation, including collections, single resources, or component parts of a single
resource. Metadata can be embedded into a digital object or can be stored separately. Web pages
contain metadata called metatags.
Metadata at the most basic level is simply defined as “data about data”. An item of metadata
describes the specific characteristics about an individual data item. In the database realm, metadata
is defined as, “data about data, through which the end-user data are integrated and managed.”
Metadata in a database typically store the relationships that link up numerous pieces of data.
“Metadata names these fields, describes the size of the fields, and may put restrictions on what can
go in the field (for example, numbers only).”
“Therefore, metadata is information about how data is extracted, and how it may be transformed. It
is also about indexing and creating pointers into data. Database design is all about defining metadata
schemas.” Metadata can be stored either internally, in the same file as the data, or externally, in a
separate area. If the data is stored internally, the metadata is together with the data, making it more
easily accessible to view or change. However, this method creates high redundancy. If metadata is
stored externally, the searches can become more efficient. There is no redundancy but getting to this
metadata may be a little more technical.
All the metadata is stored in a data dictionary or a system catalog. The data dictionary is most
typically an external document that is created in a spreadsheet type of document that stores the
conceptual design ideas for the database schema. The data dictionary also contains the general
format that the data, and in effect the metadata, should follow. Metadata is an essential aspect of
database design; it allows for increased processing power because it can help create
pointers and indexes.
*******************************************************************************
Database testing is done using smaller scale of data normally with OLTP (Online transaction
processing) type of databases while data warehouse testing is done with large volume with data
involving OLAP (online analytical processing) databases.
In database testing, data is normally injected consistently from uniform sources, while in data
warehouse testing most of the data comes from different kinds of data sources which are not
necessarily consistent with one another.
We generally perform only CRUD (Create, read, update and delete) operation in database testing
while in data warehouse testing we use read-only (Select) operation.
Normalized databases are used in DB testing, while denormalized databases are used in data warehouse testing.
There are number of universal verifications that have to be carried out for any kind of data
warehouse testing.
Below is the list of objects that are treated as essential for validation in ETL testing:
Verify that data transformation from source to destination works as expected
Verify that expected data is added in target system
Verify that all DB fields and field data is loaded without any truncation
Verify data checksum for record count match
Verify that for rejected data proper error logs are generated with all details
Verify NULL value fields
Verify that duplicate data is not loaded
Verify data integrity
***************************************************************************
ETL basically stands for Extract Transform Load - which simply implies the process where you extract
data from Source Tables, transform them in to the desired format based on certain rules and finally
load them onto Target tables. There are numerous tools that help you with ETL process - Informatica,
Control-M being a few notable ones.
So ETL Testing implies - Testing this entire process using a tool or at table level with the help of test
cases and Rules Mapping document.
Extract
In this step we extract data from different internal and external sources, structured and/or
unstructured. Plain queries are sent to the source systems, using native connections, message
queuing, ODBC or OLE-DB middleware. The data will be put in a so-called Staging Area (SA), usually
with the same structure as the source. In some cases we want only the data that is new or has been
changed, the queries will only return the changes. Some tools can do this automatically, providing a
changed data capture (CDC) mechanism.
Transform
Once the data is available in the Staging Area, it is all on one platform and one database. So we can
easily join and union tables, filter and sort the data using specific attributes, pivot to another
structure and make business calculations. In this step of the ETL process, we can check data
quality and cleanse the data if necessary. After having all the data prepared, we can choose to
implement slowly changing dimensions. In that case we want to keep track in our analysis and
reports of when attributes change over time, for example when a customer moves from one region to
another.
Load
Finally, data is loaded into a central warehouse, usually into fact and dimension tables. From there
the data can be combined, aggregated and loaded into datamarts or cubes as is deemed necessary.
*******************************************************************************
Multi dimensional data is logically represented by Cubes in data warehousing. The dimension and the
data are represented by the edge and the body of the cube respectively. OLAP environments view the
data in the form of hierarchical cube. A cube typically includes the aggregations that are needed for
business intelligence queries.
*******************************************************************************
We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume
that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze
it.
- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line
transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very fast
query processing, maintaining data integrity in multi-access environments and an effectiveness
measured by number of transactions per second. In OLTP database there is detailed and current
data, and schema used to store transactional databases is the entity model (usually 3NF).
Business intelligence, or BI for short, is an umbrella term that refers to competencies, processes,
technologies, applications and practices used to support evidence-based decision making in
organizations. In the widest sense it can be defined as a collection of approaches for gathering,
storing, analyzing and providing access to data that helps users to gain insights and make better fact-
based business decisions.
BI used for?
Organizations use Business Intelligence to gain data-driven insights on anything related to business
performance. It is used to understand and improve performance and to cut costs and identify new
business opportunities, this can include, among many other things:
Gathering Data
Gathering data is concerned with collecting or accessing data which can then be used to inform
decision making. Gathering data can come in many formats and basically refers to the automated
measurement and collection of performance data. For example, these can come from transactional
systems that keep logs of past transactions, point-of-sale systems, web site software, production
systems that measure and track quality, etc. A major challenge of gathering data is making sure that
the relevant data is collected in the right way at the right time. If the data quality is not controlled at
the data gathering stage then it can harm the entire BI efforts that might follow – always remember
the old adage - garbage in garbage out
Storing Data
Storing Data is concerned with making sure the data is filed and stored in appropriate ways to ensure
it can be found and used for analysis and reporting. When storing data the same basic principles
apply that you would use to store physical goods – say books in a library – you are trying to find the
most logical structure that will allow you to easily find and use the data. The advantage of modern
databases (often called data warehouses because of the large volumes of data) is that they allow
multi-dimensional formats so you can store the same data under different categories – also called
data marts or data-warehouse access layers. Like in the physical world, good data storage starts with
the needs and requirements of the end users and a clear understanding of what they want to use the
data for.
Analyzing Data
The next component of BI is analysing the data. Here we take the data that has been gathered and
inspect, transform or model it in order to gain new insights that will support our business decision
making. Data analysis comes in many different formats and approaches, both quantitative and
qualitative. Analysis techniques include the use of statistical tools, data mining approaches as well as
visual analytics or even analysis of unstructured data such as text or pictures.
Providing Access
In order to support decision making the decision makers need to have access to the data. Access is
needed to perform analysis or to view the results of the analysis. The former is provided by the latest
software tools that allow end-users to perform data analysis while the latter is provided through
reporting, dashboard and scorecard applications.
*******************************************************************************