SQL & Query Optimization: Unit - Iii

Uploaded by

cnpnraja

UNIT - III

SQL & QUERY OPTIMIZATION


SYLLABUS

SQL & QUERY OPTIMIZATION

• SQL fundamentals - SQL standards - Data types - DDL - DML - DCL - TCL - Keys - Integrity - Views - Triggers - Cursors - Embedded SQL - Dynamic SQL - Query processing and optimization
WHAT IS SQL?
• SQL stands for Structured Query Language.
• It is used to store, retrieve and manipulate data in an RDBMS
(Relational Database Management System).
• SQL provides commands such as CREATE, ALTER, SELECT, INSERT,
DELETE and DROP to define and manipulate the stored data.
• RDBMS
⚫ A Relational Database Management System is a database
management system (DBMS) based on the relational model, as
defined by E. F. Codd.
⚫ A relational database stores data in the form of tables. Each
table consists of rows and columns.

ID NAME GRADE COLLEGE


1 Martin A SSPMS
2 Ryan A PVG
3 Alex B VIIT
SQL SYNTAX
• SQL follows a set of rules and guidelines called its syntax.
• SQL keywords are not case sensitive, but they are generally
written in uppercase.
• SQL is based on relational algebra and tuple relational calculus.
• A user can perform several operations on a database
with SQL statements.
• Example:
1) SELECT column_name FROM table_name;
2) SELECT * FROM Employee;
SQL DATABASES AND OPERATORS

• A data type defines the kind of value that a column can contain.
• In a database table, every column must have a name
and a data type.
• Important note: Data types vary from one database to another.
For example, MySQL uses INT for integer values, whereas Oracle
uses NUMBER.
DATA TYPES
Data type          Description
INTEGER            Integer number (no decimal).
CHARACTER(n)       Character string with a fixed length of n.
VARCHAR(n)         Character string with a variable length of at most n.
DECIMAL(p,s)       Exact number, where 'p' is the precision and 's' is the scale.
REAL               Single-precision floating-point number.
FLOAT(p)           Floating-point number, where 'p' is the precision.
DOUBLE PRECISION   Double-precision floating-point number.
DATE               Stores year, month and day values.
TIME               Stores hour, minute and second values.
TIMESTAMP          Stores year, month, day, hour, minute and second values.
ARRAY              A fixed-length, ordered collection of elements.
XML                Stores XML data.
SQL OPERATORS
1. SQL arithmetic operators

Operator   Description
'+'        Performs addition.
'-'        Performs subtraction.
'*'        Performs multiplication.
'/'        Performs division.
'%'        Divides the left-hand operand by the right-hand operand and returns the remainder.
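The arithmetic operators above can be tried directly in a SELECT statement. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the operands 7 and 2 are arbitrary. It assumes SQLite's behaviour, where dividing two integers gives an integer result. Other database systems may return a fractional value for '/'.

```python
import sqlite3

# In-memory SQLite database, used here only as a calculator.
conn = sqlite3.connect(":memory:")

# One expression per arithmetic operator: +, -, *, / and %.
row = conn.execute("SELECT 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2").fetchone()
total, difference, product, quotient, remainder = row
print(total, difference, product, quotient, remainder)
conn.close()
```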
SQL OPERATORS
2. SQL comparison operators
Consider a = 25 and b = 75 to understand the examples in the following table.

Operator   Description                                                  Example
'='        True if a is equal to b, else false.                         (a = b) is false.
'!='       True if a is not equal to b, else false.                     (a != b) is true.
'<>'       True if a is not equal to b, else false.                     (a <> b) is true.
'>'        True if a is greater than b, else false.                     (a > b) is false.
'<'        True if a is less than b, else false.                        (a < b) is true.
'>='       True if a is greater than or equal to b, else false.         (a >= b) is false.
'<='       True if a is less than or equal to b, else false.            (a <= b) is true.
'!<'       True if a is not less than b, else false (T-SQL only).       (a !< b) is false.
'!>'       True if a is not greater than b, else false (T-SQL only).    (a !> b) is true.
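The truth values in the comparison table can be checked with a = 25 and b = 75. This is a minimal sketch using Python's built-in sqlite3 module, so it assumes SQLite's dialect. '!<' and '!>' are left out because they are not standard SQL and SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
a, b = 25, 75  # the values used in the comparison table

# SQLite returns 1 for true and 0 for false.
operators = ["=", "!=", "<>", ">", "<", ">=", "<="]
results = {}
for op in operators:
    # The two '?' placeholders are bound to a and b.
    (value,) = conn.execute(f"SELECT ? {op} ?", (a, b)).fetchone()
    results[op] = bool(value)
    print(f"a {op} b is {results[op]}")
conn.close()
```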
SQL OPERATORS
3. SQL logical operators

Operator   Description
ALL        Compares a value to all values in another value set.
AND        Combines multiple conditions in an SQL statement; every condition must be true.
ANY        Compares a value to any applicable value in a list, according to the condition.
BETWEEN    Searches for values within a range, given the minimum and maximum values.
IN         Compares a value with a specified list of values.
NOT        Reverses the meaning of the logical operator it is used with.
OR         Combines multiple conditions in an SQL statement; at least one condition must be true.
EXISTS     Tests for the presence of a row in a specified table.
LIKE       Compares a value to similar values using wildcard operators.
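Several of the logical operators above can be illustrated with one small table. The sketch uses Python's built-in sqlite3 module; the Students rows and the names() helper are assumptions made for this example. ALL and ANY are not shown because SQLite does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "Ravi", 60), (2, "Surendra", 60), (3, "Jaya", 78)],
)

def names(where):
    # Helper for this sketch only: names of the rows matching a WHERE clause.
    sql = f"SELECT name FROM Students WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

between = names("marks BETWEEN 70 AND 80")
in_list = names("marks IN (60, 65)")
like = names("name LIKE 'S%'")
negated = names("NOT marks = 60")
both = names("marks = 60 AND name LIKE 'R%'")
either = names("marks > 70 OR name = 'Ravi'")
# EXISTS: students for whom some other student has higher marks.
exists = names("EXISTS (SELECT 1 FROM Students s2 WHERE s2.marks > Students.marks)")
print(between, in_list, like, negated, both, either, exists)
```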
SQL COMMANDS

• DDL – Data Definition Language

• DML – Data Manipulation Language

• DCL – Data Control Language

• TCL – Transaction Control Language


DDL
1. CREATE
2. ALTER
3. DROP
4. TRUNCATE
5. RENAME
1. CREATE
CREATE DATABASE
Syntax:
CREATE DATABASE Database_Name;
Example:
CREATE DATABASE Employee;

CREATE TABLE
Syntax:
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ....
);
Example:
CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);

CREATE VIEW
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
CREATE VIEW Brazil_Customers AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
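A view stores a query, not a copy of the data, so rows added to the base table later show up in the view. The sketch below shows this with the Brazil_Customers view, using Python's built-in sqlite3 module. The customer rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Comercio Mineiro", "Pedro", "Brazil"), ("Alfreds", "Maria", "Germany")],
)

# The view definition from the CREATE VIEW example.
conn.execute(
    """CREATE VIEW Brazil_Customers AS
       SELECT CustomerName, ContactName
       FROM Customers
       WHERE Country = 'Brazil'"""
)
view_rows = conn.execute("SELECT * FROM Brazil_Customers").fetchall()

# A row inserted into the base table afterwards is visible through the view.
conn.execute("INSERT INTO Customers VALUES ('Familia Arquibaldo', 'Aria', 'Brazil')")
view_rows_after = conn.execute("SELECT * FROM Brazil_Customers").fetchall()
print(view_rows, view_rows_after)
```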
2. ALTER
The ALTER command is used to change a table in several ways:
1. Add a column
2. Rename an existing column
3. Drop a column
4. Modify the size of a column or change its data type
ADD using ALTER
Syntax: ALTER TABLE table_name ADD (column_name datatype);
Example: ALTER TABLE Student ADD (Address VARCHAR(200));
RENAME using ALTER
Syntax: ALTER TABLE table_name RENAME COLUMN old_column_name TO
new_column_name;
Example: ALTER TABLE Employee RENAME COLUMN Marks TO Age;
DROP using ALTER
Syntax: ALTER TABLE table_name DROP (column_name);
Example: ALTER TABLE Employee DROP (Age);
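The effect of ALTER TABLE can be seen by listing the columns before and after. This is a sketch in SQLite's dialect through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later for RENAME COLUMN, and SQLite writes ADD COLUMN without parentheses. The columns() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_ID INTEGER, Marks INTEGER)")

def columns():
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]

before = columns()
conn.execute("ALTER TABLE Employee ADD COLUMN Address VARCHAR(200)")
conn.execute("ALTER TABLE Employee RENAME COLUMN Marks TO Age")
after = columns()
print(before, after)
```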
3. DROP
The SQL DROP statement is used to delete objects such as databases,
tables and indexes from the database.

DROP DATABASE
Syntax:
DROP DATABASE Database_Name;
Example:
DROP DATABASE Employee;

DROP TABLE
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE Student;
4. TRUNCATE
This command removes all the records from a table, but it does not
destroy the table's structure.
Syntax:
TRUNCATE TABLE table_name;

For example, the command below removes all the records from the table Student.

Example:
TRUNCATE TABLE Student;
5. RENAME
The RENAME command is used to change the name of an
existing database object (such as a table or column) to a new name.

Syntax: RENAME TABLE `current_table_name` TO `new_table_name`;

Example: CREATE TABLE t1(name varchar(20), id int(19));
mysql> desc t1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(20) | YES  |     | NULL    |       |
| id    | int(19)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
mysql> rename table t1 to test;
mysql> desc t1;
ERROR 1146 (42S02): Table 'new.t1' doesn't exist
mysql> desc test;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(20) | YES  |     | NULL    |       |
| id    | int(19)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
DML
1. SELECT
2. INSERT
3. DELETE
4. UPDATE
1. SELECT
The SQL SELECT statement is the most commonly used statement. It queries
the database and retrieves the data that satisfies a given
condition from the stored table.
Syntax:
SELECT column_name FROM table_name WHERE condition;
Example: Consider the following table titled 'Students'.
Student_ID LastName FirstName Marks
1 Patil Ravi 60
2 Morya Surendra 60
3 Singh Jaya 78
1. To display all data from the table 'Students', the query should be written
as:
SELECT * FROM Students;
2. To select the LastName of all the students, the query should be:
SELECT LastName FROM Students;
3. To obtain the LastName from table 'Students' for the students securing 60
marks, the query should be written as:
SELECT LastName FROM Students WHERE Marks = 60;
SELECT DISTINCT
SELECT DISTINCT returns only the distinct values of the selected columns.
Syntax:
SELECT DISTINCT column_name1, column_name2, ...
FROM table_name;

Student_ID LastName FirstName Marks
1 Patil Ravi 60
2 Morya Surendra 60
3 Singh Jaya 78

Example:
SELECT DISTINCT Marks FROM Students;

Marks
60
78
WHERE
The WHERE clause is used to retrieve only those records which fulfill the given
criteria.
Syntax:
SELECT column_name FROM table_name WHERE conditions;

Consider the following table titled 'Clients'.
C_ID LastName FirstName Contact_no City Country
1 Patil Ravi 0201234568 Pune India
2 Morya Surendra 0202345677 Pune India
3 Singh Jaya 0203456788 Berlin Germany
4 Pandit Prajakta 0204567897 Pune India

SELECT * FROM Clients WHERE C_ID = 1;

AND
The SQL AND operator combines multiple conditions in a WHERE clause. A row is selected only if all the conditions are true.
Syntax:
SELECT column_name FROM table_name WHERE condition AND condition;
SELECT C_ID FROM Clients WHERE City = 'Pune' AND Country = 'India';
OR
The SQL OR operator combines multiple conditions in a WHERE clause. A row is selected if at least one
condition is true.
Syntax:
SELECT * FROM table_name WHERE condition OR condition;
SELECT FirstName FROM Clients WHERE City = 'Pune' OR City = 'Berlin';
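The WHERE, AND, OR and DISTINCT queries can be run against the 'Clients' table shown above. The sketch uses Python's built-in sqlite3 module. The column() helper and the ORDER BY clauses are additions made so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Clients (
           C_ID INTEGER, LastName TEXT, FirstName TEXT,
           Contact_no TEXT, City TEXT, Country TEXT)"""
)
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Patil", "Ravi", "0201234568", "Pune", "India"),
        (2, "Morya", "Surendra", "0202345677", "Pune", "India"),
        (3, "Singh", "Jaya", "0203456788", "Berlin", "Germany"),
        (4, "Pandit", "Prajakta", "0204567897", "Pune", "India"),
    ],
)

def column(sql):
    # Helper for this sketch: first column of every result row, as a list.
    return [row[0] for row in conn.execute(sql)]

pune_india = column(
    "SELECT C_ID FROM Clients WHERE City = 'Pune' AND Country = 'India' ORDER BY C_ID"
)
pune_or_berlin = column(
    "SELECT FirstName FROM Clients WHERE City = 'Pune' OR City = 'Berlin' ORDER BY C_ID"
)
cities = column("SELECT DISTINCT City FROM Clients ORDER BY City")
print(pune_india, pune_or_berlin, cities)
```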
2. INSERT
The SQL INSERT statement is used to insert data into a table.

The two ways to insert data into a table are:

1. Inserting data with a SELECT statement.
The INSERT INTO SELECT statement copies data from one table into another
(existing) table.
Syntax:
i) To copy all columns from one table to another table:

INSERT INTO table_3 SELECT * FROM table_1;

ii) To copy only a few columns into another table:

INSERT INTO table_3 (column_1, column_2, column_3, ...)
SELECT column_1, column_2, column_3, ... FROM table_1;

Example:
INSERT INTO Table1 (Last_Name, First_Name, Country)
SELECT Last_Name, First_Name, Country FROM Table2;
2. Inserting data without a SELECT statement.
Syntax:
INSERT INTO Table_name (Column_Name1, Column_Name2) VALUES (Value1, Value2);
Example:
INSERT INTO Clients (Last_Name, First_Name, Contact, Country)
VALUES ('Pandit', 'Prajakta', 2345678, 'India');
INSERTING DATA WITH SELECT STATEMENT.
Table 1:
Client_ID Last_Name First_Name Contact Country
1 Patil Ravi 600000 India
2 Morya Surendra 230000 India
3 Singh Jaya 780000 India
4 Pandit Prajakta 550000 India

Table 2:
Client_ID Last_Name First_Name Contact Country
1 Thomas Alex 2400000 USA
2 Cruise Martin 5600000 USA

INSERT INTO Table1 (Last_Name, First_Name, Country)
SELECT Last_Name, First_Name, Country FROM Table2;
The result is shown in the following table. Contact is null for the copied rows because that column was not selected.
Client_ID Last_Name First_Name Contact Country
1 Patil Ravi 600000 India
2 Morya Surendra 230000 India
3 Singh Jaya 780000 India
4 Pandit Prajakta 550000 India
5 Thomas Alex null USA
6 Cruise Martin null USA
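The result table above can be reproduced as follows. The sketch uses Python's built-in sqlite3 module and assumes that Client_ID is an automatically assigned key, which is what the new IDs 5 and 6 suggest. An ORDER BY is added to the SELECT so the copied rows arrive in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: Client_ID is an auto-assigned integer key in both tables.
schema = ("(Client_ID INTEGER PRIMARY KEY, Last_Name TEXT, "
          "First_Name TEXT, Contact INTEGER, Country TEXT)")
conn.execute("CREATE TABLE Table1 " + schema)
conn.execute("CREATE TABLE Table2 " + schema)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Patil", "Ravi", 600000, "India"),
        (2, "Morya", "Surendra", 230000, "India"),
        (3, "Singh", "Jaya", 780000, "India"),
        (4, "Pandit", "Prajakta", 550000, "India"),
    ],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?, ?, ?)",
    [(1, "Thomas", "Alex", 2400000, "USA"), (2, "Cruise", "Martin", 5600000, "USA")],
)

# Copy three columns only; Client_ID is generated and Contact stays NULL.
conn.execute(
    """INSERT INTO Table1 (Last_Name, First_Name, Country)
       SELECT Last_Name, First_Name, Country FROM Table2 ORDER BY Client_ID"""
)
result = conn.execute("SELECT * FROM Table1 ORDER BY Client_ID").fetchall()
for row in result:
    print(row)
```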
3. DELETE
The SQL DELETE statement is used to delete one or more rows from a table.

Syntax:
DELETE FROM table_name WHERE [Condition];

Consider the following table titled 'Clients'.
Client_ID Last_Name First_Name Contact Country
1 Thomas Alex 2400000 USA
2 Cruise Martin 5600000 USA
3 Pandit Prajakta 34542892 India

1. Write a query to delete the row where Client_ID = 2.

DELETE FROM Clients WHERE Client_ID = 2;

2. Write a query to delete all rows from the table.

DELETE FROM Clients;


4. UPDATE
The SQL UPDATE statement is used to modify data already present in the database.

Syntax:
UPDATE table_name SET column1 = value1, column2 = value2, column3 = value3, ...
WHERE [Condition];
Example: Query using UPDATE statement.
Consider the following table titled 'Clients'.
Client_ID Last_Name First_Name Contact Country
1 Thomas Alex 2400000 USA
2 Cruise Martin 5600000 USA
3 Pandit Prajakta null India
1. Write a query to update the contact information of the row where
Client_ID = 3.

UPDATE Clients SET Contact = 34542892 WHERE Client_ID = 3 AND Last_Name = 'Pandit';

2. Write a query to update multiple fields of a row in the given table.

UPDATE Clients SET Last_Name = 'Brown', First_Name = 'Albert', Contact = 923849,
Country = 'UK' WHERE Client_ID = 3;
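A program can check how many rows an UPDATE or DELETE affected. The sketch below runs the first UPDATE example and the DELETE example against the 'Clients' table, using Python's built-in sqlite3 module. The values are passed as bound parameters, which is a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Clients (Client_ID INTEGER, Last_Name TEXT, "
    "First_Name TEXT, Contact INTEGER, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Thomas", "Alex", 2400000, "USA"),
        (2, "Cruise", "Martin", 5600000, "USA"),
        (3, "Pandit", "Prajakta", None, "India"),
    ],
)

# UPDATE: fill in the missing contact for client 3.
cur = conn.execute(
    "UPDATE Clients SET Contact = ? WHERE Client_ID = ? AND Last_Name = ?",
    (34542892, 3, "Pandit"),
)
updated = cur.rowcount  # number of rows changed by the UPDATE

# DELETE: remove client 2 only; without WHERE every row would be deleted.
cur = conn.execute("DELETE FROM Clients WHERE Client_ID = ?", (2,))
deleted = cur.rowcount

remaining = conn.execute(
    "SELECT Client_ID, Contact FROM Clients ORDER BY Client_ID"
).fetchall()
print(updated, deleted, remaining)
```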
DCL
• DCL stands for Data Control Language in Structured Query
Language (SQL). As the name suggests, these commands are
used to control privileges in the database.
• Privileges (rights to access the data) are required for
performing database operations such as creating tables,
views or sequences.
• A DCL command is a statement that manages the rights,
permissions and other access controls of the
database system.
• There are two types of privileges in a database:
⚫ System privileges
⚫ Object privileges
DCL
• Need for DCL commands
⚫ Unauthorized access to the data should be prevented in order to
achieve security in our database.
⚫ DCL commands help protect the database: no one other than the
database administrator can access the data without
permission.
⚫ These commands give the database administrator the flexibility to
set and remove database permissions in a granular fashion.
• Commands in DCL
⚫ The two most important DCL commands are:
• GRANT
• REVOKE
1. GRANT
• This command is used to grant a user permission to perform
a particular operation on a particular object.
• As a database administrator, you may want to restrict what a user
can do, for example allowing one user only to view the data and
another only to update it.
• GRANT lets you give each user exactly the privileges you
choose.

Syntax:
GRANT privileges ON object TO user;
Parameters used:
• privileges: The access rights or privileges granted to the user.
• object: The name of the database object on which permissions are being granted. When
granting privileges on a table, this is the table name.
• user: The name of the user to whom the privileges are granted.
PRIVILEGES
EXAMPLE FOR GRANT
GRANT SELECT ON Users TO 'Amit'@'localhost';

GRANT SELECT, INSERT, DELETE, UPDATE ON Users TO 'Amit'@'localhost';

GRANT ALL ON Users TO 'Amit'@'localhost';

GRANT SELECT ON Users TO '*'@'localhost';


2. REVOKE
• This command is used to take permissions or access back from
the user.
• To withdraw permissions that you previously granted to a user,
run the REVOKE command.

Syntax:
REVOKE privilege_list ON object_name
FROM user_name;

Examples:
REVOKE SELECT ON Users FROM 'Amit'@'localhost';

REVOKE SELECT, INSERT, DELETE, UPDATE ON Users FROM 'Amit'@'localhost';

REVOKE ALL ON Users FROM 'Amit'@'localhost';

REVOKE SELECT ON Users FROM '*'@'localhost';


TCL
• TCL stands for Transaction Control Language. These
commands are used for maintaining the consistency of the
database and for managing the transactions made by
DML commands.
• A transaction is a set of SQL statements that are executed
on the data stored in the DBMS.
• Changes made inside a transaction are temporary until
the transaction ends. TCL commands make those changes
permanent or undo them.
• The TCL commands are:
⚫ COMMIT
⚫ ROLLBACK
⚫ SAVEPOINT
EXAMPLE
mysql> CREATE TABLE customer (a INT, b CHAR (20), INDEX (a));
Query OK, 0 rows affected (0.00 sec)
mysql> -- Do a transaction with autocommit turned on.
mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO customer VALUES (10, 'Heikki');
Query OK, 1 row affected (0.00 sec)
mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)
mysql> -- Do another transaction with autocommit turned off.
mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO customer VALUES (15, 'John');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO customer VALUES (20, 'Paul');
Query OK, 1 row affected (0.00 sec)
mysql> DELETE FROM customer WHERE b = 'Heikki';
Query OK, 1 row affected (0.00 sec)
mysql> -- Now we undo those last 2 inserts and the delete.
mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM customer;
+------+--------+
| a    | b      |
+------+--------+
|   10 | Heikki |
+------+--------+
1 row in set (0.00 sec)
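The MySQL session above uses COMMIT and ROLLBACK. SAVEPOINT can be added to undo only part of a transaction. The sketch below shows all three with Python's built-in sqlite3 module, so it assumes SQLite's dialect. The savepoint name after_john is made up. The connection is opened with isolation_level=None so that every transaction statement is issued explicitly.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own,
# so BEGIN, COMMIT, ROLLBACK and SAVEPOINT are written out explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer (a INTEGER, b TEXT)")

# Transaction 1: committed, so the row becomes permanent.
conn.execute("BEGIN")
conn.execute("INSERT INTO customer VALUES (10, 'Heikki')")
conn.execute("COMMIT")

# Transaction 2: roll back to a savepoint, then commit what is left.
conn.execute("BEGIN")
conn.execute("INSERT INTO customer VALUES (15, 'John')")
conn.execute("SAVEPOINT after_john")
conn.execute("INSERT INTO customer VALUES (20, 'Paul')")
conn.execute("ROLLBACK TO SAVEPOINT after_john")  # undoes only Paul
conn.execute("COMMIT")                            # John is kept

# Transaction 3: rolled back completely, so the delete has no effect.
conn.execute("BEGIN")
conn.execute("DELETE FROM customer")
conn.execute("ROLLBACK")

rows = conn.execute("SELECT a, b FROM customer ORDER BY a").fetchall()
print(rows)
```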
SQL CONSTRAINTS
• SQL constraints are used to define rules for the data in a table.
They can be defined when the table is created or after
the table has been created.
SQL constraints are listed below.

No   Constraint Name   Description
1    NOT NULL          Ensures that a column cannot accept NULL values.
2    UNIQUE            Ensures that all values in a column are different.
3    PRIMARY KEY       A combination of NOT NULL and UNIQUE.
4    FOREIGN KEY       Used to link two tables together.
5    CHECK             Ensures that the values in a column satisfy a condition.
6    DEFAULT           Defines a default value for a column.
1. SQL NOT NULL
• The SQL NOT NULL constraint ensures that a
column does not accept NULL values.
• Example: Query using NOT NULL constraint.

• Write a query to create a table by using the NOT NULL constraint.
• CREATE TABLE Employee
(
    Emp_ID int NOT NULL,
    Emp_Name varchar (255) NOT NULL,
    Emp_Address varchar (255),
    Emp_City varchar (255)
);

• The following INSERT is rejected because Emp_ID and Emp_Name cannot be NULL:
INSERT INTO Employee VALUES (NULL, NULL, 'UKKADAM', 'CBE');


2. SQL UNIQUE CONSTRAINTS
• The UNIQUE constraint ensures that all values
in a column are different.
• Example: Query using UNIQUE constraint.
Create a table 'Employee' by using the UNIQUE
constraint.

CREATE TABLE Employee
(
    Emp_ID int UNIQUE,
    Emp_Name varchar (255) NOT NULL,
    Emp_Address varchar (255),
    Emp_City varchar (255)
);
3. SQL PRIMARY KEY
• A primary key uniquely identifies each row in the table.
• When multiple columns are used as a primary key, it is known as a composite
primary key.
• A primary key cannot have a null value. Each table can have only one
primary key.
• Example: Query using PRIMARY KEY constraint.
Create a table 'Students' using the PRIMARY KEY constraint.
CREATE TABLE Students
(
    S_ID int NOT NULL,
    Name varchar (255) NOT NULL,
    Address varchar (255),
    City varchar (255),
    PRIMARY KEY (S_ID)
);
4. SQL FOREIGN KEY
• In relational databases, a foreign key in one table refers to the
primary key of another table.
• Example: Query using FOREIGN KEY constraint.
Consider the following two tables, one
entitled 'Students' and the other 'Examination'.
Table1 : 'Students'
Stud_ID Stud_Name City Country
1 Mark London England
2 Alex Paris France
3 Bob Sydney Australia
4 Jaya Delhi India
5 Surendra Baroda India

Table2 : 'Examination'

Exam_No Stud_ID Result


S101 1 Pass
S102 2 Fail
S103 3 Pass
S104 4 Pass
S105 5 Pass
CREATE TABLE Examination
(
    Exam_No varchar(255) NOT NULL,
    Result varchar(255) NOT NULL,
    Stud_ID int,
    PRIMARY KEY (Exam_No),
    FOREIGN KEY (Stud_ID) REFERENCES Students(Stud_ID)
);
5. SQL CHECK CONSTRAINT
• The CHECK constraint is used to limit the value range that can be
placed in a column.
• If you define a CHECK constraint on a single column, it allows
only certain values for this column.
• If you define a CHECK constraint on a table, it can limit the values
in certain columns based on values in other columns in the row.
• Example:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CHECK (Age>=18)
);
6. SQL DEFAULT CONSTRAINT
• The DEFAULT constraint is used to provide a default
value for a column.
• The default value is added to every new record if
no other value is specified.
• CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    City varchar(255) DEFAULT 'Sandnes'
);
6. SQL DEFAULT CONSTRAINT
• Example: assume a table Geeks (ID, Name, Age, Location) in which Location has the default value 'Noida'. Rows inserted without a Location receive the default.
• INSERT INTO Geeks VALUES (4, 'Mira', 23, 'Delhi');
• INSERT INTO Geeks (ID, Name, Age) VALUES (5, 'Hema', 27);
• INSERT INTO Geeks VALUES (6, 'Neha', 25, 'Delhi');
• INSERT INTO Geeks (ID, Name, Age) VALUES (7, 'Khushi', 26);

select * from Geeks;

ID   Name     Age   Location
4    Mira     23    Delhi
5    Hema     27    Noida
6    Neha     25    Delhi
7    Khushi   26    Noida
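The six constraints can be seen working together by attempting one insert that violates each rule. This is a sketch using Python's built-in sqlite3 module. The Email column, the 'Delhi' default and the rejected() helper are assumptions made for this example. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute(
    """CREATE TABLE Students (
           Stud_ID INTEGER PRIMARY KEY,
           Stud_Name TEXT NOT NULL,
           Email TEXT UNIQUE,
           Age INTEGER CHECK (Age >= 18),
           City TEXT DEFAULT 'Delhi')"""
)
conn.execute(
    """CREATE TABLE Examination (
           Exam_No TEXT PRIMARY KEY,
           Stud_ID INTEGER REFERENCES Students(Stud_ID),
           Result TEXT NOT NULL)"""
)
conn.execute(
    "INSERT INTO Students (Stud_ID, Stud_Name, Email, Age) "
    "VALUES (1, 'Mark', 'mark@example.com', 20)"
)

def rejected(sql):
    # Helper for this sketch: return the error message if the
    # statement violates a constraint, otherwise None.
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

violations = {
    "NOT NULL": rejected("INSERT INTO Students (Stud_ID, Stud_Name) VALUES (2, NULL)"),
    "UNIQUE": rejected(
        "INSERT INTO Students (Stud_ID, Stud_Name, Email) "
        "VALUES (3, 'Alex', 'mark@example.com')"
    ),
    "PRIMARY KEY": rejected("INSERT INTO Students (Stud_ID, Stud_Name) VALUES (1, 'Bob')"),
    "CHECK": rejected("INSERT INTO Students (Stud_ID, Stud_Name, Age) VALUES (4, 'Jaya', 15)"),
    "FOREIGN KEY": rejected("INSERT INTO Examination VALUES ('S109', 99, 'Pass')"),
}
# DEFAULT: City was not supplied for Mark, so the default value was stored.
default_city = conn.execute("SELECT City FROM Students WHERE Stud_ID = 1").fetchone()[0]
for name, message in violations.items():
    print(name, "->", message)
```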
TRIGGERS
• Triggers are stored in the database and executed by the Oracle engine
whenever some event occurs.
• When a trigger is fired, an SQL statement inside the trigger's
PL/SQL code block can also fire the same or some other
trigger.
• This is called cascading triggers.
• Triggers are written to execute in response to events such as:
⚫ DML statements (DELETE, INSERT or UPDATE)
⚫ DDL statements (CREATE, ALTER or DROP)
⚫ Database operations (SERVERERROR, LOGON, LOGOFF, STARTUP
or SHUTDOWN)
DIFFERENCE BETWEEN TRIGGER AND
STORED PROCEDURE
Trigger                                        Stored Procedure

An action that is performed automatically      A set of functionality that is executed
before or after an event has occurred.         when it is explicitly invoked.

It cannot accept parameters.                   It can accept parameters.

A trigger cannot return any value.             A stored procedure can return a value.

It is executed automatically on some event.    It needs to be explicitly called.

Triggers are used for insertion, update        Stored procedures are often used
and deletion.                                  independently in the database.
USES OF TRIGGERS
1. Generate virtual column values
automatically.
2. Log events.
3. Prevent invalid transactions.
4. Enforce complex business or referential
integrity rules that user cannot define with
constraints.
DML Triggers
DML triggers are created on a table or view,
and their triggering event is composed
of the DML statements DELETE, INSERT and
UPDATE.
SYNTAX FOR CREATING TRIGGER
• CREATE [OR REPLACE ] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF }   
{INSERT [OR] | UPDATE [OR] | DELETE}   
[OF col_name]   
ON table_name   
[REFERENCING OLD AS o NEW AS n]   
[FOR EACH ROW]   
WHEN (condition)    
DECLARE  
   Declaration-statements  
BEGIN   
   Executable-statements  
EXCEPTION  
   Exception-handling-statements  
END;
Example: Illustration of creating trigger in PL/SQL.

Consider the following table titled 'Employee'.

ID NAME DESIGNATION SALARY

101 AJAY MANAGER 37000

102 RAM ASST.MANAGER 32000

103 KARAN ASST.MANAGER 32000

104 RAM PROGRAMMER 20000

105 RAM ANALYST 20000


EXAMPLE
CREATE OR REPLACE TRIGGER print_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON employee
FOR EACH ROW
WHEN (NEW.Id>0)
DECLARE
sal_diff NUMBER;
BEGIN
sal_diff := :NEW.salary - :OLD.salary;
DBMS_OUTPUT.PUT(:NEW.name || ': ');
DBMS_OUTPUT.PUT('Old salary = ' || :OLD.salary || ', ');
DBMS_OUTPUT.PUT('New salary = ' || :NEW.salary || ', ');
DBMS_OUTPUT.PUT_LINE('Difference: ' || sal_diff);
END;
THE FOLLOWING CODE IS USED TO
CHECK THE SALARY DIFFERENCE.

DECLARE
total_rows number(2);
BEGIN
UPDATE EMPLOYEE
SET salary = salary+( salary *0.05);
IF sql%notfound THEN
dbms_output.put_line('no employee updated');
ELSIF sql%found THEN
total_rows := sql%rowcount;
dbms_output.put_line( total_rows || ' employee updated ');
END IF;
END;
INPUT TABLE: EMPLOYEE
ID NAME DESIGNATION SALARY
101 AJAY MANAGER 40000
102 RAM ASST.MANAGER 35000
103 KARAN ASST.MANAGER 35000
104 RAM PROGRAMMER 23000
105 RAM ANALYST 23000

OUTPUT TABLE: EMPLOYEE

ID NAME DESIGNATION SALARY
101 AJAY MANAGER 40000
102 RAM ASST.MANAGER 35000
103 KARAN ASST.MANAGER 35000
104 RAM PROGRAMMER 23000
105 RAM ANALYST 23000
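A row-level trigger that records salary changes can also be sketched outside Oracle. The example below uses SQLite through Python's built-in sqlite3 module. SQLite has no DBMS_OUTPUT, so the trigger writes to a log table instead of printing. The salary_log table, the trigger name and the raise of 3000 are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute(
    "CREATE TABLE salary_log (emp_id INTEGER, old_salary INTEGER, "
    "new_salary INTEGER, diff INTEGER)"
)

# Fires once per updated row; OLD and NEW hold the row before and after.
conn.execute(
    """CREATE TRIGGER log_salary_changes
       AFTER UPDATE OF salary ON employee
       FOR EACH ROW
       WHEN NEW.id > 0
       BEGIN
           INSERT INTO salary_log
           VALUES (OLD.id, OLD.salary, NEW.salary, NEW.salary - OLD.salary);
       END"""
)

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(101, "AJAY", 37000), (102, "RAM", 32000)],
)
# The UPDATE never mentions the trigger; the database fires it automatically.
conn.execute("UPDATE employee SET salary = salary + 3000")

log = conn.execute("SELECT * FROM salary_log ORDER BY emp_id").fetchall()
print(log)
```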
CURSOR
• The Oracle engine uses a work area for its internal processing to
execute SQL statements.
• This work area is private to SQL's operations and is called
a cursor.
• The data stored in the cursor is called the active data set.
• A cursor contains information on a SELECT statement and the
rows of data accessed by it.
• It can hold more than one row, but can process only one row at
a time.
• A cursor is used to access the result set present in memory.
• This result set contains the records returned on execution of a
query.
ATTRIBUTES OF CURSORS

Attribute    Description
%ISOPEN      Returns TRUE if the cursor is open, else FALSE.
%FOUND       Returns TRUE if a record was fetched successfully, else FALSE.
%NOTFOUND    Returns TRUE if a record was not fetched successfully, else FALSE.
%ROWCOUNT    Returns the number of records processed from the cursor.

TYPES OF CURSORS
Cursors are classified depending on the circumstances
in which they are opened.
• Implicit cursor
• Explicit cursor
1. IMPLICIT CURSORS

• Implicit cursors are automatically generated by the
Oracle engine.
• If the Oracle engine opens a cursor for its internal
processing, it is known as an implicit cursor.
• Implicit cursors are created by default to process
DML statements (INSERT, UPDATE,
DELETE) when they are executed.
Example:
Update the information of employees using implicit cursor.

Consider the following table titled 'Employee'


Id Name Designation Salary
1 Albert Programmer 50000
2 Anna HR 25000
3 Mark Analyst 55000
4 Jason Content writer 21000
5 Andrew Programmer 90000

DECLARE   
   total_rows number(2);  
BEGIN  
   UPDATE Employee  
   SET salary = salary + 1000;  
   IF sql%notfound THEN  
      dbms_output.put_line('no Employee updated');  
   ELSIF sql%found THEN  
      total_rows := sql%rowcount;  
      dbms_output.put_line( total_rows || ' Employee updated ');
   END IF;   
END;
2. EXPLICIT CURSOR
• A cursor that is defined by the user and opened in a PL/SQL
block to process data as required is known as
an explicit cursor.
• An explicit cursor is created for a SELECT statement
that returns more than one row.
• It should be defined in the declaration section of the
PL/SQL block.
• Syntax:
⚫ CURSOR cursor_name IS select_statement;
FOLLOWING ARE THE STEPS TO WORK
WITH AN EXPLICIT CURSOR:
1. Declare
• Syntax:
CURSOR cursor_name IS select_statement;
2. Open
• Syntax:
OPEN cursor_name;
3. Fetch
• This statement is used to access one row at a time.
• Syntax:
FETCH cursor_name INTO variable_list;
4. Close
• Syntax:
CLOSE cursor_name;
EXAMPLE:
WRITE A PL/SQL CODE TO RETRIEVE THE
EMPLOYEE NAME AND DESIGNATION USING
EXPLICIT CURSOR.  
ID NAME DESIGNATION SALARY
101 AJAY MANAGER 40000
102 RAM ASST.MANAGER 35000
103 KARAN ASST.MANAGER 35000
104 RAM PROGRAMMER 23000
105 RAM ANALYST 23000
DECLARE
c_id employee.id%type;
c_name employee.name%type;
c_des employee.designation%type;
CURSOR c_employee is
SELECT id, name, designation FROM employee;
BEGIN
OPEN c_employee;
LOOP
FETCH c_employee into c_id, c_name, c_des;
EXIT WHEN c_employee%notfound;
dbms_output.put_line(c_id || ' ' || c_name || ' ' || c_des);
END LOOP;
CLOSE c_employee;
END;
OUTPUT

101 AJAY MANAGER


102 RAM ASST.MANAGER
103 KARAN ASST.MANAGER
104 RAM PROGRAMMER
105 RAM ANALYST

Statement processed.
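The declare, open, fetch and close steps have a close counterpart in most database APIs. The sketch below mirrors the PL/SQL loop with Python's built-in sqlite3 module. It is an analogy under that assumption, not Oracle's cursor mechanism: executing the query plays the role of OPEN, and fetchone() returning None plays the role of %NOTFOUND. Only three of the employee rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, name TEXT, designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "AJAY", "MANAGER", 40000),
        (102, "RAM", "ASST.MANAGER", 35000),
        (103, "KARAN", "ASST.MANAGER", 35000),
    ],
)

lines = []
cur = conn.cursor()                                    # 1. declare
cur.execute(
    "SELECT id, name, designation FROM employee ORDER BY id"
)                                                      # 2. open
while True:
    row = cur.fetchone()                               # 3. fetch one row at a time
    if row is None:                                    # no more rows
        break
    lines.append(" ".join(str(value) for value in row))
cur.close()                                            # 4. close

print("\n".join(lines))
```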
EMBEDDED SQL
AND
DYNAMIC SQL
WHAT IS EMBEDDED SQL?
• This is a method for combining the data manipulation capabilities
of SQL with the computing power of a programming language.
• A language in which SQL queries are embedded is referred to
as a host language, and the SQL structures permitted in the
host language constitute embedded SQL.
• Programs written in the host language can use the embedded
SQL syntax to access and update data stored in a database.
• An embedded SQL program must be processed by a special
preprocessor prior to compilation.
• The preprocessor replaces embedded SQL requests with host-
language declarations and procedure calls that allow runtime
execution of the database accesses.
EMBEDDED SQL
• Then the resulting program is compiled by the host language
compiler.
• To identify embedded SQL requests to the preprocessor, we
use the EXEC SQL statement; it has the form:

EXEC SQL <embedded SQL statement>;

or
EXEC SQL <embedded SQL statement> END-EXEC
EMBEDDED SQL

Connecting to the database

CONNECT TO <server-name> AS <connection-name>
AUTHORIZATION <user account name and password> ;

To change the connection

SET CONNECTION <connection-name> ;

To disconnect

DISCONNECT <connection-name> ;


C PROGRAM VARIABLES USED IN THE
EMBEDDED SQL EXAMPLES
int loop ;
EXEC SQL BEGIN DECLARE SECTION ;
varchar dname [16], fname [16], lname [16], address [31] ;
char ssn [10], bdate [11], sex [2], minit [2] ;
float salary, raise ;
int dno, dnumber ;
int SQLCODE ; char SQLSTATE [6] ;
EXEC SQL END DECLARE SECTION ;
A C PROGRAM SEGMENT WITH EMBEDDED SQL.
/* prompt() and the simplified printf() calls are shorthand, not real C library usage */
loop = 1 ;
while (loop) {
    prompt("Enter a Social Security Number: ", ssn) ;
    EXEC SQL
        SELECT Fname, Minit, Lname, Address, Salary
        INTO :fname, :minit, :lname, :address, :salary
        FROM EMPLOYEE WHERE Ssn = :ssn ;
    /* SQLCODE is 0 when the statement succeeded */
    if (SQLCODE == 0)
        printf(fname, minit, lname, address, salary) ;
    else
        printf("Social Security Number does not exist: ", ssn) ;
    prompt("More Social Security Numbers (enter 1 for Yes, 0 for No): ", loop) ;
}
A C PROGRAM SEGMENT THAT USES CURSORS WITH EMBEDDED
SQL FOR UPDATE PURPOSES.
prompt("Enter the Department Name: ", dname) ;
EXEC SQL
    SELECT Dnumber INTO :dnumber
    FROM DEPARTMENT WHERE Dname = :dname ;
/* Declare a cursor over the employees of that department */
EXEC SQL DECLARE EMP CURSOR FOR
    SELECT Ssn, Fname, Minit, Lname, Salary
    FROM EMPLOYEE WHERE Dno = :dnumber
    FOR UPDATE OF Salary ;
EXEC SQL OPEN EMP ;
EXEC SQL FETCH FROM EMP INTO :ssn, :fname, :minit, :lname, :salary ;
/* SQLCODE stays 0 while rows are fetched successfully */
while (SQLCODE == 0) {
    printf("Employee name is:", fname, minit, lname) ;
    prompt("Enter the raise amount: ", raise) ;
    EXEC SQL
        UPDATE EMPLOYEE
        SET Salary = Salary + :raise
        WHERE CURRENT OF EMP ;
    EXEC SQL FETCH FROM EMP INTO :ssn, :fname, :minit, :lname, :salary ;
}
EXEC SQL CLOSE EMP ;
EMBEDDED SQL
• SQL (Structured Query Language) is a declarative
query language.
• However, access to the database from a general-purpose
programming language is still required because:
⚫ SQL is not as powerful as a general-purpose
language.
⚫ There are many non-declarative actions, such as interacting with
the user, sending the result to a GUI or printing a report,
which we cannot do using SQL.
⚫ There are many queries that we can express in C, Pascal,
Cobol and many other languages but cannot express in SQL.
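The division of labour between host language and SQL can be sketched in Python as well. Python has no embedded SQL preprocessor, so this example uses a library call from the built-in sqlite3 module instead of EXEC SQL. That is a different mechanism with the same idea: the named parameter :ssn plays the role of a host variable, and the host language handles the messages. The sample employee row and the lookup() function are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT, Salary REAL)"
)
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'John', 'Smith', 30000)")

def lookup(ssn):
    # SQL does the data retrieval; :ssn is bound to the Python variable.
    row = conn.execute(
        "SELECT Fname, Lname, Salary FROM EMPLOYEE WHERE Ssn = :ssn",
        {"ssn": ssn},
    ).fetchone()
    # The host language does what SQL cannot: branching and user messages.
    if row is None:
        return f"Social Security Number does not exist: {ssn}"
    fname, lname, salary = row
    return f"{fname} {lname} earns {salary}"

print(lookup("123456789"))
print(lookup("000000000"))
```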
DYNAMIC SQL
• Dynamic SQL is a programming technique that allows you to
construct SQL statements dynamically at runtime.
• It allows you to create more general-purpose and flexible SQL
statements because the full text of the SQL statements may be
unknown at compilation.
• Creating a dynamic SQL statement is simple: you just need to write it as a
string, as follows:
⚫ 'SELECT * FROM production.products';
• To execute a dynamic SQL statement, you call the stored
procedure sp_executesql, as shown in the following statement:
• EXEC sp_executesql N'SELECT * FROM production.products';
EXAMPLE
Create table Employees
(
    ID int primary key identity,
    FirstName nvarchar(50),
    LastName nvarchar(50),
    Gender nvarchar(50),
    Salary int
)
Go

Insert into Employees values ('Mark', 'Hastings', 'Male', 60000)
Insert into Employees values ('Steve', 'Pound', 'Male', 45000)
Insert into Employees values ('Ben', 'Hoskins', 'Male', 70000)
Insert into Employees values ('Philip', 'Hastings', 'Male', 45000)
Insert into Employees values ('Mary', 'Lambeth', 'Female', 30000)
Insert into Employees values ('Valarie', 'Vikings', 'Female', 35000)
Insert into Employees values ('John', 'Stanmore', 'Male', 80000)
Go
DYNAMIC SQL EXAMPLE
• First, declare two variables: @table, which holds the
name of the table that you want to query,
and @sql, which holds the dynamic SQL statement.
DECLARE
@table NVARCHAR(128),
@sql NVARCHAR(MAX);
SET @table = N'production.products';

SET @sql = N'SELECT * FROM ' + @table;

EXEC sp_executesql @sql;


DYNAMIC SQL EXAMPLE
CREATE PROC usp_query (
@table NVARCHAR(128)
)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX);


-- construct SQL
SET @sql = N'SELECT * FROM ' + @table;
-- execute the SQL
EXEC sp_executesql @sql;

END;

EXEC usp_query 'production.brands';
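Building a statement by concatenating a table name, as the procedure above does, executes whatever text the caller supplies. One common precaution is to accept only known table names and to bind ordinary values as parameters. The sketch below illustrates that idea with Python's built-in sqlite3 module. ALLOWED_TABLES, query_table() and the sample rows are hypothetical names and data chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE brands (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Trek 820')")
conn.execute("INSERT INTO brands VALUES (1, 'Trek')")

# Hypothetical whitelist: the only tables this function may query.
ALLOWED_TABLES = {"products", "brands"}

def query_table(table, min_id=0):
    # Identifiers cannot be bound as parameters, so the table name is
    # checked against the whitelist before it is placed in the text.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    sql = f"SELECT id, name FROM {table} WHERE id >= ?"  # built at run time
    return conn.execute(sql, (min_id,)).fetchall()       # value is bound

print(query_table("brands"))
print(query_table("products"))
```

A call such as query_table("brands; DROP TABLE products") raises ValueError instead of reaching the database.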


Static (Embedded) SQL                          Dynamic (Interactive) SQL

How the database will be accessed is           How the database will be accessed is
predetermined in the embedded SQL statement.   determined at run time.

SQL statements are compiled at compile time.   SQL statements are compiled at run time.

Parsing, validation, optimization and          Parsing, validation, optimization and
generation of the application plan are done    generation of the application plan are done
at compile time.                               at run time.

It is generally used for situations where      It is generally used for situations where
data is distributed uniformly.                 data is distributed non-uniformly.

EXECUTE IMMEDIATE, EXECUTE and PREPARE         EXECUTE IMMEDIATE, EXECUTE and PREPARE
statements are not used.                       statements are used.

It is less flexible.                           It is more flexible.


QUERY PROCESSING
BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation
2. Optimization
3. Evaluation
BASIC STEPS IN QUERY PROCESSING (CONT.)

Parsing and translation


⚫ translate the query into its internal form. This is then translated
into relational algebra.
⚫ Parser checks syntax, verifies relations
Evaluation
⚫ The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
BASIC STEPS IN QUERY PROCESSING : OPTIMIZATION

A relational algebra expression may have many equivalent expressions


⚫ E.g., $\sigma_{\text{salary}<75000}(\Pi_{\text{salary}}(\text{instructor}))$ is equivalent to
$\Pi_{\text{salary}}(\sigma_{\text{salary}<75000}(\text{instructor}))$
Each relational algebra operation can be evaluated using one of several
different algorithms
⚫ Correspondingly, a relational-algebra expression can be evaluated
in many ways.
Annotated expression specifying detailed evaluation strategy is called
an evaluation-plan.
⚫ E.g., can use an index on salary to find instructors with salary < 75000,
⚫ or can perform complete relation scan and discard instructors with
salary $\geq$ 75000
BASIC STEPS: OPTIMIZATION (CONT.)
Query Optimization: Amongst all equivalent evaluation plans
choose the one with lowest cost.
⚫ Cost is estimated using statistical information from the
database catalog
e.g. number of tuples in each relation, size of tuples, etc.
In this session we study
⚫ How to measure query costs
⚫ Algorithms for evaluating relational algebra operations
⚫ How to combine algorithms for individual operations in order
to evaluate a complete expression
MEASURES OF QUERY COST
Cost is generally measured as total elapsed time for answering query
⚫ Many factors contribute to time cost
disk accesses, CPU, or even network communication
Typically disk access is the predominant cost, and is also relatively
easy to estimate. Measured by taking into account
⚫ Number of seeks * average-seek-cost
⚫ Number of blocks read * average-block-read-cost
⚫ Number of blocks written * average-block-write-cost
Cost to write a block is greater than cost to read a block
data is read back after being written to ensure that the
write was successful
MEASURES OF QUERY COST (CONT.)
For simplicity we just use the number of block transfers from disk and
the number of seeks as the cost measures
⚫ tT – time to transfer one block
⚫ tS – time for one seek
⚫ Cost for b block transfers plus S seeks
b * tT + S * tS
We ignore CPU costs for simplicity
⚫ Real systems do take CPU cost into account
We do not include the cost of writing output to disk in our cost formulae
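
The cost measure b * tT + S * tS can be written as a one-line function. This is only a sketch: the function name io_cost and the default timings (0.1 ms per block transfer, 4 ms per seek) are illustrative assumptions, not values given in these slides.

```python
def io_cost(block_transfers, seeks, t_T=0.1, t_S=4.0):
    """Estimated I/O time in milliseconds: b * tT + S * tS.

    t_T and t_S are assumed per-block transfer and per-seek times (ms).
    """
    return block_transfers * t_T + seeks * t_S

# Example: a scan that transfers 100 blocks after a single seek.
scan_ms = io_cost(100, 1)
```

With these assumed timings a single seek costs as much as 40 block transfers, which is why the later formulas track seeks separately.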
MEASURES OF QUERY COST (CONT.)
Several algorithms can reduce disk IO by using extra buffer space
⚫ Amount of real memory available to buffer depends on other
concurrent queries and OS processes, known only during
execution
We often use worst case estimates, assuming only the
minimum amount of memory needed for the operation is
available
Required data may be buffer resident already, avoiding disk I/O
⚫ But hard to take into account for cost estimation
SELECTION OPERATION
File scan
Algorithm A1 (linear search). Scan each file block and test all records to
see whether they satisfy the selection condition.
⚫ Cost estimate = br block transfers + 1 seek
br denotes number of blocks containing records from relation r
⚫ If selection is on a key attribute, can stop on finding record
cost = (br /2) block transfers + 1 seek
⚫ Linear search can be applied regardless of
selection condition or
ordering of records in the file, or
availability of indices
Note: binary search generally does not make sense since data is not stored
consecutively
⚫ except when there is an index available,
⚫ and binary search requires more seeks than index search
SELECTIONS USING INDICES
Index scan – search algorithms that use an index
⚫ selection condition must be on search-key of index.
A2 (primary index, equality on key). Retrieve a single record that
satisfies the corresponding equality condition
SELECT * FROM STUDENT WHERE STD_ID = 105;
⚫ Cost = (hi + 1) * (tT + tS), where hi is the height of the index

A3 (primary index, equality on nonkey). Retrieve multiple records.
⚫ Records will be on consecutive blocks
Let b = number of blocks containing matching records
⚫ Cost = hi * (tT + tS) + tS + tT * b
⚫ SELECT * FROM STUDENT WHERE STD_NAME = 'Marry';
SELECTIONS USING INDICES
A4 (secondary index, equality on nonkey).
⚫ Retrieve a single record if the search-key is a candidate key
Cost = (hi + 1) * (tT + tS)
⚫ Retrieve multiple records if search-key is not a candidate key
each of n matching records may be on a different block
Cost = (hi + n) * (tT + tS)
Can be very expensive!
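
The selection cost formulas A1 to A4 can be collected into small helper functions that return a (block transfers, seeks) pair. This is a sketch under the slides' assumptions; the function names are hypothetical, h_i stands for the index height, b for the number of blocks holding matching records, and n for the number of matching records.

```python
def linear_search_cost(b_r, on_key=False):
    """A1: scan b_r blocks; on a key attribute, b_r / 2 on average."""
    transfers = b_r / 2 if on_key else b_r
    return transfers, 1

def primary_index_key_cost(h_i):
    """A2: (h_i + 1) * (tT + tS), one transfer and one seek per level plus the record."""
    return h_i + 1, h_i + 1

def primary_index_nonkey_cost(h_i, b):
    """A3: h_i * (tT + tS) + tS + tT * b, matching records are on consecutive blocks."""
    return h_i + b, h_i + 1

def secondary_index_cost(h_i, n=1):
    """A4: (h_i + n) * (tT + tS), each of the n records may be on a different block."""
    return h_i + n, h_i + n
```

Comparing secondary_index_cost(3, 50) with linear_search_cost(400) shows the trade-off: the index needs far fewer transfers but 53 seeks instead of 1.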
SELECTIONS INVOLVING COMPARISONS
Can implement selections of the form $\sigma_{A \leq V}(r)$ or $\sigma_{A \geq V}(r)$ by using
⚫ a linear file scan,
⚫ or by using indices in the following ways:
A5 (primary index, comparison). (Relation is sorted on A)
For $\sigma_{A \geq V}(r)$ use index to find first tuple $\geq v$ and scan relation
sequentially from there
For $\sigma_{A \leq V}(r)$ just scan relation sequentially till first tuple $> v$; do not
use index
SELECT * FROM STUDENT WHERE STD_ID >= 105;
A6 (secondary index, comparison).
For $\sigma_{A \geq V}(r)$ use index to find first index entry $\geq v$ and scan index
sequentially from there, to find pointers to records.
For $\sigma_{A \leq V}(r)$ just scan leaf pages of index finding pointers to records,
till first entry $> v$
In either case, retrieve records that are pointed to
requires an I/O for each record
Linear file scan may be cheaper
SELECT * FROM STUDENT WHERE AGE >= 18;
IMPLEMENTATION OF COMPLEX SELECTIONS
� Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
� A7 (conjunctive selection using one index).
⚫ Select a combination of $\theta_i$ and algorithms A1 through A7 that results in
the least cost for $\sigma_{\theta_i}(r)$.
⚫ Test other conditions on tuple after fetching it into memory buffer.
� A8 (conjunctive selection using composite index).
⚫ Use appropriate composite (multiple-key) index if available.
� A9 (conjunctive selection by intersection of identifiers).
⚫ Requires indices with record pointers.
⚫ Use corresponding index for each condition, and take intersection of
all the obtained sets of record pointers.
⚫ Then fetch records from file
⚫ If some conditions do not have appropriate indices, apply test in
memory.
ALGORITHMS FOR COMPLEX SELECTIONS
Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$.
A10 (disjunctive selection by union of identifiers).
⚫ Applicable if all conditions have available indices.
Otherwise use linear scan.
⚫ Use corresponding index for each condition, and take union of
all the obtained sets of record pointers.
⚫ Then fetch records from file
Negation: $\sigma_{\neg\theta}(r)$
⚫ Use linear scan on file
⚫ If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
Find satisfying records using index and fetch from file
SORTING

We may build an index on the relation, and then use the index to
read the relation in sorted order. May lead to one disk block
access for each tuple.
For relations that fit in memory, techniques like quicksort can be
used. For relations that don’t fit in memory, external
sort-merge is a good choice.
EXAMPLE: EXTERNAL SORTING USING SORT-MERGE
[Figure: external sort-merge, stages from left to right]
initial relation: g 24, a 19, d 31, c 33, b 14, e 16, r 16, d 21, m 3, p 2, d 7, a 14
runs (after create runs): (a 19, d 31, g 24), (b 14, c 33, e 16), (d 21, m 3, r 16), (a 14, d 7, p 2)
runs (after merge pass–1): (a 19, b 14, c 33, d 31, e 16, g 24), (a 14, d 7, d 21, m 3, p 2, r 16)
sorted output (after merge pass–2): a 14, a 19, b 14, c 33, d 7, d 21, d 31, e 16, g 24, m 3, p 2, r 16
EXTERNAL SORT-MERGE
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
2. Merge the runs (next slide)…..
EXTERNAL SORT-MERGE (CONT.)
2. Merge the runs (N-way merge). We assume (for now) that
N < M.
1. Use N blocks of memory to buffer input runs, and 1 block to
buffer output. Read the first block of each run into its buffer
page
2. repeat
1. Select the first record (in sort order) among all buffer pages
2. Write the record to the output buffer. If the output buffer is
full write it to disk.
3. Delete the record from its input buffer page.
If the buffer page becomes empty then
read the next block (if any) of the run into the buffer.
3. until all input buffer pages are empty:
EXTERNAL SORT-MERGE (CONT.)
If N $\geq$ M, several merge passes are required.
⚫ In each pass, contiguous groups of M - 1 runs are merged.
⚫ A pass reduces the number of runs by a factor of M -1, and
creates runs longer by the same factor.
E.g. If M=11, and there are 90 runs, one pass reduces the
number of runs to 9, each 10 times the size of the initial runs
⚫ Repeated passes are performed till all runs have been merged
into one.
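
The two phases can be sketched as an in-memory simulation. It is not a disk-based implementation: Python lists stand in for runs on disk, heapq.merge stands in for the buffered N-way merge, and one block is assumed to hold block_size records. The function name external_sort and its parameters are assumptions made for this example.

```python
import heapq

def external_sort(records, M, block_size=1):
    """Simulate external sort-merge with M memory blocks.

    Returns (sorted_records, number_of_merge_passes).
    """
    if M < 3:
        raise ValueError("need at least 3 blocks: 2 input buffers + 1 output")
    run_len = M * block_size
    # Phase 1: read M blocks at a time, sort them, write out a run.
    runs = [sorted(records[i:i + run_len])
            for i in range(0, len(records), run_len)]
    # Phase 2: merge contiguous groups of M - 1 runs until one run remains.
    fan_in = M - 1
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes

data = [("g", 24), ("a", 19), ("d", 31), ("c", 33), ("b", 14), ("e", 16),
        ("r", 16), ("d", 21), ("m", 3), ("p", 2), ("d", 7), ("a", 14)]
result, merge_passes = external_sort(data, M=3)
```

With M = 3 and one record per block, the 12 records form 4 runs of 3 records, and each pass merges runs two at a time.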
EXTERNAL MERGE SORT (CONT.)

� Cost analysis:
⚫ 1 block per run leads to too many seeks during merge
� Instead use $b_b$ buffer blocks per run
read/write $b_b$ blocks at a time
� Can merge $\lfloor M/b_b \rfloor - 1$ runs in one pass
⚫ Total number of merge passes required: $\lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil$.
⚫ Block transfers for initial run creation as well as in each pass
is $2 b_r$
� for final pass, we don’t count write cost
� we ignore final write cost for all operations since the output of
an operation may be sent to the parent operation without being
written to disk
� Thus total number of block transfers for external sorting:
$b_r (2 \lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil + 1)$

⚫ Seeks: next slide


EXTERNAL MERGE SORT (CONT.)

Cost of seeks
⚫ During run generation: one seek to read each run and one
seek to write each run
$2 \lceil b_r/M \rceil$
⚫ During the merge phase
Need $2 \lceil b_r/b_b \rceil$ seeks for each merge pass
except the final one which does not require a write
Total number of seeks:
$2 \lceil b_r/M \rceil + \lceil b_r/b_b \rceil (2 \lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil - 1)$
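
The block-transfer and seek formulas for external sort-merge can be evaluated with a short helper. This is a sketch with a hypothetical function name; it counts merge passes with integer arithmetic instead of a floating-point logarithm, and like the formulas it assumes at least one merge pass is needed.

```python
def external_sort_cost(b_r, M, b_b=1):
    """Return (merge_passes, block_transfers, seeks) for external sort-merge.

    b_r: blocks in the relation, M: memory blocks, b_b: buffer blocks per run.
    """
    fan_in = M // b_b - 1            # runs that can be merged in one pass
    if fan_in < 2:
        raise ValueError("M / b_b must allow merging at least 2 runs")
    initial_runs = -(-b_r // M)      # ceil(b_r / M)
    # ceil(log_fan_in(b_r / M)) computed by repeated ceiling division.
    passes, runs = 0, initial_runs
    while runs > 1:
        runs = -(-runs // fan_in)
        passes += 1
    transfers = b_r * (2 * passes + 1)
    seeks = 2 * initial_runs + -(-b_r // b_b) * (2 * passes - 1)
    return passes, transfers, seeks

passes, transfers, seeks = external_sort_cost(b_r=1000, M=11, b_b=1)
```

For b_r = 1000, M = 11 and b_b = 1 there are 91 initial runs and a merge fan-in of 10, so two passes are needed.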
JOIN OPERATION
Several different algorithms to implement joins
⚫ Nested-loop join
⚫ Block nested-loop join
⚫ Indexed nested-loop join
⚫ Merge-join
⚫ Hash-join
Choice based on cost estimate
Examples use the following information
⚫ Number of records of student: 5,000 takes: 10,000
⚫ Number of blocks of student: 100 takes: 400
NESTED-LOOP JOIN

To compute the theta join $r \bowtie_\theta s$
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr,ts) to see if they satisfy the join condition $\theta$
        if they do, add tr • ts to the result.
    end
end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations.
NESTED-LOOP JOIN (CONT.)

� In the worst case, if there is enough memory only to hold one block of
each relation, the estimated cost is
$n_r * b_s + b_r$ block transfers, plus
$n_r + b_r$ seeks
� If the smaller relation fits entirely in memory, use that as the inner
relation.
⚫ Reduces cost to $b_r + b_s$ block transfers and 2 seeks
� Assuming worst case memory availability cost estimate is
⚫ with student as outer relation:
� 5000 * 400 + 100 = 2,000,100 block transfers,
� 5000 + 100 = 5100 seeks
⚫ with takes as the outer relation
� 10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks
� If smaller relation (student) fits entirely in memory, the cost estimate will
be 500 block transfers.
� Block nested-loops algorithm (next slide) is preferable.
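
The worst-case figures above can be reproduced with a small helper. The function name nested_loop_join_cost is hypothetical; it applies the worst-case formulas to the student/takes sizes used in the examples.

```python
def nested_loop_join_cost(n_outer, b_outer, b_inner):
    """Worst-case nested-loop join cost as (block transfers, seeks).

    n_outer: tuples in the outer relation,
    b_outer / b_inner: blocks in the outer / inner relation.
    """
    transfers = n_outer * b_inner + b_outer
    seeks = n_outer + b_outer
    return transfers, seeks

# student: 5,000 tuples in 100 blocks; takes: 10,000 tuples in 400 blocks.
student_outer = nested_loop_join_cost(5000, 100, 400)
takes_outer = nested_loop_join_cost(10000, 400, 100)
```

Choosing takes as the outer relation halves the block transfers but doubles the seeks, so the better choice depends on the relative values of tT and tS.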
BLOCK NESTED-LOOP JOIN

� Variant of nested-loop join in which every block of inner
relation is paired with every block of outer relation.
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                Check if (tr,ts) satisfy the join condition
                if they do, add tr • ts to the result.
            end
        end
    end
end
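
The pseudocode above translates almost line for line into Python. In this sketch a relation is a list of blocks, each block is a list of tuples, and the join condition is passed as a function; the function name and the sample data are made up for illustration.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join two relations given as lists of blocks (lists of tuples)."""
    result = []
    for B_r in r_blocks:              # each outer block is read once
        for B_s in s_blocks:          # inner relation is scanned once per outer block
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)  # concatenate the two tuples
    return result

# Hypothetical sample data: (ID, name) and (ID, course).
student_blocks = [[(1, "Ann"), (2, "Bob")], [(3, "Cy")]]
takes_blocks = [[(1, "DB"), (3, "OS")], [(1, "AI")]]
joined = block_nested_loop_join(
    student_blocks, takes_blocks, lambda t_r, t_s: t_r[0] == t_s[0])
```

The inner relation is read once per outer block rather than once per outer tuple, which is where the saving over the plain nested-loop join comes from.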
BLOCK NESTED-LOOP JOIN (CONT.)
� Worst case estimate: $b_r * b_s + b_r$ block transfers + $2 * b_r$ seeks
⚫ Each block in the inner relation s is read once for each block in
the outer relation
� Best case: $b_r + b_s$ block transfers + 2 seeks.
� Improvements to nested loop and block nested loop algorithms:
⚫ In block nested-loop, use M - 2 disk blocks as blocking unit for
outer relations, where M = memory size in blocks; use remaining
two blocks to buffer inner relation and output
� Cost = $\lceil b_r/(M-2) \rceil * b_s + b_r$ block transfers +
$2 \lceil b_r/(M-2) \rceil$ seeks
⚫ If equi-join attribute forms a key on the inner relation, stop inner loop
on first match
⚫ Scan inner loop forward and backward alternately, to make use of
the blocks remaining in buffer (with LRU replacement)
⚫ Use index on inner relation if available (next slide)
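
The effect of extra memory on block nested-loop join can be checked numerically. The helper below is a sketch with a hypothetical name; with M = 3 it reduces to the worst-case estimate, because the outer relation is then read one block at a time.

```python
def block_nested_loop_cost(b_r, b_s, M=3):
    """Cost (block transfers, seeks) using M - 2 blocks for the outer relation."""
    if M < 3:
        raise ValueError("need at least 3 memory blocks")
    chunks = -(-b_r // (M - 2))       # ceil(b_r / (M - 2)) outer chunks
    transfers = chunks * b_s + b_r    # inner scanned once per chunk, outer read once
    seeks = 2 * chunks                # one seek for the chunk, one for the inner scan
    return transfers, seeks

worst = block_nested_loop_cost(100, 400)         # student outer, takes inner
roomy = block_nested_loop_cost(100, 400, M=52)   # 50 blocks per outer chunk
```

Going from 3 to 52 memory blocks cuts the number of scans of the inner relation from 100 to 2.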
INDEXED NESTED-LOOP JOIN
� Index lookups can replace file scans if
⚫ join is an equi-join or natural join and
⚫ an index is available on the inner relation’s join attribute
� Can construct an index just to compute a join.
� For each tuple tr in the outer relation r, use the index to look up tuples
in s that satisfy the join condition with tuple tr.
� Worst case: buffer has space for only one page of r, and, for each tuple
in r, we perform an index lookup on s.
� Cost of the join: $b_r (t_T + t_S) + n_r * c$
⚫ Where c is the cost of traversing index and fetching all
matching s tuples for one tuple of r
⚫ c can be estimated as cost of a single selection on s using the
join condition.
� If indices are available on join attributes of both r and s,
use the relation with fewer tuples as the outer relation.
EXAMPLE OF NESTED-LOOP JOIN COSTS

� Compute student $\bowtie$ takes, with student as the outer relation.
� Let takes have a primary B+-tree index on the attribute ID,
which contains 20 entries in each index node.
� Since takes has 10,000 tuples, the height of the tree is 4, and
one more access is needed to find the actual data
� student has 5000 tuples
� Cost of block nested loops join
⚫ 400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
� assuming worst case memory
� may be significantly less with more memory
� Cost of indexed nested loops join
⚫ 100 + 5000 * 5 = 25,100 block transfers and seeks.
⚫ CPU cost likely to be less than that for block nested loops join
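
The indexed nested-loop figure can be rechecked step by step. In this sketch the tree height is estimated as the smallest h with fanout^h >= number of entries, which is a simplification that matches the rough estimate above; both function names are hypothetical.

```python
def btree_height(n_entries, fanout):
    """Smallest height h such that fanout ** h >= n_entries (rough estimate)."""
    height, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout
        height += 1
    return height

def indexed_nested_loop_cost(b_r, n_r, c):
    """b_r block reads of the outer relation plus one index lookup per outer tuple."""
    return b_r + n_r * c

height = btree_height(10000, 20)       # index on takes, 20 entries per node
lookup = height + 1                    # one more access for the actual data
cost = indexed_nested_loop_cost(b_r=100, n_r=5000, c=lookup)
```

Each of the 25,100 accesses is counted as both a block transfer and a seek, so the comparison with block nested-loop join depends on how expensive seeks are.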
OTHER OPERATIONS
� Duplicate elimination can be implemented via hashing or sorting.
⚫ On sorting, duplicates will come adjacent to each other, and all but
one of each set of duplicates can be deleted.
⚫ Optimization: duplicates can be deleted during run generation as
well as at intermediate merge steps in external sort-merge.
⚫ Hashing is similar – duplicates will come into the same bucket.
� Projection:
⚫ perform projection on each tuple
⚫ followed by duplicate elimination.
OTHER OPERATIONS : AGGREGATION
Aggregation can be implemented in a manner similar to duplicate
elimination.
⚫ Sorting or hashing can be used to bring tuples in the same group
together, and then the aggregate functions can be applied on each
group.
⚫ Optimization: combine tuples in the same group during run
generation and intermediate merges, by computing partial aggregate
values
For count, min, max, sum: keep aggregate values on tuples
found so far in the group.
When combining partial aggregate for count, add up the
aggregates
For avg, keep sum and count, and divide sum by count at the
end
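
Partial aggregation for avg can be sketched as follows. Each run is reduced to a (sum, count) pair per group, partial results are combined by adding both components, and the division happens only at the end. The function names and the sample runs are made up for illustration.

```python
def partial_aggregate(run):
    """Reduce one run of (group_key, value) pairs to {key: (sum, count)}."""
    acc = {}
    for key, value in run:
        s, c = acc.get(key, (0, 0))
        acc[key] = (s + value, c + 1)
    return acc

def combine(p1, p2):
    """Merge two partial results by adding sums and counts per group."""
    out = dict(p1)
    for key, (s, c) in p2.items():
        s0, c0 = out.get(key, (0, 0))
        out[key] = (s0 + s, c0 + c)
    return out

def finalize_avg(partials):
    """Divide sum by count once all partial results are combined."""
    return {key: s / c for key, (s, c) in partials.items()}

run1 = [("a", 10), ("b", 4)]
run2 = [("a", 20), ("b", 6), ("c", 3)]
merged = combine(partial_aggregate(run1), partial_aggregate(run2))
averages = finalize_avg(merged)
```

Keeping sum and count separately is what makes avg combinable; averaging the per-run averages would give wrong results whenever runs contribute different numbers of tuples to a group.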
QUERY PROCESSING
Overview
Measures of Query Cost
Selection Operation
Sorting
Join Operation
Other Operations
QUERY OPTIMIZATION
� Introduction
� Transformation of Relational Expressions
� Catalog Information for Cost Estimation
� Statistical Information for Cost Estimation
� Cost-based optimization
� Dynamic Programming for Choosing Evaluation Plans
� Materialized views
INTRODUCTION
� Alternative ways of evaluating a given query
⚫ Equivalent expressions
⚫ Different algorithms for each operation
INTRODUCTION (CONT.)
� An evaluation plan defines exactly what algorithm is used for each
operation, and how the execution of the operations is coordinated.
� Find out how to view query execution plans on your favorite database
INTRODUCTION (CONT.)

� Cost difference between evaluation plans for a query can be enormous


⚫ E.g. seconds vs. days in some cases
� Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
� Estimation of plan cost based on:
⚫ Statistical information about relations. Examples:
� number of tuples, number of distinct values for an attribute
⚫ Statistics estimation for intermediate results
� to compute cost of complex expressions
⚫ Cost formulae for algorithms, computed using statistics
EQUIVALENCE RULES
1. Conjunctive selection operations can be deconstructed into a
sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed, the
others can be omitted.
$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$


EQUIVALENCE RULES (CONT.)
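5. Theta-join operations (and natural joins) are commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. (a) Natural join operations are associative:
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
(b) Theta joins are associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.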


PICTORIAL DEPICTION OF EQUIVALENCE RULES
EQUIVALENCE RULES (CONT.)
7. The selection operation distributes over the theta join operation under
the following two conditions:
(a) When all the attributes in $\theta_0$ involve only the attributes of one
of the expressions ($E_1$) being joined.
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
(b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves
only the attributes of $E_2$.
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
EQUIVALENCE RULES (CONT.)
8. The projection operation distributes over the theta join operation as
follows:
(a) if $\theta$ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
(b) Consider a join $E_1 \bowtie_{\theta} E_2$.
⚫ Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
⚫ Let $L_3$ be attributes of $E_1$ that are involved in join condition $\theta$, but
are not in $L_1 \cup L_2$, and
⚫ let $L_4$ be attributes of $E_2$ that are involved in join condition $\theta$, but
are not in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$
EQUIVALENCE RULES (CONT.)
9. The set operations union and intersection are commutative
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
(set difference is not commutative).
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over $\cup$, $\cap$ and $-$.
$\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
and similarly for $\cup$ and $\cap$ in place of $-$
Also: $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$
and similarly for $\cap$ in place of $-$, but not for $\cup$
12. The projection operation distributes over union
$\Pi_{L}(E_1 \cup E_2) = (\Pi_{L}(E_1)) \cup (\Pi_{L}(E_2))$
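
Rule 7(a), pushing a selection below a join, can be checked on a toy example. In this sketch relations are lists of dictionaries, and the helper functions select and theta_join as well as the sample data are made up for illustration; agreement on one example illustrates the rule but does not prove it.

```python
def select(pred, relation):
    """Selection: keep tuples that satisfy pred."""
    return [t for t in relation if pred(t)]

def theta_join(r, s, theta):
    """Theta join: concatenate every pair of tuples that satisfies theta."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if theta(t_r, t_s)]

# Hypothetical sample relations.
instructor = [{"ID": 1, "name": "Kim", "salary": 90000},
              {"ID": 2, "name": "Lee", "salary": 60000}]
teaches = [{"T_ID": 1, "course": "DB"},
           {"T_ID": 2, "course": "OS"},
           {"T_ID": 1, "course": "AI"}]

theta = lambda a, b: a["ID"] == b["T_ID"]   # join condition
theta0 = lambda t: t["salary"] > 75000      # uses attributes of instructor only

# Left side of rule 7(a): select after joining.
lhs = select(theta0, theta_join(instructor, teaches, theta))
# Right side of rule 7(a): select first, then join the smaller input.
rhs = theta_join(select(theta0, instructor), teaches, theta)
```

Both sides produce the same tuples, but the right-hand side joins one instructor tuple instead of two, which is why optimizers apply selections as early as possible.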
HEURISTIC OPTIMIZATION
� A query tree is a tree data structure that corresponds to a
relational algebra expression.
� It represents the input relations of the query as leaf nodes
of the tree, and represents the relational algebra operations
as internal nodes.
� An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
� The order of execution of operations starts at the leaf
nodes, which represent the input database relations for
the query, and ends at the root node, which represents the
final operation of the query.
� SELECT LNAME FROM EMPLOYEE, WORKS_ON,
PROJECT WHERE PNAME='AQUARIUS' AND
PNUMBER=PNO AND ESSN=SSN AND BDATE >
'1957-12-31';
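
One possible query tree for this query can be sketched and executed bottom-up. The tree shape (selections applied directly above the leaves), the tuple-based node encoding, the evaluate function, and the sample rows are all assumptions made for this example; they are not taken from the slides.

```python
def evaluate(node):
    """Execute a query tree bottom-up: operands first, then the node itself."""
    kind = node[0]
    if kind == "relation":                       # leaf: an input relation
        return node[1]
    if kind == "select":
        return [t for t in evaluate(node[2]) if node[1](t)]
    if kind == "project":
        return [{a: t[a] for a in node[1]} for t in evaluate(node[2])]
    if kind == "join":
        left, right = evaluate(node[2]), evaluate(node[3])
        return [{**l, **r} for l in left for r in right if node[1](l, r)]
    raise ValueError("unknown node type: " + kind)

# Hypothetical sample rows.
EMPLOYEE = [{"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
            {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"}]
WORKS_ON = [{"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10}]
PROJECT = [{"PNUMBER": 10, "PNAME": "AQUARIUS"}]

tree = ("project", ["LNAME"],
        ("join", lambda l, r: l["ESSN"] == r["SSN"],
         ("join", lambda l, r: l["PNUMBER"] == r["PNO"],
          ("select", lambda t: t["PNAME"] == "AQUARIUS", ("relation", PROJECT)),
          ("relation", WORKS_ON)),
         ("select", lambda t: t["BDATE"] > "1957-12-31", ("relation", EMPLOYEE))))

result = evaluate(tree)
```

The recursion mirrors the execution order described above: each internal node runs only after its operands have been replaced by their result relations, and the root yields the final answer.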
