Chapter Two Query Processing (2)
Theta-Join
Equi-Join: a special case of the condition (theta) join in which the
condition c contains only equalities.
Syntax: R1 ⋈(equality condition) R2. The result schema is similar to that of
the cross-product, but contains only one copy of each field for which
equality is specified.
Natural Join: an equi-join on all common fields.
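The difference between the two joins can be sketched in a few lines of Python. This is only an illustration: the function names `equi_join` and `natural_join`, the list-of-dicts representation, and the sample customer/account rows are assumptions made for the example, and the sketch assumes the two relations share no field names other than those being compared.

```python
def equi_join(r1, r2, pairs):
    """Join r1 and r2 on equality of the given (field_in_r1, field_in_r2) pairs.

    Only one copy of each compared field is kept (the copy from r1).
    Assumes no other field names clash between the two relations.
    """
    drop = {b for _, b in pairs}  # fields of r2 already represented by r1
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[b] for a, b in pairs):
                row = dict(t1)
                row.update({k: v for k, v in t2.items() if k not in drop})
                result.append(row)
    return result


def natural_join(r1, r2):
    """Equi-join on every field name the two relations have in common."""
    common = [f for f in r1[0] if f in r2[0]] if r1 and r2 else []
    return equi_join(r1, r2, [(f, f) for f in common])


# Hypothetical sample data for illustration only.
customer = [
    {"CID": "C01", "ANO": 1, "Name": "Raj"},
    {"CID": "C02", "ANO": 2, "Name": "Meet"},
    {"CID": "C03", "ANO": 3, "Name": "Jay"},
]
account = [
    {"ANO": 1, "Balance": 3000},
    {"ANO": 2, "Balance": 1000},
    {"ANO": 3, "Balance": 2000},
]

joined = natural_join(customer, account)
for row in joined:
    print(row)
```

Here the only common field is ANO, so the natural join behaves like an equi-join on ANO and each result row carries a single ANO column.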
Translating SQL Queries into
Relational Algebra
SQL is the query language used in most commercial RDBMSs.
An SQL query is first translated into an equivalent extended relational
algebra expression, represented as a query tree data structure, which is
then optimized.
SQL queries are decomposed into query blocks, the basic units that can be
translated into algebraic operators and optimized.
A query block contains a single SELECT-FROM-WHERE expression, as well as
GROUP BY and HAVING clauses if these are part of the block.
Nested queries within a query are identified as separate query blocks.
Aggregate operators in SQL (MAX, MIN, SUM, COUNT) must be included in the
extended algebra.
Department of Computer Science 15
Consider the following SQL query:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX(Salary)
                 FROM EMPLOYEE
                 WHERE Dno = 5 );
This query retrieves the names of employees (from any department in the
company) who earn a salary greater than the highest salary in
department 5.
The query includes a nested subquery and hence is decomposed into two
blocks.
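The decomposition can be checked on a small in-memory table. The sketch below uses Python's standard `sqlite3` module; the EMPLOYEE rows are made-up sample data, and the `ORDER BY` clause is added only so the two result lists can be compared directly. It runs the nested query as written, then evaluates the inner block on its own and feeds its result (called `c` here) into the outer block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)"
)
# Hypothetical sample rows for illustration only.
rows = [
    ("Smith", "John", 30000, 5),
    ("Wong", "Franklin", 40000, 5),
    ("Wallace", "Jennifer", 43000, 4),
    ("Borg", "James", 55000, 1),
    ("Zelaya", "Alicia", 25000, 4),
]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)

# The query as a single nested statement.
nested = conn.execute(
    """
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5)
    ORDER BY Lname
    """
).fetchall()

# Inner block: evaluated once, yields a single value c.
(c,) = conn.execute("SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5").fetchone()

# Outer block: uses c as a constant.
two_step = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > ? ORDER BY Lname", (c,)
).fetchall()

print(c)
print(nested)
print(two_step)
```

Because the inner block does not refer to any attribute of the outer block, it needs to be evaluated only once.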
The inner block is:
( SELECT MAX(Salary)
  FROM EMPLOYEE
  WHERE Dno = 5 )
This block retrieves the highest salary in department 5.
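The outer query block is:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c
where c represents the result returned by the inner block.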
Example: ΠName ( σBalance<2500 (Account ⋈ Customer) )
Customer has fields CID, ANO, Name; Account has fields ANO, Balance.
Output (Name): Meet, Jay
Cont’d…
A combined (conjunctive) selection operation can be divided into a
sequence of individual selections.
This transformation is called a cascade of σ.
Example:
Customer
CID   ANO   Name   Balance
C01   1     Raj    3000
C02   2     Meet   1000
C03   3     Jay    2000

σANO<3 ∧ Balance<2000 (Customer)
Output
CID   ANO   Name   Balance
C02   2     Meet   1000

σANO<3 (σBalance<2000 (Customer))
Output
CID   ANO   Name   Balance
C02   2     Meet   1000
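The equivalence in the example can be checked with a short Python sketch. The helper name `select` and the list-of-dicts representation are assumptions made for illustration; the rows are the Customer rows from the example.

```python
def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


customer = [
    {"CID": "C01", "ANO": 1, "Name": "Raj", "Balance": 3000},
    {"CID": "C02", "ANO": 2, "Name": "Meet", "Balance": 1000},
    {"CID": "C03", "ANO": 3, "Name": "Jay", "Balance": 2000},
]

# One selection with a conjunctive condition.
combined = select(customer, lambda t: t["ANO"] < 3 and t["Balance"] < 2000)

# Cascade: the same condition split into two individual selections.
cascaded = select(
    select(customer, lambda t: t["Balance"] < 2000),
    lambda t: t["ANO"] < 3,
)

print(combined)
print(cascaded)
```

Both forms return the single tuple for C02, which is why an optimizer is free to split a conjunctive selection and move the individual selections independently.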
[Figure: sorting example with stages labeled initial relation, create runs, runs, merge pass-1, runs, merge pass-2, sorted output]
Algorithms for SELECT Operation
There are many algorithms for executing a SELECT operation, which is
basically a search operation to locate the records in a disk file that
satisfy a certain condition.
Examples:
• (OP1): σSsn='123456789' (EMPLOYEE)
• (OP2): σDNUMBER>5 (DEPARTMENT)
• (OP3): σDno=5 (EMPLOYEE)
• (OP4): σDno=5 AND SALARY>30000 AND SEX='F' (EMPLOYEE)
• (OP5): σESSN='123456789' AND PNO=10 (WORKS_ON)
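The simplest way to execute such a selection is a linear search that tests every record against the condition. The sketch below illustrates only that one strategy. The in-memory list stands in for a disk file, the function name `linear_search` is hypothetical, and the EMPLOYEE records are made-up sample data.

```python
def linear_search(records, condition):
    """Scan every record and keep those that satisfy the condition."""
    return [r for r in records if condition(r)]


# Hypothetical sample records for illustration only.
employee = [
    {"Ssn": "123456789", "Dno": 5, "Salary": 30000, "Sex": "M"},
    {"Ssn": "333445555", "Dno": 5, "Salary": 40000, "Sex": "M"},
    {"Ssn": "453453453", "Dno": 5, "Salary": 35000, "Sex": "F"},
    {"Ssn": "987654321", "Dno": 4, "Salary": 43000, "Sex": "F"},
]

# OP1: equality condition on a single attribute.
op1 = linear_search(employee, lambda r: r["Ssn"] == "123456789")

# OP3: equality condition that may match many records.
op3 = linear_search(employee, lambda r: r["Dno"] == 5)

# OP4: conjunctive condition.
op4 = linear_search(
    employee,
    lambda r: r["Dno"] == 5 and r["Salary"] > 30000 and r["Sex"] == "F",
)

print(len(op1), len(op3), [r["Ssn"] for r in op4])
```

A linear search works for any condition but reads every record, which is why other algorithms exploit file ordering or indexes when they are available.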
Nested-loop join cost (worst case), where n is the number of tuples and
b is the number of blocks of a relation:
1. With depositor as the outer relation:
No. of block accesses = n_depositor * b_customer + b_depositor
= 5000 * 400 + 100
= 2,000,100
2. With customer as the outer relation:
No. of block accesses = n_customer * b_depositor + b_customer
= 10000 * 100 + 400
= 1,000,400
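The two worst-case figures above follow from one formula, which can be written as a small Python function. The function and parameter names are chosen for this sketch only; the tuple and block counts are the ones used in the calculation above.

```python
def worst_case_block_accesses(n_outer, b_outer, b_inner):
    """Worst-case cost: the inner relation is read once per outer tuple,
    and the outer relation is read once."""
    return n_outer * b_inner + b_outer


# Values taken from the calculation above.
n_depositor, b_depositor = 5000, 100
n_customer, b_customer = 10000, 400

depositor_outer = worst_case_block_accesses(n_depositor, b_depositor, b_customer)
customer_outer = worst_case_block_accesses(n_customer, b_customer, b_depositor)

print(depositor_outer)  # 2000100
print(customer_outer)   # 1000400
```

Under these numbers, using customer as the outer relation costs fewer block accesses, because the inner relation (depositor) that is re-read for every outer tuple has fewer blocks.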
[Figure: query tree containing σBalance<2500, customer, and account]
(a) Initial (canonical) query tree for SQL query Q.
(b) Moving SELECT operations down the query tree.
(e) Moving PROJECT operations down the query tree.
Issues
Cost function
Number of execution strategies to be considered
Explanation:
Suppose that we had a constraint on the database schema stating that no
employee can earn more than his or her direct supervisor. If the semantic
query optimizer checks for the existence of this constraint, it need not
execute the query at all, because it knows that the result of the query
will be empty. Techniques known as theorem proving can be used for this
purpose.