07. Overview of Query Processing
Autumn, 2008
Chapter 7
Overview of Query Processing
1 Distributed Database Systems
SQL: Non-Procedural Language of RDB
Tuple calculus
{ t | F(t) } where:
t : tuple variable
F(t) : well-formed formula
Example
Get the number and name of all managers
{ t(ENO, ENAME) | t ∈ EMP ∧ t.TITLE = "MANAGER" }
Domain calculus
{ x_1, x_2, ..., x_n | F(x_1, x_2, ..., x_n) } where:
x_i : domain variables
F(x_1, x_2, ..., x_n) : well-formed formula
Example
{ x, y | E(x, y, "manager") }
Variables are position sensitive!
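A minimal Python sketch of the two calculus styles, written as set comprehensions. The sample rows and the names `Emp`, `tuple_result`, and `domain_result` are assumptions made for illustration; only the schema EMP(ENO, ENAME, TITLE) is taken from the slides.

```python
from collections import namedtuple

# Hypothetical sample data; only the schema EMP(ENO, ENAME, TITLE) comes from the slides.
Emp = namedtuple("Emp", ["ENO", "ENAME", "TITLE"])
EMP = [
    Emp("E1", "J. Doe", "manager"),
    Emp("E2", "M. Smith", "analyst"),
    Emp("E3", "A. Lee", "manager"),
]

# Tuple calculus style: t ranges over whole tuples, attributes are referenced by name.
tuple_result = {(t.ENO, t.ENAME) for t in EMP if t.TITLE == "manager"}

# Domain calculus style: x, y, z range over attribute values and are bound by position.
domain_result = {(x, y) for (x, y, z) in EMP if z == "manager"}

print(tuple_result == domain_result)
```

Swapping `x` and `y` in the domain-calculus version would silently return (ENAME, ENO) pairs, which is what position sensitivity means in practice.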
SQL is a tuple calculus language
SELECT ENO, ENAME
FROM EMP
WHERE TITLE = 'manager'
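The query can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the in-memory database and the three sample rows are assumptions, not data from the slides.

```python
import sqlite3

# In-memory database with a hypothetical EMP table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [
        ("E1", "J. Doe", "manager"),
        ("E2", "M. Smith", "analyst"),
        ("E3", "A. Lee", "manager"),
    ],
)

# The non-procedural query: it states what is wanted, not how to retrieve it.
rows = sorted(
    conn.execute("SELECT ENO, ENAME FROM EMP WHERE TITLE = 'manager'").fetchall()
)
print(rows)
conn.close()
```

The result is sorted in Python only to make the output order deterministic; SQL itself gives no ordering guarantee without ORDER BY.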
Query Processor
End users express queries in non-procedural languages.
The query processor transforms these queries into
procedural operations that access the data.
A distributed query processor has to deal with
query decomposition, and
data localization
7.1 Query Processing Problems
A centralized query processor must
transform the calculus query into algebra operations, and
choose the best execution plan
Example:
SELECT ENAME
FROM E,G
WHERE E.ENO = G.ENO
AND RESP = 'manager'
Relational Algebra 1
π_ENAME(σ_{RESP="Manager" ∧ E.ENO=G.ENO}(E × G))
Relational Algebra 2
π_ENAME(E ⋈_ENO σ_{RESP="Manager"}(G))
Execution plan 2 is better because it consumes
fewer resources!
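The difference between the two plans can be made concrete by counting intermediate tuples. The sketch below assumes plan 1 builds the Cartesian product before selecting, and plan 2 selects on G before joining. The relations E(ENO, ENAME) and G(ENO, RESP) are filled with hypothetical rows, and `plan1` and `plan2` are illustrative names.

```python
from itertools import product

# Hypothetical data: 8 employees, 8 assignments, 2 of them with RESP = "Manager".
E = [(f"E{i}", f"name{i}") for i in range(1, 9)]
G = [(f"E{i}", "Manager" if i % 4 == 0 else "Engineer") for i in range(1, 9)]

def plan1(E, G):
    """Cartesian product first, then selection, then projection on ENAME."""
    cart = list(product(E, G))  # |E| * |G| intermediate tuples
    selected = [(e, g) for e, g in cart if g[1] == "Manager" and e[0] == g[0]]
    return sorted({e[1] for e, _ in selected}), len(cart)

def plan2(E, G):
    """Selection on G first, then join on ENO, then projection on ENAME."""
    managers = [g for g in G if g[1] == "Manager"]  # small intermediate result
    by_eno = {e[0]: e for e in E}  # assumes ENO is unique in E
    joined = [by_eno[g[0]] for g in managers if g[0] in by_eno]
    return sorted({e[1] for e in joined}), len(managers)

names1, size1 = plan1(E, G)
names2, size2 = plan2(E, G)
print(names1 == names2, size1, size2)
```

Both plans return the same names, but plan 1 materializes 64 intermediate tuples where plan 2 keeps only 2.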
In a DDB, the query processor must also
consider the communication cost and
select the best site!
Same query as in the last example, but G and E
are distributed.
Simple plan:
Transport all segments to the query site and
execute the query there. This causes too much network
traffic and is very costly.
Distributed Query Example
Distribution of E and G
Query
π_ENAME(E ⋈_ENO σ_{RESP="Manager"}(G))
Optimized Processing
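A rough sketch of why filtering before shipping reduces network traffic, counted in tuples transferred. Everything here is an assumption made for illustration: the fragment contents, the idea that E and G are fragmented on the same ENO ranges, and the function names `ship_everything` and `filter_then_ship`. None of it is taken from the original figures.

```python
# Hypothetical fragments: E and G are both split on the same ENO ranges.
E1 = [(f"E{i}", f"name{i}") for i in range(1, 5)]
E2 = [(f"E{i}", f"name{i}") for i in range(5, 9)]
G1 = [(f"E{i}", "Manager" if i % 4 == 0 else "Engineer") for i in range(1, 5)]
G2 = [(f"E{i}", "Manager" if i % 4 == 0 else "Engineer") for i in range(5, 9)]

def ship_everything(e_frags, g_frags):
    """Simple plan: every fragment travels to the query site in full."""
    return sum(len(f) for f in e_frags) + sum(len(f) for f in g_frags)

def filter_then_ship(e_frags, g_frags):
    """Select at the G sites, join at the E sites, ship only results."""
    shipped, names = 0, []
    for e_frag, g_frag in zip(e_frags, g_frags):
        managers = [g for g in g_frag if g[1] == "Manager"]
        shipped += len(managers)  # G site -> matching E site
        enos = {g[0] for g in managers}
        partial = [e[1] for e in e_frag if e[0] in enos]
        shipped += len(partial)  # E site -> query site
        names += partial
    return sorted(names), shipped

simple_cost = ship_everything([E1, E2], [G1, G2])
names, optimized_cost = filter_then_ship([E1, E2], [G1, G2])
print(simple_cost, optimized_cost, names)
```

With these made-up fragments the simple plan moves 16 tuples and the filtered plan moves 4; real ratios depend on selectivity and fragment sizes.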
7.2 Objectives of Query Processing
Two-fold objectives:
Transformation, and
Optimization
Cost to be considered for optimization:
CPU time
I/O time, and
Communication time
WAN: communication time is the dominant cost
LAN: all three costs carry comparable weight
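One common way to combine the three components is a weighted sum. The sketch below assumes that form; the weights, the per-plan figures, and the names `total_cost` and `best_plan` are hypothetical and chosen only to show how the preferred plan can change between a WAN and a LAN.

```python
# Hypothetical cost components per plan (arbitrary units, not from the slides).
plans = {
    "ship_all": {"cpu": 10, "io": 20, "comm": 100},
    "filter_first": {"cpu": 50, "io": 80, "comm": 10},
}

# Assumed weights: communication is heavily penalized on a WAN, equal on a LAN.
WAN = {"cpu": 1, "io": 1, "comm": 20}
LAN = {"cpu": 1, "io": 1, "comm": 1}

def total_cost(plan, weights):
    """Weighted sum of CPU, I/O, and communication components."""
    return sum(weights[k] * plan[k] for k in ("cpu", "io", "comm"))

def best_plan(plans, weights):
    """Name of the plan with the lowest weighted total cost."""
    return min(plans, key=lambda name: total_cost(plans[name], weights))

print(best_plan(plans, WAN), best_plan(plans, LAN))
```

With these numbers the WAN weights favor `filter_first` (330 versus 2030), while the LAN weights favor `ship_all` (130 versus 140).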
7.3 Complexity of Relational Algebra Operations
Complexity is measured in terms of n (the relation cardinality), assuming tuples are
sorted on the comparison attributes
Operation                                  Complexity
σ, π (without duplicate elimination)       O(n)
π (with duplicate elimination), GROUP      O(n log n)
Join, semijoin, division, set operators    O(n log n)
Cartesian product                          O(n^2)
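The complexity classes can be illustrated with naive implementations. This is a sketch under simple assumptions (in-memory lists of tuples, sorting used for duplicate elimination); the function names are illustrative and do not describe how any particular DBMS implements these operators.

```python
def select(rel, pred):
    """Selection: one pass over the relation, O(n)."""
    return [t for t in rel if pred(t)]

def project_distinct(rel, idx):
    """Projection with duplicate elimination: sort, then one pass, O(n log n)."""
    values = sorted(t[idx] for t in rel)
    out = []
    for v in values:
        if not out or out[-1] != v:  # duplicates are adjacent after sorting
            out.append(v)
    return out

def cartesian(r, s):
    """Cartesian product: every pair is produced, O(n^2) for |r| = |s| = n."""
    return [(a, b) for a in r for b in s]

rel = [(1, "a"), (2, "b"), (3, "a")]
print(select(rel, lambda t: t[1] == "a"))
print(project_distinct(rel, 1))
print(len(cartesian(rel, rel)))
```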