Algorithms For Query Processing and Optimization
External sorting: sorting algorithms suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files.
Sort-Merge strategy:
Starts by sorting small subfiles (runs) of the main file and then merges the sorted
runs, creating larger sorted subfiles that are merged in turn.
Sorting phase:
$nR = \lceil b / nB \rceil$
Merging phase:
$dM = \min(nB - 1, nR)$
$nP = \lceil \log_{dM}(nR) \rceil$
Parameters
b: number of file blocks;
nB: available buffer space (in blocks);
nR: number of initial runs;
dM: degree of merging;
nP: number of passes.
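The parameter formulas above can be computed with the short Python sketch below. The function name sort_merge_params and the example values b = 1024 and nB = 5 are illustrative assumptions. The loop computes ⌈log_dM(nR)⌉ with integer arithmetic, which avoids floating-point rounding in the logarithm.

```python
def sort_merge_params(b, nB):
    """Return (nR, dM, nP) for external sort-merge (hypothetical helper).

    b  -- number of file blocks
    nB -- available buffer space in blocks (nB >= 3, so that dM >= 2)
    """
    if nB < 3:
        raise ValueError("need at least 3 buffer blocks")
    nR = -(-b // nB)            # nR = ceil(b / nB) initial runs
    dM = min(nB - 1, nR)        # one buffer block is kept for output
    nP, runs = 0, nR
    while runs > 1:             # each pass divides the run count by dM, rounded up
        runs = -(-runs // dM)
        nP += 1                 # nP ends as ceil(log_dM(nR))
    return nR, dM, nP

print(sort_merge_params(1024, 5))  # (205, 4, 4)
```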
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
2. Merge the runs (N-way merge). We assume (for now) that N < M.
1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page.
2. repeat
   1. Select the first record (in sort order) among all buffer pages.
   2. Write the record to the output buffer. If the output buffer is full, write it to disk.
   3. Delete the record from its input buffer page.
      If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
3. until all input buffer pages are empty.
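The two steps can be sketched in Python as a small in-memory simulation. This is not a DBMS implementation. Python lists stand in for disk blocks, the helper names create_runs and merge_runs are hypothetical, and output buffering is not modeled. A heap replaces the linear scan that selects the first record among the buffer pages.

```python
import heapq

def create_runs(blocks, M):
    """Step 1: read M blocks at a time, sort them in memory, write run R_i.

    Each run is kept as a list of blocks of the input block size.
    """
    if not blocks:
        return []
    block_size = len(blocks[0])
    runs = []
    for start in range(0, len(blocks), M):
        records = sorted(rec for blk in blocks[start:start + M] for rec in blk)
        runs.append([records[k:k + block_size]
                     for k in range(0, len(records), block_size)])
    return runs

def merge_runs(runs):
    """Step 2: N-way merge with one input buffer page per run (assumes N < M)."""
    buffers = [list(run[0]) if run else [] for run in runs]  # first block of each run
    next_block = [1] * len(runs)                             # next block to read per run
    heap = [(buf[0], r) for r, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)
    output = []
    while heap:                                  # until all input buffers are empty
        rec, r = heapq.heappop(heap)             # first record in sort order
        output.append(rec)                       # write it to the output
        buffers[r].pop(0)                        # delete it from its input buffer
        if not buffers[r] and next_block[r] < len(runs[r]):
            buffers[r] = list(runs[r][next_block[r]])  # read the run's next block
            next_block[r] += 1
        if buffers[r]:
            heapq.heappush(heap, (buffers[r][0], r))
    return output

blocks = [[5, 3], [9, 1], [7, 2], [8, 6], [4, 0]]  # 5 blocks, 2 records each
runs = create_runs(blocks, M=3)                    # N = 2 runs, so N < M holds
print(len(runs))         # 2
print(merge_runs(runs))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```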
• If N ≥ M, several merge passes are required.
• In each pass, contiguous groups of M - 1 runs are merged.
• A pass reduces the number of runs by a factor of M - 1 and creates runs longer by the same factor.
• E.g., if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
• Repeated passes are performed until all runs have been merged into one.
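The multi-pass case can be sketched with Python's heapq.merge, which lazily merges already-sorted iterables. The function name multipass_merge is hypothetical. It returns the merged run together with the number of runs left after each pass, so the 90-run, M = 11 example above should produce 90, 9, 1.

```python
import heapq

def multipass_merge(runs, M):
    """Merge sorted runs in passes; each pass merges contiguous groups of M - 1 runs.

    Returns the final run and the number of runs after each pass (hypothetical helper).
    """
    if M < 3:
        raise ValueError("M must be at least 3 so each pass merges 2 or more runs")
    counts = [len(runs)]
    while len(runs) > 1:
        k = M - 1                                   # M - 1 input buffers, 1 output buffer
        runs = [list(heapq.merge(*runs[i:i + k]))  # one (M - 1)-way merge per group
                for i in range(0, len(runs), k)]
        counts.append(len(runs))
    return (runs[0] if runs else []), counts

runs = [[i, i + 90] for i in range(90)]  # 90 sorted initial runs
merged, counts = multipass_merge(runs, M=11)
print(counts)                            # [90, 9, 1]
print(merged == list(range(180)))        # True
```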
Example: External Sorting Using Sort-Merge
[Figure: Algorithms for External Sorting]
Algorithms for SELECT
Examples:
(OP1): $\sigma_{SSN='123456789'}(EMPLOYEE)$
(OP2): $\sigma_{DNUMBER>5}(DEPARTMENT)$
(OP3): $\sigma_{DNO=5}(EMPLOYEE)$
(OP4): $\sigma_{DNO=5 \text{ AND } SALARY>30000 \text{ AND } SEX='F'}(EMPLOYEE)$
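The toy Python sketch below shows two ways the example selections could be evaluated. It assumes a hypothetical in-memory EMPLOYEE table and uses a dictionary in place of an index on DNO. OP1 uses a linear scan and OP3 uses an index lookup. The conjunctive OP4 uses the index for one conjunct and then tests the remaining conjuncts on each retrieved record. OP2 is a range condition and would need an ordered access path, so it is not shown.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE rows with only the attributes used by OP1, OP3 and OP4.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000, "SEX": "M"},
    {"SSN": "111111111", "DNO": 5, "SALARY": 40000, "SEX": "F"},
    {"SSN": "222222222", "DNO": 5, "SALARY": 25000, "SEX": "F"},
    {"SSN": "333333333", "DNO": 4, "SALARY": 43000, "SEX": "F"},
    {"SSN": "444444444", "DNO": 5, "SALARY": 38000, "SEX": "M"},
]

def linear_search(rows, pred):
    """Scan every record and keep those satisfying the condition."""
    return [r for r in rows if pred(r)]

def build_index(rows, attr):
    """Stand-in for a secondary index: attribute value -> matching records."""
    index = defaultdict(list)
    for r in rows:
        index[r[attr]].append(r)
    return index

dno_index = build_index(EMPLOYEE, "DNO")

op1 = linear_search(EMPLOYEE, lambda r: r["SSN"] == "123456789")  # OP1
op3 = dno_index.get(5, [])                                         # OP3 via the index
# OP4: use the index for DNO = 5, then test the other conjuncts on each retrieved record
op4 = [r for r in dno_index.get(5, []) if r["SALARY"] > 30000 and r["SEX"] == "F"]

print([r["SSN"] for r in op4])  # ['111111111']
```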
In the sort-merge join method, once both files are sorted on the join attributes (A of R and B of S), the records of each file are scanned only once each for matching with the other file, unless both A and B are non-key attributes, in which case the method needs to be modified slightly.
Algorithm for SORT MERGE JOIN
Example
[Figure: Algorithm for SORT MERGE JOIN]
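The sort-merge join above can be sketched in Python, assuming both inputs fit in memory as lists of tuples. In a real system they would first be sorted externally. Groups of equal join values are paired explicitly, which is one way to handle the case where A and B are both non-key attributes. The function name and the EMP/DEPT sample data are hypothetical.

```python
def sort_merge_join(R, S, a, b):
    """Join R and S on R[a] = S[b] by sorting both inputs and scanning each once.

    Groups of equal join values are paired explicitly, so duplicate values
    on both sides (non-key join attributes) are handled.
    """
    R = sorted(R, key=lambda t: t[a])
    S = sorted(S, key=lambda t: t[b])
    i = j = 0
    result = []
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            value = R[i][a]
            j_end = j
            while j_end < len(S) and S[j_end][b] == value:  # group of matching S records
                j_end += 1
            while i < len(R) and R[i][a] == value:          # pair each matching R record
                result.extend((R[i], S[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return result

EMP = [("Ng", 5), ("Li", 4), ("Ko", 5), ("Oh", 1)]         # (LNAME, DNO)
DEPT = [(5, "Research"), (4, "Admin"), (7, "Sales")]       # (DNUMBER, DNAME)
for emp, dept in sort_merge_join(EMP, DEPT, 1, 0):
    print(emp[0], dept[1])
```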
Algorithms for JOIN Operations
Methods for implementing joins:
J4 Hash-join:
The records of files R and S are both hashed to the same hash file, using
the same hashing function on the join attributes A of R and B of S as hash
keys.
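A simple hash join is a two-phase algorithm:
Build phase: A single pass through the file with fewer records (say, R) hashes its records into the hash file buckets.
Probe phase: A single pass through the other file (S) hashes each of its records to probe the appropriate bucket, and each S record is combined with all matching records from R in that bucket.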
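The build and probe phases can be sketched in Python, with a dictionary standing in for the hash file. The sketch assumes that the buckets for the smaller file R fit in main memory; partitioning both files to disk first is not shown. The function name and sample data are hypothetical.

```python
from collections import defaultdict

def hash_join(R, S, a, b):
    """Simple in-memory hash join of R and S on R[a] = S[b]; R is the smaller input."""
    buckets = defaultdict(list)
    for r in R:                        # build phase: one pass over R
        buckets[r[a]].append(r)
    result = []
    for s in S:                        # probe phase: one pass over S
        for r in buckets.get(s[b], []):
            result.append((r, s))      # combine s with every matching r in its bucket
    return result

DEPT = [(5, "Research"), (4, "Admin")]                 # smaller file, hashed on DNUMBER
EMP = [("Ng", 5), ("Li", 4), ("Ko", 5), ("Oh", 1)]     # probed on DNO
for dept, emp in hash_join(DEPT, EMP, 0, 1):
    print(emp[0], dept[1])
```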
Query graph:
A graph data structure that corresponds to a relational calculus expression. It does not indicate an order in which to perform operations. There is only a single graph corresponding to each query.
Using Heuristics in Query Optimization
Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling
department number and the department manager’s last name, address and
birthdate.
Relational algebra:
$\pi_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((\sigma_{PLOCATION='STAFFORD'}(PROJECT)) \bowtie_{DNUM=DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN=SSN} (EMPLOYEE))$
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME,
E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D,
EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND
D.MGRSSN = E.SSN AND
P.PLOCATION = 'STAFFORD';
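The query graph described above can be sketched for Q2 as a small Python data structure. The encoding is an assumption made for illustration: relations and constants are nodes, selection and join conditions are edges, and the retrieved attributes are attached to their relations. Because the edges are held in an unordered dictionary, the structure records the conditions without prescribing any order of execution. All names are hypothetical.

```python
# Hypothetical encoding of the query graph for Q2 (names are illustrative).
# Relation nodes are the tuple variables; constants get their own nodes.
RELATION_NODES = {"P": "PROJECT", "D": "DEPARTMENT", "E": "EMPLOYEE"}
CONSTANT_NODES = {"'STAFFORD'"}

# Edges carry the selection and join conditions; they have no evaluation order.
EDGES = {
    frozenset({"P", "'STAFFORD'"}): "P.PLOCATION = 'STAFFORD'",  # selection condition
    frozenset({"P", "D"}): "P.DNUM = D.DNUMBER",                  # join condition
    frozenset({"D", "E"}): "D.MGRSSN = E.SSN",                    # join condition
}

# Attributes to be retrieved from each relation.
RETRIEVED = {"P": ["PNUMBER", "DNUM"], "E": ["LNAME", "ADDRESS", "BDATE"]}

def neighbors(node):
    """Nodes connected to `node` by a selection or join condition."""
    return sorted(other for edge in EDGES if node in edge
                  for other in edge if other != node)

print(neighbors("D"))  # ['E', 'P']
```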
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER = PNO AND ESSN = SSN
AND BDATE > '1957-12-31';
[Figure: Example, optimization steps]
General Transformation Rules for Relational Algebra Operations:
1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
$\sigma_{c_1 \text{ AND } c_2 \text{ AND } \ldots \text{ AND } c_n}(R) = \sigma_{c_1}(\sigma_{c_2}(\ldots(\sigma_{c_n}(R))\ldots))$
2. Commutativity of σ: The σ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) = \sigma_{c_2}(\sigma_{c_1}(R))$
3. Cascade of π: In a cascade (sequence) of π operations, all but the last one can be ignored:
$\pi_{List_1}(\pi_{List_2}(\ldots(\pi_{List_n}(R))\ldots)) = \pi_{List_1}(R)$
4. Commuting σ with π: If the selection condition c involves only the attributes A1, ..., An in the projection list, the two operations can be commuted:
$\pi_{A_1, A_2, \ldots, A_n}(\sigma_c(R)) = \sigma_c(\pi_{A_1, A_2, \ldots, A_n}(R))$
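7. Commuting π with ⋈ (or ×): Suppose the projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join condition C involves only attributes in L, the two operations can be commuted:
$\pi_L(R \bowtie_C S) = (\pi_{A_1, \ldots, A_n}(R)) \bowtie_C (\pi_{B_1, \ldots, B_m}(S))$
If the join condition C contains additional attributes not in L, these must be added to the projection list, and a final π operation is needed.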
8. Commutativity of set operations: The set operations ∪ and ∩ are commutative, but − is not.
9. Associativity of ⋈, ×, ∪, and ∩: These four operations are individually associative; that is, if θ stands for any one of these four operations (throughout the expression), we have
$(R\ \theta\ S)\ \theta\ T = R\ \theta\ (S\ \theta\ T)$
10. Commuting σ with set operations: The σ operation commutes with ∪, ∩, and −. If θ stands for any one of these three operations, we have
$\sigma_c(R\ \theta\ S) = (\sigma_c(R))\ \theta\ (\sigma_c(S))$
11. Commuting π with ∪: The π operation commutes with ∪:
$\pi_L(R \cup S) = (\pi_L(R)) \cup (\pi_L(S))$
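12. Converting a (σ, ×) sequence into ⋈: If the condition c of a σ that follows a × corresponds to a join condition, the sequence can be converted into a ⋈:
$\sigma_c(R \times S) = R \bowtie_c S$
Other transformations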
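Several of the transformation rules above can be checked on small data with a toy Python model of relational algebra under set semantics. Each check shows that both sides agree on one example; it is not a proof. The helper functions and the EMP/DEPT data are hypothetical.

```python
# Toy relational algebra under set semantics: a relation is
# (tuple of attribute names, frozenset of rows). Helpers and data are hypothetical.
def rel(attrs, rows):
    return (tuple(attrs), frozenset(rows))

def select(cond, r):
    attrs, rows = r
    return (attrs, frozenset(t for t in rows if cond(dict(zip(attrs, t)))))

def project(keep, r):
    attrs, rows = r
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), frozenset(tuple(t[i] for i in idx) for t in rows))

def join(r, s, cond):
    attrs = r[0] + s[0]  # attribute names assumed distinct across r and s
    return (attrs, frozenset(x + y for x in r[1] for y in s[1]
                             if cond(dict(zip(attrs, x + y)))))

def union(r, s):
    return (r[0], r[1] | s[1])

def intersect(r, s):
    return (r[0], r[1] & s[1])

def minus(r, s):
    return (r[0], r[1] - s[1])

EMP = rel(("SSN", "LNAME", "DNO", "SALARY"),
          [(1, "Ng", 5, 40000), (2, "Li", 4, 25000), (3, "Ko", 5, 28000), (4, "Oh", 4, 52000)])
DEPT = rel(("DNUMBER", "DNAME"), [(5, "Research"), (4, "Admin")])
c1 = lambda t: t["DNO"] == 5
c2 = lambda t: t["SALARY"] > 30000

# Rules 1 and 2: cascade and commutativity of sigma
assert select(lambda t: c1(t) and c2(t), EMP) == select(c1, select(c2, EMP))
assert select(c1, select(c2, EMP)) == select(c2, select(c1, EMP))
# Rule 3: cascade of pi (each list is contained in the next one)
assert project(("LNAME",), project(("LNAME", "DNO"), EMP)) == project(("LNAME",), EMP)
# Rule 4: c1 uses only DNO, which is in the projection list
assert project(("LNAME", "DNO"), select(c1, EMP)) == select(c1, project(("LNAME", "DNO"), EMP))
# Rule 7: C uses DNUMBER, which is not in L, so DNUMBER is kept on the DEPT side
# and a final pi on L is applied
C = lambda t: t["DNO"] == t["DNUMBER"]
L = ("LNAME", "DNO", "DNAME")
assert project(L, join(EMP, DEPT, C)) == project(
    L, join(project(("LNAME", "DNO"), EMP), project(("DNUMBER", "DNAME"), DEPT), C))
# Rules 8 to 11 on two union-compatible subsets of EMP
D5, HIGH = select(c1, EMP), select(c2, EMP)
assert union(D5, HIGH) == union(HIGH, D5)                        # rule 8: union commutes
assert minus(D5, HIGH) != minus(HIGH, D5)                        # rule 8: difference does not
assert union(union(D5, HIGH), EMP) == union(D5, union(HIGH, EMP))  # rule 9
c3 = lambda t: t["LNAME"] != "Ko"
for op in (union, intersect, minus):                             # rule 10
    assert select(c3, op(D5, HIGH)) == op(select(c3, D5), select(c3, HIGH))
assert project(("DNO",), union(D5, HIGH)) == union(              # rule 11
    project(("DNO",), D5), project(("DNO",), HIGH))
```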
Outline of a Heuristic Algebraic Optimization Algorithm:
1. Using rule 1, break up any select operations with conjunctive conditions
into a cascade of select operations.
2. Using rules 2, 4, 6, and 10 concerning the commutativity of select with other
operations, move each select operation as far down the query tree as is
permitted by the attributes involved in the select condition.
3. Using rule 9 concerning associativity of binary operations, rearrange the leaf nodes
of the tree so that the leaf node relations with the most restrictive select operations
are executed first in the query tree representation.
4. Using Rule 12, combine a Cartesian product operation with a subsequent
select operation in the tree into a join operation.
5. Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting
of project with other operations, break down and move lists of projection attributes
down the tree as far as possible by creating new project operations as needed.
6. Identify subtrees that represent groups of operations that can be executed by
a single algorithm.
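A much-simplified Python sketch of steps 1, 2 and 4 of the outline, applied to query Q, follows. It starts from a tree of Cartesian products, treats the WHERE clause as a cascade of selections, and moves each selection down as far as its attributes allow. A selection that stops directly above a product becomes a join. Leaf reordering (step 3), projection pushing (step 5) and step 6 are not shown. The node classes and function names are hypothetical, and each condition's referenced attributes are supplied by hand rather than parsed.

```python
from dataclasses import dataclass

# Hypothetical query-tree node types.
@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Select:
    cond: str
    cattrs: frozenset  # attributes referenced by cond
    child: object

@dataclass(frozen=True)
class Product:
    left: object
    right: object

@dataclass(frozen=True)
class Join:
    cond: str
    left: object
    right: object

def attrs_of(node):
    """Attributes available at the output of a subtree."""
    if isinstance(node, Rel):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)

def push_select(cond, cattrs, node):
    """Step 2: move sigma_cond as far down as its attributes allow.
    Step 4 (rule 12): a sigma that stops directly above a product becomes a join."""
    if isinstance(node, (Product, Join)):
        if cattrs <= attrs_of(node.left):
            left, right = push_select(cond, cattrs, node.left), node.right
        elif cattrs <= attrs_of(node.right):
            left, right = node.left, push_select(cond, cattrs, node.right)
        elif isinstance(node, Product):
            return Join(cond, node.left, node.right)
        else:
            return Select(cond, cattrs, node)
        return Product(left, right) if isinstance(node, Product) else Join(node.cond, left, right)
    return Select(cond, cattrs, node)

def optimize(conjuncts, tree):
    """Step 1: treat the conjunctive condition as a cascade of selections, then push each one."""
    for cond, cattrs in conjuncts:
        tree = push_select(cond, cattrs, tree)
    return tree

EMPLOYEE = Rel("EMPLOYEE", frozenset({"LNAME", "SSN", "BDATE"}))
WORKS_ON = Rel("WORKS_ON", frozenset({"ESSN", "PNO"}))
PROJECT = Rel("PROJECT", frozenset({"PNAME", "PNUMBER"}))

initial = Product(Product(EMPLOYEE, WORKS_ON), PROJECT)
conjuncts = [
    ("PNAME = 'AQUARIUS'", frozenset({"PNAME"})),
    ("PNUMBER = PNO", frozenset({"PNUMBER", "PNO"})),
    ("ESSN = SSN", frozenset({"ESSN", "SSN"})),
    ("BDATE > '1957-12-31'", frozenset({"BDATE"})),
]
optimized = optimize(conjuncts, initial)
```

With these inputs the selections on PNAME and BDATE end up directly above PROJECT and EMPLOYEE, and the two equality conditions become joins.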
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that reduce
the size of intermediate results.
2. Perform select operations as early as possible to reduce the number
of tuples and perform project operations as early as possible to
reduce the number of attributes. (This is done by moving select and
project operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be
executed before other similar operations. (This is done by
reordering the leaf nodes of the tree among themselves and
adjusting the rest of the tree appropriately.)
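The Python sketch below illustrates the main heuristic on query Q by comparing intermediate result sizes. It uses hypothetical toy data: 100 employees, 20 projects, each employee working on 3 projects, and project 10 named 'AQUARIUS'. The specific numbers depend entirely on these assumptions. Applying the selections first keeps every intermediate result small, while the unoptimized product would produce 600,000 tuples before any filtering.

```python
# Hypothetical toy data for query Q.
EMPLOYEE = [{"SSN": s, "LNAME": f"E{s}", "BDATE": f"{1950 + s % 20}-06-15"} for s in range(100)]
PROJECT = [{"PNUMBER": p, "PNAME": "AQUARIUS" if p == 10 else f"P{p}"} for p in range(20)]
WORKS_ON = [{"ESSN": s, "PNO": (s + k) % 20} for s in range(100) for k in range(3)]

def naive_product_size():
    """Tuples produced by EMPLOYEE x WORKS_ON x PROJECT before any selection."""
    return len(EMPLOYEE) * len(WORKS_ON) * len(PROJECT)

def heuristic_plan():
    """Selections first, then joins; returns the size of each intermediate result."""
    p_sel = [p for p in PROJECT if p["PNAME"] == "AQUARIUS"]
    pw = [w for p in p_sel for w in WORKS_ON if p["PNUMBER"] == w["PNO"]]
    # ISO date strings compare correctly as plain strings
    e_sel = {e["SSN"]: e for e in EMPLOYEE if e["BDATE"] > "1957-12-31"}
    result = [e_sel[w["ESSN"]]["LNAME"] for w in pw if w["ESSN"] in e_sel]
    return {"sel_project": len(p_sel), "project_join_works_on": len(pw),
            "sel_employee": len(e_sel), "result": len(result)}

print(naive_product_size())  # 600000
print(heuristic_plan())
```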
Using Selectivity and Cost Estimates in Query Optimization
Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
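Semantic query optimization uses constraints specified on the database schema to modify a query into one that is more efficient to execute. For example, suppose a query retrieves the employees who earn more than their direct supervisor, and a constraint on the database schema states that no employee can earn more than his or her direct supervisor. If the semantic query optimizer checks for the existence of this constraint, it need not execute the query at all because it knows that the result of the query will be empty.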