QUERY PROCESSING & OPTIMIZATION
CHAPTER 19 (6/E)
CHAPTER 15 (5/E)
LECTURE OUTLINE
Query Processing Methodology
Basic Operations and Their Costs
Generation of Execution Plans
QUERY PROCESSING IN A DDBMS
[Figure: query processor]
SELECTING ALTERNATIVES
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.RESP = "Manager"
Strategy 1
π_{ENAME}(σ_{RESP="Manager" ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Strategy 2
π_{ENAME}(EMP ⋈_{ENO} (σ_{RESP="Manager"}(ASG)))
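The difference between the two strategies can be sketched in Python. This is only an illustration: the sample tuples, the tuple layouts EMP(ENO, ENAME) and ASG(ENO, PNO, RESP), and the "pairs compared" work measure are assumptions, not part of the lecture.

```python
from itertools import product

# Hypothetical toy relations: EMP(ENO, ENAME) and ASG(ENO, PNO, RESP).
EMP = [(1, "J. Doe"), (2, "M. Smith"), (3, "A. Lee"), (4, "B. Casey")]
ASG = [(1, "P1", "Manager"), (2, "P1", "Analyst"),
       (3, "P2", "Manager"), (3, "P3", "Engineer")]

def strategy_1(emp, asg):
    """Cartesian product first, then select, then project on ENAME."""
    pairs = list(product(emp, asg))                      # EMP x ASG
    selected = [(e, a) for e, a in pairs
                if a[2] == "Manager" and e[0] == a[0]]   # selection
    return {e[1] for e, _ in selected}, len(pairs)       # projection, pairs built

def strategy_2(emp, asg):
    """Select managers first, then join with EMP on ENO, then project."""
    managers = [a for a in asg if a[2] == "Manager"]     # selection on ASG
    joined = [(e, a) for e in emp for a in managers if e[0] == a[0]]
    return {e[1] for e, _ in joined}, len(emp) * len(managers)

names_1, work_1 = strategy_1(EMP, ASG)
names_2, work_2 = strategy_2(EMP, ASG)
print(names_1 == names_2, work_1, work_2)
```

Both functions return the same names, but Strategy 2 compares fewer pairs because the selection shrinks ASG before it meets EMP.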
PICTORIALLY
Strategy 1 (root first):
π_{ENAME}
  σ_{ASG.RESP='Manager' ∧ EMP.ENO=ASG.ENO}
    ×
      EMP
      ASG
Strategy 2 (root first):
π_{ENAME}
  ⋈_{ENO}
    EMP
    σ_{ASG.RESP='Manager'}
      ASG
QUERY PROCESSING METHODOLOGY
Query in high-level language (SQL)
↓
Query Decomposition & Translation
• check SQL syntax
• check existence of relations and attributes
• replace views by their definitions
• transform query into an internal form
↓
Algebraic Query
↓
Query Optimization
• generate alternative access plans, i.e., procedures for processing the query
• select an efficient access plan
↓
Query Execution Plan
↓
Runtime Processor
↓
Query Result
EXAMPLE
SELECT V.Vno, Vname, count(*), sum(Amount)
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno
AND V.Vno between 1000 and 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100
• Scan the Vendor table, select all tuples where Vno is between 1000 and 2000, eliminate attributes other than Vno and Vname, and place the result in a temporary relation R1
• Join the tables R1 and Transaction, eliminate attributes other than Vno, Vname, and Amount, and place the result in a temporary relation R2. This may involve:
  • sorting R1 on Vno
  • sorting Transaction on Vno
  • merging the two sorted relations to produce R2
• Perform grouping on R2, and place the result in a temporary relation R3. This may involve:
  • sorting R2 on Vno and Vname
  • grouping tuples with identical values of Vno and Vname
  • counting the number of tuples in each group, and adding their Amounts
• Scan R3, select all tuples with sum(Amount) > 100 to produce the result.
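The four steps can be traced with a small Python sketch. The sample rows, the extra City attribute, and the assumption that Vno is unique in Vendor are illustrative only; a real engine would work on disk blocks rather than in-memory lists.

```python
from itertools import groupby

# Hypothetical sample data: Vendor(Vno, Vname, City), Transaction(Vno, Amount).
Vendor = [(1000, "Acme", "Waterloo"), (1500, "Bolt", "Toronto"), (2500, "Core", "Ottawa")]
Transaction = [(1000, 60.0), (1000, 70.0), (1500, 40.0), (2500, 500.0)]

# Step 1: scan Vendor, select on Vno, keep only Vno and Vname -> R1.
R1 = [(vno, vname) for vno, vname, _city in Vendor if 1000 <= vno <= 2000]

# Step 2: sort both inputs on Vno and merge them -> R2(Vno, Vname, Amount).
# Assumes Vno is unique in R1, so each transaction matches at most one vendor.
r1_sorted = sorted(R1)
t_sorted = sorted(Transaction)
R2, i = [], 0
for vno, amount in t_sorted:
    while i < len(r1_sorted) and r1_sorted[i][0] < vno:
        i += 1                                   # advance R1 to the current Vno
    if i < len(r1_sorted) and r1_sorted[i][0] == vno:
        R2.append((vno, r1_sorted[i][1], amount))

# Step 3: sort R2 on (Vno, Vname), group, count and sum -> R3.
R3 = []
for (vno, vname), rows in groupby(sorted(R2), key=lambda r: (r[0], r[1])):
    amounts = [r[2] for r in rows]
    R3.append((vno, vname, len(amounts), sum(amounts)))

# Step 4: scan R3 and apply the HAVING condition.
result = [r for r in R3 if r[3] > 100]
print(result)
```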
EXAMPLE
SELECT V.Vno, Vname, count(*), sum(Amount)
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno
AND V.Vno between 1000 and 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100
Execution plan (root first):
Scan (Sum(Amount) > 100)
  Grouping (Vno, Vname)
    Join (V.Vno = T.Vno)
      Scan (Vno between 1000 and 2000)
        Vendor
      Transaction
QUERY OPTIMIZATION ISSUES
Determining the “shape” of the execution plan
• Order of execution
Determining how each “node” in the plan should be executed
• Operator implementations
These are interdependent and an optimizer would do both in
generating the execution plan
“SHAPE” OF THE EXECUTION PLAN
Finding query trees that are “equivalent”
• Produce the same result – provably
These are based on the transformation (equivalence) rules
Commutativity of selection
• σ_{p1(A1)}(σ_{p2(A2)}(R)) ⇔ σ_{p2(A2)}(σ_{p1(A1)}(R))
Commutativity of binary operations
• R × S ⇔ S × R
• R ⋈ S ⇔ S ⋈ R
• R ∪ S ⇔ S ∪ R
• R ∩ S ⇔ S ∩ R
Associativity of binary operations
• (R × S) × T ⇔ R × (S × T)
• (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
• (R ∪ S) ∪ T ⇔ R ∪ (S ∪ T)
Cascading of unary operations
• π_{A'}(π_{A''}(R)) ⇔ π_{A'}(R) where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
• σ_{p1(A1)}(σ_{p2(A2)}(R)) ⇔ σ_{p1(A1) ∧ p2(A2)}(R)
OTHER TRANSFORMATION RULES
Commuting selection with projection
• π_{B}(σ_{p(A)}(R)) ⇔ σ_{p(A)}(π_{B}(R)) (where B ⊇ A)
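A quick way to get a feel for these rules is to check them on a small relation. The Python sketch below does this for commuting selection with projection and for commutativity of selection. The relation R, its attributes, and the helper names select and project are hypothetical, and a check on one instance illustrates a rule without proving it.

```python
# Hypothetical relation R(ENO, ENAME, TITLE), stored as a list of dicts.
R = [
    {"ENO": 1, "ENAME": "J. Doe", "TITLE": "Engineer"},
    {"ENO": 2, "ENAME": "M. Smith", "TITLE": "Analyst"},
    {"ENO": 3, "ENAME": "A. Lee", "TITLE": "Engineer"},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection with duplicate elimination, preserving first-seen order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: t[a] for a in attrs})
    return out

p = lambda t: t["TITLE"] == "Engineer"   # predicate over A = {TITLE}
q = lambda t: t["ENO"] > 1               # second predicate, over {ENO}
B = ["ENAME", "TITLE"]                   # B contains A, so the rule applies

# Commuting selection with projection.
lhs = project(B, select(p, R))
rhs = select(p, project(B, R))

# Commutativity of selection.
pq = select(p, select(q, R))
qp = select(q, select(p, R))
print(lhs == rhs, pq == qp)
```

If B did not contain TITLE, the right-hand side could not even evaluate p, which is why the side condition on B is needed.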
EXAMPLE TRANSFORMATION
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
SELECT ENAME
FROM PROJ P, ASG G, EMP E
WHERE G.ENO=E.ENO
AND G.PNO=P.PNO
AND E.ENAME <> 'J. Doe'
AND P.PNAME='CAD/CAM'
AND (G.DUR=12 OR G.DUR=24)
Query tree (root first):
π_{E.ENAME}   (Project)
  σ_{G.DUR=12 ∨ G.DUR=24}
    σ_{P.PNAME='CAD/CAM'}   (Select)
      σ_{E.ENAME<>'J. Doe'}
        ⋈_{PNO}
          PROJ
          ⋈_{ENO}   (Join)
            ASG
            EMP
EQUIVALENT QUERY
[Figure: equivalent query tree; labels, root first: π_{E.ENAME}, ⋈_{PNO,ENO}]
ANOTHER EQUIVALENT QUERY
[Figure: equivalent query tree; labels, root first: π_{ENAME}; ⋈_{PNO}; π_{PNO,ENAME}; ⋈_{ENO}; π_{PNO}, π_{PNO,ENO}, π_{PNO,ENAME}]
CLICKER QUESTION #36
Is the right query plan equivalent to the left query plan?
[Figure: two query plans, left and right; labels: π_{E.ENAME}, σ_{P.PNAME='CAD/CAM'}, ⋈_{ENO}, σ_{E.ENAME<>'J. DOE'}, ⋈_{PNO}, ⋈_{PNO}]
(a) Yes
(b) No
IMPORTANT PROBLEM – JOIN ORDER
Assume you have
R ⋈ S ⋈ T ⋈ W
[Figure: alternative join trees for this expression]
Most systems implement linear join trees
• Left-linear
JOIN ORDERING
Even with left-linear, how do you know which order?
• Assume natural join over common attributes
Left-linear alternatives:
• ((R ⋈ S) ⋈ T) ⋈ W
• ((R ⋈ S) ⋈ W) ⋈ T
• …
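To see why the order matters, the sketch below enumerates every left-linear order of the four relations and ranks them with a toy cost model. The cardinalities, the single uniform selectivity of 1/100, and the cost measure (sum of estimated intermediate result sizes) are all made-up assumptions; real optimizers use catalog statistics and typically prune the search rather than enumerate all n! orders.

```python
from itertools import permutations

# Hypothetical cardinalities; every join is assumed to keep 1/100 of the
# pairs it considers (a uniform, made-up selectivity).
CARD = {"R": 1000, "S": 100, "T": 10, "W": 10000}
SELECTIVITY_DENOM = 100

def left_linear_cost(order):
    """Sum of estimated intermediate sizes for ((A join B) join C) join D."""
    size = CARD[order[0]]
    cost = 0
    for rel in order[1:]:
        size = size * CARD[rel] // SELECTIVITY_DENOM   # estimated join output
        cost += size
    return cost

orders = list(permutations(CARD))          # all 4! left-linear orders
best = min(orders, key=left_linear_cost)
worst = max(orders, key=left_linear_cost)
print(len(orders), best, left_linear_cost(best), left_linear_cost(worst))
```

Under these assumptions every order produces the same final result size; only the intermediate results differ, and joining the small relations first keeps them small.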
SOME OPERATOR IMPLEMENTATIONS
Tuple Selection
• without an index
• with a clustered index
• with an unclustered index
• with multiple indices
Projection
Joining
• nested loop join
• sort-merge join
• and others...
Grouping and Duplicate Elimination
• by sorting
• by hashing
Sorting
EXAMPLE – JOIN ALGORITHMS
SELECT C.Cnum, A.Balance
FROM Customer C, Accounts A
WHERE C.Cnum = A.Cnum
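A minimal Python sketch of two of the join algorithms listed earlier, nested loop join and sort-merge join, applied to this query. The sample rows, the tuple layouts Customer(Cnum, Name) and Accounts(Anum, Cnum, Balance), and the assumption that Cnum is unique in Customer are illustrative, not taken from the lecture.

```python
# Hypothetical sample data: Customer(Cnum, Name), Accounts(Anum, Cnum, Balance).
Customer = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
Accounts = [(10, 1, 500.0), (11, 1, 20.0), (12, 3, 75.0)]

def nested_loop_join(customers, accounts):
    """Compare every customer with every account: O(n * m) comparisons."""
    out = []
    for c in customers:
        for a in accounts:
            if c[0] == a[1]:
                out.append((c[0], a[2]))       # (C.Cnum, A.Balance)
    return out

def sort_merge_join(customers, accounts):
    """Sort both inputs on Cnum, then merge them in one pass.
    Assumes Cnum is unique in customers (it is a key of Customer)."""
    cs = sorted(customers, key=lambda c: c[0])
    acs = sorted(accounts, key=lambda a: a[1])
    out, i, j = [], 0, 0
    while i < len(cs) and j < len(acs):
        if cs[i][0] < acs[j][1]:
            i += 1                             # customer has no more accounts
        elif cs[i][0] > acs[j][1]:
            j += 1                             # account has no matching customer
        else:
            out.append((cs[i][0], acs[j][2]))  # match; stay on this customer
            j += 1
    return out

nl_result = nested_loop_join(Customer, Accounts)
sm_result = sort_merge_join(Customer, Accounts)
print(nl_result == sm_result, nl_result)
```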
EXAMPLE – JOIN ALGORITHMS (2)
SELECT C.Cnum, A.Balance
FROM Customer C, Accounts A
WHERE C.Cnum = A.Cnum
Index join:
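A minimal Python sketch of an index join for this query, assuming an index exists on Accounts.Cnum. A dictionary stands in for the index here; the sample rows and tuple layouts are hypothetical, and a real DBMS would probe a disk-based structure such as a B+-tree or hash index.

```python
from collections import defaultdict

# Hypothetical sample data: Customer(Cnum, Name), Accounts(Anum, Cnum, Balance).
Customer = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
Accounts = [(10, 1, 500.0), (11, 1, 20.0), (12, 3, 75.0)]

def build_index(accounts):
    """Stand-in for an index on Accounts.Cnum: Cnum -> list of account tuples."""
    index = defaultdict(list)
    for a in accounts:
        index[a[1]].append(a)
    return index

def index_join(customers, accounts_index):
    """Scan Customer once and probe the index for each Cnum."""
    out = []
    for c in customers:
        for a in accounts_index.get(c[0], []):   # index lookup, no full scan
            out.append((c[0], a[2]))             # (C.Cnum, A.Balance)
    return out

accounts_by_cnum = build_index(Accounts)
ij_result = index_join(Customer, accounts_by_cnum)
print(ij_result)
```

Each customer costs one index probe instead of a full scan of Accounts, which is what makes this attractive when the index already exists.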
COMPLEXITY OF OPERATORS
Assume
• Relations of cardinality n
• Sequential scan

Operation                                            Complexity
Select; Project (without duplicate elimination)      O(n)
Project (with duplicate elimination); Group          O(n log n)
Join; Semi-join; Division; Set Operators             O(n log n)
Cartesian Product                                    O(n²)
COST OF PLANS
Alternative access plans may be compared according to cost.
The cost of an access plan is the sum of the costs of its component
operations.
There are many possible cost metrics. However, most metrics
reflect the amounts of system resources consumed by the access
plan. System resources may include:
• disk block I/O’s
• processing time
• network bandwidth
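The "sum of component costs" idea can be sketched as follows. The OpCost fields, the weights that fold the three resources into one number, and the per-operator figures are all hypothetical; real systems derive such estimates from statistics and calibrated constants.

```python
from dataclasses import dataclass

@dataclass
class OpCost:
    """Estimated resource use of one operator in a plan (hypothetical units)."""
    name: str
    block_ios: int
    cpu_ms: float
    net_kb: float

# Made-up weights that convert each resource into a common cost unit.
W_IO, W_CPU, W_NET = 1.0, 0.01, 0.1

def plan_cost(ops):
    """Cost of an access plan = sum of the costs of its component operations."""
    return sum(W_IO * o.block_ios + W_CPU * o.cpu_ms + W_NET * o.net_kb for o in ops)

plan_a = [OpCost("scan", 100, 50.0, 0.0), OpCost("sort-merge join", 300, 200.0, 0.0)]
plan_b = [OpCost("scan", 100, 50.0, 0.0), OpCost("nested loop join", 5000, 900.0, 0.0)]

cheaper = min((plan_a, plan_b), key=plan_cost)
print(plan_cost(plan_a), plan_cost(plan_b))
```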
LECTURE SUMMARY
Query processing methodology
Basic query operations and their costs
Generation of execution plans