
17 Query Processing PDF

The document discusses query processing and optimization. It outlines the query processing methodology which includes query decomposition, translation, optimization, execution plan generation, and runtime processing. It describes basic operations like selection, projection, join and their costs. It also discusses generating alternative access plans, selecting the most efficient plan, and transforming queries into internal forms.

Uploaded by

aleckchimsimbe

QUERY PROCESSING & OPTIMIZATION

CHAPTER 19 (6/E)
CHAPTER 15 (5/E)
LECTURE OUTLINE
 Query Processing Methodology
 Basic Operations and Their Costs
 Generation of Execution Plans

QUERY PROCESSING IN A DDBMS

high-level user query
        ↓
query processor
        ↓
low-level data manipulation commands for D-DBMS

SELECTING ALTERNATIVES

SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.RESP = "Manager"

Strategy 1
π_ENAME(σ_{RESP="Manager" ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Strategy 2
π_ENAME(EMP ⋈_ENO (σ_{RESP="Manager"}(ASG)))

Strategy 2 avoids Cartesian product, so may be “better”
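
The difference between the two strategies can be made concrete with a small sketch. The rows below are invented for illustration, and each function also reports the size of the intermediate input it has to build (the full product for Strategy 1, the selected ASG tuples for Strategy 2); this is only a toy comparison, not a cost model.

```python
# Toy data; the rows are invented for illustration only.
EMP = [{"ENO": 1, "ENAME": "Ann"}, {"ENO": 2, "ENAME": "Bob"}, {"ENO": 3, "ENAME": "Cy"}]
ASG = [{"ENO": 1, "RESP": "Manager"}, {"ENO": 2, "RESP": "Analyst"}, {"ENO": 3, "RESP": "Manager"}]

def strategy1(emp, asg):
    # Cartesian product first, then one combined selection, then projection.
    product = [(e, a) for e in emp for a in asg]
    kept = [(e, a) for e, a in product
            if a["RESP"] == "Manager" and e["ENO"] == a["ENO"]]
    return sorted(e["ENAME"] for e, _ in kept), len(product)

def strategy2(emp, asg):
    # Select managers first, then join the smaller result with EMP.
    managers = [a for a in asg if a["RESP"] == "Manager"]
    joined = [(e, a) for e in emp for a in managers if e["ENO"] == a["ENO"]]
    return sorted(e["ENAME"] for e, _ in joined), len(managers)

names1, size1 = strategy1(EMP, ASG)
names2, size2 = strategy2(EMP, ASG)
print(names1 == names2, size1, size2)
```

Both strategies return the same names; only the amount of intermediate data differs.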

PICTORIALLY

Strategy 1:

π_ENAME
  σ_{ASG.RESP='Manager' ∧ EMP.ENO=ASG.ENO}
    ×
      EMP
      ASG

Strategy 2:

π_ENAME
  ⋈_ENO
    EMP
    σ_{ASG.RESP='Manager'}
      ASG

QUERY PROCESSING METHODOLOGY
Query in high-level language (SQL)
    ↓
Query Decomposition & Translation
• check SQL syntax
• check existence of relations and attributes
• replace views by their definitions
• transform query into an internal form
    ↓
Algebraic Query
    ↓
Query Optimization
• generate alternative access plans, i.e., procedures, for processing the query
• select an efficient access plan
    ↓
Query Execution Plan
    ↓
Query Code Generator
    ↓
Code to execute query
    ↓
Runtime Processor
    ↓
Query Result

EXAMPLE
SELECT V.Vno, Vname, count(*), sum(Amount)
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno
AND V.Vno between 1000 and 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100

• Scan the Vendor table, select all tuples where Vno is in [1000, 2000], eliminate attributes other than Vno and Vname, and place the result in a temporary relation R1
• Join the tables R1 and Transaction, eliminate attributes other than Vno, Vname, and Amount, and place the result in a temporary relation R2. This may involve:
  • sorting R1 on Vno
  • sorting Transaction on Vno
  • merging the two sorted relations to produce R2
• Perform grouping on R2, and place the result in a temporary relation R3. This may involve:
  • sorting R2 on Vno and Vname
  • grouping tuples with identical values of Vno and Vname
  • counting the number of tuples in each group, and adding their Amounts
• Scan R3, select all tuples with sum(Amount) > 100 to produce the result.
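
The four steps can be sketched in Python as follows. The sample rows, the function name `run_plan`, and its parameters are assumptions made for illustration; the sketch also assumes Vno is unique in Vendor, and it follows the "may involve" variant in which the join is done by sorting and merging.

```python
from itertools import groupby

# Invented sample rows: Vendor as (Vno, Vname), Transaction as (Vno, Amount).
VENDOR = [(900, "Old"), (1000, "Acme"), (1500, "Bolt"), (2500, "Far")]
TRANSACTION = [(1000, 60), (1000, 70), (1500, 40), (1500, 30), (2500, 500)]

def run_plan(vendor, transaction, lo=1000, hi=2000, threshold=100):
    # R1: scan Vendor, keep Vno in [lo, hi], project Vno and Vname, sort on Vno.
    r1 = sorted((vno, vname) for vno, vname in vendor if lo <= vno <= hi)
    t = sorted(transaction)
    # R2: merge the two inputs, both sorted on Vno.
    r2, j = [], 0
    for vno, vname in r1:
        while j < len(t) and t[j][0] < vno:
            j += 1
        k = j
        while k < len(t) and t[k][0] == vno:
            r2.append((vno, vname, t[k][1]))
            k += 1
    # R3: sort and group on (Vno, Vname); count tuples and add up Amount.
    r2.sort(key=lambda row: (row[0], row[1]))
    r3 = []
    for key, rows in groupby(r2, key=lambda row: (row[0], row[1])):
        amounts = [row[2] for row in rows]
        r3.append((key[0], key[1], len(amounts), sum(amounts)))
    # Final scan: apply the HAVING condition.
    return [row for row in r3 if row[3] > threshold]

result = run_plan(VENDOR, TRANSACTION)
print(result)
```

Each temporary relation (R1, R2, R3) is materialized as a list here, which mirrors the step-by-step description rather than how a real engine would pipeline the operators.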
EXAMPLE

SELECT V.Vno, Vname, count(*), sum(Amount)
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno
AND V.Vno between 1000 and 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100

Execution plan (root at top):

Scan (Sum(Amount) > 100)
  Grouping (Vno, Vname)
    Join (V.Vno = T.Vno)
      Scan (Vno between 1000 and 2000)
        Vendor
      Transaction

QUERY OPTIMIZATION ISSUES
▪ Determining the “shape” of the execution plan
• Order of execution
▪ Determining how each “node” in the plan should be executed
• Operator implementations
▪ These are interdependent, and an optimizer would do both in generating the execution plan

“SHAPE” OF THE EXECUTION PLAN
▪ Finding query trees that are “equivalent”
• Produce the same result – provably
▪ These are based on the transformation (equivalence) rules
▪ Commutativity of selection
• σ_p1(A1)(σ_p2(A2)(R)) ⇔ σ_p2(A2)(σ_p1(A1)(R))
▪ Commutativity of binary operations
• R × S ⇔ S × R
• R ⋈ S ⇔ S ⋈ R
• R ∪ S ⇔ S ∪ R
• R ∩ S ⇔ S ∩ R
▪ Associativity of binary operations
• (R × S) × T ⇔ R × (S × T)
• (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
• (R ∪ S) ∪ T ⇔ R ∪ (S ∪ T)
▪ Cascading of unary operations
• π_A'(π_A''(R)) ⇔ π_A'(R) where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
• σ_p1(A1)(σ_p2(A2)(R)) ⇔ σ_{p1(A1) ∧ p2(A2)}(R)
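
A few of these rules can be spot-checked on small relations. The sketch below models relations as Python sets of tuples; the data, the helper names `select` and `project`, and the predicates are all made up for illustration. Passing on one example is a sanity check, not a proof of equivalence.

```python
# Relations as sets of (A1, A2) tuples; the data is invented.
R = {(1, "x"), (2, "y"), (3, "x"), (4, "z")}
S = {(3, "x"), (5, "w")}

def select(pred, rel):
    # Selection: keep the tuples that satisfy the predicate.
    return {t for t in rel if pred(t)}

def project(cols, rel):
    # Projection with duplicate elimination (the result is a set).
    return {tuple(t[c] for c in cols) for t in rel}

p1 = lambda t: t[0] > 1
p2 = lambda t: t[1] == "x"

# Commutativity of selection.
commute = select(p1, select(p2, R)) == select(p2, select(p1, R))
# Cascading of selections into one conjunctive selection.
cascade = select(p1, select(p2, R)) == select(lambda t: p1(t) and p2(t), R)
# Commutativity of union.
union_commutes = (R | S) == (S | R)
# Cascading of projections, with A' = (A1) contained in A'' = (A1, A2).
proj_cascade = project((0,), project((0, 1), R)) == project((0,), R)

print(commute, cascade, union_commutes, proj_cascade)
```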

OTHER TRANSFORMATION RULES
▪ Commuting selection with projection
• π_B(σ_p(A)(R)) ⇔ σ_p(A)(π_B(R)) (where A ⊆ B, i.e., the attributes used by p are kept by the projection)

▪ Commuting selection with binary operations
• σ_p(A)(R × S) ⇔ (σ_p(A)(R)) × S (where A belongs to R only)
• σ_p(Ai)(R ⋈_(Aj,Bk) S) ⇔ (σ_p(Ai)(R)) ⋈_(Aj,Bk) S (where Ai belongs to R only)
• σ_p(Ai)(R ∪ S) ⇔ σ_p(Ai)(R) ∪ σ_p(Ai)(S) (where Ai belongs to R and S)
• σ_p(Ai)(R ∩ S) ⇔ σ_p(Ai)(R) ∩ σ_p(Ai)(S) (where Ai belongs to R and S)
▪ Commuting projection with binary operations
• π_C(R × S) ⇔ π_A'(R) × π_B'(S)
• π_C(R ⋈_(Aj,Bk) S) ⇔ π_A'(R) ⋈_(Aj,Bk) π_B'(S)
• π_C(R ∪ S) ⇔ π_C(R) ∪ π_C(S)
• π_C(R ∩ S) ⇔ π_C(R) ∩ π_C(S) (in general only π_C(R ∩ S) ⊆ π_C(R) ∩ π_C(S) is guaranteed; equality requires that C identify whole tuples)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B
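
As an illustration of why pushing a selection below a Cartesian product is attractive, the sketch below evaluates both sides of the rule on invented data, with a predicate that refers to an attribute of R only. The relation contents and the predicate are assumptions for the example.

```python
from itertools import product

R = [(1, "a"), (2, "b"), (3, "c")]      # R[A1, A2]
S = [(10,), (20,)]                       # S[B1]
p = lambda r: r[0] >= 2                  # predicate on attribute A1, which belongs to R only

# Selection applied after the product: build every pair, then filter.
late = [r + s for r, s in product(R, S) if p(r)]
# Selection pushed down: filter R first, then build a smaller product.
early = [r + s for r, s in product([r for r in R if p(r)], S)]

same_result = sorted(late) == sorted(early)
print(same_result, len(R) * len(S), len(early))
```

The results agree, but the pushed-down form never materializes the pairs that the selection would discard.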

EXAMPLE TRANSFORMATION
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.

SELECT ENAME
FROM PROJ P, ASG G, EMP E
WHERE G.ENO=E.ENO
AND G.PNO=P.PNO
AND E.ENAME <> 'J. Doe'
AND P.PNAME='CAD/CAM'
AND (G.DUR=12 OR G.DUR=24)

Query tree (root at top):

π_E.ENAME                                  (Project)
  σ_{G.DUR=12 ∨ G.DUR=24}                  (Select)
    σ_{P.PNAME='CAD/CAM'}
      σ_{E.ENAME<>'J. Doe'}
        ⋈_PNO                              (Join)
          PROJ
          ⋈_ENO
            ASG
            EMP
EQUIVALENT QUERY

π_E.ENAME

σ_{P.PNAME='CAD/CAM' ∧ (G.DUR=12 ∨ G.DUR=24) ∧ E.ENAME<>'J. Doe'}

⋈_{PNO,ENO}

EMP PROJ ASG

ANOTHER EQUIVALENT QUERY

π_ENAME
  ⋈_PNO
    π_PNO
      σ_{PNAME="CAD/CAM"}
        PROJ
    π_{PNO,ENAME}
      ⋈_ENO
        π_{PNO,ENO}
          σ_{DUR=12 ∨ DUR=24}
            ASG
        π_{ENO,ENAME}
          σ_{ENAME≠"J. Doe"}
            EMP

CLICKER QUESTION #36
▪ Is the right query plan equivalent to the left query plan?

Left plan:

π_E.ENAME
  σ_{G.DUR=12 ∨ G.DUR=24}
    σ_{P.PNAME='CAD/CAM'}
      σ_{E.ENAME<>'J. Doe'}
        ⋈_PNO
          PROJ
          ⋈_ENO
            ASG
            EMP

Right plan:

π_E.ENAME
  ⋈_ENO
    ⋈_PNO
      σ_{P.PNAME='CAD/CAM'}
        PROJ
      σ_{G.DUR=12 ∨ G.DUR=24}
        ASG
    σ_{E.ENAME<>'J. Doe'}
      EMP

(a) Yes
(b) No

IMPORTANT PROBLEM – JOIN ORDER
▪ Assume you have
R ⋈ S ⋈ T ⋈ W

Linear join tree: ((R ⋈ S) ⋈ T) ⋈ W
Bushy join tree: (R ⋈ S) ⋈ (T ⋈ W)

▪ Most systems implement linear join trees
• Left-linear

JOIN ORDERING
▪ Even with left-linear, how do you know which order?
• Assume natural join over common attributes

((R ⋈ S) ⋈ T) ⋈ W
((R ⋈ S) ⋈ W) ⋈ T
…
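
To get a feel for the size of the search space, the sketch below enumerates every left-linear order of four relations and ranks them with a deliberately crude cost estimate (the sum of the estimated intermediate result sizes). The cardinalities in `CARD`, the selectivities in `SEL`, and the cost formula are hypothetical values chosen for illustration, not figures from the lecture.

```python
from itertools import permutations

# Hypothetical cardinalities and join selectivities, purely illustrative.
CARD = {"R": 1000, "S": 100, "T": 10, "W": 500}
SEL = {frozenset("RS"): 0.01, frozenset("ST"): 0.1, frozenset("TW"): 0.02}

def selectivity(joined, new):
    # Product of the selectivities of all predicates linking `new` to the
    # relations already joined; 1.0 means no predicate (a Cartesian product).
    s = 1.0
    for rel in joined:
        s *= SEL.get(frozenset((rel, new)), 1.0)
    return s

def plan_cost(order):
    # Cost = total estimated size of the results produced by each join
    # of the left-linear tree, taken in the given order.
    size, cost = CARD[order[0]], 0.0
    for i in range(1, len(order)):
        size = size * CARD[order[i]] * selectivity(order[:i], order[i])
        cost += size
    return cost

orders = list(permutations(CARD))
best = min(orders, key=plan_cost)
print(len(orders), best, plan_cost(best))
```

With n relations there are n! left-linear orders, so exhaustive enumeration like this is only workable for small n.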

SOME OPERATOR IMPLEMENTATIONS
 Tuple Selection
• without an index
• with a clustered index
• with an unclustered index
• with multiple indices
 Projection
 Joining
• nested loop join
• sort-merge join
• and others...
 Grouping and Duplicate Elimination
• by sorting
• by hashing
 Sorting

EXAMPLE – JOIN ALGORITHMS
SELECT C.Cnum, A.Balance
FROM Customer C, Accounts A
WHERE C.Cnum = A.Cnum

▪ Nested loop join:

for each tuple c in Customer do
    for each tuple a in Accounts do
        if c.Cnum = a.Cnum then
            output c.Cnum, a.Balance
    end
end
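
A runnable version of the nested loop join might look like the sketch below. The sample rows and the comparison counter are additions for illustration; the counter simply makes the |Customer| × |Accounts| behaviour visible.

```python
# Invented sample rows for the Customer and Accounts relations.
CUSTOMER = [{"Cnum": 1, "Name": "Ann"}, {"Cnum": 2, "Name": "Bob"}]
ACCOUNTS = [{"Cnum": 1, "Balance": 50}, {"Cnum": 1, "Balance": 75}, {"Cnum": 3, "Balance": 10}]

def nested_loop_join(customer, accounts):
    comparisons = 0
    out = []
    for c in customer:            # outer loop over Customer
        for a in accounts:        # inner loop rescans Accounts for every c
            comparisons += 1
            if c["Cnum"] == a["Cnum"]:
                out.append((c["Cnum"], a["Balance"]))
    return out, comparisons

rows, comparisons = nested_loop_join(CUSTOMER, ACCOUNTS)
print(rows, comparisons)
```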

EXAMPLE – JOIN ALGORITHMS (2)
SELECT C.Cnum, A.Balance
FROM Customer C, Accounts A
WHERE C.Cnum = A.Cnum

▪ Index join:

for each tuple c in Customer do
    use the index to find Accounts tuples a
        where a.Cnum matches c.Cnum
    for each such tuple a do
        output c.Cnum, a.Balance
    end
end

▪ Sort-merge join:

sort Customer and Accounts on Cnum
merge the resulting sorted relations
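
Both algorithms can be sketched in Python as follows. The sample rows are invented, and a plain dictionary stands in for the index on Accounts.Cnum, since the slides do not say what kind of index is used; treat it as a stand-in, not as how a DBMS stores an index.

```python
from collections import defaultdict

CUSTOMER = [(2, "Bob"), (1, "Ann"), (4, "Dee")]     # (Cnum, Name), invented rows
ACCOUNTS = [(1, 50), (3, 10), (1, 75), (2, 20)]     # (Cnum, Balance), invented rows

def index_join(customer, accounts):
    # A dict stands in for an index on Accounts.Cnum (an assumption).
    index = defaultdict(list)
    for cnum, balance in accounts:
        index[cnum].append(balance)
    # One index lookup per Customer tuple instead of a full scan of Accounts.
    return [(cnum, bal) for cnum, _ in customer for bal in index.get(cnum, [])]

def sort_merge_join(customer, accounts):
    c = sorted(customer, key=lambda t: t[0])
    a = sorted(accounts, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(c) and j < len(a):
        if c[i][0] < a[j][0]:
            i += 1
        elif c[i][0] > a[j][0]:
            j += 1
        else:
            # Emit every Accounts tuple sharing this Cnum, then advance Customer.
            k = j
            while k < len(a) and a[k][0] == c[i][0]:
                out.append((c[i][0], a[k][1]))
                k += 1
            i += 1
    return out

by_index = index_join(CUSTOMER, ACCOUNTS)
by_merge = sort_merge_join(CUSTOMER, ACCOUNTS)
print(sorted(by_index) == sorted(by_merge))
```

The two functions return the same tuples, possibly in a different order, because the merge emits results in Cnum order while the index join follows the order of Customer.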

COMPLEXITY OF OPERATORS
▪ Assume
• Relations of cardinality n
• Sequential scan

Operation                                   Complexity
Select                                      O(n)
Project (without duplicate elimination)     O(n)
Project (with duplicate elimination)        O(n log n)
Group                                       O(n log n)
Join                                        O(n log n)
Semi-join                                   O(n log n)
Division                                    O(n log n)
Set Operators                               O(n log n)
Cartesian Product                           O(n²)

COST OF PLANS
 Alternative access plans may be compared according to cost.
 The cost of an access plan is the sum of the costs of its component
operations.
 There are many possible cost metrics. However, most metrics
reflect the amounts of system resources consumed by the access
plan. System resources may include:
• disk block I/O’s
• processing time
• network bandwidth
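
One way to read "the cost of a plan is the sum of the costs of its component operations" is sketched below. The resource estimates per operation and the relative weights in `WEIGHTS` are hypothetical numbers picked for the example; a real optimizer would derive them from statistics and system parameters.

```python
# Assumed relative unit costs for each resource (hypothetical values).
WEIGHTS = {"io": 1.0, "cpu": 0.01, "net": 0.5}

def op_cost(op):
    # Weighted sum of the resources one operation is estimated to consume.
    return sum(WEIGHTS[k] * op.get(k, 0) for k in WEIGHTS)

def plan_cost(plan):
    # Cost of a plan = sum of the costs of its component operations.
    return sum(op_cost(op) for op in plan)

# Two hypothetical access plans, each a list of per-operation estimates.
plan_a = [{"io": 1000, "cpu": 50000}, {"io": 200, "cpu": 10000, "net": 100}]
plan_b = [{"io": 300, "cpu": 20000}, {"io": 400, "cpu": 80000, "net": 100}]

cheapest = min((plan_a, plan_b), key=plan_cost)
print(plan_cost(plan_a), plan_cost(plan_b))
```

Changing the weights can change which plan wins, which is why the choice of cost metric matters.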

LECTURE SUMMARY
 Query processing methodology
 Basic query operations and their costs
 Generation of execution plans
