0% found this document useful (0 votes)
116 views23 pages

Advanced Database Management Systems: Query Processing: Chapter 1

This document discusses query processing in advanced database management systems. It covers topics like query parsing, optimization, execution plans, and cost estimation. The optimizer determines efficient execution plans by transforming query expressions and applying rules and heuristics. It considers available indexes, operator implementations, and cost estimates to select low-cost plans. Examples show transforming a sample query through different optimization steps like pushing down selections and projections early.

Uploaded by

Yaikob Kebede
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
116 views23 pages

Advanced Database Management Systems: Query Processing: Chapter 1

This document discusses query processing in advanced database management systems. It covers topics like query parsing, optimization, execution plans, and cost estimation. The optimizer determines efficient execution plans by transforming query expressions and applying rules and heuristics. It considers available indexes, operator implementations, and cost estimates to select low-cost plans. Examples show transforming a sample query through different optimization steps like pushing down selections and projections early.

Uploaded by

Yaikob Kebede
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 23

Advanced Database Management

Systems

Query Processing: Chapter 1


Database Instance and Available Indexes:

Examples from: Principles of Database Systems by Greg Riccardi, Addison Wesley, 2001
select * from Customer
where accountId = 101
select * from Customer
where accountId >= 101 and accountId < 300
select * from Customer where lastName < 'D'
Scanning,
SQL Expression Tree
Parsing,
Query (algebra)
Validating

Optimizer
Execution
Query Code Plan
Generator

Query DB Runtime
Result
Code Processor
Check syntax
 Select * from Eployees having salary >1000
Check Schema Elements
 Attributes, relations … used
Converts SQL to RA expression
Query optimization determines
the most efficient (or sufficiently efficient)
process for executing the query
Optimization reorganizes
an expression tree for a query
using algebraic transformation rules
Information used:
 implementation techniques for algebra operators
 algebraic transformation rules
 heuristic rules
 cost estimates
Commutative: σ ⋈ x ⋃ ⋂
Associative: ⋈ x ⋃ ⋂

Cascades of select:
 σ c1 AND c2 AND … AND cn (R) ≡ σc1(σc2( … σcn(R))…))
 selects can then be commuted

Cascades of project:
 πL1(πL2( … πLn(R))…)) ≡ πL1(R)
 projects cannot be commuted
Commuting select and project:

 πA1,A2,…,An(σc(R)) ≡ σc(πA1,A2,…,An(R))

valid when selection condition only involves


attributes in projection list

Converting select/cross-product into join:

 σc(R x S) ≡ R ⋈c S
Commuting select and join or cross-product:

 σc(R ⋈ S) ≡ σc(R) ⋈ S
valid when selection condition involves only attributes in R

Commuting project with join or cross-product:

 πL(R ⋈c S) ≡ (πA1,A2,…,An(R)) ⋈c (πB1,B2,…,Bn(S))


where A1,…,An ⊆ L are attributes of R,
and B1,…,Bn ⊆ L are attributes of S,
and A1,…,An, B1,…,Bn are involved in C
(slightly more complicated if C contains attributes not in L)
Break up conjunctive select conditions
into a cascade of selects
 more flexibility in moving selects

Push selects as early as possible,


using commutativity of select with other operators
Reorder sub-expressions such
that most restrictive selects are done first
Combine select/cross-product into join

Push projects as early as possible


 keep only necessary attributes
Enumerate the query plans
 Apply algebraic transformations & heuristics
to get all reasonable trees

Estimate the cost of each plan


 Account for size of relations, available indexes, information
about file layout, info about value distributions …

Choose the best (fastest) plan


select e.lname, e.fname, w.pno, w.hours
from employee e, works_on w
where e.ssn = w.essn and w.hours > 20;

πe.lname, e.fname, w.pno, w.hours

σe.ssn=w.essn AND w.hours > 20

Employee e works_on w
πe.lname, e.fname, w.pno, w.hours

Heuristic: σw.hours > 20


Conjunctive Note: selects
select  cascade could commute
of selects
σe.ssn=w.essn

Employee e works_on w
πe.lname, e.fname, w.pno, w.hours

Heuristic:
Combine select
σw.hours > 20
and cross  join

⋈e.ssn=w.essn

Employee e works_on w
πe.lname, e.fname, w.pno, w.hours

Heuristic:
Push selects as
early as possible ⋈e.ssn=w.essn

Employee e σw.hours > 20

works_on w
πe.lname, e.fname, w.pno, w.hours

πe.ssn, e.lname, e.fname, w.essn, w.pno, w.hours


First cascade
Heuristic: projects to get
attributes needed
⋈e.ssn=w.essn
Push projects as
early as possible to push through
join

Employee e σw.hours > 20

works_on w
πe.lname, e.fname, w.pno, w.hours

Heuristic:
⋈e.ssn=w.essn
Push projects as Push project
early as possible through join

πw.essn, w.pno, w.hours


πe.ssn, e.lname, e.fname

Employee e
σw.hours > 20

works_on w
πe.lname, e.fname, w.pno, w.hours

Heuristic:
⋈e.ssn=w.essn
Push projects as Push project
early as possible through select

πe.ssn, e.lname, e.fname σw.hours > 20

Employee e πw.essn, w.pno, w.hours

works_on w
At this point we’ve generated
nine different query plans for the same query

Adding strategies/algorithms for implementing


individual operations would add
even more potential plans

Next Step: Estimate the cost of executing each


plan
Disk Access
Cpu cycle
Transit Time in network- distributed system
 CPU cost is difficult to calculate and cpu cost is insignificant
compared with Disk Access cost
 Consider only disk access
No. of seeks (N) –Random I/O cost < Sequential
I/O cost
 Cost =N*avg seek time
No. of blocks read
 Cost =N*avg block read cost
No. of blocks written
 Cost = N*avg block write cost

Cost of writing >> cost of reading because once


written it has to be read to check if it is written
correctly

You might also like