Introduction to Database Systems
CSE 444
Lecture 18: Query Processing Overview
CSE 444 - Spring 2009
Where We Are
• We are learning how a DBMS executes a query
– How come a DBMS can execute a query so fast?
• Lectures 15-16: Data storage, indexing, physical tuning
• Lecture 17: Relational algebra (we will finish it today)
• Lecture 18: Overview of query processing steps
– Includes a description of how queries are executed
• Lecture 19: Operator algorithms
• Lecture 20: Overview of query optimization
Outline for Today
• Steps involved in processing a query
– Logical query plan
– Physical query plan
– Query execution overview
• Readings: Section 15.1 of the book
– Query processing steps
– Query execution using the iterator model
– An introduction to the next lecture on operator algorithms
Query Evaluation Steps
• SQL query
• Parse & Rewrite Query
• Query optimization
– Select Logical Plan, which outputs the logical plan
– Select Physical Plan, which outputs the physical plan
• Query Execution, which accesses the data on disk
Example Database Schema
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supplies(sno,pno,price)
View: Suppliers in Seattle
CREATE VIEW NearbySupp AS
SELECT sno, sname
FROM Supplier
WHERE scity='Seattle' AND sstate='WA'
Example Query
Find the names of all suppliers in Seattle
who supply part number 2
SELECT sname FROM NearbySupp
WHERE sno IN ( SELECT sno
FROM Supplies
WHERE pno = 2 )
Steps in Query Evaluation
• Step 0: Admission control
– User connects to the db with username, password
– User sends query in text format
• Step 1: Query parsing
– Parses query into an internal format
– Performs various checks using catalog
• Correctness, authorization, integrity constraints
• Step 2: Query rewrite
– View rewriting, flattening, etc.
Rewritten Version of Our Query
Original query:
SELECT sname
FROM NearbySupp
WHERE sno IN ( SELECT sno
FROM Supplies
WHERE pno = 2 )
Rewritten query:
SELECT S.sname
FROM Supplier S, Supplies U
WHERE S.scity='Seattle' AND S.sstate='WA'
AND S.sno = U.sno
AND U.pno = 2;
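As a rough check of the rewrite, the sketch below runs both versions of the query against a small in-memory SQLite database and compares the results. The sample rows are made up for illustration, and the table is assumed to be named Supplies, as in the queries.

```python
import sqlite3

# Assumption: the sample rows below are invented; only the schema and the
# two queries come from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier(sno INTEGER PRIMARY KEY, sname TEXT, scity TEXT, sstate TEXT);
CREATE TABLE Supplies(sno INTEGER, pno INTEGER, price REAL);
CREATE VIEW NearbySupp AS
  SELECT sno, sname FROM Supplier
  WHERE scity='Seattle' AND sstate='WA';
INSERT INTO Supplier VALUES
  (1, 'Acme', 'Seattle', 'WA'),
  (2, 'Bolt', 'Seattle', 'WA'),
  (3, 'Cog', 'Portland', 'OR');
INSERT INTO Supplies VALUES (1, 2, 10.0), (2, 5, 7.5), (3, 2, 9.0);
""")

original = """
SELECT sname FROM NearbySupp
WHERE sno IN (SELECT sno FROM Supplies WHERE pno = 2)
"""

rewritten = """
SELECT S.sname
FROM Supplier S, Supplies U
WHERE S.scity='Seattle' AND S.sstate='WA'
  AND S.sno = U.sno
  AND U.pno = 2
"""

# Sort both results so the comparison ignores row order.
original_rows = sorted(conn.execute(original).fetchall())
rewritten_rows = sorted(conn.execute(rewritten).fetchall())
print(original_rows, rewritten_rows)
```

The two forms agree here because each supplier has at most one Supplies row for part 2. If a supplier had several such rows, the join form would return that supplier's name once per row, while the IN form would return it once.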
Continue with Query Evaluation
• Step 3: Query optimization
– Find an efficient query plan for executing the query
– We will spend a whole lecture on this topic
• A query plan is either
– A logical query plan: an extended relational algebra tree
– A physical query plan: a logical plan with additional annotations at each node
• Access method to use for each relation
• Implementation to use for each relational operator
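One way to picture the difference between the two kinds of plan is as a tree of nodes where the physical plan fills in an extra annotation per node. The sketch below is only an illustration: PlanNode, show, and annotate are hypothetical names, not part of any DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node: an operator (or relation name) plus inputs."""
    op: str
    arg: str = ""
    children: list = field(default_factory=list)
    annotation: str = ""  # empty in a logical plan, set in a physical plan


def show(node, depth=0):
    """Return the plan as indented lines, root first."""
    note = f" ({node.annotation})" if node.annotation else ""
    label = f"{node.op} {node.arg}".strip()
    lines = ["  " * depth + label + note]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


def annotate(node, choices):
    """Copy a logical plan, attaching one physical choice per node."""
    return PlanNode(
        node.op,
        node.arg,
        [annotate(child, choices) for child in node.children],
        choices.get(node.op, ""),
    )


logical = PlanNode("π", "sname", [
    PlanNode("σ", "scity='Seattle' ∧ sstate='WA' ∧ pno=2", [
        PlanNode("⋈", "sno = sno", [PlanNode("Supplier"), PlanNode("Supplies")]),
    ]),
])

# Access method per relation, implementation per operator.
choices = {
    "π": "on the fly",
    "σ": "on the fly",
    "⋈": "nested loop",
    "Supplier": "file scan",
    "Supplies": "file scan",
}
physical = annotate(logical, choices)
print("\n".join(show(physical)))
```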
Extended Algebra Operators
• Union ∪, intersection ∩, difference -
• Selection σ
• Projection π
• Join ⋈
• Duplicate elimination δ
• Grouping and aggregation γ
• Sorting τ
• Rename ρ
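The operators that go beyond classical set-based relational algebra (δ, γ, τ) can be sketched over bags of tuples represented as Python lists of dicts. This is a minimal illustration with made-up rows; the function names are hypothetical, and the grouping operator is restricted to a single sum for brevity.

```python
from collections import defaultdict


def distinct(rows):
    """δ: duplicate elimination, keeping the first occurrence of each tuple."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def group_sum(rows, group_field, sum_field):
    """γ: group on one field and sum another (one aggregate only)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[sum_field]
    return [{group_field: k, "sum": v} for k, v in totals.items()]


def sort_by(rows, field_name):
    """τ: sort on one field."""
    return sorted(rows, key=lambda row: row[field_name])


# Made-up sample bag with one duplicate tuple.
supplies = [
    {"sno": 1, "pno": 2, "price": 10.0},
    {"sno": 1, "pno": 5, "price": 4.0},
    {"sno": 3, "pno": 2, "price": 9.0},
    {"sno": 3, "pno": 2, "price": 9.0},
]

deduped = distinct(supplies)
totals = sort_by(group_sum(deduped, "sno", "price"), "sno")
print(totals)
```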
Logical Query Plan
Plan tree, root first (each operator's inputs are indented below it):
π sname
  σ scity='Seattle' ∧ sstate='WA' ∧ pno=2
    ⋈ sno = sno
      Supplier
      Supplies
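To make the plan concrete, the sketch below evaluates it bottom-up (join, then selection, then projection) on a few made-up rows. The helper functions are hypothetical and each one fully computes its result before the next runs, which is the simplest reading of the tree rather than how a DBMS would execute it.

```python
def select(rows, pred):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]


def project(rows, fields):
    """π: keep only the named fields (bag semantics, duplicates kept)."""
    return [{f: row[f] for f in fields} for row in rows]


def join(left, right, key):
    """⋈: equijoin on one field that both inputs share."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Made-up sample data.
supplier = [
    {"sno": 1, "sname": "Acme", "scity": "Seattle", "sstate": "WA"},
    {"sno": 2, "sname": "Bolt", "scity": "Seattle", "sstate": "WA"},
    {"sno": 3, "sname": "Cog", "scity": "Portland", "sstate": "OR"},
]
supplies = [
    {"sno": 1, "pno": 2, "price": 10.0},
    {"sno": 2, "pno": 5, "price": 7.5},
    {"sno": 3, "pno": 2, "price": 9.0},
]

joined = join(supplier, supplies, "sno")
filtered = select(
    joined,
    lambda r: r["scity"] == "Seattle" and r["sstate"] == "WA" and r["pno"] == 2,
)
result = project(filtered, ["sname"])
print(result)
```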
Query Block
• Most optimizers operate on individual query blocks
• A query block is an SQL query with no nesting
– Exactly one
• SELECT clause
• FROM clause
– At most one
• WHERE clause
• GROUP BY clause
• HAVING clause
Typical Plan for Block (1/2)
Select-project-join query, plan tree root first:
...
π fields
  σ selection condition
    ⋈ join condition
      ⋈ join condition
        R
        S
      …
Typical Plan For Block (2/2)
Plan tree, root first:
σ having condition
  γ fields, sum/count/min/max(fields)
    π fields
      σ selection condition
        ⋈ join condition
          …
          …
How about Subqueries?
SELECT Q.name
FROM Person Q
WHERE Q.age > 25
AND NOT EXISTS ( SELECT *
FROM Purchase P
WHERE P.buyer = Q.name
AND P.price > 100 )
How about Subqueries? (plan)
The NOT EXISTS subquery becomes a set difference. Plan tree, root first:
−
  π name
    σ age>25
      Person
  π name
    σ price>100
      ⋈ buyer=name
        Purchase
        Person
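The sketch below compares the nested NOT EXISTS query with a difference-based form on a small in-memory SQLite database. The Person and Purchase schemas are assumed to contain only the columns the query mentions, the rows are made up, and EXCEPT stands in for the difference operator.

```python
import sqlite3

# Assumption: schemas and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person(name TEXT PRIMARY KEY, age INTEGER);
CREATE TABLE Purchase(buyer TEXT, price REAL);
INSERT INTO Person VALUES ('Ann', 30), ('Bob', 40), ('Cy', 20);
INSERT INTO Purchase VALUES ('Ann', 150.0), ('Bob', 20.0), ('Cy', 500.0);
""")

nested = """
SELECT Q.name FROM Person Q
WHERE Q.age > 25
  AND NOT EXISTS (SELECT * FROM Purchase P
                  WHERE P.buyer = Q.name AND P.price > 100)
"""

# Names of people over 25, minus names of people with a purchase over 100.
unnested = """
SELECT name FROM Person WHERE age > 25
EXCEPT
SELECT Q.name FROM Purchase P, Person Q
WHERE P.buyer = Q.name AND P.price > 100
"""

nested_rows = sorted(conn.execute(nested).fetchall())
unnested_rows = sorted(conn.execute(unnested).fetchall())
print(nested_rows, unnested_rows)
```

EXCEPT removes duplicates, so the two forms are only guaranteed to match when names are unique, as the primary key assumed here ensures.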
Physical Query Plan
• Logical query plan with extra annotations
• Access path selection for each relation
– Use a file scan or use an index
• Implementation choice for each operator
• Scheduling decisions for operators
Physical Query Plan (example)
Plan tree, root first, with the physical annotation in parentheses:
π sname (on the fly)
  σ scity='Seattle' ∧ sstate='WA' ∧ pno=2 (on the fly)
    ⋈ sno = sno (nested loop)
      Supplier (file scan)
      Supplies (file scan)
Final Step in Query Processing
• Step 4: Query execution
– How to synchronize operators?
– How to pass data between operators?
• Approach:
– Iterator interface, combined with either
• Pipelined execution, or
• Intermediate result materialization
Iterator Interface
• Each operator implements iterator interface
• Interface has only three methods
• open()
– Initializes operator state
– Sets parameters such as selection condition
• get_next()
– Operator invokes get_next() recursively on its inputs
– Performs processing and produces an output tuple
• close()
– Cleans up operator state
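The three methods can be sketched as a handful of small Python classes. This is a minimal illustration, not any particular system's API: the class names, the use of None to signal end of input, and the sample rows are all assumptions.

```python
class Scan:
    """Leaf operator over an in-memory list (a stand-in for a file scan)."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
    def open(self):
        self.pos = 0
    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # assumption: None signals end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        self.pos = 0

class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def get_next(self):
        # Pull from the input until a row qualifies or the input ends.
        while True:
            row = self.child.get_next()
            if row is None or self.pred(row):
                return row
    def close(self):
        self.child.close()

class Project:
    def __init__(self, child, fields):
        self.child, self.fields = child, fields
    def open(self):
        self.child.open()
    def get_next(self):
        row = self.child.get_next()
        return None if row is None else {f: row[f] for f in self.fields}
    def close(self):
        self.child.close()

class NestedLoopJoin:
    def __init__(self, outer, inner, key):
        self.outer, self.inner, self.key = outer, inner, key
        self.current = None
    def open(self):
        self.outer.open()
        self.inner.open()
        self.current = self.outer.get_next()
    def get_next(self):
        while self.current is not None:
            row = self.inner.get_next()
            if row is None:
                # Inner input exhausted: advance the outer, restart the inner.
                self.current = self.outer.get_next()
                self.inner.close()
                self.inner.open()
            elif row[self.key] == self.current[self.key]:
                return {**self.current, **row}
        return None
    def close(self):
        self.outer.close()
        self.inner.close()

def run(root):
    """Drive the plan from the root: open, pull until exhausted, close."""
    root.open()
    out = []
    row = root.get_next()
    while row is not None:
        out.append(row)
        row = root.get_next()
    root.close()
    return out

supplier = [
    {"sno": 1, "sname": "Acme", "scity": "Seattle", "sstate": "WA"},
    {"sno": 2, "sname": "Bolt", "scity": "Seattle", "sstate": "WA"},
    {"sno": 3, "sname": "Cog", "scity": "Portland", "sstate": "OR"},
]
supplies = [{"sno": 1, "pno": 2}, {"sno": 2, "pno": 5}, {"sno": 3, "pno": 2}]

plan = Project(
    Select(
        NestedLoopJoin(Scan(supplier), Scan(supplies), "sno"),
        lambda r: r["scity"] == "Seattle" and r["sstate"] == "WA" and r["pno"] == 2,
    ),
    ["sname"],
)
result = run(plan)
print(result)
```

Only the root is driven directly. Each get_next() call on the root triggers get_next() calls further down the tree, which is the recursion the slide describes.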
Pipelined Execution
• Applies parent operator to tuples directly as
they are produced by child operators
• Benefits
– No operator synchronization issues
– Saves cost of writing intermediate data to disk
– Saves cost of reading intermediate data from disk
– Good resource utilization on a single processor
• This approach is used whenever possible
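Pipelining can be illustrated with Python generators, used here as a stand-in for the iterator interface. In this sketch (hypothetical function names, made-up rows), the scanned list records which tuples the scan has produced, so it is visible that the parent sees the first result before the scan has read the rest of its input.

```python
scanned = []  # sno of every tuple the scan has produced so far


def scan(rows):
    for row in rows:
        scanned.append(row["sno"])
        yield row


def select(child, pred):
    for row in child:
        if pred(row):
            yield row


def project(child, fields):
    for row in child:
        yield {f: row[f] for f in fields}


# Made-up sample data.
supplier = [
    {"sno": 1, "sname": "Acme", "scity": "Seattle"},
    {"sno": 2, "sname": "Bolt", "scity": "Seattle"},
    {"sno": 3, "sname": "Cog", "scity": "Portland"},
]

plan = project(select(scan(supplier), lambda r: r["scity"] == "Seattle"), ["sname"])

first = next(plan)                   # pull a single result tuple
scanned_after_first = list(scanned)  # snapshot: only one tuple scanned so far
rest = list(plan)                    # drain the remaining results
print(first, scanned_after_first, rest, scanned)
```

No intermediate result is stored anywhere: each tuple flows from the scan through the selection and projection before the next tuple is read.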
Pipelined Execution (example)
Plan tree, root first, with the physical annotation in parentheses:
π sname (on the fly)
  σ scity='Seattle' ∧ sstate='WA' ∧ pno=2 (on the fly)
    ⋈ sno = sno (nested loop)
      Supplier (file scan)
      Supplies (file scan)
Intermediate Tuple Materialization
• Writes the results of an operator to an intermediate table on disk
• No direct benefit, but necessary for some operator implementations
– When an operator needs to examine the same tuples multiple times
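A sort-merge join is one case where an operator revisits tuples. In the sketch below, t1 and t2 are in-memory lists standing in for intermediate tables on disk, and materialize is a hypothetical helper that stores an operator's output sorted on the join key. The rows are made up.

```python
def materialize(rows, key):
    """Stand-in for writing an operator's output to a temporary table.
    Here the "table" is just a list kept in memory, sorted on the join key."""
    return sorted(rows, key=lambda row: row[key])


def sort_merge_join(left, right, key):
    """Merge two inputs that are already sorted on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right tuples that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Every left tuple with this key re-reads the same right run,
            # which is why the input must be stored rather than streamed.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


supplier = [
    {"sno": 2, "sname": "Bolt", "scity": "Seattle", "sstate": "WA"},
    {"sno": 1, "sname": "Acme", "scity": "Seattle", "sstate": "WA"},
    {"sno": 3, "sname": "Cog", "scity": "Portland", "sstate": "OR"},
]
supplies = [
    {"sno": 3, "pno": 2, "price": 9.0},
    {"sno": 1, "pno": 2, "price": 10.0},
    {"sno": 2, "pno": 5, "price": 7.5},
]

# Apply each selection, then materialize its output before the join.
t1 = materialize(
    [s for s in supplier if s["scity"] == "Seattle" and s["sstate"] == "WA"], "sno"
)
t2 = materialize([u for u in supplies if u["pno"] == 2], "sno")

names = [row["sname"] for row in sort_merge_join(t1, t2, "sno")]
print(names)
```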
Intermediate Tuple Materialization (example)
Plan tree, root first, with the physical annotation in parentheses:
π sname (on the fly)
  ⋈ sno = sno (sort-merge join)
    σ scity='Seattle' ∧ sstate='WA' (scan: write to T1)
      Supplier (file scan)
    σ pno=2 (scan: write to T2)
      Supplies (file scan)
Next Time
• Algorithms for physical operator implementations
• How to find a good query plan?