Query Processing and Optimization
Definitions
• Query processing
– translation of query into low-level activities
– evaluation of query
– data extraction
• Query optimization
– selecting the most efficient query evaluation
[Figure: optimizer (using data statistics) → evaluation plan → evaluation engine (reading data) → output]
Advanced Databases Query processing and optimization 4
Relational Algebra (1/2)
• Query language
• Operations:
– select: σ
– project: π
– union: ∪
– difference: -
– product: x
– join: ⋈
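The operations listed above can be sketched on small in-memory relations. This is a minimal illustration, not an engine: relations are modelled as lists of dicts, the helper names and the sample `student` and `takes` data are hypothetical, and `product` assumes the two relations have disjoint attribute names.

```python
# Relations are modelled as lists of dicts (one dict per tuple).
def select(rel, pred):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """π: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def union(r, s):
    """∪: tuples in r or s, without duplicates."""
    out = []
    for t in r + s:
        if t not in out:
            out.append(t)
    return out

def difference(r, s):
    """-: tuples in r that are not in s."""
    return [t for t in r if t not in s]

def product(r, s):
    """x: every pairing of a tuple of r with a tuple of s."""
    return [{**a, **b} for a in r for b in s]

def natural_join(r, s):
    """⋈: pair tuples that agree on all attributes they share."""
    out = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in set(a) & set(b)):
                out.append({**a, **b})
    return out

# Hypothetical sample data.
student = [{"sid": 1, "name": "Paul"}, {"sid": 2, "name": "Anna"}]
takes = [{"sid": 1, "cid": 10}, {"sid": 2, "cid": 20}]

pauls = select(student, lambda t: t["name"] == "Paul")
pauls_courses = project(natural_join(pauls, takes), ["cid"])
```

Composing the helpers mirrors an algebra expression: the last two lines compute πcid(σname=Paul(student) ⋈ takes).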
[Figure: example query tree with σname=Paul, relations student and course, a hash join on cid and a loop join]
// phase 1: create sorted runs (M = number of memory pages)
i = 0;
repeat
    read M pages of relation R into memory
    sort the M pages
    write them into file Ri
    increment i
until no more pages
N = i // number of runs
// phase 2: merge the runs, assuming N < M
allocate a page for each run file Ri // N pages allocated
read a page Pi of each Ri
repeat
    choose first record (in sort order) among N pages, say from page Pj
    write record to output and delete from page Pj
    if page Pj is empty, read next page Pj' from Rj
until all pages are empty
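The two phases of the pseudocode above can be sketched in Python. This is a simplified illustration under stated assumptions: records are integers stored one per line, `run_size` counts records rather than pages, and the helper names are hypothetical. As in the pseudocode, a single merge pass is assumed to be enough (N < M). The standard-library `heapq.merge` plays the role of "choose first record among N pages".

```python
import heapq
import os
import tempfile


def write_run(sorted_records):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{rec}\n" for rec in sorted_records)
    return path


def read_run(path):
    """Stream the integers of one run file in stored order."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(records, run_size):
    """Sort records while holding at most run_size of them in memory per run."""
    run_paths, buffer = [], []
    # Phase 1: cut the input into sorted runs.
    for rec in records:
        buffer.append(rec)
        if len(buffer) == run_size:
            run_paths.append(write_run(sorted(buffer)))
            buffer = []
    if buffer:
        run_paths.append(write_run(sorted(buffer)))
    # Phase 2: merge all runs, always emitting the smallest head record.
    try:
        return list(heapq.merge(*(read_run(p) for p in run_paths)))
    finally:
        for p in run_paths:
            os.remove(p)


sorted_records = external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)
```

The result is collected into a list here only to keep the example short; a real external sort would write the merged output back to disk.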
Projection
• πΑ1,Α2… (R)
• remove unwanted attributes
– scan and drop attributes
• remove duplicate records
– sort resulting records using all attributes as sort order
– scan sorted result, eliminate (adjacent) duplicates
• cost
– initial scan + sorting + final scan
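The three cost components above map onto three steps in code. The following is a minimal in-memory sketch; the function name and the sample rows are hypothetical, and an on-disk implementation would use an external sort for the middle step.

```python
def project_distinct(records, attrs):
    """Project records onto attrs and remove duplicates by sorting."""
    # Initial scan: drop the unwanted attributes.
    projected = [tuple(r[a] for a in attrs) for r in records]
    # Sort using all remaining attributes as the sort order.
    projected.sort()
    # Final scan: duplicates are now adjacent, so compare with the last kept row.
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


# Hypothetical sample data.
rows = [
    {"name": "Paul", "cid": 10, "grade": 7},
    {"name": "Anna", "cid": 20, "grade": 9},
    {"name": "Paul", "cid": 10, "grade": 5},
]
distinct = project_distinct(rows, ["name", "cid"])
```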
[Figure: R partitioned into R0, R1, ..., Rn-1 and S partitioned into S0, S1, ..., Sn-1]
Exercise: joins
• R ⋈ S
• NR=215
• BR = 100
• NS=26
• BS = 30
• B+ index on S
– order 4
– full nodes
• nested loop join: best case - worst case
• block nested loop join: best case - worst case
• indexed nested loop join
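As a starting point for the exercise, the usual textbook cost estimates (counted in block transfers) can be written as small functions. This is a sketch under assumptions: the function and parameter names are hypothetical, `lookup_cost` stands for the cost of one index probe plus fetching the matching record, and the sample sizes below are made up rather than taken from the exercise.

```python
def nested_loop_cost(n_outer, b_outer, b_inner):
    """Return (best, worst) block transfers for a tuple-at-a-time nested loop join."""
    best = b_outer + b_inner             # both relations fit in memory
    worst = n_outer * b_inner + b_outer  # inner relation rescanned for every outer tuple
    return best, worst


def block_nested_loop_cost(b_outer, b_inner):
    """Return (best, worst) block transfers for a block nested loop join."""
    best = b_outer + b_inner
    worst = b_outer * b_inner + b_outer  # inner relation rescanned for every outer block
    return best, worst


def indexed_nested_loop_cost(n_outer, b_outer, lookup_cost):
    """Block transfers when each outer tuple probes an index on the inner relation."""
    return b_outer + n_outer * lookup_cost


# Hypothetical sizes: 1000 outer tuples in 50 blocks, inner relation in 20 blocks.
nl = nested_loop_cost(n_outer=1000, b_outer=50, b_inner=20)
bnl = block_nested_loop_cost(b_outer=50, b_inner=20)
inl = indexed_nested_loop_cost(n_outer=1000, b_outer=50, lookup_cost=4)
```

Swapping which relation is the outer one changes the worst-case figures, so both orders are worth evaluating.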
[Figure: evaluation plan applying σcoursename=Advanced DBs to ((student ⋈cid takes) ⋈courseid course), with a hash join on cid and an index-nested loop join on courseid; intermediate results are either pipelined or materialized]
Size Estimation (2/2)
• R x S
– NR * NS
• R ⋈ S
– R ∩ S = ∅: NR * NS
– R ∩ S key for R: maximum output size is NS
– R ∩ S foreign key referencing R: NS
– R ∩ S = {A}, neither key of R nor S
• NR * NS / V(A,S)
• NS * NR / V(A,R)
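The estimates above can be collected into a few functions. This is a sketch with hypothetical function names and made-up statistics. For the non-key case it returns the smaller of the two estimates, which is a common textbook convention and an assumption here, since the slide lists both without choosing.

```python
def product_size(n_r, n_s):
    """R x S, or R ⋈ S when R and S share no attributes."""
    return n_r * n_s


def join_size_key_of_r(n_s):
    """Common attributes form a key of R: each S tuple matches at most one R tuple."""
    return n_s  # upper bound; exact when they are a foreign key referencing R


def join_size_non_key(n_r, n_s, v_a_r, v_a_s):
    """Common attribute A is a key of neither relation.

    v_a_r and v_a_s are the numbers of distinct values of A in R and S.
    """
    estimate_via_s = n_r * n_s / v_a_s
    estimate_via_r = n_s * n_r / v_a_r
    return min(estimate_via_s, estimate_via_r)


# Hypothetical statistics: NR = 1000, NS = 200, V(A,R) = 50, V(A,S) = 100.
estimate = join_size_non_key(n_r=1000, n_s=200, v_a_r=50, v_a_s=100)
```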
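• combining selection with join and product
– σθ1(R x S) = R ⋈θ1 S
• commutativity of joins
– R ⋈θ1 S = S ⋈θ1 R
• distribution of selection over join
– σθ1∧θ2(R ⋈ S) = σθ1(R) ⋈ σθ2(S), when θ1 uses only attributes of R and θ2 only attributes of S
• distribution of projection over join
– πA1,A2(R ⋈ S) = πA1(R) ⋈ πA2(S), when A1 and A2 are attributes of R and S respectively and include the join attributes
• associativity of joins: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T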
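Equivalence rules like these can be checked on small relations by comparing both sides as sets of tuples. The sketch below is a sanity check on one hypothetical data set, not a proof; the relations R, S, T, the predicates, and the helper names are all made up for illustration.

```python
def as_set(rel):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in rel}

def select(rel, pred):
    return [t for t in rel if pred(t)]

def product(r, s):
    return [{**a, **b} for a in r for b in s]

def theta_join(r, s, pred):
    return [{**a, **b} for a in r for b in s if pred({**a, **b})]

# Hypothetical relations with disjoint attribute names.
R = [{"a": 1, "x": 1}, {"a": 2, "x": 5}]
S = [{"b": 1, "y": 3}, {"b": 2, "y": 8}]
T = [{"c": 1}, {"c": 2}]

def ab(t): return t["a"] == t["b"]   # join condition between R and S
def bc(t): return t["b"] == t["c"]   # join condition between S and T
def on_r(t): return t["x"] > 2       # uses only attributes of R
def on_s(t): return t["y"] > 5       # uses only attributes of S

# Selection over a product equals a theta join.
rule_product = as_set(select(product(R, S), ab)) == as_set(theta_join(R, S, ab))

# Joins commute.
rule_commute = as_set(theta_join(R, S, ab)) == as_set(theta_join(S, R, ab))

# A conjunctive selection distributes over the join.
lhs = select(theta_join(R, S, ab), lambda t: on_r(t) and on_s(t))
rhs = theta_join(select(R, on_r), select(S, on_s), ab)
rule_distribute = as_set(lhs) == as_set(rhs)

# Joins associate.
rule_assoc = as_set(theta_join(R, theta_join(S, T, bc), ab)) == as_set(
    theta_join(theta_join(R, S, ab), T, bc)
)
```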
Cost Optimizer (1/2)
• transforms expressions
– equivalent expressions
– heuristics, rules of thumb
• perform selections early
• perform projections early
• replace products followed by selection σ(R x S) with joins R ⋈ S
• start with the joins and selections that give the smallest result
– create left-deep join trees
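The effect of the "perform selections early" heuristic can be made concrete by counting the comparisons a simple nested loop join performs. This is an illustrative sketch: the `course` and `takes` data are synthetic, the helper name is hypothetical, and comparison counts stand in for real I/O cost.

```python
def nested_loop_join(r, s, key):
    """Join r and s on one attribute, counting tuple comparisons."""
    out, comparisons = [], 0
    for a in r:
        for b in s:
            comparisons += 1
            if a[key] == b[key]:
                out.append({**a, **b})
    return out, comparisons


# Synthetic data: 50 courses, 200 enrolment records.
course = [{"courseid": i, "coursename": f"Course {i}"} for i in range(1, 50)]
course.append({"courseid": 50, "coursename": "Advanced DBs"})
takes = [{"cid": s, "courseid": 1 + (s % 50)} for s in range(200)]

def wanted(t):
    return t["coursename"] == "Advanced DBs"

# Selection late: join everything, then filter.
joined, late_comparisons = nested_loop_join(takes, course, "courseid")
late_result = [t for t in joined if wanted(t)]

# Selection early: filter course first, then join the smaller input.
selected = [t for t in course if wanted(t)]
early_result, early_comparisons = nested_loop_join(takes, selected, "courseid")
```

Both plans return the same tuples, but the early selection shrinks the join input from 50 course tuples to 1, so the join does 200 comparisons instead of 10000.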
[Figure: two equivalent plans for πname built from a hash join on cid and a loop join: one applies σcoursename=Advanced DBs after the joins, the other applies it directly to course]