05 Optimization
Definitions
• Query processing
– translation of query into low-level activities
– evaluation of query
– data extraction (read data from file and implement operations)
• Query optimization
– selecting the most efficient query evaluation
Query Processing (1/2)
• SELECT * FROM student WHERE name='Paul'
• Parse query and translate
– check syntax, verify names, etc
– translate into relational algebra (RDBMS)
– create evaluation plans (in other words: execution plan)
• Find best plan (optimization)
• Execute plan
Query Processing (2/2)
[Figure: the optimizer, using statistics about the data, produces an evaluation plan; the evaluation engine runs the plan against the data and produces the output]
Relational Algebra (1/2)
• Query language
• Operations:
– select: σ
– project: π
– union: ∪
– difference: -
– product: ×
– join: ⋈
• Extended relational algebra operations:
- duplicate elimination
- grouping and aggregation
- sorting
Relational Algebra (2/2)
• SELECT * FROM student WHERE name='Paul'
– σ_{name='Paul'}(student)
• π_{name}(σ_{cid<00112235}(student))
• π_{name}(σ_{coursename='Advanced DBs'}((student ⋈_{cid} takes) ⋈_{courseid} course))
Why Optimize?
• Many alternative options to evaluate a query
– π_{name}(σ_{coursename='Advanced DBs'}((student ⋈_{cid} takes) ⋈_{courseid} course))
– π_{name}((student ⋈_{cid} takes) ⋈_{courseid} σ_{coursename='Advanced DBs'}(course))
• Several options to evaluate a single operation
– σ_{name='Paul'}(student)
• scan file (read all data blocks of the table)
• use secondary index on student.name
• Multiple access paths
– access path: how can records be accessed
(using index; which index?; etc.)
Evaluation plans
• Specify which access path to follow
• Specify which algorithm to use to evaluate operator
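• Specify how operators interleave
• Optimization:
– estimate the cost of each candidate plan (not every possible plan is considered)
– select the plan with the lowest estimated cost
[Figure: example evaluation plan tree; recoverable labels: π_{name}, σ_{coursename='Advanced DBs'}, σ_{name='Paul'}, loop join, hash join on cid, student, course]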
Estimating Cost
• What needs to be considered:
– Disk I/Os
• sequential (reading neighboring pages is faster)
• random (reading pages in the order of requests)
– CPU time
– Network communication
• What are we going to consider:
– Disk I/Os
• page (data block) reads/writes
– Ignoring cost of writing final output
Operations and Costs (1/2)
• Operations: σ, π, ∪, ∩, -, ×, ⋈
• Costs:
– NR: number of records in R (other notation: TR or T(R) -> tuple)
– LR: size of record in R (length of record -> L(R))
– bfR: blocking factor (other notation: FR or bf(R))
• number of records in a page (data block)
– BR: number of pages to store relation R (-> B(R))
– V(R, A): number of distinct values of attribute A in R
other notation: IA(R) (Image size) or V(A,R)
– SC(R,A): selection cardinality of A in R (number of matching rec.)
• A key: SC(R, A)=1
• A nonkey: SC(R, A)= TR / V(R,A) (uniform distribution assumption)
– HTi: number of levels in index i (-> height of tree)
– fractions and logarithms are rounded up
Operations and Costs (2/2)
• relation takes(cid, courseid) [cid -> student id]
– 7000 tuples
– student cid 8 bytes
– course id 4 bytes -> L(R) = 8+4 = 12 bytes
– 40 courses
– 1000 students
– page size 512 bytes (data block size)
– output size (in pages) of query:
which students take the Advanced DBs course?
• Ttakes = 7000
• V(courseid, takes) = 40
• SC(courseid, takes)=ceil( Ttakes/V(courseid, takes) )=ceil(7000/40)=175
• bftakes = floor(512/12) = 42
• Btakes = ceil(7000/42) = 167 pages
• bfoutput = floor(512/8) = 64
• Boutput = ceil(175/64) = 3 pages
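The arithmetic above can be re-checked with a few lines of Python. This is only a sketch of the slide's calculation; the variable names are illustrative and not part of the original notation.

```python
import math

PAGE_SIZE = 512        # bytes per page (data block), from the example
T_TAKES = 7000         # number of tuples in takes
V_COURSEID = 40        # distinct courseid values (40 courses)
CID_BYTES = 8          # size of a student cid
COURSEID_BYTES = 4     # size of a course id

record_len = CID_BYTES + COURSEID_BYTES     # L(takes) = 12 bytes
bf_takes = PAGE_SIZE // record_len          # records per page of takes
b_takes = math.ceil(T_TAKES / bf_takes)     # pages needed to store takes
sc = math.ceil(T_TAKES / V_COURSEID)        # matching records per course (uniform assumption)
bf_output = PAGE_SIZE // CID_BYTES          # output records hold only the cid
b_output = math.ceil(sc / bf_output)        # pages of output

print(bf_takes, b_takes, sc, bf_output, b_output)
```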
Cost of Selection σ (1/2)
• Linear search
– read all pages, find records that match (assuming equality search)
– average cost:
• nonkey (multiple occurrences): BR; key: 0.5*BR (half on average)
• Binary search
– on ordered field
– average cost: ceil( log2(BR) ) + m
• m additional pages to be read (first found then read the duplicates)
• m = ceil( SC(R,A)/bfR ) - 1
• Primary/Clustered Index (B+ tree)
– average cost:
• single record: HTi + 1
• multiple records: HTi + ceil( SC(R,A)/bfR )
Cost of Selection σ (2/2)
• Secondary Index (B+ tree)
– average cost:
• key field: HTi + 1
• nonkey field
– worst case: HTi + SC(R,A) (each matching record may be on a different page)
– a linear search is preferable if there are many matching records
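The selection cost formulas can be compared side by side with a small calculator. The sketch below reuses the takes relation from the earlier example (167 pages, SC = 175, bf = 42); the index height HTi = 2 is an assumed value for illustration only, and the function names are hypothetical.

```python
import math

def linear_search_cost(b_r, key=False):
    # Key: stop at the first match, half the pages on average; nonkey: read everything.
    return math.ceil(0.5 * b_r) if key else b_r

def binary_search_cost(b_r, sc, bf_r):
    # Locate the first match, then read m further pages holding the duplicates.
    m = math.ceil(sc / bf_r) - 1
    return math.ceil(math.log2(b_r)) + m

def primary_index_cost(ht_i, sc, bf_r):
    # Matching records are stored contiguously in a clustered file.
    return ht_i + math.ceil(sc / bf_r)

def secondary_index_cost(ht_i, sc, key=False):
    # Worst case for a nonkey field: one page read per matching record.
    return ht_i + 1 if key else ht_i + sc

B_R, SC, BF_R = 167, 175, 42   # from the takes example
HT_I = 2                       # assumed index height (not given in the slides)

costs = {
    "linear": linear_search_cost(B_R),
    "binary": binary_search_cost(B_R, SC, BF_R),
    "primary index": primary_index_cost(HT_I, SC, BF_R),
    "secondary index": secondary_index_cost(HT_I, SC),
}
print(costs)
```

Under these assumptions the secondary index (177 page reads) is more expensive than a plain linear scan (167), which illustrates the warning about nonkey fields with many matching records.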
Complex selection σexpr
• conjunctive selections: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(R)
– perform simple selection using θi with the lowest evaluation cost
• e.g. using an index corresponding to θi
• apply remaining conditions θ on the resulting records
• example: a conjunction of a condition on cid (constant 00112233) and a condition on courseid (constant 312) over takes
• cost: the cost of the simple selection on the selected θi
– multiple indices
• select indices that correspond to the θi
• scan indices and return RIDs (ROWID in Oracle)
• answer: intersection of RIDs
• cost: the sum of the index scan costs + record retrieval
• disjunctive selections: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(R)
– multiple indices
• union of RIDs
– linear search
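The multiple-index strategy reduces to set operations on record identifiers. The following is a minimal sketch, assuming each index scan has already produced a set of RIDs; the RID values and function names are made up for illustration.

```python
def conjunctive_rids(*rid_sets):
    # AND of conditions: a record qualifies only if every index returned its RID.
    return set.intersection(*(set(s) for s in rid_sets))

def disjunctive_rids(*rid_sets):
    # OR of conditions: a record qualifies if any index returned its RID.
    return set.union(*(set(s) for s in rid_sets))

# Hypothetical RIDs returned by an index on cid and an index on courseid.
rids_from_cid_index = {3, 7, 12, 20}
rids_from_courseid_index = {7, 9, 20, 31}

both = conjunctive_rids(rids_from_cid_index, rids_from_courseid_index)
either = disjunctive_rids(rids_from_cid_index, rids_from_courseid_index)
print(sorted(both), sorted(either))
```

Only the records whose RIDs survive the set operation need to be fetched from the data file, which is where the "record retrieval" term in the cost comes from.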
Projection and set operations
• SELECT DISTINCT cid FROM takes
– π requires duplicate elimination
– sorting
• set operations require duplicate elimination
– R ∪ S
– R ∩ S
– sorting
Sorting
• efficient evaluation for many operations
• required by query:
– SELECT cid, name FROM student ORDER BY name
• implementations
– internal sorting (if records fit in memory)
– external sorting (this is why temporary space on disk is needed)
External Sort-Merge Algorithm (1/3)
• Sort stage: create sorted runs
i = 0
repeat
    read M pages of relation R into memory (M: size of memory in pages)
    sort the M pages
    write them into file Ri
    increment i
until no more pages
N = i // number of runs
External Sort-Merge Algorithm (2/3)
• Merge stage: merge sorted runs
External Sort-Merge Algorithm (3/3)
• Merge stage: merge sorted runs
• What if N >= M ?
– perform multiple passes
– each pass merges groups of M-1 runs until all runs are processed
– in the next pass the number of runs is reduced
– the final pass generates the sorted output
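The two stages can be simulated in memory to see how runs are created and then merged M-1 at a time. This is a sketch, not a disk-based implementation: pages are Python lists, "files" are lists of records, and the input order of the 12 records is an assumption chosen to be consistent with the sort-merge example on the following slide.

```python
import heapq

def external_sort(pages, M):
    """pages: list of pages (each a list of records); M: memory size in pages."""
    if M < 3:
        raise ValueError("need M >= 3 so that each pass merges at least 2 runs")
    # Sort stage: read M pages at a time, sort them, write out one run.
    runs = []
    for i in range(0, len(pages), M):
        in_memory = [rec for page in pages[i:i + M] for rec in page]
        runs.append(sorted(in_memory))
    # Merge stage: each pass merges groups of M-1 runs (one page is the output buffer).
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes

# One record per page, memory of 3 pages (assumed input order).
records = [("d", 95), ("a", 12), ("x", 44), ("f", 10), ("s", 95), ("o", 73),
           ("t", 45), ("n", 67), ("e", 87), ("z", 11), ("v", 22), ("b", 38)]
pages = [[r] for r in records]
sorted_records, merge_passes = external_sort(pages, M=3)
print(merge_passes, sorted_records[:3])
```

With 12 single-record pages and M = 3, the sort stage creates 4 runs and the merge stage needs 2 passes.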
Sort-Merge Example
[Figure: external sort-merge of a 12-record file, with memory for 3 records]
• sorted runs after the sort stage:
– R1: a 12, d 95, x 44
– R2: f 10, o 73, s 95
– R3: e 87, n 67, t 45
– R4: b 38, v 22, z 11
• after merge pass 1:
– a 12, d 95, f 10, o 73, s 95, x 44
– b 38, e 87, n 67, t 45, v 22, z 11
• after merge pass 2 (sorted output):
– a 12, b 38, d 95, e 87, f 10, n 67, o 73, s 95, t 45, v 22, x 44, z 11
Sort-Merge cost
• BR: the number of pages of R
• Sort stage: 2 * BR
– read/write relation
• Merge stage:
– initially ceil( BR / M ) runs to be merged
– each pass merges M-1 runs
– thus, total number of passes: ceil( log_{M-1}( BR / M ) )
– at each pass 2 * BR pages are read/written
• read/write relation (BR + BR)
• apart from final write (BR)
• Total cost:
– 2 * BR + 2 * BR * ceil( log_{M-1}( BR / M ) ) - BR
• e.g. BR = 1000000, M = 100
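The example values BR = 1000000 and M = 100 can be plugged into the total cost formula. The sketch below counts merge passes with integer arithmetic rather than a floating-point logarithm; the function name is hypothetical.

```python
import math

def sort_merge_cost(b_r, M):
    """Return (total page I/Os, number of merge passes) for external sort-merge."""
    runs = math.ceil(b_r / M)            # runs produced by the sort stage
    passes = 0
    while runs > 1:                      # each pass merges groups of M-1 runs
        runs = math.ceil(runs / (M - 1))
        passes += 1
    # sort stage (2*BR) + merge passes (2*BR each) - final write (BR)
    total = 2 * b_r + 2 * b_r * passes - b_r
    return total, passes

total_cost, n_passes = sort_merge_cost(1_000_000, 100)
print(total_cost, n_passes)
```

For these values the sort stage produces 10000 runs, three merge passes are needed (10000 -> 102 -> 2 -> 1 runs), and the total is 7000000 page reads/writes.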
Projection
• π_{A1,A2,…}(R)
• remove unwanted attributes
– scan and drop attributes
• remove duplicate records
– sort resulting records using all attributes as sort order
– scan sorted result, eliminate duplicates (adjacent)
• cost
– initial scan + sorting + final scan
Join
• π_{name}(σ_{coursename='Advanced DBs'}((student ⋈_{cid} takes) ⋈_{courseid} course))
• implementations
– nested loop join
– block-nested loop join
– indexed nested loop join
– sort-merge join
– hash join
Nested loop join (1/2)
• R ⋈ S
Nested loop join (2/2)
• Costs:
– best case when smaller relation fits in memory
• We keep it in memory and read the other relation once.
• BR+BS
– worst case when memory can hold only one page of each relation
• We read S for each tuple in R
• TR * BS + BR
• Use this formula only when the memory is very small (M=2), so it can hold only 1 page from each relation, and the blocking factors are 1 (bfR = bfS = 1), i.e. there is only 1 record per page.
Block nested loop join (1/2)
for each page XR of R
    for each page XS of S
        for each tuple tR in XR
            for each tuple tS in XS
                if (tR and tS match) output tR.tS
            end
        end
    end
end
• 1 read operation reads several tuples (bfR > 1 and bfS > 1)
• This is also called the nested loop join algorithm.
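The pseudocode translates almost line by line into Python. The sketch below represents each relation as a list of pages (lists of tuples) and counts page reads so the result can be compared with the cost formulas; the sample student and takes rows are invented for illustration.

```python
def block_nested_loop_join(r_pages, s_pages, match):
    """Join two paged relations; returns (joined tuples, number of page reads)."""
    out = []
    page_reads = 0
    for x_r in r_pages:              # for each page XR of R
        page_reads += 1
        for x_s in s_pages:          # for each page XS of S
            page_reads += 1
            for t_r in x_r:          # for each tuple tR in XR
                for t_s in x_s:      # for each tuple tS in XS
                    if match(t_r, t_s):
                        out.append(t_r + t_s)
    return out, page_reads

# Hypothetical data: student(cid, name) and takes(cid, courseid), 2 pages each.
student_pages = [[(1, "Ann"), (2, "Bob")], [(3, "Cy")]]
takes_pages = [[(1, 101), (2, 101)], [(3, 102)]]

joined, reads = block_nested_loop_join(
    student_pages, takes_pages, lambda s, t: s[0] == t[0])
print(joined, reads)
```

The counter comes out as BR * BS + BR (here 2 * 2 + 2 = 6), because every page of S is re-read for each page of R.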
Block nested loop join (2/2)
• Costs:
– best case when smaller relation fits in memory
• We keep it in memory and read the other relation once.
• BR+BS
– worst case when memory holds one page of each relation
• We read S for each page in R
• BR * BS + BR
• Use this formula when the memory is very small (M=2) and can hold only 1 page from each relation, but the blocking factors are greater than 1 (bfR > 1 and bfS > 1).
Block nested loop join (an improvement)
Memory size: M pages
for each chunk MR of M-1 pages of R
    for each page XS of S
        for each tuple tR in MR
            for each tuple tS in XS
                if (tR and tS match) output tR.tS
            end
        end
    end
end
• Costs:
– best case when smaller relation fits in memory
• We keep it in memory and read the other relation once.
• BR+BS
– general case
• We read S once for each chunk of M-1 pages of R
• ceil( BR / (M-1) ) * BS + BR
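The nested loop cost formulas from the last few slides can be compared numerically. In the sketch below, takes has 167 pages and student has 1000 tuples, as in the earlier example; the 50 pages for student and the memory size M = 11 are assumptions made only to get concrete numbers.

```python
import math

def nested_loop_cost(t_r, b_r, b_s):
    # Tuple-at-a-time: S is read once per tuple of R.
    return t_r * b_s + b_r

def block_nested_loop_cost(b_r, b_s):
    # Page-at-a-time: S is read once per page of R.
    return b_r * b_s + b_r

def chunked_block_nested_loop_cost(b_r, b_s, M):
    # S is read once per chunk of M-1 pages of R.
    return math.ceil(b_r / (M - 1)) * b_s + b_r

def best_case_cost(b_r, b_s):
    # The smaller relation fits in memory: each relation is read once.
    return b_r + b_s

T_STUDENT, B_STUDENT = 1000, 50   # 50 pages is an assumed value
B_TAKES = 167
M = 11                            # assumed memory size in pages

costs = (
    nested_loop_cost(T_STUDENT, B_STUDENT, B_TAKES),
    block_nested_loop_cost(B_STUDENT, B_TAKES),
    chunked_block_nested_loop_cost(B_STUDENT, B_TAKES, M),
    best_case_cost(B_STUDENT, B_TAKES),
)
print(costs)
```

Under these assumptions the estimates drop from 167050 to 8400 to 885 page reads, against a best case of 217.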
Indexed nested loop join
• R ⋈ S
• Index on inner relation (S)
• for each tuple in outer relation (R) probe index of inner relation
• Costs:
– BR + TR * c
• c: the cost of an index-based selection on the inner relation
• c ≈ T(S)/V(S,A) (if A is the join column and the index is kept in memory)
Sort-merge join
• R ⋈ S
• Relations sorted on the join attribute
• Merge sorted relations
– pointers to first record in each relation
– read in a group of records of S with the same values in the join attribute
– read records of R and process
• Relations in sorted order to be read once
• Cost:
– cost of sorting + BS + BR
[Figure: two relations sorted on the join attribute, each with a pointer to its current record]
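The merge step can be sketched with two cursors, one per relation. This is a minimal in-memory illustration, not a paged implementation; the sample tuples are toy data loosely based on the labels that survive from the slide's figure, and the function name is hypothetical.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples by sorting both and merging."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    out = []
    i = j = 0                                  # pointers to the current record of R and S
    while i < len(r) and j < len(s):
        k_r, k_s = key_r(r[i]), key_s(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Read the group of S records sharing this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == k_r:
                j_end += 1
            # Pair every R record with this value against the whole group.
            while i < len(r) and key_r(r[i]) == k_r:
                out.extend(r[i] + t_s for t_s in s[j:j_end])
                i += 1
            j = j_end
    return out

R = [("d", "D"), ("e", "E"), ("x", "X"), ("v", "V")]
S = [("e", 67), ("e", 87), ("n", 11), ("v", 22), ("z", 38)]
result = sort_merge_join(R, S, key_r=lambda t: t[0], key_s=lambda t: t[0])
print(result)
```

Each relation is traversed once after sorting, which matches the BS + BR term in the cost.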
Hash join
• R ⋈ S
• use h1 on joining attribute to map records to partitions that fit in memory
– records of R are partitioned into R0 … Rn-1
– records of S are partitioned into S0 … Sn-1
• join records in corresponding partitions
– using a hash-based indexed block nested loop join
• Cost: 2*(BR+BS) + (BR+BS) (partitioning reads and writes both relations; the join phase reads them once more)
[Figure: R and S are each split by h1 into partitions R0 … Rn-1 and S0 … Sn-1; Ri is joined with Si]
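The partition-then-join structure can be illustrated in memory. In this sketch h1 is Python's built-in hash modulo the number of partitions, and the per-partition join builds a dictionary on Ri and probes it with Si; both choices, the sample rows, and the function name are assumptions for illustration.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n_partitions=4):
    """Equi-join via partitioning with h1, then a build/probe join per partition."""
    h1 = lambda k: hash(k) % n_partitions
    # Partition phase: records with equal join values land in the same partition number.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in r:
        r_parts[h1(key_r(t))].append(t)
    for t in s:
        s_parts[h1(key_s(t))].append(t)
    # Join phase: only Ri and Si with the same i can contain matching records.
    out = []
    for p in range(n_partitions):
        table = defaultdict(list)              # in-memory hash index on Ri
        for t_r in r_parts[p]:
            table[key_r(t_r)].append(t_r)
        for t_s in s_parts[p]:                 # probe with each record of Si
            for t_r in table.get(key_s(t_s), []):
                out.append(t_r + t_s)
    return out

student = [(1, "Ann"), (2, "Bob"), (3, "Cy")]      # hypothetical rows
takes = [(1, 312), (3, 312), (3, 400)]
joined = sorted(hash_join(student, takes, lambda t: t[0], lambda t: t[0]))
print(joined)
```

The result is sorted before printing because the output order depends on how the keys hash into partitions.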
Evaluation
• evaluate multiple operations in a plan
• materialization
• pipelining
[Figure: evaluation plan tree: π_{name} over σ_{coursename='Advanced DBs'} over an index-nested loop join on courseid between course and the hash join on cid of student and takes]
Materialization
• create and read temporary relations
• create implies writing to disk
– more page writes
[Figure: the evaluation plan tree from the Evaluation slide]
Pipelining (1/2)
• creating a pipeline of operations
• reduces number of read-write operations
• implementations
– demand-driven - data pull
– producer-driven - data push
[Figure: the evaluation plan tree from the Evaluation slide]
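Demand-driven (pull) pipelining maps naturally onto Python generators: each operator asks its child for the next tuple only when its own consumer asks for one, so no temporary relation is created. The operator names and the sample course rows below are illustrative assumptions.

```python
def scan(relation):
    # Leaf operator: hands out one tuple at a time.
    for t in relation:
        yield t

def select(pred, child):
    # Pulls from its child and passes on only the matching tuples.
    for t in child:
        if pred(t):
            yield t

def project(cols, child):
    # Keeps only the requested column positions of each tuple.
    for t in child:
        yield tuple(t[c] for c in cols)

course = [(312, "Advanced DBs"), (400, "Compilers")]   # hypothetical rows

# Building the plan does no work; tuples flow only when the result is consumed.
plan = project([0], select(lambda t: t[1] == "Advanced DBs", scan(course)))
result = list(plan)
print(result)
```

A blocking operator such as a sort would have to consume its whole input before yielding anything, which is why not every algorithm can be pipelined.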
Pipelining (2/2)
• can pipelining always be used?
• any algorithm? (no -> external sort-merge)
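• cost of R ⋈ S
– materialization and hash join: BR + 3(BR+BS)
– pipelining and indexed nested loop join: TR * HTi
[Figure: plan fragment contrasting pipelined and materialized evaluation of the join on courseid between R and S; other labels: cid, σ_{coursename='Advanced DBs'}]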
Choosing evaluation plans
• cost based optimization
• enumeration of plans
– R ⋈ S ⋈ T: 12 possible plans (3! * 2)
– e.g. (R ⋈ S) ⋈ T, R ⋈ (S ⋈ T)
• cost estimation of each plan
• overall cost
– cannot optimize operations independently
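The count of 12 plans can be reproduced by enumeration, assuming that "3! * 2" means the 3! orderings of the relations times the 2 ways of parenthesizing a three-way join. The sketch below only enumerates join orders; it does not choose algorithms or access paths.

```python
from itertools import permutations

def three_way_join_plans(relations):
    """List all join orders for exactly three relations."""
    plans = []
    for a, b, c in permutations(relations):
        plans.append(f"({a} ⋈ {b}) ⋈ {c}")   # left-deep shape
        plans.append(f"{a} ⋈ ({b} ⋈ {c})")   # right-deep shape
    return plans

plans = three_way_join_plans(["R", "S", "T"])
print(len(plans))
for p in plans:
    print(p)
```

The number of orders grows quickly with the number of relations, which is one reason optimizers estimate costs for only a subset of the possible plans.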
Cost estimation
• operation (σ, π, …)
• implementation
• size of inputs
• size of outputs
• sorting
[Figure: the evaluation plan tree from the Evaluation slide]
Expression Equivalence
• conjunctive selection decomposition
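– σ_{θ1 ∧ θ2}(R) = σ_{θ1}(σ_{θ2}(R))
• commutativity of selection
– σ_{θ1}(σ_{θ2}(R)) = σ_{θ2}(σ_{θ1}(R))
• combining selection with join and product
– σ_{θ1}(R × S) = … R ⋈_{θ1} S = …
• commutativity of joins
– R ⋈_{θ1} S = S ⋈_{θ1} R
• distribution of selection over join
– σ_{θ1 ∧ θ2}(R ⋈ S) = σ_{θ1}(R) ⋈ σ_{θ2}(S) (θ1 uses only attributes of R, θ2 only attributes of S)
• distribution of projection over join
– π_{A1,A2}(R ⋈ S) = π_{A1}(R) ⋈ π_{A2}(S) (A1 are attributes of R, A2 attributes of S; the join attributes must be retained)
• associativity of joins: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T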
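An equivalence rule can be sanity-checked on toy data by evaluating both sides and comparing the results. The sketch below checks one instance of distributing a selection over a join (the condition only mentions attributes of course); the relations and helper names are invented for illustration, and a passing check on one data set is of course not a proof.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def join(r, s, cond):
    return [t_r + t_s for t_r in r for t_s in s if cond(t_r, t_s)]

takes = [(1, 312), (2, 400), (3, 312)]                   # (cid, courseid)
course = [(312, "Advanced DBs"), (400, "Compilers")]     # (courseid, coursename)

on_courseid = lambda t, c: t[1] == c[0]

# Selection applied after the join: coursename is column 3 of the joined tuple.
late = select(lambda row: row[3] == "Advanced DBs",
              join(takes, course, on_courseid))

# Selection pushed down onto course before the join.
early = join(takes,
             select(lambda c: c[1] == "Advanced DBs", course),
             on_courseid)

print(sorted(late) == sorted(early), len(late))
```

Both forms return the same two tuples, but the pushed-down form joins against a smaller input, which is the motivation behind the "perform selections early" heuristic.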
Cost Optimizer (1/2)
• transforms expressions
– equivalent expressions
– heuristics, rules of thumb
• perform selections early
• perform projections early
• replace products followed by selection, σ_{θ}(R × S), with joins R ⋈_{θ} S
• start with the joins and selections that produce the smallest results
– create left-deep join trees
Cost Optimizer (2/2)
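[Figure: two equivalent plan trees for π_{name}, each with a loop join above a hash join on cid; in one, σ_{coursename='Advanced DBs'} is applied after the joins, in the other it is applied to course before the join]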
Summary
• Estimating the cost of a single operation
• Estimating the cost of a query plan
• Optimization
– choose the most efficient plan
Table Types
• Type: Description
• Ordinary (heap-organized) table: Data is stored as an unordered collection (heap).
• Partitioned table: Data is divided into smaller, more manageable pieces.
• Index-organized table (IOT): Data (including non-key values) is sorted and stored in a B-tree index structure.
• Clustered table: Related data from more than one table are stored together.
[Figure: the four table types: heap, clustered, partitioned, IOT]
What Is a Partition and Why Use It?
• A partition is:
– A piece of a “very large” table or index
– Stored in its own segment
– Used for improved performance and manageability
• Partitioning methods: RANGE PARTITION, HASH PARTITION, LIST PARTITION
• Data dictionary views: DBA_PART_TABLES, DBA_TAB_PARTITIONS, DBA_TAB_SUBPARTITIONS
Index-Organized Tables and Heap Tables
[Figure labels: table access by ROWID; row header, key column, non-key columns]
– Compared to heap tables, IOTs:
• Have faster key-based access to table data
• Do not duplicate the storage of primary key values
• Require less storage
• Use secondary indexes and logical row IDs
• Have higher availability because table reorganization does
not invalidate secondary indexes
Index-Organized Tables
[Figure: order rows (ORD_NO, ORD_DT, CUST_CD; e.g. 101, 05-JAN-97, R01 and 102, 07-JAN-97, N45) shown together with their item rows (PROD, QTY); panels: index cluster and hash cluster (with hash function)]
Advanced Databases Query processing and optimization 54
Situations Where Clusters Are Useful
DBA_CLUSTERS
DBA_CLU_COLUMNS
DBA_TABLES.CLUSTER_NAME ‘PERSONNEL’
HASH CLUSTER
CREATE CLUSTER personnel1
( department_number NUMBER )
SIZE 512 HASHKEYS 500
STORAGE (INITIAL 100K NEXT 50K);
DBA_CLUSTERS
DBA_CLU_COLUMNS
DBA_CLUSTER_HASH_EXPRESSIONS