15-QueryOptimization
15-QueryOptimization
Query optimization
DB
MG
1
DBMS Architecture
SQL INSTRUCTION
OPTIMIZER
CONCURRENCY CONTROL
MANAGEMENT OF ACCESS
METHODS
Index Files
System DATABASE
Catalog
Data Files
2
DB
MG
Query optimizer
3
DB
MG
Query optimizer
4
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC
ANALYSIS
5
DB
MG
Lexical, syntactic and semantic analysis
6
DB
MG
Lexical, syntactic and semantic analysis
Output
Internal representation in (extended) relational
algebra
Why relational algebra?
It explicitly represents the order in which operators
are applied
It is procedural (different from SQL)
There is a corpus of theorems and properties
exploited to modify the initial query tree
7
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC DATA
ANALYSIS DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
8
DB
MG
Algebraic optimization
9
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC DATA
ANALYSIS DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
COST BASED
OPTIMIZATION
10
DB
MG
Cost based optimization
11
DB
MG
Cost based optimization
Output
Access program in executable format
It exploits the internal structures of the DBMS
Set of dependencies
conditions on which the validity of the query plan
depends
e.g., the existence of an index
12
DB
MG
Query optimizer
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC DATA
ANALYSIS DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
13
DB
MG
ACCESS PROGRAM SET OF DEPENDENCIES
Execution modes
Compile and go
Compilation and immediate execution of the
statement
No storage of the query plan
Dependencies are not needed
14
DB
MG
Execution modes
15
DB
MG
Database Management Systems
Algebraic optimization
DB
MG
16
Algebraic optimization
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC DATA
ANALYSIS DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
17
DB
MG
ACCESS PROGRAM SET OF DEPENDENCIES
Algebraic optimization
18
DB
MG
Transformations
1. Atomization of selection
sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
19
DB
MG
Transformations
1. Atomization of selection
sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
2. Cascading projections
pX(E) ≡ pX (pX,Y(E))
20
DB
MG
Transformations
1. Atomization of selection
sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
2. Cascading projections
pX(E) ≡ pX (pX,Y(E))
3. Anticipation of selection with respect to join
(pushing selection down)
sF (E1 E2) ≡ E1 (sF (E2))
F is a predicate on attributes in E2 only
21
DB
MG
Transformations
22
DB
MG
Transformations
23
DB
MG
Transformations
24
DB
MG
Transformations
25
DB
MG
Transformations
26
DB
MG
Transformations
27
DB
MG
Transformations
28
DB
MG
Transformations
9. Other properties
sF1 V F2(E) ≡ (sF1 (E)) (sF2 (E))
sF1 Ʌ F2(E) ≡ (sF1 (E)) (sF2 (E))
29
DB
MG
Transformations
30
DB
MG
Example
Tables
EMP (Emp#, ………, Dept#, Salary)
DEPT (Dept#, DName,……………)
SQL query
31
DB
MG
Example: Algebraic transformations
32
DB
MG
Example: Algebraic transformations
Prop #1
33
DB
MG
Example: Algebraic transformations
Prop #1
Prop #5
34
DB
MG
Example: Algebraic transformations
Prop #3
35
DB
MG
Example: Algebraic transformations
Prop #3
Prop #2 and #4
36
DB
MG
Example: Query tree
pDept# pDept#,DName
sSalary>1000 DEPT
EMP
37
DB
MG
Example: Cardinalities
38
DB
MG
Database Management Systems
DB
MG
39
Cost based optimization
SQL
QUERY
LEXICAL, SYNTACTIC
AND SEMANTIC DATA
ANALYSIS DICTIONARY
INTERNAL REPRESENTATION
BASED ON RELATIONAL ALGEBRA
ALGEBRAIC
OPTIMIZATION
40
DB
MG
ACCESS PROGRAM SET OF DEPENDENCIES
Cost based optimization
It is based on
Data profiles
statistical information describing data distribution for
tables and intermediate relational expressions
Approximate cost formulas for access operations
Allow evaluating the cost of different alternatives for
executing a relational operator
41
DB
MG
Database Management Systems
Data profiles
DB
MG
42
Table profiles
43
DB
MG
Table profiles
44
DB
MG
Data profiles
45
DB
MG
Database Management Systems
Access operators
DB
MG
46
Query tree
pDept# pDept#,DName
sSalary>1000 DEPT
EMP
47
DB
MG
Query tree
48
DB
MG
Sequential scan
49
DB
MG
Sorting
50
DB
MG
Predicate evaluation
51
DB
MG
B+-tree versus bitmap
Bitmap VS B-Tree
600
500
Disk space (MB)
400
300
200
100
0
0 5 10 15 20 25 30 35 40 45
NK
B-Tree Bitmap
B-tree NRLen(Pointer)
Bitmap NR NK 1 bit Courtesy of Golfarelli, Rizzi,
”Data warehouse, teoria e
Len(Pointer) = 48 bit pratica della progettazione”,
McGraw Hill 2006 52
DB
MG
Predicate evaluation
53
DB
MG
Example: Predicate evaluation
5 F Y Piemonte 0 1 1
1 1 0
0 0 0
0 0 0
1 1 1
DB RID 5 54
MG
Predicate evaluation
55
DB
MG
Join operation
56
DB
MG
Nested loop
a a
internal a
or direct scan
join
attribute 57
DB
MG
Nested loop
58
DB
MG
Nested loop
Efficient when
inner table is small and fits in memory
optimized scan
join attribute in the inner table is indexed
index scan
Execution cost
The nested loop join technique is not symmetric
The execution cost depends on which table takes
the role of inner table
59
DB
MG
Merge scan
right
Left table left scan Right table
scan
A A
a a
b
d
b
b
c
e
e
join
attribute 60
DB
MG
Merge scan
j j
p z
Join
62
DB
MG
Attribute
Hash join
63
DB
MG
Bitmapped join index
65
DB
MG
Bitmapped join
66
DB
MG
Example: Bitmapped join
67
DB
MG
Bitmapped join
2 0 … 1 0 0 1 1
3 0 … 0 1 0 OR 1 = 1
4 1 … 0 0 0 0 0
… … … … … 1 0 1
… … …
68
DB
MG
Bitmapped join
69
DB
MG
Group by
Sort based
Sort on the group by attributes
Next compute aggregate functions on groups
Hash based
Hash function on the group by attributes
Next sort each bucket and compute aggregate
functions
Materialized views may be exploited to improve
the performance of aggregation operations
70
DB
MG
Database Management Systems
DB
MG
71
Cost based optimization
Inputs
Data profiles
Internal representation of the query tree
Output
“Optimal” query execution plan
Set of dependencies
It evaluates the cost of different alternatives for
reading each table
executing each relational operator
It exploits approximate cost formulas for access
operations
72
DB
MG
General approach to optimization
73
DB
MG
General approach to optimization
74
DB
MG
Example
Given 3 tables
R, S, T
Compute the join
R S T
Execution alternatives
4 join techniques to evaluate (for both joins)
3 join orders
In total, at most
4 * 4 * 3 = 48 different alternatives
75
DB
MG
Example
R 1
S 2
T
R 1
S 2
T S 1
T 2
R R 1
T 2
S
R INNER S INNER
T INNER T OUTER
76
DB
MG LEAF NODE
Best execution plan selection
77
DB
MG
Best execution plan selection
78
DB
MG