ADB Course Chapter_4.1 Query Execution
Overview
Relational operators
Execution methods
CS 245 31
Query Execution Overview
Recall that one of our key principles in
data-intensive systems was declarative APIs
» Specify what you want to compute, not how
Query Execution Overview
Query representation (e.g. SQL)
→ Optimized logical plan
→ Physical plan (code/operators to run)
Many execution methods: per-record exec, vectorization, compilation
Plan Optimization Methods
Rule-based: systematically replace some
expressions with other expressions
» Replace X OR TRUE with TRUE
» Replace M*A + M*B with M*(A+B) for matrices
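A rule such as "replace X OR TRUE with TRUE" can be sketched as a bottom-up rewrite over an expression tree. The classes below (Const, Var, Or) and the simplify function are hypothetical names made up for this illustration, not part of any particular optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: bool


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Or:
    left: object
    right: object


TRUE = Const(True)


def simplify(expr):
    """Apply the rule `X OR TRUE -> TRUE` bottom-up (illustrative only)."""
    if isinstance(expr, Or):
        # Simplify the children first so nested matches are found too.
        left = simplify(expr.left)
        right = simplify(expr.right)
        if left == TRUE or right == TRUE:
            return TRUE
        return Or(left, right)
    return expr


# x OR (y OR TRUE) collapses to TRUE after two rule applications.
plan = Or(Var("x"), Or(Var("y"), TRUE))
simplified = simplify(plan)
print(simplified)
```

An expression with no TRUE operand is returned unchanged, so the rule is safe to apply everywhere in the plan.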
Execution Methods
Interpretation: walk through query plan
operators for each record
Typical RDBMS Execution
SQL query
→ parse → parse tree
→ convert → logical query plan
→ apply rules → “improved” l.q.p
→ estimate result sizes (using statistics) → l.q.p. + sizes
→ {P1, P2, …}
→ {(P1,C1), (P2,C2), ...}
→ pick best → Pi
→ execute → result
The Relational Algebra
Collection of operators over tables (relations)
» Each table has named attributes (fields)
Relational Algebra Operators
Basic set operators:
Intersection: R ∩ S
Union: R ∪ S (consider both the distinct version, set union, and the non-distinct version, bag union)
Difference: R – S
Cartesian Product: R ⨯ S
Relational Algebra Operators
Special query processing operators:
Examples (aggregation operator G):
  _{department}G_{Max(salary)}(Employees)   (max salary per department)
  G_{Max(salary)}(Employees)   (max salary over all employees)
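The two aggregation examples over Employees differ only in whether a grouping attribute is given. A minimal sketch, assuming rows are Python dicts and using made-up employee data; aggregate_max is a hypothetical helper name:

```python
def aggregate_max(rows, value_attr, group_attr=None):
    """Max of value_attr, per group_attr value, or over all rows if None."""
    groups = {}
    for row in rows:
        # With no grouping attribute, every row falls into one group (key None).
        key = row[group_attr] if group_attr is not None else None
        if key not in groups or row[value_attr] > groups[key]:
            groups[key] = row[value_attr]
    return groups


# Made-up sample data for illustration.
employees = [
    {"name": "Ann", "department": "eng", "salary": 120},
    {"name": "Bob", "department": "eng", "salary": 100},
    {"name": "Cy", "department": "ops", "salary": 90},
]

per_department = aggregate_max(employees, "salary", "department")
overall = aggregate_max(employees, "salary")
print(per_department, overall)
```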
Algebraic Properties
Many properties about which combinations
of operators are equivalent
» That’s why it’s called an algebra!
Properties: Unions, Products and Joins
R ∪ S = S ∪ R
R ∪ (S ∪ T) = (R ∪ S) ∪ T
(Tuple order in a relation doesn’t matter: relations are unordered)
R ⨯ S = S ⨯ R
(R ⨯ S) ⨯ T = R ⨯ (S ⨯ T)
(Attribute order in a relation doesn’t matter either)
R ⨝ S = S ⨝ R
(R ⨝ S) ⨝ T = R ⨝ (S ⨝ T)
Properties: Selects
σ_{p∧q}(R) = σ_p(σ_q(R))
σ_{p∨q}(R) = σ_p(R) ∪ σ_q(R)   (with set union; a bag union would double-count tuples that satisfy both p and q)
Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
R∪S=?
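The answer depends on which union is meant. As a sketch, Python's collections.Counter can model a bag, and shows two common candidate definitions of bag union (sum of occurrences and max of occurrences) next to plain set union. Which convention the course adopts is not stated here.

```python
from collections import Counter

# The bags from the slide, stored as element -> number of occurrences.
R = Counter("aabbbc")   # {a,a,b,b,b,c}
S = Counter("bbccd")    # {b,b,c,c,d}

bag_sum = R + S             # occurrences are added
bag_max = R | S             # maximum of the occurrences
set_union = set(R) | set(S) # duplicates ignored entirely

print(sorted(bag_sum.elements()))
print(sorted(bag_max.elements()))
print(sorted(set_union))
```

Under the sum definition b appears five times, under the max definition three times, and under set semantics once.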
Properties: Project
Let: X = set of attributes
Y = set of attributes
Π_{X∪Y}(R) = ?
Properties: σ + ⨝
Let p = predicate with only R attribs
    q = predicate with only S attribs
σ_p(R ⨝ S) = σ_p(R) ⨝ S
σ_q(R ⨝ S) = R ⨝ σ_q(S)
Properties: σ + ⨝
Some rules can be derived:
σ_{p∧q}(R ⨝ S) = ?
σ_{p∧q∧m}(R ⨝ S) = ?
σ_{p∨q}(R ⨝ S) = ?
Prove One, Others for Practice
σ_{p∧q}(R ⨝ S) = σ_p(σ_q(R ⨝ S))   (split the conjunction)
= σ_p(R ⨝ σ_q(S))   (q uses only S attribs)
= σ_p(R) ⨝ σ_q(S)   (p uses only R attribs)
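The derived rule can be checked on a small instance. The sketch below assumes relations are lists of dicts and uses made-up data; natural_join, select and as_bag are hypothetical helpers written for this check, and one passing example is of course not a proof.

```python
def natural_join(r, s):
    """Join tuples that agree on all attributes the two relations share."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


def select(pred, rel):
    return [t for t in rel if pred(t)]


def as_bag(rel):
    """Order-insensitive form, since tuple order in a relation doesn't matter."""
    return sorted(sorted(t.items()) for t in rel)


# Made-up relations R(a, k) and S(k, b).
R = [{"a": 1, "k": 1}, {"a": 2, "k": 1}, {"a": 3, "k": 2}]
S = [{"k": 1, "b": "x"}, {"k": 2, "b": "y"}, {"k": 2, "b": "x"}]


def p(t):
    return t["a"] > 1      # uses only R attributes


def q(t):
    return t["b"] == "x"   # uses only S attributes


# Left side: filter after the join. Right side: push both filters below it.
lhs = select(lambda t: p(t) and q(t), natural_join(R, S))
rhs = natural_join(select(p, R), select(q, S))
print(as_bag(lhs) == as_bag(rhs))
```

The pushed-down form joins two tuples from each input instead of three, which is why optimizers prefer it.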
Properties: Π + σ
Let x = subset of R attributes
    z = attributes in predicate p
        (subset of R attributes)
Π_x(σ_p(R)) = ?
Properties: Π + ⨝
Let x = subset of R attributes
y = subset of S attributes
z = intersection of R,S attributes
Example SQL Query
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);
Parse Tree
[Figure: parse tree rooted at <Query>, with an outer <SFW> node and a nested <SFW> node for the subquery on starName]
Logical Query Plan
Π_title
  σ_{starName=name}
    ⨯
      StarsIn
      Π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar
Improved Logical Query Plan
Π_title
  ⨝_{starName=name}
    StarsIn
    Π_name
      σ_{birthdate LIKE '%1960'}
        MovieStar
Question: Push Π_title to StarsIn?
Estimate Result Sizes
[Figure: the plan tree over StarsIn and Π(σ(MovieStar)), whose intermediate result sizes need to be estimated]
One Physical Plan
[Figure: physical plan over StarsIn and MovieStar]
Another Physical Plan
[Figure: physical plan over StarsIn and MovieStar]
Another Physical Plan
[Figure: physical plan using a sort-merge join of StarsIn and MovieStar]
Physical plan candidates: P1, P2, …, Pn
Costs: C1, C2, …, Cn
Pick best!
Covered in next few lectures!
Now That We Have a Plan,
How Do We Run it?
Several different options that trade off
complexity, setup time & performance
Example: Simple Query
SELECT quantity * price
FROM orders
WHERE productId = 75
Method 1: Interpretation
interface Operator {
  Tuple next();
}
interface Expression {
  Value compute(Tuple in);
}
Example Operator Classes
class TableScan: Operator {
  String tableName;
  Tuple next() {
    // read & return next record from file
  }
}
class Project: Operator {
  Operator parent;
  Expression[] exprs;
  Tuple next() {
    // compute each output expression on the parent's next tuple
    tuple = parent.next();
    fields = [expr.compute(tuple) for expr in exprs];
    return new Tuple(fields);
  }
}
Running Our Query with Interpretation
ops = Project(
  expr = Times(Attr("quantity"), Attr("price")),
  parent = Select(
    expr = Equals(Attr("productId"), Literal(75)),
    parent = TableScan("orders")
  )
);
while (true) {
  Tuple t = ops.next();  // recursively calls Operator.next() and Expression.compute()
  if (t != null) {
    out.write(t);
  } else {
    break;
  }
}
Pros & cons of this approach?
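For reference, the record-at-a-time interpreter can be written out as a small runnable sketch. It assumes tuples are Python dicts and that TableScan reads from an in-memory list instead of a file; the Select operator and the end-of-input handling in Project are filled in here as assumptions, and the order data is made up.

```python
class Attr:
    def __init__(self, name):
        self.name = name

    def compute(self, tup):
        return tup[self.name]


class Literal:
    def __init__(self, value):
        self.value = value

    def compute(self, tup):
        return self.value


class Times:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compute(self, tup):
        return self.left.compute(tup) * self.right.compute(tup)


class Equals:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compute(self, tup):
        return self.left.compute(tup) == self.right.compute(tup)


class TableScan:
    def __init__(self, rows):
        self._rows = iter(rows)   # in-memory stand-in for a file scan

    def next(self):
        return next(self._rows, None)   # None signals end of input


class Select:
    def __init__(self, expr, parent):
        self.expr, self.parent = expr, parent

    def next(self):
        # Pull from the parent until a tuple passes the predicate.
        while True:
            tup = self.parent.next()
            if tup is None or self.expr.compute(tup):
                return tup


class Project:
    def __init__(self, exprs, parent):
        self.exprs, self.parent = exprs, parent

    def next(self):
        tup = self.parent.next()
        if tup is None:
            return None
        return tuple(expr.compute(tup) for expr in self.exprs)


# Made-up rows for the orders table.
orders = [
    {"productId": 75, "quantity": 2, "price": 10.0},
    {"productId": 12, "quantity": 1, "price": 99.0},
    {"productId": 75, "quantity": 1, "price": 5.5},
]

ops = Project(
    exprs=[Times(Attr("quantity"), Attr("price"))],
    parent=Select(
        expr=Equals(Attr("productId"), Literal(75)),
        parent=TableScan(orders),
    ),
)

results = []
while True:
    t = ops.next()
    if t is None:
        break
    results.append(t)
print(results)
```

Every output tuple costs several dynamically dispatched calls (one next() per operator, one compute() per expression node), which is the per-record overhead the other execution methods try to reduce.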
Method 2: Vectorization
Interpreting query plans one record at a time
is simple, but it’s too slow
» Lots of virtual function calls and branches for
each record (recall Jeff Dean’s numbers)
Implementing Vectorization
class TupleBatch {
  // Efficient storage, e.g.
  // schema + column arrays
}
class ValueBatch {
  // Efficient storage
}
interface Operator {
  TupleBatch next();
}
interface Expression {
  ValueBatch compute(TupleBatch in);
}
...
Typical Implementation
Values stored in columnar arrays (e.g. int[])
with a separate bit array to mark nulls
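A minimal sketch of this layout, assuming plain Python lists stand in for the column arrays and a list of booleans stands in for the null bit array (a real engine would pack the bits and use typed arrays). The function names and the sample batch are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ValueBatch:
    values: list   # one column of values
    nulls: list    # nulls[i] is True where values[i] is NULL


@dataclass
class TupleBatch:
    columns: dict  # column name -> ValueBatch


def equals_literal(col, literal):
    """Compare a whole column to a literal; NULL never matches."""
    return [(not is_null) and value == literal
            for value, is_null in zip(col.values, col.nulls)]


def filter_batch(batch, keep):
    """Keep only the rows whose entry in `keep` is True."""
    def pick(items):
        return [item for item, k in zip(items, keep) if k]
    return TupleBatch({name: ValueBatch(pick(c.values), pick(c.nulls))
                       for name, c in batch.columns.items()})


def times(a, b):
    """Multiply two columns; the result is NULL if either input is NULL."""
    nulls = [x or y for x, y in zip(a.nulls, b.nulls)]
    values = [0 if n else x * y for x, y, n in zip(a.values, b.values, nulls)]
    return ValueBatch(values, nulls)


# Made-up batch of four orders; the third quantity is NULL.
orders = TupleBatch({
    "productId": ValueBatch([75, 12, 75, 75], [False] * 4),
    "quantity": ValueBatch([2, 3, 0, 4], [False, False, True, False]),
    "price": ValueBatch([10, 1, 5, 2], [False] * 4),
})

keep = equals_literal(orders.columns["productId"], 75)
matching = filter_batch(orders, keep)
result = times(matching.columns["quantity"], matching.columns["price"])
print(result.values, result.nulls)
```

Each function is called once per batch and loops over a whole column, so the dispatch cost is paid per batch instead of per record.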
Pros & Cons of Vectorization
+ Faster than record-at-a-time if the query
processes many records
Method 3: Compilation
Turn the query into executable code
Compilation Example
Π_{quantity*price}(σ_{productId=75}(orders))
– Complex to implement
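As a sketch of the idea, the plan for the example query can be turned into the source of one specialized function and then compiled. Generating Python source and loading it with exec is an assumption made for illustration; compile_query and the sample rows are hypothetical.

```python
def compile_query(filter_attr, filter_value, left_attr, right_attr):
    """Generate and compile code for
    project(left_attr * right_attr, select(filter_attr == filter_value, rows))."""
    source = (
        "def run(rows):\n"
        "    out = []\n"
        "    for row in rows:\n"
        f"        if row[{filter_attr!r}] == {filter_value!r}:\n"
        f"            out.append(row[{left_attr!r}] * row[{right_attr!r}])\n"
        "    return out\n"
    )
    namespace = {}
    exec(source, namespace)   # compile the generated source into a function
    return namespace["run"], source


# Made-up rows for the orders table.
orders = [
    {"productId": 75, "quantity": 2, "price": 10.0},
    {"productId": 12, "quantity": 1, "price": 99.0},
    {"productId": 75, "quantity": 1, "price": 5.5},
]

run, source = compile_query("productId", 75, "quantity", "price")
print(source)
compiled_results = run(orders)
print(compiled_results)
```

The generated loop contains no operator or expression objects: the filter and the multiplication are inlined, at the price of a code-generation step before the query can start.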
What’s Used Today?
Depends on context & other bottlenecks