ADB Course Chapter_4.1 Query Execution

Uploaded by

chebl6001

Query Execution

Overview

Relational operators

Execution methods

CS 245 31
Query Execution Overview
Recall that one of our key principles in data-intensive systems was declarative APIs
» Specify what you want to compute, not how

We saw how these can translate into many storage strategies

How to execute queries in a declarative API?

Query Execution Overview
Query representation (e.g. SQL)
  ↓
Logical query plan (e.g. relational algebra)
  ↓
Optimized logical plan
  ↓
Physical plan (code/operators to run)

Many execution methods: per-record exec, vectorization, compilation

Plan Optimization Methods
Rule-based: systematically replace some expressions with other expressions
» Replace X OR TRUE with TRUE
» Replace M*A + M*B with M*(A+B) for matrices

Cost-based: propose several execution plans and pick best based on a cost model

Adaptive: update execution plan at runtime
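To make the rule-based bullet concrete, here is a minimal sketch of a rewriter that applies the single rule "X OR TRUE becomes TRUE" bottom-up. The classes `Const`, `Var`, `Or` and the function `simplify` are hypothetical names invented for this illustration; they are not taken from the slides or from any real optimizer.

```python
from dataclasses import dataclass


# Hypothetical mini expression tree, for illustration only.
@dataclass(frozen=True)
class Const:
    value: bool


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Or:
    left: object
    right: object


TRUE = Const(True)


def simplify(expr):
    """Apply the rewrite rule `X OR TRUE -> TRUE` bottom-up."""
    if isinstance(expr, Or):
        left, right = simplify(expr.left), simplify(expr.right)
        if left == TRUE or right == TRUE:
            return TRUE
        return Or(left, right)
    return expr  # constants and variables are left unchanged


plan = Or(Var("x"), Or(Var("y"), Const(True)))
simplified = simplify(plan)
print(simplified)
```

A rule-based optimizer would typically hold many such rules and apply them until none matches; this sketch only shows the shape of one rule.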

Execution Methods
Interpretation: walk through query plan
operators for each record

Vectorization: walk through in batches

Compilation: generate code (like System R)

Typical RDBMS Execution
SQL query
  → parse → parse tree
  → convert → logical query plan
  → apply rules → “improved” l.q.p.
  → estimate result sizes (using statistics) → l.q.p. + sizes
  → consider physical plans → {P1, P2, …}
  → estimate costs → {(P1,C1), (P2,C2), ...}
  → pick best → Pi
  → execute → result

Query Execution
Overview

Relational operators

Execution methods

The Relational Algebra
Collection of operators over tables (relations)
» Each table has named attributes (fields)

Codd’s original RA: tables are sets of tuples (unordered and tuples cannot repeat)

SQL’s RA: tables are bags (multisets) of tuples; unordered but each tuple may repeat

Relational Algebra Operators
Basic set operators:

Intersection: R ∩ S

Union: R ∪ S for tables with same schema
» Consider both distinct (set union) and non-distinct (bag union)

Difference: R – S

Cartesian Product: R ⨯ S = { (r, s) | r ∈ R, s ∈ S }

Relational Algebra Operators
Special query processing operators:

Selection: σ_{condition}(R) = { r ∈ R | condition(r) is true }

Projection: Π_{expressions}(R) = { expressions(r) | r ∈ R }

Natural Join: R ⨝ S = { (r, s) ∈ R ⨯ S | r.key = s.key }, where key is the set of common fields
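The three operators above can be sketched over Python lists of dicts, which behave like bags because duplicates are kept. The function names and the sample `employees` and `departments` rows are assumptions made up for this example, not part of the course material.

```python
def select(rel, condition):
    """Selection: keep the tuples for which condition(r) is true."""
    return [r for r in rel if condition(r)]


def project(rel, exprs):
    """Projection: exprs maps each output field name to a function of the input tuple."""
    return [{name: f(r) for name, f in exprs.items()} for r in rel]


def natural_join(R, S):
    """Natural join: pair up tuples that agree on all common fields (nested loops)."""
    out = []
    for r in R:
        for s in S:
            common = r.keys() & s.keys()
            if all(r[k] == s[k] for k in common):
                out.append({**r, **s})
    return out


# Hypothetical sample data.
employees = [
    {"name": "Ann", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "ops", "salary": 90},
    {"name": "Cy", "dept": "eng", "salary": 100},
]
departments = [{"dept": "eng", "floor": 3}, {"dept": "ops", "floor": 1}]

joined = natural_join(employees, departments)
well_paid = select(joined, lambda r: r["salary"] >= 100)
result = project(well_paid, {"name": lambda r: r["name"], "floor": lambda r: r["floor"]})
print(result)
```

The nested-loop join is the most direct reading of the set-builder definition; practical join algorithms avoid enumerating the full Cartesian product.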

Relational Algebra Operators
Special query processing operators:

Aggregation: _{keys}G_{agg(attr)}(R), equivalent to
  SELECT agg(attr)
  FROM R
  GROUP BY keys

Examples: _{department}G_{Max(salary)}(Employees)
          G_{Max(salary)}(Employees)
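As a rough illustration of the two examples, the sketch below groups tuples by a list of key attributes and applies an aggregate function to each group. The helper name `aggregate` and the sample rows are hypothetical. Passing an empty key list yields a single group, which mirrors the ungrouped example.

```python
from collections import defaultdict


def aggregate(rel, keys, agg, attr):
    """Group rel by the attributes in keys and apply agg to attr within each group."""
    groups = defaultdict(list)
    for r in rel:
        groups[tuple(r[k] for k in keys)].append(r[attr])
    return {group: agg(values) for group, values in groups.items()}


# Hypothetical sample data.
employees = [
    {"name": "Ann", "department": "eng", "salary": 120},
    {"name": "Bob", "department": "ops", "salary": 90},
    {"name": "Cy", "department": "eng", "salary": 100},
]

by_department = aggregate(employees, ["department"], max, "salary")
overall = aggregate(employees, [], max, "salary")
print(by_department, overall)
```

One difference from SQL worth keeping in mind: on an empty input with no grouping keys this sketch returns an empty dict, whereas SQL returns a single row with a NULL aggregate.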

Algebraic Properties
Many properties about which combinations
of operators are equivalent
» That’s why it’s called an algebra!

Properties: Unions, Products and Joins
R ∪ S = S ∪ R
R ∪ (S ∪ T) = (R ∪ S) ∪ T
» Tuple order in a relation doesn’t matter (unordered)

R ⨯ S = S ⨯ R
(R ⨯ S) ⨯ T = R ⨯ (S ⨯ T)
» Attribute order in a relation doesn’t matter either

R ⨝ S = S ⨝ R
(R ⨝ S) ⨝ T = R ⨝ (S ⨝ T)

Properties: Selects
σ_{p∧q}(R) = σ_p(σ_q(R))

σ_{p∨q}(R) = σ_p(R) ∪ σ_q(R)
» Careful with repeated elements: a tuple that satisfies both p and q appears on both sides of the union

Bags vs. Sets
R = {a,a,b,b,b,c}

S = {b,b,c,c,d}

R ∪ S = ?

• Option 1: SUM of counts
  R ∪ S = {a,a,b,b,b,b,b,c,c,c,d}
• Option 2: MAX of counts
  R ∪ S = {a,a,b,b,b,c,c,d}

Executive Decision
Use “SUM” option for bag unions

Some rules that work for set unions cannot be used for bags
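The two union options can be checked with Python's `collections.Counter`, whose `+` operator adds counts and whose `|` operator takes the maximum count. The second half is a sketch of one rule that stops working under the SUM option: splitting a disjunctive selection into a union. The predicates `p` and `q` are made up for the illustration.

```python
from collections import Counter

R = Counter("aabbbc")  # {a,a,b,b,b,c}
S = Counter("bbccd")   # {b,b,c,c,d}

union_sum = R + S  # Option 1: SUM of counts
union_max = R | S  # Option 2: MAX of counts


def select(bag, pred):
    """Bag selection: keep each qualifying element with its full count."""
    return Counter({x: n for x, n in bag.items() if pred(x)})


# Hypothetical predicates that overlap on "b".
p = lambda x: x in "ab"
q = lambda x: x in "bc"

lhs = select(R, lambda x: p(x) or q(x))  # selection with p OR q
rhs = select(R, p) + select(R, q)        # SUM-union of the two selections

print(sorted(union_sum.elements()), sorted(union_max.elements()))
print(lhs["b"], rhs["b"])  # "b" is counted twice on the right-hand side
```

With SUM semantics the right-hand side double-counts every element that satisfies both predicates, so the set identity no longer holds for bags.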

Properties: Project
Let: X = set of attributes
     Y = set of attributes

Π_{X∪Y}(R) = Π_X(Π_Y(R)) does not hold in general: Π_Y(R) has already discarded any attributes of X that are not in Y

What does hold: Π_X(Π_Y(R)) = Π_X(R) when X ⊆ Y

Properties: σ + ⨝
Let p = predicate with only R attribs
    q = predicate with only S attribs
    m = predicate with only R, S attribs

σ_p(R ⨝ S) = σ_p(R) ⨝ S

σ_q(R ⨝ S) = R ⨝ σ_q(S)

Some rules can be derived:

σ_{p∧q}(R ⨝ S) = σ_p(R) ⨝ σ_q(S)

σ_{p∧q∧m}(R ⨝ S) = σ_m(σ_p(R) ⨝ σ_q(S))

σ_{p∨q}(R ⨝ S) = (σ_p(R) ⨝ S) ∪ (R ⨝ σ_q(S))

Prove One, Others for Practice
σ_{p∧q}(R ⨝ S) = σ_p(σ_q(R ⨝ S))    (split the conjunction)

               = σ_p(R ⨝ σ_q(S))    (push σ_q down to S)

               = σ_p(R) ⨝ σ_q(S)    (push σ_p down to R)
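The identity proved above can be sanity-checked on a small example. The sketch below filters after the join and before the join and compares the results as bags. The helpers `natural_join`, `select`, `as_bag` and the sample relations are assumptions for this illustration. A check on one input is not a proof, but it shows why the pushed-down form is attractive: the join sees fewer tuples.

```python
from collections import Counter


def natural_join(R, S):
    """Nested-loop natural join on all common fields."""
    return [
        {**r, **s}
        for r in R
        for s in S
        if all(r[k] == s[k] for k in r.keys() & s.keys())
    ]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def as_bag(rel):
    """Order-insensitive, duplicate-preserving view of a relation."""
    return Counter(tuple(sorted(t.items())) for t in rel)


# Hypothetical sample relations sharing the field "k".
R = [{"k": 1, "a": 5}, {"k": 2, "a": 7}, {"k": 2, "a": 1}]
S = [{"k": 1, "b": "x"}, {"k": 2, "b": "y"}, {"k": 2, "b": "z"}]

p = lambda t: t["a"] > 2     # uses only R attributes
q = lambda t: t["b"] != "z"  # uses only S attributes

late = select(natural_join(R, S), lambda t: p(t) and q(t))  # filter after the join
early = natural_join(select(R, p), select(S, q))            # filters pushed down

print(len(natural_join(R, S)), len(late), len(early))
```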

Properties: Π + σ
Let x = subset of R attributes
    z = attributes in predicate p (subset of R attributes)

Π_x(σ_p(R)) = σ_p(Π_x(R)) holds only if z ⊆ x; otherwise p refers to attributes that Π_x has removed

In general: Π_x(σ_p(R)) = Π_x(σ_p(Π_{x∪z}(R)))

Properties: Π + ⨝
Let x = subset of R attributes
    y = subset of S attributes
    z = intersection of R,S attributes

Π_{x∪y}(R ⨝ S) = Π_{x∪y}((Π_{x∪z}(R)) ⨝ (Π_{y∪z}(S)))

Example SQL Query
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);

(Find the movies with stars born in 1960)
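To see the example query run end to end, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table contents are invented sample rows, the text format chosen for `birthdate` is an assumption, and the `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StarsIn (title TEXT, starName TEXT);
    CREATE TABLE MovieStar (name TEXT, birthdate TEXT);
    -- Hypothetical sample rows.
    INSERT INTO StarsIn VALUES
        ('Movie A', 'Star One'), ('Movie B', 'Star Two'), ('Movie C', 'Star One');
    INSERT INTO MovieStar VALUES
        ('Star One', '04/12/1960'), ('Star Two', '09/30/1975');
    """
)

rows = conn.execute(
    """
    SELECT title
    FROM StarsIn
    WHERE starName IN (
        SELECT name
        FROM MovieStar
        WHERE birthdate LIKE '%1960'
    )
    ORDER BY title
    """
).fetchall()
print(rows)
conn.close()
```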

Parse Tree
<Query>
  <SFW>
    SELECT <SelList>
      <Attribute>
        title
    FROM <FromList>
      <RelName>
        StarsIn
    WHERE <Condition>
      <Tuple>
        <Attribute>
          starName
      IN
      <Query>
        ( <Query> )
          <SFW>
            SELECT <SelList>
              <Attribute>
                name
            FROM <FromList>
              <RelName>
                MovieStar
            WHERE <Condition>
              <Attribute>
                birthdate
              LIKE
              <Pattern>
                '%1960'

Logical Query Plan

Π_{title}
  σ_{starName=name}
    ⨯
      StarsIn
      Π_{name}
        σ_{birthdate LIKE '%1960'}
          MovieStar

Improved Logical Query Plan
Π_{title}
  ⨝_{starName=name}
    StarsIn
    Π_{name}
      σ_{birthdate LIKE '%1960'}
        MovieStar

Question: Push Π_{title} to StarsIn?

Estimate Result Sizes
Need expected size of the intermediate results in the plan:
  ⨝
    StarsIn
    Π
      σ
        MovieStar

One Physical Plan

Hash join
  Seq scan: StarsIn
  Index scan: MovieStar
Join parameters: join order, memory size, project attributes, ...
Scan parameters: select condition, ...

Another Physical Plan

Hash join
  Index scan: StarsIn
  Seq scan: MovieStar
Join parameters: join order, memory size, project attributes, ...
Scan parameters: select condition, ...

Another Physical Plan

Sort-merge join
  Seq scan: StarsIn
  Seq scan: MovieStar

Which plan is likely to be better?
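The plans above differ in the join algorithm they use. As a rough sketch of what the two algorithms do, the functions below implement an in-memory hash join and a sort-merge join that produce the same tuples. The function names, parameters and sample rows are hypothetical. Real implementations also deal with memory limits, disk I/O and indexes, which is exactly what the cost model has to account for.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    return [{**b, **p} for p in probe for b in table.get(p[probe_key], [])]


def sort_merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted(left, key=lambda t: t[left_key])
    right = sorted(right, key=lambda t: t[right_key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right tuples with this key, then pair it
            # with every left tuple that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            while i < len(left) and left[i][left_key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample rows; born_1960 stands for the already filtered MovieStar side.
stars_in = [
    {"title": "Movie A", "starName": "Star One"},
    {"title": "Movie B", "starName": "Star Two"},
    {"title": "Movie C", "starName": "Star One"},
]
born_1960 = [{"name": "Star One"}]

via_hash = hash_join(born_1960, stars_in, "name", "starName")
via_merge = sort_merge_join(stars_in, born_1960, "starName", "name")
print(sorted(t["title"] for t in via_hash))
```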


Estimating Plan Costs
Logical plan
  ↓
Physical plan candidates: P1 P2 … Pn
  ↓
Costs: C1 C2 … Cn
  ↓
Pick best!

Covered in next few lectures!

Query Execution
Overview

Relational operators

Execution methods

Now That We Have a Plan, How Do We Run It?
Several different options that trade off complexity, setup time & performance

Example: Simple Query
SELECT quantity * price
FROM orders
WHERE productId = 75

Π_{quantity*price}(σ_{productId=75}(orders))

Method 1: Interpretation
interface Operator {
  Tuple next();
}

class TableScan: Operator {
  String tableName;
}

class Select: Operator {
  Operator parent;
  Expression condition;
}

class Project: Operator {
  Operator parent;
  Expression[] exprs;
}

interface Expression {
  Value compute(Tuple in);
}

class Attribute: Expression {
  String name;
}

class Times: Expression {
  Expression left, right;
}

class Equals: Expression {
  Expression left, right;
}

Example Expression Classes
class Attribute: Expression {
  String name;  // probably better to use a numeric field ID instead

  Value compute(Tuple in) {
    return in.getField(name);
  }
}

class Times: Expression {
  Expression left, right;

  Value compute(Tuple in) {
    return left.compute(in) * right.compute(in);
  }
}

Example Operator Classes
class TableScan: Operator {
  String tableName;

  Tuple next() {
    // read & return next record from file (null at end of table)
  }
}

class Project: Operator {
  Operator parent;
  Expression[] exprs;

  Tuple next() {
    tuple = parent.next();
    if (tuple == null) return null;  // end of input
    fields = [expr.compute(tuple) for expr in exprs];
    return new Tuple(fields);
  }
}

Running Our Query with Interpretation
ops = Project(
  expr = Times(Attr("quantity"), Attr("price")),
  parent = Select(
    expr = Equals(Attr("productId"), Literal(75)),
    parent = TableScan("orders")
  )
);

// Each call recursively calls Operator.next() and Expression.compute()
while (true) {
  Tuple t = ops.next();
  if (t != null) {
    out.write(t);
  } else {
    break;
  }
}

Pros & cons of this approach?
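The pseudocode on the last few slides can be turned into a small runnable sketch. The Python below mirrors the Operator and Expression classes, with a few assumptions that are not in the slides: `TableScan` reads from an in-memory list instead of a file, tuples are dicts, `next()` returns `None` at end of input, and the `orders` rows are invented sample data.

```python
class Attr:
    def __init__(self, name):
        self.name = name

    def compute(self, t):
        return t[self.name]


class Literal:
    def __init__(self, value):
        self.value = value

    def compute(self, t):
        return self.value


class Times:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compute(self, t):
        return self.left.compute(t) * self.right.compute(t)


class Equals:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compute(self, t):
        return self.left.compute(t) == self.right.compute(t)


class TableScan:
    def __init__(self, rows):
        self._rows = iter(rows)  # assumption: an in-memory list stands in for a file

    def next(self):
        return next(self._rows, None)  # None signals end of input


class Select:
    def __init__(self, condition, parent):
        self.condition, self.parent = condition, parent

    def next(self):
        # Pull from the parent until a tuple qualifies or the input ends.
        while True:
            t = self.parent.next()
            if t is None or self.condition.compute(t):
                return t


class Project:
    def __init__(self, exprs, parent):
        self.exprs, self.parent = exprs, parent

    def next(self):
        t = self.parent.next()
        if t is None:
            return None
        return tuple(e.compute(t) for e in self.exprs)


# Hypothetical sample table.
orders = [
    {"productId": 75, "quantity": 2, "price": 10.0},
    {"productId": 12, "quantity": 5, "price": 3.0},
    {"productId": 75, "quantity": 1, "price": 4.5},
]

ops = Project(
    exprs=[Times(Attr("quantity"), Attr("price"))],
    parent=Select(
        condition=Equals(Attr("productId"), Literal(75)),
        parent=TableScan(orders),
    ),
)

results = []
while True:
    t = ops.next()
    if t is None:
        break
    results.append(t)
print(results)
```

Every output tuple costs several method calls and attribute lookups, which is the per-record overhead that vectorization and compilation try to reduce.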
Method 2: Vectorization
Interpreting query plans one record at a time is simple, but it’s too slow
» Lots of virtual function calls and branches for each record (recall Jeff Dean’s numbers)

Keep recursive interpretation, but make Operators and Expressions run on batches

Implementing Vectorization
class TupleBatch {
  // Efficient storage, e.g.
  // schema + column arrays
}

class ValueBatch {
  // Efficient storage
}

interface Operator {
  TupleBatch next();
}

interface Expression {
  ValueBatch compute(TupleBatch in);
}

class Select: Operator {
  Operator parent;
  Expression condition;
}

class Times: Expression {
  Expression left, right;
}

...

Typical Implementation
Values stored in columnar arrays (e.g. int[]) with a separate bit array to mark nulls

Tuple batches fit in L1 or L2 cache

Operators use SIMD instructions to update both values and null fields without branching
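The batch-at-a-time idea can be sketched in plain Python with column lists and a boolean mask, assuming a hypothetical batch size of 4 and invented sample columns. Python lists give none of the cache or SIMD benefits described above, so this shows only the control flow: one function call per expression per batch instead of one per record.

```python
BATCH_SIZE = 4  # assumption: tiny batches so the example stays readable


def scan_batches(columns, batch_size=BATCH_SIZE):
    """Yield batches as dicts that map column name to a slice of that column."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        yield {name: col[start:start + batch_size] for name, col in columns.items()}


def equals_const(batch, name, const):
    """Vectorized comparison: one boolean per row in the batch."""
    return [v == const for v in batch[name]]


def times(batch, left, right):
    """Vectorized multiply over two columns of the batch."""
    return [a * b for a, b in zip(batch[left], batch[right])]


def run(columns):
    out = []
    for batch in scan_batches(columns):
        mask = equals_const(batch, "productId", 75)
        # The product is computed for every row, including rows the mask rejects.
        product = times(batch, "quantity", "price")
        out.extend(v for v, keep in zip(product, mask) if keep)
    return out


# Hypothetical columnar table.
columns = {
    "productId": [75, 12, 75, 75, 3],
    "quantity": [2, 1, 1, 3, 9],
    "price": [10, 5, 4, 2, 1],
}
result = run(columns)
print(result)
```

Note that the multiply does work for rows that the selection discards. That wasted work grows as the query becomes more selective.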

Pros & Cons of Vectorization
+ Faster than record-at-a-time if the query processes many records

+ Relatively simple to implement

– Lots of nulls in batches if query is selective

– Data travels between CPU & cache a lot

Method 3: Compilation
Turn the query into executable code

Compilation Example
Π_{quantity*price}(σ_{productId=75}(orders))

// OrdersTuple: generated class with the right field types for orders table
class MyQuery {
  void run() {
    Iterable<OrdersTuple> in = openTable("orders");
    for (OrdersTuple t : in) {
      if (t.productId == 75) {
        out.write(new Tuple(t.quantity * t.price));
      }
    }
  }
}

Can also theoretically generate vectorized code
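As a toy illustration of the same idea in Python, the sketch below builds the source text of a query-specific function and compiles it with the built-in `compile` and `exec`. The helper `compile_query`, its parameters and the sample rows are hypothetical. A real system would generate code from a validated plan rather than from raw strings, since running `exec` on untrusted text is unsafe.

```python
def compile_query(table_name, predicate_src, expr_src):
    """Generate and compile a function specialized to one scan-filter-project query."""
    source = (
        "def run(tables):\n"
        "    out = []\n"
        f"    for t in tables[{table_name!r}]:\n"
        f"        if {predicate_src}:\n"
        f"            out.append({expr_src})\n"
        "    return out\n"
    )
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["run"], source


run, source = compile_query(
    "orders",
    "t['productId'] == 75",
    "t['quantity'] * t['price']",
)

# Hypothetical sample table.
tables = {
    "orders": [
        {"productId": 75, "quantity": 2, "price": 10.0},
        {"productId": 12, "quantity": 5, "price": 3.0},
        {"productId": 75, "quantity": 1, "price": 4.5},
    ]
}
result = run(tables)
print(source)
print(result)
```

The generated function contains no operator or expression objects at all: the plan has been folded into a single loop, which is the main attraction of compilation.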
Pros & Cons of Compilation
+ Potential to get fastest possible execution

+ Leverage existing work in compilers

– Complex to implement

– Compilation takes time

– Generated code may not match hand-written code

What’s Used Today?
Depends on context & other bottlenecks

Transactional databases (e.g. MySQL): mostly record-at-a-time interpretation

Analytical systems (Vertica, Spark SQL): vectorization, sometimes compilation

ML libs (TensorFlow): mostly vectorization (the records are vectors!), some compilation