0% found this document useful (0 votes)
15 views

Lecture09 Optimization Structural

The document discusses conjunctive queries (CQs) in databases, including definitions and examples of different types of CQs, extensions to CQs, and the complexity of query evaluation. It also covers semi-join reductions, which can optimize query evaluation by reducing the size of relations using semijoins, and the Yannakakis algorithm for efficiently evaluating acyclic CQs.

Uploaded by

Miranda
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

Lecture09 Optimization Structural

The document discusses conjunctive queries (CQs) in databases, including definitions and examples of different types of CQs, extensions to CQs, and the complexity of query evaluation. It also covers semi-join reductions, which can optimize query evaluation by reducing the size of relations using semijoins, and the Yannakakis algorithm for efficiently evaluating acyclic CQs.

Uploaded by

Miranda
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 27

CSE 544

Principles of Database
Management Systems

Fall 2016
Lecture 9 – Structural query optimization
Conjunctive Queries

•  Definition:
Q(X) :- R1(X1), R2(X2), ..., Rm(Xm)
•  Same as a single datalog rule
•  Terminology:
–  Atoms
–  Head variables
–  Existential variables

•  CQ = denotes the set of conjunctive queries

CSE 544 - Fall 2016 2


Examples

•  Example of CQ
q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))

q(x) = ∃z.∃u.(R(x,z) ∧ R(z,u) ∧ R(u,y))


•  Examples of non-CQ:
q(x,y) = S(x,y)∧∀z.(R(x,z) à R(y,z))

q(x) = T(x) ∨ ∃z.S(x,z)

3
Types of CQ

•  Full CQ: head variables are all variables


Q(x,y,z,u) :- R(x,y),S(y,z),T(z,u)

•  Boolean CQ: no head variables


Q() :- R(x,y),S(y,z),T(z,u)

•  With or without self-joins:


Q(x,u) :- R(x,y),S(y,z),R(z,u)
Q(x,u) :- R(x,y),S(y,z),T(z,u)

CSE 544 - Fall 2016 4


Extensions

•  With inequalities CQ<:


Q(x) :- R(x,y),S(y,z),T(z,u),y<u

•  With disequalities CQ≠:


Q(x) :- R(x,y),S(y,z),T(z,u),y≠u

•  With aggregates:
Q(x,count(*)) :- R(x,y),S(y,z),T(z,u)
Q(x, sum(u)) :- R(x,y),S(y,z),T(z,u)

CSE 544 - Fall 2016 5


Complexity of Query Evaluation
•  The query evaluation problem is this:
given a query Q and a database D, compute Q(D)

•  Three complexity measures:


–  Data complexity. Fix Q. The complexity is f(|D|)
Variation: f(|Input|, |Output|)
–  Query (or expression) complexity. Fix D. The complexity is f(|Q|)
–  Combined complexity. The complexity if f(|D|,|Q|)

•  Example: data complexity of R ⋈ S

6
Complexity of Query Evaluation
•  The query evaluation problem is this:
given a query Q and a database D, compute Q(D)

•  Three complexity measures:


–  Data complexity. Fix Q. The complexity is f(|D|)
Variation: f(|Input|, |Output|)
–  Query (or expression) complexity. Fix D. The complexity is f(|Q|)
–  Combined complexity. The complexity if f(|D|,|Q|)

•  Example: data complexity of R ⋈ S


–  PTIME in |R|, |S| (trivially so...)

7
Complexity of Query Evaluation
•  The query evaluation problem is this:
given a query Q and a database D, compute Q(D)

•  Three complexity measures:


–  Data complexity. Fix Q. The complexity is f(|D|)
Variation: f(|Input|, |Output|)
–  Query (or expression) complexity. Fix D. The complexity is f(|Q|)
–  Combined complexity. The complexity if f(|D|,|Q|)

•  Example: data complexity of R ⋈ S


–  PTIME in |R|, |S| (trivially so...)
–  Better: O(|R| * |S|)

8
Complexity of Query Evaluation
•  The query evaluation problem is this:
given a query Q and a database D, compute Q(D)

•  Three complexity measures:


–  Data complexity. Fix Q. The complexity is f(|D|)
Variation: f(|Input|, |Output|)
–  Query (or expression) complexity. Fix D. The complexity is f(|Q|)
–  Combined complexity. The complexity if f(|D|,|Q|)

•  Example: data complexity of R ⋈ S


–  PTIME in |R|, |S| (trivially so...)
–  Better: O(|R| * |S|)
–  Even better: O( (|R|+|S|) log (|R|+|S|) + |Output| )

9
Complexity of Query Evaluation
•  The query evaluation problem is this:
given a query Q and a database D, compute Q(D)

•  Three complexity measures:


–  Data complexity. Fix Q. The complexity is f(|D|)
Variation: f(|Input|, |Output|)
–  Query (or expression) complexity. Fix D. The complexity is f(|Q|)
–  Combined complexity. The complexity if f(|D|,|Q|)

•  Example: data complexity of R ⋈ S


–  PTIME in |R|, |S| (trivially so...)
–  Better: O(|R| * |S|)
–  Even better: O( (|R|+|S|) log (|R|+|S|) + |Output| )
•  Discuss more about complexity in class...
10
Question in Class

•  Q(x,w) :- R(x,y),S(y,z),T(z,u),K(u,v),L(v,w)

•  Assume |R|=|S|=|T|=|K|=|L| = N

•  What is the complexity of Q?

CSE 544 - Fall 2016 11


Question in Class

•  Q(x,w) :- R(x,y),S(y,z),T(z,u),K(u,v),L(v,w) Πxw

•  Assume |R|=|S|=|T|=|K|=|L| = N ⨝

•  What is the complexity of Q? ⨝ L

•  What is the complexity of this plan? ⨝ K

⨝ T

CSE 544 - Fall 2016


R S 12
Question in Class

•  Q(x,w) :- R(x,y),S(y,z),T(z,u),K(u,v),L(v,w) Πxw

•  Assume |R|=|S|=|T|=|K|=|L| = N ⨝

•  What is the complexity of Q? ⨝ L

•  What is the complexity of this plan? ⨝ K

•  Can you find a more efficient plan? ⨝ T

CSE 544 - Fall 2016


R S 13
Question in Class
Πxw
•  Push projections down:
What about this complexity?

Πxu
•  Can we still improve?
L(v,w)

Πxu

⨝ K(u,v)
Πxz
T(z,u)

R(x,y) S(y,z)
Law of Semijoins
•  Input: R(A1,…An), S(B1,…,Bm)
•  Output: T(A1,…,An)
Definition: the semi-join operation is
R ⋉ S = Π A1,…,An (R ⨝ S)

•  Data complexity: O(|R| + |S|) ignoring log-factors

•  The law of semijoins is:

R ⨝ S = (R ⋉ S) ⨝ S
CSE 544 - Fall 2016 15
Laws with Semijoins

•  Very important in parallel databases


•  Often combined with Bloom Filters (my plan is to discuss
them in the next lecture)
•  Read pp. 747 in the textbook

•  Also used in query optimization, sometimes called “magic


sets” (see Chaudhuri’s paper)

•  Historical note: magic sets were invented after semi-join


reductions, and the connection became clear only later

CSE 544 - Fall 2016 16


Semijoin Reducer
•  Given a query:
Q = R 1 ⨝ R2 ⨝ . . . ⨝ Rn
•  A semijoin reducer for Q is
Ri1 = Ri1 ⋉ Rj1
Ri2 = Ri2 ⋉ Rj2
.....
Rip = Rip ⋉ Rjp
such that the query is equivalent to:

Q = Rk1 ⨝ Rk2 ⨝ . . . ⨝ Rkn


•  A full reducer is such that no dangling tuples remain

CSE 544 - Fall 2016 17


Example

•  Example:
Q = R(A,B) ⨝ S(B,C)

•  A semijoin reducer is:

R1(A,B) = R(A,B) ⋉ S(B,C)


•  The rewritten query is:

Q = R1(A,B) ⨝ S(B,C)

18
Semijoin Reducer

•  More complex example:


Q = R(A,B) ⨝ S(B,C) ⨝ T(C,D,E)
•  What is a full reducer?

19
Semijoin Reducer

•  More complex example:


Q = R(A,B) ⨝ S(B,C) ⨝ T(C,D,E)
•  A full reducer is:

S’(B,C) := S(B,C) ⋉ R(A,B)


T’(C,D,E) := T(C,D,E) ⋉ S’(B,C)
S’’(B,C) := S’(B,C) ⋉ T’(C,D,E)
R’(A,B) := R (A,B) ⋉ S’’(B,C)

Q = R’(A,B) ⨝ S’’(B,C) ⨝ T’(C,D,E)


20
Practice at Home...

•  Find semi-join reducer for


R(x,y),S(y,z),T(z,u),K(u,v),L(v,w)

CSE 544 - Fall 2016 21


Not All Queries Have Full Reducers

•  Example:

Q = R(A,B) ⨝ S(B,C) ⨝ T(A,C)

•  Can write many different semi-join reducers

•  But no full reducer of length O(1) exists

22
Acyclic Queries

•  Fix a Conjunctive Query without self-joins

•  Q is acyclic if its atoms can be organized in a tree


such that for every variable the set of nodes that
contain that variable form a connected component

R(x,y)
K(z,v) R(x,y)

S(y,z,u)
S(y,z)
L(v,m) T(z,x)

T(y,z,w) R(x,y),S(y,z),T(z,x)
Acyclic CSE 544 - Fall 2016 23
is cyclic
Yannakakis Algorithm

•  Given: acyclic query Q


•  Compute Q on any database in time O(|Input|+|Output|)

•  Step 1: semi-join reduction


–  Pick any root node x in the tree decomposition of Q
–  Do a semi-join reduction sweep from the leaves to x
–  Do a semi-join reduction sweep from x to the leaves
•  Step 2: compute the joins bottom up, with early
projections

CSE 544 - Fall 2016 24


Examples in Class

•  Boolean query: Q() :- ...


R(x,y)
K(z,v)
•  Non-boolean: Q(x,m) :- ...

S(y,z,u)
•  With aggregate: Q(x,sum(m)) :- ... L(v,m)

•  And also: Q(x,count(*)) :- ...


T(y,z,w)

In all cases: runtime = O(|R|+|S|+...+|L| + |Output|)


Testing if Q is Acyclic

•  An ear of Q is an atom R(X) with the following property:


–  Let X’ ⊆ X be the set of join variables (meaning: they occur in at
least one other atom)
–  There exists some other atom S(Y) such that X’ ⊆ Y

•  The GYO algorithm (Graham,Yu,Özsoyoğlu) for testing if


Q is acyclic:
–  While Q has an ear R(X), remove the atom R(X) from the query
–  If all atoms were removed, then Q is acyclic
–  If atoms remain but there is no ear, then Q is cyclic

•  Show example in class


CSE 544 - Fall 2016 26
Computing Cyclic Queries
R(x,y) ρ*=3/2

K(z,v) R(x,y),
S(y,z,u) A(y,s),B(x,s) ρ*=2

K(z,v),F(v,q)
L(v,m) S(y,z,u)
ρ*=1 ρ*=1

T(y,z,w)
ρ*=2
L(v,m)

T(y,z,w),
O(n + |Output|) C(z,t),D(t,p),E(p,y)
O(n2 + |Output|)
ftw = max(ρ*) = 2

Next lecture...

You might also like