(Slide) Containment Conjunctive Queries

Slides on Containment of Conjunctive Queries

Uploaded by

thbinhqn

Conjunctive Queries

= safe Datalog rules:

H :- G1 & ... & Gn

• Most common form of query; equivalent to select-project-join queries.
• Useful for optimization of active elements, e.g., checking distributed constraints, maintaining materialized views.
• Useful for information integration.

Applying a CQ to a Database

If Q is a CQ, and D is a database of EDB facts, then Q(D) is the set of heads of Q that we get when we:
• Substitute constants for variables in the body of Q in all possible ways.
• Require all subgoals to become true.

Example
p(X,Y) :- q(X,Z) & q(Z,Y)

• EDB = {q(1,2), q(2,3), q(3,4)}.
• Only substitutions that make both subgoals true:
  1. X → 1, Y → 3, Z → 2.
  2. X → 2, Y → 4, Z → 3.
• These yield heads p(1,3) and p(2,4).
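
A minimal sketch of this definition in Python, not part of the original slides. The tuple encoding of atoms and the name `evaluate` are assumptions made for illustration, and every argument is assumed to be a variable. It tries every substitution of constants for variables and keeps the heads whose subgoals all become EDB facts.

```python
from itertools import product

def evaluate(head, body, db):
    """Return the set of head facts obtained by applying a CQ to a database.

    Atoms are tuples such as ("q", "X", "Z"); facts are tuples such as ("q", 1, 2).
    """
    variables = sorted({v for atom in body for v in atom[1:]})
    constants = sorted({c for fact in db for c in fact[1:]})
    result = set()
    # Substitute constants for variables in all possible ways.
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))
        ground = lambda atom: (atom[0],) + tuple(sub[v] for v in atom[1:])
        # Keep the head only if every subgoal becomes a fact of the database.
        if all(ground(atom) in db for atom in body):
            result.add(ground(head))
    return result

edb = {("q", 1, 2), ("q", 2, 3), ("q", 3, 4)}
answers = evaluate(("p", "X", "Y"), [("q", "X", "Z"), ("q", "Z", "Y")], edb)
print(sorted(answers))  # [('p', 1, 3), ('p', 2, 4)]
```

The enumeration is exponential in the number of variables, which is acceptable for slide-sized queries but not a realistic evaluation strategy.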

Containment
Q1 ⊆ Q2 iff for every database D, Q1(D) ⊆ Q2(D).
• Containment problem is NP-complete, but not a "hard" problem in practical situations (short queries, few pairs of subgoals with same predicate).
• Function symbols do not make problems more difficult.
• Adding negated subgoals and/or arithmetic subgoals, e.g., X < Y, makes things more complex, but there are important special cases.

Example

A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)

• Claim: B ⊆ A.
• In proof, suppose p(x,y) is in B(D). Then there is some w such that r(x,w), b(w,w), and r(w,y) are in D.
• In A, make the substitution X → x, Y → y, W → w, Z → w.
• Thus, the head of A becomes p(x,y), and all subgoals of A are in D.
• Thus, p(x,y) is also in A(D), proving B ⊆ A.

Testing Containment of CQ's

1. Containment mappings.
2. Canonical databases.
• Similar for basic CQ case, but (2) is useful for more general cases like negated subgoals.

Containment Mappings

Mapping from variables of CQ Q2 to variables of CQ Q1 such that
1. Head of Q2 becomes head of Q1.
2. Each subgoal of Q2 becomes some subgoal of Q1.
   - It is not necessary that every subgoal of Q1 is the target of some subgoal of Q2.

Example

A, B as above:
A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)

• Containment mapping from A to B: X → X, Y → Y, W → W, Z → W.
• No containment mapping from B to A. Subgoal b(W,W) in B can only go to b(W,Z) in A. That would require both W → W and W → Z.

Example
C1: p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
C2: p(X) :- a(X,Y) & a(Y,X)

• Containment mapping from C1 to C2: X → X, Y → Y, Z → X, W → Y.
• No containment mapping from C2 to C1. Proof:
  a) X → X required for head.
  b) Thus, first subgoal of C2 must map to first subgoal of C1; Y must map to Y.
  c) Similarly, 2nd subgoal of C2 must map to 2nd subgoal of C1, so X must map to Z.
  d) But we already found X maps to X.
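
The two examples can be checked mechanically. The sketch below is an illustration, not from the slides: it searches exhaustively for a containment mapping. The name `containment_mapping` and the tuple encoding of atoms are assumptions, and all arguments are taken to be variables.

```python
from itertools import product

def containment_mapping(src, dst):
    """Return a containment mapping from CQ `src` to CQ `dst`, or None.

    Each CQ is a pair (head, body); atoms are tuples like ("r", "X", "W").
    """
    (src_head, src_body), (dst_head, dst_body) = src, dst
    src_vars = sorted({v for atom in [src_head] + src_body for v in atom[1:]})
    dst_vars = sorted({v for atom in [dst_head] + dst_body for v in atom[1:]})
    # Try every assignment of variables of dst to the variables of src.
    for images in product(dst_vars, repeat=len(src_vars)):
        m = dict(zip(src_vars, images))
        apply = lambda atom: (atom[0],) + tuple(m[v] for v in atom[1:])
        # Condition 1: head becomes head. Condition 2: each subgoal becomes a subgoal.
        if apply(src_head) == dst_head and all(apply(a) in dst_body for a in src_body):
            return m
    return None

A = (("p", "X", "Y"), [("r", "X", "W"), ("b", "W", "Z"), ("r", "Z", "Y")])
B = (("p", "X", "Y"), [("r", "X", "W"), ("b", "W", "W"), ("r", "W", "Y")])
C1 = (("p", "X"), [("a", "X", "Y"), ("a", "Y", "Z"), ("a", "Z", "W")])
C2 = (("p", "X"), [("a", "X", "Y"), ("a", "Y", "X")])

print(containment_mapping(A, B))    # {'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'W'}
print(containment_mapping(B, A))    # None
print(containment_mapping(C1, C2))  # {'W': 'Y', 'X': 'X', 'Y': 'Y', 'Z': 'X'}
print(containment_mapping(C2, C1))  # None
```

Exhaustive search matches the NP-completeness of the general problem; it is meant only to confirm small examples by machine.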

Containment Mapping Theorem


Q1 ⊆ Q2 iff there exists a containment mapping from Q2 to Q1.

Proof (If)
Let μ: Q2 → Q1 be a containment mapping. Let D be any DB.
• Every tuple t in Q1(D) is produced by some substitution σ on the variables of Q1 that makes Q1's subgoals all become facts in D.
• Claim: σ∘μ is a substitution for variables of Q2 that produces t (Fi is a subgoal of Q2, Gj a subgoal of Q1, and H2, H1 are the heads).
  1. σ∘μ(Fi) = σ(some Gj). Therefore, it is in D.
  2. σ∘μ(H2) = σ(H1) = t.
• Thus, every t in Q1(D) is also in Q2(D); i.e., Q1 ⊆ Q2.

Proof (Only If)

Key idea: frozen CQ.
1. Create a unique constant for each variable of the CQ Q.
2. Frozen Q is a database consisting of all the subgoals of Q, with the chosen constants substituted for variables.

Example
p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)

Let x be the constant for X, etc. The relation for predicate a consists of the three tuples (x,y), (y,z), and (z,w).

Proof (Only If) Continued


Let Q1 ⊆ Q2. Let database D be the frozen Q1.
• Q1(D) contains t, the "frozen" head of Q1.
  - Sounds gruesome, but the reason is that we can use the substitution in which each variable of Q1 is replaced by its corresponding constant.
• Since Q1 ⊆ Q2, Q2(D) must also contain t.
• Let τ be the substitution of constants from D for the variables of Q2 that makes each subgoal of Q2 a tuple of D and yields t as the head.
• Let ρ be the substitution that maps constants of D to their unique, corresponding variable of Q1.

Q2: E :- F1 & ... & Fm(X,Y)
Q1: H :- G1 & ... & Gi(A,B) & ...
[Figure: τ maps the head E of Q2 to t and the subgoal Fm(X,Y) to the tuple (a,b) of D; ρ maps (a,b) back to the subgoal Gi(A,B) of Q1.]

• ρ∘τ is a containment mapping from Q2 to Q1 because:
  a) The head of Q2 is mapped by τ to t, and t is the frozen head of Q1, so ρ∘τ maps the head of Q2 to the "unfrozen" t, that is, the head of Q1.
  b) Each subgoal Fi of Q2 is mapped by τ to some tuple of D, which is a frozen version of some subgoal Gj of Q1. Then ρ∘τ maps Fi to the unfrozen tuple, that is, to Gj itself.

Dual View of Containment Mappings

A containment mapping, defined as a mapping on variables, induces a mapping on subgoals.
• Therefore, we can alternatively define a containment mapping as a function on subgoals, thus inducing a mapping on variables.
• The containment mapping condition becomes: the subgoal mapping does not cause a variable to be mapped to two different variables or constants, nor cause a constant to be mapped to a variable or a constant other than itself.

Example
Again consider
A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)

• Previously, we found the containment mapping X → X, Y → Y, W → W, Z → W from A to B.
• We could as well describe this mapping as r(X,W) → r(X,W), b(W,Z) → b(W,W), and r(Z,Y) → r(W,Y).
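
The dual view can be sketched in a few lines of Python. This is an illustration only: the helper name `induced_variable_mapping` is hypothetical, atoms are encoded as tuples, and constants are not handled (every argument is assumed to be a variable). Given a proposed subgoal-to-subgoal mapping, it derives the induced variable mapping and reports a conflict when one variable would need two images.

```python
def induced_variable_mapping(pairs):
    """Derive the variable mapping induced by (source subgoal, target subgoal) pairs.

    Returns None if predicates differ or a variable would get two different images.
    """
    mapping = {}
    for src, dst in pairs:
        if src[0] != dst[0] or len(src) != len(dst):
            return None
        for var, image in zip(src[1:], dst[1:]):
            # setdefault records the first image and exposes any later disagreement.
            if mapping.setdefault(var, image) != image:
                return None
    return mapping

# Subgoal mapping from A to B, as described in the example.
a_to_b = [
    (("r", "X", "W"), ("r", "X", "W")),
    (("b", "W", "Z"), ("b", "W", "W")),
    (("r", "Z", "Y"), ("r", "W", "Y")),
]
print(induced_variable_mapping(a_to_b))  # {'X': 'X', 'W': 'W', 'Z': 'W', 'Y': 'Y'}

# The only candidate subgoal mapping from B to A forces W to both W and Z.
b_to_a = [
    (("r", "X", "W"), ("r", "X", "W")),
    (("b", "W", "W"), ("b", "W", "Z")),
    (("r", "W", "Y"), ("r", "Z", "Y")),
]
print(induced_variable_mapping(b_to_a))  # None
```

The head can be checked the same way by adding the pair (head of the source CQ, head of the target CQ) to the list.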

Method of Canonical Databases


Instead of looking for a containment mapping from Q2 to Q1 in order to test Q1 ⊆ Q2, we can apply the following test:
1. Create a canonical database D that is the frozen body of Q1.
2. Compute Q2(D).
3. If Q2(D) contains the frozen head of Q1, then Q1 ⊆ Q2; else not.
• The proof that this method works is essentially the same as the argument for containment mappings:
  - The only way the frozen head of Q1 can be in Q2(D) is for there to be a containment mapping Q2 → Q1.

Example
C1: p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
C2: p(X) :- a(X,Y) & a(Y,X)
Here is the test for C2 ⊆ C1:
• Choose constants X → 0, Y → 1.
• Canonical DB from C2 is D = {a(0,1), a(1,0)}.
• C1(D) = {p(0), p(1)}.
• Since the frozen head of C2 is p(0), which is in C1(D), we conclude C2 ⊆ C1.
  - Note that the instantiation of C1 that shows p(0) is in C1(D) is X → 0, Y → 1, Z → 0, and W → 1.
  - If we replace 0 and 1 by the variables X and Y they stand for, we have the containment mapping from C1 to C2.
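
The three-step test can be sketched as follows. This is a hedged illustration rather than code from the slides: the names `evaluate`, `freeze`, and `contained` and the tuple encoding of atoms are assumptions, all arguments are taken to be variables, and the evaluator is a brute-force enumeration.

```python
from itertools import product

def evaluate(head, body, db):
    """Apply a CQ to a database by trying every substitution of constants."""
    variables = sorted({v for atom in body for v in atom[1:]})
    constants = sorted({c for fact in db for c in fact[1:]})
    result = set()
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))
        ground = lambda atom: (atom[0],) + tuple(sub[v] for v in atom[1:])
        if all(ground(atom) in db for atom in body):
            result.add(ground(head))
    return result

def freeze(head, body):
    """Replace each variable by its own constant; return (frozen head, canonical DB)."""
    variables = sorted({v for atom in [head] + body for v in atom[1:]})
    const = {v: i for i, v in enumerate(variables)}
    ground = lambda atom: (atom[0],) + tuple(const[v] for v in atom[1:])
    return ground(head), {ground(atom) for atom in body}

def contained(q1, q2):
    """Test Q1 ⊆ Q2: is the frozen head of Q1 in Q2 applied to the frozen body of Q1?"""
    frozen_head, canonical_db = freeze(*q1)
    return frozen_head in evaluate(*q2, canonical_db)

C1 = (("p", "X"), [("a", "X", "Y"), ("a", "Y", "Z"), ("a", "Z", "W")])
C2 = (("p", "X"), [("a", "X", "Y"), ("a", "Y", "X")])

print(freeze(*C2)[0])     # ('p', 0)
print(contained(C2, C1))  # True
print(contained(C1, C2))  # False
```

With variables sorted alphabetically, freezing C2 picks X → 0 and Y → 1, the same constants as in the example.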

Saraiya's Containment Test

• Containment of CQ's is NP-complete in general.
• Saraiya's algorithm is a polynomial-time test of Q1 ⊆ Q2 for the common case that no predicate appears more than twice among the subgoals of Q1.
  - Predicates can appear any number of times in Q2.
• The algorithm is a reduction to 2SAT and yields a linear-time algorithm.
• Our algorithm is more direct, but quadratic.

The Algorithm

Pick a subgoal of Q2, and consider the consequences of mapping it to the two possible subgoals of Q1.
• Follow all consequences of this choice: subgoals that must map to subgoals, and variables that must map to variables.
  - If we know p(X1,...,Xn) must map to p(Y1,...,Yn), then infer that each Xi must map to Yi.
  - If p(X1,...,Xn) is a subgoal of Q2, and we know Xi maps to some variable Z, and exactly one of the p-subgoals of Q1 has Z in the ith component, then conclude p(X1,...,Xn) maps to this subgoal.
• One of two things must happen:
  1. We derive a contradiction: a subgoal or variable that must map to two different things.
     - If so, try the other choice if there is one; fail if there is no other choice.
  2. We close the set of inferences we must make.
     - Then we can forever forget about the question of how to map the determined subgoals and variables.
     - We have found one mapping that works and that can't interfere with the mapping of any other subgoals or variables, so we make another arbitrary choice if there are any unmapped subgoals.

Example

Let us test C1 ⊆ C2, where:

C1: p(B) :- a(A,B) & a(B,A) & b(A,C) & b(C,B)
C2: p(X) :- a(X,Y) & b(Y,Z) & b(Z,W) & a(W,X)

• Note this simple example omits some options: C1 could have a predicate appearing only once in the body, and C2 could have 3 or more occurrences of some predicates.
• Here is a description of inferences that might be made:
  (1) Suppose a(X,Y) → a(A,B)
  (2) Then X → A, Y → B
  (3) Now, b(Y,Z) → b(B,?)
  (4) Since there is no b(B,?), fail
  (5) Thus, we must map a(X,Y) → a(B,A)
  (6) Then X → B and Y → A,
  (7) b(Y,Z) → b(A,C), Z → C,
  (8) b(Z,W) → b(C,B), W → B
  (9) Now, a(W,X) must map to a(B,B)
  (10) Since a(B,B) does not exist, fail
• Note, however, that if the last subgoal of C1 were b(C,A), we would have W → A at line (8) and a(W,X) → a(A,B) at line (9).
  - That completes the containment mapping successfully, with X → B, Y → A, Z → C, and W → A.
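
The choose-and-propagate idea can be sketched in Python. This is an illustration under stated assumptions, not Saraiya's published algorithm and not tuned to the quadratic bound: the names `extend`, `propagate`, and `find_mapping` are hypothetical, all arguments are variables, and committing to a closed set of inferences is only justified when no predicate occurs more than twice in Q1. Unlike the trace above, the sketch starts from the head, which already forces X → B.

```python
def extend(sub, target, varmap):
    """Extend varmap so that subgoal `sub` maps onto `target`; None on conflict."""
    if sub[0] != target[0] or len(sub) != len(target):
        return None
    new = dict(varmap)
    for var, image in zip(sub[1:], target[1:]):
        if new.setdefault(var, image) != image:
            return None
    return new

def propagate(q2_body, q1_body, varmap):
    """Follow all forced inferences; None signals a contradiction."""
    changed = True
    while changed:
        changed = False
        for sub in q2_body:
            options = []
            for target in q1_body:
                extended = extend(sub, target, varmap)
                if extended is not None and extended not in options:
                    options.append(extended)
            if not options:
                return None                      # no possible target: contradiction
            if len(options) == 1 and options[0] != varmap:
                varmap, changed = options[0], True   # exactly one target: forced
    return varmap

def find_mapping(q1, q2):
    """Return a containment mapping from Q2 to Q1 (showing Q1 ⊆ Q2), or None."""
    (h1, b1), (h2, b2) = q1, q2
    varmap = extend(h2, h1, {})                  # head of Q2 must become head of Q1
    if varmap is not None:
        varmap = propagate(b2, b1, varmap)
    while varmap is not None:
        pending = [s for s in b2 if any(v not in varmap for v in s[1:])]
        if not pending:
            return varmap
        trial = None
        for target in b1:                        # try each possible target in turn
            trial = extend(pending[0], target, varmap)
            if trial is not None:
                trial = propagate(b2, b1, trial)
            if trial is not None:
                break                            # inferences closed: commit
        varmap = trial
    return None

C1 = (("p", "B"), [("a", "A", "B"), ("a", "B", "A"), ("b", "A", "C"), ("b", "C", "B")])
C2 = (("p", "X"), [("a", "X", "Y"), ("b", "Y", "Z"), ("b", "Z", "W"), ("a", "W", "X")])
C1_alt = (("p", "B"), C1[1][:3] + [("b", "C", "A")])   # last subgoal changed to b(C,A)

print(find_mapping(C1, C2))      # None
print(find_mapping(C1_alt, C2))  # {'X': 'B', 'Y': 'A', 'Z': 'C', 'W': 'A'}
```

Because a closed set of inferences is never revisited, the sketch backtracks over at most one choice at a time, which is the point of the restriction to two occurrences per predicate.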

Generalization to Unions of CQ's


P1 ∪ P2 ∪ ... ∪ Pk ⊆ Q1 ∪ Q2 ∪ ... ∪ Qn iff for all Pi there exists some Qj such that Pi ⊆ Qj.
Proof (If)
Obvious.

Proof (Only If)

Assume the containment holds.
• Let D be the canonical (frozen) database from CQ Pi.
• Since the containment holds, and Pi(D) surely includes the frozen head of Pi, there must be some Qj such that Qj(D) includes the frozen head of Pi.
• Thus, Pi ⊆ Qj.

Union Theorem Just Misses Being False

Consider generalized CQ's allowing arithmetic-comparison subgoals.

P1: p(X) :- e(X) & 10 <= X & X <= 20
Q1: p(X) :- e(X) & 10 <= X & X <= 15
Q2: p(X) :- e(X) & 15 <= X & X <= 20

• P1 ⊆ Q1 ∪ Q2, but P1 ⊆ Q1 and P1 ⊆ Q2 are both false.
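
A quick check of this counterexample in Python, offered as an illustration with hypothetical function names. Each query only filters the relation e, so the union containment holds on every database because the interval [10, 20] is covered by [10, 15] and [15, 20]; the sample database below merely exhibits the two failed individual containments.

```python
def p1(e):
    """p(X) :- e(X) & 10 <= X & X <= 20"""
    return {x for x in e if 10 <= x <= 20}

def q1(e):
    """p(X) :- e(X) & 10 <= X & X <= 15"""
    return {x for x in e if 10 <= x <= 15}

def q2(e):
    """p(X) :- e(X) & 15 <= X & X <= 20"""
    return {x for x in e if 15 <= x <= 20}

e = set(range(0, 31))               # sample EDB relation e = {0, 1, ..., 30}

print(p1(e) <= q1(e) | q2(e))       # True: P1(D) is inside Q1(D) ∪ Q2(D)
print(p1(e) <= q1(e))               # False: 16 is in P1(D) but not in Q1(D)
print(p1(e) <= q2(e))               # False: 10 is in P1(D) but not in Q2(D)
```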

CQ Contained in Recursive Datalog

Test relies on method of canonical DB's; containment mapping approach doesn't work (it's meaningless).
• Make DB D from frozen body of CQ.
• Apply program to D. If frozen head of CQ appears in result, then yes (contained), else no.

Example
• CQ Q1 is:

Q1: path(X,Y) :- arc(X,Z) & arc(Z,W) & arc(W,Y)

• Q2 is the value of path in the following recursive Datalog program:

r1: path(X,Y) :- arc(X,Y)
r2: path(X,Y) :- path(X,Z) & path(Z,Y)

• Intuitively, Q1 = paths of length 3; Q2 = paths of length 1 or more.
• Freeze Q1, say with 0, 1, 2, 3 as constants for X, Z, W, Y, respectively.
  D = {arc(0,1), arc(1,2), arc(2,3)}
• Frozen head is path(0,3).
• Easy to infer that path(0,3) is in Q2(D): use r1 three times to infer path(0,1), path(1,2), path(2,3), then use r2 to infer path(0,2), path(0,3).
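
The same inference can be run mechanically. The sketch below is an assumption-laden illustration, not an evaluator from the slides: it hard-codes rules r1 and r2 and applies them bottom-up on the canonical database until no new path facts appear. The name `path_closure` is hypothetical.

```python
def path_closure(arcs):
    """Naive bottom-up evaluation of r1 and r2 over a set of (from, to) arc pairs."""
    path = set(arcs)                  # r1: path(X,Y) :- arc(X,Y)
    while True:
        # r2: path(X,Y) :- path(X,Z) & path(Z,Y)
        new = {(x, y) for (x, z) in path for (z2, y) in path if z == z2}
        if new <= path:               # fixpoint reached: nothing new was inferred
            return path
        path |= new

d = {(0, 1), (1, 2), (2, 3)}          # frozen body of Q1
print((0, 3) in path_closure(d))      # True, so Q1 is contained in Q2
```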

Harder Cases
• Datalog program ⊆ CQ: doubly exponential complexity. Reference: Chaudhuri, S. and M. Y. Vardi [1992]. "On the equivalence of recursive and nonrecursive datalog programs," Proc. Eleventh ACM Symposium on Principles of Database Systems, pp. 55-66.
• Datalog program ⊆ Datalog program: undecidable.
