Assignment Dataintegration 300341285 Gurdarshan Singh

Uploaded by

gurdarshan681

University of Ottawa / Université d’Ottawa

Faculty of Engineering
School of Electrical Engineering and Computer Science

Assignment 2 – Data Integration

Course CSI5137 Advanced Topics in Data Management


Academic year 2023-24
Submitted By Gurdarshan Singh
Student No. 300341285

Semester Fall
Instructor Verena Kantere

Announced date 17.11.2023


Submission date 03.12.2023

Every student must submit the assignment individually.

Ex.1 (25) Ex.2 (5) Ex.3 (15) Ex.4 (10) Ex.5 (8) Ex.6 (12) Ex.7 (10) Ex.8 (15)

Section A – Writing conjunctive queries and Datalog

Exercise 1 (25%)
Consider the Sailors-Boats-Reserves DB, which describes Sailors that Reserve Boats.
Sailors are described by their id ‘sid’, their name ‘sname’, their ‘rating’ and ‘age’. Boats
are described by their id ‘bid’, their name ‘bname’ and their ‘color’. Reserves stores
reservations of boats by sailors, as ‘sid’/‘bid’ pairs, together with the ‘date’ of
reservation.

Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, date)

Write each of the following queries in the language of Conjunctive Queries/Datalog.

1. (3%) Find the colors of boats reserved by the sailor ‘Albert’.


Ans 1.
reservedBoatColor(BoatColor) :- Sailors(SailorID, 'Albert', _, _), Reserves(SailorID,
BoatID, _), Boats(BoatID, _, BoatColor).

2. (3%) Find all sailor id’s of sailors who have a rating of at least ‘8’ or reserved
boat ‘103’.
Ans 2.
highRatedOrReserved103(SailorID) :- Sailors(SailorID, _, Rating, _), Rating >=
8.
highRatedOrReserved103(SailorID) :- Reserves(SailorID, '103', _).

3. (3%) Find the names of sailors who have not reserved a ‘red’ boat.
Ans 3.
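reservedRedBoat(SailorID) :- Reserves(SailorID, BoatID, _), Boats(BoatID, _, 'red').
notReservedRedBoat(SailorName) :- Sailors(SailorID, SailorName, _, _), not
reservedRedBoat(SailorID).
The helper rule is needed because negating Reserves directly would leave BoatID unbound
inside the negation (an unsafe rule) and would not express "no red boat at all".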

4. (4%) Find the names of sailors who have reserved at least two different boats.
Ans 4.
reservedTwoDifferentBoats(SailorName) :- Reserves(SailorID, BoatID1, _),
Reserves(SailorID, BoatID2, _), BoatID1 != BoatID2, Sailors(SailorID, SailorName, _,
_).

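5. (4%) Find the sailor id’s of sailors whose rating is better than that of every
sailor called ‘Bob’.
Ans 5.
notBetterThanSomeBob(SailorID) :- Sailors(SailorID, _, Rating, _), Sailors(_, 'Bob',
BobRating, _), Rating <= BobRating.
betterThanBob(SailorID) :- Sailors(SailorID, _, _, _), not
notBetterThanSomeBob(SailorID).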

6. (4%) Find the sailor id’s of sailors with the highest rating.
Ans 6.
hasHigherRated(SailorID) :- Sailors(SailorID, _, Rating, _), Sailors(_, _,
HigherRating, _), HigherRating > Rating.
highestRating(SailorID) :- Sailors(SailorID, _, _, _), not hasHigherRated(SailorID).

7. (4%) Find the name and age of the oldest sailor(s).
Ans 7.
hasOlder(SailorID) :- Sailors(SailorID, _, _, Age), Sailors(_, _, _, OlderAge),
OlderAge > Age.
oldestSailor(SailorName, Age) :- Sailors(SailorID, SailorName, _, Age), not
hasOlder(SailorID).
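
Queries that need "none" or "every" are usually written in Datalog as a positive helper rule followed by a negated lookup. The sketch below is one way to sanity-check that pattern in Python for queries 3, 4, 6 and 7. The rows are made up for illustration and are not part of the assignment.

```python
# Hypothetical toy instance of the Sailors-Boats-Reserves schema (made-up rows).
sailors = {(1, "Albert", 7, 30), (2, "Bob", 8, 45), (3, "Carol", 10, 45)}
boats = {(101, "Interlake", "red"), (103, "Clipper", "green")}
reserves = {(1, 101, "2023-10-01"), (1, 103, "2023-10-02"), (3, 103, "2023-10-03")}

# Query 3: helper relation "reserved a red boat", then negate it.
reserved_red = {s for (s, b, _) in reserves
                for (b2, _, color) in boats
                if b == b2 and color == "red"}
not_reserved_red = {name for (s, name, _, _) in sailors if s not in reserved_red}

# Query 4: self-join of Reserves with an inequality on the boat id.
two_boats = {name for (s, name, _, _) in sailors
             for (s1, b1, _) in reserves
             for (s2, b2, _) in reserves
             if s == s1 == s2 and b1 != b2}

# Query 6: helper relation "some sailor has a higher rating", then negate it.
has_higher = {s for (s, _, r, _) in sailors
              for (_, _, r2, _) in sailors if r2 > r}
highest = {s for (s, _, _, _) in sailors if s not in has_higher}

# Query 7: same pattern on age.
has_older = {s for (s, _, _, a) in sailors
             for (_, _, _, a2) in sailors if a2 > a}
oldest = {(name, a) for (s, name, _, a) in sailors if s not in has_older}

print(not_reserved_red, two_boats, highest, oldest)
```

Each negated set is computed completely before it is consulted, which mirrors stratified negation in Datalog.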

Section B – Query unfolding and containment

Exercise 2 (5%)

Let us assume two relations A(x,y,z) and B(x,y) and the following two views:
U (u1, u2) :– A(u1, u2, w)
W (u1, u2) :– A(u1, u2, w), A(u1, u2, w), B(u1, w)
There is a query Q written as a composition of U and W:
Q(x1, x2) :– U(x1,y),U(x1,y),W(x2,y)
Unfold the query so that the body contains only subgoals with predicates A and B.
Ans 2.
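The original query Q(x1, x2) is defined as:
Q(x1, x2) :- U(x1, y), U(x1, y), W(x2, y)

Now, substitute the definitions of U and W. The variable w is existential in each view, so
every view occurrence gets its own fresh copy (w1, w2, w3):

Q’(x1, x2) :- A(x1, y, w1), A(x1, y, w2), A(x2, y, w3), A(x2, y, w3), B(x2, w3)

The body now contains only subgoals with predicates A and B. It can be simplified: the
second A(x2, y, w3) is a duplicate, and A(x1, y, w2) is redundant because mapping w2 to w1
sends it onto A(x1, y, w1). The equivalent minimized unfolding is:
Q’’(x1, x2) :- A(x1, y, w1), A(x2, y, w3), B(x2, w3)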
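
Unfolding can also be carried out mechanically. The following is a minimal sketch, assuming queries are represented as lists of (predicate, arguments) pairs. The helper names `unfold` and `dedupe` are hypothetical. It renames each view's non-head variable apart for every view occurrence by appending a counter, which assumes the query does not already use names such as w1.

```python
import itertools

# View definitions: name -> (head variables, body atoms).
VIEWS = {
    "U": (("u1", "u2"), [("A", ("u1", "u2", "w"))]),
    "W": (("u1", "u2"), [("A", ("u1", "u2", "w")),
                         ("A", ("u1", "u2", "w")),
                         ("B", ("u1", "w"))]),
}

def unfold(body, views):
    """Replace each view atom by the view body, with fresh existential variables."""
    counter = itertools.count(1)
    result = []
    for pred, args in body:
        if pred not in views:
            result.append((pred, args))
            continue
        head_vars, view_body = views[pred]
        subst = dict(zip(head_vars, args))
        suffix = next(counter)  # one fresh suffix per view occurrence
        for vpred, vargs in view_body:
            result.append((vpred, tuple(subst.get(v, f"{v}{suffix}") for v in vargs)))
    return result

def dedupe(atoms):
    """Drop syntactically identical subgoals, keeping the first occurrence."""
    seen, out = set(), []
    for atom in atoms:
        if atom not in seen:
            seen.add(atom)
            out.append(atom)
    return out

q_body = [("U", ("x1", "y")), ("U", ("x1", "y")), ("W", ("x2", "y"))]
unfolded = unfold(q_body, VIEWS)
print(dedupe(unfolded))
```

`dedupe` only removes identical subgoals. Subgoals that are redundant up to renaming, such as A(x1, y, w2) next to A(x1, y, w1), need a containment-mapping check to eliminate.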

Exercise 3 (15%)

Prove containment or equivalence for the following cases of queries (i.e. prove whether
Q1 ⊆ Q2 or Q2 ⊆ Q1 holds; or both, for equivalence Q1 ≡ Q2):

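1. (5%) Q1(x, y) :- R(x, y)
Q2(x, y) :- R(x, u), R(v, u), R(v, y)

Ans 1.
There is no containment mapping from Q1 to Q2:
– the head variables x and y must map to themselves, so R(x, y) would have to appear in
the body of Q2, and it does not
There is a containment mapping from Q2 to Q1:
x->x ; y->y ; u->y ; v->x (every subgoal of Q2 is sent to R(x, y))
Q2 ⊇ Q1, but Q2 ⊈ Q1, therefore Q1 and Q2 are not equivalent.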

2. (5%) Q1(x) :- R(x, u), R(u, v)
Q2(x) :- R(x, u), R(x, y), R(u, v), R(u, w)

Ans 2.
There is a containment mapping from Q1 to Q2:
x->x ; u->u ; v->v (both subgoals of Q1 already appear in Q2), so Q2 ⊆ Q1
There is a containment mapping from Q2 to Q1:
x->x ; u->u ; v->v ; y->u ; w->v, so Q1 ⊆ Q2
Q1 ⊆ Q2 and Q2 ⊆ Q1, therefore Q1 ≡ Q2.

3. (5%) Q1(x) :- R(x, u), R(u, u)
Q2(x) :- R(x, u), R(u, v), R(v, w)

Ans 3.
There is no containment mapping from Q1 to Q2:
– R(u, u) needs a subgoal of Q2 with the same term in both positions, and Q2 has none
There is a containment mapping from Q2 to Q1:
x->x ; u->u ; v->u ; w->u
Q2 ⊇ Q1, but Q2 ⊈ Q1, therefore Q1 and Q2 are not equivalent.
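
Containment-mapping arguments for small queries can be double-checked by exhaustive search. The sketch below assumes a query is a pair (head variables, body atoms), has no constants, and has all head variables occurring in the body. The function names are hypothetical. It relies on the standard criterion that Q1 ⊆ Q2 holds exactly when there is a containment mapping from Q2 to Q1.

```python
from itertools import product

def variables(atoms):
    return sorted({a for _, args in atoms for a in args})

def has_containment_mapping(src, dst):
    """Search for a variable mapping that sends src's head to dst's head
    and every body atom of src to some body atom of dst."""
    src_head, src_body = src
    dst_head, dst_body = dst
    src_vars = variables(src_body)
    dst_terms = variables(dst_body)
    dst_atoms = set(dst_body)
    for image in product(dst_terms, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if tuple(h[v] for v in src_head) != tuple(dst_head):
            continue
        if all((p, tuple(h[a] for a in args)) in dst_atoms for p, args in src_body):
            return True
    return False

def contained_in(q1, q2):
    """Q1 is contained in Q2 iff a containment mapping from Q2 to Q1 exists."""
    return has_containment_mapping(q2, q1)

# The three cases of Exercise 3, as (Q1, Q2) pairs.
cases = [
    ((("x", "y"), [("R", ("x", "y"))]),
     (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "y"))])),
    ((("x",), [("R", ("x", "u")), ("R", ("u", "v"))]),
     (("x",), [("R", ("x", "u")), ("R", ("x", "y")),
               ("R", ("u", "v")), ("R", ("u", "w"))])),
    ((("x",), [("R", ("x", "u")), ("R", ("u", "u"))]),
     (("x",), [("R", ("x", "u")), ("R", ("u", "v")), ("R", ("v", "w"))])),
]
results = [(contained_in(q1, q2), contained_in(q2, q1)) for q1, q2 in cases]
for i, (a, b) in enumerate(results, start=1):
    print(f"case {i}: Q1 in Q2 = {a}, Q2 in Q1 = {b}")
```

The search is exponential in the number of variables, which is acceptable for queries of this size.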

Exercise-4 (10%)
Consider the two conjunctive queries below:
Q1(x) :- R(x, y), R(y, z)
Q2(x) :- R(x, y)
1. (3%) Prove that Q1 ⊆ Q2
2. (7%) Find a conjunctive query Q3(x) such that Q1 ⊆ Q3 ⊆ Q2

Ans 4.
Given:
Q1(x) :- R(x, y), R(y, z)
Q2(x) :- R(x, y)
1. The identity mapping x->x ; y->y is a containment mapping from Q2 to Q1: it preserves the
head variable and sends the only subgoal of Q2, R(x, y), to a subgoal of Q1. Equivalently,
whenever Q1 returns x, a tuple R(x, y) exists, which is all that Q2 requires. As a
result, Q1 ⊆ Q2.

2. Q3 must require at least as much as Q2 and at most as much as Q1. Adding a subgoal such as
R(z, w) or R(z, x) to Q1 does not work: it makes the query more restrictive than Q1, which
gives Q3 ⊆ Q1 instead of Q1 ⊆ Q3. A query that does work keeps R(x, y) and adds a path of
length two that is not connected to x:

Q3(x) :- R(x, y), R(u, v), R(v, w)

Q3 ⊆ Q2 holds because the identity mapping sends R(x, y) of Q2 into Q3. Q1 ⊆ Q3 holds
because x->x ; y->y ; u->x ; v->y ; w->z is a containment mapping from Q3 to Q1.
As a result, Q1 ⊆ Q3 ⊆ Q2. (Choosing Q3 = Q1 or Q3 = Q2 also satisfies the condition, trivially.)
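
A candidate for Q3 can be tested on small instances before attempting a proof: any instance on which Q1 returns an answer that the candidate misses refutes Q1 ⊆ Q3. The sketch below is a brute-force evaluator for conjunctive queries over one binary relation R. The two instances and the candidate named `Q3_PATH` are hypothetical illustrations, and the cyclic candidate is included for contrast.

```python
from itertools import product

def evaluate(head, body, edges):
    """Evaluate a conjunctive query whose body is a list of R(a, b) variable pairs."""
    domain = sorted({v for e in edges for v in e})
    names = sorted({a for atom in body for a in atom})
    answers = set()
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if all((env[a], env[b]) in edges for a, b in body):
            answers.add(tuple(env[h] for h in head))
    return answers

Q1 = (("x",), [("x", "y"), ("y", "z")])
Q2 = (("x",), [("x", "y")])
Q3_PATH = (("x",), [("x", "y"), ("u", "v"), ("v", "w")])    # disconnected 2-path
Q3_CYCLE = (("x",), [("x", "y"), ("y", "z"), ("z", "x")])   # triangle through x

single_edge = {("a", "b")}
edge_and_path = {("a", "b"), ("c", "d"), ("d", "e")}

for name, inst in [("single_edge", single_edge), ("edge_and_path", edge_and_path)]:
    for qname, q in [("Q1", Q1), ("Q3_PATH", Q3_PATH),
                     ("Q3_CYCLE", Q3_CYCLE), ("Q2", Q2)]:
        print(name, qname, sorted(evaluate(*q, inst)))
```

On the second instance Q1 returns c while the cyclic candidate returns nothing, so the cyclic candidate cannot contain Q1. Instance tests can only refute containment; proving it still needs a containment mapping.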

Section C – Query rewriting

Exercise 5 (8%)

Let us consider the following queries, Q1, Q2, and view, V:

Q1^{b,f}(x, z) :- P(x, y), P(y, z)
Q2^{f,b}(x, y) :- P(x, y)
V^{b,f}(x, y) :- P(x, y)

Q1 requires that the variable x is bound to a constant when executed.
Q2 requires that the variable y is bound to a constant when executed.
V can only give results when it is provided with a constant for the variable x.

1. (4%) Can Q1 be rewritten using V? If so, show the rewriting. If not, explain why.

Ans 1. To determine if Q1 can be rewritten using V, let's analyze the conditions:

Q1 requires that the variable x is bound to a constant.
V can only give results when it is provided with a constant for its first argument.
Therefore, Q1 can be rewritten using V. The rewriting would be:

Q1(x, z) :- V(x, y), V(y, z)

The first call to V is executable because x is bound by the query. It returns values for y,
which bind the first argument of the second call to V, which in turn returns z. Every call
to V satisfies its binding pattern, and the rewriting uses only the view, not the base
relation P.

2. (4%) Can Q2 be rewritten using V? If so, show the rewriting. If not, explain why.

Ans 2. To determine if Q2 can be rewritten using V, let's analyze the conditions:

Q2 requires that the variable y is bound to a constant; x is free.
V can only give results when it is provided with a constant for its first argument.
The only way to obtain P(x, y) from V is the subgoal V(x, y), whose first argument x is free
in Q2, and no other subgoal can supply values for x. Therefore, Q2 cannot be rewritten
using V: the required binding is on y in Q2, but on x in V.
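
Whether a candidate rewriting respects binding patterns can be checked by looking for an order of the subgoals in which every access has its bound positions supplied. The sketch below assumes adornments are strings of 'b' and 'f' and that a call binds all of its arguments once it executes. The function name `executable` and the candidate bodies built only from V are illustrative assumptions.

```python
def executable(initially_bound, subgoals):
    """Return (ok, order): ok is True if the subgoals can be ordered so that
    every argument adorned 'b' is bound before its subgoal is called."""
    bound = set(initially_bound)
    pending = list(subgoals)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for goal in pending:
            _name, args, pattern = goal
            needed = {a for a, p in zip(args, pattern) if p == "b"}
            if needed <= bound:
                bound |= set(args)   # the call returns values for its free arguments
                order.append(goal)
                pending.remove(goal)
                progress = True
                break
    return (not pending), order

# Candidate for Q1 (x bound): V(x, y), V(y, z), each accessed with pattern b,f.
ok_q1, order_q1 = executable({"x"}, [("V", ("x", "y"), "bf"), ("V", ("y", "z"), "bf")])

# Candidate for Q2 (y bound): V(x, y) with pattern b,f.
ok_q2, order_q2 = executable({"y"}, [("V", ("x", "y"), "bf")])

print(ok_q1, ok_q2)
```

The greedy loop is sufficient here because the set of bound variables only grows, so executing any ready subgoal never blocks another one.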

Exercise 6 (12%)

Assume the following views:

V1(E, P, M) :- emp(E), phone(E, P), mgr(E, M)
V2(E, O, D) :- emp(E), office(E, O), dept(E, D)
V3(E, P) :- emp(E), phone(E, P), dept(E, ‘ToyDept’)

Assume that there is a query Q that asks for ‘Sally’s phone and office information:
Q(P, O) :- phone(‘Sally’, P), office(‘Sally’, O)

Assume that we want to rewrite Q using the above views and employing the Inverse Rules
query rewriting algorithm.

1. (7%) Write all the inverse rules. Note that a set of inverse rules should be created
for each relation appearing as a predicate in subgoals in the views.

Ans 1.
For each relation appearing as a predicate in subgoals in the views, we create inverse
rules. Here are the inverse rules for the given views:

For V1:

Inverse Rule for emp: emp(E) :- V1(E, P, M)
Inverse Rule for phone: phone(E, P) :- V1(E, P, M)
Inverse Rule for mgr: mgr(E, M) :- V1(E, P, M)

For V2:

Inverse Rule for emp: emp(E) :- V2(E, O, D)
Inverse Rule for office: office(E, O) :- V2(E, O, D)
Inverse Rule for dept: dept(E, D) :- V2(E, O, D)

For V3:

Inverse Rule for emp: emp(E) :- V3(E, P)
Inverse Rule for phone: phone(E, P) :- V3(E, P)
Inverse Rule for dept: dept(E, 'ToyDept') :- V3(E, P)

2. (5%) Rewrite Q employing the inverse rules.

Ans 2. The original query is: Q(P, O) :- phone('Sally', P), office('Sally', O)

With the Inverse Rules algorithm, the rewriting is the query itself together with the
inverse rules, evaluated as a Datalog program over the view extensions. Only the rules
for phone and office are relevant to Q:

phone(E, P) :- V1(E, P, M)
phone(E, P) :- V3(E, P)
office(E, O) :- V2(E, O, D)

Unfolding the subgoals of Q with these rules gives the rewriting as a union of two
conjunctive queries over the views:

Q(P, O) :- V1('Sally', P, M), V2('Sally', O, D)
Q(P, O) :- V3('Sally', P), V2('Sally', O, D)
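
The inverse rules can be read as a recipe for reconstructing base facts from view tuples, after which the original query runs unchanged. The following sketch applies the phone and office inverse rules to hypothetical view extensions (the tuples are made up) and then evaluates Q for 'Sally'.

```python
# Hypothetical view extensions (made-up tuples, not part of the assignment).
V1 = {("Sally", "555-0101", "Mary")}                      # V1(E, P, M)
V2 = {("Sally", "B12", "ToyDept"), ("Joe", "C3", "Sales")}  # V2(E, O, D)
V3 = {("Sally", "555-0199"), ("Joe", "555-0142")}         # V3(E, P)

# Inverse rules: phone(E, P) :- V1(E, P, M) and phone(E, P) :- V3(E, P)
phone = {(e, p) for (e, p, _m) in V1} | {(e, p) for (e, p) in V3}

# Inverse rule: office(E, O) :- V2(E, O, D)
office = {(e, o) for (e, o, _d) in V2}

# Q(P, O) :- phone('Sally', P), office('Sally', O)
answers = {(p, o)
           for (e1, p) in phone
           for (e2, o) in office
           if e1 == "Sally" and e2 == "Sally"}

print(sorted(answers))
```

No Skolem terms are needed in this example because every variable in the bodies of V1, V2 and V3 also appears in the view head.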

Section D – Schema mappings

Exercise-7 (10%)
Let us consider the following global schema, that stores data about students that study in
universities:
Global schema
StudiesIn (studentName, universityName)
UndergradStudent (studentName)
Let us also consider the local schema of sources S1 and S2. S1 stores data related to Canadian
students (in which program they are studying and in which university) and S2 stores data for
Canadian programs and the courses they offer:
Local schemata
S1.CanadianStudents (student, program, university).
S2.CanadianPrograms (program, course)
We are given the following Global-As-View (GAV) mappings:
M1. UndergradStudent (N) ⊇ S1.CanadianStudents (N, P, U), S2.CanadianPrograms (P, C)
M2. StudiesIn (N, U) ⊇ S1.CanadianStudents (N, T, U)
The following query is posed on the global schema:
Q(X):- StudiesIn(S, X), UndergradStudent(S)
1. (5%) Rewrite Q using the GAV mappings, employing the technique of unfolding.
2. (5%) Is it possible to simplify the rewritten query so that it does not have more than 2
subgoals?

Ans 7.
1. Rewriting Q using GAV mappings (Unfolding):
The query Q(X) :- StudiesIn(S, X), UndergradStudent(S) can be rewritten using the given GAV
mappings M1 and M2 through the technique of unfolding.
Unfold using M1:
UndergradStudent(S) ⇒ S1.CanadianStudents(S, P, U), S2.CanadianPrograms(P, C)
Unfold using M2:
StudiesIn(S, X) ⇒ S1.CanadianStudents(S, T, X)
Combine the unfolded subgoals:
Q(X) :- S1.CanadianStudents(S, T, X), S1.CanadianStudents(S, P, U), S2.CanadianPrograms(P, C)

2. Simplification:
Dropping S1.CanadianStudents(S, P, U) to obtain
Q(X) :- S1.CanadianStudents(S, T, X), S2.CanadianPrograms(P, C)
is not valid: it removes the join on P between the student's program and
S2.CanadianPrograms, so the result is a different query. The three-subgoal query is already
minimal. Any containment mapping from it into a proper subset of its body must keep
S1.CanadianStudents(S, T, X), because X is the head variable, and would have to send P to T,
but S2.CanadianPrograms(T, C) is not in the body. With no further constraints, the query
therefore cannot be simplified to 2 subgoals.

If student is assumed to be a key of S1.CanadianStudents (an assumption that the exercise
does not state), then T = P and U = X, the two S1 subgoals merge, and the query becomes:
Q(X) :- S1.CanadianStudents(S, P, X), S2.CanadianPrograms(P, C)
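
Whether a simplified rewriting is equivalent to the unfolded one can be probed with small counterexample instances. The sketch below evaluates three versions of the query on made-up source data: the three-subgoal unfolding, a two-subgoal version that drops the join on program, and a two-subgoal version that merges the two S1 subgoals. All rows and function names are hypothetical.

```python
def q_unfolded(cs, cp):
    """Q(X) :- CS(S, T, X), CS(S, P, U), CP(P, C)"""
    return {x for (s, _t, x) in cs
            for (s2, p, _u) in cs
            for (p2, _c) in cp
            if s == s2 and p == p2}

def q_no_join(cs, cp):
    """Q(X) :- CS(S, T, X), CP(P, C), with the join on P dropped"""
    return {x for (_s, _t, x) in cs for (_p, _c) in cp}

def q_merged(cs, cp):
    """Q(X) :- CS(S, P, X), CP(P, C), with the two CS subgoals merged"""
    return {x for (_s, p, x) in cs for (p2, _c) in cp if p == p2}

# Instance 1: the student's program offers no course in S2.
inst1_cs = {("ann", "cs", "uOttawa")}
inst1_cp = {("math", "calculus")}

# Instance 2: student is not a key; ann has two rows in S1.
inst2_cs = {("ann", "cs", "uOttawa"), ("ann", "art", "Carleton")}
inst2_cp = {("cs", "databases")}

print(q_unfolded(inst1_cs, inst1_cp), q_no_join(inst1_cs, inst1_cp))
print(q_unfolded(inst2_cs, inst2_cp), q_merged(inst2_cs, inst2_cp))
```

Instance 1 separates the unfolding from the version without the join, and instance 2 separates it from the merged version. The merged version agrees with the unfolding only on instances where each student has a single row in S1.CanadianStudents.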

Exercise-8 (15%)
Let us consider the following global schema that stores warehouses and the products they
contain:
Global schema
Warehouse (productid, productname, quantity, lastupdate)
Let us consider the following local schemata, that stores data about markets and products:
Local schema
MarketOverview(prodid, quantity, lastupdate)
Product(prodid, productname, manufacturer, price)
1. (10%) Show the GAV and LAV mappings for the above schema under the open-world
assumption.
2. (5%) Comment on whether the GAV mappings are better than the LAV mappings or the
opposite for this specific example. Say which mappings you would choose for this example.
Note: Assume correspondence of variables only if they have the same name.

Ans 8.
Global schema
Warehouse (productid, productname, quantity, lastupdate)
Local schema
MarketOverview(prodid, quantity, lastupdate)
Product(prodid, productname, manufacturer, price)
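1. GAV mappings under open-world assumption
Warehouse (NULL, productname, quantity, lastupdate) ⊇
MarketOverview(prodid, quantity, lastupdate),
Product(prodid, productname, manufacturer, price)

LAV mappings under open-world assumption
MarketOverview(NULL, quantity, lastupdate) ⊆
Warehouse(productid, productname, quantity, lastupdate)

Product(NULL, productname, NULL, NULL) ⊆
Warehouse(productid, productname, quantity, lastupdate)

Since prodid and productid have different names, no correspondence is assumed between
them, which is why the id positions are NULL.

2. Using the LAV mappings, the join between MarketOverview.prodid and Product.prodid
is lost in this instance: each source is described separately against Warehouse, and
prodid has no counterpart in the global schema, so nothing links a quantity and lastupdate
from MarketOverview to the productname of the same product.

By utilizing the GAV mapping, the join on prodid is performed inside the mapping, so we
keep the combination of values on the columns productname, quantity and lastupdate.
Therefore, for this example, GAV mappings are better than LAV mappings. Hence, GAV
mapping is preferred.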
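
The difference between the two mapping styles can be illustrated on a toy instance. In the sketch below the rows are made up, `None` stands in for the NULL positions of the mappings, and the function names are hypothetical. The GAV function joins the sources on prodid. The LAV function only shows what each source description yields on its own, which is a simplification of how LAV unknowns are normally modelled.

```python
# Hypothetical source data (made-up rows).
market_overview = {(1, 40, "2023-11-01"), (2, 5, "2023-11-15")}   # (prodid, quantity, lastupdate)
product = {(1, "Widget", "Acme", 10), (3, "Gadget", "Acme", 25)}  # (prodid, productname, manufacturer, price)

def warehouse_gav(mo, prod):
    """GAV: Warehouse tuples come from the join of both sources on prodid."""
    return {(None, name, qty, upd)
            for (pid, qty, upd) in mo
            for (pid2, name, _manu, _price) in prod
            if pid == pid2}

def warehouse_lav_partial(mo, prod):
    """LAV (simplified): each source contributes partial Warehouse tuples
    on its own; unknown positions are shown as None."""
    from_market = {(None, None, qty, upd) for (_pid, qty, upd) in mo}
    from_product = {(None, name, None, None) for (_pid, name, _m, _p) in prod}
    return from_market | from_product

gav = warehouse_gav(market_overview, product)
lav = warehouse_lav_partial(market_overview, product)
print(sorted(gav, key=str))
print(sorted(lav, key=str))
```

In the GAV result, productname appears together with quantity and lastupdate for the same product, while the partial LAV tuples never combine them.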
