
cs317 s2022 Midsem

The document outlines a mid-semester exam consisting of various questions related to SQL, relational algebra, functional dependencies, multisets, temporal data, and big data concepts. Each question includes specific tasks, answers, and rubrics for grading. The exam tests students' understanding of database concepts, query formulation, and data integrity principles.

Uploaded by

Shreyas Katdare

Midsem Exam (60 marks)

1. Short answers (8 marks) Priyesh Kumar ([email protected])


a. What is the problem with an application program constructing an SQL query as
follows: String query = "select * from student where id = '" + id + "';". (2 marks)
Answer:
Vulnerable to an SQL injection attack.
For example, suppose id is the string fake' OR 1=1; --. The above query then
becomes "select * from student where id = 'fake' OR 1=1; --';", whose WHERE
condition is always true, so it exposes all rows.
[Optional] The solution is to escape special characters before substituting
variables, or to use a prepared statement with parameters.

Rubric: 1 mark for problem/example, 1 mark for explanation. [0.5 marks for runtime exception]
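The injection described above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table contents, the student names, and the payload (a variant of the one in the answer, without the semicolon) are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical in-memory table standing in for the student relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("s1", "Asha"), ("s2", "Ravi")])

# Attacker-controlled input (assumed payload, no matching student exists).
malicious_id = "fake' OR 1=1 --"

# Unsafe: the input is spliced into the SQL text, so OR 1=1 becomes part
# of the WHERE clause and the trailing quote is commented out.
unsafe_query = "select * from student where id = '" + malicious_id + "';"
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Safer: a parameterized query treats the input purely as a value.
safe_rows = conn.execute(
    "select * from student where id = ?", (malicious_id,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row, while the parameterized one returns none, because no student has that literal string as an id.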

b. Given the relation reading(time, val), write an SQL query to find for each time t the
average of the readings at t-1, t and t+1. Those times where there is no reading
at t-1 or t+1 are to be omitted. (3 marks)
Answer:
SELECT r2.time, (r1.val+r2.val+r3.val)/3
FROM reading as r1, reading as r2, reading as r3
WHERE r1.time+1=r2.time AND r2.time+1=r3.time
Rubric: 1 mark for from clause, 1 mark for where clause and 1 mark for select
clause
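Alternative using SQL window syntax:
SELECT time, avg(val) OVER (ORDER BY time ROWS BETWEEN 1
PRECEDING AND 1 FOLLOWING) AS avg_val
FROM reading;
Note: ROWS counts neighbouring rows, not neighbouring time values, so this form
matches the join only when readings exist at consecutive times; it also returns
rows for the first and last times rather than omitting them.
Rubric: 1 mark each for preceding and following row, 1 mark for complete query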
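As a quick check of the self-join answer, the following sketch runs it on a small, made-up set of readings using Python's sqlite3 module. The sample values and the gap at time 4 are assumptions chosen to show which times are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (time INTEGER PRIMARY KEY, val REAL)")
# Hypothetical data: readings at times 1, 2, 3 and 5 (nothing at time 4).
conn.executemany("INSERT INTO reading VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (5, 50.0)])

# Three-way self-join: r1 is the reading at t-1, r2 at t, r3 at t+1.
rows = conn.execute("""
    SELECT r2.time, (r1.val + r2.val + r3.val) / 3
    FROM reading AS r1, reading AS r2, reading AS r3
    WHERE r1.time + 1 = r2.time AND r2.time + 1 = r3.time
    ORDER BY r2.time
""").fetchall()

print(rows)
```

Only time 2 has both neighbours present, so it is the only row in the result; times 1, 3 and 5 are dropped by the join conditions.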

c. Given entity sets e1 and e2 with primary keys p1 and p2 respectively and a
many-to-one relationship from e1 to e2 called adv, write down the SQL create
table statement for relation r, with all relevant constraints. (errata: should have
said relations, not relation r) (3 marks)

e1(p1, other attributes)
e2(p2, other attributes)
adv(p1, p2, other attributes)

As each tuple of e1 can be associated with at most 1 tuple of e2, we can
merge the e1 and adv relations as follows.
e1(p1, other attributes, p2) where p2 is a foreign key
e2(p2, other attributes)

Corresponding SQL statements:
CREATE TABLE e2 (p2 INT, other attributes…, PRIMARY KEY (p2));
CREATE TABLE e1 (p1 INT,
p2 INT NOT NULL, other attributes…,
PRIMARY KEY (p1),
FOREIGN KEY (p2) REFERENCES e2(p2)
);
Rubrics: create table for e2 with primary key: 1 mark, create table
for e1 with primary key 1 mark, foreign key ½ mark, e1.p2 not null ½
mark.
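The effect of the constraints in the merged schema can be sketched with Python's sqlite3 module. The "other attributes" are omitted, the sample key values are made up, and the helper try_insert is hypothetical; SQLite also needs foreign key enforcement switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE e2 (p2 INT, PRIMARY KEY (p2));
CREATE TABLE e1 (p1 INT,
                 p2 INT NOT NULL,
                 PRIMARY KEY (p1),
                 FOREIGN KEY (p2) REFERENCES e2(p2));
INSERT INTO e2 VALUES (1);
INSERT INTO e1 VALUES (10, 1);
INSERT INTO e1 VALUES (11, 1);  -- many e1 tuples may point to one e2 tuple
""")

def try_insert(sql):
    """Hypothetical helper: True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

dangling_fk = try_insert("INSERT INTO e1 VALUES (12, 99)")    # no e2 tuple with p2 = 99
missing_e2 = try_insert("INSERT INTO e1 VALUES (13, NULL)")   # violates NOT NULL
duplicate_p1 = try_insert("INSERT INTO e1 VALUES (10, 1)")    # violates the primary key
shared = conn.execute("SELECT count(*) FROM e1 WHERE p2 = 1").fetchone()[0]
print(dangling_fk, missing_e2, duplicate_p1, shared)
```

Because p1 alone is the primary key of e1, each e1 tuple carries exactly one p2 value, which is what makes the relationship many-to-one.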

Alternative without merging will get partial marks:

CREATE TABLE e1 (p1 INT, other attributes…, PRIMARY KEY (p1));
CREATE TABLE e2 (p2 INT, other attributes…, PRIMARY KEY (p2));
CREATE TABLE adv (
p1 INT, p2 INT, other attributes…,
PRIMARY KEY (p1,p2),
FOREIGN KEY (p1) REFERENCES e1(p1),
FOREIGN KEY (p2) REFERENCES e2(p2));
Rubrics: create table for e2 with primary key: 1 mark
Create table for e1: 1/2 mark, create table for adv with all primary
and foreign keys: 1 mark: Total 2.5 marks for this answer.

2. Query equivalence (10 marks): Pooja Gayakwad ([email protected])


a. Give a relational algebra query equivalent to the following SQL query with
relations r(A,B) and s(B,C), where r.A is the primary key of r, and (s.B, s.C) is the
primary key for s. You can assume aggregate operations in relational algebra
behave the same as in SQL with null values. (Hint: read the other parts of this
question before you start answering this part) (5 marks)

SELECT A, (SELECT count(C) FROM s WHERE s.B=r.B)
FROM r

Answer: RA query: ${}_{r.A}\gamma_{\mathrm{count}(s.C)}(r ⟕ s)$ [Note the left outer join used here;
it is also OK to use inner joins and union to implement the left outer join]
Rubrics: 1 mark for gamma operator use, 1 mark for the group by attribute, 1
mark for aggregate, 2 marks for r left outer join s
b. Would your relational algebra query and SQL query be equivalent if r.A is not a
primary key for r? Explain your answer. (3 marks)

Answer: Not always. The SQL query can produce duplicate rows in its output; such
duplicates, if any, are merged into one row by the grouping in the RA query. [1.5 marks]

Reason: As attribute A can have duplicate values when A is not a primary key,
there can be rows with the same value for A but different value for B. [1.5 marks]

c. How can you simplify the query if r.B is a foreign key referencing s.B, and r.B is
declared as not null? (2 marks)
Answer: In that case the left outer join will be equivalent to an inner join because
every r.B value will have a matching s.B value, so we can replace the left outer
join by an inner join
Rubrics: 1 mark for saying inner join (or natural join) instead of outer join, 1 mark
for explanation of why it will be equivalent.
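The equivalence discussed in this question can be checked on a small instance. The sketch below uses Python's sqlite3 module with made-up data, writing the RA answer as a LEFT OUTER JOIN with GROUP BY; the sample rows (including an r tuple with no match and one with a null B) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (A INTEGER PRIMARY KEY, B INTEGER);
CREATE TABLE s (B INTEGER, C INTEGER, PRIMARY KEY (B, C));
INSERT INTO r VALUES (1, 10), (2, 20), (3, NULL);
INSERT INTO s VALUES (10, 100), (10, 101);
""")

# Original SQL query with a scalar subquery.
subquery_rows = conn.execute("""
    SELECT A, (SELECT count(C) FROM s WHERE s.B = r.B)
    FROM r ORDER BY A
""").fetchall()

# SQL rendering of the RA answer: group r left-outer-join s by A.
outer_rows = conn.execute("""
    SELECT r.A, count(s.C)
    FROM r LEFT OUTER JOIN s ON r.B = s.B
    GROUP BY r.A ORDER BY r.A
""").fetchall()

# With an inner join, r tuples without a matching s tuple disappear.
inner_rows = conn.execute("""
    SELECT r.A, count(s.C)
    FROM r JOIN s ON r.B = s.B
    GROUP BY r.A ORDER BY r.A
""").fetchall()

print(subquery_rows, outer_rows, inner_rows)
```

On this instance the inner join loses the tuples with A = 2 and A = 3, which is why the simplification in part c needs both the foreign key and the not null declaration.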

3. FDs (10 marks): Paras garg ([email protected])
Given the relational schema r(A,B,C,D,E,F) and FDs
A->BC, B->C, CD->E, AD->E
do the following:
a. Find a candidate key and explain why it is a candidate key (3 marks)
b. Find a canonical cover, showing all the steps in computing it. (4 marks)
c. Give a 3NF decomposition of the relation with an explanation of how you
computed it (3 marks)

Answer:

A. Candidate key: ADF
Reason: F does not appear in any functional dependency, and A and D do not occur on
the RHS of any functional dependency, so every candidate key must contain A, D and F.
(ADF)+ can be easily verified to contain all attributes, so ADF is a superkey, while deleting any
one of A, D and F results in a set of attributes that is not a superkey. Thus ADF is a
candidate key.
Rubrics: 1.5 marks for correct answer, 0.5 to show that it is a superkey, and 1 mark to
explain that it is minimal
B. Steps: Starting with A->B, A->C, B->C, CD->E, AD->E (A->BC split into A->B and A->C),
remove extraneous attributes:
a. Since we can infer A->C from A->B and B->C, C is extraneous in A->BC, and thus
A->C can be removed.
b. Since A->C still follows from the remaining FDs, we can use A->C and CD->E to
infer AD->E; E is extraneous in AD->E, and thus AD->E can be removed.
c. Remaining FDs are A->B, B->C, CD->E
d. There are no other extraneous attributes on the RHS or LHS, and no FDs have the same
LHS for merging.
e. Canonical cover is A->B, B->C, CD->E
Rubrics: 0.5 marks for each of the 3 FDs in the canonical cover, 1 mark for the
explanation of each of the two extraneous attributes, 0.5 marks for mentioning that no FDs
have the same LHS for merging.
C. 3NF synthesis
a. Create a relation from each FD in canonical cover:
R1(A,B), R2(B,C), R3(C,D,E)
b. None of the relations is contained in another, so no schema is redundant
c. We need to ensure a candidate key is present. None of the Ri contains a
candidate key, so we add the relation R4(A,D,F) from the candidate key ADF.
Rubrics: 0.5 marks for each of 4 relations, 0.5 marks for mentioning canonical
cover, and 0.5 marks for adding candidate key
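The claims in this answer can be verified mechanically with an attribute-closure routine. The following is a minimal sketch; the function names closure, is_candidate_key and equivalent are introduced here for illustration and are not part of the exam.

```python
# FDs from the question, written as (LHS, RHS) strings of attribute names.
FDS = [("A", "BC"), ("B", "C"), ("CD", "E"), ("AD", "E")]
COVER = [("A", "B"), ("B", "C"), ("CD", "E")]  # canonical cover from part B
ATTRS = set("ABCDEF")

def closure(attrs, fds):
    """Return the closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_candidate_key(key, fds, attrs):
    """A candidate key is a superkey none of whose proper subsets is a superkey."""
    if closure(key, fds) != attrs:
        return False
    return all(closure(set(key) - {a}, fds) != attrs for a in key)

def equivalent(f, g):
    """Two FD sets are equivalent if each implies every FD of the other."""
    f_in_g = all(set(rhs) <= closure(lhs, g) for lhs, rhs in f)
    g_in_f = all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)
    return f_in_g and g_in_f

print(sorted(closure("ADF", FDS)))
print(is_candidate_key("ADF", FDS, ATTRS))
print(equivalent(FDS, COVER))
```

Checking that no Ri from part C has a closure equal to all attributes follows the same pattern, for example closure("CDE", COVER).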

4. Multisets and null values (10 marks): Multisets behave differently from sets.
Consider relation r(A,B,C) - Shiva Tarun(180050042)
a. A is primary key of relation r implies that A->BC. Does A-> BC imply that A is
primary key of r, if r is a multiset relation? Explain your answer. (3 marks)

Answer: No, A->BC doesn't imply that A is the primary key. (1 mark)

Counter-example/explanation: Consider the case where there are two identical
rows. Here A->BC holds, but A is not the primary key because its values are not unique. (2 marks)

b. Given B->C, the decomposition r1(A,B) and r2(B,C) is lossless join with sets.
Does this property hold with multisets, where r1 and r2 are created by multiset
projection on r, and the multiset version of join is used instead of the set version?
Give a small example to explain your answer. (4 marks)

Answer: No, it is a lossy decomposition. (1 mark)

Counter-example/explanation:
Consider the case where r has two copies of the row [x,y,z]. B->C holds here. On
decomposition we get two copies of [x,y] and two copies of [y,z]. On joining them back we get four
copies of [x,y,z]. (3 marks)
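The counter-example can be reproduced with a small sketch that models multiset relations as Python Counter objects mapping each tuple to its multiplicity. This representation is an assumption made for illustration.

```python
from collections import Counter

# Multiset relation r(A,B,C): the row (x, y, z) appears twice.
r = Counter({("x", "y", "z"): 2})

# Multiset projections keep duplicates: multiplicities are carried over.
r1 = Counter()
r2 = Counter()
for (a, b, c), n in r.items():
    r1[(a, b)] += n
    r2[(b, c)] += n

# Multiset natural join on B: multiplicities of matching tuples multiply.
joined = Counter()
for (a, b), n1 in r1.items():
    for (b2, c), n2 in r2.items():
        if b == b2:
            joined[(a, b, c)] += n1 * n2

print(r, joined)
```

The join yields four copies of the row where the original had two, so the original multiset is not recovered.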

c. Restricting ourselves to sets, suppose that B may have null values. Then is the
decomposition lossless join? Explain your answer. (3 marks)
Answer: No, it is a lossy decomposition. (1 mark)

Counter-example/explanation: Consider a row [x,null,y]. On decomposition we
get [x,null] and [null,y]. When we join them back we don't get the initial row,
because the comparison null = null does not evaluate to true. (2 marks)
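The same counter-example can be run in SQL. This sketch uses Python's sqlite3 module; the tables r1 and r2 are created with SELECT DISTINCT to model set projection, and the single sample row is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (A TEXT, B TEXT, C TEXT);
INSERT INTO r VALUES ('x', NULL, 'y');
-- Set projections of r onto (A,B) and (B,C).
CREATE TABLE r1 AS SELECT DISTINCT A, B FROM r;
CREATE TABLE r2 AS SELECT DISTINCT B, C FROM r;
""")

# Join the projections back on B; NULL = NULL is not true, so nothing matches.
joined = conn.execute("""
    SELECT r1.A, r1.B, r2.C
    FROM r1 JOIN r2 ON r1.B = r2.B
""").fetchall()

original = conn.execute("SELECT A, B, C FROM r").fetchall()
print(original, joined)
```

The join result is empty although r contains one row, so the row with a null B is lost.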

5. Temporal data (10 marks): Sai Phanindra ([email protected])
Given two temporal relations r(A, B, start, end) and s(B, C,
start, end), where the valid time of a tuple is [start,end):
a. Write an SQL query to check if the relation r satisfies the constraint that r.A is a
temporal primary key. The query should return a non-empty relation if the
temporal primary key constraint is violated, and the empty relation if the
constraint is satisfied. You can assume a function overlaps(s1, e1, s2, e2) which
returns true iff [s1, e1) overlaps with [s2, e2) (5 marks)
Answer:
SELECT r1.A
FROM r AS r1, r AS r2
WHERE r1.A = r2.A AND overlaps(r1.start, r1.end, r2.start, r2.end) AND
NOT (r1.B = r2.B AND r1.start = r2.start AND r1.end = r2.end)

Rubrics: 2 marks for select clause and 2 relations in from clause
1 mark for each of the WHERE clause conditions
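A runnable sketch of this check is given below using Python's sqlite3 module. The overlaps function is registered as a user-defined function with the half-open semantics stated in the question; the columns are renamed t_start and t_end (an assumption, to avoid relying on how the engine treats the keyword end), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def overlaps(s1, e1, s2, e2):
    """True (1) iff the half-open intervals [s1, e1) and [s2, e2) overlap."""
    return int(s1 < e2 and s2 < e1)

conn.create_function("overlaps", 4, overlaps)

conn.execute("CREATE TABLE r (A INTEGER, B TEXT, t_start INTEGER, t_end INTEGER)")
conn.executemany("INSERT INTO r VALUES (?, ?, ?, ?)", [
    (1, "a", 0, 10), (1, "b", 5, 15),  # A = 1: valid times overlap, a violation
    (2, "c", 0, 5), (2, "d", 5, 9),    # A = 2: [0,5) and [5,9) only touch
])

rows = conn.execute("""
    SELECT r1.A
    FROM r AS r1, r AS r2
    WHERE r1.A = r2.A
      AND overlaps(r1.t_start, r1.t_end, r2.t_start, r2.t_end)
      AND NOT (r1.B = r2.B AND r1.t_start = r2.t_start AND r1.t_end = r2.t_end)
""").fetchall()

violating_keys = {a for (a,) in rows}
print(violating_keys)
```

The last condition prevents a tuple from being paired with itself; without it every tuple would be reported, since each interval overlaps itself.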
b. Write an SQL query to check if the temporal foreign key dependency r.B
references s.B is satisfied, using the following functions (5 marks):
i. an aggregate function UIV, used in an SQL query as UIV(start, end). The
aggregate result is not a single interval, but rather a collection of intervals.
ii. contains(ic1,i2) which checks if a collection of intervals ic1 contains
interval i2
Answer:
WITH foo AS ( SELECT B, UIV(start, end) AS u
FROM s
GROUP BY B)
SELECT r.B
FROM r
WHERE NOT EXISTS (SELECT * FROM foo WHERE foo.B=r.B
AND contains(u,(r.start, r.end)))
Non-empty result implies temporal foreign key is not satisfied

Rubrics: Using GROUP BY to create the union of intervals in the WITH clause: 2 marks
Using a subquery to check for NOT EXISTS: 1 mark
Each WHERE clause condition in the subquery: 1 mark x 2.
NOTE: instead of the WITH clause, the whole GROUP BY query can be nested within
the main query.
0 marks for wrong use of UIV (no cribs are allowed).
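The logic of this query can be sketched in plain Python. The exam only names UIV and contains; the implementations below (merging half-open intervals that overlap or touch, and testing containment in a single merged interval) are assumptions about their intended behaviour, and the sample tuples are made up.

```python
def uiv(intervals):
    """Assumed behaviour of UIV: union of half-open intervals as a sorted,
    non-overlapping list; touching intervals such as [0,5) and [5,10) merge."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def contains(ic, interval):
    """Assumed behaviour of contains: some interval in ic covers the given one."""
    s, e = interval
    return any(cs <= s and e <= ce for cs, ce in ic)

def fk_violations(r, s):
    """Tuples of r(A, B, start, end) whose valid time is not covered by the
    union of valid times of s(B, C, start, end) tuples with the same B."""
    by_b = {}
    for b, _c, start, end in s:          # GROUP BY B
        by_b.setdefault(b, []).append((start, end))
    unions = {b: uiv(ivs) for b, ivs in by_b.items()}
    return [row for row in r
            if not contains(unions.get(row[1], []), (row[2], row[3]))]

s_rel = [(10, "c1", 0, 5), (10, "c2", 5, 10)]
r_rel = [("a1", 10, 2, 8), ("a2", 10, 8, 12), ("a3", 20, 0, 1)]
violations = fk_violations(r_rel, s_rel)
print(violations)
```

The tuple a1 is covered only by the union of the two s tuples, not by either one alone, which is why the query aggregates the intervals before testing containment.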

6. Big Data and map-reduce (12 marks): Vinayak Gosula ([email protected])
Using the signature map(record) and reduce(key, list-of-values), write pseudocode for the
following:
a. Given a relation packet(time, src, dest, size) recording each packet flowing on a
network link, write a map-reduce program that outputs the total number of bytes
flowing from each source in each second. NOTE: this is not a streaming system,
the relation is stored already. (6 marks)

Answer: Assuming 1 record per line, and inputs to the map function are lines,
MAP(record):
    time, src, dest, size = record.split()
    emit( {src, floor(time)}, size )

REDUCE(key, list):
    s = 0
    for each value in list
        s = s + value
    emit( key, s )
Rubrics: 1 mark for 1st line of map, 2 marks for 2nd line of map
(partial mark of 1 may be given if floor is omitted).
3 marks for reduce (partial mark of 1 may be given in case of a small error)
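The pseudocode above can be tried out with a small single-process simulation in Python. The driver run_mapreduce is a hypothetical stand-in for a real map-reduce framework (it groups map output by key and calls the reducer once per key), and the whitespace-separated record format and sample packets are assumptions.

```python
import math
from collections import defaultdict

def map_packet(record):
    """Map: key is (source, whole second), value is the packet size in bytes."""
    time, src, dest, size = record.split()
    yield (src, math.floor(float(time))), int(size)

def reduce_bytes(key, values):
    """Reduce: total bytes for one (source, second) pair."""
    yield key, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Hypothetical driver: shuffle map output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

packets = [
    "0.2 h1 h2 100",
    "0.9 h1 h3 50",
    "1.1 h1 h2 70",
    "0.5 h2 h1 10",
]
totals = run_mapreduce(packets, map_packet, reduce_bytes)
print(totals)
```

Taking the floor of the timestamp is what places all packets of the same second under one key; without it each distinct timestamp would form its own group.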
b. Write map-reduce code to execute the following query on relations r(A,B), and
s(B,C,D) (6 marks):
SELECT r.A, r.B, s.C, s.D FROM r LEFT OUTER JOIN s ON (r.B=s.B)

Answer:
MAP(record):
    if record is from r: emit( r.B, ("r", r.A) )
    if record is from s: emit( s.B, ("s", s.C, s.D) )
REDUCE(key, list):
    r-list = s-list = empty
    iterate over list and add records to r-list or s-list depending
    on the tag value "r" or "s"
    if s-list is empty
        for each record r1 in r-list: emit( r1.A, key, null, null )
    else
        for each record r1 in r-list, for each record s1 in s-list
            emit( r1.A, key, s1.C, s1.D )
NOTE: the value emitted by map can include r.B and s.B,
although this is not required since it is in the key
Rubrics: Map: 1 mark for each emit.
Reduce: 1 mark for creation of the two lists
1 mark for proper if condition
1 mark each for correct emit in each of two cases
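The reduce-side left outer join can likewise be simulated in Python. In this sketch each input record is a tuple whose first field names its relation ("r" or "s"); that tagging, the hypothetical driver run_mapreduce, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def map_join(record):
    """Map: key on B and tag each value with the relation it came from."""
    if record[0] == "r":
        _, a, b = record
        yield b, ("r", a)
    else:
        _, b, c, d = record
        yield b, ("s", c, d)

def reduce_join(key, values):
    """Reduce: pair every r tuple with every s tuple sharing the key;
    r tuples with no s partner are padded with nulls (None)."""
    r_list = [v for v in values if v[0] == "r"]
    s_list = [v for v in values if v[0] == "s"]
    if not s_list:
        for _, a in r_list:
            yield (a, key, None, None)
    else:
        for _, a in r_list:
            for _, c, d in s_list:
                yield (a, key, c, d)

def run_mapreduce(records, mapper, reducer):
    """Hypothetical driver: shuffle map output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

records = [
    ("r", 1, 10), ("r", 2, 20),
    ("s", 10, "c1", "d1"), ("s", 10, "c2", "d2"), ("s", 30, "c3", "d3"),
]
result = run_mapreduce(records, map_join, reduce_join)
print(result)
```

The s tuple with B = 30 produces no output because its group has an empty r-list, which matches left outer join semantics: only unmatched r tuples are preserved.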
