Assignment 2: Write Clearly Your Name, Student Number and Lab Number On The Front Page of Your Assignment
Write clearly your Name, Student Number and Lab Number on the front
page of your assignment.
Deliverables:
The answers to questions 1, 2, 3, and 4, clearly typed on paper.
The TA will take into account the cleanliness of what was handed in. It is your responsibility to
make your assignment readable.
There are inference rules for functional dependencies. Three of those rules are known as
Armstrong's Axioms: reflexivity, augmentation and transitivity. These axioms are sound and
complete. We saw in class two other rules that are inferred from these axioms. These derived rules
are known as the decomposition rule and the union rule.
There is yet another inference rule, called the pseudotransitive rule, which stipulates that:
if X → Y and WY → Z then WX → Z
Prove this rule using the known axioms.
Solution:
1- X → Y (given)
2- WY → Z (given)
3- WX → WY (augmentation of 1 with W)
4- WX → Z (transitivity on 3 and 2)
2- Is the set of functional dependencies F minimal? If not, try to find a minimal set of functional
dependencies that is equivalent to F (minimal cover). Prove the equivalence.
Solution:
No, the set of functional dependencies F is not minimal, since its dependencies have more than
one attribute on the right-hand side.
The minimal cover G of F is:
SIN → E_Name
SIN → B_Date
SIN → Address
SIN → D_Num
D_Num → D_Name
D_Num → D_Manager
To prove that two sets of functional dependencies F and E are equivalent, we either show that
F+ = E+ or that E covers F and F covers E.
To show that F covers E, we calculate X+ with respect to F for every FD X → Y in E and check
whether X+ includes the attributes in Y.
Rather than calculating G+ and F+, we show that F covers G and that G covers F.
F covers G
With respect to F (see 2.1), {SIN}+ = {SIN, E_Name, B_Date, Address, D_Num, D_Name, D_Manager}
and {D_Num}+ = {D_Num, D_Name, D_Manager}. The right-hand side of every FD in G is included in
the closure of its left-hand side.
G covers F
With respect to G, {SIN}+ = {SIN, E_Name, B_Date, Address, D_Num, D_Name, D_Manager} and
{D_Num}+ = {D_Num, D_Name, D_Manager}. The right-hand side of every FD in F is included in
the closure of its left-hand side.
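The coverage test described above can be sketched in Python. Note that the original set F is not reproduced in this section, so the F below is an assumed reconstruction that is merely consistent with the closures stated above; the helper names closure, covers and fd are also illustrative.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if every FD X -> Y in e satisfies Y <= X+ computed with respect to f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


# ASSUMPTION: F is not shown in this section; this is a plausible reconstruction.
F = [
    fd("SIN", "E_Name B_Date Address D_Num"),
    fd("D_Num", "D_Name D_Manager"),
]

# G is the minimal cover listed in the solution.
G = [
    fd("SIN", "E_Name"),
    fd("SIN", "B_Date"),
    fd("SIN", "Address"),
    fd("SIN", "D_Num"),
    fd("D_Num", "D_Name"),
    fd("D_Num", "D_Manager"),
]

equivalent = covers(F, G) and covers(G, F)
print("F and G equivalent:", equivalent)
print("{SIN}+ under G:", sorted(closure({"SIN"}, G)))
```

Checking coverage in both directions avoids computing the full closures F+ and G+, which grow exponentially with the number of attributes.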
2NF                     3NF
R1 (A, B, C)            R1 (A, B, C)
R2 (B, D, E, F)         R2 (B, D, E, F)
R3 (A, D, G, H, J)      R3.1 (A, D, G, H)
                        R3.2 (H, J)
R4 (A, I)               R4 (A, I)
Question 4: (Query Optimization) [40%]
Employee(SIN, E-Name, B-Date, Address, Sex, Salary, Supervisor) with 10,000 tuples;
Works(ESIN, PNO) with 20,000 tuples;
Project(P-Name, P-Type, P-Num, Location, D-Num) with 500 tuples.
Knowing that one page can accommodate 100 tuples of Employee, 400 tuples of Works, or 120
tuples of Project, and assuming that we have 6 buffers in main memory, calculate the cost for
evaluating Q if we choose Block-Nested Loop joins or Sort-Merge joins for both of the two joins, or
Block-Nested Loop for the first join and Sort-Merge for the second join. The first join is between
Project and Works while the second joins the result with Employee. Assume that the same number
of tuples of the result of the first join can fit per page as we can fit Project tuples (120). Which plan
would be the best? Assume that there are 5 types of projects and ¾ of the employees are born after
January 29, 1961. All distributions are uniform. Push selections as early as possible in all cases.
Draw your query plans.
Solution:
Employee has 10,000 tuples at 100 per page, so there are 10000/100 = 100 pages.
Works has 20,000 tuples at 400 per page, so there are 20000/400 = 50 pages.
Project has 500 tuples at 120 per page, so there are 500/120 = 4.17, rounded up to 5 pages.
Since there are 5 types of projects, the selection on projects with type=design will generate 100
tuples fitting in one page.
Since there are ¾ of employees born after 1961, the selection with the birth date constraint will
generate 7500 tuples fitting in 75 pages.
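The page counts and selection sizes above can be re-derived with a few lines of Python. This is only an arithmetic check under the uniform-distribution assumption stated in the question; the function and variable names are illustrative.

```python
def pages(tuples, tuples_per_page):
    """Number of pages needed, rounding up (integer ceiling division)."""
    return -(-tuples // tuples_per_page)


employee_pages = pages(10_000, 100)
works_pages = pages(20_000, 400)
project_pages = pages(500, 120)

# 5 project types, uniformly distributed: one fifth of Project qualifies.
design_tuples = 500 // 5
design_pages = pages(design_tuples, 120)

# Three quarters of the employees satisfy the birth-date predicate.
selected_employees = 10_000 * 3 // 4
selected_employee_pages = pages(selected_employees, 100)

print(employee_pages, works_pages, project_pages)
print(design_tuples, design_pages)
print(selected_employees, selected_employee_pages)
```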
[Query plan 1: π E-Name, Salary (on the fly) over a BNL join of T1 and T2. T1 is the written result of a BNL join (PNO=P-Num) between the pipelined σ P-Type=design (Project) and Works; T2 is the written result of σ B-Date>1961-01-29 (Employee).]
Since the selection on Project is smaller than the Works relation, Works should be the outer relation.
The first select costs 5 I/Os.
Since the result is the size of one buffer, it can reside in main memory to do the join. Thus, the cost of the first join is the cost of scanning Works: 50 I/Os.
The result of the first join is estimated at 20000/500 * 100 = 4000 tuples. This is assuming a uniform distribution (i.e. the number of employees assigned per project is uniform).
Since the distributions are assumed uniform: we have 20,000 Works tuples and 500 projects. That is 40 employees per project. Since we have 100 projects with type "design", that gives us 4000 tuples.
At 120 tuples per page, the result is about 34 pages (exactly 33 and a third). Thus writing T1 costs
34 I/Os. The cost of the second select is 100 I/Os and writing T2 costs 75 I/Os. The cost of the
second join is 34 + ⌈34/4⌉ * 75 = 34 + 9 * 75 = 709 I/Os.
Thus the total cost for this plan is 5+50+34+100+75+709= 973 I/Os.
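As a check on the figures above, here is a minimal Python sketch of the block nested loop estimate used for the second join, assuming the usual textbook form M + ⌈M/(B-2)⌉ * N with one buffer reserved for the inner input and one for output. The function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def bnl_cost(outer_pages, inner_pages, buffers):
    """Scan the outer once, and scan the inner once per block of (buffers - 2) outer pages."""
    block = buffers - 2
    return outer_pages + ceil_div(outer_pages, block) * inner_pages


BUFFERS = 6

first_select = 5                          # scan Project
first_join = 50                           # scan Works; selected Project page stays in memory
t1_tuples = 20_000 // 500 * 100           # 40 Works tuples per project, 100 design projects
t1_pages = ceil_div(t1_tuples, 120)       # write T1
second_select = 100                       # scan Employee
t2_pages = 75                             # write T2
second_join = bnl_cost(t1_pages, t2_pages, BUFFERS)

plan1_total = (first_select + first_join + t1_pages
               + second_select + t2_pages + second_join)
print(t1_tuples, t1_pages, second_join, plan1_total)
```

The smaller input (T1) is used as the outer relation here because the number of inner scans depends only on the size of the outer.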
[Query plan 2: π E-Name, Salary (on the fly) over an SMJ of T2 and T3. T1 is Works, sorted and written; T2 is the written and then sorted result of an SMJ (PNO=P-Num) between the pipelined σ P-Type=design (Project) and T1; T3 is the sorted and written result of σ B-Date>1961-01-29 (Employee).]
The first select costs 5 I/Os.
The result can fit in one buffer and can be sorted in main memory.
Sorting Works on PNO costs 2 * ⌈log5(50)⌉ * 50 = 2*3*50 = 300 I/Os. The SM join would cost 50 I/Os since the outer relation fits in memory.
Writing T2 costs 34 I/Os (see above) and sorting T2 on ESIN costs 2*3*34 = 204 I/Os.
Selecting Employees costs 100 I/Os for scanning and 75 I/Os to write T3. Sorting T3 on SIN costs 2*3*75 = 450 I/Os. The final join costs 75+34 = 109 I/Os.
Thus the total cost for this plan is
5+300+50+34+204+100+75+450+109 = 1327 I/Os.
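The sort costs in this plan follow the simplified estimate 2 * N * ⌈log_(B-1) N⌉ used in the solution. The Python sketch below reproduces that estimate; it is not a full external merge sort model, and the function names are illustrative.

```python
def log_ceil(pages, base):
    """Smallest k such that base**k >= pages, i.e. ceil(log_base(pages)), without floats."""
    k, reach = 0, 1
    while reach < pages:
        reach *= base
        k += 1
    return k


def sort_cost(pages, buffers):
    """Simplified estimate from the solution: each pass reads and writes every page."""
    return 2 * log_ceil(pages, buffers - 1) * pages


BUFFERS = 6

sort_works = sort_cost(50, BUFFERS)   # Works sorted on PNO
sort_t2 = sort_cost(34, BUFFERS)      # first join result sorted on ESIN
sort_t3 = sort_cost(75, BUFFERS)      # selected Employee tuples sorted on SIN
final_merge = 75 + 34                 # one pass over each sorted input

plan2_total = (5 + sort_works + 50 + 34 + sort_t2
               + 100 + 75 + sort_t3 + final_merge)
print(sort_works, sort_t2, sort_t3, final_merge, plan2_total)
```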
[Query plan 3: π E-Name, Salary (on the fly) over an SMJ of T1 and T2. T1 is the written and then sorted result of a BNL join (PNO=P-Num) between the pipelined σ P-Type=design (Project) and Works; T2 is the sorted and written result of σ B-Date>1961-01-29 (Employee).]
The first select costs 5 I/Os.
Since the result is the size of one buffer, it can reside in main memory to do the join. Thus, the cost of the first join is the cost of scanning Works: 50 I/Os.
Writing T1 costs 34 I/Os (see above) and sorting T1 on ESIN costs 2*3*34 = 204 I/Os.
Selecting Employees costs 100 I/Os for scanning and 75 I/Os to write T2. Sorting T2 on SIN costs 2*3*75 = 450 I/Os. The final join costs 75+34 = 109 I/Os.
Thus the total cost for this plan is
5+50+34+204+100+75+450+109 = 1027 I/Os.
The best plan among these three is to use Block-Nested Loop joins for both joins (973 I/Os, against 1327 I/Os for Sort-Merge on both joins and 1027 I/Os for the mixed plan).
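The comparison can be double-checked by summing the per-step costs of each plan, as in the short Python sketch below. The step costs are copied from the solution; the plan labels are illustrative.

```python
# Per-step I/O costs taken from the three plans in the solution.
plans = {
    "BNL + BNL": [5, 50, 34, 100, 75, 709],
    "SMJ + SMJ": [5, 300, 50, 34, 204, 100, 75, 450, 109],
    "BNL + SMJ": [5, 50, 34, 204, 100, 75, 450, 109],
}

totals = {name: sum(costs) for name, costs in plans.items()}
best = min(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name}: {total} I/Os")
print("Cheapest plan:", best)
```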