hw3 Sols
IMPORTANT:
• Upload this PDF with your answers to Gradescope by 11:59pm on Sunday Oct 24, 2021.
• Plagiarism: Homework may be discussed with other students, but all homework is to be
completed individually.
• You have to use this PDF for all of your answers.
For your information:
• Graded out of 100 points; 2 questions total
• Rough time estimate: ≈ 1 - 2 hours (0.5 - 1 hours for each question)
Revision : 2021/11/16 01:04
15-445/645 (Fall 2021) Homework #3 Page 2 of 5
(b) [5 points] Again, assume that the DBMS has six buffers. What is the total I/O cost to
sort the file?
□ 60,000,000 ■ 120,000,000 □ 144,000,000 □ 240,000,000 □ 480,000,000
(c) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file using only two passes?
□ 172 □ 173 □ 174 ■ 2,450 □ 2,451 □ 2,452 □ 2,827 □ 2,828
□ 2,829 □ 3,999,999 □ 4,000,000 □ 4,000,001
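Solution: We want the smallest B where $N \le B \times (B - 1)$. If B = 2,450, then
$6{,}000{,}000 \le 2{,}450 \times 2{,}449 = 6{,}000{,}050$; any smaller value of B would fail,
since $2{,}449 \times 2{,}448 = 5{,}995{,}152 < 6{,}000{,}000$.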
(d) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file using only six passes?
□ 14 ■ 15 □ 16 □ 1,240 □ 1,241 □ 1,242 □ 1,256 □ 1,257
□ 1,258 □ 2,934 □ 2,935 □ 2,936 □ 3,999,999 □ 4,000,000
□ 4,000,001
Solution: $B \times (B - 1)^5 = 15 \times 14^5 = 8{,}067{,}360 \ge 6{,}000{,}000$. Any smaller
value of B would fail, since $14 \times 13^5 = 5{,}198{,}102 < 6{,}000{,}000$.
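The two buffer-count answers above can be checked with a short search. This is a minimal sketch that uses the same condition as the solutions, $N \le B \times (B - 1)^{p - 1}$ for p passes, with N = 6,000,000 pages taken from the solution to (c); the function name `min_buffers` is made up for illustration.

```python
def min_buffers(n_pages: int, passes: int) -> int:
    """Smallest B with n_pages <= B * (B - 1) ** (passes - 1).

    Pass 0 produces runs of B pages; each later pass merges B - 1 runs,
    so `passes` passes can sort at most B * (B - 1) ** (passes - 1) pages.
    """
    b = 2  # a merge needs at least one input buffer and one output buffer
    while b * (b - 1) ** (passes - 1) < n_pages:
        b += 1
    return b


if __name__ == "__main__":
    n = 6_000_000  # file size in pages, as used in the solution to (c)
    print(min_buffers(n, 2))
    print(min_buffers(n, 6))
```

A linear search is enough here because the capacity grows monotonically with B and the answers are small.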
(e) [5 points] Suppose the DBMS has twenty-four buffers. What is the largest database file
(expressed in terms of N , the number of pages) that can be sorted with external merge
Answer the following questions on computing the I/O costs for the joins. Assume the
simplest cost model, where pages are read and written one at a time. Also assume that
one buffer block holds the evolving output block and one buffer block holds the current
input block of the inner relation. You may ignore the cost of writing the final results.
(a) [5 points] Block nested loop join with R as the outer relation and S as the inner relation:
□ 11,200 □ 23,000 ■ 56,400 □ 85,000 □ 92,600
Solution: $M + \lceil M / (B - 2) \rceil \times N = 1{,}400 + \lceil 1{,}400 / 58 \rceil \times 2{,}200 = 1{,}400 + 55{,}000 = 56{,}400$
(b) [5 points] Block nested loop join with S as the outer relation and R as the inner relation:
□ 31,200 □ 43,000 □ 43,600 □ 52,900 ■ 55,400
Solution: $N + \lceil N / (B - 2) \rceil \times M = 2{,}200 + \lceil 2{,}200 / 58 \rceil \times 1{,}400 = 2{,}200 + 53{,}200 = 55{,}400$
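Both block nested loop answers follow from one formula, so a small helper makes the arithmetic easy to re-check. This is a sketch under the values that appear in the solutions (M = 1,400 pages for R, N = 2,200 pages for S, and B = 60 buffers, inferred from the divisor B − 2 = 58); the function name `bnl_join_cost` is made up for illustration.

```python
def bnl_join_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """I/O cost of a block nested loop join.

    B - 2 buffers hold a chunk of the outer relation (one buffer is kept
    for the inner input block and one for the output block). The outer
    relation is read once; the inner relation is scanned once per chunk.
    """
    chunk = buffers - 2
    outer_chunks = -(-outer_pages // chunk)  # ceiling division on integers
    return outer_pages + outer_chunks * inner_pages


if __name__ == "__main__":
    M, N, B = 1_400, 2_200, 60
    print(bnl_join_cost(M, N, B))  # R as the outer relation
    print(bnl_join_cost(N, M, B))  # S as the outer relation
```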
(c) Hash join with S as the outer relation and R as the inner relation. You may ignore recursive
partitioning and partially filled blocks.
i. [5 points] What is the cost of the partition phase?
□ 2,800 □ 4,400 □ 5,000 □ 5,800 ■ 7,200
Solution: $2 \times (M + N) = 2 \times (1{,}400 + 2{,}200) = 2 \times 3{,}600 = 7{,}200$
ii. [5 points] What is the cost of the probe phase?
□ 2,800 □ 4,400 ■ 3,600 □ 4,800 □ 7,200
Solution: $M + N = 1{,}400 + 2{,}200 = 3{,}600$
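The two hash join phases can be written out as a tiny cost helper. This is only a sketch of the arithmetic in the solutions, ignoring recursive partitioning and partially filled blocks as the question allows; the function name `hash_join_cost` is made up for illustration.

```python
def hash_join_cost(m_pages: int, n_pages: int) -> tuple[int, int]:
    """Return (partition_cost, probe_cost) in page I/Os.

    Partition phase: every page of both relations is read once and
    written once. Probe phase: every partitioned page is read once.
    """
    partition_cost = 2 * (m_pages + n_pages)
    probe_cost = m_pages + n_pages
    return partition_cost, probe_cost


if __name__ == "__main__":
    partition, probe = hash_join_cost(1_400, 2_200)
    print(partition, probe, partition + probe)
```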
(d) [10 points] Assume that the tables do not fit in main memory and that a large number
of distinct values hash to the same bucket under your hash function h1. Which of the
following approaches works best?
□ Create hashtables for the inner and outer relation using h1 and rehash into an
embedded hash table using h1 for large buckets
■ Create hashtables for the inner and outer relation using h1 and rehash into an
embedded hash table using h2 != h1 for large buckets
□ Use linear probing for collisions and page in and out parts of the hashtable needed
at a given time
□ Create 2 hashtables half the size of the original one, run the same hash join
algorithm on the tables, and then merge the hashtables together
Solution: Use Grace hash join with recursive partitioning, which is what the correct
option describes.
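To make the idea concrete, here is a minimal in-memory sketch of recursive partitioning. It is an illustration, not the course's implementation: relations are plain lists of `(key, payload)` tuples with integer keys, the "different hash function per level" (h1, h2, ...) is modeled by mixing the recursion level into Python's built-in `hash`, and the names `level_hash`, `grace_join`, `fanout`, `max_bucket`, and `max_level` are all made up for this example.

```python
def level_hash(key: int, level: int, fanout: int) -> int:
    """Hypothetical hash family: level 0 plays the role of h1, level 1 of h2, ..."""
    return hash((level, key)) % fanout


def grace_join(r, s, fanout=4, max_bucket=8, level=0, max_level=4):
    """Join two lists of (key, payload) tuples on key.

    If the build side r is small enough (or we hit the recursion cap),
    build a hash table on r and probe it with s. Otherwise partition both
    inputs with this level's hash function and recurse on matching pairs.
    """
    if len(r) <= max_bucket or level >= max_level:
        table = {}
        for key, payload in r:
            table.setdefault(key, []).append(payload)
        return [(key, rp, sp) for key, sp in s for rp in table.get(key, [])]

    r_parts = [[] for _ in range(fanout)]
    s_parts = [[] for _ in range(fanout)]
    for key, payload in r:
        r_parts[level_hash(key, level, fanout)].append((key, payload))
    for key, payload in s:
        s_parts[level_hash(key, level, fanout)].append((key, payload))

    out = []
    for r_part, s_part in zip(r_parts, s_parts):
        # The next level uses a different hash function, so distinct keys
        # that collided at this level are likely to be split apart.
        out.extend(grace_join(r_part, s_part, fanout, max_bucket,
                              level + 1, max_level))
    return out
```

Reusing h1 at the next level would send every tuple of an oversized bucket to the same bucket again, which is why the option with h2 != h1 is the one that helps. The `max_level` cap is there because many copies of a single key can never be split by any hash function.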
(e) Sort-merge join with S as the outer relation and R as the inner relation:
i. [4 points] What is the cost of sorting the tuples in R on attribute a?
□ 3,000 ■ 5,600 □ 7,400 □ 9,600 □ 10,800
Solution: $\text{passes} = 1 + \lceil \log_{B-1}(\lceil M / B \rceil) \rceil = 1 + \lceil \log_{59}(\lceil 1{,}400 / 60 \rceil) \rceil = 1 + 1 = 2$
$2M \times \text{passes} = 2 \times 1{,}400 \times 2 = 5{,}600$
ii. [4 points] What is the cost of sorting the tuples in S on attribute a?
□ 3,400 □ 4,000 □ 6,400 □ 7,600 ■ 8,800
Solution: $\text{passes} = 1 + \lceil \log_{B-1}(\lceil N / B \rceil) \rceil = 2$
$2N \times \text{passes} = 2 \times 2{,}200 \times 2 = 8{,}800$
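The pass-count formula used in both sorting answers can be evaluated without floating-point logarithms by simulating the merge passes. This is a sketch that assumes B = 60 buffers, as in the solution for R; the function name `external_sort_cost` is made up for illustration.

```python
def external_sort_cost(n_pages: int, buffers: int) -> tuple[int, int]:
    """Return (passes, io_cost) for external merge sort.

    Pass 0 creates ceil(n_pages / B) sorted runs. Each later pass merges
    B - 1 runs at a time until one run remains. Every pass reads and
    writes each page once, so the cost is 2 * n_pages * passes.
    """
    runs = -(-n_pages // buffers)  # ceiling division on integers
    passes = 1
    while runs > 1:
        runs = -(-runs // (buffers - 1))
        passes += 1
    return passes, 2 * n_pages * passes


if __name__ == "__main__":
    print(external_sort_cost(1_400, 60))  # sorting R
    print(external_sort_cost(2_200, 60))  # sorting S
```

Repeated ceiling division gives the same count as $1 + \lceil \log_{B-1}(\lceil N / B \rceil) \rceil$. With the Question 1 values from the solutions (6,000,000 pages and six buffers) the same function yields 10 passes and 120,000,000 I/Os.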
iii. [10 points] What is the cost of the merge phase assuming there are no duplicates in
the join attribute?
□ 1,400 □ 1,800 ■ 3,600 □ 4,400 □ 4,800
Solution: $M + N = 1{,}400 + 2{,}200 = 3{,}600$
iv. [10 points] What is the cost of the merge phase in the worst-case scenario?
□ 1,080,000 □ 2,880,000 ■ 3,080,000 □ 4,750,000 □ 10,080,000
Solution: $M \times N = 1{,}400 \times 2{,}200 = 3{,}080{,}000$
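The gap between the two merge-phase answers comes from backtracking on duplicate join keys. The sketch below illustrates this at the level of individual keys rather than pages, so its counts are not the page I/O figures above; it assumes both inputs are already sorted lists of keys, and the name `merge_join` and the `inner_reads` counter are made up for illustration.

```python
def merge_join(outer, inner):
    """Merge two sorted key lists; return (matches, inner_reads).

    For each outer key, skip smaller inner keys, then scan the group of
    equal inner keys. The cursor `i` stays at the start of that group, so
    a repeated outer key rescans the whole group (the backtracking step).
    """
    matches = []
    inner_reads = 0
    i = 0  # start of the current group in inner
    for key in outer:
        while i < len(inner) and inner[i] < key:
            i += 1
            inner_reads += 1
        j = i
        while j < len(inner) and inner[j] == key:
            matches.append((key, inner[j]))
            j += 1
            inner_reads += 1
    return matches, inner_reads


if __name__ == "__main__":
    # No duplicates: work is linear in the two input sizes.
    print(merge_join([1, 2, 3], [1, 2, 3])[1])
    # Every key equal: each outer key rescans the entire inner list.
    print(merge_join([7, 7, 7], [7, 7, 7, 7])[1])
```

Without duplicates the inner cursor only moves forward, which mirrors the M + N cost; when every key is identical, each outer key rescans all inner keys, which mirrors the M × N worst case.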
v. [2 points] Now consider joining R, S and then joining the result with T. What is the
cost of the merge phase assuming there are no duplicates in the join attribute?
□ 1,000 □ 2,000 ■ 3,000 □ 5,000 □ 2,000,000
Solution: $K + L = 2{,}000 + 1{,}000 = 3{,}000$
End of Homework #3