Midterm 02 Solutions

The exam covers B+ trees, extensible hash indices, relational algebra, and query processing. For B+ trees, students are asked to determine minimum node sizes, provide examples of insertion sequences, and draw the resulting trees. For extensible hash indices, students are asked to show the index after inserting records. For relational algebra, students prove equivalences and derive expressions. For query processing, students optimize queries over relations with foreign keys.

Uploaded by

Hasanul Kabir

Midterm Exam

CSE232A, Winter 2002

February 21, 2002

Name:

Brief Directions:

• Write clearly: First, you don't want me to spend the whole week grading, do you? Second,
it's good for you to write clearly!

• Open books, notes, even databases...

• Good luck!

1 B+ Trees (20 points)
Consider a B+ tree where n = 4, i.e., the maximum number of keys in a node is 4 and the maximum
number of pointers is 5 at internal nodes and 4 at leaf nodes. Assume that the B+ tree initially
consists of a single node, which is both the root and the only leaf, that has the key 1.

1. 2 points What is the minimum number of keys that may appear in a non-root internal node?

Solution: $\lceil (n+1)/2 \rceil - 1 = 2$

2. 2 points What is the minimum number of keys that may appear in a non-root leaf node?

Solution: $\lfloor (n+1)/2 \rfloor = 2$

3. 8 points [1] Consider the set of keys $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$. Write down a
sequence of insertions of the keys of S such that the resulting B+ tree has 3 levels and
is as empty as possible, i.e., as many nodes as possible have the minimum number of keys.
Provide the B+ tree snapshots that correspond to the points right after node splits. (Use the
white space at the end of the page and the back side of this page.)

Solution: The sequence (1), 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 produces the sequence of splits
and the resulting B+ tree shown below, assuming an overflowing leaf is split "3 keys to the left, 2 keys
to the right".

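Before the first split (after inserting 13, 12, 11):
  leaf:     [1 11 12 13]

After inserting 10:
  root:     [12]
  leaves:   [1 10 11] [12 13]

After inserting 9, 8:
  root:     [10 12]
  leaves:   [1 8 9] [10 11] [12 13]

After inserting 7, 6:
  root:     [8 10 12]
  leaves:   [1 6 7] [8 9] [10 11] [12 13]

After inserting 5, 4:
  root:     [6 8 10 12]
  leaves:   [1 4 5] [6 7] [8 9] [10 11] [12 13]

After inserting 3, 2 (the leaf split overflows the root, which splits and pushes key 8 up):
  root:     [8]
  internal: [4 6] [10 12]
  leaves:   [1 2 3] [4 5] [6 7] [8 9] [10 11] [12 13]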

This is just one of the possible solutions. However, the resulting tree is the only one that
is possible if the splits are "3 keys to the left, 2 keys to the right". There is exactly one
possible resulting tree in the case of "2 keys to the left, 3 keys to the right".
[1] May be time consuming.

4. 8 points [2] Consider the set of keys $S' = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20\}$.
Write down a sequence of insertions of the keys of S' such that the resulting B+ tree
is as full as possible, i.e., as many nodes as possible have the maximum number of keys.
Provide the B+ tree snapshots that correspond to the points right after a node split. (Use
the next blank page.)

Remarks

• Be consistent in the way you split nodes.

• Assume that there are no duplicate nodes in the internal leaves.

Solution: There is exactly one resulting tree that has just two levels and, hence, is as full as
possible. It is produced as follows

Start:
  leaf:     [1]

Insert 2, 3, 6 (no split):
  leaf:     [1 2 3 6]

Insert 5:
  root:     [5]
  leaves:   [1 2 3] [5 6]

Insert 7, 9, 10:
  root:     [5 9]
  leaves:   [1 2 3] [5 6 7] [9 10]

Insert 11, 13, 14:
  root:     [5 9 13]
  leaves:   [1 2 3] [5 6 7] [9 10 11] [13 14]

Insert 15, 17, 18:
  root:     [5 9 13 17]
  leaves:   [1 2 3] [5 6 7] [9 10 11] [13 14 15] [17 18]

Insert 4, 8, 12, 16, 19, 20 (no extra splits):
  root:     [5 9 13 17]
  leaves:   [1 2 3 4] [5 6 7 8] [9 10 11 12] [13 14 15 16] [17 18 19 20]

[2] May be time consuming.
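The insertion sequences of questions 3 and 4 can be replayed mechanically. The following is a minimal sketch, not the course's reference implementation: the names `Node`, `insert`, `build` and `levels` are made up for illustration, and it assumes that an overflowing leaf keeps 3 keys on the left and moves 2 to the right, while an overflowing internal node keeps 2 keys, pushes the middle key up, and moves 2 keys to the right. For question 4 the key 16 is included among the final insertions, since the final tree contains it.

```python
import bisect

MAX_KEYS = 4  # n = 4, as stated in the exam


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def insert(node, key):
    """Insert key below node. Return (separator, new_right_node) if node split, else None."""
    if node.children is None:  # leaf
        bisect.insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        right = Node(node.keys[3:])  # assumption: 3 keys stay left, 2 go right
        node.keys = node.keys[:3]
        return right.keys[0], right
    i = bisect.bisect_right(node.keys, key)
    split = insert(node.children[i], key)
    if split is None:
        return None
    separator, right_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    # Internal overflow: 5 keys and 6 children; the middle key moves up.
    up = node.keys[2]
    right = Node(node.keys[3:], node.children[3:])
    node.keys, node.children = node.keys[:2], node.children[:3]
    return up, right


def build(sequence):
    root = Node([])
    for key in sequence:
        split = insert(root, key)
        if split is not None:  # the root itself split: grow a new root
            separator, right = split
            root = Node([separator], [root, right])
    return root


def levels(root):
    """Return the keys of every node, level by level, from the root down."""
    result, frontier = [], [root]
    while frontier:
        result.append([node.keys for node in frontier])
        frontier = [c for node in frontier if node.children for c in node.children]
    return result


q3 = levels(build([1, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]))
q4 = levels(build([1, 2, 3, 6, 5, 7, 9, 10, 11, 13, 14, 15, 17, 18, 4, 8, 12, 16, 19, 20]))
for level in q3:
    print(level)
for level in q4:
    print(level)
```

Under these split assumptions, `q3` has three levels and `q4` has two, matching the final trees drawn in the solutions.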

2 Extensible Hash Index (10 points)
Assume that each bucket of an extensible hash index can fit exactly two records (each record is
a pair of hash key and pointer). Consider the following records, with the corresponding hash key
values.

Key   Hash key
a     0000
b     0001
c     0010
d     0011
e     0100
f     0110
g     1000

We insert the records in the order given above. Show the extensible hash index after all records
have been inserted.

Solution: As is the case with extensible hash indices, you do not have to consider the sequence
by which the records are inserted. The resulting index is

i = 3

Directory entries     Bucket local depth   Hash keys in bucket
000                   3                    0000, 0001
001                   3                    0010, 0011
010, 011              2                    0100, 0110
100, 101, 110, 111    1                    1000
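The resulting index can be reproduced with a small simulation. This is a sketch under stated assumptions rather than the textbook's code: the directory is indexed by the first i bits of the hash key, it starts with i = 0 and a single empty bucket, and the names `Bucket`, `build_index` and `summarize` are hypothetical.

```python
CAPACITY = 2  # records per bucket, as stated in the exam


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading bits shared by all records here
        self.records = []   # list of (key, hash_bits)


def build_index(records):
    i = 0                      # global depth: the directory has 2**i entries
    directory = [Bucket(0)]
    for key, bits in records:
        while True:
            bucket = directory[int(bits[:i] or "0", 2)]
            if len(bucket.records) < CAPACITY:
                bucket.records.append((key, bits))
                break
            if bucket.depth == i:
                # Double the directory: each old entry becomes two adjacent entries.
                directory = [b for b in directory for _ in range(2)]
                i += 1
            # Split the full bucket on its next bit, then retry the insertion.
            d = bucket.depth
            zero, one = Bucket(d + 1), Bucket(d + 1)
            for rec in bucket.records:
                (one if rec[1][d] == "1" else zero).records.append(rec)
            for j, b in enumerate(directory):
                if b is bucket:
                    prefix = format(j, f"0{i}b")
                    directory[j] = one if prefix[d] == "1" else zero
    return i, directory


def summarize(i, directory):
    """List (directory entry, local depth, keys in bucket) for every entry."""
    return [(format(j, f"0{i}b"), b.depth, [k for k, _ in b.records])
            for j, b in enumerate(directory)]


records = [("a", "0000"), ("b", "0001"), ("c", "0010"), ("d", "0011"),
           ("e", "0100"), ("f", "0110"), ("g", "1000")]
i, directory = build_index(records)
summary = summarize(i, directory)
print("i =", i)
for entry in summary:
    print(entry)
```

Entries that print the same depth and keys (for example 010 and 011) point to one shared bucket.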

3 Algebra (10 points)
Consider the following two definitions of the semijoin operator $\ltimes$, which may or may not be
equivalent.
• Direct (from page 255 of the textbook): The semijoin of relations R and S, written $R \ltimes S$,
is the bag of tuples t in R such that there is at least one tuple in S that agrees with t in all
attributes that R and S have in common.
• Indirect: Let us call a(R) the list of attributes of R. Then, it is

$$R \ltimes S = \pi_{a(R)}(R \bowtie S)$$

where $\bowtie$ stands for the natural join.


1. 5 points Are the two definitions equivalent? Assume bag semantics for the algebra. If the
answer is yes, provide a proof showing that, given arbitrary R and S, if a tuple t appears k
times in the result of $R \ltimes S$ according to the direct definition, then the tuple t will appear k
times in the result of $R \ltimes S$ according to the indirect definition. If the answer is no, provide
an example with an R table and an S table and show $R \ltimes S$ for the direct and the indirect
definition.

Solution: No, they are not equivalent. Consider the following counterexample with relations
R(A) and S(A, B):

R =   A
      1

and

S =   A  B
      1  2
      1  3

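Then according to the direct definition

$$R \ltimes S = \begin{array}{c} A \\ \hline 1 \end{array}$$

But according to the indirect definition

$$R \ltimes S = \pi_A \begin{pmatrix} A & B \\ 1 & 2 \\ 1 & 3 \end{pmatrix} = \begin{array}{c} A \\ \hline 1 \\ 1 \end{array}$$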
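The counterexample is easy to check by hand, but a few lines of code make the bag semantics explicit. This is a minimal sketch in which bags are Python lists of dicts; the function names `semijoin_direct` and `semijoin_indirect` are made up for illustration.

```python
def semijoin_direct(R, S, common):
    """Direct definition: keep each tuple of R that has at least one match in S."""
    return [t for t in R if any(all(t[a] == s[a] for a in common) for s in S)]


def semijoin_indirect(R, S, common):
    """Indirect definition: project the natural join onto R's attributes.

    Under bag semantics the projection keeps one copy of t per matching S tuple.
    """
    return [t for t in R for s in S if all(t[a] == s[a] for a in common)]


R = [{"A": 1}]
S = [{"A": 1, "B": 2}, {"A": 1, "B": 3}]

direct = semijoin_direct(R, S, ["A"])
indirect = semijoin_indirect(R, S, ["A"])
print(direct)    # one copy of the tuple with A = 1
print(indirect)  # two copies of the tuple with A = 1
```

The two definitions would coincide under set semantics, where the duplicate produced by the projection is eliminated.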

2. 5 points Consider the indirect definition of $\ltimes$. Assume that the schema of P is P(A, B, C, D)
and the schema of T is T(C, E). Prove the following, using the notation shown in the Appendix.

$$\pi_{A,B}\,\sigma_{D=5 \wedge E=6}(P \bowtie T) = \pi_{A,B}\,[(\pi_{A,B,C}\,\sigma_{D=5} P) \ltimes (\pi_{C}\,\sigma_{E=6} T)]$$

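Solution: Exercising transformation rules from the notes and the book we have

$$\begin{aligned}
& \pi_{A,B}\,\sigma_{D=5 \wedge E=6}(P \bowtie T) \\
={} & \pi_{A,B}\,\sigma_{D=5}\,\sigma_{E=6}(P \bowtie T) \\
={} & \pi_{A,B}\,\sigma_{D=5}(P \bowtie \sigma_{E=6} T) \\
={} & \pi_{A,B}(\sigma_{D=5}(P) \bowtie \sigma_{E=6}(T)) \\
={} & \pi_{A,B}(\pi_{A,B,C}\,\sigma_{D=5}(P) \bowtie \pi_{C}\,\sigma_{E=6}(T)) \\
={} & \pi_{A,B}\,\pi_{A,B,C}(\pi_{A,B,C}\,\sigma_{D=5}(P) \bowtie \pi_{C}\,\sigma_{E=6}(T)) \\
={} & \pi_{A,B}((\pi_{A,B,C}\,\sigma_{D=5} P) \ltimes (\pi_{C}\,\sigma_{E=6} T))
\end{aligned}$$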

4 Query Processing (32 points)


Consider the relations Cust(CID, Name, City), Order(OID, CID, Date), and
LineItem(LID, OID, Product, Amount), where CID is a customer id and is a key for Cust, OID
is an order id and is a key for Order, and LID is a line item id and is a key for LineItem. In
addition, the attribute CID of Order is a foreign key referring to the CID of Cust, that is, for
each CID c of Order there is exactly one tuple of Cust whose CID attribute is c. The OID of
LineItem is a foreign key referring to the OID of Order.
Assume the following statistics, where all numbers, except for product, correspond to millions.

T(Cust) = 1          V(Cust, CID) = 1
                     V(Cust, Name) = 0.5
T(Order) = 20        V(Order, OID) = 20
                     V(Order, CID) = 1
T(LineItem) = 100    V(LineItem, LID) = 100
                     V(LineItem, OID) = 20
                     V(LineItem, Product) = 1000

Consider the following SQL query, which returns the total amount for each product that "Jones"
bought.

SELECT Product, SUM(Amount) AS Total
FROM Cust, Order, LineItem
WHERE Cust.CID = Order.CID AND Order.OID = LineItem.OID AND Cust.Name = 'Jones'
GROUP BY Product
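To see what the query returns, it can be run on a toy instance. The sketch below uses Python's standard `sqlite3` module. All rows are hypothetical illustration data that does not come from the exam. The table name is quoted as "Order" because ORDER is a reserved word in SQL, and an ORDER BY clause is added only to make the printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust(CID INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    CREATE TABLE "Order"(OID INTEGER PRIMARY KEY,
                         CID INTEGER REFERENCES Cust(CID),
                         Date TEXT);
    CREATE TABLE LineItem(LID INTEGER PRIMARY KEY,
                          OID INTEGER REFERENCES "Order"(OID),
                          Product TEXT,
                          Amount INTEGER);

    -- Hypothetical toy data: Jones placed orders 10 and 11, Smith placed order 12.
    INSERT INTO Cust VALUES (1, 'Jones', 'San Diego'), (2, 'Smith', 'La Jolla');
    INSERT INTO "Order" VALUES (10, 1, '2002-02-01'), (11, 1, '2002-02-02'),
                               (12, 2, '2002-02-03');
    INSERT INTO LineItem VALUES (100, 10, 'bolt', 5), (101, 10, 'nut', 2),
                                (102, 11, 'bolt', 7), (103, 12, 'bolt', 9);
""")

rows = conn.execute("""
    SELECT Product, SUM(Amount) AS Total
    FROM Cust, "Order", LineItem
    WHERE Cust.CID = "Order".CID
      AND "Order".OID = LineItem.OID
      AND Cust.Name = 'Jones'
    GROUP BY Product
    ORDER BY Product
""").fetchall()
print(rows)
```

Only the line items of Jones's two orders contribute, so Smith's purchase of 9 bolts does not appear in the totals.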

1. 5 points Write an algebra expression that uses two cartesian products $\times$, exactly one selection
$\sigma$ and the $\mathrm{SUM}_{Product,\,Amount \mapsto Total}$ operator and computes the SQL query.

Solution: Let us denote "Customer" by C, "Order" by O, and "LineItem" by L. Then it is

$$\mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID \,\wedge\, O.OID=L.OID \,\wedge\, C.Name=\text{'Jones'}}((C \times O) \times L)$$

2. 5 points Show the series of transformations that transform the algebra expression of the
previous question into an expression where the cartesian products have been replaced by
joins and the selections are pushed as far down (as early) as possible.
Assume that the equation $\sigma_{R.A=S.A}(R \times S) = R \bowtie_{R.A=S.A} S$ is one of the transformation
rules.

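Solution:

$$\begin{aligned}
& \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID \,\wedge\, O.OID=L.OID \,\wedge\, C.Name=\text{'Jones'}}((C \times O) \times L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID}\,\sigma_{O.OID=L.OID \,\wedge\, C.Name=\text{'Jones'}}((C \times O) \times L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID}\,\sigma_{O.OID=L.OID}\,\sigma_{C.Name=\text{'Jones'}}((C \times O) \times L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID}\,\sigma_{O.OID=L.OID}(\sigma_{C.Name=\text{'Jones'}}(C \times O) \times L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID}\,\sigma_{O.OID=L.OID}((\sigma_{C.Name=\text{'Jones'}}(C) \times O) \times L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ \sigma_{C.CID=O.CID}((\sigma_{C.Name=\text{'Jones'}}(C) \times O) \bowtie_{O.OID=L.OID} L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ (\sigma_{C.CID=O.CID}(\sigma_{C.Name=\text{'Jones'}}(C) \times O) \bowtie_{O.OID=L.OID} L) \\
={} & \mathrm{SUM}_{Product,\,Amount \mapsto Total}\ ((\sigma_{C.Name=\text{'Jones'}}(C) \bowtie_{C.CID=O.CID} O) \bowtie_{O.OID=L.OID} L)
\end{aligned}$$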

3. 5 points Provide an additional (non-trivial) expression where the selections have been pushed
as far down (as early) as possible, but the join order is different. No need to show the series of
transformations that led you to this expression.

Solution:

$$\mathrm{SUM}_{Product,\,Amount \mapsto Total}\ (\sigma_{C.Name=\text{'Jones'}}(C) \bowtie_{C.CID=O.CID} (O \bowtie_{O.OID=L.OID} L))$$

4. 10 points So far you must have provided 2 algebra expressions with different join orders.
For each one of them provide an estimate of the size of its intermediate results. Also, provide
an estimate of the size of the final result. To save time, just put the size numbers next to the
edges in the algebra expressions.

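Solution: Let's start with the first plan

$$\begin{aligned}
T(\sigma_{C.Name=\text{'Jones'}} C) &= \frac{1\text{M}}{0.5\text{M}} = 2 \\
T(\sigma_{C.Name=\text{'Jones'}}(C) \bowtie_{C.CID=O.CID} O) &= \frac{2 \cdot 20\text{M}}{1\text{M}} = 40 \\
T((\sigma_{C.Name=\text{'Jones'}}(C) \bowtie_{C.CID=O.CID} O) \bowtie_{O.OID=L.OID} L) &= \frac{40 \cdot 100\text{M}}{20\text{M}} = 200
\end{aligned}$$

And now the second plan

$$\begin{aligned}
T(\sigma_{C.Name=\text{'Jones'}} C) &= \frac{1\text{M}}{0.5\text{M}} = 2 \\
T(O \bowtie_{O.OID=L.OID} L) &= T(L) = 100\text{M} \\
V(O \bowtie_{O.OID=L.OID} L,\ O.CID) &= 1\text{M} \\
T((\sigma_{C.Name=\text{'Jones'}} C) \bowtie_{C.CID=O.CID} (O \bowtie_{O.OID=L.OID} L)) &= \frac{2 \cdot 100\text{M}}{1\text{M}} = 200
\end{aligned}$$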
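The arithmetic for both plans can be replayed in a few lines. This sketch assumes the usual textbook estimates, T(σ_{A=c} R) = T(R) / V(R, A) and T(R ⋈ S) = T(R) · T(S) / max(V(R, A), V(S, A)), and additionally assumes that the number of distinct join values in an intermediate result is capped by its tuple count. The helper names `select_eq` and `join` are hypothetical.

```python
M = 1_000_000

# Statistics from the exam (all in millions except V(LineItem, Product), unused here).
T = {"Cust": 1 * M, "Order": 20 * M, "LineItem": 100 * M}
V = {
    ("Cust", "CID"): 1 * M,
    ("Cust", "Name"): M // 2,
    ("Order", "OID"): 20 * M,
    ("Order", "CID"): 1 * M,
    ("LineItem", "OID"): 20 * M,
}


def select_eq(t, v):
    """Estimated size of an equality selection: T(R) / V(R, A)."""
    return t / v


def join(t_r, v_r, t_s, v_s):
    """Estimated size of an equijoin: T(R) * T(S) / max(V(R, A), V(S, A))."""
    return t_r * t_s / max(v_r, v_s)


jones = select_eq(T["Cust"], V[("Cust", "Name")])
jones_cids = min(jones, V[("Cust", "CID")])  # assumption: V cannot exceed T

# Plan 1: (sigma(C) join O) join L
plan1_co = join(jones, jones_cids, T["Order"], V[("Order", "CID")])
plan1_oids = min(plan1_co, V[("Order", "OID")])
plan1_final = join(plan1_co, plan1_oids, T["LineItem"], V[("LineItem", "OID")])

# Plan 2: sigma(C) join (O join L)
plan2_ol = join(T["Order"], V[("Order", "OID")], T["LineItem"], V[("LineItem", "OID")])
plan2_final = join(jones, jones_cids, plan2_ol, V[("Order", "CID")])

print(jones, plan1_co, plan1_final)
print(plan2_ol, plan2_final)
```

Both plans end with the same estimate, but the second one materializes an intermediate result of 100M tuples where the first one needs only 40.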

If you reached that point you have got full points. So, what about the $\mathrm{SUM}_{Product,\,Amount \mapsto Total}$?
The book provides upper and lower bound numbers as well as some pretty arbitrary estimates
for the duplicate elimination operation, which is identical to the problem at hand. If
you are a combinatorics freak you will recognize Stirling's numbers as the solution. More in
class!

5. 7 points Assume that you have got indices on all attributes. Consider the following simplification
of the query:

SELECT Product, Amount
FROM Cust, Order, LineItem
WHERE Cust.CID = Order.CID AND Order.OID = LineItem.OID AND Cust.Name = 'Jones'

Apply the INGRES algorithm to get a join order. Indicate which is the small relation hyperedge
that you pick in each step of the algorithm and show the resulting plan.

A What will get you full points in proving equivalence of algebraic
expressions
Assume that the exercise is to prove that

$$\sigma_{p \wedge q}(P \bowtie T) = (\sigma_p P) \bowtie (\sigma_q T)$$

where p refers to attributes of P only and q refers to attributes of T only.


The cleanest proofs are the ones where each step corresponds to application of one of the rules
in the notes. In our example, it goes as follows:

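$$\begin{aligned}
& \sigma_{p \wedge q}(P \bowtie T) \\
={} & \sigma_p\,\sigma_q(P \bowtie T) \\
={} & \sigma_p(P \bowtie \sigma_q T) \\
={} & (\sigma_p P) \bowtie (\sigma_q T)
\end{aligned}$$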
