Final
You have from 1:00 to 3:00pm to complete the following questions. The exam is closed-note and closed-book.
Good luck!
1. If the SQL statement SELECT C1, C2, C3 FROM T4 WHERE C2='Smith' is frequently executed,
which column(s) should be considered for indexing based only on the statement itself?
(a) C1 only
(b) C2 only
(c) C3 only
(d) C1 and C2
(e) C1, C2, and C3
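One way to check an indexing decision like this against a real optimizer is to ask an engine for its plan. The sketch below uses SQLite through Python's standard `sqlite3` module. The column types and the index name `idx_t4_c2` are assumptions made for illustration, and other DBMSs report plans in their own formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T4 (C1 INTEGER, C2 TEXT, C3 TEXT)")
# Index only the column that appears in the WHERE predicate.
conn.execute("CREATE INDEX idx_t4_c2 ON T4 (C2)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT C1, C2, C3 FROM T4 WHERE C2 = 'Smith'"
).fetchall()

# The last column of each plan row is the human-readable description of a step.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```

The plan should contain a SEARCH step that uses `idx_t4_c2` for the equality test on C2. The projected columns C1 and C3 do not help locate the matching rows.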
2. If the SQL statement SELECT R1.A, R2.B FROM R1, R2 WHERE R1.K = R2.F AND R2.K = 10
is frequently executed, which indexes will prove most useful?
(a) Index on R1.K and index on R2.K
(b) Index on R1.A and index on R2.B
(c) Index on R1.K and index on R2.F
(d) Composite index on (R2.K, R2.F)
3. What can indexes accomplish (compared to having no index)?
(a) Reduce the total number of disk pages read during query processing.
(b) Reduce the total disk seek time during query processing.
(c) Reduce the total number of disk pages written during updates.
(d) a and b
(e) All of the above
(a) using an index on the table scanned by the outer loop to find needed records
(b) using an index on the table scanned by the inner loop to find needed records
(c) scanning the larger table in the outer loop
(d) scanning the smaller table in the outer loop
(e) a and c
(f) b and d
10. After translating a query into a relational expression, the expression can be optimized by
(a) using equivalence rules to find alternative expressions that might cost less
(b) using equivalence rules to make the expression tree more balanced
(c) using equivalence rules to do the cheaper operations first when possible
(d) a and c
(e) none of the above
11. Heuristics for query optimization include:
(a) performing selection operations as early as possible
(b) performing projections early
(c) restricting multiple join expressions to simple forms such as left-deep trees
(d) a and b
(e) all of the above
12. If you can build an index that supports index-only execution plans, is there any reason not to?
(a) The queries for which index-only plans would be available are rare.
(b) The index would include almost all fields in the table itself.
(c) The expected cost of keeping the index updated outweighs the savings in query processing.
(d) All of the above.
(e) None of the above—I would always support index-only execution plans if possible.
15. What does ACID stand for in the context of DBMS transactions?
(a) Atomicity, Consistency, Isolation, and Durability
(b) Analysis Console for Intrusion Databases
(c) Atomicity, Consistency, Isolation, and Data
(d) Automatic Classification and Interpretation of Data
(e) Advanced Computing Information Database
16. Enforcing serializability in concurrent schedules ensures which two of the four desired properties for
transactions?
(a) Atomicity and consistency
(b) Atomicity and isolation
(c) Atomicity and durability
(d) Consistency and isolation
(e) Consistency and durability
(f) Isolation and durability
17. The recovery manager ensures which two of the four desired properties for transactions?
19. Checkpointing is a technique that can reduce recovery time after a crash. Which of the following is
true?
(a) When recovering, the log only needs to be scanned back to the most recent checkpoint.
(b) Checkpointing is automatically performed after every transaction commit.
(c) Checkpoints are saved after every update to the database.
(d) a and c
(e) a, b, and c
20. Which of the following is true about updated pages in the buffer pool?
r1(A); w1(A); r2(A); w2(A); r2(B); w2(B); commit2; r1(B); w1(B); commit1
Also suppose there are hash indexes on Students.sid, Enrolled.sid, and Enrolled.course.
Describe how you would compute each operation in the query execution tree below. For each operation,
provide an algorithm name or write “on the fly”. Indicate whether the results of the operation will be
materialized to disk or pipelined to the next operation. Be sure to indicate where indexes are used.
2. Query optimization (8 points) Suppose a user wants to obtain a list of students with GPAs under 3
that have taken 4-credit hour courses at any point in their academic career. The following relational
algebra expressions are equivalent for the Students/Courses/Enrolled schema defined in problem 1.
Which one is most likely to provide the most efficient query processing time? Justify your answer by
explaining why none of the other expressions could be more efficient than the one you choose. (You do
not have to compute total execution costs, though you may if you wish. Assume all joins are natural
joins.)
• $\pi_{name,course,semester}(\sigma_{credits=4 \land gpa<3}(Students \bowtie (Enrolled \bowtie Courses)))$
• $\pi_{name,course,semester}((\sigma_{credits=4}(Courses) \bowtie Enrolled) \bowtie \sigma_{gpa<3}(Students))$
• $\pi_{name,course,semester}(\sigma_{gpa<3}(Students) \bowtie (\sigma_{credits=4}(Courses) \bowtie Enrolled))$
<START T1>
<T1,A,10>
<START T2>
<T2,B,20>
<T1,C,30>
<T2,D,40>
<COMMIT T2>
<T1,E,50>
<COMMIT T1>
Suppose the last record that appears on disk at the time of a crash is <COMMIT U>. What will the
recovery manager do to recover from this crash in terms of updates to the disk and to the log?
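A short sketch can make the backward scan of such a log concrete. It assumes undo logging, where <T,X,v> records the old value v of X. The part of the exam that fixed the logging scheme is not shown here, and redo or undo/redo logging would need a different procedure. The crash point used in the usage example (just after <COMMIT T2> reaches disk) and the initial disk contents are hypothetical; they are not the <COMMIT U> case the question asks about.

```python
def parse(record):
    """Split a log record such as '<T1,A,10>' or '<COMMIT T2>'."""
    body = record.strip()[1:-1]
    if "," in body:
        txn, obj, value = (part.strip() for part in body.split(","))
        return "UPDATE", txn, obj, int(value)
    kind, txn = body.split()
    return kind, txn, None, None


def undo_recover(log, disk):
    """Undo-logging recovery. ASSUMPTION: <T,X,v> holds the OLD value v of X."""
    finished = set()    # transactions that have a COMMIT or ABORT record
    incomplete = []     # transactions that must be rolled back
    for record in reversed(log):            # scan from the end of the log
        kind, txn, obj, old = parse(record)
        if kind in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif kind == "UPDATE" and txn not in finished:
            disk[obj] = old                  # restore the pre-update value
        elif kind == "START" and txn not in finished:
            incomplete.append(txn)
    # Once the restored values are flushed, record each rollback in the log.
    return disk, log + [f"<ABORT {txn}>" for txn in incomplete]


# Hypothetical crash right after <COMMIT T2> reaches disk.
log = ["<START T1>", "<T1,A,10>", "<START T2>", "<T2,B,20>",
       "<T1,C,30>", "<T2,D,40>", "<COMMIT T2>"]
disk, new_log = undo_recover(log, {"A": 0, "B": 0, "C": 0, "D": 0})
print(disk, new_log[-1])
```

Here T1's changes to A and C are undone, T2's committed changes are left alone, and <ABORT T1> is appended.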
4. Serializability (8 points) Suppose the transaction manager produces the following schedule for
transactions T1, T2, and T3, which access data objects A, B, and C:
(ri(O) indicates a read by transaction Ti on data object O; wi(O) indicates a write by transaction Ti
on data object O.)
(a) Identify and list any conflicts between transactions.
(b) Is this schedule serializable? If so, give the equivalent serial schedule. If not, explain why not.
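Conflict serializability can be tested with a precedence graph. Add an edge Ti → Tj whenever an operation of Ti comes before a conflicting operation of Tj (same object, different transactions, at least one write). The schedule is conflict-serializable exactly when this graph has no cycle, and any topological order then gives an equivalent serial schedule. Below is a minimal Python sketch. The schedule from the question is not reproduced here, so the example schedules are hypothetical. The `graphlib` module requires Python 3.9 or later.

```python
import re
from graphlib import CycleError, TopologicalSorter

# One step such as "r1(A)" or "w2(B)"; other tokens (e.g. commits) are ignored.
STEP = re.compile(r"([rw])(\d+)\((\w+)\)")


def parse(schedule):
    return [(op, int(t), obj) for op, t, obj in STEP.findall(schedule.replace(" ", ""))]


def conflicts(schedule):
    """List (Ti, Tj, object) for every conflicting pair in which Ti's step comes first."""
    ops = parse(schedule)
    found = []
    for i, (op1, t1, obj1) in enumerate(ops):
        for op2, t2, obj2 in ops[i + 1:]:
            if t1 != t2 and obj1 == obj2 and "w" in (op1, op2):
                found.append((t1, t2, obj1))
    return found


def equivalent_serial_order(schedule):
    """Return a conflict-equivalent serial order, or None if the graph has a cycle."""
    preds = {t: set() for _, t, _ in parse(schedule)}
    for first, second, _ in conflicts(schedule):
        preds[second].add(first)            # edge first -> second
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


print(equivalent_serial_order("r1(A); w2(A); r2(B); w3(B)"))
print(equivalent_serial_order("r1(A); w2(A); r2(B); w1(B)"))
```

In the second hypothetical schedule, A forces T1 before T2 while B forces T2 before T1, so no serial order exists.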
T1: r1(X); w1(Y);
T2: r2(Y); w2(X);
For each of the following schedules, indicate whether it can be generated by 2PL, strict 2PL, both, or
neither by circling your answer from the choices below.
• ℓ1(X); r1(X); ℓ1(Y); u1(X); ℓ2(X); w1(Y); u1(Y); ℓ2(Y); r2(Y); w2(X); u2(Y); u2(X)
2PL / S-2PL / both / neither
• ℓ2(Y); r2(Y); u2(Y); ℓ2(X); w2(X); u2(X); ℓ1(X); r1(X); ℓ1(Y); w1(Y); u2(X); u2(Y)
2PL / S-2PL / both / neither
• ℓ1(X); ℓ1(Y); r1(X); w1(Y); u1(X); u1(Y); ℓ2(Y); ℓ2(X); r2(Y); w2(X); u2(Y); u2(X)
2PL / S-2PL / both / neither
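The locking rules can be checked mechanically. The sketch below writes the lock action ℓ as `l`. It assumes that all locks are exclusive and, because these schedules contain no commit steps, that a transaction is finished after its last read or write. Under that assumption, strict 2PL means no transaction unlocks anything before its last read or write. These are simplifying assumptions, not the only way to define strictness.

```python
import re

# One step: l = lock, u = unlock, r = read, w = write; e.g. "l1(X)" or "w2(Y)".
STEP = re.compile(r"([lurw])(\d+)\((\w+)\)")


def classify(schedule):
    """Return 'both' (2PL and strict 2PL), '2PL', or 'neither'."""
    steps = [(op, int(t), obj) for op, t, obj in STEP.findall(schedule.replace(" ", ""))]
    # Index of each transaction's last read/write (later entries overwrite earlier ones).
    last_rw = {t: i for i, (op, t, _) in enumerate(steps) if op in "rw"}
    holder = {}        # object -> transaction currently holding its (exclusive) lock
    released = set()   # transactions that have already unlocked something
    two_phase = True
    strict = True
    for i, (op, t, obj) in enumerate(steps):
        if op == "l":
            if obj in holder:
                raise ValueError(f"step {i}: {obj} is already locked by T{holder[obj]}")
            if t in released:
                two_phase = False          # acquired a lock after releasing one
            holder[obj] = t
        elif op == "u":
            if holder.get(obj) != t:
                raise ValueError(f"step {i}: T{t} unlocks {obj} without holding it")
            del holder[obj]
            released.add(t)
            if i < last_rw.get(t, -1):
                strict = False             # released before its last read/write
        elif holder.get(obj) != t:
            raise ValueError(f"step {i}: T{t} accesses {obj} without a lock")
    if two_phase and strict:
        return "both"
    return "2PL" if two_phase else "neither"


print(classify("l1(X); l1(Y); r1(X); w1(Y); u1(X); u1(Y); "
               "l2(Y); l2(X); r2(Y); w2(X); u2(Y); u2(X)"))
```

Because strict 2PL is treated here as a special case of 2PL, the function never returns a strict-only answer. It raises `ValueError` for any schedule in which a transaction accesses or unlocks an object it does not hold.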
6. Parallel query processing (8 points) Consider the following tables:
Suppose there is a parallel database system operating on two loosely-coupled (shared-nothing)
processors. Processor P1's disk stores rows 1–3 of Lawyers and rows 4–6 of Firms. Processor P2's disk
stores rows 4–5 of Lawyers and rows 1–3 of Firms. Describe the steps that need to be taken to compute a
parallel join of the two tables on firmName and firmLoc. Be sure your description allows both
processors to work simultaneously (even though in this particular case it might be more efficient for
one processor to do all the work).
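A common way to keep both processors busy is a partitioned hash join. Each processor hashes its local rows on (firmName, firmLoc) and ships every row to the processor that owns that hash bucket. Each processor then joins the rows it received, and the union of the local results is the full join. The simulation below is a sketch of that technique, not the only valid answer. The row contents, the `lawyer` column, and the use of CRC-32 as the partitioning hash are hypothetical, and the two "processors" run one after the other inside a single Python process.

```python
import zlib
from collections import defaultdict

N_PROCS = 2  # P1 and P2 from the question, numbered 0 and 1 here


def home(row):
    """Map a row to the processor that owns its (firmName, firmLoc) join key."""
    key = f"{row['firmName']}|{row['firmLoc']}"
    return zlib.crc32(key.encode()) % N_PROCS   # deterministic across runs


def parallel_join(lawyers_by_proc, firms_by_proc):
    # Step 1 (both processors at once): hash-partition local rows and
    # "send" each row to the processor that owns its join key.
    inbox = [{"lawyers": [], "firms": []} for _ in range(N_PROCS)]
    for rows in lawyers_by_proc:
        for row in rows:
            inbox[home(row)]["lawyers"].append(row)
    for rows in firms_by_proc:
        for row in rows:
            inbox[home(row)]["firms"].append(row)
    # Step 2 (both processors at once): local hash join on the received rows.
    output = []
    for box in inbox:
        index = defaultdict(list)
        for firm in box["firms"]:
            index[(firm["firmName"], firm["firmLoc"])].append(firm)
        for lawyer in box["lawyers"]:
            for firm in index[(lawyer["firmName"], lawyer["firmLoc"])]:
                output.append({**lawyer, **firm})
    # Step 3: the union of the per-processor outputs is the complete join.
    return output


# Hypothetical rows; the real Lawyers and Firms tables are not shown here.
p1_lawyers = [{"lawyer": "Ann", "firmName": "Acme", "firmLoc": "NYC"},
              {"lawyer": "Bo", "firmName": "Acme", "firmLoc": "LA"}]
p2_lawyers = [{"lawyer": "Cy", "firmName": "Zeta", "firmLoc": "NYC"}]
p1_firms = [{"firmName": "Acme", "firmLoc": "NYC"}]
p2_firms = [{"firmName": "Acme", "firmLoc": "LA"},
            {"firmName": "Zeta", "firmLoc": "SF"}]
joined = parallel_join([p1_lawyers, p2_lawyers], [p1_firms, p2_firms])
print(sorted(row["lawyer"] for row in joined))
```

Rows with equal join keys always hash to the same processor, so no matching pair is split across processors regardless of which bucket each key lands in.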
7. XML (6 points; extra credit) Write out a well-formed, valid XML file that uses the DTD below.
Include at least 10 elements and at least one attribute.
8. Association rules (6 points; extra credit) The table below provides lists of items that a customer
purchased together.
Sale   Items
t1     Bread, Jelly, PeanutButter
t2     Bread, PeanutButter
t3     Bread, Milk, PeanutButter
t4     Beer, Bread
t5     Beer, Milk
• Provide two association rules with support greater than or equal to 60%. (Hint: X → Y and
Y → X are two different rules.)
• Provide four association rules with confidence greater than or equal to 50%.
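Hand computations for these five sales can be checked by counting. The support of X → Y is the fraction of sales that contain X ∪ Y, and confidence(X → Y) = support(X ∪ Y) / support(X). The sketch below uses exact fractions and enumerates only single-item rules. Rules with larger itemsets would apply the same two functions to larger sets.

```python
from fractions import Fraction
from itertools import permutations

SALES = {
    "t1": {"Bread", "Jelly", "PeanutButter"},
    "t2": {"Bread", "PeanutButter"},
    "t3": {"Bread", "Milk", "PeanutButter"},
    "t4": {"Beer", "Bread"},
    "t5": {"Beer", "Milk"},
}


def support(items):
    """Fraction of sales that contain every item in `items`."""
    hits = sum(1 for basket in SALES.values() if items <= basket)
    return Fraction(hits, len(SALES))


def confidence(lhs, rhs):
    """conf(lhs -> rhs) = support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)


def single_item_rules(min_support=0, min_confidence=0):
    """All rules x -> y (one item on each side) meeting both thresholds."""
    items = sorted(set().union(*SALES.values()))
    rules = []
    for x, y in permutations(items, 2):
        s, c = support({x, y}), confidence({x}, {y})
        if s >= min_support and c >= min_confidence:
            rules.append((x, y, s, c))
    return rules


for x, y, s, c in single_item_rules(min_support=Fraction(3, 5)):
    print(f"{x} -> {y}: support {s}, confidence {c}")
```

For example, `confidence({"Bread"}, {"PeanutButter"})` is 3/4, because three of the four sales that include Bread also include PeanutButter.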