
Final

The document provides details for a final exam for a database systems course. It includes 52 multiple choice questions and short answer questions covering topics like indexing, query optimization, transactions, and distributed databases. Students have 2 hours to complete the exam, which is closed book and note. Good luck is wished to students taking the exam.

Uploaded by

mimisbhatu

CISC437/637 Database Systems Final Exam

You have from 1:00 to 3:00pm to complete the following questions. The exam is closed-note and closed-book.
Good luck!

Multiple Choice (2 points each; 52 total)

1. If the SQL statement SELECT C1, C2, C3 FROM T4 WHERE C2='Smith' is frequently executed,
which column(s) should be considered for indexing based only on the statement itself?

(a) C1 only
(b) C2 only
(c) C3 only
(d) C1 and C2
(e) C1, C2, and C3
2. If the SQL statement SELECT R1.A, R2.B FROM R1, R2 WHERE R1.K = R2.F AND R2.K = 10
is frequently executed, which indexes will prove most useful?
(a) Index on R1.K and index on R2.K
(b) Index on R1.A and index on R2.B
(c) Index on R1.K and index on R2.F
(d) Composite index on (R2.K, R2.F)
3. What can indexes accomplish (compared to having no index)?

(a) Reduce the total number of disk pages read during query processing.
(b) Reduce the total disk seek time during query processing.
(c) Reduce the total number of disk pages written during updates.
(d) a and b
(e) All of the above

4. Which of the following is true?

(a) Primary indexes are always on primary keys
(b) Secondary indexes are always on foreign keys
(c) Primary indexes are always clustered
(d) a and b
(e) none of the above
5. Which index or storage type is ideal for the query SELECT C1 FROM R1 WHERE C2 BETWEEN
12 AND 20? You may assume most records have values of C2 outside of the range 12–20.

(a) Hash index on C2
(b) Clustered B+-tree index on C2
(c) Unclustered B+-tree index on C2
(d) Sorted file on C2
(e) Composite B+-tree index on C1, C2
6. What if all of the records in problem 5 had C2 values between 12 and 20?
(a) Hash index on C2
(b) Clustered B+-tree index on C2
(c) Unclustered B+-tree index on C2
(d) Sorted file on C2
(e) Composite B+-tree index on C1, C2
7. Consider the SQL query SELECT * FROM R1 WHERE C1=4 AND C2=10 AND C3=11. Suppose
R1 has one million records stored in 100 disk pages, C1 has 10,000 unique values, C2 has 100,000
unique values, and C3 has 1,000 unique values, and values are distributed uniformly. Which relational
algebra expression leads to the most efficient query execution plan?
(a) σ_{C1=4}(σ_{C2=10}(σ_{C3=11}(R1)))
(b) σ_{C3=11}(σ_{C1=4}(σ_{C2=10}(R1)))
(c) σ_{C2=10}(σ_{C1=4}(σ_{C3=11}(R1)))
(d) σ_{C3=11}(σ_{C2=10}(σ_{C1=4}(R1)))
(e) all four are equally efficient
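
As a rough aid for reasoning about the statistics in question 7, the expected number of rows matching each equality predicate can be estimated as (number of rows) / (number of distinct values), using the uniform-distribution assumption the question states. The sketch below is illustrative only; the function name is hypothetical and not part of the exam.

```python
# Estimate how many rows satisfy "column = constant", assuming values are
# uniformly distributed (as question 7 states).
TOTAL_ROWS = 1_000_000
DISTINCT = {"C1": 10_000, "C2": 100_000, "C3": 1_000}

def expected_matches(column: str) -> float:
    """Expected number of rows with column = constant under uniformity."""
    return TOTAL_ROWS / DISTINCT[column]

estimates = {col: expected_matches(col) for col in DISTINCT}
for col, rows in estimates.items():
    print(f"{col}: about {rows:.0f} matching rows")
```

These estimates describe result sizes only. How they translate into I/O cost depends on the access path available, which is what questions 7 and 8 ask about.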
8. Does your answer to number 7 change if there is a clustered B+-tree index on C3?
(a) σ_{C1=4}(σ_{C2=10}(σ_{C3=11}(R1)))
(b) σ_{C3=11}(σ_{C1=4}(σ_{C2=10}(R1)))
(c) σ_{C2=10}(σ_{C1=4}(σ_{C3=11}(R1)))
(d) σ_{C3=11}(σ_{C2=10}(σ_{C1=4}(R1)))
(e) all four are equally efficient
9. The efficiency of the nested-loop algorithm for computing natural or equi-joins can be improved by:

(a) using an index on the table scanned by the outer loop to find needed records
(b) using an index on the table scanned by the inner loop to find needed records
(c) scanning the larger table in the outer loop
(d) scanning the smaller table in the outer loop
(e) a and c
(f) b and d
10. After translating a query into a relational expression, the expression can be optimized by
(a) using equivalence rules to find alternative expressions that might cost less
(b) using equivalence rules to make the expression tree more balanced
(c) using equivalence rules to do the cheaper operations first when possible
(d) a and c
(e) none of the above
11. Heuristics for query optimization include:
(a) performing selection operations as early as possible
(b) performing projections early
(c) restricting multiple join expressions to simple forms such as left-deep trees
(d) a and b
(e) all of the above
12. If you can build an index that supports index-only execution plans, is there any reason not to?
(a) The queries for which index-only plans would be available are rare.
(b) The index would include almost all fields in the table itself.
(c) The expected cost of keeping the index updated outweighs the savings in query processing.
(d) All of the above.
(e) None of the above—I would always support index-only execution plans if possible.

13. In SQL, users get privileges

(a) only from the database administrator
(b) from users with the same privileges and GRANT OPTION privilege
(c) by means of a GRANT statement
(d) a and c
(e) b and c
14. Which of the following is not true about granted privileges in SQL?
(a) If user A grants a privilege to user B and user A subsequently loses that privilege, user B may
still have that privilege.
(b) Privileges can be granted to users before the users are known to the DBMS.
(c) A user can be granted a privilege without giving that user authorization to grant that privilege
to others.
(d) If privileges are revoked from one user, similar privileges may be automatically revoked from other
users as well.

15. What does ACID stand for in the context of DBMS transactions?
(a) Atomicity, Consistency, Isolation, and Durability
(b) Analysis Console for Intrusion Databases
(c) Atomicity, Consistency, Isolation, and Data
(d) Automatic Classification and Interpretation of Data
(e) Advanced Computing Information Database
16. Enforcing serializability in concurrent schedules ensures which two of the four desired properties for
transactions?
(a) Atomicity and consistency
(b) Atomicity and isolation
(c) Atomicity and durability
(d) Consistency and isolation
(e) Consistency and durability
(f) Isolation and durability
17. The recovery manager ensures which two of the four desired properties for transactions?

(a) Atomicity and consistency
(b) Atomicity and isolation
(c) Atomicity and durability
(d) Consistency and isolation
(e) Consistency and durability
(f) Isolation and durability
18. Suppose a database is read-only—no transactions change any data in the database. If serializability
must be supported, which of the following is true?

(a) No locking is necessary.
(b) Only read locks are necessary and they need to be held until end of transaction.
(c) Only read locks are necessary, but they can be released as soon as the read is complete.
(d) Both read and write locks are necessary and locking must be done in two phases.
(e) None of the above.

19. Checkpointing is a technique that can reduce recovery time after a crash. Which of the following is
true?
(a) When recovering, the log only needs to be scanned back to the most recent checkpoint.
(b) Checkpointing is automatically performed after every transaction commit.
(c) Checkpoints are saved after every update to the database.
(d) a and c
(e) a, b, and c
20. Which of the following is true about updated pages in the buffer pool?

(a) Updated pages must be written immediately after the update.
(b) Updated pages must be written after a transaction commits but before the transaction log is
written to disk.
(c) Updated pages must be written after a transaction commits but after the transaction log is written
to disk.
(d) An updated page must be written when it is swapped out of the buffer pool.
21. Which of the following is true of a distributed DBMS?
(a) There is always one central server that distributes processing to other servers.
(b) Table columns may be stored on servers in different physical locations.
(c) Data can always be redistributed for optimal query processing.
(d) All of the above.
22. What is the purpose of classification algorithms?
(a) Determine the most likely value of an unknown categorical field of a record.
(b) Determine the most likely value of an unknown numerical field of a record.
(c) Use data in which all field values are known to produce classification rules to be applied to new
data.
(d) a and c
(e) b and c
(f) a, b, and c
23. Which property is guaranteed by the two-phase locking protocol?
(a) serial schedules
(b) serializable schedules
(c) recoverable schedules
(d) avoiding cascading rollback
24. What information can an inverted index contain?
(a) the presence of words in documents
(b) the number of times words appear in documents
(c) the relative positions at which words appear in documents
(d) the number of documents a word appears in
(e) some of the above
(f) all of the above
25. What kind of conflict (if any) is present in the transaction schedule below?

r1(A); w1(A); r2(A); w2(A); r2(B); w2(B); commit2; r1(B); w1(B); commit1

(a) read-write conflict
(b) write-read conflict
(c) write-write conflict
(d) no conflict
26. With regard to pages in the buffer pool, which pair of buffer management policies gives the most
efficient operation of the recovery manager?
(a) stealing and forcing
(b) stealing and no forcing
(c) no forcing and stealing
(d) no forcing and no stealing
Short Answer (48 points + 12 extra credit) Answer the following questions.

1. Query processing (8 points) Suppose we have the following relational schema:

Students(sid:integer, name:string, gpa:real)
Courses(course:string, credits:integer)
Enrolled(sid:integer, course:string, semester:string)

Also suppose there are hash indexes on Students.sid, Enrolled.sid, and Enrolled.course.
Describe how you would compute each operation in the query execution tree below. For each operation,
provide an algorithm name or write “on the fly”. Indicate whether the results of the operation will be
materialized to disk or pipelined to the next operation. Be sure to indicate where indexes are used.
2. Query optimization (8 points) Suppose a user wants to obtain a list of students with GPAs under 3
that have taken 4-credit hour courses at any point in their academic career. The following relational
algebra expressions are equivalent for the Students/Courses/Enrolled schema defined in problem 1.
Which one is most likely to provide the most efficient query processing time? Justify your answer by
explaining why none of the other expressions could be more efficient than the one you choose. (You do
not have to compute total execution costs, though you may if you wish. Assume all joins are natural
joins.)
• π_{name,course,semester}(σ_{credits=4 ∧ gpa<3}(Students ⋈ (Enrolled ⋈ Courses)))
• π_{name,course,semester}((σ_{credits=4}(Courses) ⋈ Enrolled) ⋈ σ_{gpa<3}(Students))
• π_{name,course,semester}(σ_{gpa<3}(Students) ⋈ (σ_{credits=4}(Courses) ⋈ Enrolled))

3. Recovery management (8 points) Transactions T1 and T2 access data objects A, B, C, D, E. Consider
the following sequence of log records for UNDO logging:

<START T1>
<T1,A,10>
<START T2>
<T2,B,20>
<T1,C,30>
<T2,D,40>
<COMMIT T2>
<T1,E,50>
<COMMIT T1>

Suppose the last record that appears on disk at the time of a crash is <COMMIT U>. What will the
recovery manager do to recover from this crash in terms of updates to the disk and to the log?
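
The general shape of recovery under UNDO logging can be sketched as a backward scan of the log that restores old values for every transaction without a commit record. The sketch below rests on several assumptions that are not stated in the exam: each update record <T,X,v> is taken to hold the old value v of X (the usual UNDO-logging convention), an abort record is appended for each undone transaction, and the crash is assumed to occur right after <COMMIT T2> reaches disk. That crash point is chosen purely for illustration; the question itself names a different record. The tuple encoding and function name are hypothetical.

```python
# Minimal sketch of recovery under UNDO logging.
# Update records are ("UPDATE", txn, object, old_value).
LOG = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 10),
    ("START", "T2"),
    ("UPDATE", "T2", "B", 20),
    ("UPDATE", "T1", "C", 30),
    ("UPDATE", "T2", "D", 40),
    ("COMMIT", "T2"),
    ("UPDATE", "T1", "E", 50),
    ("COMMIT", "T1"),
]

def undo_recover(log_on_disk):
    """Return (restores, abort_records) for the log prefix that reached disk."""
    committed = {rec[1] for rec in log_on_disk if rec[0] == "COMMIT"}
    started = [rec[1] for rec in log_on_disk if rec[0] == "START"]
    restores = []
    # Scan backward so that the oldest logged value of an object wins.
    for rec in reversed(log_on_disk):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            restores.append((rec[2], rec[3]))  # write old value back to disk
    aborts = [("ABORT", txn) for txn in started if txn not in committed]
    return restores, aborts

# ASSUMPTION (illustration only): the last record on disk is <COMMIT T2>.
restores, aborts = undo_recover(LOG[:7])
print(restores)  # objects and the old values written back, in undo order
print(aborts)    # records appended to the log after the undo completes
```

Passing a different prefix of LOG to undo_recover shows how the outcome depends on which record was the last to reach disk.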
4. Serializability (8 points) Suppose the transaction manager produces the following schedule for
transactions T1, T2, T3 accessing data objects A, B, C:

r1(A); r2(A); r3(B); w1(A); r2(C); r2(B); w2(B); w1(C)

(ri(O) indicates a read by transaction Ti on data object O; wi(O) indicates a write by transaction Ti
on data object O.)
(a) Identify and list any conflicts between transactions.

(b) Is this schedule serializable? If so, give the equivalent serial schedule. If not, explain why not.
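
One common way to approach both parts is to build a precedence graph: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj (same object, different transactions, at least one write), then test the graph for cycles. The following is a minimal sketch of that technique applied to the schedule above; the function names are hypothetical, and it tests conflict serializability only.

```python
import re
from itertools import combinations

SCHEDULE = "r1(A); r2(A); r3(B); w1(A); r2(C); r2(B); w2(B); w1(C)"

def parse(schedule):
    """Turn 'r1(A); w2(B)' into [('r', 1, 'A'), ('w', 2, 'B')]."""
    found = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    return [(action, int(txn), obj) for action, txn, obj in found]

def precedence_edges(ops):
    """Edge (Ti, Tj) for each conflicting pair where Ti's operation is first."""
    edges = set()
    for (a1, t1, o1), (a2, t2, o2) in combinations(ops, 2):
        if t1 != t2 and o1 == o2 and "w" in (a1, a2):
            edges.add((t1, t2))
    return edges

def serial_order(txns, edges):
    """Topological order of the precedence graph, or None if it has a cycle."""
    remaining, order = set(txns), []
    while remaining:
        # Transactions with no incoming edge from a remaining transaction.
        free = sorted(t for t in remaining
                      if not any(u in remaining and v == t for u, v in edges))
        if not free:
            return None  # cycle: not conflict-serializable
        order.append(free[0])
        remaining.remove(free[0])
    return order

ops = parse(SCHEDULE)
edges = precedence_edges(ops)
order = serial_order({txn for _, txn, _ in ops}, edges)
print(sorted(edges), order)
```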

5. Concurrency control (8 points) Consider the following transactions:

T1: r1(X); w1(Y);
T2: r2(Y); w2(X);

For each of the following schedules, indicate whether it can be generated by 2PL, strict 2PL, both, or
neither by circling your answer from the choices below.
• ℓ1(X); r1(X); ℓ1(Y); u1(X); ℓ2(X); w1(Y); u1(Y); ℓ2(Y); r2(Y); w2(X); u2(Y); u2(X)
2PL / S-2PL / both / neither
• ℓ2(Y); r2(Y); u2(Y); ℓ2(X); w2(X); u2(X); ℓ1(X); r1(X); ℓ1(Y); w1(Y); u2(X); u2(Y)
2PL / S-2PL / both / neither
• ℓ1(X); ℓ1(Y); r1(X); w1(Y); u1(X); u1(Y); ℓ2(Y); ℓ2(X); r2(Y); w2(X); u2(Y); u2(X)
2PL / S-2PL / both / neither
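
The defining rule of two-phase locking is that a transaction never acquires a lock after it has released one. A minimal sketch of a checker for that rule follows. It assumes lock and unlock actions are written as l and u followed by the transaction number, and it uses two hypothetical schedules rather than the ones above. The function name is an invention for illustration.

```python
import re

def is_two_phase(schedule: str) -> bool:
    """True if no transaction acquires a lock after releasing one."""
    shrinking = set()  # transactions that have already unlocked something
    for action, txn in re.findall(r"([a-z])(\d+)\(", schedule):
        if action == "u":
            shrinking.add(txn)
        elif action == "l" and txn in shrinking:
            return False  # lock requested during the shrinking phase
    return True

# Hypothetical schedules, not taken from the exam.
ok = "l1(X); r1(X); l1(Y); u1(X); w1(Y); u1(Y)"
bad = "l1(X); r1(X); u1(X); l1(Y); w1(Y); u1(Y)"
print(is_two_phase(ok), is_two_phase(bad))
```

The sketch checks only the two-phase property. Deciding whether a schedule is strict 2PL also requires knowing where each transaction commits, and a full check would verify that no two transactions hold the same lock at once; neither is modeled here.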
6. Parallel query processing (8 points) Consider the following tables:

ssn          name           firmName                   firmLoc
111-11-1111  Bob Loblaw     Dewey, Cheatham, and Howe  Boston
222-22-2222  Ally McBeal    Payne and Feares           Los Angeles
222-22-3333  Maury Levy     Baker and Launder          Baltimore
333-44-5555  Saul Goodman   Recht and Greef            Albuquerque
555-55-6666  Atticus Finch  Baker and Launder          Baltimore
(a) Lawyers(ssn:string, name:string, firmName:string, firmLoc:string)

firmName                   firmLoc        employees
Dewey, Cheatham, and Howe  Boston         72
Dewey, Cheatham, and Howe  San Francisco  95
Payne and Feares           Los Angeles    55
Recht and Greef            Albuquerque    120
Pope and Gentile           Milwaukee      100
Boring and Leach           Los Angeles    66
(b) Firms(firmName:string, firmLoc:string, employees:integer)

Suppose there is a parallel database system operating on two loosely coupled (shared-nothing)
processors. Processor P1's disk stores rows 1–3 of Lawyers and rows 4–6 of Firms. Processor P2's disk
stores rows 4–5 of Lawyers and rows 1–3 of Firms. Describe the steps that need to be taken to compute a
parallel join of the two tables on firmName and firmLoc. Be sure your description allows both
processors to be working simultaneously (even though in this particular case it might be more efficient
for one processor to do all the work).
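
One standard approach to this kind of problem is a partitioned (hash-redistributed) join: each processor hashes the join key of its local rows, ships every row to the processor that owns that hash bucket, and then each processor joins what it received. The sketch below simulates this in a single Python process, so it shows the data flow rather than real parallelism. The choice of CRC32 as the partitioning hash, the inbox structure, and all function names are assumptions made for illustration.

```python
import zlib
from collections import defaultdict

LAWYERS = [
    ("111-11-1111", "Bob Loblaw", "Dewey, Cheatham, and Howe", "Boston"),
    ("222-22-2222", "Ally McBeal", "Payne and Feares", "Los Angeles"),
    ("222-22-3333", "Maury Levy", "Baker and Launder", "Baltimore"),
    ("333-44-5555", "Saul Goodman", "Recht and Greef", "Albuquerque"),
    ("555-55-6666", "Atticus Finch", "Baker and Launder", "Baltimore"),
]
FIRMS = [
    ("Dewey, Cheatham, and Howe", "Boston", 72),
    ("Dewey, Cheatham, and Howe", "San Francisco", 95),
    ("Payne and Feares", "Los Angeles", 55),
    ("Recht and Greef", "Albuquerque", 120),
    ("Pope and Gentile", "Milwaukee", 100),
    ("Boring and Leach", "Los Angeles", 66),
]
# Initial placement as described in the question (index 0 = P1, 1 = P2).
disks = [
    {"lawyers": LAWYERS[0:3], "firms": FIRMS[3:6]},
    {"lawyers": LAWYERS[3:5], "firms": FIRMS[0:3]},
]

def target(firm_name, firm_loc, n=2):
    """Deterministic hash partitioning on the join key (firmName, firmLoc)."""
    return zlib.crc32(f"{firm_name}|{firm_loc}".encode()) % n

# Step 1: each processor routes its local rows to the owner of the key.
inbox = [defaultdict(list) for _ in disks]
for disk in disks:
    for row in disk["lawyers"]:
        inbox[target(row[2], row[3])]["lawyers"].append(row)
    for row in disk["firms"]:
        inbox[target(row[0], row[1])]["firms"].append(row)

# Step 2: each processor joins only the rows it received (local hash join).
def local_join(box):
    firms = {(f[0], f[1]): f[2] for f in box["firms"]}
    return [(*law, firms[(law[2], law[3])])
            for law in box["lawyers"] if (law[2], law[3]) in firms]

# Step 3: the partial results are unioned.
result = sorted(row for box in inbox for row in local_join(box))
for row in result:
    print(row)
```

Because rows with equal join keys always hash to the same processor, the union of the local joins equals the full join regardless of how the hash happens to split the keys.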
7. XML (6 points; extra credit) Write out a well-formed, valid XML file that uses the DTD below.
Include at least 10 elements and at least one attribute.

<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
<!ATTLIST SONG LENGTH CDATA #IMPLIED>

8. Association rules (6 points; extra credit) The table below provides lists of items that a customer
purchased together.

Sale Items
t1 Bread, Jelly, PeanutButter
t2 Bread, PeanutButter
t3 Bread, Milk, PeanutButter
t4 Beer, Bread
t5 Beer, Milk

• Provide two association rules with support greater than or equal to 60%. (Hint: X → Y and
Y → X are two different rules.)

• Provide four association rules with confidence greater than or equal to 50%.
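
For reference, the support of a rule X → Y is the fraction of sales containing every item in X ∪ Y, and its confidence is support(X ∪ Y) / support(X). The sketch below computes both measures over the table above for single-item rules only, a restriction made to keep it short. Exact fractions are used to avoid floating-point rounding at the 60% threshold. The function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

SALES = {
    "t1": {"Bread", "Jelly", "PeanutButter"},
    "t2": {"Bread", "PeanutButter"},
    "t3": {"Bread", "Milk", "PeanutButter"},
    "t4": {"Beer", "Bread"},
    "t5": {"Beer", "Milk"},
}

def support(items):
    """Fraction of sales that contain every item in `items`."""
    hits = sum(items <= basket for basket in SALES.values())
    return Fraction(hits, len(SALES))

def confidence(lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

all_items = sorted(set().union(*SALES.values()))
# Single-item rules X -> Y only; maps (X, Y) to (support, confidence).
rules = {
    (x, y): (support({x, y}), confidence({x}, {y}))
    for x, y in permutations(all_items, 2)
}
frequent = {rule: stats for rule, stats in rules.items()
            if stats[0] >= Fraction(60, 100)}
for (x, y), (sup, conf) in sorted(frequent.items()):
    print(f"{x} -> {y}: support {sup}, confidence {conf}")
```

Filtering rules on the confidence value instead of the support value gives candidates for the second bullet.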
