Chapter 16 Query Optimization: Practice Exercises
16.1 Download the university database schema and the large university dataset from
dbbook.com. Create the university schema on your favorite database, and load
the large university dataset. Use the explain feature described in Note 16.1 on
page 746 to view the plan chosen by the database in each of the following cases.
a. Write a query with an equality condition on student.name (which does
not have an index), and view the plan chosen.
b. Create an index on the attribute student.name, and view the plan chosen
for the above query.
c. Create simple queries that join two relations, and others that join three
relations, and view the plans chosen.
d. Create a query that computes an aggregate with grouping, and view the
plan chosen.
e. Create an SQL query whose chosen plan uses a semijoin operation.
f. Create an SQL query that uses a not in clause, with a subquery using
aggregation. Observe what plan is chosen.
g. Create a query for which the chosen plan uses correlated evaluation (the
way correlated evaluation is represented varies by database, but most
databases would show a filter or a project operator with a subplan or
subquery).
h. Create an SQL update query that updates a single row in a relation. View
the plan chosen for the update query.
i. Create an SQL update query that updates a large number of rows in a
relation, using a subquery to compute the new value. View the plan chosen
for the update query.
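As a sketch of parts (a) and (b), SQLite (through Python's standard `sqlite3` module) can stand in for "your favorite database", with its `EXPLAIN QUERY PLAN` statement playing the role of the explain feature. The `student` columns follow the university schema, but the rows are synthetic and the index name `student_name_idx` is hypothetical; the exact plan text varies across SQLite versions.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the university schema's student relation.
conn.execute("""CREATE TABLE student (
    ID varchar(5) PRIMARY KEY,
    name varchar(20) NOT NULL,
    dept_name varchar(20),
    tot_cred numeric(3,0))""")
# Hypothetical synthetic rows in place of the large university dataset.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(f"{i:05d}", f"name{i % 500}", "Comp. Sci.", i % 130) for i in range(2000)],
)

query = "SELECT * FROM student WHERE name = 'name42'"
plan_before = plan_details(conn, query)  # part (a): no index on name
conn.execute("CREATE INDEX student_name_idx ON student(name)")
plan_after = plan_details(conn, query)   # part (b): index on name available
print(plan_before)
print(plan_after)
```

On SQLite the first plan is typically a `SCAN` of `student` and the second a `SEARCH` using `student_name_idx`; other systems print different operator names for the same ideas.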
16.2 Show that the following equivalences hold. Explain how you can apply them
to improve the efficiency of certain queries:
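a. $E_1 \bowtie_\theta (E_2 - E_3) \equiv (E_1 \bowtie_\theta E_2) - (E_1 \bowtie_\theta E_3)$.
b. $\sigma_\theta({}_{A}\gamma_{F}(E)) \equiv {}_{A}\gamma_{F}(\sigma_\theta(E))$, where $\theta$ uses only attributes from $A$.
c. $\sigma_\theta(E_1 \mathbin{⟕} E_2) \equiv \sigma_\theta(E_1) \mathbin{⟕} E_2$, where $\theta$ uses only attributes from $E_1$.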
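Before proving equivalence (a), it can be sanity-checked by brute force on small random instances. The sketch below assumes set semantics, models relations as Python sets of tuples, and uses a hypothetical condition θ (the second attribute of $E_1$ is less than the first attribute of $E_2$); a passing check is evidence, not a proof.

```python
import itertools
import random

def theta_join(e1, e2, theta):
    """Set-semantics theta join: concatenate every pair of tuples satisfying theta."""
    return {t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)}

def random_relation(rng, arity, domain, max_size):
    """Small random set relation drawn from domain^arity (hypothetical test data)."""
    all_tuples = list(itertools.product(domain, repeat=arity))
    return set(rng.sample(all_tuples, rng.randint(0, max_size)))

def check_rule_a(trials=500, seed=0):
    rng = random.Random(seed)
    theta = lambda t1, t2: t1[1] < t2[0]  # hypothetical theta: E1.B < E2.C
    for _ in range(trials):
        e1, e2, e3 = (random_relation(rng, 2, range(3), 6) for _ in range(3))
        lhs = theta_join(e1, e2 - e3, theta)
        rhs = theta_join(e1, e2, theta) - theta_join(e1, e3, theta)
        if lhs != rhs:
            return False
    return True

print(check_rule_a())  # True
```

The check passes because tuple concatenation is unambiguous here: a result tuple $t_1 t_2$ can only come from $t_2$ itself, so it survives the subtraction exactly when $t_2 \notin E_3$.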
16.3 For each of the following pairs of expressions, give instances of relations that
show the expressions are not equivalent.
a. $\Pi_A(r - s)$ and $\Pi_A(r) - \Pi_A(s)$.
b. $\sigma_{B<4}({}_{A}\gamma_{\max(B)\text{ as }B}(r))$ and ${}_{A}\gamma_{\max(B)\text{ as }B}(\sigma_{B<4}(r))$.
c. In the preceding expressions, if both occurrences of max were replaced
by min, would the expressions be equivalent?
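d. $(r \mathbin{⟖} s) \mathbin{⟖} t$ and $r \mathbin{⟖} (s \mathbin{⟖} t)$.
In other words, the natural right outer join is not associative.
e. $\sigma_\theta(E_1 \mathbin{⟕} E_2)$ and $E_1 \mathbin{⟕} \sigma_\theta(E_2)$, where $\theta$ uses only attributes from $E_2$.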
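A candidate counterexample can be checked mechanically. The sketch below models relations over schema $(A, B)$ as Python sets of tuples and evaluates both sides of parts (a) and (b) on one possible choice of instances; the helper names are illustrative.

```python
def project_a(rel):
    """Set-semantics projection on the first attribute A."""
    return {t[0] for t in rel}

def group_max(rel):
    """A gamma max(B) as B: one tuple (A, largest B) per A value."""
    best = {}
    for a, b in rel:
        best[a] = max(b, best.get(a, b))
    return set(best.items())

# Part (a): r and s agree on A = 1 but differ on B.
r, s = {(1, 2)}, {(1, 3)}
lhs_a = project_a(r - s)             # {1}
rhs_a = project_a(r) - project_a(s)  # set()

# Part (b): the group maximum 5 fails B < 4, but the smaller value 2 passes.
r_b = {(1, 5), (1, 2)}
lhs_b = {t for t in group_max(r_b) if t[1] < 4}   # set()
rhs_b = group_max({t for t in r_b if t[1] < 4})   # {(1, 2)}

print(lhs_a, rhs_a, lhs_b, rhs_b)
```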
16.4 SQL allows relations with duplicates (Chapter 3), and the multiset version of
the relational algebra is defined in Note 3.1 on page 80, Note 3.2 on page 97,
and Note 3.3 on page 108. Check which of the equivalence rules 1 through 7.b
hold for the multiset version of the relational algebra.
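For the multiset version, a relation can be modeled as a `collections.Counter` mapping each tuple to its multiplicity. The sketch below checks one equivalence, pushing a selection that uses only $E_1$'s attributes below a theta join, on random small multisets; the condition functions are hypothetical, and a passing check does not replace a proof.

```python
import random
from collections import Counter

def select(rel, pred):
    """Multiset selection: each qualifying tuple keeps its multiplicity."""
    return Counter({t: n for t, n in rel.items() if pred(t)})

def theta_join(r1, r2, theta):
    """Multiset theta join: multiplicities of matching tuple pairs multiply."""
    out = Counter()
    for t1, n1 in r1.items():
        for t2, n2 in r2.items():
            if theta(t1, t2):
                out[t1 + t2] += n1 * n2
    return out

def random_multiset(rng, domain=range(3), max_dup=3):
    """Random two-attribute multiset relation, with duplicates."""
    return Counter({(a, b): rng.randint(1, max_dup)
                    for a in domain for b in domain if rng.random() < 0.5})

def check_selection_pushdown(trials=300, seed=1):
    rng = random.Random(seed)
    theta = lambda t1, t2: t1[1] == t2[0]  # hypothetical join condition
    theta1 = lambda t: t[0] != 1           # looks only at E1's first attribute
    for _ in range(trials):
        e1, e2 = random_multiset(rng), random_multiset(rng)
        lhs = select(theta_join(e1, e2, theta), theta1)
        rhs = theta_join(select(e1, theta1), e2, theta)
        if lhs != rhs:
            return False
    return True

print(check_selection_pushdown())  # True
```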
16.5 Consider the relations $r_1(A, B, C)$, $r_2(C, D, E)$, and $r_3(E, F)$, with primary keys
$A$, $C$, and $E$, respectively. Assume that $r_1$ has 1000 tuples, $r_2$ has 1500 tuples,
and $r_3$ has 750 tuples. Estimate the size of $r_1 \bowtie r_2 \bowtie r_3$, and give an efficient
strategy for computing the join.
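The key constraints alone bound the result size. The bound is reached only if every $C$ value of $r_1$ appears in $r_2$ and every $E$ value of the intermediate result appears in $r_3$, a foreign-key-style assumption that the exercise does not state.

```latex
\begin{aligned}
|r_1 \bowtie r_2| &\le |r_1| = 1000
  &&\text{($C$ is a key of $r_2$: each $r_1$ tuple joins with at most one $r_2$ tuple)}\\
|r_1 \bowtie r_2 \bowtie r_3| &\le |r_1 \bowtie r_2| \le 1000
  &&\text{($E$ is a key of $r_3$: each tuple of $r_1 \bowtie r_2$ joins with at most one $r_3$ tuple)}
\end{aligned}
```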
16.6 Consider the relations $r_1(A, B, C)$, $r_2(C, D, E)$, and $r_3(E, F)$ of Practice
Exercise 16.5. Assume that there are no primary keys, except the entire schema.
Let $V(C, r_1)$ be 900, $V(C, r_2)$ be 1100, $V(E, r_2)$ be 50, and $V(E, r_3)$ be 100.
Assume that $r_1$ has 1000 tuples, $r_2$ has 1500 tuples, and $r_3$ has 750 tuples.
Estimate the size of $r_1 \bowtie r_2 \bowtie r_3$ and give an efficient strategy for computing
the join.
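The estimate rests on the standard formula for joining $r$ and $s$ on a single shared attribute $A$, namely $n_r \cdot n_s / \max(V(A, r), V(A, s))$. A minimal sketch of the arithmetic follows; it assumes that $V(E, r_1 \bowtie r_2)$ can be approximated by $V(E, r_2) = 50$, an assumption the exercise does not state.

```python
def join_size(n_r, n_s, v_r, v_s):
    """Estimated size of r join s on one shared attribute A:
    n_r * n_s / max(V(A, r), V(A, s))."""
    return n_r * n_s / max(v_r, v_s)

n1, n2, n3 = 1000, 1500, 750
# Join r1 and r2 on C, using V(C, r1) = 900 and V(C, r2) = 1100.
size_12 = join_size(n1, n2, 900, 1100)
# Assumption: V(E, r1 join r2) is approximately V(E, r2) = 50.
size_123 = join_size(size_12, n3, 50, 100)
print(round(size_12), round(size_123))  # 1364 10227
```

Computing $r_2 \bowtie r_3$ first gives 11,250 intermediate tuples and, assuming $V(C, r_2 \bowtie r_3) \approx 1100$, the same final estimate of about 10,227 tuples.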
16.7 Suppose that a B+-tree index on building is available on relation department
and that no other index is available. What would be the best way to handle the
following selections that involve negation?
a. $\sigma_{\neg(\mathit{building} < \text{“Watson”})}(\mathit{department})$
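Since $\neg(\mathit{building} < \text{“Watson”})$ is equivalent to $\mathit{building} \ge \text{“Watson”}$, the B+-tree can be used as for an ordinary range selection: locate the first leaf entry with key at least “Watson” and scan the leaves from there. The sketch below models only the leaf level, as a sorted Python list searched with `bisect`; the rows are illustrative. Whether this beats a plain file scan depends on how many tuples qualify.

```python
import bisect

# Illustrative department rows; the sorted (building, dept_name) list stands in
# for the leaf level of a B+-tree on building.
leaves = sorted([
    ("Packard", "Music"), ("Painter", "Finance"), ("Painter", "History"),
    ("Taylor", "Comp. Sci."), ("Taylor", "Elec. Eng."),
    ("Watson", "Biology"), ("Watson", "Physics"),
])
keys = [building for building, _ in leaves]

def not_less_than(value):
    """sigma_{not(building < value)}: find the first key >= value (one
    root-to-leaf traversal in a real B+-tree), then scan to the end."""
    start = bisect.bisect_left(keys, value)
    return [dept for _, dept in leaves[start:]]

print(not_less_than("Watson"))  # ['Biology', 'Physics']
```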
This number is known as the Catalan number, and its derivation can be found
in any standard textbook on data structures or algorithms.
16.13 Show that the lowest-cost join order can be computed in time $O(3^n)$. Assume
that you can store and look up information about a set of relations (such as
the optimal join order for the set, and the cost of that join order) in constant
time. (If you find this exercise difficult, at least show the looser time bound of
$O(2^{2n})$.)
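The $O(3^n)$ bound comes from dynamic programming over subsets: the best plan for a set $S$ is found by trying every split of $S$ into a nonempty proper subset $S_1$ and its complement $S - S_1$. A subset of size $k$ has $2^k - 2$ such splits, and summing $\binom{n}{k}(2^k - 2)$ over $k$ gives $3^n - 2^{n+1} + 1$. The sketch below represents sets of relations as bitmasks and counts the splits it examines; its cost model (sum of intermediate-result sizes, with one uniform join selectivity) is a hypothetical placeholder.

```python
from math import prod

def best_bushy_order(card, selectivity):
    """Subset DP over bitmasks with a hypothetical cost model:
    cost = sum of intermediate sizes, size(S) = prod(card) * selectivity**(|S|-1)."""
    n = len(card)
    full = (1 << n) - 1
    cost, plan = {}, {}
    steps = 0  # number of (S, S1) splits examined
    for i in range(n):
        cost[1 << i], plan[1 << i] = 0, f"r{i + 1}"
    # Every proper subset of a mask is numerically smaller, so ascending order is safe.
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue  # single relation: already initialised
        members = [i for i in range(n) if mask >> i & 1]
        size = prod(card[i] for i in members) * selectivity ** (len(members) - 1)
        sub = (mask - 1) & mask  # largest nonempty proper subset of mask
        while sub:
            steps += 1
            rest = mask ^ sub
            c = cost[sub] + cost[rest] + size
            if mask not in cost or c < cost[mask]:
                cost[mask] = c
                plan[mask] = f"({plan[sub]} ⋈ {plan[rest]})"
            sub = (sub - 1) & mask  # next smaller subset of mask
    return cost[full], plan[full], steps

n = 4
total, order, steps = best_bushy_order([1000, 1500, 750, 200], 0.01)
print(order, total)
print(steps, 3 ** n - 2 ** (n + 1) + 1)  # 50 50
```

Each split is visited in both orientations, which also covers join commutativity; with constant-time lookups the work is dominated by the split count.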
16.14 Show that, if only left-deep join trees are considered, as in the System R
optimizer, the time taken to find the most efficient join order is around $n2^n$. Assume
that there is only one interesting sort order.
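Restricting the search to left-deep trees means the best plan for $S$ is the best plan for $S - \{r\}$ joined with a single base relation $r \in S$, so a subset of size $k$ needs only $k$ candidates. Summing $k\binom{n}{k}$ over subsets of size at least 2 gives $n2^{n-1} - n$, which is $O(n2^n)$. The sketch below ignores sort orders entirely (matching the single-interesting-order assumption) and uses a hypothetical cost model: the sum of intermediate-result sizes with a uniform join selectivity.

```python
from math import prod

def best_left_deep_order(card, selectivity):
    """Left-deep DP over bitmask subsets with a hypothetical cost model:
    cost = sum of intermediate sizes, size(S) = prod(card) * selectivity**(|S|-1)."""
    n = len(card)
    full = (1 << n) - 1
    cost, plan = {}, {}
    steps = 0  # number of (S, r) candidates examined
    for i in range(n):
        cost[1 << i], plan[1 << i] = 0, f"r{i + 1}"
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue  # single relation: already initialised
        members = [i for i in range(n) if mask >> i & 1]
        size = prod(card[i] for i in members) * selectivity ** (len(members) - 1)
        for i in members:  # r_i is the base relation joined last
            steps += 1
            rest = mask ^ (1 << i)
            c = cost[rest] + size
            if mask not in cost or c < cost[mask]:
                cost[mask] = c
                plan[mask] = f"({plan[rest]} ⋈ r{i + 1})"
    return cost[full], plan[full], steps

n = 4
total, order, steps = best_left_deep_order([1000, 1500, 750, 200], 0.01)
print(order, total)
print(steps, n * 2 ** (n - 1) - n)  # 28 28
```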
16.15 Consider the bank database of Figure 16.9, where the primary keys are
underlined. Construct the following SQL queries for this relational database.
a. Write a nested query on the relation account to find, for each branch
whose name starts with B, all accounts with the maximum balance at
that branch.
b. Rewrite the preceding query without using a nested subquery; in other
words, decorrelate the query, but in SQL.
c. Give a relational-algebra expression, using semijoin, that is equivalent to
the query.
d. Give a procedure (similar to that described in Section 16.4.4) for
decorrelating such queries.
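Parts (a) and (b) can be sketched and cross-checked in SQLite through Python's `sqlite3` module. The `account(account_number, branch_name, balance)` table is assumed to match the account relation of Figure 16.9, and the rows are made up. SQLite's `LIKE` is case-insensitive for ASCII letters by default, so `'B%'` also matches names starting with a lowercase b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed account relation of the bank schema, with made-up rows.
conn.execute("CREATE TABLE account "
             "(account_number TEXT PRIMARY KEY, branch_name TEXT, balance NUMERIC)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-101", "Brighton", 500), ("A-102", "Brighton", 900),
    ("A-103", "Brighton", 900), ("A-201", "Bronx", 300),
    ("A-301", "Downtown", 1000),
])

# (a) Correlated nested query: the subquery depends on the outer tuple a.
nested = """
SELECT a.account_number, a.branch_name, a.balance
FROM account AS a
WHERE a.branch_name LIKE 'B%'
  AND a.balance = (SELECT MAX(b.balance) FROM account AS b
                   WHERE b.branch_name = a.branch_name)
ORDER BY a.account_number"""

# (b) Decorrelated form: compute each qualifying branch's maximum once, then join.
decorrelated = """
SELECT a.account_number, a.branch_name, a.balance
FROM account AS a
JOIN (SELECT branch_name, MAX(balance) AS max_balance
      FROM account
      WHERE branch_name LIKE 'B%'
      GROUP BY branch_name) AS m
  ON a.branch_name = m.branch_name AND a.balance = m.max_balance
ORDER BY a.account_number"""

rows_nested = conn.execute(nested).fetchall()
rows_decorrelated = conn.execute(decorrelated).fetchall()
print(rows_nested == rows_decorrelated, rows_nested)
```

Both forms keep ties for the maximum (A-102 and A-103 here); the decorrelated one replaces per-tuple subquery evaluation with a single `GROUP BY` followed by a join.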