Unit 4 Database Design and Query Processing
[Figure: an instance of a relation r(A, B, C) with its projections Π_{A,B}(r) and Π_{B,C}(r), illustrating decomposition of a schema]
First Normal Form
• A domain is atomic if its elements are considered to be indivisible units
• Examples of non-atomic domains:
  • Sets of names, composite attributes
  • Identification numbers like CS101 that can be broken up into parts
• A relation schema R is in first normal form if the domains of all attributes of R are atomic
• Non-atomic values complicate storage and encourage redundant (repeated) storage of data
  • Example: a set of accounts stored with each customer, and a set of owners stored with each account
• We assume all relations are in first normal form
Functional Dependencies
• Example: consider r(A, B) with the following instance:

  A  B
  1  4
  1  5
  3  7

• On this instance, A → B does NOT hold, but B → A does hold.
• A functional dependency α → β is trivial if it is satisfied by all instances of the relation, i.e., if β ⊆ α.
• K is a superkey for relation schema R if and only if K → R.
Closure of a Set of Functional Dependencies
• The set of all functional dependencies logically implied by F is the closure of F, denoted F+. We can compute F+ by repeatedly applying Armstrong's Axioms:

F+ = F
repeat
  for each functional dependency f in F+
    apply reflexivity and augmentation rules on f
    add the resulting functional dependencies to F+
  for each pair of functional dependencies f1 and f2 in F+
    if f1 and f2 can be combined using transitivity
      then add the resulting functional dependency to F+
until F+ does not change any further
Closure of Attribute Sets
• Given a set of attributes α, the closure of α under F (denoted α+) is the set of attributes that are functionally determined by α under F. Algorithm to compute α+:

result := α;
while (changes to result) do
  for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
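The loop above translates almost directly into code. A minimal Python sketch (representing F as a list of (β, γ) frozenset pairs and the name attribute_closure are assumptions of this sketch, not part of the slides):

def attribute_closure(alpha, F):
    # result := alpha; repeatedly add gamma for each beta -> gamma
    # in F whose left side is already contained in result
    result = frozenset(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in F:
            if beta <= result and not gamma <= result:
                result = result | gamma
                changed = True
    return result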
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B
  A → C
  CG → H
  CG → I
  B → H}
• (AG)+
  1. result = AG
  2. result = ABCG (A → C and A → B)
  3. result = ABCGH (CG → H and CG ⊆ AGBC)
  4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
  1. Is AG a superkey?
     1. Does AG → R? == Is (AG)+ ⊇ R?
  2. Is any proper subset of AG a superkey?
     1. Does A → R? == Is (A)+ ⊇ R?
     2. Does G → R? == Is (G)+ ⊇ R?
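The worked example can be checked mechanically with the attribute_closure sketch above (single-letter attributes let plain strings stand in for attribute sets):

fs = frozenset
F = [(fs("A"), fs("B")), (fs("A"), fs("C")),
     (fs("CG"), fs("H")), (fs("CG"), fs("I")),
     (fs("B"), fs("H"))]
R = fs("ABCGHI")
print(attribute_closure(fs("AG"), F) >= R)   # True: AG is a superkey
print(attribute_closure(fs("A"), F) >= R)    # False: (A)+ = ABCH
print(attribute_closure(fs("G"), F) >= R)    # False: (G)+ = G

Since no proper subset of AG is a superkey, AG is a candidate key.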
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
  • To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
• Testing functional dependencies:
  • To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
  • That is, we compute α+ using attribute closure, and then check if it contains β.
  • This is a simple and cheap test, and very useful.
• Computing the closure of F:
  • For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
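Each of these uses reduces to one closure computation. A hedged sketch building on attribute_closure (the function names are illustrative):

def is_superkey(alpha, R, F):
    # alpha is a superkey iff alpha+ contains every attribute of R
    return attribute_closure(alpha, F) >= frozenset(R)

def fd_holds(alpha, beta, F):
    # alpha -> beta is in F+ iff beta is a subset of alpha+
    return frozenset(beta) <= attribute_closure(alpha, F)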
Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from the others
  • For example: A → C is redundant in {A → B, B → C, A → C}
• Parts of a functional dependency may be redundant
  • E.g. on the RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
  • E.g. on the LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a "minimal" set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Extraneous Attributes
• Consider a set F of functional dependencies and the functional dependency α → β in F.
  • Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
  • Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases above, since a "stronger" functional dependency always implies a weaker one
• Example: Given F = {A → C, AB → C}
  • B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e., the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
  • C is extraneous in AB → CD since AB → C can be inferred even after deleting C.
Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α:
  1. compute (α – A)+ using the dependencies in F
  2. check that (α – A)+ contains β; if it does, A is extraneous in α
• To test if attribute A ∈ β is extraneous in β:
  1. compute α+ using only the dependencies in F' = (F – {α → β}) ∪ {α → (β – A)}
  2. check that α+ contains A; if it does, A is extraneous in β
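Both tests are again single closure computations. A sketch under the same representation assumptions as before:

def extraneous_in_lhs(A, alpha, beta, F):
    # A is extraneous in alpha iff (alpha - A)+ under F contains beta
    return frozenset(beta) <= attribute_closure(frozenset(alpha) - {A}, F)

def extraneous_in_rhs(A, alpha, beta, F):
    # A is extraneous in beta iff alpha+ under
    # F' = (F - {alpha -> beta}) U {alpha -> (beta - A)} contains A
    F_prime = [fd for fd in F if fd != (frozenset(alpha), frozenset(beta))]
    F_prime.append((frozenset(alpha), frozenset(beta) - {A}))
    return A in attribute_closure(frozenset(alpha), F_prime)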
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
  • F logically implies all dependencies in Fc, and
  • Fc logically implies all dependencies in F, and
  • No functional dependency in Fc contains an extraneous attribute, and
  • Each left side of a functional dependency in Fc is unique.
• To compute a canonical cover for F:

repeat
  Use the union rule to replace any dependencies in F of the form
    α1 → β1 and α1 → β2 with α1 → β1β2
  Find a functional dependency α → β with an extraneous attribute
    either in α or in β
    /* Note: test for extraneous attributes done using Fc, not F */
  If an extraneous attribute is found, delete it from α → β
until F does not change
Computing a Canonical Cover
• R = (A, B, C)
  F = {A → BC, B → C, A → B, AB → C}
• Combine A → BC and A → B into A → BC
  • Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
  • Check if the result of deleting A from AB → C is implied by the other dependencies
  • Yes: in fact, B → C is already present!
  • Set is now {A → BC, B → C}
• C is extraneous in A → BC
  • Check if A → C is logically implied by A → B and the other dependencies
  • Yes: using transitivity on A → B and B → C
  • Can use attribute closure of A in more complex cases
• The canonical cover is: A → B, B → C
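The whole procedure can be scripted with the helpers above. A sketch, not a definitive implementation (it assumes the earlier frozenset-pair representation and reproduces this example's result):

def canonical_cover(F):
    Fc = list(F)
    changed = True
    while changed:
        changed = False
        # union rule: merge dependencies with the same left side
        merged = {}
        for lhs, rhs in Fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        Fc = list(merged.items())
        # delete one extraneous attribute at a time, testing against Fc
        for lhs, rhs in Fc:
            for A in sorted(lhs):
                if len(lhs) > 1 and extraneous_in_lhs(A, lhs, rhs, Fc):
                    Fc.remove((lhs, rhs))
                    Fc.append((lhs - {A}, rhs))
                    changed = True
                    break
            else:
                for A in sorted(rhs):
                    if extraneous_in_rhs(A, lhs, rhs, Fc):
                        Fc.remove((lhs, rhs))
                        Fc.append((lhs, rhs - {A}))
                        changed = True
                        break
            if changed:
                break
    return Fc

fs = frozenset
F = [(fs("A"), fs("BC")), (fs("B"), fs("C")),
     (fs("A"), fs("B")), (fs("AB"), fs("C"))]
print(canonical_cover(F))   # two dependencies: A -> B and B -> C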
Lossless-join Decomposition
• For the case of R = (R1, R2), we require that for all possible relations r on schema R:
  r = Π_{R1}(r) ⋈ Π_{R2}(r)
• A decomposition of R into R1 and R2 is lossless-join if at least one of the following dependencies is in F+:
  • R1 ∩ R2 → R1
  • R1 ∩ R2 → R2
• The above functional dependencies are a sufficient condition for lossless-join decomposition; the dependencies are a necessary condition only if all constraints are functional dependencies
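For a binary decomposition the test is a single closure computation over the shared attributes. A small sketch reusing attribute_closure (the function name is mine):

def is_lossless_2way(R1, R2, F):
    # lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+
    common = frozenset(R1) & frozenset(R2)
    closure = attribute_closure(common, F)
    return closure >= frozenset(R1) or closure >= frozenset(R2)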
Example
• R = (A, B, C)
  F = {A → B, B → C}
• Can be decomposed in two different ways
• R1 = (A, B), R2 = (B, C)
  • Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC
  • Dependency preserving
• R1 = (A, B), R2 = (A, C)
  • Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB
  • Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri.
• A decomposition is dependency preserving if
  (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Testing for Dependency Preservation
• To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn, we apply the following test (with attribute closure done with respect to F):

result = α
while (changes to result) do
  for each Ri in the decomposition
    t = (result ∩ Ri)+ ∩ Ri
    result = result ∪ t

• If result contains all attributes in β, then the functional dependency α → β is preserved.
• We apply the test on all dependencies in F to check if a decomposition is dependency preserving.
• This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+
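The test is straightforward to implement with the closure routine; the sketch below (names are mine) follows the slide's algorithm literally:

def preserves_fd(alpha, beta, decomposition, F):
    # grow result only through the attributes of one Ri at a time
    result = frozenset(alpha)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            Ri = frozenset(Ri)
            t = attribute_closure(result & Ri, F) & Ri
            if not t <= result:
                result = result | t
                changed = True
    return frozenset(beta) <= result

def is_dependency_preserving(decomposition, F):
    # apply the test to every dependency in F
    return all(preserves_fd(a, b, decomposition, F) for a, b in F)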
Example
• R = (A, B, C)
  F = {A → B, B → C}
  Key = {A}
• R is not in BCNF (B → C holds but B is not a superkey)
• Decomposition: R1 = (A, B), R2 = (B, C)
  • R1 and R2 are in BCNF
  • Lossless-join decomposition
  • Dependency preserving
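Both claims about this decomposition can be confirmed with the earlier sketches:

fs = frozenset
F = [(fs("A"), fs("B")), (fs("B"), fs("C"))]
print(is_lossless_2way(fs("AB"), fs("BC"), F))            # True: B -> BC
print(is_dependency_preserving([fs("AB"), fs("BC")], F))  # True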
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing (Cont.)
• Parsing and translation
• translate the query into its internal form. This is then
translated into relational algebra.
• Parser checks syntax, verifies relations
• Evaluation
• The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
Basic Steps in Query Processing: Optimization
• A relational-algebra expression may have many equivalent expressions
  • E.g., σ_{salary<75000}(Π_{salary}(instructor)) is equivalent to Π_{salary}(σ_{salary<75000}(instructor))
• Each relational-algebra operation can be evaluated using one of several different algorithms
• Correspondingly, a relational-algebra expression can be evaluated in many ways.
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
  • E.g., can use an index on salary to find instructors with salary < 75000,
  • or can perform a complete relation scan and discard instructors with salary ≥ 75000
Materialization
• Materialized evaluation: evaluate one operation at a time, starting at the lowest level, using intermediate results materialized into temporary relations to evaluate next-level operations.
• E.g., to evaluate Π_name(σ_{building="Watson"}(department) ⋈ instructor): compute and store σ_{building="Watson"}(department), then compute and store its join with instructor, and finally compute the projection on name.
Materialization (Cont.)
• Materialized evaluation is always applicable
• Cost of writing results to disk and reading them back can be quite
high
• Our cost formulas for operations ignore cost of writing results
to disk, so
• Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
• Double buffering: use two output buffers for each operation,
when one is full write it to disk while the other is getting filled
• Allows overlap of disk writes with computation and reduces
execution time
Pipelining
• Pipelined evaluation : evaluate several operations simultaneously,
passing the results of one operation on to the next.
• E.g., in the previous expression tree, don't store the result of σ_{building="Watson"}(department)
  • instead, pass tuples directly to the join. Similarly, don't store the result of the join; pass tuples directly to the projection.
• Much cheaper than materialization: no need to store a temporary
relation to disk.
• Pipelining may not always be possible – e.g., sort, hash-join.
• For pipelining to be effective, use evaluation algorithms that
generate output tuples even as tuples are received for inputs to the
operation.
• Pipelines can be executed in two ways: demand driven and
producer driven
Pipelining
• In demand driven or lazy evaluation
• system repeatedly requests next tuple from top level operation
• Each operation requests next tuple from children operations as
required, in order to output its next tuple
• In between calls, operation has to maintain “state” so it knows
what to return next
• In producer-driven or eager pipelining
• Operators produce tuples eagerly and pass them up to their
parents
• Buffer maintained between operators, child puts tuples in
buffer, parent removes tuples from buffer
• if buffer is full, child waits till there is space in the buffer, and
then generates more tuples
• System schedules operations that have space in output buffer and
can process more input tuples
• Alternative name: pull and push models of pipelining
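A producer-driven pipeline maps naturally onto a bounded buffer between a child thread and its parent. A minimal Python sketch of the push model (a real engine schedules operators rather than spawning one thread per operator; all names here are illustrative):

import threading, queue

def child_operator(out, n):
    # produce tuples eagerly; put() blocks when the buffer is full,
    # i.e. the child waits until the parent has made space
    for i in range(n):
        out.put(("tuple", i))
    out.put(None)   # end-of-stream marker

buf = queue.Queue(maxsize=4)   # bounded buffer between child and parent
threading.Thread(target=child_operator, args=(buf, 10), daemon=True).start()
while (item := buf.get()) is not None:   # parent removes tuples from buffer
    print(item)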
Pipelining
• Implementation of demand-driven pipelining
• Each operation is implemented as an iterator implementing the
following operations
• open()
• E.g. file scan: initialize file scan
• state: pointer to beginning of file
• E.g. merge join: sort relations;
• state: pointers to beginning of sorted relations
• next()
• E.g. for file scan: Output next tuple, and advance and store file
pointer
• E.g. for merge join: continue with merge from earlier state till
next output tuple is found. Save pointers as iterator state.
• close()
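In Python, demand-driven iterators correspond directly to generators: open() is the (lazy) generator call, next() is the built-in next, and the generator's local variables are exactly the saved state. A sketch of a file scan feeding a selection feeding a projection (relation and column names are made up for illustration):

def file_scan(rows):
    for row in rows:          # state: current position in the input
        yield row             # next(): hand one tuple upward on demand

def select(pred, child):
    for row in child:         # pull from the child only when asked
        if pred(row):
            yield row

def project(cols, child):
    for row in child:
        yield {c: row[c] for c in cols}

instructor = [{"name": "Wu", "salary": 90000},
              {"name": "Mozart", "salary": 40000}]
plan = project(["name"],
               select(lambda r: r["salary"] < 75000, file_scan(instructor)))
print(list(plan))             # [{'name': 'Mozart'}]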
Evaluation Algorithms for Pipelining
• Some algorithms are not able to output results even as they get
input tuples
• E.g. merge join, or hash join
• intermediate results written to disk and then read back
• Algorithm variants to generate (at least some) results on the fly, as
input tuples are read in
• E.g. hybrid hash join generates output tuples even as probe
relation tuples in the in-memory partition (partition 0) are read in
• Double-pipelined join technique: Hybrid hash join, modified to
buffer partition 0 tuples of both relations in-memory, reading
them as they become available, and output results of any
matches between partition 0 tuples
• When a new r0 tuple is found, match it with existing s0 tuples,
output matches, and save it in r0
• Symmetrically for s0 tuples
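Specialized to the in-memory (partition 0) case, the double-pipelined join is a symmetric hash join: each arriving tuple first probes the other input's hash table, then is saved in its own. A hedged sketch assuming both inputs fit in memory:

from itertools import zip_longest

def double_pipelined_join(r, s, key):
    r_tab, s_tab = {}, {}
    # zip_longest stands in for "tuples read as they become available"
    for r_tup, s_tup in zip_longest(r, s):
        if r_tup is not None:
            for match in s_tab.get(key(r_tup), []):   # probe s side
                yield (r_tup, match)
            r_tab.setdefault(key(r_tup), []).append(r_tup)  # save in r0
        if s_tup is not None:
            for match in r_tab.get(key(s_tup), []):   # probe r side
                yield (match, s_tup)
            s_tab.setdefault(key(s_tup), []).append(s_tup)  # save in s0

r = [(1, "a"), (2, "b")]
s = [(2, "x"), (1, "y")]
print(list(double_pipelined_join(r, s, key=lambda t: t[0])))
# [((2, 'b'), (2, 'x')), ((1, 'a'), (1, 'y'))]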
Thank You!