Chapter 4
Chapter Outline
1. Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2. Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs
3. Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4. General Normal Form Definitions (For Multiple Keys)
5. BCNF (Boyce-Codd Normal Form)
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship
instance. (Applies to individual relations and their attributes).
• Bottom Line:
Figure: A simplified COMPANY relational database schema
1.2 Redundant Information in Tuples and Update Anomalies
– Wastes storage
• Update anomalies:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY
• Update Anomaly:
EXAMPLE OF AN INSERT ANOMALY
• Insert Anomaly:
• Conversely
EXAMPLE OF DELETE ANOMALY
• Consider the relation:
• Delete Anomaly:
– When a project is deleted, all the employees who work on that project are deleted with
it. Alternatively, if an employee is the sole employee on a project, deleting that
employee also deletes the corresponding project.
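The delete anomaly can be sketched in a few lines. This is a minimal illustration, assuming a single relation that stores employee and project facts in the same tuple; the rows, names, and helper functions are hypothetical and not taken from the figure.

```python
# Hypothetical rows of one relation mixing employee and project facts:
# (ssn, ename, pnumber, pname, hours). The values are made up.
emp_proj = [
    ("111", "Smith", 1, "ProductX", 32.5),
    ("111", "Smith", 2, "ProductY", 7.5),
    ("222", "Wong", 3, "ProductZ", 40.0),
]


def delete_project(rows, pnumber):
    """Return the rows that remain after deleting every tuple of one project."""
    return [row for row in rows if row[2] != pnumber]


def employee_names(rows):
    """Collect the employee names still recorded anywhere in the relation."""
    return {row[1] for row in rows}


after_delete = delete_project(emp_proj, 3)

# Employees whose only record disappeared together with the project.
lost_employees = employee_names(emp_proj) - employee_names(after_delete)
print(lost_employees)
```

Wong worked only on project 3, so deleting that project removes every trace of the employee, which is the anomaly the slide describes.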
Figure: Two relation schemas suffering from update anomalies
Guideline to Redundant Information in Tuples and Update Anomalies
• GUIDELINE 2:
– Design a schema that does not suffer from the insertion, deletion and update
anomalies.
– If there are any anomalies present, then note them so that applications can be made
to take them into account.
1.3 Null Values in Tuples
• GUIDELINE 3:
– Relations should be designed such that their tuples will have as few NULL values as
possible
– Attributes that are NULL frequently could be placed in separate relations (with the
primary key)
1.4 Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN
operations
• The "lossless join" property is used to guarantee meaningful results for join operations
• GUIDELINE 4:
2.1 Functional Dependencies (1)
– Are constraints that are derived from the meaning and interrelationships of the data
attributes
– They play a vital role in distinguishing good database designs from bad ones.
2.1 Functional Dependencies (2)
• The attribute set X on the left side of the arrow is called the determinant, while the
attribute set Y on the right side is called the dependent.
Functional Dependencies (3)
• X -> Y holds if whenever two tuples have the same value for X, they must have the same
value for Y
– For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
Examples of FD constraints (1)
• Employee SSN and PROJECT NUMBER together determine the HOURS PER WEEK that the
employee works on the project
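The tuple-based definition of X -> Y translates directly into a check on one relation instance. The sketch below assumes rows are stored as dictionaries; the function name fd_holds and the sample rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in one relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples agree on X but disagree on Y: the FD is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance recording hours per employee and project.
works_on = [
    {"ssn": "111", "pnumber": 1, "hours": 32.5},
    {"ssn": "111", "pnumber": 2, "hours": 7.5},
    {"ssn": "222", "pnumber": 1, "hours": 20.0},
]

holds_for_pair = fd_holds(works_on, ["ssn", "pnumber"], ["hours"])
holds_for_ssn = fd_holds(works_on, ["ssn"], ["hours"])
print(holds_for_pair, holds_for_ssn)  # True False
```

An instance can only refute an FD. Passing the check on one set of rows does not prove the constraint, because an FD is a statement about every legal relation state.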
2.2 Inference Rules for FDs (1)
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
• IR1, IR2, IR3 form a sound and complete set of inference rules
– These rules hold (soundness), and all other rules that hold can be deduced from them
(completeness)
Inference Rules for FDs (2)
• The last three inference rules, as well as any other inference rules, can be deduced from IR1,
IR2, and IR3 (completeness property)
Inference Rules for FDs (3)
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are
functionally determined by X
• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
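In practice X+ is usually computed with a fixed-point loop rather than by applying the inference rules one at a time. The sketch below uses that standard loop; representing each FD as a pair of Python sets is an assumption of this example, and the two FDs are the EMP_PROJ dependencies used elsewhere in these slides.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left side, it gains the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
]

ssn_closure = closure({"SSN"}, FDS)
pair_closure = closure({"SSN", "PNUMBER"}, FDS)
print(sorted(ssn_closure))   # ['ENAME', 'SSN']
print(sorted(pair_closure))  # ['ENAME', 'HOURS', 'PNUMBER', 'SSN']
```

The loop stops when a full pass over F adds nothing new, so it finishes after at most one pass per attribute added.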
2.3 Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
• Definition (Covers):
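Equivalence can be tested with attribute closures. The sketch below assumes the standard definitions: F covers G when every FD in G can be inferred from F, and F and G are equivalent when each covers the other. The function names and the sets F, G, and H are hypothetical.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every FD X -> Y in g follows from f, i.e. Y is within X+ under f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f covers g and g covers f."""
    return covers(f, g) and covers(g, f)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = F + [({"A"}, {"C"})]  # adds an FD that already follows from F
H = [({"A"}, {"B"})]      # lacks B -> C

f_equals_g = equivalent(F, G)
f_equals_h = equivalent(F, H)
print(f_equals_g, f_equals_h)  # True False
```

This avoids computing F+ and G+ in full, which can be exponentially large.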
2.4 Minimal Sets of FDs (1)
• A set of FDs is minimal if it satisfies the following conditions:
2. We cannot remove any dependency from F and have a set of dependencies that is
equivalent to F.
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.1 Normalization of Relations (1)
• Normalization:
• Normal form:
– Condition using keys and FDs of a relation to certify whether a relation schema is in a
particular normal form
Normalization of Relations (2)
• There are three stages of normal forms known as first normal form (or 1NF), second
normal form (or 2NF), and third normal form (or 3NF).
• 4NF
• 5NF
3.2 Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
• The practical utility of these normal forms becomes questionable when the constraints
on which they are based are hard to understand or to detect
• The database designers need not normalize to the highest possible normal form
– (usually up to 3NF, BCNF or 4NF)
• Denormalization:
– The process of storing the join of higher normal form relations as a base relation,
which is in a lower normal form.
3.3 Definitions of Keys and Attributes Participating in Keys (1)
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R
with the property that no two distinct tuples t1 and t2 in any legal relation state r of R
will have t1[S] = t2[S]
• A key K is a superkey with the additional property that removal of any attribute from K
will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys (2)
• If a relation schema has more than one key, each is called a candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key, and the others
are called secondary keys.
• A prime attribute is a member of some candidate key.
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate
key.
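These definitions can be checked mechanically once the FDs are known. The sketch below is a brute-force illustration suitable only for small schemas; the function names are hypothetical, and the schema is an assumed EMP_PROJ relation with the two FDs used in the 2NF examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A set of attributes is a superkey if its closure covers the whole schema."""
    return set(schema) <= closure(attrs, fds)


def candidate_keys(schema, fds):
    """Enumerate minimal superkeys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(attrs, schema, fds):
                keys.append(attrs)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


SCHEMA = {"SSN", "PNUMBER", "HOURS", "ENAME"}
FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]

keys = candidate_keys(SCHEMA, FDS)
prime = prime_attributes(SCHEMA, FDS)
nonprime = SCHEMA - prime
print(keys, sorted(prime), sorted(nonprime))
```

Under these assumed FDs the only candidate key is {SSN, PNUMBER}, so HOURS and ENAME are the nonprime attributes.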
3.4 First Normal Form
• Disallows
– composite attributes
– multivalued attributes
– nested relations; attributes whose values for an individual tuple are non-atomic
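One common way to remove a multivalued attribute is to emit one tuple per value. The sketch below is only an illustration: the DEPARTMENT-style rows, the attribute name dlocations, and the function to_1nf are hypothetical.

```python
def to_1nf(rows, multivalued_attr):
    """Emit one tuple per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued_attr]:
            new_row = dict(row)                # copy the atomic attributes
            new_row[multivalued_attr] = value  # keep a single atomic value
            flat.append(new_row)
    return flat


# Hypothetical non-1NF rows: dlocations holds a set of values per tuple.
departments = [
    {"dname": "Research", "dnumber": 5, "dlocations": ["Bellaire", "Houston"]},
    {"dname": "Headquarters", "dnumber": 1, "dlocations": ["Houston"]},
]

flat_departments = to_1nf(departments, "dlocations")
for row in flat_departments:
    print(row)
```

Flattening in place repeats dname and dnumber for every location, so it reintroduces redundancy. Moving the multivalued attribute into its own relation together with the primary key avoids that.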
Figure: Normalization into 1NF
Figure: Normalization nested relations into 1NF
3.5 Second Normal Form (1)
• Definitions
– Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the
FD does not hold any more
• Examples:
– {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS
hold
– {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since
SSN -> ENAME also holds
3.5 Second Normal Form (2)
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is
fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF normalization
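A 2NF check amounts to looking for nonprime attributes determined by a proper subset of the primary key. The sketch below assumes the primary key {SSN, PNUMBER} and the two FDs from the examples above; the function name partial_dependencies is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (subset, attribute) pairs where a proper, non-empty subset of the
    key already determines a nonprime attribute."""
    found = []
    key_attrs = sorted(key)
    for size in range(1, len(key_attrs)):
        for combo in combinations(key_attrs, size):
            determined = closure(combo, fds)
            for attr in sorted(nonprime):
                if attr in determined:
                    found.append((set(combo), attr))
    return found


FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]

violations = partial_dependencies({"SSN", "PNUMBER"}, {"HOURS", "ENAME"}, FDS)
print(violations)  # [({'SSN'}, 'ENAME')]
```

An empty result means every nonprime attribute is fully dependent on the key. Here ENAME depends on SSN alone, so the relation is not in 2NF.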
Figure: Normalizing into 2NF and 3NF
Figure: Normalization into 2NF and 3NF
3.6 Third Normal Form (1)
• Definition:
– Transitive functional dependency: a FD X -> Z that can be derived from two FDs X ->
Y and Y -> Z
• Examples:
Third Normal Form (2)
• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A
in R is transitively dependent on the primary key
• R can be decomposed into 3NF relations via the process of 3NF normalization
• NOTE:
– In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a
candidate key.
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Normal Forms Defined Informally
• 1st normal form
4 General Normal Form Definitions (For Multiple Keys) (1)
• The above definitions consider the primary key only
• The following more general definitions take into account relations with multiple
candidate keys
General Normal Form Definitions (2)
• Definition:
• (a) X is a superkey of R, or
5 BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A
holds in R, then X is a superkey of R
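The BCNF condition can be tested by computing the closure of each left-hand side. The sketch below assumes the EMP_PROJ schema and FDs from the 2NF examples; the function name bcnf_violations is hypothetical, and trivial FDs (where the right side is contained in the left side) are skipped, as is usual.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs in fds whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, never a violation
        if not set(schema) <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


SCHEMA = {"SSN", "PNUMBER", "HOURS", "ENAME"}
FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]

violations = bcnf_violations(SCHEMA, FDS)
print(violations)  # [({'SSN'}, {'ENAME'})]
```

The check inspects only the FDs listed in F. That is enough for the original relation, but a relation produced by decomposition needs its FDs projected from F+ first.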
Figure: Boyce-Codd normal form
Figure: a relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition (1)
• Two FDs exist in the relation TEACH:
• {student, course} is a candidate key for this relation, and the dependencies shown
follow the pattern in Figure 10.12 (b).
• A relation NOT in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving the BCNF by Decomposition (2)
• Three possible decompositions for relation TEACH
– {student, instructor} and {student, course}
– {course, instructor } and {course, student}
– {instructor, course } and {instructor, student}
• All three decompositions will lose fd1.
– We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-
additivity property after decomposition.
• Out of the above three, only the 3rd decomposition will not generate spurious tuples after
join (and hence has the non-additivity property).
• A test to determine whether a binary decomposition (decomposition into two relations) is non-
additive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third
decomposition above meets the property.
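The verification can be sketched with the usual binary test: a decomposition of R into R1 and R2 is non-additive if the common attributes of R1 and R2 functionally determine either R1 - R2 or R2 - R1. Whether Property LJ1 is worded exactly this way is an assumption here. The two FDs are also assumed, since they are not listed in this text: fd1 is {student, course} -> instructor and fd2 is instructor -> course.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary non-additive join test on two attribute sets r1 and r2."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


# Assumed FDs of TEACH (not listed in this text).
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

results = [is_lossless_binary(r1, r2, TEACH_FDS) for r1, r2 in DECOMPOSITIONS]
print(results)  # [False, False, True]
```

Only the third decomposition passes: its common attribute instructor determines course, which is exactly R1 - R2.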