Chapter 7: Functional Dependencies and Normalization for Relational DBs
1 Introduction
2 Functional dependencies (FDs)
3 Normalization
4 Relational database schema design algorithms
5 Key finding algorithms
Jan - 2015 2
Top-Down Database Design
[Figure: Mini-world → Requirements → Conceptual schema (E1, R, E2) → Relation schemas]
Introduction
◼ Each relation schema consists of a number of
attributes and the relational database schema
consists of a number of relation schemas.
◼ Attributes are grouped to form a relation
schema.
◼ We need a formal measure of why one
grouping of attributes into a relation schema
may be better than another.
◼ “Goodness” measures:
❑ Redundant information in tuples.
❑ Update anomalies: modification, deletion,
insertion.
❑ Reducing the NULL values in tuples.
❑ Disallowing the possibility of generating spurious
tuples.
Redundant information
◼ When employee and department data are stored
in a single relation, the attribute values
pertaining to a particular department (DNUMBER,
DNAME, DMGRSSN) are repeated for every employee
who works for that department.
Update anomalies
◼ Update anomalies come in three kinds:
modification, deletion, and insertion.
❑ Modification: when the manager of a dept. changes, the
manager value must be updated in the tuple of every
employee working for that dept. Missing any one of them
leaves the DB inconsistent.
❑ Deletion: if Borg James E., the last employee of dept. 1,
leaves, deleting his tuple also loses the existence of
dept. 1, its name, and who its manager is.
❑ Insertion: there is no way to record a department before
any employees are assigned to it without leaving the
employee attributes empty.
Reducing NULL values
◼ Employees not assigned to any dept. have NULLs
in the department attributes, which wastes
storage space.
◼ NULLs also cause difficulties with aggregation
operations (e.g., COUNT, SUM) and joins.
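Generation of spurious tuples
◼ A decomposition must not allow spurious tuples
to be generated when its relations are joined.
◼ Example: decompose
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME,
PNAME, PLOCATION)
into
EMP_LOCS(ENAME, PLOCATION)
EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME,
PLOCATION)
Joining EMP_LOCS and EMP_PROJ1 on PLOCATION can
produce tuples that were never in EMP_PROJ.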
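The effect can be reproduced with a few lines of Python. This is a minimal sketch: the two EMP_PROJ tuples are made-up sample data (only the schemas come from the slide), and the join is written out by hand as a set comprehension.

```python
# Hypothetical sample instance of
# EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION).
EMP_PROJ = [
    ("111", 1, 10.0, "Smith", "ProductX", "Bellaire"),
    ("222", 2, 20.0, "Wong", "ProductY", "Bellaire"),
]

# Project the instance onto the two decomposed schemas.
EMP_LOCS = {(ename, ploc) for (_, _, _, ename, _, ploc) in EMP_PROJ}
EMP_PROJ1 = {(ssn, pnum, hours, pname, ploc)
             for (ssn, pnum, hours, _, pname, ploc) in EMP_PROJ}

# Natural join on the only shared attribute, PLOCATION.
joined = {(ssn, pnum, hours, ename, pname, ploc)
          for (ssn, pnum, hours, pname, ploc) in EMP_PROJ1
          for (ename, ploc2) in EMP_LOCS
          if ploc == ploc2}

# Tuples in the join that were not in the original relation.
spurious = joined - set(EMP_PROJ)
for t in sorted(spurious):
    print(t)
```

Both employees work at the same location, so the join pairs each employee name with both projects. The original tuples are still present in the join, but they can no longer be told apart from the spurious ones.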
Summary of Design Guidelines
◼ “Goodness” measures:
❑ Redundant information in tuples
❑ Update anomalies: modification, deletion, insertion
❑ Reducing the NULL values in tuples
❑ Disallowing the possibility of generating spurious tuples
Normalization
◼ It helps DB designers determine the best relation
schemas.
❑ A formal framework for analyzing relation schemas based on their
keys and on the functional dependencies among their attributes.
❑ A series of normal form tests that can be carried out on individual
relation schemas so that the relational database can be
normalized to any desired degree.
◼ It is based on the concept of normal forms: 1NF, 2NF,
3NF, BCNF, 4NF, 5NF.
Functional Dependencies (FDs)
◼ Definition of FD
◼ Direct, indirect, partial dependencies
◼ Inference Rules for FDs
◼ Equivalence of Sets of FDs
◼ Minimal Sets of FDs
Definition of Functional dependencies
◼ Functional dependencies (FDs) are used to
specify formal measures of the "goodness"
of relational designs.
◼ FDs and keys are used to define normal
forms for relations.
◼ FDs are constraints that are derived from
the meaning and interrelationships of the
data attributes.
◼ A set of attributes X functionally determines
a set of attributes Y if the value of X
determines a unique value for Y.
◼ X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
◼ For any two tuples t1 and t2 in any relation instance r(R):
If t1[X]=t2[X], then t1[Y]=t2[Y]
◼ X -> Y in R specifies a constraint on all relation
instances r(R)
◼ Examples:
❑ social security number determines employee name:
SSN -> ENAME
❑ project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
❑ employee SSN and project number together determine the
hours per week that the employee works on the project:
{SSN, PNUMBER} -> HOURS
◼ If K is a key of R, then K functionally
determines all attributes in R (since we never
have two distinct tuples with t1[K]=t2[K]).
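The tuple-based definition can be checked mechanically on one relation instance. The sketch below is illustrative only: `fd_holds` is a hypothetical helper name and the rows are made-up data using the attribute names from the examples. Note that a single instance can refute an FD but never prove it, because an FD is a constraint on all instances r(R).

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}  # maps each X-value to the first Y-value seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance; attribute names follow the examples above.
rows = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 10, "ENAME": "Wong"},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))             # True
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))             # False
```

The last call fails because employee 111 works different hours on two projects, so SSN alone does not determine HOURS in this instance.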
Direct, indirect, partial dependencies
◼ Direct dependency (full functional
dependency): all attributes in R must be fully
functionally dependent on the primary key (that is,
the PK is a determinant of all attributes in R).
[Figure: dependency diagram in which Performer-id determines Performer-name, Performer-type, and Performer-location]
◼ Indirect dependency (transitive dependency):
Value of an attribute is not determined directly
by the primary key.
[Figure: dependency diagram over Performer-id, Performer-name, Performer-type, Performer-location, and Fee, with Fee attached to Performer-type]
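◼ Partial dependency
❑ Composite determinant: when more than one attribute is
required to determine the value of another attribute, the
combination is called a composite determinant.
❑ An attribute is partially dependent on a composite
determinant if part of the determinant is enough to determine it.
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
{SSN, PNUMBER} -> HOURS (full dependency)
SSN -> ENAME (ENAME depends on only part of {SSN, PNUMBER})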
[Figure: dependency diagram over Performer-id, Performer-name, Performer-type, Performer-location, Fee, Agent-id, Agent-name, and Agent-location]
Inference Rules for FDs
◼ Given a set of FDs F, we can infer additional
FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X -> Y.
IR2. (Augmentation) If X -> Y, then XZ -> YZ.
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z.
◼ Some additional inference rules that are
useful:
(Decomposition) If X -> YZ, then X -> Y and X -> Z
(Union) If X -> Y and X -> Z, then X -> YZ
(Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
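As an illustration of how the additional rules follow from Armstrong's rules, the union rule can be derived from IR2 and IR3 alone. The derivation below is the standard textbook argument, written out here as a worked example:

```latex
\begin{align*}
&1.\ X \to Y   && \text{(given)} \\
&2.\ X \to Z   && \text{(given)} \\
&3.\ X \to XY  && \text{(IR2 on 1, augmenting with } X;\ XX = X) \\
&4.\ XY \to YZ && \text{(IR2 on 2, augmenting with } Y) \\
&5.\ X \to YZ  && \text{(IR3 on 3 and 4)}
\end{align*}
```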
◼ Closure of a set F of FDs is the set F+ of
all FDs that can be inferred from F.
◼ Closure of a set of attributes X with
respect to F is the set X+ of all attributes
that are functionally determined by X.
◼ X+ can be calculated by repeatedly
applying IR1, IR2, IR3 using the FDs in F.
Algorithm 16.1. Determining X+, the Closure of X under F
Input: A set F of FDs on a relation schema R, and a set of
attributes X, which is a subset of R.
X+ := X;
repeat
oldX+ := X+;
for each functional dependency Y → Z in F do
if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+);
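Algorithm 16.1 translates almost line for line into Python. The sketch below assumes FDs are represented as a list of (left-hand side, right-hand side) pairs of Python sets, which is a choice made for this example; the FDs used are the ones from the definition slide.

```python
def closure(attrs, fds):
    """Closure X+ of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)              # X+ := X
    changed = True
    while changed:                   # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:         # for each Y -> Z in F
            if lhs <= result and not rhs <= result:
                result |= rhs        # X+ := X+ ∪ Z
                changed = True
    return result


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, F)))
# ['ENAME', 'SSN']
print(sorted(closure({"SSN", "PNUMBER"}, F)))
# ['ENAME', 'HOURS', 'PLOCATION', 'PNAME', 'PNUMBER', 'SSN']
```

The second closure contains every attribute of EMP_PROJ, which is another way of saying that {SSN, PNUMBER} is a superkey of that relation.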
◼ Consider a relation R(A, B, C, D, E) with the
following dependencies F:
◼ (1) AB → C,
◼ (2) CD → E,
◼ (3) DE → B
◼ Find {A,B}+ ?
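Applying Algorithm 16.1 to this example gives the following trace, worked out here as an illustration with $X = \{A, B\}$:

```latex
\begin{align*}
X^{+} &= \{A, B\}    && \text{(initialization)} \\
X^{+} &= \{A, B, C\} && \text{(1) } AB \to C \text{ applies, since } \{A, B\} \subseteq X^{+} \\
X^{+} &= \{A, B, C\} && \text{(2) } CD \to E \text{ and (3) } DE \to B \text{ do not apply, since } D \notin X^{+}
\end{align*}
```

A second pass adds nothing, so {A,B}+ = {A, B, C}. Because D and E are missing, {A, B} is not a superkey of R.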
Equivalence of Sets of FDs
◼ Two sets of FDs F and G are equivalent if
F+ = G+.
◼ Definition: F covers G if G+ ⊆ F+, that is, if every
FD in G can be inferred from F. F and G are
equivalent if F covers G and G covers F.
◼ To check whether F covers G, compute X+ with respect
to F for each FD X -> Y in G and test whether Y ⊆ X+.
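A closure-based equivalence check can be sketched as follows. The function names `covers` and `equivalent` and the three small FD sets are assumptions made for this example, not part of the slides; FDs are again (lhs, rhs) pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """F covers G if every X -> Y in G satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


# Hypothetical FD sets over attributes A, B, C.
F1 = [({"A"}, {"B"}), ({"B"}, {"C"})]
G1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
H1 = [({"A"}, {"B"})]

print(equivalent(F1, G1))  # True: A -> C already follows from F1
print(equivalent(F1, H1))  # False: H1 cannot derive B -> C
```

This avoids computing the full closures F+ and G+, which can be exponentially large; only one attribute closure per FD is needed.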
Minimal Sets of FDs
◼ A set of FDs F is minimal if it satisfies the
following conditions:
(1) Every dependency in F has a single attribute for its
right-hand side.
(2) We cannot remove any dependency from F and have
a set of dependencies that is equivalent to F.
(3) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y is a proper subset of X
(Y ⊂ X), and still have a set of dependencies that
is equivalent to F.
Algorithm 16.2. Finding a Minimal Cover F for a Set of
Functional Dependencies E
Input: A set of functional dependencies E.
1. Set F := E.
2. Replace each functional dependency X→{A1, A2, ..., An} in F
by the n functional dependencies X→A1, X→A2, ..., X→An.
3. For each functional dependency X→A in F
for each attribute B that is an element of X
if { {F – {X→A} } ∪ { (X – {B} ) →A} } is equivalent to F
then replace X→A with (X – {B} ) →A in F.
4. For each remaining functional dependency X→A in F
if {F – {X→A} } is equivalent to F,
then remove X→A from F.
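The four steps can be sketched in Python as below. This is one possible reading of the algorithm, not a reference implementation: the equivalence tests in steps 3 and 4 are replaced by the closure checks they reduce to, and the input set E is a small made-up example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Steps 1-2: copy E and split every right-hand side into single attributes.
    cover = list(dict.fromkeys(
        (frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)))
    # Step 3: drop B from X when (X - {B}) -> A already follows from the cover.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # reductions may create duplicates
    # Step 4: remove X -> A when it follows from the remaining FDs.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


# Hypothetical input: B -> A, D -> A, AB -> D.
E = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
for lhs, rhs in minimal_cover(E):
    print(sorted(lhs), "->", sorted(rhs))
```

For this input, step 3 shortens AB -> D to B -> D, and step 4 then removes B -> A because it follows from B -> D and D -> A, leaving D -> A and B -> D. A different processing order can lead to a different, equally valid minimal cover.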
◼ Every set of FDs has an equivalent minimal set.
◼ There can be several equivalent minimal sets.
◼ Algorithm 16.2 finds one minimal set of FDs that is
equivalent to a set F of FDs; which one it finds can
depend on the order in which the FDs are processed.
◼ To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set.
Normalization
◼ Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations.
◼ There are two important properties of
decompositions:
(a) the nonadditive (lossless) join property: joining
the decomposed relations generates no spurious tuples.
(b) preservation of the functional dependencies.
◼ Superkey of R: A set of attributes SK of R
such that no two tuples in any valid relation
instance r(R) will have the same value for
SK. That is, for any distinct tuples t1 and t2
in r(R), t1[SK] ≠ t2[SK].
◼ Key of R: A "minimal" superkey; that is, a
superkey K such that removal of any attribute
from K results in a set of attributes that is not
a superkey.
◼ If K is a key of R, then K functionally
determines all attributes in R.
◼ Two new concepts:
❑ A prime attribute is a member of some
candidate key.
❑ A nonprime attribute is not a prime attribute: it is
not a member of any candidate key.
◼ 1NF and dependency problems
◼ 2NF – solves partial dependency
◼ 3NF – solves indirect dependency
◼ BCNF – well-normalized relations
1NF
◼ First normal form (1NF): there is only one
value at the intersection of each row and
column of a relation, so a 1NF relation has no
set-valued attributes. 1NF disallows composite
attributes, multivalued attributes, and nested relations.
2NF
◼ Second normal form (2NF) – all non-prime
attributes must be fully functionally dependent
on the primary key.
◼ 2NF solves partial dependency problem in 1NF.
◼ Normalizing into 2NF: decompose and set up a new
relation for each partial key with its dependent
attribute(s). Make sure to keep a relation with the
original primary key and any attributes that are
fully functionally dependent on it.
[Figure: dependency diagram over Performer-id, Performer-name, Performer-type, Performer-location, and Fee]
3NF
◼ A relation schema R is in third normal form
(3NF) if it is in 2NF and no non-prime attribute A
in R is transitively dependent on the primary key.
◼ NOTE:
❑ In X -> Y and Y -> Z, with X as the primary key, we
consider this a problem only if Y is not a candidate
key. When Y is a candidate key, there is no problem
with the transitive dependency.
❑ E.g., consider EMP (SSN, Emp#, Salary). Here
SSN → Emp# and Emp# → Salary, but Emp# is a
candidate key, so there is no problem.
◼ 3NF solves the indirect (transitive) dependency
problem that can remain in 1NF and 2NF relations.
◼ LOCATION (city, street, zip-code)
◼ F = { {city, street} -> zip-code,
zip-code -> city }
Key1: {city, street} (primary key)
Key2: {street, zip-code}
city   street   zip-code
NY     55th     484
NY     56th     484
LA     55th     473
LA     56th     473
◼ LOCATION is in 3NF because city is a prime attribute,
but zip-code -> city has a determinant that is not a superkey.
General Normal Form Definitions
◼ The above definitions consider the primary
key only.
◼ The following more general definitions take
into account relations with multiple candidate
keys.
◼ A relation schema R is in second normal form (2NF) if
every non-prime attribute A in R is not partially
functionally dependent on any key of R.
General Normal Form Example
[Figure: decomposing into the 2NF relations]
[Figure: decomposing LOTS1 into the 3NF relations]
BCNF
◼ A relation schema R is in Boyce-Codd
Normal Form (BCNF) if whenever a nontrivial FD
X -> A holds in R, then X is a superkey of
R.
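The definition can be turned into a small check: an FD violates BCNF when it is nontrivial and the closure of its left-hand side is not the whole schema. The sketch below uses a hypothetical helper `bcnf_violations` and applies it to the LOCATION relation from the 3NF slides. It inspects only the FDs listed in F, which is enough to decide whether the original schema is in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Nontrivial FDs in `fds` whose left-hand side is not a superkey."""
    all_attrs = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != all_attrs]


LOCATION = {"city", "street", "zip-code"}
F = [({"city", "street"}, {"zip-code"}), ({"zip-code"}, {"city"})]

print(bcnf_violations(LOCATION, F))  # [({'zip-code'}, {'city'})]
```

Only zip-code -> city is reported: the closure of {zip-code} is {zip-code, city}, which lacks street, so zip-code is not a superkey.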
◼ TEACH (Student, Course, Instructor)
◼ FD1: {Student, Course} → Instructor
◼ FD2: Instructor → Course
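◼ Three possible decompositions of TEACH into
pairs of relations:
1. {Student, Instructor} and {Student, Course}
2. {Course, Instructor} and {Course, Student}
3. {Instructor, Course} and {Instructor, Student}
◼ All three lose FD1; only the third has the
nonadditive join property.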
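Which of the three pairs joins back without spurious tuples can be tested with the standard binary-decomposition test: R1 and R2 join losslessly if the common attributes functionally determine R1 − R2 or R2 − R1. The sketch below is an illustration; `lossless_binary` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) follows from fds."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


F = [({"Student", "Course"}, {"Instructor"}),   # FD1
     ({"Instructor"}, {"Course"})]              # FD2

pairs = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]
results = [lossless_binary(r1, r2, F) for r1, r2 in pairs]
print(results)  # [False, False, True]
```

In the third pair the common attribute Instructor determines Course through FD2, so the test passes; in the first two pairs the common attribute determines nothing beyond itself.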
Notes & Suggestions
◼ [1], chapter 15:
❑ 4NF: based on multivalued dependency (MVD)
❑ 5NF: based on join dependency
◼ Join dependencies are very difficult to detect in
practice, so normalization into 5NF is rarely
carried out.
❑ Other normal forms & algorithms
❑ ER modeling: top-down database design
◼ Bottom-up database design ??
Dependency-Preserving Decomposition
into 3NF Schemas
Algorithm 16.4. Relational Synthesis into 3NF with Dependency
Preservation
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Find a minimal cover G for F (use Algorithm 16.2);
2. For each left-hand-side X of a functional dependency that
appears in G, create a relation schema in D with attributes {X ∪
{A1} ∪ {A2} ... ∪ {Ak} }, where X→A1, X→A2, ..., X→Ak are the
only dependencies in G with X as the left-hand-side (X is the key
of this relation);
3. Place any remaining attributes (that have not been placed in
any relation) in a single relation schema to ensure the attribute
preservation property.
Nonadditive Join Decomposition into
BCNF Schemas
Algorithm 16.5. Relational Decomposition into BCNF with
Nonadditive Join Property
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Set D := {R};
2. While there is a relation schema Q in D that is not in BCNF
do
{
choose a relation schema Q in D that is not in BCNF;
find a functional dependency X→Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);
};
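A compact Python sketch of this loop follows. It makes one simplifying assumption that is not in the algorithm text: violations inside a sub-schema Q are found by brute force, computing the closure of every proper subset X of Q under the original F and restricting it to Q. That is exponential in the size of Q and only suitable for small examples. TEACH serves as the test case.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(q, fds):
    """Return (X, Y) where X -> Y holds on q, Y is nonempty and X is not a superkey of q."""
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = set(combo)
            determined = closure(x, fds) & q
            y = determined - x
            if y and determined != q:
                return x, y
    return None


def bcnf_decompose(r, fds):
    todo, done = [set(r)], []
    while todo:                      # while some schema may not be in BCNF
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(q)           # q is in BCNF
        else:
            x, y = violation
            todo.append(q - y)       # replace Q by (Q - Y) ...
            todo.append(x | y)       # ... and (X ∪ Y)
    return done


F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
D = bcnf_decompose({"Student", "Course", "Instructor"}, F)
print(sorted(sorted(q) for q in D))
# [['Course', 'Instructor'], ['Instructor', 'Student']]
```

The loop terminates because both replacement schemas are strictly smaller than Q. For TEACH it splits on Instructor → Course, and FD1 is no longer enforceable within a single relation.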
Dependency-Preserving and Nonadditive
(Lossless) Join Decomposition into 3NF Schemas
Algorithm 16.6. Relational Synthesis into 3NF with Dependency
Preservation and Nonadditive Join Property
Input: A universal relation R and a set of functional dependencies F on
the attributes of R.
1. Find a minimal cover G for F (use Algorithm 16.2).
2. For each left-hand-side X of a functional dependency that appears in
G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ... ∪
{Ak} }, where X→A1, X→A2, ..., X→Ak are the only dependencies in G
with X as left-hand-side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create
one more relation schema in D that contains attributes that form a key
of R.
4. Eliminate redundant relations from the resulting set of relations in the
relational database schema. A relation R is considered redundant if R
is a projection of another relation S in the schema; alternatively, R is
subsumed by S.
◼ Algorithm 16.6:
❑ Preserves dependencies.
❑ Has the nonadditive join property.
❑ Is such that each resulting relation schema in the
decomposition is in 3NF.
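Steps 2 to 4 of Algorithm 16.6 can be sketched as follows. To keep the example short, the function assumes its input is already a minimal cover (step 1 is skipped), finds a key for step 3 by greedily discarding attributes from the full attribute set, and is run on a small made-up schema R(A, B, C, D) with G = {A → B, B → C}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize_3nf(attributes, min_cover):
    all_attrs = set(attributes)
    # Step 2: one schema per distinct left-hand side X, holding X and all its A_i.
    groups = {}
    for lhs, rhs in min_cover:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(groups.values())
    # Step 3: if no schema contains a key of R, add one that is a key.
    if not any(closure(s, min_cover) == all_attrs for s in schemas):
        key = set(all_attrs)
        for a in sorted(all_attrs):
            if closure(key - {a}, min_cover) == all_attrs:
                key.discard(a)
        schemas.append(key)
    # Step 4: drop schemas that are subsumed by (or duplicate) another schema.
    return [s for i, s in enumerate(schemas)
            if not any(s < t or (s == t and j < i) for j, t in enumerate(schemas))]


G = [({"A"}, {"B"}), ({"B"}, {"C"})]
D = synthesize_3nf({"A", "B", "C", "D"}, G)
print([sorted(s) for s in D])  # [['A', 'B'], ['B', 'C'], ['A', 'D']]
```

Neither {A, B} nor {B, C} determines D, so step 3 adds the key {A, D}. That extra schema is what gives the decomposition the nonadditive join property.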
Key-finding algorithm (1)
By Elmasri and Navathe
Key-finding algorithm (2)
By Hossein Saiedian and Thomas Spencer
Input: A relation R and a set of functional dependencies F
on the attributes of R.
Output: all candidate keys of R
Let:
◼ U contain all attributes of R.
Note:
◼ UL ∩ UR = ∅, UL ∩ UB = ∅ and UR ∩ UB = ∅
◼ UL ∪ UR ∪ UB = U
MULTIVALUED DEPENDENCY
Exercise 1
Consider the universal relation R = {A, B, C, D,
E, F} and the set of functional dependencies:
1) A → B
2) C, D → A
3) B, C → D
4) A, E → F
5) C, E → D
Exercise 2
Consider the universal relation R = {A, B, C, D,
E, F} and the set of functional dependencies:
1) A, D → B
2) A, B → E
3) C → D
4) B → C
5) A, C → F
Exercise 3
…
2) C → A, D
3) A, F → C, E
Exercise 4
Consider the universal relation R = {A, B, C, D,
E, F, G, H, I, J} and the set of functional
dependencies:
1) A, B → C
2) B, D → E, F
3) A, D → G, H
4) A → I
5) H → J
Finding Key
◼ R = (A, B, C, D, E, F)
◼ F = {A → D, C→ AF, AB → EC}
◼ What is the primary key of R?
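One way to answer this is to enumerate attribute subsets in order of increasing size and keep those whose closure is all of R. The sketch below is a brute-force illustration, exponential in the number of attributes, and is neither of the two key-finding algorithms named above; `candidate_keys` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):     # smallest subsets first
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue                      # contains a key, so not minimal
            if closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


R = {"A", "B", "C", "D", "E", "F"}
F = [({"A"}, {"D"}), ({"C"}, {"A", "F"}), ({"A", "B"}, {"E", "C"})]

print([sorted(k) for k in candidate_keys(R, F)])  # [['A', 'B'], ['B', 'C']]
```

B never appears on a right-hand side, so it must be part of every key. Both {A, B} and {B, C} are candidate keys, and either one can be designated the primary key.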