Chapter 14
Basics of Functional Dependencies and
Normalization for Relational Databases
Dr. Najma Ismat
[email protected]
Figure 14.1 A simplified COMPANY relational database schema.
Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
anomalies present, then note them so that applications can be made to take them into account.
◼ Bad designs for a relational database may result in erroneous results for
certain JOIN operations
◼ The "lossless join" property is used to guarantee meaningful results for
join operations
◼ GUIDELINE 4:
◼ The relations should be designed to satisfy the lossless join condition.
designs
◼ And keys are used to define normal forms of relations
t1[K]=t2[K])
◼ SSN → ENAME
SSN.
2. Suppose that a company assigned each employee a unique
EmpNo. Each employee has a number and a name. Names
might be the same for two different employees, but their
employee numbers would always be different and unique
because the company defined them that way.
◼ It would be inconsistent in the database if there were two
occurrences of the same employee number with different
names.
E.g.:
EmpNo   Job          Name
101     President    Herbert
104     Programmer   Fred
103     Designer     Beryl
103     Programmer   Beryl
◼ Is there a problem here? No,
◼ because we have the FD
◼ EmpNo → Name.
◼ This means that every time we find 104, we find the name, Fred.
◼ Just because something is on the left-hand side of an FD, it does
not imply that you have a key or that it will be unique in the
database; i.e., the FD X → Y only means that for every occurrence of
X you will get the same value of Y.
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 14- 24
Defining FDs from instances
◼ Note that to define the FDs, we need to understand the meaning of
the attributes involved and the relationship between them.
◼ An FD is a property of the attributes in the schema R
◼ Given the instance (population) of a relation, all we can conclude is
that an FD may exist between certain attributes.
◼ What we can conclude with certainty is that certain FDs do not exist,
because there are tuples that show a violation of those dependencies.
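The point that an instance can refute an FD but never prove it can be sketched in a few lines of Python. This is only an illustration: the helper name fd_violations is hypothetical, and the rows are the EmpNo / Job / Name example from the earlier slide.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on lhs but differ on rhs."""
    seen = {}          # lhs value -> (rhs value, first row that carried it)
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key][0] != val:
            # Same determinant, different dependent: the FD cannot hold.
            violations.append((seen[key][1], row))
        else:
            seen.setdefault(key, (val, row))
    return violations


employees = [
    {"EmpNo": 101, "Job": "President", "Name": "Herbert"},
    {"EmpNo": 104, "Job": "Programmer", "Name": "Fred"},
    {"EmpNo": 103, "Job": "Designer", "Name": "Beryl"},
    {"EmpNo": 103, "Job": "Programmer", "Name": "Beryl"},
]

# No violating pair: EmpNo -> Name MAY hold (the instance cannot prove it).
print(fd_violations(employees, ["EmpNo"], ["Name"]))
# One violating pair (the two rows for 103): EmpNo -> Job definitely does not hold.
print(len(fd_violations(employees, ["EmpNo"], ["Job"])))
```

An empty result only says that this particular population is consistent with the FD; whether the FD really holds depends on the meaning of the attributes.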
4 JKL
2   BCD   SE   C
3   XYZ   CS   H
4   XYZ   SE   C
5   EFG   IT   H
6   HIJ   BI   D
7   KLM   MT   B
• Valid FD: in this case, the values of determinant and dependent are different.
Example
Dept_name → Dept_building
Student ID   Name   Dept_name   Dept_building
1            ABC    CE          B
2            BCD    SE          C
3            XYZ    CS          H
4            XYZ    SE          C
5            EFG    IT          H
6            HIJ    BI          D
7            KLM    MT          B
• This is a valid FD.
• Although the data is redundant, the output of the dependency is correct.
• Repeated determinants that map to the same dependent (SE → C in rows 2 and 4) are consistent with a valid FD.
• Different determinants may also share an identical dependent (CS → H and IT → H); this is still a valid FD.
Student ID → Name
Student ID   Name   Dept_name   Dept_building
1            ABC    CE          B
2            BCD    SE          C
3            XYZ    CS          H
4            XYZ    SE          C
5            EFG    IT          H
• This is a valid FD.
• Redundant dependent and unique determinant: the name XYZ repeats, but every Student ID is distinct.
A→B A→B
◼ The rule, which seems obvious, says: if I give you the combination
<Kaitlyn, New Orleans>, what is this person's Name? What is this
person's City? Both answers are already part of the combination.
◼ While this rule seems obvious enough, it is necessary to derive other
functional dependencies.
◼ SSN → Name
◼ SSN → School
◼ School → Location
◼ The following dependencies hold:
◼ SSN → Name (given)
◼ SSN → School (given)
◼ SSN → Location (derived by the transitive rule)
◼ SSN → SSN (reflexive rule (obvious))
◼ SSN → SSN, Name, School, Location (union rule)
◼ Since SSN determines every attribute, SSN can be a candidate key and
the primary key as well.
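The derivation above amounts to computing the closure of {SSN} under the given FDs. A minimal sketch in Python follows; the function name closure and the representation of an FD as a (left side, right side) pair of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)           # reflexive rule: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side
            # (this repeated step covers the transitive and union rules).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"SSN"}, {"Name"}),
    ({"SSN"}, {"School"}),
    ({"School"}, {"Location"}),
]

print(sorted(closure({"SSN"}, fds)))     # ['Location', 'Name', 'SSN', 'School']
print(sorted(closure({"School"}, fds)))  # ['Location', 'School']
```

The closure of {SSN} is the full attribute set, which is why SSN qualifies as a key here, while {School} does not.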
Keys and FDs
◼ A key should be a minimal set of attributes whose closure is all the
attributes in the relation; "minimal" in the sense that no attribute can
be dropped from the LHS of the FD that you choose as a key.
◼ As in the previous example, SSN is minimal (one attribute), and its
closure includes all the other attributes.
◼ Once the set of candidate keys has been found (there is only one in the
example we have considered), choose one of the candidate keys as the primary
key and move on to normal forms.
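One way to picture "minimal set whose closure is all the attributes" is a brute-force search over attribute subsets, smallest first. The sketch below is an illustration only (the names closure and candidate_keys are hypothetical), and it is exponential in the number of attributes, so it suits small classroom schemas rather than real ones.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [
    ({"SSN"}, {"Name"}),
    ({"SSN"}, {"School"}),
    ({"School"}, {"Location"}),
]

print(candidate_keys({"SSN", "Name", "School", "Location"}, fds))  # [{'SSN'}]
```

For the SSN example the search returns a single candidate key, which then becomes the primary key.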
◼ Normalization:
◼ The process of decomposing unsatisfactory "bad" relations by
In Figure 14.1 WORKS_ON schema, both Ssn and Pnumber are prime
attributes of WORKS_ON, whereas other attributes of WORKS_ON are
nonprime.
Recall
Keys
◼ Primary Key: The key chosen to identify one and only one instance of an
entity uniquely. An entity can have multiple keys, and the most suitable
key from that list becomes the primary key.
• Candidate Key: A minimal attribute or set of attributes that can uniquely
identify a tuple. The primary key is chosen from the candidate keys; the
candidate keys that are not chosen are as strong as the primary key.
• Every candidate key is a super key, but a super key need not be a candidate
key because it may contain extra attributes.
◼ Super Key: An attribute set that can uniquely identify a tuple. A
super key is a superset of a candidate key.
◼ Foreign Key: Used to establish relationships between two tables. A
foreign key requires each value in a column or set of columns to match the
primary key of the referenced table. Foreign keys help to maintain data and
referential integrity.
Employee table: Employee_ID, Employee_name, Employee_address, Department_name, License_number, Department_ID, Passport_number, SSN
Figure labels: Super key (marked beside the attribute list); Foreign key: Department_ID
Recall
Keys
◼ Alternate Key: A relation may have several candidate keys. One of them is
chosen as the primary key, and the remaining candidate keys are known as
alternate key(s).
◼ Composite Key: When a primary key is a combination of multiple attributes, it is
known as a composite key.
Employee table
Candidate keys: Employee_ID (primary key), SSN (alternate key)
Composite key: Employee_ID, Project_ID, Project_loc
◼ multivalued attributes are non-atomic
◼ Considered to be part of the definition of a relation
◼ Most RDBMSs allow only those relations to be defined that are in
First Normal Form
Figure 14.10
Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation
attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within
each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by
propagating the primary key.
3.5 Second Normal Form (1)
◼ Uses the concepts of FDs, primary key
◼ Definitions
◼ Prime attribute: An attribute that is a member of the primary key K
Figure 14.11 Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.
FD3: ??
FD4: ??
Figure 14.12 Normalization into 2NF and 3NF. (a) The LOTS relation with its functional dependencies FD1 through FD4.
• In Figure 14.12(a), the LOTS relation schema violates 2NF because Tax_rate is
partially dependent on the candidate key {County_name, Lot#}, due to FD3.
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2,
shown in Figure 14.12(b).
Figure 14.12 (b) Decomposing into the 2NF relations LOTS1 and LOTS2.
• Remove the attribute Price, which violates 3NF, from LOTS1 and place it with Area (the
left-hand side of FD4, which causes the transitive dependency) into another relation LOTS1B.
The remaining attributes of LOTS1 form LOTS1A.
• Both LOTS1A and LOTS1B are in 3NF.
◼ Definition:
◼ Superkey of relation schema R: a set of attributes
S of R that contains a key of R
◼ A relation schema R is in third normal form (3NF)
if whenever a FD X → A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
◼ The LOTS1 relation violates 3NF because
Area → Price holds, Area is not a superkey of
LOTS1, and Price is not a prime attribute (see Figure 14.12).
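The two-part 3NF test above (X is a superkey, or A is prime) can be written as a small check. This is a sketch under stated assumptions: the helper names are hypothetical, the attribute list for LOTS1 is simplified, and the key FD is assumed for illustration; only the candidate key {County_name, Lot#} and Area → Price are taken from the text. Lot# is written Lot_no in the code.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal sets whose closure is everything."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """List (X, A) pairs where X -> A breaks both 3NF conditions."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue                      # condition (a): X is a superkey
        for a in sorted(set(rhs) - set(lhs)):
            if a not in prime:            # condition (b) fails: A is nonprime
                violations.append((sorted(lhs), a))
    return violations


# Assumed, simplified LOTS1 schema (not the full schema of Figure 14.12).
lots1 = {"County_name", "Lot_no", "Area", "Price"}
lots1_fds = [
    ({"County_name", "Lot_no"}, {"Area", "Price"}),  # assumed key FD
    ({"Area"}, {"Price"}),                           # Area -> Price, from the text
]

print(third_nf_violations(lots1, lots1_fds))  # [(['Area'], 'Price')]
```

The check only inspects the FDs it is given, so it reports Area → Price as the single violation, matching the reason stated for LOTS1.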
• Most relation schemas that are in 3NF are also in BCNF. Only if there is some FD X → A that
holds in a relation schema R, with X not being a superkey and A being a prime attribute, will R
be in 3NF but not in BCNF.
Figure 14.14 A relation TEACH that is in 3NF but not in BCNF
◼ Two FDs exist in relation TEACH:
◼ fd1: {student, course} → instructor
◼ fd2: instructor → course
◼ {student, course} is a candidate key for this
relation, and the dependencies shown follow
the pattern in Figure 14.13 (b).
◼ So, this relation is in 3NF but not
in BCNF
◼ A relation NOT in BCNF should be
decomposed to meet this property, while
possibly forgoing the preservation of all
functional dependencies in the decomposed
relations.
◼ (See Algorithm 15.3)
Figure 14.13 (b) A schematic relation with FDs; it is in 3NF, but not in BCNF due to the FD C → B.
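The TEACH example can be checked mechanically. The sketch below is not Algorithm 15.3; it is a minimal illustration with hypothetical helper names that flags FDs whose left side is not a superkey and then performs one decomposition step on the offending FD.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Nontrivial FDs X -> Y in fds where X is not a superkey."""
    attributes = set(attributes)
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != attributes
    ]


def split_on(attributes, lhs, rhs):
    """One decomposition step on X -> Y: R1 = X u Y, R2 = R - (Y - X)."""
    r1 = set(lhs) | set(rhs)
    r2 = set(attributes) - (set(rhs) - set(lhs))
    return r1, r2


teach = {"student", "course", "instructor"}
teach_fds = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

bad = bcnf_violations(teach, teach_fds)
print(bad)                      # only fd2: instructor is not a superkey
r1, r2 = split_on(teach, *bad[0])
print(sorted(r1), sorted(r2))   # ['course', 'instructor'] ['instructor', 'student']
```

The two resulting relations share instructor, which determines course, so the split joins back without spurious tuples; fd1 can no longer be checked inside a single relation, which is the loss of dependency preservation mentioned above.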