Chapter 14


CE 301T

CHAPTER 14
Basics of Functional Dependencies and
Normalization for Relational Databases
Dr. Najma Ismat
[email protected]

Computer Engineering Department


Chapter Outline
◼ 1 Informal Design Guidelines for Relational Databases
◼ 1.1 Semantics of the Relation Attributes

◼ 1.2 Redundant Information in Tuples and Update Anomalies

◼ 1.3 Null Values in Tuples

◼ 1.4 Spurious Tuples

◼ 2 Functional Dependencies (FDs)


◼ 2.1 Definition of Functional Dependency

◼ 3 Normal Forms Based on Primary Keys


◼ 3.1 Normalization of Relations

◼ 3.2 Practical Use of Normal Forms

◼ 3.3 Definitions of Keys and Attributes Participating in Keys

◼ 3.4 First Normal Form

◼ 3.5 Second Normal Form

◼ 3.6 Third Normal Form

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 14- 2


Chapter Outline

◼ 4 General Normal Form Definitions for 2NF and 3NF (For Multiple Candidate Keys)

◼ 5 BCNF (Boyce-Codd Normal Form)


◼ 6 Multivalued Dependency and Fourth Normal Form

◼ 7 Join Dependencies and Fifth Normal Form



1. Informal Design Guidelines for
Relational Databases (1)

◼ What is relational database design?


◼ The grouping of attributes to form "good" relation
schemas
◼ Two levels of relation schemas
◼ The logical "user view" level
◼ The storage "base relation" level
◼ Design is concerned mainly with base relations
◼ What are the criteria for "good" base relations?



Informal Design Guidelines for Relational
Databases (2)
◼ We first discuss informal guidelines for good relational
design
◼ Then we discuss formal concepts of functional
dependencies and normal forms
◼ - 1NF (First Normal Form)
◼ - 2NF (Second Normal Form)
◼ - 3NF (Third Normal Form)
◼ - BCNF (Boyce-Codd Normal Form)
◼ Additional types of dependencies, further normal forms, and relational design algorithms by synthesis are discussed in Chapter 15



1.1 Semantics of the Relational
Attributes must be clear
◼ GUIDELINE 1: Informally, each tuple in a relation should
represent one entity or relationship instance. (Applies to
individual relations and their attributes).
◼ Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in the
same relation
◼ Only foreign keys should be used to refer to other entities
◼ Entity and relationship attributes should be kept apart as
much as possible.
◼ Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of attributes
should be easy to interpret.



Figure 14.1 A simplified COMPANY relational database schema.



1.2 Redundant Information in Tuples and
Update Anomalies

◼ Information is stored redundantly


◼ Wastes storage
◼ Causes problems with update anomalies
◼ Insertion anomalies
◼ Deletion anomalies
◼ Modification anomalies



EXAMPLE OF AN UPDATE ANOMALY

◼ Consider the relation:


◼ EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Update Anomaly:
◼ Changing the name of project number P1 from “Billing” to “Customer-Accounting” requires this update to be made in the tuples of all 100 employees working on project P1; missing any of them leaves the data inconsistent.



EXAMPLE OF AN INSERT ANOMALY

◼ Consider the relation:


◼ EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Insert Anomaly:
◼ Cannot insert a project unless an employee is assigned to it.
◼ Conversely
◼ Cannot insert an employee unless he/she is assigned to a project.



EXAMPLE OF A DELETE ANOMALY

◼ Consider the relation:


◼ EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
◼ Delete Anomaly:
◼ When a project is deleted, it will result in deleting all the employees who work on that project.
◼ Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.



Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.



Figure 14.4 Sample states for EMP_DEPT and EMP_PROJ resulting from applying NATURAL JOIN to the relations in Figure 14.2. These may be stored as base relations for performance reasons.



Guideline for Redundant Information in
Tuples and Update Anomalies
◼ GUIDELINE 2:
◼ Design a schema that does not suffer from the insertion, deletion and update anomalies.
◼ If there are any anomalies present, then note them so that applications can be made to take them into account.



1.3 Null Values in Tuples
◼ GUIDELINE 3:
◼ Relations should be designed such that their tuples will have as few NULL values as possible
◼ Attributes that are frequently NULL could be placed in separate relations (with the primary key)


◼ Reasons for nulls:
◼ Attribute not applicable or invalid

◼ Attribute value unknown (may exist)

◼ Value known to exist, but unavailable



1.4 Generation of Spurious Tuples – avoid
at any cost

◼ Bad designs for a relational database may result in erroneous results for
certain JOIN operations
◼ The "lossless join" property is used to guarantee meaningful results for
join operations

◼ GUIDELINE 4:
◼ The relations should be designed to satisfy the lossless join condition.

◼ No spurious tuples should be generated by doing a natural join of any relations.
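To see how a lossy decomposition produces spurious tuples, here is a minimal Python sketch. The rows are hypothetical (not taken from the slides): a relation with Ssn, Pnumber and Plocation is projected onto (Ssn, Plocation) and (Pnumber, Plocation), so the only shared attribute is the non-key Plocation. Relations are modeled as lists of dicts; `natural_join` and `project` are illustrative helpers, not a library API.

```python
def project(r, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    out = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    result = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result

# Hypothetical sample data: two employees on two projects in the same city.
works_on = [
    {"Ssn": "111", "Pnumber": 1, "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 2, "Plocation": "Houston"},
]

# Lossy decomposition: the relations share only Plocation, which is not a key.
r1 = project(works_on, ["Ssn", "Plocation"])
r2 = project(works_on, ["Pnumber", "Plocation"])
joined = natural_join(r1, r2)

# Tuples produced by the join that were never in the original relation.
spurious = [t for t in joined if t not in works_on]
print(len(works_on), len(joined))
print(spurious)
```

The join pairs every employee with every project at Houston, so two of its four tuples are spurious. Assuming Pnumber → Plocation, keeping Pnumber in both projections instead would make the shared attribute a key of one relation and avoid the extra tuples; the formal lossless join test is discussed in Chapter 15.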



2. Functional Dependencies
• Functional dependency is a formal tool for analyzing redundancy and the resulting update (insertion, deletion, modification) anomalies in database design
• It is a relationship of one attribute or field in a record to another.
• It is a constraint that specifies the relationship of one attribute with other attributes in a relation
◼ Functional dependencies (FDs)
◼ Are used to specify formal measures of the "goodness" of relational designs
◼ Together with keys, are used to define normal forms of relations
◼ Are constraints that are derived from the meaning and interrelationships of the data attributes


◼ A set of attributes X functionally determines a set of attributes Y if the
value of X determines a unique value for Y



2.1 Defining Functional Dependencies
◼ X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y
◼ For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
◼ This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component.
◼ X → Y in R specifies a constraint on all relation instances r(R)
◼ Written as X → Y; can be displayed graphically on a relation schema as in the figures (denoted by an arrow).
◼ FDs are derived from the real-world constraints on the attributes
Examples of FD constraints (2)
◼ An FD is a property of the attributes in the schema R
◼ The constraint must hold on every relation instance r(R)
◼ If K is a key of R, then K functionally determines all
attributes in R
◼ (since we never have two distinct tuples with t1[K] = t2[K])



Examples of FD constraints (1)
1. Social security number determines employee name
◼ If I know someone's SSN, then I can find their name.
◼ SSN → ENAME
◼ So, name is defined as being functionally dependent on SSN.
2. Suppose that a company assigned each employee a unique
EmpNo. Each employee has a number and a name. Names
might be the same for two different employees, but their
employee numbers would always be different and unique
because the company defined them that way.
◼ It would be inconsistent in the database if there were two
occurrences of the same employee number with different
names.



Examples of FD constraints
◼ We write a functional dependency (FD) connection with an arrow:
◼ SSN → Name
◼ EmpNo → Name.
◼ The expression SSN → Name is read as "SSN defines Name" or "SSN implies Name."
◼ Similarly,
◼ Project number determines project name and location
◼ PNUMBER → {PNAME, PLOCATION}
◼ Employee SSN and project number together determine the hours per week that the employee works on the project
◼ {SSN, PNUMBER} → HOURS



Examples of FD constraints

Eg:
EmpNo   Job          Name
101     President    Herbert
104     Programmer   Fred
103     Designer     Beryl
103     Programmer   Beryl
◼ Is there a problem here? No.
◼ Because we have the FD EmpNo → Name.
◼ This means that every time we find 104, we find the name Fred (and every time we find 103, we find Beryl).
◼ Just because something is on the left-hand side of an FD, it does not imply that you have a key or that it will be unique in the database, i.e., the FD X → Y only means that for every occurrence of X you will get the same value of Y.
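The definition "if t1[X] = t2[X], then t1[Y] = t2[Y]" can be checked mechanically against one relation instance. A minimal Python sketch, assuming each tuple is a dict keyed by attribute name (`holds_fd` is a hypothetical helper), applied to the EmpNo/Job/Name rows above:

```python
from itertools import combinations

def holds_fd(rows, X, Y):
    """Check X -> Y on one relation instance (a list of dicts).

    Returns False as soon as two tuples agree on every attribute in X
    but differ on some attribute in Y; otherwise returns True.
    """
    for t1, t2 in combinations(rows, 2):
        same_x = all(t1[a] == t2[a] for a in X)
        same_y = all(t1[b] == t2[b] for b in Y)
        if same_x and not same_y:
            return False
    return True

emp = [
    {"EmpNo": 101, "Job": "President", "Name": "Herbert"},
    {"EmpNo": 104, "Job": "Programmer", "Name": "Fred"},
    {"EmpNo": 103, "Job": "Designer", "Name": "Beryl"},
    {"EmpNo": 103, "Job": "Programmer", "Name": "Beryl"},
]

print(holds_fd(emp, ["EmpNo"], ["Name"]))  # True: 103 always pairs with Beryl
print(holds_fd(emp, ["EmpNo"], ["Job"]))   # False: 103 has two different jobs
```

EmpNo → Name is satisfied even though EmpNo repeats, so EmpNo is not a key here. A True result only means this instance does not violate the FD; whether the FD actually holds must come from the meaning of the attributes.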

Defining FDs from instances
◼ Note that to define the FDs, we need to understand the meaning of
the attributes involved and the relationship between them.
◼ An FD is a property of the attributes in the schema R
◼ Given the instance (population) of a relation, all we can conclude is
that an FD may exist between certain attributes.
◼ What we can conclude with certainty is that certain FDs do not exist, because there are tuples that show a violation of those dependencies.



Example
A    B
1    ABC
2    DEF
3    GHI
4    JKL
A → B (A is the determinant, B is the dependent)
• Knowing the value of A, we can get the value of B from a relation R
• A determines B, or B is determined by A, which means B is functionally dependent on A

Student_ID   Name   Dept_name   Dept_building
1            ABC    CE          B
2            BCD    SE          C
3            XYZ    CS          H
4            XYZ    SE          C
5            EFG    IT          H
6            HIJ    BI          D
7            KLM    MT          B

Student_ID → {Name, Dept_name, Dept_building}
• Valid FD
• In this case, the values of the determinant are all different (no Student_ID repeats)
Example (using the same Student table)
Dept_name → Dept_building
• This is a valid FD
• Although data is redundant (SE → C appears twice), the output of the dependency is correct
• This means that repeated determinants with identical dependents still form a valid FD
• Also, the determinants may be different while the dependents are identical (e.g., CE → B and MT → B); this is also a valid FD
Student_ID → Name
• This is a valid FD
• Redundant dependent (XYZ appears twice) & unique determinant
Name → Dept_name
• This is an invalid FD
• Redundant determinant (XYZ) & different dependents (CS and SE)
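The same pairwise test can be used to rule FDs out on the Student table. The sketch below (hypothetical helper `find_violation`, tuples as Python dicts) tries every single-attribute FD A → B over the four columns and reports either "may exist" or the pair of Student_IDs that violates it. As noted earlier, an instance can only rule an FD out; it cannot prove that the FD holds.

```python
from itertools import combinations, permutations

ATTRS = ["Student_ID", "Name", "Dept_name", "Dept_building"]
# The Student table from the slides, one dict per tuple.
ROWS = [dict(zip(ATTRS, values)) for values in [
    (1, "ABC", "CE", "B"),
    (2, "BCD", "SE", "C"),
    (3, "XYZ", "CS", "H"),
    (4, "XYZ", "SE", "C"),
    (5, "EFG", "IT", "H"),
    (6, "HIJ", "BI", "D"),
    (7, "KLM", "MT", "B"),
]]

def find_violation(rows, lhs, rhs):
    """Return two tuples that agree on lhs but differ on rhs, or None."""
    for t1, t2 in combinations(rows, 2):
        if t1[lhs] == t2[lhs] and t1[rhs] != t2[rhs]:
            return t1, t2
    return None

for lhs, rhs in permutations(ATTRS, 2):
    witness = find_violation(ROWS, lhs, rhs)
    if witness is None:
        print(f"{lhs} -> {rhs}: may exist")
    else:
        print(f"{lhs} -> {rhs}: ruled out by Student_IDs "
              f"{witness[0]['Student_ID']} and {witness[1]['Student_ID']}")
```

For example, Name → Dept_name is reported as ruled out by Student_IDs 3 and 4, while Dept_name → Dept_building and Student_ID → Name may exist.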
Figure 14.7 Ruling Out FDs
Note that given the state of the TEACH relation, we can say that the FD Text → Course may exist. However, the FDs Teacher → Course, Teacher → Text and Course → Text are ruled out.



Figure 14.8 What FDs may exist?

◼ A relation R(A, B, C, D) with its extension.


◼ Which FDs may exist in this relation?



Properties of Functional Dependency
◼ Trivial FD: A → B is said to be a trivial functional dependency if B is a subset of A.
◼ Such a functional dependency is always valid. An attribute determining itself (A → A) is also a trivial functional dependency.
◼ Example: {Student ID, Name} → Name is trivial, whereas {Student ID, Name} → Dept_name is not.
◼ Non-trivial FD: A → B is said to be a non-trivial functional dependency if B is not a subset of A. If, in addition, A ∩ B is empty, it is sometimes called completely non-trivial.
◼ Such a functional dependency may or may not be valid
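Whether an FD is trivial depends only on the two attribute sets, not on any data. A small Python sketch of the classification above, where `classify_fd` is a hypothetical helper and "completely non-trivial" is the case A ∩ B = ∅:

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets (no data needed)."""
    X, Y = set(lhs), set(rhs)
    if Y <= X:
        return "trivial"                 # Y is a subset of X
    if X.isdisjoint(Y):
        return "completely non-trivial"  # X and Y share no attributes
    return "non-trivial"                 # Y has an attribute outside X, overlaps X

print(classify_fd({"Student ID", "Name"}, {"Name"}))        # trivial
print(classify_fd({"Student ID", "Name"}, {"Dept_name"}))   # completely non-trivial
print(classify_fd({"Student ID"}, {"Student ID", "Name"}))  # non-trivial
```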



Inference Rules
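◼ An inference rule is an assertion that can be applied to a set of functional dependencies to derive other FDs (functional dependencies).
◼ William W. Armstrong introduced these axioms for relational databases in 1974 (they are known as Armstrong's axioms).
◼ Some of the inference rules are (the reflexive, augmentation and transitive rules are Armstrong's axioms; the decomposition and union rules can be derived from them):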
1. Reflexive rule
2. Augmentation rule
3. Decomposition rule
4. Union rule
5. Transitive rule



Inference Rules (Reflexive Rule)
◼ If X is a composite, composed of A and B, then X → A and X → B.
◼ Eg: X = {Name, City}. Then we are saying that X → Name and X → City.
Name      City
David     New York
Kaitlyn   New Orleans
Chrissy   Baton Rouge
◼ The rule, which seems obvious, says if I give you the combination <Kaitlyn, New Orleans>, what is this person's Name? What is this person's City?
◼ While this rule seems obvious enough, it is necessary to derive other functional dependencies.



Inference Rules (Augmentation Rule)
◼ If X → Y, then XZ → Y. You might call this rule, "more information is not needed, but it doesn't hurt." (Armstrong's axiom is usually stated as: X → Y implies XZ → YZ; XZ → Y then follows by decomposition.)
◼ Suppose we use the same data as before with Names and Cities, and define the FD Name → City.
◼ Now, suppose we add a column, Shoe Size:
Name      City          Shoe Size
David     New York      10
Kaitlyn   New Orleans   6
Chrissy   Baton Rouge   3
◼ Given Name → City, we can say that Name + Shoe Size → City (i.e., we augmented Name with Shoe Size).
◼ Will there be a contradiction here, ever?
◼ No, because we defined Name → City, Name plus more information will always identify the unique City for that individual.
◼ We can always add information to the LHS of an FD and still have the FD be true.



Inference Rules (Decomposition Rule)
◼ The decomposition rule says that if it is given that X → YZ (that is, X defines both Y and Z), then X → Y and X → Z.
Name      City          Shoe Size
David     New York      10
Kaitlyn   New Orleans   6
Chrissy   Baton Rouge   3
• Suppose Name → City, Shoe Size
• This means for every occurrence of Name, we have a unique value of City and a unique value of Shoe Size.
◼ The rule says that given Name → City and Shoe Size together, then Name → City and Name → Shoe Size.
◼ A partial proof using the reflexive rule would be:
◼ Name → City, Shoe Size (given)
◼ City, Shoe Size → City (by the reflexive rule)
◼ Name → City (using steps 1 and 2 and the transitivity rule)



Inference Rules (Union Rule)
◼ The union rule is the reverse of the decomposition rule in that if X → Y
and X → Z, then X → YZ.
◼ The same example of Name, City, and Shoe Size illustrates the rule.
◼ If we found independently or were given that Name → City and Name → Shoe Size, we can immediately write Name → City, Shoe Size.
◼ In this specific scenario we assume that Name → City and Name → Shoe Size actually hold (each name occurs with only one city and one shoe size). In general, names are not unique, so these FDs might not hold; the inference rules only derive new FDs from FDs that are known to hold.



Full and Partial Functional Dependency

◼ Full functional dependency: In a relation, the attribute(s) B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
◼ Eg: { supplier_id , item_id } -> price
◼ Partial dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (so the primary key must be a composite key).
◼ Eg: SalesOrderNo, ItemNo, Qty, UnitPrice
◼ Eg: { name , roll_no } -> course
◼ Transitive dependency: In a relation, if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
◼ Eg: Staff_No → Branch_No and Branch_No → BAddress
Keys and FDs
◼ The main reason we identify the FDs and inference rules is to be able
to find keys and develop normal forms for relational databases.
◼ In any relational table, we want to find out which, if any attribute(s),
will identify the rest of the attributes.
◼ The attribute (or set of attributes) that will identify all the other attributes in a row is called a "candidate key." A key means a ‘unique identifier’ for a row of information.
◼ Hence, if an attribute or some combination of attributes will always identify all the other attributes in a row, it is a "candidate" to be "named" a key.



Keys and FDs
SSN   Name        School    Location
101   David       Alabama   Tuscaloosa
102   Chrissy     MSU       Starkville
103   Kaitlyn     LSU       Baton Rouge
104   Stephanie   MSU       Starkville
105   Lindsay     Alabama   Tuscaloosa
106   Chloe       Alabama   Tuscaloosa
◼ Suppose we have defined the following FDs:
◼ SSN → Name
◼ SSN → School
◼ School → Location
◼ The following dependencies are derived:
◼ SSN → Name (given)
◼ SSN → School (given)
◼ SSN → Location (derived by the transitive rule)
◼ SSN → SSN (reflexive rule (obvious))
◼ SSN → SSN, Name, School, Location (union rule)
◼ SSN can be a candidate key and primary key as well.
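The derivation above can be mechanized with the standard attribute-closure computation: start from X and keep adding the right-hand side of any FD whose left-hand side is already included. A minimal Python sketch (the `closure` helper and the representation of FDs as pairs of sets are assumptions for illustration) applied to the three given FDs:

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets. Whenever the
    current result contains an FD's left-hand side, its right-hand side
    is added, until nothing changes.
    """
    result = set(attrs)              # reflexive rule: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs        # transitive and union rules
                changed = True
    return result

R = {"SSN", "Name", "School", "Location"}
FDS = [
    ({"SSN"}, {"Name"}),
    ({"SSN"}, {"School"}),
    ({"School"}, {"Location"}),
]

ssn_plus = closure({"SSN"}, FDS)
print(sorted(ssn_plus))
print(ssn_plus == R)                 # True: SSN determines every attribute
print(closure({"School"}, FDS))      # only School and Location
```

Because the closure of {SSN} is the whole relation and a single attribute cannot be made smaller, SSN is a candidate key; School is not, since its closure is only {School, Location}.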
Keys and FDs
◼ Keys should be a minimal set of attributes whose closure is all the attributes in the relation, "minimal" in the sense that you want the fewest attributes on the LHS of the FD that you choose as a key.
◼ As in the previous example, SSN will be minimal (one attribute), and its closure includes all the other attributes.
◼ Once the set of candidate keys has been found (in the example we considered, SSN is the only one), choose one of the candidate keys as the primary key and move on to normal forms.



3.1 Normalization of Relations
◼ Dr. Codd discovered that unnormalized relations presented certain problems when attempts were made to update the data.
◼ Information is stored redundantly, which wastes storage and causes errors and/or inconsistencies in the redundant data
◼ These problems are named anomalies (insertion, deletion, modification)
◼ Normalization:
◼ The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
◼ Normal form:
◼ Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form



Normalization
◼ Normalization is the process of organizing data in a database.
◼ This includes:
◼ creating tables and establishing relationships between those tables
according to rules designed both to protect the data and to make the
database more flexible by eliminating two factors: redundancy and
inconsistent dependency.
◼ Normalization is the analysis of FDs between attributes.
◼ It is the process of decomposing relations with anomalies to produce
well-structured relations.
◼ A well-structured relation contains minimal redundancy and allows insertion, modification, and deletion without errors or inconsistencies.



Normalization of Relations
◼ Normalization theory is based on the concepts of normal
forms.
◼ A relational table is said to be in a particular normal form if it satisfies a certain set of constraints.
◼ Edgar F. Codd originally established three normal forms:
1NF, 2NF, and 3NF. There are now others that are
generally accepted, but 3NF is widely considered to be
sufficient for most applications.
◼ Most tables when reaching 3NF are also in BCNF (Boyce-
Codd Normal Form).



Normalization

Table with multivalued attributes
↓ Removing multivalued attributes
First Normal Form
↓ Removing partial dependencies
Second Normal Form
↓ Removing transitive dependencies
Third Normal Form
↓ Removing remaining anomalies
Boyce-Codd Normal Form



Normalization of Relations

◼ 2NF, 3NF, BCNF


◼ based on keys and FDs of a relation schema
◼ 4NF
◼ based on keys and multivalued dependencies (MVDs)
◼ 5NF
◼ based on keys and join dependencies (JDs)
◼ Additional properties may be needed to ensure a
good relational design (lossless join, dependency
preservation; see Chapter 15)
3.2 Practical Use of Normal Forms
◼ Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties

◼ The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
◼ The database designers need not normalize to the highest possible normal form
◼ (usually up to 3NF and BCNF; 4NF is rarely used in practice)
◼ Denormalization:
◼ The process of storing the join of higher normal form
relations as a base relation—which is in a lower normal
form



3.3 Definitions of Keys and Attributes
Participating in Keys (1)
◼ A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
◼ A key K is a superkey with the additional property that removing any attribute from K will cause K not to be a superkey anymore.
◼ In the relational data model a superkey is a set of attributes that uniquely identifies each tuple of a relation.
◼ The set of all attributes is always a superkey (the trivial superkey).

In the EMPLOYEE schema of Figure 14.1, {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and any set of attributes that includes Ssn are all superkeys.



Definitions of Keys and Attributes
Participating in Keys (2)
◼ If a relation schema has more than one key, each is called a
candidate key.
◼ One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
◼ A prime attribute is a member of some candidate key
◼ A Nonprime attribute is not a prime attribute—that is, it is not a
member of any candidate key.

In the WORKS_ON schema of Figure 14.1, both Ssn and Pnumber are prime attributes of WORKS_ON, whereas other attributes of WORKS_ON are nonprime.
Recall
Keys
◼ Primary Key: It is the key chosen to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, and the key that is most suitable from that list becomes the primary key.
• Candidate Key: It is an attribute or set of attributes that can uniquely identify a tuple. The primary key is chosen from among the candidate keys, and the candidate keys are as strong as the primary key.
• Super keys are a superset of candidate keys: every candidate key is a super key, but a super key need not be minimal.

[Figure: EMPLOYEE attributes Employee_ID (primary key), Employee_name, Employee_address, License_number, Passport_number and SSN, with the candidate keys marked]



Recall
Keys

◼ Super Key: Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.
◼ Foreign Key: Foreign Key is used to establish relationships between two tables. A foreign key requires each value in a column or set of columns to match the primary key of the referenced table. Foreign keys help to maintain data and referential integrity.

[Figure: Employee table with a super key marked; Department_ID is a foreign key linking it to a table with Department_ID and Department_name]
Recall
Keys

◼ Alternate Key: A relation may have several candidate keys. One of them is chosen as the primary key, and the remaining candidate keys are known as alternate key(s).
◼ Composite Key: When a primary key is a combination of multiple attributes, it is known as a composite key.

[Figure: Employee table with candidate keys Employee_ID (primary key) and SSN (alternate key); a table with Employee_ID, Project_ID and Project_loc, where {Employee_ID, Project_ID} is a composite key]



Recall
Keys

◼ Artificial/Surrogate Key: An artificial (surrogate) key is an attribute added only to uniquely identify each record. Such keys are typically created when no suitable primary key is available in the data.
◼ The data values of the artificial keys are usually numbered in a serial order.



3.4 First Normal Form (1NF)
◼ Disallows
◼ composite attributes

◼ multivalued attributes

◼ nested relations; attributes whose values for an individual tuple are non-atomic
◼ Considered to be part of the definition of a relation
◼ Most RDBMSs allow only those relations to be defined that are in
First Normal Form



Figure 14.9 Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Sample state of relation DEPARTMENT. (c) 1NF version of the same relation with redundancy.

We assume that each department can have several locations

• There are two ways we can look at the Dlocations attribute:


1. The domain of Dlocations contains atomic values, but some
tuples can have a set of these values. In this case, Dlocations
is not functionally dependent on the primary key Dnumber.
2. The domain of Dlocations contains sets of values and hence
is nonatomic.

In either case, the DEPARTMENT relation in Figure 14.9 is not in 1NF.
How to achieve 1NF by removing multivalued attributes?
◼ There are three main techniques to achieve the first normal form for such a relation:
1. Remove the attribute Dlocations that violates 1NF
1. Place it in a separate relation DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a DEPARTMENT.
1. In this case, the primary key becomes the combination {Dnumber, Dlocation}.
3. If a maximum number of values is known for the attribute (for example, if it is known that at most three locations can exist for a department), replace the Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and Dlocation3.
1. This solution has the disadvantage of introducing NULL values in some cases.
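Technique 1 amounts to turning each element of a set value into its own tuple in a new relation that carries the primary key. A minimal Python sketch, assuming rows are dicts and Dlocations is stored as a Python set; the department data and the helper name `split_multivalued` are hypothetical:

```python
# Hypothetical sample state (illustrative values only).
DEPARTMENT = [
    {"Dnumber": 1, "Dname": "Sales", "Dlocations": {"Houston", "Dallas"}},
    {"Dnumber": 2, "Dname": "Research", "Dlocations": {"Austin"}},
]

def split_multivalued(rows, key, mv_attr, new_attr):
    """Move a set-valued attribute into its own relation (technique 1).

    Returns (base, side). base keeps every attribute except mv_attr;
    side has one (key, new_attr) tuple per element of each set, so all
    values are atomic and {key, new_attr} can serve as its primary key.
    """
    base = [{a: v for a, v in r.items() if a != mv_attr} for r in rows]
    side = [{key: r[key], new_attr: value}
            for r in rows
            for value in sorted(r[mv_attr])]
    return base, side

dept, dept_locations = split_multivalued(
    DEPARTMENT, "Dnumber", "Dlocations", "Dlocation")
for t in dept_locations:
    print(t)
```

Department 1 contributes two DEPT_LOCATIONS tuples and department 2 contributes one, while DEPARTMENT itself keeps only atomic attributes.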



Figure 14.10 Normalizing nested relations into 1NF
• The figure shows how the EMP_PROJ relation could appear if nesting is allowed.
• 1NF also disallows multivalued attributes that are themselves composite.
• In a nested relation, each tuple can have a relation within it:
EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})
• Each tuple represents an employee entity, and a relation PROJS(Pnumber, Hours) within each tuple represents the employee's projects and the hours per week that the employee works on each project.
• To normalize this into 1NF, remove the nested relation attributes into a new relation and propagate the primary key into it
• Decomposition and primary key propagation yield the schemas EMP_PROJ1 and EMP_PROJ2

Figure 14.10 Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
3.5 Second Normal Form (1)
◼ Uses the concepts of FDs, primary key
◼ Definitions
◼ Prime attribute: An attribute that is a member of the primary key K

◼ Full functional dependency: an FD Y -> Z where removing any attribute from Y means the FD no longer holds.
◼ Consider a relation R(A, B, C, D) with FDs AB → CD and A → C. Since AB is a candidate key, A → C is a partial dependency: C depends on only part of the key.
◼ A and B are the prime attributes and C and D are the non-prime attributes.
◼ Examples:
◼ {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
◼ {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds



Second Normal Form (2)
◼ For 2NF to hold:
◼ It is in 1NF
◼ A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key
◼ R can be decomposed into 2NF relations via the process of 2NF normalization or “second normalization”
The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. If the primary key contains a single attribute, the test need not be applied at all.

• EMP_PROJ is in 1NF but not in 2NF
• The nonprime attribute Ename violates 2NF because of FD2
• The nonprime attributes Pname and Plocation violate 2NF because of FD3
• The left-hand sides of FD2 (Ssn) and FD3 (Pnumber) are each only part of the primary key {Ssn, Pnumber} of EMP_PROJ, so these dependencies are partial and violate the 2NF test
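The 2NF test can be sketched by computing, for each proper subset of the primary key, which nonprime attributes it determines. The sketch below assumes FD1 is {Ssn, Pnumber} → Hours, FD2 is Ssn → Ename and FD3 is Pnumber → {Pname, Plocation}, as in the EMP_PROJ discussion; the helper names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the nonprime attributes it determines."""
    nonprime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):          # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & nonprime
            if determined:
                found[part] = determined
    return found

EMP_PROJ = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),         # FD1 (full dependency)
    ({"Ssn"}, {"Ename"}),                    # FD2 (partial)
    ({"Pnumber"}, {"Pname", "Plocation"}),   # FD3 (partial)
]

partials = partial_dependencies({"Ssn", "Pnumber"}, EMP_PROJ, FDS)
print(partials)
```

A non-empty result means the relation is not in 2NF; each reported key part (Ssn, Pnumber) suggests a separate relation, which matches the EP1/EP2/EP3 decomposition.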
Figure 14.11 Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.

To normalize to 2NF, decompose EMP_PROJ into the three relation schemas EP1, EP2, and EP3, each of which is in 2NF (in each one, the nonprime attributes are fully functionally dependent on the key).



3.6 Third Normal Form (1)
◼ A relation is said to be in 3NF if it is in 2NF and there is no transitive dependency.
◼ Definition:
◼ A Transitive Dependency is a functional dependency between two or more non-key attributes of a relation.
◼ Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
◼ Examples:
◼ SSN -> DMGRSSN is a transitive FD
◼ Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
◼ SSN -> ENAME is non-transitive
◼ Since there is no set of attributes X where SSN -> X and X -> ENAME
Third Normal Form (2)
◼ According to Codd’s original definition
◼ A relation schema R is in third normal form (3NF) if it is in
2NF and no non-prime attribute A in R is transitively
dependent on the primary key
◼ R can be decomposed into 3NF relations via the process of
3NF normalization
◼ NOTE:
◼ In X -> Y and Y -> Z, with X as the primary key, we consider
this a problem only if Y is not a candidate key.
◼ When Y is a candidate key, there is no problem with the
transitive dependency.
◼ E.g., consider EMP (SSN, Emp#, Salary).
◼ Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
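A quick closure check makes the note concrete: the middle attribute of the chain is harmless exactly when its closure covers the whole relation. This sketch assumes that, because SSN and Emp# are both candidate keys of EMP, each one determines the other two attributes; `is_superkey` is a hypothetical helper.

```python
# Sketch: in X -> Y -> Z, the chain is a 3NF problem only if Y is not a (super)key.
# Assumed FDs for EMP(SSN, Emp#, Salary): SSN and Emp# are both candidate keys.
EMP = {"SSN", "Emp#", "Salary"}
FDS = [
    ({"SSN"}, {"Emp#", "Salary"}),
    ({"Emp#"}, {"SSN", "Salary"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """True if attrs functionally determines every attribute of relation."""
    return closure(attrs, fds) >= relation


print(is_superkey({"Emp#"}, EMP, FDS))    # True: SSN -> Emp# -> Salary is not a problem
print(is_superkey({"Salary"}, EMP, FDS))  # False
```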



Normalization into 2NF and 3NF
• Figure 14.12(a) describes parcels of land for sale in various counties of a state.
• Suppose there are two candidate keys: Property_id# and {County_name, Lot#};
that is, lot numbers are unique only within each county, but Property_id# numbers
are unique across counties for the entire state.
• Based on the two candidate keys Property_id# and {County_name, Lot#}, the
functional dependencies FD1 and FD2 in Figure 14.12(a) hold.
FD1: Property_id# → {County_name, Lot#, Area, Price, Tax_rate}

FD2: {County_name, Lot#} → {Property_id#, Area, Price, Tax_rate}

FD3: ??

FD4: ??

Figure 14.12 Normalization into 2NF and 3NF. (a) The LOTS relation with its functional dependencies FD1 through FD4.
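The two candidate keys claimed for LOTS can be confirmed by brute force, since a candidate key is a minimal attribute set whose closure is the whole relation. This sketch uses only FD1 and FD2; the helper names are hypothetical, and the subset search is exponential, so it only suits small textbook schemas.

```python
from itertools import combinations

# Sketch: confirm the candidate keys of LOTS from FD1 and FD2 by brute force.
LOTS = {"Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"}
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),  # FD1
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),  # FD2
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (exponential search)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


print([sorted(k) for k in candidate_keys(LOTS, FDS)])
# [['Property_id#'], ['County_name', 'Lot#']]
```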
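• The LOTS relation schema in Figure 14.12(a) violates 2NF because Tax_rate is
partially dependent on the candidate key {County_name, Lot#}, due to FD3. FD4 does not
violate 2NF, because its left-hand side Area is not part of any key; it causes a 3NF violation instead.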

FD3: County_name → Tax_rate


FD4: Area → Price


• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2,
shown in Figure 14.12(b).

Figure 14.12(b) Decomposing into the 2NF relations LOTS1 and LOTS2.



• LOTS2 (Figure 14.12(b)) is in 3NF.
• FD4 (Area → Price) violates 3NF in LOTS1 because Area is not a superkey and Price is not
a prime attribute in LOTS1.
• To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and
LOTS1B shown in Figure 14.12(c).
Figure 14.12(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. (d) Progressive normalization of LOTS into a 3NF design.

• Remove the attribute Price, which violates 3NF, from LOTS1 and place it with Area (the left-hand side
of FD4, which causes the transitive dependency) in another relation, LOTS1B.
• Both LOTS1A and LOTS1B are in 3NF.

Points to note here are:


• LOTS1 violates 3NF because Price is transitively dependent on each of the candidate keys
of LOTS1 via the nonprime attribute Area.
• If a relation passes the general 3NF test, then it automatically passes the 2NF test.
• That means:
If we apply the general 3NF definition (Section 4.2) to LOTS with the dependencies FD1 through FD4,
we find that both FD3 and FD4 violate 3NF, because their left-hand sides County_name and Area
are not superkeys and Tax_rate and Price are nonprime. Therefore, we could decompose LOTS into
LOTS1A, LOTS1B, and LOTS2 directly.
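The direct route can be sketched as a single pass over the FDs: every listed FD X → A that fails both clauses of the general 3NF test has X ∪ A split off, and A is dropped from the remaining relation. This is a simplified illustration, not the textbook's design algorithm: it takes the candidate keys from the slide, does not re-check the resulting relations, and would need extra care if a removed attribute were the left-hand side of another FD. Helper names are hypothetical.

```python
# Sketch: split off every listed FD X -> A that fails the general 3NF test.
# Candidate keys are taken from the slide; helper names are hypothetical.
LOTS = ["Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"]
KEYS = [{"Property_id#"}, {"County_name", "Lot#"}]
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),  # FD1
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),  # FD2
    ({"County_name"}, {"Tax_rate"}),                                           # FD3
    ({"Area"}, {"Price"}),                                                     # FD4
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_3nf_violations(attributes, fds, keys):
    """One pass: move nonprime RHS attributes of non-superkey FDs into their own relations."""
    prime = set().union(*keys)
    everything = set(attributes)
    remaining = list(attributes)
    split_off = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue  # clause (a) holds: LHS is a superkey
        bad = sorted(a for a in rhs - lhs if a not in prime)  # clause (b) fails for these
        if bad:
            split_off.append(sorted(lhs) + bad)
            remaining = [a for a in remaining if a not in bad]
    return remaining, split_off


remaining, split_off = split_3nf_violations(LOTS, FDS, KEYS)
print(remaining)  # ['Property_id#', 'County_name', 'Lot#', 'Area']
print(split_off)  # [['County_name', 'Tax_rate'], ['Area', 'Price']]
```

The remaining attributes match LOTS1A, and the two split-off relations match LOTS2 and LOTS1B.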
Figure 14.12 Normalization into 2NF and 3NF
Figure 14.12 Normalization into 2NF and 3NF. (a) The LOTS relation with its functional dependencies FD1 through FD4. (b) Decomposing into the 2NF relations LOTS1 and LOTS2. (c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. (d) Progressive normalization of LOTS into a 3NF design.



Normal Forms Defined Informally

◼ 1st normal form


◼ All attributes depend on the key
◼ 2nd normal form
◼ All attributes depend on the whole key
◼ 3rd normal form
◼ All attributes depend on nothing but the key



4. General Normal Form Definitions (For
Multiple Keys) (1)

◼ The above definitions consider the primary key only
◼ The following more general definitions take into
account relations with multiple candidate keys
◼ Any attribute involved in a candidate key is a
prime attribute
◼ All other attributes are called non-prime
attributes.



4.1 General Definition of 2NF (For
Multiple Candidate Keys)

◼ A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R
◼ In Figure 14.12 the FD
County_name → Tax_rate violates 2NF.

So 2NF normalization converts LOTS into
LOTS1 (Property_id#, County_name, Lot#, Area, Price)
LOTS2 (County_name, Tax_rate)
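The general 2NF definition can be tested by checking, for every candidate key, whether some proper part of it determines a nonprime attribute. A sketch for LOTS with FD1 through FD4, assuming the two candidate keys given earlier; `partial_dependency_violations` is a hypothetical name.

```python
from itertools import combinations

# Sketch: general 2NF test (every candidate key considered) on LOTS.
KEYS = [{"Property_id#"}, {"County_name", "Lot#"}]
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),  # FD1
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),  # FD2
    ({"County_name"}, {"Tax_rate"}),                                           # FD3
    ({"Area"}, {"Price"}),                                                     # FD4
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependency_violations(fds, keys):
    """(proper part of a key, nonprime attribute) pairs that break the general 2NF definition."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for part in combinations(sorted(key), size):
                for a in sorted(closure(set(part), fds) - prime):
                    found.append((part, a))
    return found


print(partial_dependency_violations(FDS, KEYS))  # [(('County_name',), 'Tax_rate')]
```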
4.2 General Definition of Third Normal
Form

◼ Definition:
◼ Superkey of relation schema R - a set of attributes
S of R that contains a key of R
◼ A relation schema R is in third normal form (3NF)
if whenever an FD X → A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
◼ The LOTS1 relation violates 3NF because
Area → Price holds, Area is not a superkey of
LOTS1, and Price is not a prime attribute (see Figure 14.12).
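The two clauses translate directly into code. This sketch (hypothetical helper names) computes candidate keys by brute force to find the prime attributes, then flags every listed FD X → A where X is not a superkey and A is not prime. It checks only the FDs listed for each relation, assumed here to be FD1, FD2, and FD4 restricted to LOTS1's attributes, and FD4 alone for LOTS1B.

```python
from itertools import combinations

# Sketch: general 3NF test applied to LOTS1 and LOTS1B (assumed FD sets).
LOTS1 = {"Property_id#", "County_name", "Lot#", "Area", "Price"}
LOTS1_FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price"}),  # FD1 restricted to LOTS1
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price"}),  # FD2 restricted to LOTS1
    ({"Area"}, {"Price"}),                                         # FD4
]
LOTS1B = {"Area", "Price"}
LOTS1B_FDS = [({"Area"}, {"Price"})]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (exponential search)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """Listed FDs X -> A where X is not a superkey (a) and A is not prime (b)."""
    prime = set().union(*candidate_keys(attributes, fds))
    found = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= attributes  # clause (a)
        for a in sorted(rhs - lhs):
            if not superkey and a not in prime:     # clause (b) also fails
                found.append((tuple(sorted(lhs)), a))
    return found


print(third_nf_violations(LOTS1, LOTS1_FDS))    # [(('Area',), 'Price')]
print(third_nf_violations(LOTS1B, LOTS1B_FDS))  # []
```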



4.3 Interpreting the General Definition of
Third Normal Form
◼ Consider the 2 conditions in the Definition of 3NF:
A relation schema R is in third normal form (3NF) if
whenever an FD X → A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
◼ Condition (a) catches two types of violations:
- first, where part of a key (a set of prime attributes) functionally
determines a non-prime attribute. This catches 2NF violations due to
non-full functional dependencies.
- second, where a non-prime attribute functionally
determines a non-prime attribute. This catches 3NF
violations due to a transitive dependency.
4.3 Interpreting the General Definition of
Third Normal Form (2)
◼ ALTERNATIVE DEFINITION of 3NF: We can restate the definition
as:
A relation schema R is in third normal form (3NF) if
every non-prime attribute in R meets both of these
conditions:
◼ It is fully functionally dependent on every key of R
◼ It is non-transitively dependent on every key of R
Note that stated this way, a relation in 3NF also meets
the requirements for 2NF.
◼ Condition (b) from the last slide takes care of the
dependencies that “slip through” (are allowed by) 3NF
but are “caught by” BCNF, which we discuss next.



5. BCNF (Boyce-Codd Normal Form)
◼ A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever an FD X → A holds in R, then X is a
superkey of R
◼ Each normal form is strictly stronger than the previous
one
◼ Every 2NF relation is in 1NF
◼ Every 3NF relation is in 2NF
◼ Every BCNF relation is in 3NF
◼ There exist relations that are in 3NF but not in BCNF
◼ Hence BCNF is considered a stronger form of 3NF
◼ The goal is to have each relation in BCNF (or 3NF)



BCNF (Boyce-Codd Normal Form) Example
◼ Again, consider the LOTS relation schema in Figure 14.12(a) with its
four functional dependencies FD1 through FD4.
◼ Suppose that we have thousands of lots in the relation but the lots are
from only two counties: DeKalb and Fulton. Suppose also that lot sizes in
DeKalb County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres, whereas lot
sizes in Fulton County are restricted to 1.1, 1.2, … , 1.9, and 2.0 acres.
◼ In such a situation we would have the additional functional dependency
FD5: Area → County_name.




Figure 14.13 Boyce-Codd normal form
• FD5 conforms to clause (b) in the general
definition of 3NF, County_name being a prime
attribute.
• The area of a lot that determines the county,
as specified by FD5, can be represented by
16 tuples in a separate relation R(Area,
County_name), since there are only 16
possible Area values.
• FD5 violates BCNF in LOTS1A because Area is not a
superkey of LOTS1A.
• We can decompose LOTS1A into two BCNF relations
LOTS1AX and LOTS1AY, shown in Figure 14.13(a).
• This decomposition loses the functional dependency FD2
because its attributes no longer coexist in the same
relation after decomposition.
Figure 14.13 Boyce-Codd normal form. (a) BCNF normalization
of LOTS1A with the functional dependency FD2
being lost in the decomposition. (b) A schematic
relation with FDs; it is in 3NF, but not in BCNF due
to the f.d. C → B.
• Most relation schemas that are in 3NF are also in BCNF. Only if some f.d. X → A
holds in a relation schema R with X not being a superkey and A being a prime attribute will R
be in 3NF but not in BCNF.
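The difference between 3NF and BCNF is only clause (b), so one function with a switch can show both tests on LOTS1A. The FDs below are an assumption: FD1 and FD2 restricted to LOTS1A's attributes, plus FD5. Under these FDs the brute-force search also reports {Area, Lot#} as a candidate key (since Area → County_name), so every attribute of LOTS1A is prime. Helper names are hypothetical.

```python
from itertools import combinations

# Sketch: LOTS1A passes 3NF but fails BCNF once FD5 (Area -> County_name) holds.
LOTS1A = {"Property_id#", "County_name", "Lot#", "Area"}
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area"}),  # FD1 restricted to LOTS1A
    ({"County_name", "Lot#"}, {"Property_id#", "Area"}),  # FD2 restricted to LOTS1A
    ({"Area"}, {"County_name"}),                          # FD5
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (exponential search)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violations(attributes, fds, allow_prime_rhs):
    """Listed FDs X -> A with X not a superkey; 3NF (allow_prime_rhs=True) excuses a prime A."""
    prime = set().union(*candidate_keys(attributes, fds))
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey: fine for both 3NF and BCNF
        for a in sorted(rhs - lhs):
            if allow_prime_rhs and a in prime:
                continue  # clause (b) of the 3NF definition
            found.append((tuple(sorted(lhs)), a))
    return found


print([sorted(k) for k in candidate_keys(LOTS1A, FDS)])
# [['Property_id#'], ['Area', 'Lot#'], ['County_name', 'Lot#']]
print(violations(LOTS1A, FDS, allow_prime_rhs=True))   # 3NF:  []
print(violations(LOTS1A, FDS, allow_prime_rhs=False))  # BCNF: [(('Area',), 'County_name')]
```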
Figure 14.14 A relation TEACH that is in
3NF but not in BCNF
◼ Two FDs exist in relation TEACH:
◼ fd1: {student, course} -> instructor
◼ fd2: instructor -> course
◼ {student, course} is a candidate key for this
relation, and the dependencies shown follow
the pattern in Figure 14.13(b).
◼ So, this relation is in 3NF (course is a prime
attribute) but not in BCNF (instructor is not a superkey)
◼ A relation NOT in BCNF should be
decomposed to meet this property, while
possibly forgoing the preservation of all
functional dependencies in the decomposed
relations.
◼ (See Algorithm 15.3)
Figure 14.14 A relation TEACH that is in 3NF but not BCNF.
Figure 14.13(b) A schematic relation with FDs; it
is in 3NF, but not in BCNF due to the
f.d. C → B.



Achieving the BCNF by Decomposition
◼ Three possible decompositions for relation TEACH
◼ D1: {student, instructor} and {student, course}

◼ D2: {course, instructor } and {course, student}

◼ D3: {instructor, course } and {instructor, student} ✓


◼ All three decompositions lose fd1.
◼ We have to settle for sacrificing functional dependency
preservation, but we cannot sacrifice the non-additive join property
after decomposition.
◼ Of the three, only the 3rd decomposition (D3) does not generate
spurious tuples after a join (and hence has the non-additive join property).
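Spurious tuples can be observed directly by projecting a relation instance onto each decomposition and joining the projections back. The four TEACH rows below are invented for illustration (they satisfy fd1 and fd2), and `project` and `natural_join` are hypothetical helpers.

```python
# Sketch: count spurious tuples produced by D1, D2, D3 on a hypothetical TEACH instance.
COLUMNS = ("student", "course", "instructor")
TEACH = {  # invented rows; each satisfies fd1 and fd2
    ("Ana", "Database", "Lee"),
    ("Ana", "OS", "Roy"),
    ("Ben", "Database", "Ito"),
    ("Ben", "OS", "Roy"),
}
DECOMPOSITIONS = {
    "D1": (("student", "instructor"), ("student", "course")),
    "D2": (("course", "instructor"), ("course", "student")),
    "D3": (("instructor", "course"), ("instructor", "student")),
}


def project(rows, attrs):
    """Duplicate-free projection of full TEACH tuples onto attrs, as dicts."""
    index = [COLUMNS.index(a) for a in attrs]
    distinct = {tuple(row[i] for i in index) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Natural join on shared attribute names, returned as full TEACH tuples."""
    shared = set(left[0]) & set(right[0])
    joined = set()
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in shared):
                merged = {**l, **r}
                joined.add(tuple(merged[c] for c in COLUMNS))
    return joined


for name, (r1, r2) in DECOMPOSITIONS.items():
    rebuilt = natural_join(project(TEACH, r1), project(TEACH, r2))
    print(name, "spurious tuples:", len(rebuilt - TEACH))
# D1 spurious tuples: 4
# D2 spurious tuples: 2
# D3 spurious tuples: 0
```

D1 and D2 reproduce every original row plus extra rows; D3 reproduces the original rows exactly. A single instance can only demonstrate that a decomposition is lossy, not prove that one is lossless; that requires a test on the FDs themselves.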

◼ A test to determine whether a binary decomposition (decomposition
into two relations) is non-additive (lossless) is discussed under
Property NJB on the next slide. We then show how the third
decomposition above meets the property.



Test for checking non-additivity of Binary
Relational Decompositions
◼ Testing Binary Decompositions for Lossless
Join (Non-additive Join) Property
◼ Binary Decomposition: Decomposition of a
relation R into two relations.
◼ PROPERTY NJB (non-additive join test for
binary decompositions): A decomposition D =
{R1, R2} of R has the lossless join property with
respect to a set of functional dependencies F on R
if and only if either
◼ The f.d. ((R1 ∩ R2) → (R1 − R2)) is in F+, or
◼ The f.d. ((R1 ∩ R2) → (R2 − R1)) is in F+.



Test for checking non-additivity of Binary
Relational Decompositions
If you apply the NJB test to the 3
decompositions of the TEACH relation:
◼ D1 gives Student → Instructor or Student →
Course, neither of which is true.
◼ D2 gives Course → Instructor or Course →
Student, neither of which is true.
◼ However, in D3 we get Instructor → Course or
Instructor → Student.
Since Instructor → Course is indeed true, the NJB
property is satisfied and D3 is determined as a non-additive (good) decomposition.
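Property NJB reduces to one closure computation per decomposition, because an FD Y → Z is in F+ exactly when Z ⊆ Y+. A minimal sketch applying it to D1 through D3; `closure` and `njb` are hypothetical helper names.

```python
# Sketch: Property NJB for binary decompositions, via attribute closure.
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
DECOMPOSITIONS = {
    "D1": ({"student", "instructor"}, {"student", "course"}),
    "D2": ({"course", "instructor"}, {"course", "student"}),
    "D3": ({"instructor", "course"}, {"instructor", "student"}),
}


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def njb(r1, r2, fds):
    """True if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+."""
    common_plus = closure(r1 & r2, fds)  # Y -> Z is in F+ iff Z is a subset of Y+
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


for name, (r1, r2) in DECOMPOSITIONS.items():
    print(name, njb(r1, r2, TEACH_FDS))
# D1 False
# D2 False
# D3 True
```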
General Procedure for achieving BCNF
when a relation fails BCNF
Here we make use of the algorithm from Chapter
15 (Algorithm 15.5):
◼ Let R be the relation not in BCNF, let X be a subset of R,
and let X → A be the FD that causes a violation of BCNF.
Then R may be decomposed into two relations:
◼ (i) R − A and (ii) X ∪ A.
◼ If either R − A or X ∪ A is not in BCNF, repeat the
process.
Note that the f.d. that violated BCNF in TEACH was Instructor → Course.
Hence its BCNF decomposition would be:
(TEACH − Course) and (Instructor ∪ Course), which gives
the relations (Instructor, Student) and (Instructor, Course) that we
obtained before in decomposition D3.
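The repeat-until-done procedure can be sketched recursively. This version is in the spirit of Algorithm 15.5 but is not the textbook's code: it looks for a violation by computing the closure of every proper subset X of the current relation (restricted to that relation's attributes, which also catches FDs that hold only by projection), takes A as all extra attributes X determines, and splits into R − A and X ∪ A. The subset search is exponential, so it only suits small examples; helper names are hypothetical.

```python
from itertools import combinations

# Sketch: recursive BCNF decomposition of TEACH (hypothetical helper names).
TEACH = {"Student", "Course", "Instructor"}
FDS = [
    ({"Student", "Course"}, {"Instructor"}),  # fd1
    ({"Instructor"}, {"Course"}),             # fd2
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_bcnf_violation(relation, fds):
    """Return (X, A) where X -> A holds within relation but X is not a superkey of it."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            x_plus = closure(x, fds) & relation  # only attributes of this relation count
            a = x_plus - x
            if a and x_plus != relation:
                return x, a
    return None


def bcnf_decompose(relation, fds):
    """Replace R by R - A and X ∪ A, recursing until no violation remains."""
    violation = find_bcnf_violation(relation, fds)
    if violation is None:
        return [relation]
    x, a = violation
    return bcnf_decompose(relation - a, fds) + bcnf_decompose(x | a, fds)


print([sorted(r) for r in bcnf_decompose(TEACH, FDS)])
# [['Instructor', 'Student'], ['Course', 'Instructor']]
```

The result matches decomposition D3.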
Chapter Summary

◼ Informal Design Guidelines for Relational Databases
◼ Functional Dependencies (FDs)
◼ Normal Forms (1NF, 2NF, 3NF) Based on Primary
Keys
◼ General Normal Form Definitions of 2NF and 3NF
(For Multiple Keys)
◼ BCNF (Boyce-Codd Normal Form)
