CS 380 Introduction To Database Systems: King Saud University
CS 380
Introduction to Database Systems
Functional Dependencies and Normalization for Relational Databases
Outline
Introduction
Informal Design Guidelines For Relation Schemas
Functional Dependencies
Inference Rules for Functional Dependencies
Normalization of Relations
Steps in Data Normalization
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form (BCNF)
Advantages of Normalization
Disadvantages of Normalization
Conclusion
Introduction
Relational database design is the grouping of attributes to form
good relation schemas.
There are two levels of relation schemas:
The logical user view level.
The storage base relation level.
Design is concerned mainly with base relations.
What are the criteria for good base relations?
Insertion Anomalies:
Occur when it is impossible to store a fact until another fact is
known.
Example:
Cannot insert a project unless an employee is assigned to it.
Cannot insert an employee unless he/she is assigned to a
project.
Deletion Anomalies:
Occur when the deletion of a fact causes other facts to be
deleted.
Example:
Deleting a project also deletes all the employees who work
on that project.
If an employee is the sole employee on a project, deleting
that employee also deletes the corresponding project.
Modification Anomalies:
Occur when a change in one fact requires multiple modifications.
Example: changing the name of project number P1 requires the
same update in the row of every employee working on that project.
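The deletion and modification anomalies above can be reproduced with a small sketch. The relation EMP_PROJ, its rows, and the helper functions are hypothetical illustrations, not part of the course material; the only assumption is that employee and project facts are stored together in one table.

```python
# Hypothetical denormalized relation EMP_PROJ(ssn, ename, pnumber, pname).
# Employee facts and project facts share the same rows.
emp_proj = [
    {"ssn": "111", "ename": "Ali",  "pnumber": "P1", "pname": "Payroll"},
    {"ssn": "222", "ename": "Sara", "pnumber": "P1", "pname": "Payroll"},
    {"ssn": "333", "ename": "Omar", "pnumber": "P2", "pname": "Billing"},
]


def rename_project(rows, pnumber, new_name):
    """Rename a project and return how many rows had to be changed."""
    changed = 0
    for row in rows:
        if row["pnumber"] == pnumber:
            row["pname"] = new_name
            changed += 1
    return changed


def delete_employee(rows, ssn):
    """Delete an employee and return the projects that vanish as a side effect."""
    before = {r["pnumber"] for r in rows}
    rows[:] = [r for r in rows if r["ssn"] != ssn]
    after = {r["pnumber"] for r in rows}
    return before - after


# Modification anomaly: one logical change touches one row per employee on P1.
updates_needed = rename_project(emp_proj, "P1", "Payroll v2")

# Deletion anomaly: removing the sole employee on P2 also removes project P2.
lost_projects = delete_employee(emp_proj, "333")
```

Here a single project rename needs one update per employee on the project, and deleting the only employee on P2 leaves no record that P2 ever existed.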
Functional Dependencies
Functional dependencies (FDs) are used to specify formal measures
of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and
interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y
if the value of X determines a unique value for Y.
Functional Dependencies
X → Y holds if whenever two tuples have the same value for X,
they must have the same value for Y.
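The two-tuple definition above translates directly into a check over a relation instance. The following is a minimal sketch: the function name fd_holds and the sample rows are made up for illustration. Note that an instance can only disprove a functional dependency; a dependency that happens to hold in one instance is still a claim about the meaning of the data, not something the instance proves.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples agreeing on X but not on Y violate the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample instance.
rows = [
    {"SSN": "111", "ENAME": "Ali",  "PNUMBER": "P1", "HOURS": 10},
    {"SSN": "111", "ENAME": "Ali",  "PNUMBER": "P2", "HOURS": 20},
    {"SSN": "222", "ENAME": "Sara", "PNUMBER": "P1", "HOURS": 5},
]

ssn_to_ename = fd_holds(rows, ["SSN"], ["ENAME"])
ssn_pnum_to_hours = fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"])
ssn_to_hours = fd_holds(rows, ["SSN"], ["HOURS"])
```

In this instance SSN → ENAME and {SSN, PNUMBER} → HOURS are not violated, while SSN → HOURS is violated by the two rows for SSN 111.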
Functional Dependencies
{SSN, PNUMBER} → HOURS
SSN → ENAME
PNUMBER → {PNAME, PLOCATION}
Functional Dependencies
TEXT → COURSE
TEACHER → COURSE
SSN → {ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER → {DNAME, DMGRSSN}
Some additional functional dependencies that we can infer are:
SSN → {DNAME, DMGRSSN}
DNUMBER → DNAME
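The inferred dependencies above can be checked mechanically with the standard attribute-closure algorithm, which repeatedly applies the given dependencies until no new attributes are added. This is a sketch for illustration: the function name closure and the encoding of dependencies as pairs of sets are assumptions, not notation from the slides.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left side, we also
            # determine the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two given dependencies.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

ssn_closure = closure({"SSN"}, F)
dnumber_closure = closure({"DNUMBER"}, F)

# X -> Y can be inferred from F exactly when Y is a subset of the closure of X.
infers_ssn_dept = {"DNAME", "DMGRSSN"} <= ssn_closure
infers_dnumber_dname = {"DNAME"} <= dnumber_closure
```

Both inferred dependencies come out as derivable: the closure of SSN reaches DNAME and DMGRSSN through DNUMBER, and the closure of DNUMBER contains DNAME.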
Normalization of Relations
Normalization is the process of decomposing relations with
anomalies to produce smaller, well-structured relations.
Normalization can be accomplished and understood in stages, each
of which corresponds to a normal form.
Normal form is a state of a relation that results from applying
simple rules regarding functional dependencies (or relationships
between attributes) to that relation.
Normalization of Relations
Normal forms:
First Normal Form (1NF).
Second Normal Form (2NF).
Third Normal Form (3NF).
Boyce-Codd Normal Form (BCNF).
Fourth Normal Form (4NF).
Fifth Normal Form (5NF).
Normalization of Relations
Normal forms, when considered in isolation from other factors, do
not guarantee a good database design.
The process of normalization through decomposition must also
confirm the existence of additional properties that the relational
schemas, taken together, should possess. These include two
properties:
The lossless join or nonadditive join property, which
guarantees that the spurious tuple generation problem does not
occur (critical).
The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relation resulting after decomposition (desirable).
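For a decomposition into exactly two relations R1 and R2, the lossless join property can be tested with a well-known criterion: the decomposition is lossless if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. The sketch below applies that test to the SSN and DNUMBER dependencies used earlier in these slides. The function names and the two candidate decompositions are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine
    all of R1 - R2 or all of R2 - R1."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

# Split on DNUMBER, which determines all department attributes.
good = is_lossless_binary(
    {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"},
    {"DNUMBER", "DNAME", "DMGRSSN"},
    F,
)

# Split on ENAME, which determines nothing: joining back on ENAME
# could produce spurious tuples.
bad = is_lossless_binary(
    {"SSN", "ENAME", "BDATE", "ADDRESS"},
    {"ENAME", "DNUMBER", "DNAME", "DMGRSSN"},
    F,
)
```

The first decomposition passes because DNUMBER determines DNAME and DMGRSSN. The second fails because ENAME determines no other attribute under F. This test covers only two-way decompositions; decompositions into more relations need a more general procedure.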
Nonprime attribute: an attribute that is not a member of any candidate key.
30
31
32
33
34
35
A holds in
37
39
40
41
Advantages of Normalization
The database is better organized overall.
The amount of unnecessary redundant data is reduced.
Data integrity is more easily maintained within the database.
The database and application design processes are much more
flexible.
Security is easier to manage.
Disadvantages of Normalization
Produces many tables, each with a relatively small number of columns.
Usually requires joins to put the information back together
in the way it needs to be used, effectively reversing the
normalization.
The extra joins can reduce performance (CPU, I/O, memory).
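The join cost mentioned above can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical; the schema simply splits employee, project, and assignment facts into three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE project  (pnumber TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE works_on (
    ssn     TEXT REFERENCES employee(ssn),
    pnumber TEXT REFERENCES project(pnumber),
    hours   REAL,
    PRIMARY KEY (ssn, pnumber)
);
INSERT INTO employee VALUES ('111', 'Ali'), ('222', 'Sara');
-- P2 has no employees yet, which the normalized schema allows.
INSERT INTO project  VALUES ('P1', 'Payroll'), ('P2', 'Billing');
INSERT INTO works_on VALUES ('111', 'P1', 10), ('222', 'P1', 5);
""")

# Rebuilding the combined employee/project view needs two joins.
rows = conn.execute("""
    SELECT e.ssn, e.ename, p.pnumber, p.pname, w.hours
    FROM works_on AS w
    JOIN employee AS e ON e.ssn = w.ssn
    JOIN project  AS p ON p.pnumber = w.pnumber
    ORDER BY e.ssn, p.pnumber
""").fetchall()

project_count = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
```

The trade-off is visible in both directions: a project can now be stored before anyone is assigned to it, but every query that needs names alongside hours has to join three tables.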
Conclusion
Data normalization is a bottom-up technique that ensures the basic
properties of the relational model:
No duplicate tuples.
No nested relations.
A more appropriate approach is to complement conceptual
modeling with data normalization.