0% found this document useful (0 votes)
82 views45 pages

CS 380 Introduction To Database Systems: King Saud University

This document outlines the concepts of functional dependencies and normalization for relational databases. It discusses informal design guidelines for relation schemas, including semantics of attributes, redundant information, null values, and spurious tuples. Functional dependencies are introduced as a way to formally measure schema quality. Normalization is presented as a process to decompose relations to eliminate anomalies through first, second, third, and Boyce-Codd normal forms. The document provides examples and definitions for key normalization concepts.

Uploaded by

farah_nishu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
82 views45 pages

CS 380 Introduction To Database Systems: King Saud University

This document outlines the concepts of functional dependencies and normalization for relational databases. It discusses informal design guidelines for relation schemas, including semantics of attributes, redundant information, null values, and spurious tuples. Functional dependencies are introduced as a way to formally measure schema quality. Normalization is presented as a process to decompose relations to eliminate anomalies through first, second, third, and Boyce-Codd normal forms. The document provides examples and definitions for key normalization concepts.

Uploaded by

farah_nishu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 45

King Saud University

College of Computer & Information Sciences


Computer Science Department

CS 380
Introduction to Database Systems
Functional Dependencies and Normalization for Relational Databases

Outline
Introduction
Informal Design Guidelines For Relation Schemas
Functional Dependencies
Inference Rules for Functional Dependencies
Normalization of Relations
Steps in Data Normalization
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form (BCNF)
Advantages of Normalization
Disadvantages of Normalization
Conclusion
1

Introduction
Relational database design: is the grouping of attributes to form
good relation schemas.
There are two levels of relation schemas:
The logical user view level.
The storage base relation level
Design is concerned mainly with base relations.
What are the criteria for good base relations?

Informal Design Guidelines For Relation Schemas


1. Semantics of the Relation Attributes
Whenever attributes are grouped to form a relation schema, it is
assumed that attributes belonging to one relation have certain
real-world meaning and a proper interpretation associated with them.
In general the easier it is to explain the semantics of the relation, the
better the relation schema design will be.

Guideline 1: Design a relation schema so that it is easy to explain


its meaning. Do not combine attributes from multiple entity types
and relationship types into a single relation.
3

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
One goal of schema design is to minimize the storage space used by
the base relations.
Grouping attributes into relation schemas has a significant effect on
storage space.
Mixing attributes of multiple entities may cause problems
Information is stored redundantly wasting storage.

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
DNUMBER

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
It is important to distinguish between redundancy and duplicated
data:
Duplicated data exists when an attribute has two or more
identical values in a table.
Redundancy exists if data can be deleted without any
information being lost.
Redundancy may be viewed as unnecessary duplication.

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
Another serious problem is the problem of update anomalies.
Update anomalies:
Insertion anomalies.
Deletion anomalies.
Modification anomalies.

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
EMP_PROJ
SSN PNumber Hours EName PName PLocation

Insertion Anomalies:
Occurs when it is impossible to store a fact until another fact is
known.
Example:
Cannot insert a project unless an employee is assigned to.
Cannot insert an employee unless he/she is assigned to a
project.

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
EMP_PROJ
SSN PNumber Hours EName PName PLocation

Delete anomalies:
Occurs when the deletion of a fact causes other facts to be
deleted.
Example:
When a project is deleted, it will result in deleting all the
employees who work on that project.
If an employee is the sole employee on a project, deleting
that employee would result in deleting the corresponding
project.
9

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
EMP_PROJ
SSN PNumber Hours EName PName PLocation

Modification Anomalies:
Occurs when a change in a fact causes multiple modifications to
be necessary.
Example: changing the name of project number P1 (for
example)
may cause this update to be made for all employees working on
that project.

10

Informal Design Guidelines For Relation Schemas


2. Redundant Information in Tuples and Update Anomalies
Guideline 2: Design the base relation schemas so that no insertion,
deletion, or modification anomalies are present in the relations. if
any anomalies are present, note them clearly and make sure that
the programs that update the database will operate correctly.

11

Informal Design Guidelines For Relation Schemas


3. Null Values in Tuples
In some schema designs many attributes may be grouped together
into a flat relation.
If many of the attributes do not apply to all tuples in the relation,
many null values will appear in those tuples.

Guideline 3: As far as possible, avoid placing attributes in a base


relation whose values may frequently be null. If nulls are
unavoidable, make sure that they apply in exceptional cases only
and do not apply to a majority of tuples in the relation.
12

Informal Design Guidelines For Relation Schemas


4. Generation of Spurious Tuples
Bad designs for a relational database may result in erroneous results
for certain JOIN operations.

13

Functional Dependencies
Functional dependencies (FDs) are used to specify formal measures
of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and
interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y
if the value of X determines a unique value for Y

16

Functional Dependencies
X
Y holds if whenever two tuples have the same value for X,
they must have the same value for Y
X

Y in R specifies a constraint on all relation instances r(R).

17

Functional Dependencies

{SSN, PNUMBER}
HOURS
SSN
ENAME
PNUMBER {PNAME, PLOCATION}

18

Functional Dependencies

TEXT
COURSE
TEACHER
COURSE

19

Inference Rules for Functional Dependencies


Given a set of FDs F, we can infer additional FDs that hold
whenever the FDs in F hold using the following rules:
IR1 (reflexive rule): If X
Y, then X
Y.
IR2 (augmentation rule): {X
Y} then XZ
YZ.
IR3 (transitive rule): {X
Y, Y
Z} then X
Z.
IR4 (decomposition, or projective, rule): {X
YZ} then X
Y.
IR5 (union, or additive, rule): {X
Y, X
Z} then X
YZ.
IR6 (pseudotransitive rule): {X
Y, WY
Z} then WX
Z.
Form a sound and complete set of inference rules.
The set of all dependencies that include F as well as all dependencies
that can be inferred from F is called the closure of F; denoted by F .
+

20

Inference Rules for Functional Dependencies

SSN
{ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER
{DNAME, DMGRSSN}
Some additional functional dependencies that we can infer are:
SSN
{DNAME, DMGRSSN}
DNUMBER
DNAME
21

Normalization of Relations
Normalization is the process of decomposing relations with
anomalies to produce smaller, well structured relations.
Normalization can be accomplished and understood in stages, each
of which corresponds to a normal form.
Normal form is a state of a relation that results from applying
simple rules regarding functional dependencies (or relationships
between attributes) to that relation.

22

Normalization of Relations
Normal forms:
First Normal Form (1NF).
Second Normal Form (2NF).
Third Normal Form (3NF).
Boyce-Codd Normal Form (BCNF).
Fourth Normal Form (4NF).
Fifth Normal Form (5NF).

A stronger definition of 3NF

Database design as practiced in industry today pays particular


attention to normalization only up to 3NF, BCNF, or 4NF.
The database designers need not normalize to the highest possible
normal form.
23

Normalization of Relations
Normal forms, when considered in isolation from other factors, do
not guarantee a good database design.
The process of normalization through decomposition must also
confirm the existence of additional properties that the relational
schemas, taken together, should process. These include two
properties:
The lossless join or nonadditive join property, which
guaranties that the spurious tuple generation problem does not
occur. Critical
The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relation resulting after decomposition. Desirable
24

Steps in Data Normalization


UNORMALISED ENTITY
Step 1: remove repeating groups
1st NORMAL FORM
Step 2: remove partial dependencies
2nd NORMAL FORM
Step 3: remove indirect dependencies
3rd NORMAL FORM
Step 4: remove multi-dependencies

Step 4: every determinate a key

4th NORMAL FORM

BOYCE-CODD NORMAL FORM


25

Steps in Data Normalization


1. First Normal Form
1NF is now considered to be part of the formal definition of a
relation in the basic (flat) relational model.
It was defined to disallow multivalued attributes, composite
attributes, and their combinations. (I.e. The only attribute values
permitted by 1NF are single atomic values).

26

Steps in Data Normalization


1. First Normal Form

27

Steps in Data Normalization


1. First Normal Form
There are three main techniques to achieve first normal form for
such a relation:
Remove the attribute DLOCATIONS that violates 1NF and
place it in a separate relation DEPT_LOCATIONS along with
the primary key DNUMBER of DEPARTMENT.
Expand the key so that there will be a separate tuple in the
original DEPARTMENT relation for each location of a
Redundancy
DEPARTMENT.
If a maximum number of values is known for the attribute (e.g. 3)
replace the DLOCATIONS attribute by three atomic attributes:
Null values
DLOCATION1, DLOCATION2, DLOCATION3.
28

Steps in Data Normalization


1. First Normal Form

Steps in Data Normalization


1. First Normal Form

Chapter 10: Functional Dependencies and Normalization for Relational Databases

29

Steps in Data Normalization


2. Second Normal Form

Not a member of
any candidate key

A relation is in 2NF if it is in 1NF and every nonprime attribute is


fully functionally dependent on the primary key.
I.e. remove any attributes which are dependent on part of the
compound key.
These attributes are put into a separate table along with that part of
the compound key.

30

Steps in Data Normalization


2. Second Normal Form

31

Steps in Data Normalization


3. Third Normal Form
A relation is in 3NF if it is in 2NF and no nonprime attribute A in R
is transitively dependent on the primary key.
I.e. Separate attributes which are dependent on another attribute
other than the primary key within the table.

32

Steps in Data Normalization


3. Third Normal Form

33

Normal Forms Defined Informally


1st normal form All attributes depend on the key.
2nd normal form All attributes depend on the whole key.
3rd normal form All attributes depend on nothing but the key.

General Definitions of Second and Third Normal Forms


The previous definitions consider the primary key only.
The following more general definitions take into account relations
with multiple candidate keys.
A relation is in 2NF if it is in 1NF and every nonprime attribute is
fully functionally dependent on every key.
A relation is in 3NF if it is in 2NF and if whenever a FD X
holds in R, then either:
X is a superkey of R, or
A is a prime attribute of R.

34

General Definitions of Second and Third Normal Forms

35

Boyce-Codd Normal Form (BCNF)


BCNF was proposed as a simpler form of 3NF, but it was found to
be stricter than 3NF.
A relation schema R is in BCNF if whenever a FD X
R, then X is a superkey of R.

A holds in

I.e. A relation is in BCNF if every determinant is a key.


Thus:
Every relation in BCNF is also in 3NF.
A relation in 3NF is not necessarily in BCNF.
The goal is to have each relation in BCNF (or 3NF).
36

Boyce-Codd Normal Form (BCNF) - Example 1

37

Boyce-Codd Normal Form (BCNF) - Example 2


DIRECTORY (EmployeeNo, EmployeeName, DepartmentName,
RoomNo, TelNo)
Where:
No employee works for more than one department.
Many employees may occupy one room.
Employee numbers are unique.
No room is shared by between departments.
Two FDs exist in the relation DIRECTORY:
EmployeeNo
{EmployeeName, DepartmentName, RoomNo,
TelNo}
RoomNo
DepartmentName
38

Boyce-Codd Normal Form (BCNF) - Example 2


All attributes are dependent on EmployeeNo (the primary key).
RoomNo is also a determinant, but not a candidate key.
This violates the definition of BCNF and therefore DIRECTOTY
must be decomposed into two relations:
EMP (EmployeeNo, EmployeeName, RoomNo, TelNo).
ALLOC (RoomNo, DepartmentName).

39

Boyce-Codd Normal Form (BCNF) - Example 3


TEACH(Student, Course, Instructor).
Two FDs exist in the relation TEACH:
{Student, Course}
Instructor
Instructor
Course
{Student, Course} is a candidate key for this relation.
This relation is in 3NF but not in BCNF.

40

Boyce-Codd Normal Form (BCNF) - Example 3


Three possible decomposition for relation TEACH:
{Student, instructor} and {Student, Course}
{Course, Instructor} and {Course, Student}
{Instructor, Course} and {instructor, Student}
All three decompositions will lose FD1.
Only the 3rd decomposition will not generate spurious tuples after
join.

41

Advantages of Normalization
Greater overall database organization will be gained.
The amount of unnecessary redundant data is reduced.
Data integrity is easily maintained within the database.
The database & application design processes are much more
flexible.
Security is easier to manage.

42

Disadvantages of Normalization
Produces lots of tables with a relatively small number of columns.
Probably requires joins in order to put the information back together
in the way it needs to be used - effectively reversing the
normalization.
Impacts computer performance (CPU, I/O, memory).

43

Conclusion
Data normalization is a bottom-up technique that ensures the basic
properties of the relational model:
No duplicate tuples.
No nested relations.
A more appropriate approach is to complement conceptual
modeling with data normalization.

44

You might also like