IM Module 3, Lesson 3
INFORMATION MANAGEMENT
MODULE 3, LESSON 3: NORMALIZATION OF DATABASE TABLES
• Normalization (continued)
– 2NF is better than 1NF; 3NF is better than 2NF
– For most business database design purposes,
3NF is as high as needed in normalization
– The highest level of normalization is not always the most
desirable
• Denormalization produces a lower normal form
– Increased performance but greater data
redundancy
The Need for Normalization
• Example: a company that manages building projects (Figure 6.1)
– Each project has its own project number, name, assigned
employees, etc.
– Each employee has an employee number, name, and job class
– The company charges its clients by billing the hours spent on
each contract
– The hourly billing rate depends on the employee’s position
– The total charge is a derived attribute and is not stored in the
table
– Periodically, a report is generated that contains information
such as that displayed in Table 6.1
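To make the derived-attribute point concrete, here is a minimal Python sketch. It assumes a hypothetical `Assignment` record and a hypothetical `HOURLY_RATE` lookup keyed by job class; neither name, nor the sample rates, comes from Figure 6.1. The total charge is exposed as a computed property, so it is calculated from stored hours and rate on demand rather than stored in the table.

```python
from dataclasses import dataclass

# Hypothetical hourly rates keyed by job class (position); the class
# names and amounts are sample values, not data from Figure 6.1.
HOURLY_RATE = {"Elect. Engineer": 80.00, "Programmer": 40.00}

@dataclass
class Assignment:
    """One stored row: the hours an employee billed to one project."""
    proj_num: int
    emp_num: int
    job_class: str
    hours: float

    @property
    def total_charge(self) -> float:
        # Derived attribute: recomputed from stored data on every access,
        # so it never has to be stored (or kept in sync) in the table.
        return round(self.hours * HOURLY_RATE[self.job_class], 2)

a = Assignment(proj_num=15, emp_num=103, job_class="Elect. Engineer", hours=12.5)
print(a.total_charge)  # 1000.0
```

Because the rate is looked up by job class, changing a rate in one place changes every computed charge, and no stored total can drift out of date.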
Tabular representation of the report format
A Sample Report Layout
The Need for Normalization
• The structure of the data set in Figure 6.1 does not handle data
very well
• The table structure appears to work, and the report is generated
with ease
• However, the report may yield different results depending on what
data anomaly has occurred
– An employee can be assigned to more than one project,
but each project includes only a single occurrence of
any one employee
• The relational database environment is suited to help the
designer avoid data integrity problems
The Need for Normalization
• PROJ_NUM, intended to be the PK or part of the PK, contains nulls
• JOB_CLASS values could be entered inconsistently (e.g., the same
job class abbreviated in different ways)
• Each time an employee is assigned to a project, all of the employee’s
information is duplicated
• Update anomalies – Modifying JOB_CLASS for employee
105 requires alterations in two records
• Insertion anomalies – Inserting a new employee who has not yet
been assigned to a project requires a phantom project
• Deletion anomalies – If a project has only one employee
associated with it and that employee’s data are deleted, the project
data are lost as well unless a phantom employee is created
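The update anomaly can be reproduced on a flat table. This Python sketch assumes hypothetical rows in which employee 105 appears on two projects; the helper names `update_job_class` and `inconsistent_employees` are illustrative only. A careless update that changes only one copy leaves the duplicated JOB_CLASS values in conflict.

```python
# Hypothetical flat table: one row per (project, employee) assignment,
# with the employee's JOB_CLASS duplicated in every row for that employee.
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 105, "JOB_CLASS": "Database Designer"},
    {"PROJ_NUM": 18, "EMP_NUM": 105, "JOB_CLASS": "Database Designer"},
    {"PROJ_NUM": 18, "EMP_NUM": 101, "JOB_CLASS": "Programmer"},
]

def update_job_class(table, emp_num, new_class, only_first=False):
    """Set JOB_CLASS for one employee and return how many rows changed.

    only_first=True simulates a careless update that misses later copies.
    """
    changed = 0
    for row in table:
        if row["EMP_NUM"] == emp_num:
            row["JOB_CLASS"] = new_class
            changed += 1
            if only_first:
                break
    return changed

def inconsistent_employees(table):
    """Return the EMP_NUMs whose duplicated JOB_CLASS values disagree."""
    first_seen, bad = {}, set()
    for row in table:
        expected = first_seen.setdefault(row["EMP_NUM"], row["JOB_CLASS"])
        if expected != row["JOB_CLASS"]:
            bad.add(row["EMP_NUM"])
    return bad

update_job_class(rows, 105, "Systems Analyst", only_first=True)
print(inconsistent_employees(rows))  # {105}
```

Every copy must be found and changed for the data to stay consistent; storing JOB_CLASS once per employee removes the need for such a check.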
The Normalization Process
• Objective: ensure that each table is a well-formed relation
– Each table represents a single subject
– No data item is unnecessarily stored in more than
one table
– All nonprime attributes in a table are dependent on the
primary key
– Each table is free of insertion, update, and deletion
anomalies
The Normalization Process
• Ensure that all tables are in at least 3NF
• Higher forms are not likely to be encountered in a business environment
• Normalization breaks a table into a new set of relations
based on identified dependencies
The Normalization Process
• Partial dependency
– Exists when there is a functional dependence in which the
determinant is only part of the primary key
– If (A,B) → (C,D), B → C, and (A,B) is the PK
• B → C is a partial dependency because only part of the PK, B, is needed
to determine the value of C
• Transitive dependency
– Exists when there are functional dependencies such that X → Y, Y →
Z, and X is the primary key
• X → Z is a transitive dependency because X determines the value of Z via
Y
• The existence of a functional dependence among non-prime attributes is
a sign of transitive dependency
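The two definitions can be expressed as a small check. This is a sketch under simplifying assumptions: attributes are plain strings, a dependency is a `(determinant, dependents)` pair, and the prime attributes default to the PK attributes (that is, a single candidate key). The hypothetical `classify` function flags B → C as partial and, for a PK of X, flags Y → Z as the dependency among non-prime attributes that signals the transitive dependency X → Z.

```python
def classify(fd, pk, prime=None):
    """Classify fd = (determinant, dependents) relative to the PK.

    prime defaults to the PK attributes (simplifying assumption: only
    one candidate key). Returns "full", "partial", "transitive", or "other".
    """
    determinant = set(fd[0])
    key = set(pk)
    prime = key if prime is None else set(prime)
    if determinant == key:
        return "full"        # the whole PK is needed: the desirable case
    if determinant < key:
        return "partial"     # only part of the PK determines the dependents
    if determinant.isdisjoint(prime):
        return "transitive"  # non-prime determinant: the Y in X -> Y -> Z
    return "other"           # mixes prime and non-prime attributes

print(classify((("A", "B"), ("C", "D")), ("A", "B")))  # full
print(classify((("B",), ("C",)), ("A", "B")))          # partial
print(classify((("Y",), ("Z",)), ("X",)))              # transitive
```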
Conversion to First Normal Form
• Repeating group
– A group of multiple entries of the same type that can exist
for any single key attribute occurrence
• A relational table must not contain repeating
groups
• Normalizing the table structure will reduce data
redundancies
• Conversion to 1NF is a three-step procedure
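A repeating group can be removed by giving each entry its own row. The sketch below assumes, hypothetically, that a project arrives as a nested record whose employees form a list; the function name `flatten` and the sample values are illustrative. Each output row carries its own copy of the project's single-valued attributes, which is exactly the redundancy that the later normal forms then address.

```python
def flatten(projects):
    """Return one flat row per (project, employee) pair.

    Each project is assumed to be a dict whose "employees" list is the
    repeating group; every other key is a single-valued project attribute.
    """
    rows = []
    for proj in projects:
        project_attrs = {k: v for k, v in proj.items() if k != "employees"}
        for emp in proj["employees"]:
            # Each row carries its own copy of the project attributes.
            rows.append({**project_attrs, **emp})
    return rows

# Hypothetical nested record with a repeating group of employees.
projects = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen",
     "employees": [{"EMP_NUM": 103, "JOB_CLASS": "Elect. Engineer"},
                   {"EMP_NUM": 101, "JOB_CLASS": "Database Designer"}]},
]
for row in flatten(projects):
    print(row)
```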
Conversion to First Normal Form
• Step 1: Eliminate the Repeating Groups
– Eliminate nulls: each repeating group attribute
contains an appropriate data value
• Step 2: Identify the Primary Key
– The PK must uniquely identify each attribute value (i.e., each row)
– A new key must be composed (e.g., by combining attributes)
• Step 3: Identify All Dependencies
– Dependencies are depicted with a dependency diagram
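Steps 1 and 2 can be sketched on a report whose project columns are left blank after the first row of each project. Assumptions: rows are Python dicts, blanks are `None`, and the helpers `fill_down` and `identifies_rows` are hypothetical names. The composite key (PROJ_NUM, EMP_NUM) is tried because each project includes only a single occurrence of any one employee.

```python
def fill_down(rows, columns):
    """Step 1: replace None in `columns` with the last non-null value above it."""
    filled, last = [], {}
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = last.get(col)
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled

def identifies_rows(rows, key):
    """Step 2 check: True if the `key` columns are non-null and unique in `rows`."""
    seen = set()
    for row in rows:
        value = tuple(row.get(col) for col in key)
        if None in value or value in seen:
            return False
        seen.add(value)
    return True

# Hypothetical report rows: project columns are blank after the first row.
report = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 105},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 103},
]
table = fill_down(report, ["PROJ_NUM", "PROJ_NAME"])
print(identifies_rows(table, ["PROJ_NUM"]))             # False
print(identifies_rows(table, ["PROJ_NUM", "EMP_NUM"]))  # True
```

`identifies_rows` only checks the sample data; a true PK must be unique for every row the table could ever hold, which is a design decision rather than something a scan can prove.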
Conversion to First Normal Form
• Dependency diagram:
– Depicts all dependencies found within given table
structure
– Helpful in getting a bird’s-eye view of all relationships
among a table’s attributes
– Makes it less likely that you will overlook an important
dependency
– The arrows above the attributes indicate desirable
dependencies (i.e., based on the PK)
– The arrows below the attributes indicate less
desirable dependencies (partial and transitive)
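A dependency diagram can also be kept as data. This sketch assumes hypothetical attribute names for the building-project example (CHG_HOUR for the hourly rate and HOURS for hours billed are not given in the text) and a composite PK of (PROJ_NUM, EMP_NUM). Dependencies whose determinant is the whole PK go above the attributes; all others, the partial and transitive ones, go below.

```python
# Hypothetical attribute names for the building-project example.
PK = ("PROJ_NUM", "EMP_NUM")
DEPENDENCIES = [
    (PK, ("PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")),
    (("PROJ_NUM",), ("PROJ_NAME",)),                        # partial
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),  # partial
    (("JOB_CLASS",), ("CHG_HOUR",)),                        # transitive
]

def diagram(pk, deps):
    """Return (above, below): arrows based on the whole PK, and all others."""
    above, below = [], []
    for determinant, dependents in deps:
        arrow = f"{', '.join(determinant)} -> {', '.join(dependents)}"
        if set(determinant) == set(pk):
            above.append(arrow)  # desirable: based on the PK
        else:
            below.append(arrow)  # less desirable: partial or transitive
    return above, below

above, below = diagram(PK, DEPENDENCIES)
print("Above the attributes:")
for arrow in above:
    print("  " + arrow)
print("Below the attributes:")
for arrow in below:
    print("  " + arrow)
```

Listing every dependency explicitly serves the same purpose as drawing the diagram: any arrow in the "below" group marks a dependency that 2NF or 3NF conversion will need to remove.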
Conversion to First Normal Form
PROJECT(PROJ_NUM, PROJ_NAME, EMP_NUM)