0% found this document useful (0 votes)
21 views23 pages

Normalization of Database Tables

The document discusses database normalization and how to convert database tables to first, second, and third normal forms. It explains the concepts of functional dependencies and different normal forms. The goal of normalization is to minimize data redundancy and anomalies by structuring tables correctly.

Uploaded by

crystaljhoyl
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views23 pages

Normalization of Database Tables

The document discusses database normalization and how to convert database tables to first, second, and third normal forms. It explains the concepts of functional dependencies and different normal forms. The goal of normalization is to minimize data redundancy and anomalies by structuring tables correctly.

Uploaded by

crystaljhoyl
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

NORMALIZATION OF DATABASE

TABLES
Lesson 5

Rueda Street, Calbayog City, Samar, Philippines | +63 (055) 533 9857 | [email protected] | www.nwssu.edu.ph
Database Tables and Normalization
The table is the basic building block of database design. Consequently, the table’s structure is of great
interest.
It is possible to create poor table structures even in a good database design. How do you recognize a poor
table structure, and how do you produce a good table? The answer to both questions involves
normalization.
Normalization is a process for evaluating and correcting table structures to minimize data redundancies,
thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes
to tables based on the concept of determination discussed in Lesson 3, The Relational Database Model.
Normalization works through a series of stages called normal forms. The first three stages are described as
first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of
view, 2NF is better than 1NF, and 3NF is better than 2NF. For most purposes in business database design,
3NF is as high as you need to go in the normalization process. However, you will discover that properly
designed 3NF structures also meet the requirements of fourth normal form (4NF)
.
Although normalization is a very important ingredient in database design, you should not assume that the
highest level of normalization is always the most desirable. Generally, the higher the normal form, the more
relational join operations you need to produce a specified output. Also, more resources are required by the
database system to respond to end-user queries.
A successful design must also consider end-user demand for fast performance. Therefore, you will
occasionally need to denormalize some portions of a database design to meet performance requirements.
Denormalization produces a lower normal form; that is, a 3NF will be converted to a 2NF through
denormalization. However, the price you pay for increased performance through denormalization is greater
data redundancy.
.
The Need for Normalization
Normalization is typically used in conjunction with the entity relationship modeling that you learned in the
previous lessons.
Database designers commonly use normalization in two situations. When designing a new database
structure based on the business requirements of the end users, the database designer will construct a data
model using a technique such as Crow’s Foot notation ERDs.
After the initial design is complete, the designer can use normalization to analyze the relationships among
the attributes within each entity and determine if the structure can be improved through normalization.
 By analyzing relationships among the attributes or fields in the data structure, the database designer can
use the normalization process to improve the existing data structure and create an appropriate database
design. Whether you are designing a new database structure or modifying an existing one, the
normalization process is the same.

.
To get a better idea of the normalization process, consider the simplified database activities of a
construction company that manages several building projects. Each project has its own project number,
name, assigned employees, and so on. Each employee has an employee number, name, and job
classification, such as engineer or computer technician.
The company charges its clients by billing the hours spent on each contract. The hourly billing rate
is dependent on the employee’s position. For example, one hour of computer technician time is billed at a
different rate than one hour of engineer time. Periodically, a report is generated that contains the
information displayed in Table 6.1.
The easiest short-term way to generate the required report might seem to be a table whose contents
correspond to the reporting requirements. (See Figure 6.1.)

.
Unfortunately, the structure of the dataset in Figure 6.1 does not conform to the requirements discussed in
Lesson 3. Consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be a primary key (PK) or at least a part of
a PK, but it contains nulls. Given the preceding discussion, you know that PROJ_NUM + EMP_NUM
will define each row.
2. The table entries invite data inconsistencies. For example, the JOB_CLASS value “Elect. Engineer”
might be entered as “Elect.Eng.” in some cases, “El. Eng.” in others, and “EE” in still others.
3. The table displays data redundancies that yield the following anomalies:
a. Update anomalies. Modifying the JOB_CLASS for employee number 105 requires many potential
alterations, one for each EMP_NUM = 105.
b. Insertion anomalies. Just to complete a row definition, a new employee must be assigned to a project. If
the employee is not yet assigned, a phantom project must be created to complete the employee data entry.
c. Deletion anomalies. Suppose that only one employee is associated with a given project. If that employee
leaves the company and the employee data is deleted, the project information will also be deleted. To
prevent the loss of the project information, a fictitious employee must be created.
The Normalization Process
The objective of normalization is to ensure that each table conforms to the concept of well-formed
relations—in other words, tables that have the following characteristics:
• Each table represents a single subject. For example, a COURSE table will contain only data that directly
pertain to courses. Similarly, a STUDENT table will contain only student data.
• No data item will be unnecessarily stored in more than one table (in short, tables have minimum
controlled redundancy). The reason for this requirement is to ensure that the data is updated in only
one place.
• All nonprime attributes in a table are dependent on the primary key—the entire primary key and
nothing but the primary key. The reason for this requirement is to ensure that the data is uniquely
identifiable by a primary key value.
• Each table is void of insertion, update, or deletion anomalies, which ensures the integrity and
consistency of the data.

.
To accomplish the objective, the normalization process takes you through the steps that lead to
successively higher normal forms. The most common normal forms and their basic characteristic are listed
in Table 6.2:

The concept of keys is central to the discussion of normalization.


From the data modeler’s point of view, the objective of normalization is to ensure that all tables are at least
in third normal form (3NF).
Functional Dependence. Before outlining the normalization process, it is a good idea to review the
concepts of determination and functional dependence that were covered in detail in Chapter 3. Table 6.3
summarizes the main concepts.

.
Two types of functional dependencies that are of special interest in normalization are partial dependencies
and transitive dependencies. A partial dependency exists when there is a functional dependence in which
the determinant is only part of the primary key (remember the assumption that there is only one candidate
key).
• For example, if (A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B
→ C is a partial dependency because only part of the primary key (B) is needed to determine the value
of C. Partial dependencies tend to be straightforward and easy to identify.
A transitive dependency exists when there are functional dependencies such that X → Y, Y → Z, and X is
the primary key. In that case, the dependency X → Z is a transitive dependency because X determines the
value of Z via Y. Unlike partial dependencies, transitive dependencies are more difficult to identify among a
set of data. Fortunately, there is an effective way to identify transitive dependencies: they occur only when
a functional dependence exists among nonprime attributes. In the previous example, the actual transitive
dependency is X → Z.
However, the dependency Y → Z signals that a transitive dependency exists. Hence, throughout the
discussion of the normalization process, the existence of a functional dependence among nonprime
attributes will be considered a sign of a transitive dependency. To address the problems related to
transitive dependencies, changes to the table structure are made based on the functional dependence that
signals the transitive dependency’s existence. Therefore, to simplify the description of normalization, from
this point forward the signaling dependency will be called the transitive dependency.
Conversion To First Normal Form
Eliminate repeating group. A repeating group derives its name from the fact that a group of multiple
entries of the same type can exist for any single key attribute occurrence.
In Figure 6.1, note that each single project number (PROJ_NUM) occurrence can reference a
group of related data entries. For example, the Evergreen project (PROJ_NUM = 15) shows
five entries at this point—and those entries are related because they each share the
PROJ_NUM = 15 characteristic. Each time a new record is entered for the Evergreen project,
the number of entries in the group grows by one. .
.

.
Conversion To Second Normal Form
Steps for conversion to the Second Normal Form.
 Step 1: Make New Tables to Eliminate Partial Dependencies
 Step 2: Reassign Corresponding Dependent Attributes

PROJECT (PROJ_NUM, PROJ_NAME)


EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

.
Conversion To Third Normal Form
Steps for conversion to the Third Normal Form.
Step 1: Make New Tables to Eliminate Transitive Dependencies
Step 2: Reassign Corresponding Dependent Attributes
EMP_NUM → EMP_NAME, JOB_CLASS

.
.

.
In other words, after the 3NF conversion has been completed, your database will contain four
tables:.
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

.
End of Lesson 5.

You might also like