CHAPTER 3:
DATA NORMALIZATION
In this chapter, you will learn:
❑ What normalization is and what role it plays in the database design
process
❑ About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
❑ How normal forms can be transformed from lower normal forms to
higher normal forms
❑ That normalization and ER modeling are used concurrently to
produce a good database design
❑ That some situations require denormalization to generate
information efficiently
DATABASE TABLE
AND NORMALIZATION
• The table is the basic building block of database design.
• Consequently, the table’s structure is of great interest.
• Ideally, the database design process discussed in Entity
Relationship (ER) Modeling yields good table structures.
• Yet it is possible to create poor table structures even in a
good database design.
• So how do you recognize a poor table structure, and how do
you produce a good table?
• The answer to both questions involves normalization.
Normalization is a process for evaluating and correcting table
structures to minimize data redundancies, thereby reducing the
likelihood of data anomalies.
The company charges its clients by billing the hours spent on each contract. The
hourly billing rate is dependent on the employee’s position. For example, one hour
of computer technician time is billed at a different rate than one hour of engineer
time. Periodically, a report is generated that contains the information displayed in
the second Table.
THE NORMALIZATION
PROCESS
Tabular representation of the report format
Table name: RPT_FORMAT    Database name: ConstructCo_DB
Let’s consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be a primary key or
at least a part of a PK, but it contains nulls. (Given the preceding discussion,
you know that PROJ_NUM + EMP_NUM will define each row.)
2. The table entries invite data inconsistencies. For example, the JOB_CLASS
value “Elect. Engineer” might be entered as “Elect. Eng.” in some cases,
“El. Eng.” in others, and “EE” in still others.
3. The table displays data redundancies. Those data redundancies yield the
following anomalies:
• Update anomalies
• Insertion anomalies
• Deletion anomalies
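The update anomaly can be illustrated with a minimal Python sketch. The rows, employee numbers, and rates below are invented for illustration and are not taken from the report; the idea is only that a rate stored redundantly on many rows can be changed on one row and left stale on another.

```python
# Hypothetical rows in the unnormalized report layout: the hourly rate
# (CHG_HOUR) is repeated on every row that mentions the job class.
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 101, "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0},
    {"PROJ_NUM": 18, "EMP_NUM": 101, "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0},
    {"PROJ_NUM": 18, "EMP_NUM": 102, "JOB_CLASS": "Technician", "CHG_HOUR": 45.0},
]


def rates_by_job_class(table):
    """Collect the set of distinct hourly rates stored for each job class."""
    rates = {}
    for row in table:
        rates.setdefault(row["JOB_CLASS"], set()).add(row["CHG_HOUR"])
    return rates


def inconsistent_job_classes(table):
    """Return the job classes that are stored with more than one rate."""
    return sorted(jc for jc, r in rates_by_job_class(table).items() if len(r) > 1)


# An update that touches only one of the redundant copies.
rows[0]["CHG_HOUR"] = 85.0
print(inconsistent_job_classes(rows))
```

After the partial update, the same job class carries two different rates, which is exactly the kind of inconsistency that redundancy invites.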
The objective of normalization is to ensure that each table conforms to
the concept of well-formed relations – that is, tables that have the
following characteristics:
• Each table represents a single subject.
• No data item will be unnecessarily stored in more than one table
(in short, tables have minimum controlled redundancy).
• All nonprime attributes in a table are dependent on the primary
key—the entire primary key and nothing but the primary key.
• Each table is void of insertion, update, or deletion anomalies.
To accomplish that objective, the normalization process takes you
through the steps that lead to successively higher normal forms. The
most common normal forms and their basic characteristics are listed in
the following table.
NORMAL FORM                      CHARACTERISTICS
First normal form (1NF)          Table format, no repeating groups, and PK identified
Second normal form (2NF)         1NF and no partial dependencies
Third normal form (3NF)          2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)    Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF)         3NF and no independent multivalued dependencies
A partial dependency exists when there is a functional dependence in which the
determinant is only part of the primary key (remember we are assuming there is
only one candidate key).
For example, if (A, B) → (C, D), B → C, and (A, B) is the primary key, then the
functional dependence B → C is a partial dependency because only part of the
primary key (B) is needed to determine the value of C.
A transitive dependency exists when there are functional dependencies such that
X → Y, Y → Z, and X is the primary key. In that case, the dependency X → Z is a
transitive dependency because X determines the value of Z via Y.
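The two definitions above can be expressed as a small classification rule. The following Python sketch is only an illustration: the function name and the returned labels are made up here, and it keeps the text's assumption that the primary key is the only candidate key.

```python
def classify_dependency(determinant, dependent, primary_key):
    """Classify determinant -> dependent relative to the primary key.

    Assumes the primary key is the only candidate key of the table.
    """
    det, dep, pk = frozenset(determinant), frozenset(dependent), frozenset(primary_key)
    if det == pk:
        return "full"          # the whole primary key is the determinant
    if det < pk and dep.isdisjoint(pk):
        return "partial"       # only part of the primary key is needed
    if det.isdisjoint(pk) and dep.isdisjoint(pk):
        return "transitive"    # a nonkey attribute determines a nonkey attribute
    return "other"


# (A, B) is the primary key, as in the example above.
print(classify_dependency({"A", "B"}, {"C", "D"}, {"A", "B"}))
print(classify_dependency({"B"}, {"C"}, {"A", "B"}))
# X is the primary key and Y -> Z is the link that makes X -> Z transitive.
print(classify_dependency({"Y"}, {"Z"}, {"X"}))
```

Note that the sketch flags Y → Z, the nonkey-to-nonkey dependency, as the signal of a transitive dependency; X → Z itself follows from X → Y and Y → Z.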
CONVERSION TO
FIRST NORMAL FORM
A repeating group derives its name from the fact that a group of multiple
entries of the same type can exist for any single key attribute occurrence.
For example, the Evergreen project (PROJ_NUM = 15) shows five entries at
this point—and those entries are related because they each share the
PROJ_NUM = 15 characteristic.
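A rough Python sketch of eliminating such a repeating group follows. It assumes the report leaves PROJ_NUM blank (None) on every row after the first row of a group; the employee numbers and the second project number are hypothetical, and the helper names are invented for this example.

```python
def eliminate_nulls(rows, group_key):
    """Copy the group key value down into rows where it was left blank (None)."""
    filled, current = [], None
    for row in rows:
        if row[group_key] is not None:
            current = row[group_key]
        filled.append({**row, group_key: current})
    return filled


def is_unique_key(rows, key_columns):
    """True if no key value is None and no two rows share the same key."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    return all(None not in k for k in keys) and len(set(keys)) == len(keys)


# Hypothetical report rows: PROJ_NUM appears only on the first row of a group.
report = [
    {"PROJ_NUM": 15, "EMP_NUM": 103},
    {"PROJ_NUM": None, "EMP_NUM": 101},
    {"PROJ_NUM": None, "EMP_NUM": 105},
    {"PROJ_NUM": 18, "EMP_NUM": 114},
    {"PROJ_NUM": None, "EMP_NUM": 101},
]
flat = eliminate_nulls(report, "PROJ_NUM")
print(is_unique_key(report, ["PROJ_NUM", "EMP_NUM"]))
print(is_unique_key(flat, ["PROJ_NUM", "EMP_NUM"]))
```

Once every row carries its own PROJ_NUM, the combination PROJ_NUM + EMP_NUM can serve as the primary key.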
Partial dependency: PROJ_NUM → PROJ_NAME
Transitive dependency: JOB_CLASS → CHG_HOUR
CONVERSION TO
SECOND NORMAL FORM
Converting to 2NF is done only when the 1NF has a composite primary key. If the
1NF has a single-attribute primary key, then the table is automatically in 2NF.
The 1NF-to-2NF conversion is simple. Starting with the 1NF format, you do the
following:
1. Make New Tables to Eliminate Partial Dependencies
For each component of the primary key that acts as a determinant in a partial
dependency, create a new table with a copy of that component as the primary key.
PROJ_NUM (PROJECT)
EMP_NUM (EMPLOYEE)
PROJ_NUM EMP_NUM (ASSIGNMENT)
CONVERSION TO
SECOND NORMAL FORM
2. Reassign Corresponding Dependent Attributes
Move the dependent attributes of each partial dependency into the new table
whose primary key is that dependency's determinant. Any attributes that are not
dependent in a partial dependency remain in the original table. The three tables
that result from the conversion to 2NF are given appropriate names (PROJECT,
EMPLOYEE, and ASSIGNMENT), each described by its own relational schema.
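The two steps can be sketched in Python as projections of the 1NF rows onto the columns of each new table. This is an illustration under stated assumptions, not the schemas from the slides: the sample rows are invented, HOURS is a hypothetical column for hours billed, and placing JOB_CLASS and CHG_HOUR with EMP_NUM assumes that an employee's position depends on the employee alone.

```python
def projection(rows, columns):
    """Relational projection: keep the given columns and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# Hypothetical 1NF rows with primary key PROJ_NUM + EMP_NUM.
rows_1nf = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101,
     "JOB_CLASS": "Technician", "CHG_HOUR": 45.0, "HOURS": 12.0},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103,
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0, "HOURS": 20.5},
    {"PROJ_NUM": 18, "PROJ_NAME": "Sample Project", "EMP_NUM": 101,
     "JOB_CLASS": "Technician", "CHG_HOUR": 45.0, "HOURS": 7.5},
]

# Attributes that depend on only part of the key move with their determinant.
project_table = projection(rows_1nf, ["PROJ_NUM", "PROJ_NAME"])
employee_table = projection(rows_1nf, ["EMP_NUM", "JOB_CLASS", "CHG_HOUR"])
# Attributes that depend on the whole key stay with the composite key.
assignment_table = projection(rows_1nf, ["PROJ_NUM", "EMP_NUM", "HOURS"])

print(len(project_table), len(employee_table), len(assignment_table))
```

Each project name and each employee's job class is now stored once, while the assignment table keeps one row per project and employee pair.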
HIGHER LEVEL
NORMAL FORMS
The table structure in Figure 6.7 has the following functional dependencies:
A + B → C, D
A + C → B, D
C → B
To convert the table structure in Figure 6.7 into table structures that are
in 3NF and in BCNF:
• First, change the primary key to A + C. That is an appropriate action
because the dependency C → B means that C is, in effect, a
superset of B.
• At this point, the table is in 1NF because it contains a partial
dependency, C → B.
• Next, follow the standard decomposition procedures to produce the
results.
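The dependencies stated for this structure (A + B → C, D; A + C → B, D; C → B) can be checked mechanically. The sketch below uses the standard attribute-closure technique to find the candidate keys and the BCNF violation, and then performs one textbook decomposition step. The function names are invented for this example, and single-letter attribute names are written as strings so that "AB" stands for the set {A, B}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # combo contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


def bcnf_violations(attributes, fds):
    """Nontrivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]


def decompose_on(attributes, fds, violation):
    """Split the table on one violating dependency (a single decomposition step)."""
    lhs, _ = violation
    determined = closure(lhs, fds)
    return determined, (set(attributes) - determined) | set(lhs)


attributes = {"A", "B", "C", "D"}
fds = [("AB", "CD"), ("AC", "BD"), ("C", "B")]
print(candidate_keys(attributes, fds))
print(bcnf_violations(attributes, fds))
print(decompose_on(attributes, fds, ("C", "B")))
```

Under these dependencies both A + B and A + C are candidate keys, C → B is the only dependency whose determinant is not a candidate key, and splitting on it gives one table over (C, B) and another over (A, C, D).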
To see how this procedure can be applied to an actual problem, examine the
sample data in Table 6.5.
Table 6.5 reflects the following conditions:
• Each CLASS_CODE identifies a class uniquely. This condition illustrates the
case in which a course might generate many classes. For example, a course
labeled CS 202 might be taught in two classes (sections), each identified by a
unique code to facilitate registration. Thus, the CLASS_CODE 32456 might
identify CS 202, class section 1, while the CLASS_CODE 32457 might identify
CS 202, class section 2. Or the CLASS_CODE 28458 might identify IT 203,
class section 5.
• A student can take many classes. Note, for example, that student 125 has taken
both 21334 and 32456, earning the grades A and C, respectively.
• A staff member can teach many classes, but each class is taught by only one
staff member. Note that staff member 20 teaches the classes identified as 32456
and 28458.
The structure shown in Table 6.5 is reflected in Panel A of Figure 6.9:
CLASS_CODE → STAFF_ID
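Whether a dependency such as CLASS_CODE → STAFF_ID holds in a set of rows can be tested directly. The Python sketch below is illustrative only: STU_ID and ENROLL_GRADE are assumed column names, staff member 25 and student 135 are hypothetical, and only the facts about student 125 and staff member 20 come from the conditions listed above.

```python
def fd_holds(rows, determinant, dependent):
    """True if every determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Partly hypothetical sample rows (see the assumptions stated above).
enrollments = [
    {"STU_ID": 125, "STAFF_ID": 25, "CLASS_CODE": 21334, "ENROLL_GRADE": "A"},
    {"STU_ID": 125, "STAFF_ID": 20, "CLASS_CODE": 32456, "ENROLL_GRADE": "C"},
    {"STU_ID": 135, "STAFF_ID": 20, "CLASS_CODE": 28458, "ENROLL_GRADE": "B"},
    {"STU_ID": 135, "STAFF_ID": 20, "CLASS_CODE": 32456, "ENROLL_GRADE": "B"},
]

# Each class is taught by only one staff member.
print(fd_holds(enrollments, ["CLASS_CODE"], ["STAFF_ID"]))
# A staff member can teach many classes, so the reverse does not hold.
print(fd_holds(enrollments, ["STAFF_ID"], ["CLASS_CODE"]))
```

A check like this can only refute a dependency on the data at hand; whether the dependency is truly part of the design depends on the business rules, not on one sample.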
Conclusion
We covered the basics of data normalization and how
these techniques can help us design better, more
efficient databases. By understanding the principles of
normalization, we can create accurate and
manageable databases that meet the needs of any
business or organization. We hope you found this
presentation informative and helpful!