Data Normalization

This document discusses database normalization. It defines normalization as a process for evaluating and correcting table structures to minimize data redundancies and reduce anomalies. The document outlines the normal forms of 1NF, 2NF, 3NF, BCNF and 4NF. It provides an example of normalizing a table with project and employee data through the stages of 1NF, 2NF and 3NF to eliminate issues like repeating groups, partial and transitive dependencies.

Uploaded by

Aaron Jude Pael
Copyright
© All Rights Reserved

CHAPTER 3:
DATA NORMALIZATION
In this chapter, you will learn:
❑ What normalization is and what role it plays in the database design
process
❑ About the normal forms 1NF, 2NF, 3NF, BCNF and 4NF
❑ How normal forms can be transformed from lower normal forms to
higher normal forms
❑ That normalization and ER modeling are used concurrently to
produce a good database design
❑ That some situations require denormalization to generate
information efficiently
DATABASE TABLE
AND NORMALIZATION
• The table is the basic building block of database design.
• Consequently, the table’s structure is of great interest.
Ideally, the database design process discussed in Entity
Relationship (ER) Modeling, yields good table structures.
Yet it is possible to create poor table structures even in a
good database design.
• So how do you recognize a poor table structure, and how do
you produce a good table?
• The answer to both questions involves normalization.
Normalization is a process for evaluating and correcting table
structures to minimize data redundancies, thereby reducing the
likelihood of data anomalies.

Normalization works through a series of stages called normal forms.


These are:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Fourth normal form (4NF)
• Although normalization is a very important database design
ingredient, you should not assume that the highest level of
normalization is always the most desirable.
• A successful design must also consider end-user demand for fast
performance.
• Therefore, you will occasionally be expected to denormalize some
portions of a database design in order to meet performance
requirements.
Denormalization produces a lower normal form; that is, a 3NF will be
converted to a 2NF through denormalization.
THE NEED FOR
NORMALIZATION
Normalization is typically used in conjunction with entity relationship
modeling.
There are two common situations in which database designers use
normalization:
1. When designing a new database structure based on the business
requirements of the end users, the database designer constructs a
data model using a technique such as Crow’s Foot notation ERDs, and
then uses normalization to analyze and improve the resulting table
structures.
2. Database designers are often asked to modify existing data structures
that can be in the form of flat files, spreadsheets, or older database
structures. Normalization is then used to analyze and improve those
structures.
THE NORMALIZATION
PROCESS
To get a better idea of the normalization process, consider the simplified database
activities of a construction company that manages several building projects.
Each project has its own project number, name, employees assigned to it,
and so on. Each employee has an employee number, name, and job
classification, such as engineer or computer technician, as shown in the next
Table.

The company charges its clients by billing the hours spent on each contract. The
hourly billing rate is dependent on the employee’s position. For example, one hour
of computer technician time is billed at a different rate than one hour of engineer
time. Periodically, a report is generated that contains the information displayed in
the second Table.
[Table: Tabular representation of the report format. Table name: RPT_FORMAT. Database name: ConstructCo_DB]
Let’s consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be a primary key or
at least a part of a PK, but it contains nulls. (Given the preceding discussion,
you know that PROJ_NUM + EMP_NUM will define each row.)
2. The table entries invite data inconsistencies. For example, the JOB_CLASS
value “Elect. Engineer” might be entered as “Elect. Eng.” in some cases,
“El. Eng.” in others, and “EE” in still others.
3. The table displays data redundancies. Those data redundancies yield the
following anomalies:
• Update anomalies
• Insertion anomalies
• Deletion anomalies
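The update anomaly can be illustrated with a minimal Python sketch. The rows, employee numbers, and rate values below are hypothetical sample data, not the contents of the report table; only the column names come from the text. The helper name inconsistent_job_classes is also an assumption made for this illustration.

```python
# Hypothetical rows in the unnormalized report layout: the charge rate for a
# job class is repeated in every row that mentions that job class.
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 101, "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0},
    {"PROJ_NUM": 18, "EMP_NUM": 101, "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0},
    {"PROJ_NUM": 18, "EMP_NUM": 102, "JOB_CLASS": "Programmer", "CHG_HOUR": 35.0},
]


def inconsistent_job_classes(table):
    """Return job classes that appear with more than one charge rate."""
    rates = {}
    for row in table:
        rates.setdefault(row["JOB_CLASS"], set()).add(row["CHG_HOUR"])
    return sorted(job for job, seen in rates.items() if len(seen) > 1)


before = inconsistent_job_classes(rows)

# Update anomaly: the rate is changed in one row but not in the other copy.
rows[0]["CHG_HOUR"] = 85.0
after = inconsistent_job_classes(rows)

print(before)
print(after)
```

Because the same fact is stored in several rows, a single missed row is enough to leave the table contradicting itself.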
The objective of normalization is to ensure that each table conforms to
the concept of well-formed relations – that is, tables that have the
following characteristics:
• Each table represents a single subject.
• No data item will be unnecessarily stored in more than one table
(in short, tables have minimum controlled redundancy).
• All nonprime attributes in a table are dependent on the primary
key—the entire primary key and nothing but the primary key.
• Each table is void of insertion, update, or deletion anomalies.
To accomplish the objective, the normalization process takes you
through the steps that lead to successively higher normal forms. The
most common normal forms and their basic characteristics are listed in
the following table.
NORMAL FORM                       CHARACTERISTICS
First normal form (1NF)           Table format, no repeating groups, and PK identified
Second normal form (2NF)          1NF and no partial dependencies
Third normal form (3NF)           2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)     Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF)          3NF and no independent multivalued dependencies
A partial dependency exists when there is a functional dependence in which the
determinant is only part of the primary key (remember we are assuming there is
only one candidate key).
For example:
if (A, B) → (C,D), B → C, and (A, B) is the primary key, then the functional
dependence B → C is a partial dependency because only part of the primary
key (B) is needed to determine the value of C.

A transitive dependency exists when there are functional dependencies such that
X → Y, Y → Z, and X is the primary key. In that case, the dependency X → Z is a
transitive dependency because X determines the value of Z via Y.
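As a rough illustration of what a functional dependence means at the data level, the following Python sketch tests whether a dependency is contradicted by a set of rows. The function name holds and the sample rows are assumptions made for this example; sample data can only refute a dependency, never prove that it holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows over attributes A, B, C, D with primary key (A, B).
sample = [
    {"A": 1, "B": "x", "C": 10, "D": "p"},
    {"A": 2, "B": "x", "C": 10, "D": "q"},
    {"A": 1, "B": "y", "C": 20, "D": "r"},
]

full_key = holds(sample, ["A", "B"], ["C", "D"])  # (A, B) -> (C, D)
partial = holds(sample, ["B"], ["C"])             # B -> C, a partial dependency
not_fd = holds(sample, ["A"], ["C"])              # A -> C is contradicted here
```

In this sample, B alone is enough to determine C, which is exactly the situation the partial dependency definition describes.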
CONVERSION TO
FIRST NORMAL FORM
A repeating group derives its name from the fact that a group of multiple
entries of the same type can exist for any single key attribute occurrence.

For example, the Evergreen project (PROJ_NUM = 15) shows five entries at
this point—and those entries are related because they each share the
PROJ_NUM = 15 characteristic.

The normalization process starts with a simple three-step procedure.


1. Eliminate the Repeating Groups
2. Identify the Primary Key
3. Identify all Dependencies
1. Eliminate the Repeating Groups
Start by presenting the data in a tabular format, where each cell has a
single value and there are no repeating groups. To eliminate the
repeating groups, eliminate the nulls by making sure that each
repeating group attribute contains an appropriate data value.
2. Identify the Primary Key
PROJ_NUM is not an adequate primary key because the project
number does not uniquely identify all of the remaining entity (row)
attributes. To maintain a proper primary key that will uniquely identify
any attribute value, the new key must be composed of a combination of
PROJ_NUM and EMP_NUM.
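Steps 1 and 2 can be sketched in Python as follows. The report rows, the second project name, the employee numbers, and the helper names fill_repeating_groups and is_unique_key are all hypothetical; the sketch also assumes that the first row of each repeating group carries the project values.

```python
# Hypothetical report rows: project values appear only on the first row of
# each repeating group and are null (None) on the rows that follow.
report = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101, "HOURS": 10.0},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 102, "HOURS": 4.5},
    {"PROJ_NUM": 18, "PROJ_NAME": "Sample Project", "EMP_NUM": 101, "HOURS": 7.0},
]


def fill_repeating_groups(rows, group_attrs):
    """Copy the group attributes down into rows where they are null."""
    filled, last = [], {}
    for row in rows:
        new_row = dict(row)
        for attr in group_attrs:
            if new_row[attr] is None:
                new_row[attr] = last[attr]
            else:
                last[attr] = new_row[attr]
        filled.append(new_row)
    return filled


def is_unique_key(rows, key_attrs):
    """Return True if no two rows share the same values for key_attrs."""
    keys = [tuple(row[a] for a in key_attrs) for row in rows]
    return len(keys) == len(set(keys))


table_1nf = fill_repeating_groups(report, ["PROJ_NUM", "PROJ_NAME"])
proj_only = is_unique_key(table_1nf, ["PROJ_NUM"])
composite = is_unique_key(table_1nf, ["PROJ_NUM", "EMP_NUM"])
```

With the nulls filled in, PROJ_NUM alone repeats across rows, while the combination of PROJ_NUM and EMP_NUM identifies each row.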
3. Identify All Dependencies
The identification of the PK in Step 2 means that you have already identified the
following dependency:

PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

The table also contains the following additional dependencies:

PROJ_NUM → PROJ_NAME (partial dependency)

EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR (partial dependency)

JOB_CLASS → CHG_HOUR (transitive dependency)
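The dependencies listed above can be labeled mechanically by comparing each determinant with the primary key. The Python sketch below is a simplified classifier written for this example (the classify function is hypothetical), and it assumes there is only one candidate key and that the dependents are nonprime attributes.

```python
PRIMARY_KEY = {"PROJ_NUM", "EMP_NUM"}

# Dependencies from the text, written as (determinant, dependents).
DEPENDENCIES = [
    ({"PROJ_NUM", "EMP_NUM"},
     {"PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]


def classify(determinant, primary_key):
    """Label a dependency by comparing its determinant with the primary key."""
    if determinant == primary_key:
        return "full"
    if determinant < primary_key:            # proper subset of the key
        return "partial"
    if determinant.isdisjoint(primary_key):  # a nonprime attribute is the determinant
        return "transitive"
    return "other"


labels = [classify(det, PRIMARY_KEY) for det, _ in DEPENDENCIES]
for (det, deps), label in zip(DEPENDENCIES, labels):
    print(sorted(det), "->", sorted(deps), ":", label)
```

The two partial dependencies are the ones removed in the conversion to 2NF, and the transitive one is removed in the conversion to 3NF.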
CONVERSION TO
SECOND NORMAL FORM
Converting to 2NF is done only when the 1NF has a composite primary key. If the
1NF has a single-attribute primary key, then the table is automatically in 2NF.

The 1NF-to-2NF conversion is simple. Starting with the 1NF format, you do the
following:
1. Make New Tables to Eliminate Partial Dependencies
For each component of the primary key that acts as a determinant in a partial
dependency, create a new table with a copy of that component as the primary key.
PROJ_NUM (PROJECT)
EMP_NUM (EMPLOYEE)
PROJ_NUM EMP_NUM (ASSIGNMENT)
2. Reassign Corresponding Dependent Attributes
The attributes that are dependent in a partial dependency are removed from the
original table and placed in the new table with their determinant. Any attributes
that are not dependent in a partial dependency remain in the original table. The
three tables that result from the conversion to 2NF are given appropriate names
(PROJECT, EMPLOYEE, and ASSIGNMENT) and are described by the following
relational schemas:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
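The split into the three schemas above amounts to projecting the 1NF rows onto each attribute list and dropping duplicate rows. The following Python sketch shows the idea under stated assumptions: the rows, names, and rates are hypothetical sample data, and the project helper is written for this example only.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping order."""
    seen, result = set(), []
    for row in rows:
        tup = tuple(row[a] for a in attrs)
        if tup not in seen:
            seen.add(tup)
            result.append(dict(zip(attrs, tup)))
    return result


# Hypothetical 1NF rows; only the attribute names come from the text.
rows_1nf = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101,
     "EMP_NAME": "Employee A", "JOB_CLASS": "Elect. Engineer",
     "CHG_HOUR": 80.0, "HOURS": 10.0},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 102,
     "EMP_NAME": "Employee B", "JOB_CLASS": "Programmer",
     "CHG_HOUR": 35.0, "HOURS": 4.5},
    {"PROJ_NUM": 18, "PROJ_NAME": "Sample Project", "EMP_NUM": 101,
     "EMP_NAME": "Employee A", "JOB_CLASS": "Elect. Engineer",
     "CHG_HOUR": 80.0, "HOURS": 7.0},
]

project_tbl = project(rows_1nf, ["PROJ_NUM", "PROJ_NAME"])
employee_tbl = project(rows_1nf, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
# HOURS is stored as ASSIGN_HOURS in the ASSIGNMENT schema.
assignment_tbl = [
    {"PROJ_NUM": r["PROJ_NUM"], "EMP_NUM": r["EMP_NUM"], "ASSIGN_HOURS": r["HOURS"]}
    for r in rows_1nf
]
```

Each project name and each employee's details are now stored once, while ASSIGNMENT keeps one row per project and employee pair.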


CONVERSION TO
THIRD NORMAL FORM
The data anomalies created by the database organization shown in Figure 6.4 are
easily eliminated by completing the following two steps:
1. Make New Tables to Eliminate Transitive Dependencies
• For every transitive dependency, write a copy of its determinant as a primary key
for a new table.
• A determinant is any attribute whose value determines other values within a
row.
• If you have three different transitive dependencies, you will have three different
determinants.
• As with the conversion to 2NF, it is important that the determinant remain in the
original table to serve as a foreign key.
Therefore, write the determinant for this transitive dependency as:

JOB_CLASS

2. Reassign Corresponding Dependent Attributes


Place the dependent attributes in the new tables with their determinants and
remove them from their original tables.

The EMPLOYEE table dependency is then defined as:

EMP_NUM → EMP_NAME, JOB_CLASS


In other words, after the 3NF conversion has been completed, your database will
contain four tables:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)

JOB (JOB_CLASS, CHG_HOUR)

ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
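One possible way to realize the four tables is sketched below using Python's built-in sqlite3 module. The column types, the NOT NULL constraints, and the sample rows are assumptions made for this illustration; the text specifies only the table and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
-- Hypothetical sample rows.
INSERT INTO PROJECT VALUES (15, 'Evergreen');
INSERT INTO JOB VALUES ('Elect. Engineer', 80.0);
INSERT INTO EMPLOYEE VALUES (101, 'Employee A', 'Elect. Engineer');
INSERT INTO ASSIGNMENT VALUES (15, 101, 10.0);
""")

# Rebuild one report row by joining the four tables.
report_rows = conn.execute("""
    SELECT p.PROJ_NUM, p.PROJ_NAME, e.EMP_NUM, e.EMP_NAME,
           e.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
    FROM ASSIGNMENT a
    JOIN PROJECT  p ON p.PROJ_NUM  = a.PROJ_NUM
    JOIN EMPLOYEE e ON e.EMP_NUM   = a.EMP_NUM
    JOIN JOB      j ON j.JOB_CLASS = e.JOB_CLASS
""").fetchall()
```

The join shows that the original report can still be produced from the 3NF tables, even though each fact is now stored in only one place.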


IMPROVING THE
DESIGN
The table structures are cleaned up to eliminate the troublesome
partial and transitive dependencies. You can now focus on improving
the database’s ability to provide information and on enhancing its
operational characteristics.

Remember that normalization cannot, by itself, be relied on to make
good designs. Instead, normalization is valuable because its use helps
eliminate data redundancies.
❑ Evaluate PK Assignments
Each time a new employee is entered into the EMPLOYEE table, a JOB_CLASS
value must be entered. Unfortunately, it is too easy to make data-entry errors that
lead to referential integrity violations.

For example, entering DB Designer instead of Database Designer for the
JOB_CLASS attribute in the EMPLOYEE table will trigger such a violation.

Adding a JOB_CODE attribute to serve as the primary key of the JOB table avoids
this problem and produces the dependency:

JOB_CODE → JOB_CLASS, CHG_HOUR
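A small sqlite3 sketch in Python shows how a short JOB_CODE key interacts with referential integrity. The job code values, the charge rate, the employee rows, and the column types are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE  TEXT PRIMARY KEY,
    JOB_CLASS TEXT NOT NULL UNIQUE,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM  INTEGER PRIMARY KEY,
    EMP_NAME TEXT NOT NULL,
    JOB_CODE TEXT NOT NULL REFERENCES JOB (JOB_CODE)
);
INSERT INTO JOB VALUES ('500', 'Database Designer', 100.0);
""")

# A short code is entered instead of the full job description.
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Employee A', '500')")

# A code that does not exist in JOB is rejected by the foreign key.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (102, 'Employee B', '999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

A short code is harder to mistype than a long description, and the description itself is stored once in JOB, so spelling variants cannot spread through EMPLOYEE.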


❑ Evaluate Naming Conventions
• Change CHG_HOUR to JOB_CHG_HOUR to indicate its association with the
JOB table.
• Replace JOB_CLASS with JOB_DESCRIPTION to fit the entries better.

❑ Refine Attribute Atomicity


It is generally good practice to pay attention to the atomicity requirement.

An atomic attribute is one that cannot be further subdivided. Such an attribute is
said to display atomicity.

For example, EMP_NAME is not atomic because it can be subdivided:

EMP_NAME → EMP_LNAME, EMP_FNAME, EMP_INITIAL


❑ Identify New Attributes
• In the EMPLOYEE table, you can add EMP_HIREDATE and other attributes.
❑ Identify New Relationships
• Each project has a project manager, so the relationship between PROJECT and
EMPLOYEE can be implemented by using EMP_NUM as a foreign key in
PROJECT.
❑ Refine Primary Keys as Required for Data Granularity
Granularity refers to the level of detail represented by the values stored in a
table’s row. Data stored at their lowest level of granularity are said to be atomic
data.
The ASSIGN_HOURS attribute represents the hours worked by a given employee on
a given project. However, are those values recorded at their lowest level of
granularity? In other words, does ASSIGN_HOURS represent an hourly, daily,
weekly, or monthly total?
❑ Maintain Historical Accuracy
• Writing the job charge per hour (ASSIGN_CHG_HOUR) into the
ASSIGNMENT table is crucial to maintaining the historical accuracy of the
data in the ASSIGNMENT table.
❑ Evaluate using Derived Attributes
• Finally, you can use a derived attribute in the ASSIGNMENT table to store
the actual charge made to a project. That derived attribute, to be named
ASSIGN_CHARGE, is the result of multiplying ASSIGN_HOURS by
ASSIGN_CHG_HOUR.
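The last two points can be sketched together in Python. The Assignment class, the rate table, and every numeric value below are hypothetical; the sketch assumes that the rate is copied into the assignment when the work is recorded and that the charge is rounded to two decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    proj_num: int
    emp_num: int
    assign_hours: float
    assign_chg_hour: float  # rate copied from JOB when the work was recorded

    @property
    def assign_charge(self) -> float:
        """Derived attribute: ASSIGN_HOURS multiplied by ASSIGN_CHG_HOUR."""
        return round(self.assign_hours * self.assign_chg_hour, 2)


# Hypothetical current rate table; the values are illustrative only.
job_chg_hour = {"Elect. Engineer": 80.0}

old = Assignment(15, 101, 10.0, job_chg_hour["Elect. Engineer"])
job_chg_hour["Elect. Engineer"] = 90.0  # the rate changes later
new = Assignment(18, 101, 2.5, job_chg_hour["Elect. Engineer"])

print(old.assign_charge, new.assign_charge)
```

Because the earlier assignment keeps its own copy of the rate, a later rate change does not alter what was charged for past work.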
HIGHER LEVEL
NORMAL FORMS
Tables in 3NF will perform suitably in business transactional databases. However,
there are occasions when higher normal forms are useful. In this section, you will
learn about a special case of 3NF, known as Boyce-Codd normal form (BCNF),
and about fourth normal form (4NF).

❖ The Boyce-Codd Normal Form (BCNF)


A table is in Boyce-Codd normal form (BCNF) when every determinant in the table
is a candidate key. Clearly, when a table contains only one candidate key, the 3NF
and the BCNF are equivalent.
• In other words, a table is in 3NF when it is in 2NF and there
are no transitive dependencies.
• But what about a case in which a non-key attribute is the
determinant of a key attribute?
• That condition does not violate 3NF, yet it fails to meet the
BCNF requirements because BCNF requires that every
determinant in the table be a candidate key.
The situation just described (a 3NF table that fails to meet BCNF requirements) is
shown in Figure 6.7.

Note these functional dependencies in Figure 6.7:

A + B → C, D

A + C → B, D

C → B
To convert the table structure in Figure 6.7 into table structures that are
in 3NF and in BCNF:
• First, change the primary key to A + C. That is an appropriate action
because the dependency C → B means that C is, in effect, a
superset of B.
• At this point, the table is in 1NF because it contains a partial
dependency, C → B.
• Next, follow the standard decomposition procedures to produce the
results
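The BCNF condition can be checked mechanically with an attribute-closure computation. The Python sketch below uses the three dependencies given for the A, B, C, D table; the function names closure and bcnf_violations are hypothetical. The two decomposed tables tested at the end, (C, B) and (A, C, D), are an assumption about what the standard decomposition yields and are not quoted from the text.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return nontrivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != relation
    ]


R = {"A", "B", "C", "D"}
FDS = [
    ({"A", "B"}, {"C", "D"}),
    ({"A", "C"}, {"B", "D"}),
    ({"C"}, {"B"}),
]

violations = bcnf_violations(R, FDS)

# Assumed decomposition: (C, B) and (A, C, D), each with its own dependency.
left = bcnf_violations({"C", "B"}, [({"C"}, {"B"})])
right = bcnf_violations({"A", "C", "D"}, [({"A", "C"}, {"D"})])
```

Only C → B is reported for the original table, because C by itself does not determine every attribute, so C is a determinant that is not a candidate key.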
To see how this procedure can be applied to an actual problem, examine the
sample data in Table 6.5.
Table 6.5 reflects the following conditions:
• Each CLASS_CODE identifies a class uniquely. This condition illustrates the
case in which a course might generate many classes. For example, a course
labeled CS 202 might be taught in two classes (sections), each identified by a
unique code to facilitate registration. Thus, the CLASS_CODE 32456 might
identify CS 202, class section 1, while the CLASS_CODE 32457 might identify
CS 202, class section 2. Or the CLASS_CODE 28458 might identify IT 203,
class section 5.
• A student can take many classes. Note, for example, that student 125 has taken
both 21334 and 32456, earning the grades A and C, respectively.
• A staff member can teach many classes, but each class is taught by only one
staff member. Note that staff member 20 teaches the classes identified as 32456
and 28458.
The structure shown in Table 6.5 is reflected in Panel A of Figure 6.9:

STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE

CLASS_CODE → STAFF_ID
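One common way to handle the CLASS_CODE → STAFF_ID dependency, which is not spelled out in the text above, is to move it into its own table. The Python sketch below assumes two resulting structures, CLASS (CLASS_CODE, STAFF_ID) and ENROLL (STU_ID, CLASS_CODE, ENROLL_GRADE). Student 125's classes and grades and staff member 20's classes follow the text; all other values are hypothetical.

```python
# Rows shaped like the table described above.
rows = [
    {"STU_ID": 125, "STAFF_ID": 25, "CLASS_CODE": 21334, "ENROLL_GRADE": "A"},
    {"STU_ID": 125, "STAFF_ID": 20, "CLASS_CODE": 32456, "ENROLL_GRADE": "C"},
    {"STU_ID": 135, "STAFF_ID": 20, "CLASS_CODE": 28458, "ENROLL_GRADE": "B"},
    {"STU_ID": 144, "STAFF_ID": 25, "CLASS_CODE": 21334, "ENROLL_GRADE": "C"},
]

# CLASS (CLASS_CODE, STAFF_ID): each class code maps to exactly one staff member.
class_tbl = {}
for row in rows:
    existing = class_tbl.setdefault(row["CLASS_CODE"], row["STAFF_ID"])
    assert existing == row["STAFF_ID"], "CLASS_CODE -> STAFF_ID must hold"

# ENROLL (STU_ID, CLASS_CODE, ENROLL_GRADE): the staff member is not repeated.
enroll_tbl = {(r["STU_ID"], r["CLASS_CODE"]): r["ENROLL_GRADE"] for r in rows}

# Reassigning a class to another staff member now changes a single entry.
class_tbl[21334] = 30

rebuilt = [
    {"STU_ID": stu, "STAFF_ID": class_tbl[code],
     "CLASS_CODE": code, "ENROLL_GRADE": grade}
    for (stu, code), grade in enroll_tbl.items()
]
```

In the original layout the same reassignment would have to be repeated in every enrollment row for that class, which is the kind of update anomaly BCNF is meant to prevent.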
Conclusion
We covered the basics of data normalization and how
these techniques can help us design better, more
efficient databases. By understanding the principles of
normalization, we can create accurate and
manageable databases that meet the needs of any
business or organization. We hope you found this
presentation informative and helpful!
