Lesson 5 - NORMALIZATION (Midterm)
Definition
Normalization:
is the process of efficiently organizing data in
a database
is the process of organizing the fields and
tables of a relational database to minimize
redundancy and ensure that data dependencies
make sense.
History
Edgar F. Codd, the inventor of the
relational model, introduced the concept of
normalization and what we now know as the
First Normal Form (1NF) in 1970.
Later normal forms followed:
Second Normal Form (2NF) in 1971
Third Normal Form (3NF) in 1971
Boyce-Codd Normal Form (BCNF) in 1974
(Codd and Raymond F. Boyce)
Sixth Normal Form (6NF) in 2002 (introduced
by Chris Date, Hugh Darwen, and Nikos
Lorentzos)
Goals of Normalization
There are two goals of the normalization
process:
eliminating redundant data (for example,
storing the same data in more than one
table)
ensuring data dependencies make sense
(only storing related data in a table).
Both of these are worthy goals as they
reduce the amount of space a database
consumes and ensure that data is logically
stored.
Normalization
A relational database table (the
computerized representation of a relation)
is often described as "normalized" if it is in
the Third Normal Form.
Most 3NF tables are free of insertion,
update, and deletion anomalies.
In most cases, 3NF tables also adhere to
BCNF, 4NF, and 5NF (but typically not 6NF).
Objectives of normalization
Free the database of modification anomalies
Minimize redesign when extending the
database structure
Make the data model more informative to
users
Avoid bias towards any particular pattern
of querying (neutral)
Free the database of modification
anomalies
When an attempt is made to modify (update,
insert into, or delete from) a table, undesired
side-effects may follow.
Not all tables can suffer from these side-effects;
rather, the side-effects can only arise in tables
that have not been sufficiently normalized.
An insufficiently normalized table might have one
or more of the following characteristics:
Update anomaly
Insertion anomaly
Deletion anomaly
Update anomaly
The same information can be expressed on multiple
rows; therefore updates to the table may result in
logical inconsistencies.
For example:
each record in an "Employees' Skills" table might contain
an Employee ID, Employee Address, and Skill;
thus a change of address for a particular employee will
potentially need to be applied to multiple records (one for
each of his skills).
If the update is not carried through successfully—if, that
is, the employee's address is updated on some records but
not others—then the table is left in an inconsistent state.
Specifically, the table provides conflicting answers to
the question of what this particular employee's address
is.
This phenomenon is known as an update anomaly.
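The update anomaly described above can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and sample rows are made up for illustration and are not part of the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_skills "
    "(employee_id INTEGER, employee_address TEXT, skill TEXT)"
)
# Hypothetical sample data: employee 426 has two skills, so the
# address is stored twice.
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [
        (426, "87 Sycamore Grove", "Typing"),
        (426, "87 Sycamore Grove", "Shorthand"),
        (519, "94 Chestnut Street", "Public Speaking"),
    ],
)

# A careless update that touches only one of employee 426's two rows.
conn.execute(
    "UPDATE employee_skills SET employee_address = ? "
    "WHERE employee_id = ? AND skill = ?",
    ("12 Elm Road", 426, "Typing"),
)

# The table now gives two different answers for the same employee.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT employee_address FROM employee_skills "
        "WHERE employee_id = ?",
        (426,),
    )
)
print(addresses)  # ['12 Elm Road', '87 Sycamore Grove']
```

If the address lived in a separate table with one row per employee, the same update could only ever change one row, and the conflict could not arise.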
Insertion anomaly
There are circumstances in which certain facts
cannot be recorded at all.
For example:
each record in a "Faculty and Their Courses" table
might contain a Faculty ID, Faculty Name, Faculty
Hire Date, and Course Code;
thus we can record the details of any faculty
member who teaches at least one course, but we
cannot record the details of a newly hired faculty
member who has not yet been assigned to teach any
courses, except by setting the Course Code to null.
This phenomenon is known as an insertion
anomaly.
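The insertion anomaly can be sketched the same way. The example below assumes the Course Code is part of the primary key and therefore declared NOT NULL, which is one plausible design rather than something the lesson specifies. The names and dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE faculty_courses (
        faculty_id   INTEGER NOT NULL,
        faculty_name TEXT    NOT NULL,
        hire_date    TEXT    NOT NULL,
        course_code  TEXT    NOT NULL,  -- assumption: no null course codes
        PRIMARY KEY (faculty_id, course_code)
    )
    """
)
# A faculty member who already teaches a course can be recorded.
conn.execute(
    "INSERT INTO faculty_courses VALUES (?, ?, ?, ?)",
    (389, "Dr. Giddens", "2001-02-10", "ENG-206"),
)

# A new hire with no course yet cannot be recorded at all.
try:
    conn.execute(
        "INSERT INTO faculty_courses VALUES (?, ?, ?, ?)",
        (424, "Dr. Newsome", "2007-03-29", None),
    )
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

print(inserted)  # False
```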
Deletion anomaly
Under certain circumstances, deletion of data
representing certain facts necessitates deletion
of data representing completely different facts.
The "Faculty and Their Courses" table described in
the previous example suffers from this type of
anomaly, for if a faculty member temporarily
ceases to be assigned to any courses, we must
delete the last of the records on which that faculty
member appears, effectively also deleting the
faculty member.
This phenomenon is known as a deletion
anomaly.
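A short sketch of the deletion anomaly, again with sqlite3 and invented sample rows. Deleting the only course row for a faculty member also removes the only copy of that person's name and hire date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE faculty_courses "
    "(faculty_id INTEGER, faculty_name TEXT, hire_date TEXT, course_code TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO faculty_courses VALUES (?, ?, ?, ?)",
    [
        (389, "Dr. Giddens", "2001-02-10", "ENG-206"),
        (407, "Dr. Saperstein", "1999-04-19", "CMP-101"),
        (407, "Dr. Saperstein", "1999-04-19", "CMP-201"),
    ],
)

# Faculty member 389 stops teaching ENG-206, so that row is deleted.
conn.execute(
    "DELETE FROM faculty_courses WHERE faculty_id = ? AND course_code = ?",
    (389, "ENG-206"),
)

# The name and hire date were stored only on that row, so they are gone too.
rows_left = conn.execute(
    "SELECT COUNT(*) FROM faculty_courses WHERE faculty_id = ?", (389,)
).fetchone()[0]
print(rows_left)  # 0
```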
Minimize redesign when extending the
database structure
When a fully normalized database structure
is extended to allow it to accommodate new
types of data, the pre-existing aspects of the
database structure can remain largely or
entirely unchanged. As a result, applications
interacting with the database are minimally
affected.
Make the data model more informative
to users
Normalized tables, and the relationship
between one normalized table and another,
mirror real-world concepts and their
interrelationships.
Avoid bias towards any particular
pattern of querying
Normalized tables are suitable for general-
purpose querying.
This means any queries against these
tables, including future queries whose
details cannot be anticipated, are
supported. In contrast, tables that are not
normalized lend themselves to some types
of queries, but not others.
For example:
consider an online bookseller whose customers
maintain wishlists of books they'd like to have.
For the obvious, anticipated query—what books does
this customer want?—it's enough to store the
customer's wishlist in the table as, say, a
homogeneous string of authors and titles.
With this design, though, the database can answer
only that one single query. It cannot by itself answer
interesting but unanticipated queries:
What is the most-wished-for book?
Which customers are interested in WWII espionage?
How does Lord Byron stack up against his
contemporary poets?
Answers to these questions must come from special
adaptive tools completely separate from the database.
One tool might be software written especially to
handle such queries.
This special adaptive software has just one single
purpose: in effect to normalize the non-normalized
field.
Unforeseen queries can be answered trivially, and
entirely within the database framework, with a
normalized table.
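As an illustration of the wishlist example, here is one possible normalized layout and the "most-wished-for book" query run against it. The table names (books, wishlist_items), the columns, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (
        book_id INTEGER PRIMARY KEY,
        author  TEXT,
        title   TEXT
    );
    -- One row per (customer, book) instead of one string per customer.
    CREATE TABLE wishlist_items (
        customer_id INTEGER,
        book_id     INTEGER REFERENCES books(book_id),
        PRIMARY KEY (customer_id, book_id)
    );
    INSERT INTO books VALUES (1, 'Lord Byron', 'Don Juan');
    INSERT INTO books VALUES (2, 'John Keats', 'Endymion');
    INSERT INTO wishlist_items VALUES (10, 1);
    INSERT INTO wishlist_items VALUES (11, 1);
    INSERT INTO wishlist_items VALUES (11, 2);
    """
)

# An "unanticipated" query answered entirely inside the database.
most_wished = conn.execute(
    """
    SELECT b.title, COUNT(*) AS wishes
    FROM wishlist_items AS w
    JOIN books AS b ON b.book_id = w.book_id
    GROUP BY b.book_id
    ORDER BY wishes DESC
    LIMIT 1
    """
).fetchone()
print(most_wished)  # ('Don Juan', 2)
```

With the wishlist stored as a single string per customer, the same question would need string parsing outside the database.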
Normal forms
The database community has developed a
series of guidelines for ensuring that
databases are normalized. These are referred
to as normal forms.
The normal forms are numbered from one (the
lowest form of normalization, referred to as
first normal form or 1NF) through five (fifth
normal form or 5NF).
In practical applications, you'll often see 1NF,
2NF, and 3NF along with the occasional 4NF.
First Normal Form (1NF)
First normal form (1NF) sets the very basic
rules for an organized database:
Eliminate duplicative columns from the same
table.
Create separate tables for each group of
related data and identify each row with a
unique column or set of columns (the
primary key).
The primary key for the repeating group is
usually a composite key.
Example (Unnormalized)
Table: SalesOrders (Un-normalized)
SalesOrderNo
Date
CustomerNo
CustomerName
CustomerAddress
ClerkNo
ClerkName
Item1Description
Item1Quantity
Item1UnitPrice
Item2Description
Item2Quantity
Item2UnitPrice
Item3Description
Item3Quantity
Item3UnitPrice
Total
Example (1NF)
Table: SalesOrders
SalesOrderNo
Date
CustomerNo
CustomerName
CustomerAddress
ClerkNo
ClerkName
Total
Table: OrderItems
SalesOrderNo
ItemNo
ItemDescription
ItemQuantity
ItemUnitPrice
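The 1NF step above, moving the repeating Item1/Item2/Item3 columns into their own rows, can be sketched in plain Python. The function name split_repeating_group, the item_numbers lookup used to assign ItemNo, and the sample order are all hypothetical. The lesson does not say how ItemNo values are assigned.

```python
def split_repeating_group(order, item_numbers, max_items=3):
    """Split one unnormalized order into a SalesOrders row and OrderItems rows."""
    item_rows = []
    for n in range(1, max_items + 1):
        description = order.get(f"Item{n}Description")
        if description is None:
            continue  # unused item slot in the unnormalized row
        item_rows.append({
            "SalesOrderNo": order["SalesOrderNo"],
            "ItemNo": item_numbers[description],  # assumed lookup
            "ItemDescription": description,
            "ItemQuantity": order[f"Item{n}Quantity"],
            "ItemUnitPrice": order[f"Item{n}UnitPrice"],
        })
    # Everything that is not an ItemN... column stays on the order row.
    order_row = {k: v for k, v in order.items() if not k.startswith("Item")}
    return order_row, item_rows


# Hypothetical sample data.
order = {
    "SalesOrderNo": 1001, "Date": "2024-01-15",
    "CustomerNo": 7, "CustomerName": "A. Cruz", "CustomerAddress": "1 Main St",
    "ClerkNo": 3, "ClerkName": "B. Reyes",
    "Item1Description": "Hammer", "Item1Quantity": 2, "Item1UnitPrice": 9.50,
    "Item2Description": "Nails", "Item2Quantity": 100, "Item2UnitPrice": 0.05,
    "Item3Description": None, "Item3Quantity": None, "Item3UnitPrice": None,
    "Total": 24.0,
}
order_row, item_rows = split_repeating_group(order, {"Hammer": 501, "Nails": 502})
print(len(item_rows))   # 2
print(list(order_row))
```

Each OrderItems row is identified by the composite key (SalesOrderNo, ItemNo), and an order is no longer limited to three items.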
Second Normal Form (2NF)
Second normal form (2NF) further addresses the
concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a
table and place them in separate tables. (Remove
partial dependencies)
Functional dependency: The value of one attribute
depends entirely on the value of another.
Partial dependency: An attribute depends on only part of
the primary key. (The primary key must be a composite
key.)
Transitive dependency: An attribute depends on an
attribute other than the primary key.
Create relationships between these new tables and
their predecessors through the use of foreign keys.
Example (2NF)
Table: OrderItems
SalesOrderNo
ItemNo
ItemQuantity
ItemUnitPrice
Table: InventoryItems
ItemNo
ItemDescription
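To see why ItemDescription moves out of OrderItems, it helps to test dependencies on sample rows. The helper below, determines, is a hypothetical name for this sketch, and the rows are invented. Note that sample data can only disprove a dependency. Whether a dependency really holds is a design decision about the data, not something a few rows can prove.

```python
def determines(rows, lhs, rhs):
    """True if equal values of the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 1NF OrderItems rows; the key is (SalesOrderNo, ItemNo).
order_items = [
    {"SalesOrderNo": 1001, "ItemNo": 501, "ItemDescription": "Hammer", "ItemQuantity": 2},
    {"SalesOrderNo": 1001, "ItemNo": 502, "ItemDescription": "Nails", "ItemQuantity": 100},
    {"SalesOrderNo": 1002, "ItemNo": 501, "ItemDescription": "Hammer", "ItemQuantity": 1},
]

# ItemDescription is consistent with depending on ItemNo alone: a partial dependency.
partial = determines(order_items, ["ItemNo"], ["ItemDescription"])
# ItemQuantity is not determined by either half of the key on its own.
by_item = determines(order_items, ["ItemNo"], ["ItemQuantity"])
by_order = determines(order_items, ["SalesOrderNo"], ["ItemQuantity"])
print(partial, by_item, by_order)  # True False False
```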
Third Normal Form (3NF)
Third normal form (3NF) goes one large
step further:
Meet all the requirements of the second
normal form.
Remove columns that are not dependent
upon the primary key. (Remove transitive
dependencies)
Start a new table for the transitively
dependent attribute and the attribute it
depends on.
Keep a copy of the key attribute in the
original table.
Example (3NF)
Table: SalesOrders
SalesOrderNo
Date
CustomerNo
ClerkNo
Total
Table: Customers
CustomerNo
CustomerName
CustomerAddress
Table: Clerks
ClerkNo
ClerkName
Example (Final Tables)
Table: SalesOrders
SalesOrderNo
Date
CustomerNo
ClerkNo
Total
Table: OrderItems
SalesOrderNo
ItemNo
ItemQuantity
ItemUnitPrice
Table: InventoryItems
ItemNo
ItemDescription
Table: Customers
CustomerNo
CustomerName
CustomerAddress
Table: Clerks
ClerkNo
ClerkName
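The final tables can be written as an SQLite schema to check that the pieces fit back together. This is a sketch: the table and column names come from the example, but the column types, the key and foreign key declarations, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE Customers (
        CustomerNo INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAddress TEXT);
    CREATE TABLE Clerks (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
    CREATE TABLE InventoryItems (ItemNo INTEGER PRIMARY KEY, ItemDescription TEXT);
    CREATE TABLE SalesOrders (
        SalesOrderNo INTEGER PRIMARY KEY,
        Date         TEXT,
        CustomerNo   INTEGER REFERENCES Customers(CustomerNo),
        ClerkNo      INTEGER REFERENCES Clerks(ClerkNo),
        Total        REAL);
    CREATE TABLE OrderItems (
        SalesOrderNo  INTEGER REFERENCES SalesOrders(SalesOrderNo),
        ItemNo        INTEGER REFERENCES InventoryItems(ItemNo),
        ItemQuantity  INTEGER,
        ItemUnitPrice REAL,
        PRIMARY KEY (SalesOrderNo, ItemNo));

    -- Hypothetical sample data.
    INSERT INTO Customers VALUES (7, 'A. Cruz', '1 Main St');
    INSERT INTO Clerks VALUES (3, 'B. Reyes');
    INSERT INTO InventoryItems VALUES (501, 'Hammer');
    INSERT INTO InventoryItems VALUES (502, 'Nails');
    INSERT INTO SalesOrders VALUES (1001, '2024-01-15', 7, 3, 24.0);
    INSERT INTO OrderItems VALUES (1001, 501, 2, 9.50);
    INSERT INTO OrderItems VALUES (1001, 502, 100, 0.05);
    """
)

# Joins rebuild the information held in the original wide row.
rows = conn.execute(
    """
    SELECT c.CustomerName, k.ClerkName, i.ItemDescription, oi.ItemQuantity
    FROM SalesOrders AS o
    JOIN Customers AS c ON c.CustomerNo = o.CustomerNo
    JOIN Clerks AS k ON k.ClerkNo = o.ClerkNo
    JOIN OrderItems AS oi ON oi.SalesOrderNo = o.SalesOrderNo
    JOIN InventoryItems AS i ON i.ItemNo = oi.ItemNo
    WHERE o.SalesOrderNo = 1001
    ORDER BY oi.ItemNo
    """
).fetchall()
print(rows)

# The foreign key rejects an order line for an item that does not exist.
try:
    conn.execute("INSERT INTO OrderItems VALUES (1001, 999, 1, 1.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```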
Boyce-Codd Normal Form
(BCNF or 3.5NF)