Database Normalization
Normalization
Normalization removes duplication and
minimizes redundant chunks of data.
Normalization is a set of formal conditions
that assure that a database is maintainable.
The results of a well-executed normalization
process are the same as those of a
well-planned E-R model.
PROCESS OF DATA
NORMALIZATION
ELIMINATE REPEATING GROUPS
Make a separate table for each set of related
attributes and give each table a primary key.
ELIMINATE REDUNDANT DATA
If an attribute depends on only part of a
composite (multi-attribute) key, remove it to a separate table.
ELIMINATE COLUMNS NOT DEPENDENT ON KEY
If attributes do not contribute to a description of
the key, remove them to a separate table.
Database Programming and Design
ISOLATE INDEPENDENT MULTIPLE RELATIONSHIPS
No table may contain two or more 1:n or n:m
relationships that are not directly related.
ISOLATE SEMANTICALLY RELATED MULTIPLE
RELATIONSHIPS
There may be practical constraints on information
that justify separating logically related many-to-many relationships.
Anomalies
An anomaly is essentially an
erroneous change to data, more
specifically to a single record.
Anomalies
A table anomaly is a structure for which
a normal database operation cannot
be executed without information loss
or full search of the data table
Insertion Anomaly
Deletion Anomaly
Update or Modification Anomaly
Anomalies
Insert anomaly: Caused when a record is
added to a detail table with no related
record existing in a master table.
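A minimal sketch of the insert anomaly, using Python's built-in sqlite3 module. The table names master and detail and their columns are hypothetical, chosen only to mirror the wording above. SQLite enforces foreign keys only when PRAGMA foreign_keys is switched on, so the sketch does that first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE detail ("
    " id INTEGER PRIMARY KEY,"
    " master_id INTEGER NOT NULL REFERENCES master(id))"
)

def try_insert_detail(master_id):
    """Return True if the detail row was accepted, False if it was rejected."""
    try:
        conn.execute("INSERT INTO detail (master_id) VALUES (?)", (master_id,))
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = try_insert_detail(1)  # no master row with id 1 exists yet
conn.execute("INSERT INTO master (id, name) VALUES (1, 'first')")
child_accepted = try_insert_detail(1)   # the master row now exists
```

With the foreign key enforced, the database refuses the orphaned detail row instead of storing it.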
Anomalies
Delete anomaly: Caused when a record
is deleted from a master table without
first deleting all child records in a
detail table. The exception is a cascade
deletion, in which deleting a
master record automatically deletes all
child records in all related detail tables
before deleting the parent record in the
master table.
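The two behaviours described here can be sketched with sqlite3. All table and column names are hypothetical. The first pair of tables uses a plain foreign key, the second pair declares ON DELETE CASCADE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY);
    CREATE TABLE detail (
        id INTEGER PRIMARY KEY,
        master_id INTEGER NOT NULL REFERENCES master(id)
    );
    CREATE TABLE master_c (id INTEGER PRIMARY KEY);
    CREATE TABLE detail_c (
        id INTEGER PRIMARY KEY,
        master_id INTEGER NOT NULL REFERENCES master_c(id) ON DELETE CASCADE
    );
    INSERT INTO master VALUES (1);
    INSERT INTO detail VALUES (10, 1), (11, 1);
    INSERT INTO master_c VALUES (1);
    INSERT INTO detail_c VALUES (10, 1), (11, 1);
""")

# Plain foreign key: deleting the master row would orphan its children,
# so the statement is rejected.
try:
    conn.execute("DELETE FROM master WHERE id = 1")
    plain_delete_blocked = False
except sqlite3.IntegrityError:
    plain_delete_blocked = True

# Cascading foreign key: the child rows are removed together with the master row.
conn.execute("DELETE FROM master_c WHERE id = 1")
remaining_children = conn.execute("SELECT COUNT(*) FROM detail_c").fetchone()[0]
```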
Anomalies
Update anomaly: This anomaly is
similar to deletion in that both
master and detail records must be
updated to avoid orphaned detail
records. When cascading, ensure
that any primary key updates are
propagated to related child table
foreign keys.
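As an illustration of propagating a primary key update, the sketch below declares ON UPDATE CASCADE in sqlite3. The tables and the key values 'A' and 'B' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE master (code TEXT PRIMARY KEY);
    CREATE TABLE detail (
        id INTEGER PRIMARY KEY,
        master_code TEXT NOT NULL REFERENCES master(code) ON UPDATE CASCADE
    );
    INSERT INTO master VALUES ('A');
    INSERT INTO detail VALUES (1, 'A'), (2, 'A');
""")

# Changing the master key rewrites the foreign key in every child row.
conn.execute("UPDATE master SET code = 'B' WHERE code = 'A'")
child_codes = [
    row[0] for row in conn.execute("SELECT master_code FROM detail ORDER BY id")
]
```

Without the cascade clause the same UPDATE would be rejected, because it would leave the detail rows pointing at a key that no longer exists.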
Normal Forms
Relational theory defines a number of
structure conditions called Normal
Forms that assure that certain data
anomalies do not occur in a
database.
Normal Forms
1st Normal Form (1NF): Eliminate
repeating groups such that all records
in all tables can be identified uniquely
by a primary key in each table. In
other words, all fields other than the
primary key must depend on the
primary key.
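A small sketch of removing a repeating group, in plain Python. The author and book values, and the generated keys author_id and book_no, are hypothetical. Each flat record carries a list of books, which is the repeating group; the function moves it into its own table keyed by the author key plus a sequence number.

```python
flat_authors = [
    {"author": "Author A", "books": ["Book 1", "Book 2"]},
    {"author": "Author B", "books": ["Book 3"]},
]

def to_1nf(records):
    """Split records with a repeating 'books' group into two keyed tables."""
    authors, books = [], []
    for author_id, record in enumerate(records, start=1):
        authors.append({"author_id": author_id, "author": record["author"]})
        for book_no, title in enumerate(record["books"], start=1):
            # Composite key: the master key (author_id) plus book_no.
            books.append({"author_id": author_id, "book_no": book_no, "title": title})
    return authors, books

authors, books = to_1nf(flat_authors)
```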
Normal Forms
2nd Normal Form (2NF): All non-key
values must be fully functionally
dependent on the
primary key. No partial dependencies
are allowed. A partial dependency
exists when a field is
fully dependent on only a part of a
composite primary key.
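One way to look for partial dependencies in sample data is to test whether a proper subset of the composite key already determines a non-key field. The sketch below does that. The enrollment rows and field names are hypothetical. Sample data can only refute a dependency, not prove one, so the real dependencies still have to come from the business rules.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """Pairs (key_part, field) where a proper subset of the key determines the field."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in non_key:
                if fd_holds(rows, part, [attr]):
                    found.append((part, attr))
    return found

enrollment = [
    {"student_id": 1, "section_id": "S1", "student_name": "Ann", "grade": "A"},
    {"student_id": 1, "section_id": "S2", "student_name": "Ann", "grade": "B"},
    {"student_id": 2, "section_id": "S1", "student_name": "Bob", "grade": "B"},
]

violations = partial_dependencies(
    enrollment, key=("student_id", "section_id"), non_key=("student_name", "grade")
)
```

Here student_name depends on student_id alone, so it belongs in a separate student table, while grade needs the whole key.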
Normal Forms
3rd Normal Form (3NF): Eliminate
transitive dependencies. A transitive
dependency means that a field is indirectly
determined by the primary key:
the field is functionally
dependent on another
field, and that other field is in turn
dependent on the primary key.
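A sketch of removing a transitive dependency by projection. The employee and department data is hypothetical: dept_name depends on dept_id, which in turn depends on the key emp_id. Projecting the table twice separates the two facts, and joining the projections gives back the original rows.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples and keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

employees = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": 20, "dept_name": "Support"},
]

employee_3nf = project(employees, ["emp_id", "dept_id"])
department = project(employees, ["dept_id", "dept_name"])

# Joining the two projections on dept_id reconstructs the original table.
rejoined = [
    {**e, "dept_name": d["dept_name"]}
    for e in employee_3nf
    for d in department
    if e["dept_id"] == d["dept_id"]
]
```

The department name is now stored once per department instead of once per employee.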
Normal Forms
Boyce-Codd Normal Form (BCNF):
Every determinant in a table is a
candidate key. If there is
only one candidate key, 3NF and BCNF
are one and the same.
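The determinant rule can be checked mechanically with attribute closure. The sketch below uses the common textbook formulation that a non-trivial determinant must be a superkey, which is an assumption about how the rule above is applied. The relation and its dependencies are a hypothetical example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial dependencies whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(relation) <= closure(lhs, fds)
    ]

relation = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),  # the key determines the instructor
    ({"instructor"}, {"course"}),             # each instructor teaches one course
]

violations = bcnf_violations(relation, fds)
```

The second dependency is reported because instructor determines course without determining the whole row.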
Normal Forms
4th Normal Form (4NF): Eliminate
multiple sets of multivalued
dependencies.
Normal Forms
5th Normal Form (5NF): Eliminate
cyclic dependencies. 5NF is also
known as Project-Join
Normal Form (PJNF).
Normal Forms
Domain Key Normal Form (DKNF):
DKNF is the ultimate application of
normalization and is
more a measurement of conceptual
state, as opposed to a transformation
process in itself.
Normal Forms
[Figure: the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF; label: keys]
Primary keys are created on both tables, and the detail table has a composite primary key. The composite primary key contains the master table primary key field as the prefix field of its primary key. Therefore, the prefix field AUTHOR on the BOOK table is the foreign key pointing back to the master table AUTHOR.
The data now sits in the altered AUTHOR table and the new BOOK table, which was previously the AUTHORSBOOKS table.
Primary keys are created on both the PUBLISHER and SUBJECT tables to uniquely identify individual publishers and subjects within their two respective tables. Identifying relationships, with BOOK related to PUBLISHER and BOOK related to SUBJECT, cause the publisher and subject primary key values to be included in the composite primary key of the BOOK table.
Changing the relationships between dynamic and static tables from identifying to non-identifying keeps the static table keys out of the dynamic table's primary key.
The altered BOOK table is accompanied by the new PUBLISHER and SUBJECT tables. The publisher and subject fields previously duplicated on the BOOK table (as shown in Figure 4-15) are now separated into the two new PUBLISHER and SUBJECT tables, with duplicate publishers and subjects removed from the new tables.
Figure 4-24 shows employees and tasks from the 2NF version on the left of the diagram in Figure 4-23. Employees perform tasks in their daily routines, doing their jobs. If you were searching for the employee Columbia, three tasks would always be returned. Similarly, if searching for the third task shown in Figure 4-24, two employees would always be returned. A problem would arise when searching for an attribute specific to a particular assignment, where an assignment is a single task assigned to a single employee.
The transformation in Figure 4-25 could be seen as two 2NF transformations, because a many-to-one relationship produces a more static table, the FOREIGN_EXCHANGE table.
A transitive dependency occurs where one field depends on another, which in turn depends on a third field, the third field typically being the primary key. A state of transitive dependency can also be interpreted as a field not being entirely dependent on the primary key.
There is usually a good reason for including calculated fields, typically performance denormalization. (Denormalization is explained as a concept in a later chapter.) In a data warehouse, calculated fields are sometimes stored in materialized views. Data warehouse database modeling is also covered in a later chapter.
Figure 4-30 shows the removal of two fields that are often NULL from a table called EDITION, creating the new table called RANK. The result is a zero or one-to-one relationship between the RANK and EDITION tables. This implies that if a RANK record exists, then a corresponding EDITION record must exist as well. In the opposite case, however, an EDITION record can exist without a RANK record.
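A zero or one-to-one relationship of this kind can be sketched in sqlite3 by making the primary key of the optional table a foreign key to the main table. The column names (isbn, title, rank_value, units) and the sample rows are hypothetical. The text names only the tables EDITION and RANK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE edition (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- The primary key of "rank" is also a foreign key to edition,
    -- so each edition has at most one rank row.
    CREATE TABLE "rank" (
        isbn TEXT PRIMARY KEY REFERENCES edition(isbn),
        rank_value INTEGER NOT NULL,
        units INTEGER NOT NULL
    );
    INSERT INTO edition VALUES ('isbn-1', 'Ranked edition'), ('isbn-2', 'Unranked edition');
    INSERT INTO "rank" VALUES ('isbn-1', 5, 100);
""")

# An edition without a rank row is allowed and shows up with NULL (None).
rows = conn.execute(
    'SELECT e.isbn, r.rank_value FROM edition e '
    'LEFT JOIN "rank" r ON r.isbn = e.isbn ORDER BY e.isbn'
).fetchall()

# A rank row without an edition is rejected.
try:
    conn.execute("""INSERT INTO "rank" VALUES ('isbn-9', 1, 1)""")
    rank_without_edition = True
except sqlite3.IntegrityError:
    rank_without_edition = False
```

Because the often-empty fields live in their own table, EDITION rows no longer carry NULL placeholders for them.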
     Description    Code   Qty   Price   Amount
1.   Footballs      21     6     25.00   150
2.   Sweat Shirts   44     20    15.00   300
3.   Shorts         37     10    12.00   120
     Total                               570
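The Amount and Total figures on this order form are calculated fields: each can be derived from Qty and Price. The sketch below assumes the form is read as Description, Code, Qty, Price, Amount columns. The dictionary keys are hypothetical names for those columns.

```python
from decimal import Decimal

order_lines = [
    {"description": "Footballs", "code": 21, "qty": 6, "price": Decimal("25.00")},
    {"description": "Sweat Shirts", "code": 44, "qty": 20, "price": Decimal("15.00")},
    {"description": "Shorts", "code": 37, "qty": 10, "price": Decimal("12.00")},
]

def amount(line):
    """Amount is derived from quantity and unit price, so it need not be stored."""
    return line["qty"] * line["price"]

amounts = [amount(line) for line in order_lines]
total = sum(amounts)
```

The derived values match the amounts 150, 300, and 120 and the total 570 printed on the form, so storing them adds no information.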
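1NF
Starting point: 0 Normal Form (unnormalized data).
Add keys.
Remove repeating groups.
Example tables: ORDER, PRODUCT.
[Diagram: a TABLE with repeating { ATTRIBUTES } is split into TABLE and ATTR-TABLE.]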
2NF
No partial dependencies
(an attribute has a partial dependency if it depends on part of a concatenated key).
Symptom: the table has data from several connected tables.
Example tables: STUDENT, SECTION, STUDENT-SECTION.
3NF
No transitive dependencies
(an attribute has a transitive dependency if it depends on other non-key attributes).
Symptom: the table contains data from an embedded entity with non-key attributes.
[Diagram: TABLE is split into TABLE and SUB-TABLE.]
BCNF
Every determinant is a candidate key.
BCNF dependencies are like 3NF
dependencies, but they involve some
key attributes.
Note: BCNF violations often arise when a 1:m
relationship is modeled as an m:n
relationship.
BCNF
SALESMAN-CUST(SalesID, CustID, Commission)
is decomposed into:
SALESMAN(SalesID, Commission)
CUSTOMER(CustID, SalesID)
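A sketch of splitting SALESMAN-CUST into SALESMAN and CUSTOMER, checked on sample data. The rows are hypothetical, and they assume that each customer has one salesman and each salesman has one commission. The relation and attribute names come from the schemas above.

```python
salesman_cust = [
    {"SalesID": "S1", "CustID": "C1", "Commission": 5},
    {"SalesID": "S1", "CustID": "C2", "Commission": 5},
    {"SalesID": "S2", "CustID": "C3", "Commission": 7},
]

def project(rows, attrs):
    """Project rows onto attrs as a set of tuples (duplicates collapse)."""
    return {tuple(row[a] for a in attrs) for row in rows}

salesman = project(salesman_cust, ["SalesID", "Commission"])
customer = project(salesman_cust, ["CustID", "SalesID"])

# Natural join of the two projections on SalesID.
rejoined = {
    (sales_id, cust_id, commission)
    for sales_id, commission in salesman
    for cust_id, cust_sales_id in customer
    if sales_id == cust_sales_id
}
original = project(salesman_cust, ["SalesID", "CustID", "Commission"])
```

The join returns exactly the original rows, and each commission is now stored once per salesman rather than once per customer.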
No multi-valued dependencies
Book            Class     Instructor
Intro Comp      MIS 2003  Parker
Intro Comp      MIS 2003  Kemp
Data in Action  MIS 4533  Kemp
Data in Action  MIS 4533  Warner
COURSE-BOOK(CourseID, Book)
COURSE-INSTR(CourseID, InstrID)
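A sketch of the split into COURSE-BOOK and COURSE-INSTR. It reads the sample data above as (class, book, instructor) rows, which is an assumption because the table layout is flattened in the source. The extra book title used at the end is hypothetical.

```python
from itertools import product

rows = [
    ("MIS 2003", "Intro Comp", "Parker"),
    ("MIS 2003", "Intro Comp", "Kemp"),
    ("MIS 4533", "Data in Action", "Kemp"),
    ("MIS 4533", "Data in Action", "Warner"),
]

# Books and instructors are independent facts about a course,
# so each gets its own table.
course_book = {(course, book) for course, book, _ in rows}
course_instr = {(course, instr) for course, _, instr in rows}

# Joining the two tables on the course reproduces the combined rows.
rejoined = {
    (course, book, instr)
    for (course, book), (other, instr) in product(course_book, course_instr)
    if course == other
}

def rows_for_new_book(rows, course, book):
    """Rows the single combined table needs so a new book pairs with every instructor."""
    instructors = {i for c, _, i in rows if c == course}
    return {(course, book, i) for i in instructors}

extra = rows_for_new_book(rows, "MIS 2003", "Hypothetical Second Text")
```

In the combined table a new book for MIS 2003 costs one row per instructor. In COURSE-BOOK it is a single row.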
4NF