Database Normalization 10
What is Normalization?
A relation is a set of attributes with values for each attribute such that:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same data type.
3. Each attribute (column) name must be unique.
4. The order of attributes (columns) is insignificant.
5. No two tuples (rows) in a relation can be identical.
6. The order of the tuples (rows) is insignificant.
From our discussion of E-R Modeling, we know that an Entity typically
corresponds to a relation and that the Entity's attributes become attributes of the relation.
We also discussed how, depending on the relationships between entities, copies
of attributes (the identifiers) were placed in related relations as foreign keys.
The next step is to identify functional dependencies within each relation.
Functional Dependencies
The attributes listed on the left-hand side of the → are called determinants.
One can read A → B as, "A determines B." Or more specifically: Given a value for A, we
can uniquely determine one value for B.
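The reading of A → B above can be tested against sample tuples. The following is a minimal sketch, not a definitive implementation: the function name fd_holds and the sample rows are hypothetical and used only for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in `rows`, each value of `lhs` pairs with one value of `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical tuples, for illustration only.
rows = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 3, "B": "y"},
]

print(fd_holds(rows, ["A"], ["B"]))  # True: every A value has a single B value
print(fd_holds(rows, ["B"], ["A"]))  # False: B = "y" occurs with A = 2 and A = 3
```

Sample data can only show that a dependency fails. A dependency that holds in the sample may still be broken by tuples added later, so the final decision rests on the meaning of the attributes.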
Key: One or more attributes that uniquely identify a tuple (row) in a relation.
The selection of keys will depend on the particular application being considered.
In most cases the key for a relation will already be specified during the conversion
from the E-R model to a set of relations.
Users can also offer some guidance as to what would make an appropriate key.
Recall that no two tuples (rows) in a relation should have exactly the same values; thus, if no
smaller set of attributes is unique, a candidate key would consist of all of the attributes in a relation.
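As a rough illustration of the definition of a key, the sketch below searches sample tuples for minimal sets of attributes that are unique. The relation, its attribute names and the helper functions are all hypothetical. The result only reflects the sample, so the application still decides which key is appropriate.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Minimal attribute sets that identify each row of the sample uniquely."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical enrollment tuples, for illustration only.
attributes = ("Student", "Course", "Grade")
rows = [
    {"Student": "S1", "Course": "DB", "Grade": "A"},
    {"Student": "S1", "Course": "OS", "Grade": "B"},
    {"Student": "S2", "Course": "DB", "Grade": "A"},
    {"Student": "S2", "Course": "OS", "Grade": "A"},
]

print(candidate_keys(rows, attributes))  # [('Student', 'Course')]
```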
Modification Anomalies
Once our E-R model has been converted into relations, we may find that some
relations are not properly specified. There can be a number of problems:
Deletion Anomaly: Deleting one fact or data point from a relation results
in other information being lost.
Anomaly Example 1
CustomerID  Name          Street         City           State  PostalCode
C101        Bill Smith    ...            New Brunswick  NJ     07101
C102        Mary Green    11 Birch St.   Old Bridge     NJ     07066
C103        Ted Jones     3 Academy St.  Old Bridge     NJ     07066
C104        Sally Taylor  ...            New Brunswick  NJ     07101
C105        Mary Miller   44 Toga Ct.    Farmingdale    NY     11735
Deletion Anomaly: What happens if we delete customer C105? We not only
remove the customer information but also lose the fact that Farmingdale,
NY has postal code 11735.
Modification Anomaly: It is possible that when a town grows in population, the zip
code will be split into two (or more) new zip codes.
For example, if Old Bridge, NJ splits its zip code, then we will have to update many
different tuples even though we are only changing one fact about Old Bridge's zip code.
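Both anomalies can be reproduced with the sample customer data. This is a minimal sketch: the tuple layout and the helper names postal_codes and rows_to_update are assumptions made for illustration, and None stands for a street that is not given in the table.

```python
customers = [
    # (id, name, street, city, state, postal_code)
    # None marks a street that is not given in the sample table.
    ("C101", "Bill Smith", None, "New Brunswick", "NJ", "07101"),
    ("C102", "Mary Green", "11 Birch St.", "Old Bridge", "NJ", "07066"),
    ("C103", "Ted Jones", "3 Academy St.", "Old Bridge", "NJ", "07066"),
    ("C104", "Sally Taylor", None, "New Brunswick", "NJ", "07101"),
    ("C105", "Mary Miller", "44 Toga Ct.", "Farmingdale", "NY", "11735"),
]


def postal_codes(rows):
    """Map (city, state) to the postal code recorded in the relation."""
    return {(city, state): code for _, _, _, city, state, code in rows}


def rows_to_update(rows, city, state):
    """Count the tuples that must change if this town's postal code changes."""
    return sum(1 for r in rows if (r[3], r[4]) == (city, state))


before = postal_codes(customers)
after = postal_codes([c for c in customers if c[0] != "C105"])

print(("Farmingdale", "NY") in before)  # True
print(("Farmingdale", "NY") in after)   # False: the fact left with customer C105
print(rows_to_update(customers, "Old Bridge", "NJ"))  # 2
```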
Anomaly Example 2
Our dutiful consultant creates the E-R Model directly matching the purchase
order:
When we follow the steps to convert to a set of relations this results in two
relations (keys are underlined):
PO_HEADER (PO_Number, PODate, Vendor, Ship_To, ...)
PO_Number  ItemNum  PartNum  Description  Price
O101       I01      P99      Plate        $3.00
O101       I02      P98      Cup          $1.00
O101       I03      P77      Bowl         $2.00
O102       I01      P99      Plate        $3.00
O102       I02      P77      Bowl         $2.00
O103       I01      P33      Fork         $2.50
What happens if we want to add the fact that Order O103 has quantity 5 of
part P99?
Normalization Process
Relations can fall into one or more categories (or classes) called Normal Forms
Normal Form: A class of relations free from a certain set of modification
anomalies.
Normal forms are given names such as:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)
Domain-Key normal form (DK/NF)
These forms are cumulative. A relation in Third normal form is also in 2NF and
1NF.
The Normalization Process for a given relation consists of:
1. Specify the Key of the relation.
2. Specify the functional dependencies of the relation.
   Sample data (tuples) for the relation can assist with this step.
3. Apply the definition of each normal form (starting with 1NF).
4. If a relation fails to meet the definition of a normal form, change the relation (most often
   by splitting the relation into two new relations) until it meets the definition.
5. Re-test the modified/new relations to ensure they meet the definitions of each normal
   form.
In the next set of notes, each of the normal forms will be defined along with an example
of the normalization steps.
Company    Symbol  Headquarters        Date        Close Price
Microsoft  MSFT    Redmond, WA         09/07/2013  23.96
Microsoft  MSFT    Redmond, WA         09/08/2013  23.93
Microsoft  MSFT    Redmond, WA         09/09/2013  24.01
Oracle     ORCL    Redwood Shores, CA  09/07/2013  24.27
Oracle     ORCL    Redwood Shores, CA  09/08/2013  24.14
Oracle     ORCL    Redwood Shores, CA  09/09/2013  24.33
Note that the key (which consists of the Symbol and the Date) can uniquely determine the
Company, Headquarters and Close Price of the stock. Here we assume that Symbol must be
unique but Company, Headquarters, Date and Price are not unique.
A relation is in second normal form (2NF) if all of its non-key attributes are
dependent on all of the key.
Another way to say this: A relation is in second normal form if it is free from
partial-key dependencies
Relations that have a single attribute for a key are automatically in 2NF.
This is one reason why we often use artificial identifiers (non-composite keys) as
keys.
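The 2NF definition can be checked mechanically against the STOCKS sample data. The sketch below looks for non-key attributes that are determined by only part of the key (Symbol, Date). The helper names and the ClosePrice spelling are assumptions for illustration, and the check is only as good as the sample tuples.

```python
from itertools import combinations

COLUMNS = ("Company", "Symbol", "Headquarters", "Date", "ClosePrice")
DATA = [
    ("Microsoft", "MSFT", "Redmond, WA", "09/07/2013", 23.96),
    ("Microsoft", "MSFT", "Redmond, WA", "09/08/2013", 23.93),
    ("Microsoft", "MSFT", "Redmond, WA", "09/09/2013", 24.01),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/07/2013", 24.27),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/08/2013", 24.14),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/09/2013", 24.33),
]
stocks = [dict(zip(COLUMNS, t)) for t in DATA]


def fd_holds(rows, lhs, rhs):
    """True if each value of `lhs` pairs with a single value of `rhs` in `rows`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_key_dependencies(rows, key, non_key):
    """Pairs (part_of_key, attribute) where a proper subset of the key determines the attribute."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in non_key:
                if fd_holds(rows, part, [attr]):
                    found.append((part, attr))
    return found


violations = partial_key_dependencies(
    stocks, ("Symbol", "Date"), ("Company", "Headquarters", "ClosePrice")
)
for part, attr in violations:
    print(part, "->", attr)
# ('Symbol',) -> Company
# ('Symbol',) -> Headquarters
```

Company and Headquarters depend on Symbol alone, so in this sample the relation is not in 2NF. Close Price needs both Symbol and Date.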
Company    Symbol  Headquarters        Date        Close Price
Microsoft  MSFT    Redmond, WA         09/07/2013  23.96
Microsoft  MSFT    Redmond, WA         09/08/2013  23.93
Microsoft  MSFT    Redmond, WA         09/09/2013  24.01
Oracle     ORCL    Redwood Shores, CA  09/07/2013  24.27
Oracle     ORCL    Redwood Shores, CA  09/08/2013  24.14
Oracle     ORCL    Redwood Shores, CA  09/09/2013  24.33
At this point we have two new relations in our relational model. The original
STOCKS relation we started with is removed from the model.
Sample data and functional dependencies for the two new relations:
COMPANY relation:
Company    Symbol  Headquarters
Microsoft  MSFT    Redmond, WA
Oracle     ORCL    Redwood Shores, CA
STOCK_PRICES relation:
Symbol  Date        Close Price
MSFT    09/07/2013  23.96
MSFT    09/08/2013  23.93
MSFT    09/09/2013  24.01
ORCL    09/07/2013  24.27
ORCL    09/08/2013  24.14
ORCL    09/09/2013  24.33
In checking these new relations we can confirm that they meet the definition of
1NF (each one has well defined unique keys) and 2NF (no partial key dependencies).
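A quick way to gain confidence in the split is to project the original STOCKS tuples onto the two new relations and join them back on Symbol. This is a minimal sketch using plain Python sets, with variable names chosen only for illustration.

```python
stocks = {
    # (company, symbol, headquarters, date, close_price)
    ("Microsoft", "MSFT", "Redmond, WA", "09/07/2013", 23.96),
    ("Microsoft", "MSFT", "Redmond, WA", "09/08/2013", 23.93),
    ("Microsoft", "MSFT", "Redmond, WA", "09/09/2013", 24.01),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/07/2013", 24.27),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/08/2013", 24.14),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/09/2013", 24.33),
}

# Project STOCKS onto the attributes of each new relation.
company = {(c, s, h) for c, s, h, _, _ in stocks}
stock_prices = {(s, d, p) for _, s, _, d, p in stocks}

# Join the two relations back together on Symbol.
rejoined = {
    (c, s, h, d, p)
    for (c, s, h) in company
    for (s2, d, p) in stock_prices
    if s == s2
}

print(len(company), len(stock_prices))  # 2 6
print(rejoined == stocks)               # True: no tuples lost or invented
```

The company name and headquarters are now stored once per company instead of once per trading day.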
Consider one of the new relations we created in the STOCKS example for 2nd
normal form:
Company    Symbol  Headquarters
Microsoft  MSFT    Redmond, WA
Oracle     ORCL    Redwood Shores, CA
The solution again is to split this relation up into two new relations:
STOCK_SYMBOLS(Company, Symbol)
COMPANY_HEADQUARTERS(Company, Headquarters)
This gives us the following sample data and FD for the new relations:
STOCK_SYMBOLS relation:
Company    Symbol
Microsoft  MSFT
Oracle     ORCL
COMPANY_HEADQUARTERS relation:
Company    Headquarters
Microsoft  Redmond, WA
Oracle     Redwood Shores, CA
FD1: Company → Headquarters
Again, each of these new relations should be checked to ensure they meet the
definition of 1NF, 2NF and now 3NF.
FundID  InvestmentType   Manager
99      Common Stock     Smith
99      Municipal Bonds  Jones
33      Common Stock     Green
22      Growth Stocks    Brown
11      Common Stock     Smith
FD1:
FD2:
FD3: Manager → InvestmentType
However, consider what happens if we delete the tuple with FundID 22. We lose
the fact that Brown manages the InvestmentType Growth Stocks.
Therefore, while the FUNDS relation is in 1NF, 2NF and 3NF, it is not in BCNF because
not all determinants (Manager in FD3) are candidate keys.
The following are steps to normalize a relation into BCNF:
1. List all of the determinants.
2. See if each determinant can act as a key (candidate keys).
3. For any determinant that is not a candidate key, create a new relation from
   the functional dependency. Retain the determinant in the original relation.
For our example:
FUNDS (FundID, InvestmentType, Manager)
Each of the new relations should be checked to ensure they meet the definitions of 1NF,
2NF, 3NF and BCNF.
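The BCNF steps can be sketched with an attribute-closure test: a determinant can act as a key when its closure covers every attribute of the relation. The code below is an illustration under stated assumptions, not the notes' own method. The dependency (FundID, InvestmentType) → Manager is assumed because it is consistent with the sample data, and Manager → InvestmentType is inferred from the text, which names Manager as the determinant of FD3. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def non_key_determinants(relation, fds):
    """Dependencies whose determinant cannot act as a key of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds if not closure(lhs, fds) >= relation]


def split_on(relation, fd):
    """Step 3: move the dependency into a new relation, keep the determinant behind."""
    lhs, rhs = fd
    return relation - (rhs - lhs), lhs | rhs


FUNDS = frozenset({"FundID", "InvestmentType", "Manager"})
FDS = [
    # Assumed: consistent with the sample data, not stated in the notes.
    (frozenset({"FundID", "InvestmentType"}), frozenset({"Manager"})),
    # Inferred from the text: Manager is the determinant of FD3.
    (frozenset({"Manager"}), frozenset({"InvestmentType"})),
]

violations = non_key_determinants(FUNDS, FDS)
original, new = split_on(FUNDS, violations[0])
print(sorted(original))  # ['FundID', 'Manager']
print(sorted(new))       # ['InvestmentType', 'Manager']
```

Under these assumptions the split leaves a relation pairing each fund with its managers and a new relation recording which investment type each manager handles, so deleting FundID 22 no longer removes the fact about Brown.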
StudentID  Major       Activities
100        CIS         Baseball
100        CIS         Volleyball
100        Accounting  Baseball
100        Accounting  Volleyball
200        Marketing   Swimming
Portfolio ID  Stock Fund     Bond Fund
999           Janus Fund     Municipal Bonds
999           Janus Fund     ...
999           ...            Municipal Bonds
999           ...            ...
888           Kaufmann Fund  ...
A few characteristics:
No regular functional dependencies
All three attributes taken together form the key.
Latter two attributes are independent of one another.
Insertion anomaly: Cannot add a stock fund without adding a bond fund
(NULL Value). Must always maintain the combinations to preserve the meaning.
Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
PortfolioID →→ Stock Fund
PortfolioID →→ Bond Fund
Portfolio ID  Stock Fund
999           Janus Fund
999           ...
888           Kaufmann Fund

Portfolio ID  Bond Fund
999           Municipal Bonds
999           ...
888           ...
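The same kind of split can be tried on the student table from earlier in these notes (majors and activities), whose rows are complete. This is a minimal sketch: the variable names are assumptions, and the first column is treated as a student identifier, which the table itself does not state.

```python
rows = {
    # (student_id, major, activity)
    (100, "CIS", "Baseball"),
    (100, "CIS", "Volleyball"),
    (100, "Accounting", "Baseball"),
    (100, "Accounting", "Volleyball"),
    (200, "Marketing", "Swimming"),
}

# Split the two independent multivalued facts into separate relations.
majors = {(sid, major) for sid, major, _ in rows}
activities = {(sid, activity) for sid, _, activity in rows}

# Joining on the identifier regenerates every major/activity combination.
rejoined = {
    (sid, major, activity)
    for sid, major in majors
    for sid2, activity in activities
    if sid == sid2
}

print(len(majors), len(activities))  # 3 3
print(rejoined == rows)              # True
```

After the split, adding a new activity for student 100 takes one new tuple instead of one tuple per major.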
There are certain conditions under which after decomposing a relation, it cannot
be reassembled back into its original form.
We do not consider these issues here.