BCNF stands for “Boyce-Codd Normal Form”. This normal form is also known as
the 3.5 Normal Form of database normalization. To achieve BCNF, the database
must already be in third normal form. The following steps should then be
carried out to achieve BCNF.
A database must be in third normal form before it is normalized to the fourth
normal form. If the database is already in third normal form, the next step is to
remove the multi-valued dependencies. (If the presence of one or more rows implies
the presence of one or more other rows in the same table, this is called a
multi-valued dependency.)
What is the difference between BCNF and 4NF (Fourth Normal Form)?
Introduction.
example:
EMPLOYEE( EmpID, EmpName, Department, DateHired )
Note that the "..." simply means there could be additional fields.
Relational Keys:
o Primary Key: column (or combination of columns) that uniquely
identifies a row. We underline the primary key.
o Composite Key: a primary key that includes more than one column.
o Foreign Key: a field that links one table to another table. A table can
have an unlimited number of foreign keys linking to other tables. We
use a dashed underline for foreign keys.
Properties of Tables (Relations):
o Each table in a database has a unique name.
o Each entry in a table at the intersection of a row/column can only
store a single value (principle of atomicity). No multivalued fields
are allowed in a table.
o Each row is unique.
o Each column has a unique field name.
o The sequence of rows as well as columns is insignificant and can be
interchanged without changing the data.
Steps in Conversion:
o 1. Represent entities.
o 2. Represent relationships.
o 3. Normalize the tables.
o 4. Merge tables with the same primary key (based on the existing
database schema for the firm - the system you are currently designing
will probably add to an existing schema).
Entity Types.
In transforming entities into tables, recall there are three types of entities:
o 1. Regular Entities: have independent existence and generally
represent real-world objects such as customers, products, departments,
etc.
o 2. Weak Entities: cannot exist except with identifying relationships
with an owner (regular) entity type.
o 3. Associative Entities: formed from many-to-many relationships
between other entity types. These may be gerunds.
Transforming Entities.
One problem with the CUSTOMER table structure given above is that
the CustomerPhone column is a multivalued attribute.
A table with sample data might look like the following:
Note that this table violates the principle of atomicity - there is an attempt to
store more than one CustomerPhone in a field - and this is impossible.
The solution is to move the multivalued attribute to a separate table along
with the primary key (to link back to the original table). The attribute
becomes part of a composite key. The new solution consists of two tables
with sample data as shown here.
CUSTOMER_PHONE(CustomerNumber, CustomerPhone)
CustomerNumber CustomerPhone
14112 555-1212
14112 555-3434
15267 555-2424
34589 555-6774
34589 555-3443
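As a rough sketch of this split, the two tables could be built and queried with Python's built-in sqlite3 module. The customer names are placeholders (an assumption, since the notes do not list them); the customer numbers and phone numbers are the sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE CUSTOMER (
        CustomerNumber INTEGER PRIMARY KEY,
        CustomerName   TEXT
    )""")
# The multivalued attribute lives in its own table; the composite primary
# key is (CustomerNumber, CustomerPhone).
conn.execute("""
    CREATE TABLE CUSTOMER_PHONE (
        CustomerNumber INTEGER NOT NULL REFERENCES CUSTOMER(CustomerNumber),
        CustomerPhone  TEXT    NOT NULL,
        PRIMARY KEY (CustomerNumber, CustomerPhone)
    )""")

# Customer names are placeholders (assumption).
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", [
    (14112, "Customer A"), (15267, "Customer B"), (34589, "Customer C"),
])
conn.executemany("INSERT INTO CUSTOMER_PHONE VALUES (?, ?)", [
    (14112, "555-1212"), (14112, "555-3434"), (15267, "555-2424"),
    (34589, "555-6774"), (34589, "555-3443"),
])

# Each phone number is now an atomic value in its own row.
phones = [row[0] for row in conn.execute(
    "SELECT CustomerPhone FROM CUSTOMER_PHONE "
    "WHERE CustomerNumber = ? ORDER BY CustomerPhone", (14112,))]
print(phones)
```

A customer with several phones simply has several rows in CUSTOMER_PHONE, so no field ever holds more than one value.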
The image shown below gives both 1:N and N:N relationships. We will
focus first on the 1:N relationship named PlacesOrder.
First, transform each entity into a table (or a set of tables if the entity
has multivalued attributes) as described above. Otherwise, each entity becomes
a single table.
Here we show the CUSTOMER and CUSTOMER_PHONE tables that we
created above, assuming the attributes remain unchanged. We also show
the CUSTOMER_ORDER table, with its attributes now shown as
columns of the table.
CUSTOMER(CustomerNumber, CustomerName, AccountBalance)
CUSTOMER_PHONE(CustomerNumber, CustomerPhone)
TREATMENT(TreatmentID, TreatmentDescription)
These relationships do not arise very often. As before, each entity becomes
a table.
The selection of a Foreign Key to link the two tables is somewhat arbitrary,
and should be made based upon an analysis of the primary type of query that
will be used to access the data -- termed access path analysis.
Consider the example shown in this ER diagram.
Some employees will not have an assigned parking place, but this has no
impact on the table structure. Given below are table structures for
the EMPLOYEE and PARKING_PLACE tables that show two alternative
table structures to link the tables to represent the AssignedTo relationship
by using Foreign Keys.
OR
EMPLOYEE(SSN, EmpName, EmpJobTitle, DateHired)
Compare the two solutions. Which is better? It depends upon the access path
analysis.
Consider the example 1:1 and 1:N Unary relationships shown here.
We cover the 1:1 Marriage relationship first. The PERSON entity
becomes a table, and the PersonID field is the primary key. A foreign key
field is created named MarriedToPersonID. The foreign key and primary
key share the same domain of valid values.
The 1:N Unary relationship is modeled exactly the same way as the 1:1
Unary relationship. Consider the Supervise relationship shown in the
figure above. The primary key field is EmployeeID. The relationship is
implemented by creating a ManagerEmployeeID field that has the same
domain of valid values as the EmployeeID field.
EMPLOYEE(EmployeeID, EmployeeName, BirthDate,
ManagerEmployeeID)
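The unary Supervise relationship can be sketched as a foreign key that points back to the same table. The example below is an illustration only, using Python's sqlite3 module; the employee names are placeholders and are not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs

# ManagerEmployeeID shares the domain of EmployeeID and references it.
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmployeeID        INTEGER PRIMARY KEY,
        EmployeeName      TEXT NOT NULL,
        BirthDate         TEXT,
        ManagerEmployeeID INTEGER REFERENCES EMPLOYEE(EmployeeID)
    )""")

# Placeholder rows (assumption): employee 1 supervises employees 2 and 3.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (1, "Manager A", None, None),
    (2, "Worker B", None, 1),
    (3, "Worker C", None, 1),
])

# A self-join pairs each employee with his or her manager.
pairs = conn.execute("""
    SELECT e.EmployeeName, m.EmployeeName
    FROM EMPLOYEE e JOIN EMPLOYEE m ON e.ManagerEmployeeID = m.EmployeeID
    ORDER BY e.EmployeeID""").fetchall()
print(pairs)

# A manager id outside the EmployeeID domain is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (4, 'Worker D', NULL, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The 1:1 Marriage relationship could be handled the same way, with MarriedToPersonID referencing PersonID.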
BILL_OF_MATERIALS(PartNumber, ComponentPartNumber,
QtyToManufacture)
VendorNumber VendorName
10 ABC Company
20 XYZ Company
.... ....
WarehouseId Location
100 Chicago
200 St. Louis
Note the inclusion of the DateTimeShipped column as part of the key. This
is to guarantee uniqueness, since the combination
of ItemNumber, WarehouseNumber, and VendorNumber is not unique.
VendorNumber VendorName
10 ABC Company
20 XYZ Company
.... ....
WarehouseId Location
100 Chicago
200 St. Louis
... ...
Consider the Associative Relationship shown below. Recall that this type
of relationship may be modeled as a Gerund with two 1:N relationships
instead of one N:N relationship.
Each entity becomes a table and the primary keys are as indicated in the ER
diagram.
The 1:N relationships are implemented with foreign keys in the
CERTIFICATE table.
EmployeeID EmpName
0001 Tom Jones
0002 Barry Manilow
0003 Sandra Bullock
HOURLY(EmployeeID, Wage)
Introduction.
This is a formal method to check tables for potential data storage problems
termed anomalies.
In order to clarify the discussion, we will formalize the definition of several
terms.
o KEYS: The term KEY is often confusing because it has different
meanings during design and implementation of a system.
o DESIGN: During design, KEY means a combination of one or more
attributes (columns) of a relational table that uniquely identify rows in
the table.
o KEY guarantees uniqueness; no two rows can be identical.
o IMPLEMENTATION: During implementation, KEY means a
column on which the DBMS builds an index or other data structure to
allow quick access to rows. Such keys need not be unique - they may
be secondary keys enabling access to a SET of rows.
o Sometimes the terms Logical Key and Physical Key are used to
distinguish between these two meanings.
o INDEXES: Since a physical key is usually an index, we often use the
term Index for a physical key.
o Indexes are created to:
     Allow quick access.
     Facilitate sorting or sorted-order access.
     Ensure no duplicates if the keyword UNIQUE is used when
     defining an index.
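The difference between a non-unique (secondary) index and a UNIQUE index can be sketched with Python's sqlite3 module. This is an illustration under assumptions: the BadgeNumber column, the index names, and the sample rows are hypothetical and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmpID       INTEGER PRIMARY KEY,   -- logical key
        EmpName     TEXT,
        Department  TEXT,
        BadgeNumber TEXT                   -- hypothetical column
    )""")

# Secondary (non-unique) index: gives quick access to a SET of rows.
conn.execute("CREATE INDEX idx_emp_dept ON EMPLOYEE (Department)")
# UNIQUE index: additionally guarantees there are no duplicates.
conn.execute("CREATE UNIQUE INDEX idx_emp_badge ON EMPLOYEE (BadgeNumber)")

conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (1, "Employee A", "Sales", "B-001"),
    (2, "Employee B", "Sales", "B-002"),   # duplicate Department is allowed
    (3, "Employee C", "Audit", "B-003"),
])

sales = conn.execute(
    "SELECT EmpName FROM EMPLOYEE WHERE Department = 'Sales' ORDER BY EmpName"
).fetchall()
print(sales)

# Reusing badge B-001 violates the UNIQUE index.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (4, 'Employee D', 'Audit', 'B-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```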
Functional Dependency.
Assuming employees only take a course once (no time dependencies), then
uniqueness for the rows in the table is ensured by a composite key of EmpId
+ Course.
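One way to make the idea of a functional dependency concrete is to test it against sample rows. The helper below is a small sketch in plain Python. The function name holds, the Name column, and most of the rows are assumptions made for illustration; only employee 425 (Bill), the course names, and the fees appear elsewhere in these notes.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs -> rhs holds when any two rows that agree on every lhs column
    also agree on every rhs column.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key; a different
        # value later means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for an employee/course table.
rows = [
    {"EmpId": 130, "Name": "Ann",  "Course": "Calculus", "Fee": 150},
    {"EmpId": 130, "Name": "Ann",  "Course": "Biology",  "Fee": 200},
    {"EmpId": 200, "Name": "Carl", "Course": "Calculus", "Fee": 150},
    {"EmpId": 425, "Name": "Bill", "Course": "Algebra",  "Fee": 200},
]

print(holds(rows, ["Course"], ["Fee"]))                   # Course -> Fee
print(holds(rows, ["EmpId"], ["Course"]))                 # EmpId -> Course
print(holds(rows, ["EmpId", "Course"], ["Name", "Fee"]))  # composite key
```

A check like this can only refute a dependency from the data at hand. Whether a dependency truly holds is a statement about the business rules, not about one sample.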
Data Anomalies.
Data Anomalies are problems with data storage caused by poorly structured
tables. Refer to the table above.
Insertion Anomaly. If the primary key is EmpId + Course, to add a new
employee, the employee must first be enrolled in a course. If an employee is
not enrolled in a course, then the COURSE column that is part of the
composite primary key will be null, and null key values are not allowed.
Deletion Anomaly. Deleting data for Employee #425 (Bill) causes us to lose
data about Algebra and the course fee for Algebra because Bill is the only
employee who has enrolled in Algebra.
Modification Anomaly. If the fee for Calculus is increased, the data must
be updated for more than one row.
Note there is also a time-sensitivity between EmpId and Course since an
employee could take a course many times, but the table does not track this
fact.
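The three anomalies can be reproduced on a small single-table design. The sketch below uses Python's sqlite3 module. The table name EMP_COURSE, the Name column, and every row except employee 425 (Bill) in Algebra are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single unnormalized table keyed on EmpId + Course. NOT NULL is stated
# explicitly because SQLite otherwise tolerates NULLs in composite keys.
conn.execute("""
    CREATE TABLE EMP_COURSE (
        EmpId  INTEGER NOT NULL,
        Name   TEXT,
        Course TEXT    NOT NULL,
        Fee    INTEGER,
        PRIMARY KEY (EmpId, Course)
    )""")
conn.executemany("INSERT INTO EMP_COURSE VALUES (?, ?, ?, ?)", [
    (130, "Ann",  "Calculus", 150),
    (130, "Ann",  "Biology",  200),
    (200, "Carl", "Calculus", 150),
    (425, "Bill", "Algebra",  200),
])

# Insertion anomaly: an employee with no course cannot be stored.
try:
    conn.execute("INSERT INTO EMP_COURSE VALUES (500, 'Dana', NULL, NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Modification anomaly: one fee change must touch several rows.
rows_touched = conn.execute(
    "UPDATE EMP_COURSE SET Fee = 175 WHERE Course = 'Calculus'").rowcount

# Deletion anomaly: removing Bill also removes the only Algebra fee.
conn.execute("DELETE FROM EMP_COURSE WHERE EmpId = 425")
algebra_fee = conn.execute(
    "SELECT Fee FROM EMP_COURSE WHERE Course = 'Algebra'").fetchone()

print(insert_rejected, rows_touched, algebra_fee)
```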
First Normal Form (1NF). Remove Repeating Groups.
Consider the following table where redundant data are eliminated from the
view:
In order to store information for employees who take more than one course,
a possible table structure is:
An obvious problem associated with the above solution is the storage of a lot
of redundant data. We can eliminate this problem by further normalizing the
table.
Second Normal Form (2NF). Remove Partial Dependencies.
There are many potential problems associated with the data redundancy.
These anomalies occur because not all attributes are fully dependent on the
primary key. Note that this type of problem only arises when the key is
a composite key.
The FDs here are:
Solution: Divide the table into two or more tables. Do this by removing
attributes dependent on part of the key to a separate table with just that part
of the key as the primary key. This results in three tables for this particular
modeling problem:
Course Fee
Calculus 150
Biology 200
Algebra 200
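A decomposition of this kind can be sketched with Python's sqlite3 module. The table names EMPLOYEE, COURSE, and ENROLLMENT and the employee rows are assumptions made for illustration; the course fees are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Attributes that depend only on EmpId
    CREATE TABLE EMPLOYEE (EmpId INTEGER PRIMARY KEY, Name TEXT);
    -- Attributes that depend only on Course
    CREATE TABLE COURSE (Course TEXT PRIMARY KEY, Fee INTEGER);
    -- The remaining composite key: who is enrolled in what
    CREATE TABLE ENROLLMENT (
        EmpId  INTEGER NOT NULL REFERENCES EMPLOYEE(EmpId),
        Course TEXT    NOT NULL REFERENCES COURSE(Course),
        PRIMARY KEY (EmpId, Course)
    );
    INSERT INTO EMPLOYEE VALUES (130, 'Ann'), (425, 'Bill');
    INSERT INTO COURSE VALUES
        ('Calculus', 150), ('Biology', 200), ('Algebra', 200);
    INSERT INTO ENROLLMENT VALUES
        (130, 'Calculus'), (130, 'Biology'), (425, 'Algebra');
""")

# Removing Bill no longer loses the Algebra fee.
conn.execute("DELETE FROM ENROLLMENT WHERE EmpId = 425")
conn.execute("DELETE FROM EMPLOYEE WHERE EmpId = 425")
algebra_fee = conn.execute(
    "SELECT Fee FROM COURSE WHERE Course = 'Algebra'").fetchone()[0]

# A fee change now touches exactly one row.
rows_touched = conn.execute(
    "UPDATE COURSE SET Fee = 175 WHERE Course = 'Calculus'").rowcount

print(algebra_fee, rows_touched)
```

Because each fee is stored once, in COURSE, the deletion and modification anomalies described earlier do not occur in this layout.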
Salesperson Region
Smith South
Hicks West
Hernandez East
If a table has more than one candidate key, data storage anomalies may
result even if the table is in 3NF.
This situation arises when a candidate key is not chosen as the primary key
(or as part of it), and two of the candidate keys
have overlapping attributes. Consider the following table.
Each player may play more than one position (note Earl).
Each position can have more than one coach (note Fullback position).
For each position a player is assigned only one coach.
Each coach coaches only a single position.
Each coach can coach several players who play a position.
Anomalies: If John leaves the team, we lose the fact
that Ed coaches Guards. If we have Mel as a Quarterback coach, we
cannot insert this information into the table until a player is assigned to
play Quarterback because the Player column would be null and that is not
allowed for a column that is part of the primary key.
No single attribute is a candidate key (determines the other two attributes).
FDs for the table are:
Player Coach
Earl Joe
John Ed
Tony Pete
Earl Jim
Mack Joe
Coach Position
Joe FB
Ed G
Pete FB
Jim T
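The two tables above can be joined to rebuild the original player/position/coach view. The sketch below uses Python's sqlite3 module. The table names PLAYER_COACH and COACH_POSITION and the abbreviation QB for Quarterback are assumptions; the rows are the sample data shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Each coach coaches a single position, so Coach is the key here.
    CREATE TABLE COACH_POSITION (
        Coach    TEXT PRIMARY KEY,
        Position TEXT NOT NULL
    );
    CREATE TABLE PLAYER_COACH (
        Player TEXT NOT NULL,
        Coach  TEXT NOT NULL REFERENCES COACH_POSITION(Coach),
        PRIMARY KEY (Player, Coach)
    );
    INSERT INTO COACH_POSITION VALUES
        ('Joe', 'FB'), ('Ed', 'G'), ('Pete', 'FB'), ('Jim', 'T');
    INSERT INTO PLAYER_COACH VALUES
        ('Earl', 'Joe'), ('John', 'Ed'), ('Tony', 'Pete'),
        ('Earl', 'Jim'), ('Mack', 'Joe');
""")

# Joining on Coach rebuilds the player/position/coach view.
view = conn.execute("""
    SELECT pc.Player, cp.Position, pc.Coach
    FROM PLAYER_COACH pc JOIN COACH_POSITION cp ON pc.Coach = cp.Coach
    ORDER BY pc.Player, pc.Coach""").fetchall()
print(view)

# Mel can be recorded as a Quarterback coach before any player is assigned.
conn.execute("INSERT INTO COACH_POSITION VALUES ('Mel', 'QB')")

# If John leaves the team, the fact that Ed coaches Guards survives.
conn.execute("DELETE FROM PLAYER_COACH WHERE Player = 'John'")
ed_position = conn.execute(
    "SELECT Position FROM COACH_POSITION WHERE Coach = 'Ed'").fetchone()[0]
print(ed_position)
```

One trade-off to keep in mind: the rule that a player has only one coach per position is no longer enforced by a key in either table, so it would have to be checked separately.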
Consider the situation where each course can have several instructors and
each course uses several textbooks, but the instructors do not select
the textbooks (they are independent of one another).
INSTRUCTOR(Course, Instructor)
Course Instructor
Mngt White
Mngt Green
Finance Gray
TEXT(Course, Textbook)
Course Textbook
Mngt Drucker
Finance Weston
Finance Gilford
Mngt Peters
When information about the Peters book for the Mngt course is added, only
a single row is added to the TEXT table.
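To see why the split matters, compare how many rows the Peters book would require in a single combined table. The sketch below is plain Python over the sample data above. The function name combined and the idea of a single three-column table are assumptions used only for the comparison.

```python
# The two 4NF tables, as sets of rows (before the Peters book is added).
instructor = {("Mngt", "White"), ("Mngt", "Green"), ("Finance", "Gray")}
text = {("Mngt", "Drucker"), ("Finance", "Weston"), ("Finance", "Gilford")}


def combined(instructor_rows, text_rows):
    """Rows a single (Course, Instructor, Textbook) table would need.

    Because instructors and textbooks are independent, every instructor
    of a course must be paired with every textbook of that course.
    """
    return {
        (course, teacher, book)
        for (course, teacher) in instructor_rows
        for (text_course, book) in text_rows
        if course == text_course
    }


before = combined(instructor, text)

# In the 4NF design, adding the Peters book is ONE new row in TEXT.
text_after = text | {("Mngt", "Peters")}
after = combined(instructor, text_after)

# In the combined design, the same fact needs one row per Mngt instructor.
new_rows = after - before
print(len(before), len(after))
print(sorted(new_rows))
```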
This problem arises if a table is divided into two or more tables, but it is not
possible to join the tables together again to create the original view of the
data without inaccurate join results.
This problem arises in the case of ternary relationships and is covered in
detail with examples in my article, "Entity Relationship Modeling and
Normalization Errors."
Basically, this occurs when a relationship that must be modeled as
a ternary relationship is incorrectly modeled as a set of two or
more binary relationships. This is essentially the opposite of the problem
that arises with 4NF.
We will study this problem in more detail by reading the article on
normalization errors.
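The inaccurate join results mentioned above can be illustrated with a tiny example in plain Python. The relation, its column meanings (vendor, item, warehouse), and all rows are hypothetical and chosen only to show the effect.

```python
# A hypothetical ternary relation: (Vendor, Item, Warehouse).
shipments = {
    (10, "A", 100),
    (10, "B", 200),
    (20, "A", 200),
}

# Incorrectly model the ternary relationship as three binary ones
# by projecting onto each pair of columns.
vendor_item = {(v, i) for (v, i, w) in shipments}
item_warehouse = {(i, w) for (v, i, w) in shipments}
vendor_warehouse = {(v, w) for (v, i, w) in shipments}

# Join the three binary tables back together.
rejoined = {
    (v, i, w)
    for (v, i) in vendor_item
    for (i2, w) in item_warehouse
    if i == i2 and (v, w) in vendor_warehouse
}

# Rows produced by the join that were never in the original data.
spurious = rejoined - shipments
print(sorted(rejoined))
print(sorted(spurious))
```

Here the join invents a shipment of item A from vendor 10 to warehouse 200, because each pair of values is valid on its own even though the triple never occurred.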