Lab 4
October 5, 2007
Introduction
Normal forms are a theoretical framework for formalizing good and bad design of databases. These lab exercises provide practice in the mechanical aspects of normal forms: functional dependencies, candidate keys and normal forms themselves, and then tie this all together by exploring the Wine Society database introduced in lab 4. You will probably gain most from this lab by working with a partner or in a small group.
Functional dependencies
For each set of functional dependencies given below, identify whether the set is minimal or not. If not, give an equivalent minimal set.
(1) $\{A \to B,\ A \to C\}$
(2) $\{A \to B,\ A \to C,\ B \to C\}$
(3) $\{A \to B,\ B \to A,\ B \to C\}$
(4) $\{P \to Q,\ PQ \to R\}$
(5) $\{L \to S,\ M \to T,\ S \to T,\ LM \to K,\ K \to L,\ K \to M,\ K \to T\}$
For the diagrams in Figure 2, write out the corresponding functional dependency list. Determine whether these FD lists are minimal, and if not produce an equivalent minimal FD list. Draw diagrams for exercises 1 to 4.
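[Figure 2: functional dependency diagrams. Panels (a) and (b): attributes A, B. Panel (c): D, E, F. Panel (d): A, B, C, D. Panel (e): P, Q, R, S. Panel (f): A, B, C, D.]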
Keys
Correctly identifying the candidate keys of a relation is critical to determining the highest normal form, and is of great practical concern when designing a database. It is generally good practice to choose the most natural primary key, and then enforce unique constraints on alternate keys in a table.
(9) Identify all candidate keys for the relations in the diagrams.
Normal Forms
Now that we have identified the candidate keys of the relations, we can determine their highest normal form. For each diagram in Figure 2:
(10) Identify the prime and non-prime attributes.
(11) Determine its highest normal form.
The relation in Figure 2c represents a common practical situation, where the natural key of a relation is composed of several attributes. If this relation is referenced by foreign keys from other relations, the database can become unwieldy. The standard fix for this is to introduce an artificial key, such as a numeric ID field.
(12) Introduce an artificial key, K, to this relation. Redraw the diagram and write down the functional dependencies that arise from the introduction of this key.
(13) Find the highest normal form of the resulting relation.
Putting it together
Lab 4 introduced the Wine Society schema, and explored some of its problems. For reference, the definition is as follows:
The Wine Society has built a database for their ordering system. Their customers are identified by a name and an address, and their stock is identified by a name. Each wine has a quantity in stock, and a price per bottle. Customers can place at most one order per day, for a quantity of any number of wines. Orders of 12 bottles or more attract a 10% discount. Customers pay a deposit on ordering of any amount up to the full price of the order. The database records the order date, the shipping date, and the deposit and amount paid.
The analyst who designed the database considered several possible data models, and eventually decided on one that minimized the number of entities. The resulting class diagram looked like this:
The file wine.sql defines the schema and provides some sample data.
5.1 Queries
In lab 4, we looked at the following queries.
How many orders has Fred Bloggs placed? (Answer: 3) This query is slightly tricky, because there is no entity in the database corresponding to an order. We need to use DISTINCT or GROUP BY to identify distinct orders.
Which customer has ordered the most bottles of wine? (Answer: Fred Bloggs) This query is straightforward.
Which customer has spent the most money on wine this year? (Answer: Jane Doe) This query is tricky because of the discount rules in the requirements. These rules apply when an entire order contains a certain number of bottles, yet the database holds one row per wine in an order. To calculate this, we need to group by customer and date (to identify an order), then sum over the prices of these groups.
Which orders are ready to ship (i.e. there are sufficient bottles in stock to ship the whole order)? This is a little difficult, because our criterion for shipping an entire order depends on conditions that apply to individual parts of the order.
The Wine Society has received payment of $66.40 from Fred Bloggs. Update the database to reflect this. The trick here is to update all rows that correspond to a given order. If the database had 2 open orders, the situation would be even more difficult.
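The grouping idea behind the first and third queries can be sketched as follows. This is only an illustration under stated assumptions: the table name `wine_order`, its column names, and the sample rows are all hypothetical and are not taken from wine.sql. The sketch uses Python's `sqlite3` module with an in-memory database, treats (customer, date) as the identity of an order, and applies the 10% discount per order rather than per row.

```python
import sqlite3

# Hypothetical single-table layout: one row per wine per order.
# Table name, column names and rows are illustrative assumptions,
# not the contents of wine.sql.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE wine_order (
        customer_name    TEXT,
        customer_address TEXT,
        order_date       TEXT,
        wine_name        TEXT,
        quantity         INTEGER,
        price_per_bottle REAL
    )
""")
conn.executemany(
    "INSERT INTO wine_order VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Ann Example", "1 Test St", "2007-09-01", "Red A", 6, 10.0),
        ("Ann Example", "1 Test St", "2007-09-01", "White B", 6, 20.0),
        ("Ann Example", "1 Test St", "2007-09-15", "Red A", 2, 10.0),
        ("Bob Sample", "2 Demo Rd", "2007-09-01", "White B", 3, 20.0),
    ],
)

# An order is identified by (customer, date), so count distinct pairs.
ORDER_COUNT_SQL = """
    SELECT COUNT(*) FROM (
        SELECT DISTINCT customer_name, customer_address, order_date
        FROM wine_order
        WHERE customer_name = ?
    )
"""

# Total per order: sum the rows of each order, then apply the 10% discount
# when the whole order holds 12 bottles or more.
ORDER_TOTAL_SQL = """
    SELECT customer_name, order_date,
           SUM(quantity) AS bottles,
           SUM(quantity * price_per_bottle)
             * CASE WHEN SUM(quantity) >= 12 THEN 0.9 ELSE 1.0 END AS total
    FROM wine_order
    GROUP BY customer_name, customer_address, order_date
    ORDER BY customer_name, order_date
"""

order_count = conn.execute(ORDER_COUNT_SQL, ("Ann Example",)).fetchone()[0]
order_totals = conn.execute(ORDER_TOTAL_SQL).fetchall()
```

With the made-up rows above, the first order for Ann Example spans two rows and reaches 12 bottles only when those rows are summed together, which is why the discount has to be decided after grouping.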
5.2 Functional Dependencies
To start to analyse the problem, we first need to look for functional dependencies among the data items. For this, we need to look at the problem statement in the first paragraph of Section 5. Look at the sentence "Each wine has a quantity in stock, and a price per bottle." From this, we can extract the functional dependencies
wine name $\to$ price per bottle
wine name $\to$ qty in stock
(14) Work through the rest of the description and extract as many functional dependencies as you can.
(15) Eliminate the trivial, wasteful and transitive dependencies, to make this a minimal list of dependencies.
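When extracting dependencies, it can help to test a proposed dependency against sample rows. The sketch below is a minimal illustration, assuming rows are held as Python dictionaries; the function name `fd_holds`, the column names and the sample data are hypothetical. Note that data can only refute a dependency: a dependency that happens to hold in the sample rows is still justified only by the problem statement.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    Two rows violate the dependency if they agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Made-up rows for illustration only (not the contents of wine.sql).
rows = [
    {"customer_name": "Ann Example", "wine_name": "Red A",
     "price_per_bottle": 10.0, "qty_in_stock": 40},
    {"customer_name": "Bob Sample", "wine_name": "Red A",
     "price_per_bottle": 10.0, "qty_in_stock": 40},
    {"customer_name": "Ann Example", "wine_name": "White B",
     "price_per_bottle": 20.0, "qty_in_stock": 15},
]

price_depends_on_wine = fd_holds(rows, ["wine_name"], ["price_per_bottle"])
wine_depends_on_customer = fd_holds(rows, ["customer_name"], ["wine_name"])
```

Here the first check passes, while the second fails because the same customer appears with two different wines.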
5.3 Normal Forms
We now focus on the wine order relation. Discard any dependencies from the list produced in step 15 above that involve attributes not in this relation.
(16) Identify the key(s) of this relation.
(17) Divide the attributes into prime and non-prime.
(18) Determine its highest normal form. Identify the functional dependency or dependencies that violate the first normal form it fails to satisfy.
5.4 Normalization
Normalization is the process of taking a schema that violates one or more normal forms, and producing a better one. Elmasri and Navathe [1] list the following remedy for a violation of 2nd normal form: "Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it." How could we redesign this relation so that it doesn't violate 2nd normal form?
(19) Draw the resulting UML diagram.
(20) Define new tables that implement the normalised schema. Use names for the new relation(s) that don't conflict with the existing tables.
(21) Write SQL queries to populate the new tables from the old table.
(22) Rewrite the queries from Section 5.1 to run against the new schema.
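The remedy quoted above can be sketched for a single partial dependency as follows. This is an illustration under stated assumptions, not the lab's intended solution: the old table `wine_order`, the new tables `wine_nf` and `order_line_nf`, all column names and all rows are hypothetical, and the customer address, shipping and payment columns are left out to keep the example short. The sketch uses Python's `sqlite3` module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the unnormalised table; names and rows are
# assumptions, not the contents of wine.sql.
conn.executescript("""
    CREATE TABLE wine_order (
        customer_name TEXT, order_date TEXT, wine_name TEXT,
        quantity INTEGER, price_per_bottle REAL, qty_in_stock INTEGER,
        PRIMARY KEY (customer_name, order_date, wine_name)
    );
    INSERT INTO wine_order VALUES
        ('Ann Example', '2007-09-01', 'Red A',   6, 10.0, 40),
        ('Ann Example', '2007-09-01', 'White B', 6, 20.0, 15),
        ('Bob Sample',  '2007-09-01', 'White B', 3, 20.0, 15);

    -- New relation for the partial key wine_name and its dependent attributes.
    CREATE TABLE wine_nf (
        wine_name        TEXT PRIMARY KEY,
        price_per_bottle REAL,
        qty_in_stock     INTEGER
    );
    -- Relation keeping the original key and the fully dependent attribute.
    CREATE TABLE order_line_nf (
        customer_name TEXT, order_date TEXT,
        wine_name TEXT REFERENCES wine_nf (wine_name),
        quantity INTEGER,
        PRIMARY KEY (customer_name, order_date, wine_name)
    );

    -- Populate the new tables from the old one.
    INSERT INTO wine_nf
        SELECT DISTINCT wine_name, price_per_bottle, qty_in_stock
        FROM wine_order;
    INSERT INTO order_line_nf
        SELECT customer_name, order_date, wine_name, quantity
        FROM wine_order;
""")

wines = conn.execute("SELECT * FROM wine_nf ORDER BY wine_name").fetchall()
line_count = conn.execute("SELECT COUNT(*) FROM order_line_nf").fetchone()[0]

# Joining the two new tables should give back the original rows.
rejoined = conn.execute("""
    SELECT l.customer_name, l.order_date, l.wine_name,
           l.quantity, w.price_per_bottle, w.qty_in_stock
    FROM order_line_nf AS l JOIN wine_nf AS w USING (wine_name)
    ORDER BY 1, 2, 3
""").fetchall()
original = conn.execute("SELECT * FROM wine_order ORDER BY 1, 2, 3").fetchall()
```

The `SELECT DISTINCT` collapses the repeated price and stock values to one row per wine, and the final join is a quick check that no information was lost by the decomposition.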
References
[1] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems. Pearson, 4th edition, 2003.
[2] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems. Pearson, 5th edition, 2006.
Definitions
The definitions given here are equivalent to the E&N definitions given in lectures.
A.1 Functional Dependencies
Definition 1 (Functional Dependency) A functional dependency $X \to Y$ on a relation $R$ is a relationship between two sets of attributes $X, Y \subseteq R$, such that if two tuples agree on the attributes of $X$, then they also agree on the attributes of $Y$, i.e.
$$\pi_X(t_1) = \pi_X(t_2) \Rightarrow \pi_Y(t_1) = \pi_Y(t_2)$$
Informally, the attributes $X$ determine the values of the attributes $Y$.
Definition 2 (Minimal Dependency Set) A set of dependencies $F$ is minimal if
1. Every dependency in $F$ has a single attribute on its right-hand side.
2. We cannot replace any dependency $X \to A$ in $F$ with a dependency $Y \to A$, where $Y \subset X$, and still have a set of dependencies that is equivalent to $F$.
3. We cannot remove any dependency from $F$ and still have a set of dependencies that is equivalent to $F$.
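The three conditions of Definition 2 can be checked mechanically using attribute closure. The following is a minimal sketch, assuming single-letter attribute names and dependencies written as strings such as `"WX->Y"`; the helper names `closure`, `fd` and `is_minimal` are illustrative choices, and the example sets are made up rather than taken from the exercises.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already determined.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(text):
    """Parse 'WX->Y' into (frozenset('WX'), frozenset('Y')); single-letter attributes."""
    lhs, rhs = text.split("->")
    return frozenset(lhs.strip()), frozenset(rhs.strip())


def is_minimal(fds):
    """Check the three conditions of a minimal dependency set."""
    for i, (lhs, rhs) in enumerate(fds):
        # Condition 1: a single attribute on the right-hand side.
        if len(rhs) != 1:
            return False
        # Condition 2: no attribute can be dropped from the left-hand side.
        # Dropping one attribute at a time is enough, because any smaller
        # left-hand side that works is contained in one of these.
        for a in lhs:
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                return False
        # Condition 3: the dependency is not implied by the remaining ones.
        others = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, others):
            return False
    return True


chain = [fd("X->Y"), fd("Y->Z")]
with_shortcut = [fd("X->Y"), fd("Y->Z"), fd("X->Z")]   # X->Z is implied
with_extra_lhs = [fd("WX->Y"), fd("W->X")]             # X is extraneous in WX->Y

chain_is_minimal = is_minimal(chain)
shortcut_is_minimal = is_minimal(with_shortcut)
extra_lhs_is_minimal = is_minimal(with_extra_lhs)
```

The function only reports whether a set is minimal; producing an equivalent minimal set would additionally require removing or shrinking the offending dependencies one at a time and rechecking.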
A.2 Keys
Definition 3 (Superkey) A superkey $S$ is a set of attributes of $R$, such that $S \to R$.
Definition 4 (Candidate Key) A candidate key $K$ is a minimal superkey. That is, no proper subset $L \subset K$ is a superkey of $R$.
Definition 5 (Primary Key) A primary key $K$ is one of the candidate keys. We generally choose a primary key for some practical reason.
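Definitions 3 and 4 translate directly into a brute-force search: a set of attributes is a superkey if its closure is the whole relation, and a candidate key if no smaller key is contained in it. The sketch below is one way to do this for small relations; the function names and the example relation are illustrative assumptions, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys, trying attribute sets in order of size."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            # A superset of a key already found is a superkey but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Made-up example: R(A, B, C, D) with AB -> C, C -> D, D -> A.
example_fds = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("C"), frozenset("D")),
    (frozenset("D"), frozenset("A")),
]
keys = candidate_keys("ABCD", example_fds)
```

Because sets are tried smallest first, any superkey that does not contain a previously found key is itself minimal, so no separate minimality test is needed.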
A.3 Normal Forms
Definition 6 (Prime Attribute) An attribute that is part of one or more keys.
Definition 7 (Non-prime Attribute) An attribute that is not prime, i.e. is not part of any key.
Definition 8 (First Normal Form (1NF)) A relation with no duplicate rows, and where all attributes are single valued.
Definition 9 (Second Normal Form (2NF)) A relation that is in First Normal Form, and where no non-prime attribute is functionally dependent on a proper subset of a key.
Definition 10 (Third Normal Form (3NF)) A relation that is in Second Normal Form, and where no non-prime attribute is functionally dependent on a non-prime attribute.
Definition 11 (Boyce-Codd Normal Form (BCNF)) A relation is in BCNF if it is in 3NF, and no attribute is functionally dependent on a subset of a key that it is not a member of.
Definition 12 (Highest Normal Form) Order the normal forms as 1NF, 2NF, 3NF, BCNF. If a relation satisfies the requirements of the $k$th normal form, but fails to satisfy the requirements of the $(k+1)$st, the Highest Normal Form (HNF) of the relation is the $k$th. For example, a relation with a transitive dependency but no partial dependencies has an HNF of 2NF.
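Determining the highest normal form can be sketched as a sequence of checks. The code below is an illustration under stated assumptions: the relation is taken to be in 1NF already, the 3NF and BCNF checks use the common textbook formulation (for every nontrivial dependency $X \to A$, either $X$ is a superkey or, for 3NF only, $A$ is prime) rather than the exact wording above, and the function names and example relations are made up.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All candidate keys, found by trying attribute sets in order of size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' for a relation assumed to be in 1NF."""
    attributes = frozenset(attributes)
    keys = candidate_keys(attributes, fds)
    prime = frozenset().union(*keys)
    non_prime = attributes - prime

    # 2NF: no non-prime attribute depends on a proper subset of a key.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"

    # Nontrivial dependencies X -> A, one right-hand attribute at a time.
    nontrivial = [(lhs, a) for lhs, rhs in fds for a in rhs - lhs]

    def is_superkey(x):
        return closure(x, fds) == attributes

    # 3NF: X is a superkey or A is prime.
    if any(not is_superkey(lhs) and a in non_prime for lhs, a in nontrivial):
        return "2NF"
    # BCNF: X is a superkey.
    if any(not is_superkey(lhs) for lhs, _ in nontrivial):
        return "3NF"
    return "BCNF"


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


transitive = highest_normal_form("ABC", [fd("A", "B"), fd("B", "C")])
overlapping_keys = highest_normal_form("ABC", [fd("AB", "C"), fd("C", "B")])
partial = highest_normal_form("ABCD", [fd("AB", "C"), fd("A", "D")])
```

The first example has a transitive dependency but no partial dependency, the second has two overlapping candidate keys, and the third has a non-prime attribute that depends on part of the key.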