Normalization Example
I. Unnormalized Table
A. A data table is unnormalized if:
1. We look at a logical record (for example, each student record below) and find repeating groups. In
this case, each student record has repeating groups of class information.
2. We look at each physical row and find non-unique rows. In this case, there are two rows with no values
for SSN, LName, Major, or Advisor but with identical values for the remaining attributes (the two
CrsCode 1212 rows with grade B+, under Smith and Wallace).
SSN        LNAME    Major  Advisor     CrsCode  Course#  Section  CrsName            Credits  Grade  Faculty   FacDept
888665555  Borg     CIS    Patel       1111     CIS3400  PR13     DBMS I             3        A      Burns     CIS
453453453  Park     ACCT   Adams       5432     ACC3300  TV24     Acct. Info Sys     4        B      Lewin     CIS
                                       2345     STA3154  MW6      Sampling & Audit   3        C      Figueroa  Stat
987987987  Jabbar   FIN    Al-Safir    4321     ECO4000  TR73     Econometrics       3        D      Han       ECO
                                       1111     CIS3400  PR13     DBMS I             3        A      Burns     CIS
666884444  Narayan  CIS    Patel       1212     CIS5800  PR24     Analysis & Design  3        A      Burns     CIS
123456789  Smith    ACCT   Adams       5432     ACC3300  TV24     Acct. Info Sys     4        B      Lewin     CIS
                                       1212     CIS5800  PR24     Analysis & Design  3        B+     Burns     CIS
987654321  Wallace  CIS    Brathwaite  3412     CIS3400  XZ13     DBMS I             3        D      Lewin     CIS
                                       1212     CIS5800  PR24     Analysis & Design  3        B+     Burns     CIS
333445555  Wong     ACCT   Zafiris     2345     STA3154  MW6      Sampling & Audit   3        B      Figueroa  Stat
                                       1212     CIS5800  PR24     Analysis & Design  3        A      Burns     CIS
                                       4321     ECO4000  TR73     Econometrics       3        B      Han       ECO
                                       3412     CIS3400  XZ13     DBMS I             3        C+     Lewin     CIS
999887777  Zelaya   ACCT   Zafiris     5432     ACC3300  TV24     Acct. Info Sys     4        B      Lewin     CIS
                                       2345     STA3154  MW6      Sampling & Audit   3        A      Figueroa  Stat
Note: FacDept refers to the faculty member's department, not the course's department. For example, a faculty member can be in CIS but teach an
Accounting course.
Credits refers to how many credits an individual course is worth, not the number of credits a student has accumulated.
II. First Normal Form (1NF)
A. A relation is in first normal form if it meets the definition of a relation:
1. Each column (attribute) value must be a single value only.
2. All values for a given column (attribute) must be of the same type.
3. Each column (attribute) name must be unique.
4. The order of columns is insignificant.
5. No two rows (tuples) in a relation can be identical.
6. The order of the rows (tuples) is insignificant.
B. If you have a key defined for the relation, then you can meet the unique row requirement.
C. We put the unnormalized table into 1NF by creating separate tuples for each repeating item.
SSN LNAME Major Advisor CrsCode Course# Section CrsName Credits Grd Faculty FacDept
888665555 Borg CIS Patel 1111 CIS3400 PR13 DBMS I 3 A Burns CIS
453453453 Park ACCT Adams 5432 ACC3300 TV24 Acct. Info Sys 4 B Lewin CIS
453453453 Park ACCT Adams 2345 STA3154 MW6 Sampling & Audit 3 C Figueroa Stat
987987987 Jabbar FIN Al-Safir 4321 ECO4000 TR73 Econometrics 3 D Han ECO
987987987 Jabbar FIN Al-Safir 1111 CIS3400 PR13 DBMS I 3 A Burns CIS
666884444 Narayan CIS Patel 1212 CIS5800 PR24 Analysis & Design 3 A Burns CIS
123456789 Smith ACCT Adams 5432 ACC3300 TV24 Acct. Info Sys 4 B Lewin CIS
123456789 Smith ACCT Adams 1212 CIS5800 PR24 Analysis & Design 3 B+ Burns CIS
987654321 Wallace CIS Brathwaite 3412 CIS3400 XZ13 DBMS I 3 D Lewin CIS
987654321 Wallace CIS Brathwaite 1212 CIS5800 PR24 Analysis & Design 3 B+ Burns CIS
333445555 Wong ACCT Zafiris 2345 STA3154 MW6 Sampling & Audit 3 B Figueroa Stat
333445555 Wong ACCT Zafiris 1212 CIS5800 PR24 Analysis & Design 3 A Burns CIS
333445555 Wong ACCT Zafiris 4321 ECO4000 TR73 Econometrics 3 B Han ECO
333445555 Wong ACCT Zafiris 3412 CIS3400 XZ13 DBMS I 3 C+ Lewin CIS
999887777 Zelaya ACCT Zafiris 5432 ACC3300 TV24 Acct. Info Sys 4 B Lewin CIS
999887777 Zelaya ACCT Zafiris 2345 STA3154 MW6 Sampling & Audit 3 A Figueroa Stat
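The step in II.C, creating a separate tuple for each repeating item, can be sketched in Python. This is a minimal illustration, not part of the original handout: the nested-dictionary layout, the `classes` field name, and the `to_1nf` helper are assumptions made for the example, and only two students and two class attributes are shown.

```python
# Hypothetical nested form of two student records from the unnormalized table.
# The "classes" list is the repeating group.
students = [
    {"SSN": "888665555", "LName": "Borg", "Major": "CIS", "Advisor": "Patel",
     "classes": [{"CrsCode": "1111", "Grade": "A"}]},
    {"SSN": "453453453", "LName": "Park", "Major": "ACCT", "Advisor": "Adams",
     "classes": [{"CrsCode": "5432", "Grade": "B"},
                 {"CrsCode": "2345", "Grade": "C"}]},
]


def to_1nf(records, group_key="classes"):
    """Emit one flat row per repeating item, copying the student attributes."""
    rows = []
    for rec in records:
        # Everything except the repeating group belongs to the student.
        parent = {k: v for k, v in rec.items() if k != group_key}
        for item in rec[group_key]:
            rows.append({**parent, **item})
    return rows


flat_rows = to_1nf(students)
```

Each resulting row holds single values only, and (SSN, CrsCode) identifies a row uniquely in this sample.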
III. Second Normal Form (2NF)
A. A relation is in second normal form (2NF) if:
1. It is already in First Normal Form
2. All of its non-key attributes are dependent on all of the key, i.e., there are no partial dependencies
B. Relations that have a single attribute for a key are automatically in 2NF!!!
1. If there is only one attribute in the key, then there can be no partial dependencies;
2. This is one reason why we often use artificial identifiers as keys
C. Steps to normalize to 2NF
1. create a new relation with the attributes (both determinants and dependents) from the partial dependency;
a) remove any duplicate tuples from this new relation;
2. modify the original relation by removing the dependent attributes from the partial dependency;
a) keep the determinant in the modified relation;
D. In the previous 1NF relation, we have the following dependencies on the determinants, SSN and
CrsCode:
- SSN, CrsCode -> Grade
- SSN -> Lname, Major, Advisor
- CrsCode -> Course#, CrsName, Credits, Section, Faculty, FacDept
1. If we take (SSN, CrsCode) as the key of this relation, some attributes clearly depend only on SSN or
only on CrsCode, so the previous relation is NOT in 2NF. We therefore break it up by making a relation
for each determinant above. In these relations, each non-key field is dependent on the entire key of
that relation (see the 2NF relations below).
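One way to see these dependencies in the data is to test them mechanically. The sketch below is an illustration only: the `holds` helper is a hypothetical name, and the rows are a small subset of the 1NF table with only some columns. A dependency that holds in sample data is merely consistent with the data; whether it is a real rule is a design decision.

```python
def holds(rows, determinant, dependents):
    """Return True if determinant -> dependents holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependents)
        # The same determinant value must always map to the same dependents.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows copied from the 1NF table (subset of columns).
rows = [
    {"SSN": "453453453", "LName": "Park", "CrsCode": "5432", "Credits": 4, "Grade": "B"},
    {"SSN": "453453453", "LName": "Park", "CrsCode": "2345", "Credits": 3, "Grade": "C"},
    {"SSN": "123456789", "LName": "Smith", "CrsCode": "5432", "Credits": 4, "Grade": "B"},
    {"SSN": "999887777", "LName": "Zelaya", "CrsCode": "2345", "Credits": 3, "Grade": "A"},
]

ssn_to_lname = holds(rows, ["SSN"], ["LName"])            # partial dependency on SSN
crs_to_credits = holds(rows, ["CrsCode"], ["Credits"])    # partial dependency on CrsCode
ssn_to_grade = holds(rows, ["SSN"], ["Grade"])            # SSN alone is not enough
crs_to_grade = holds(rows, ["CrsCode"], ["Grade"])        # CrsCode alone is not enough
key_to_grade = holds(rows, ["SSN", "CrsCode"], ["Grade"]) # the full key is needed
```

In this sample, LName and Credits each depend on only one part of the key, while Grade needs both parts.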
E. The algorithm to convert to 2NF:
- CREATE the new relation and give it an appropriate name
- COPY the determinant of the partial dependency into the new relation, and make it the primary key of
that relation;
- CUT the dependent fields of the partial dependency out of the old relation, and paste them into the new
relation;
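The CREATE / COPY / CUT steps amount to projecting the 1NF rows onto the attributes of each dependency and removing duplicate tuples. A minimal Python sketch follows, assuming a hypothetical `project` helper and a three-row subset of the 1NF table with only some columns:

```python
def project(rows, attributes):
    """Project rows onto the given attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        tup = tuple(row[a] for a in attributes)
        if tup not in seen:          # keep the first occurrence only
            seen.add(tup)
            result.append(dict(zip(attributes, tup)))
    return result


# Subset of the 1NF relation (illustrative, not the full table).
rows_1nf = [
    {"SSN": "453453453", "LName": "Park", "CrsCode": "5432",
     "CrsName": "Acct. Info Sys", "Grade": "B"},
    {"SSN": "453453453", "LName": "Park", "CrsCode": "2345",
     "CrsName": "Sampling & Audit", "Grade": "C"},
    {"SSN": "123456789", "LName": "Smith", "CrsCode": "5432",
     "CrsName": "Acct. Info Sys", "Grade": "B"},
]

# One new relation per partial dependency; the determinant becomes its key.
student = project(rows_1nf, ["SSN", "LName"])
section = project(rows_1nf, ["CrsCode", "CrsName"])
# The original relation keeps both determinants plus the fully dependent field.
student_section = project(rows_1nf, ["SSN", "CrsCode", "Grade"])
```

Park and course 5432 each appear once in their own relation, which is where the redundancy of the 1NF table goes away.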
STUDENT (2NF)
SSN        LNAME    Major  Advisor
888665555  Borg     CIS    Patel
453453453  Park     ACCT   Adams
987987987  Jabbar   FIN    Al-Safir
666884444  Narayan  CIS    Patel
123456789  Smith    ACCT   Adams
987654321  Wallace  CIS    Brathwaite
333445555  Wong     ACCT   Zafiris
999887777  Zelaya   ACCT   Zafiris
SSN -> Lname, Major, Advisor

STUDENT_SECTION (2NF)
SSN        CrsCode  Grd
888665555  1111     A
453453453  5432     B
453453453  2345     C
987987987  4321     D
987987987  1111     A
666884444  1212     A
123456789  5432     B
123456789  1212     B+
987654321  3412     D
987654321  1212     B+
333445555  2345     B
333445555  1212     A
333445555  4321     B
333445555  3412     C+
999887777  5432     B
999887777  2345     A
SSN, CrsCode -> Grd

SECTION (2NF)
CrsCode  Course#  CrsName            Credits  Section  Faculty   FacDept
1111     CIS3400  DBMS I             3        PR13     Burns     CIS
1212     CIS5800  Analysis & Design  3        PR24     Burns     CIS
2345     STA3154  Sampling & Audit   3        MW6      Figueroa  Stat
3412     CIS3400  DBMS I             3        XZ13     Lewin     CIS
4321     ECO4000  Econometrics       3        TR73     Han       ECO
5432     ACC3300  Acct. Info Sys     4        TV24     Lewin     CIS
CrsCode -> Course#, CrsName, Credits, Section, Faculty, FacDept
IV. Third Normal Form (3NF)
A. A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive
dependencies.
B. Consider relation R containing attributes A, B and C.
If A -> B and B -> C then A -> C
C. Transitive Dependency: Three attributes with the above dependencies.
D. Implications of Transitive Dependency for normalization:
1. In the example above, A will always be the primary key; B and C will be non-key attributes;
2. You need at least three attributes in the relation for a transitive dependency to exist;
3. If there is only one non-key attribute, there are no transitive dependencies;
Let's examine the SECTION relation
SECTION (2NF)
CrsCode Course# CrsName Credits Section Faculty FacDept
1111 CIS3400 DBMS I 3 PR13 Burns CIS
1212 CIS5800 Analysis & Design 3 PR24 Burns CIS
2345 STA3154 Sampling & Audit 3 MW6 Figueroa Stat
3412 CIS3400 DBMS I 3 XZ13 Lewin CIS
4321 ECO4000 Econometrics 3 TR73 Han ECO
5432 ACC3300 Acct. Info Sys 4 TV24 Lewin CIS
- CrsCode -> Course#, CrsName, Credits, Section, Faculty, FacDept
- Course# -> CrsName, Credits
- Faculty -> FacDept
There are two transitive dependencies (one through Course# and one through Faculty). These will create anomalies as we add
and delete data. We can remove the anomalies by removing the transitive dependencies. We do this by putting the non-key
dependencies into their own relations, as shown below:
E. The algorithm to convert to 3NF:
- CREATE the new relation and give it an appropriate name
- COPY the determinant of the transitive dependency into the new relation, and make it the primary key of
that relation;
o Remember to make the determinant a foreign key in the original relation;
- CUT the dependent fields of the transitive dependency out of the old relation, and paste them into the
new relation;
SECTION (3NF)
CrsCode  Course#  Section  Faculty
1111     CIS3400  PR13     Burns
1212     CIS5800  PR24     Burns
2345     STA3154  MW6      Figueroa
3412     CIS3400  XZ13     Lewin
4321     ECO4000  TR73     Han
5432     ACC3300  TV24     Lewin
CrsCode -> Course#, Section, Faculty

COURSE (3NF)
Course#  CrsName            Credits
CIS3400  DBMS I             3
ACC3300  Acct. Info Sys     4
STA3154  Sampling & Audit   3
ECO4000  Econometrics       3
CIS5800  Analysis & Design  3
Course# -> CrsName, Credits

FACULTY (3NF)
Faculty   FacDept
Burns     CIS
Lewin     CIS
Figueroa  Stat
Han       ECO
Faculty -> FacDept
• For the Course# -> CrsName, Credits transitive dependency, we create a new table called COURSE that holds
the three attributes. We remove CrsName and Credits from the original relation, keeping Course# as a foreign key.
• For the Faculty -> FacDept transitive dependency, we create a new relation called FACULTY. We remove
FacDept from the SECTION relation, but keep Faculty as a foreign key to the new relation.
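A quick way to check that nothing was lost in this decomposition is to follow the two foreign keys and rebuild the SECTION (2NF) rows. The sketch below assumes the three 3NF relations are held as plain Python structures; the `rebuild_section_2nf` helper and the dictionary layout are illustrative choices, not part of the handout.

```python
# SECTION (3NF): Course# and Faculty are foreign keys.
section = [
    {"CrsCode": "1111", "Course#": "CIS3400", "Section": "PR13", "Faculty": "Burns"},
    {"CrsCode": "1212", "Course#": "CIS5800", "Section": "PR24", "Faculty": "Burns"},
    {"CrsCode": "2345", "Course#": "STA3154", "Section": "MW6", "Faculty": "Figueroa"},
    {"CrsCode": "3412", "Course#": "CIS3400", "Section": "XZ13", "Faculty": "Lewin"},
    {"CrsCode": "4321", "Course#": "ECO4000", "Section": "TR73", "Faculty": "Han"},
    {"CrsCode": "5432", "Course#": "ACC3300", "Section": "TV24", "Faculty": "Lewin"},
]

# COURSE (3NF), keyed by Course#.
course = {
    "CIS3400": {"CrsName": "DBMS I", "Credits": 3},
    "ACC3300": {"CrsName": "Acct. Info Sys", "Credits": 4},
    "STA3154": {"CrsName": "Sampling & Audit", "Credits": 3},
    "ECO4000": {"CrsName": "Econometrics", "Credits": 3},
    "CIS5800": {"CrsName": "Analysis & Design", "Credits": 3},
}

# FACULTY (3NF), keyed by faculty name.
faculty = {"Burns": "CIS", "Lewin": "CIS", "Figueroa": "Stat", "Han": "ECO"}


def rebuild_section_2nf(section, course, faculty):
    """Join SECTION with COURSE and FACULTY through the foreign keys."""
    return [
        {**row, **course[row["Course#"]], "FacDept": faculty[row["Faculty"]]}
        for row in section
    ]


rebuilt = rebuild_section_2nf(section, course, faculty)
```

Note that "DBMS I" and its credits are now stored once in COURSE even though CIS3400 has two sections, so a change to the course name touches one row instead of two.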
V. Boyce-Codd Normal Form (BCNF)
A. A relation is in BCNF if
1. It is already in third normal form (3 NF)
2. Every determinant is a candidate key.
B. Consider relation R containing attributes A, B and C where
- A -> B and B -> C;
- A is the primary key of R, and contains at least two fields, say C and D;
- B is a non-key field in R but a determinant of C;
- C is a subset of the primary key A.
- This creates a looping dependency, because we will have:
  C, D -> B and B -> C
- Thus, relation R is in third normal form, because there are no transitive dependencies, but we still have a
determinant that is not a key (that is, B).
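The looping dependency above can be checked mechanically with an attribute closure. This is a sketch under assumptions: the `closure` and `bcnf_violations` helpers are hypothetical names, and the test used is "the determinant's closure covers every attribute" (that is, the determinant is a superkey), which agrees with the "every determinant is a candidate key" wording when determinants are minimal.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation_attrs, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    all_attrs = set(relation_attrs)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not all_attrs <= closure(lhs, fds)
    ]


# R(C, D, B) with the looping dependency: C, D -> B and B -> C.
fds = [(("C", "D"), ("B",)), (("B",), ("C",))]
violations = bcnf_violations(["C", "D", "B"], fds)
```

Here (C, D) determines every attribute, but B determines only B and C, so B -> C is the single violation reported.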
E. Example: We modify the STUDENT relation to allow students to have two majors, and thus two advisors.

STUDENT (variation)
SSN        LName    Major  Advisor     Appt_Dt
888665555  Borg     CIS    Patel       2/01/2005
453453453  Park     ACCT   Adams       2/02/2005
987987987  Jabbar   FIN    Al-Safir    2/01/2005
666884444  Narayan  CIS    Patel       3/30/2005
123456789  Smith    ACCT   Adams       4/29/2005
123456789  Smith    MGT    Fazio       4/30/2005
987654321  Wallace  CIS    Brathwaite  3/30/2005
333445555  Wong     ACCT   Zafiris     3/31/2005
999887777  Zelaya   ACCT   Zhou        4/11/2005
999887777  Zelaya   CIS    Patel       3/19/2005
SSN -> LName
SSN, Major -> Advisor, Appt_Dt
Advisor -> Major

Because LName is dependent on only part of the key, we make one relation for SSN -> LName to achieve 2NF.

STUDENT (BCNF)
SSN        LNAME
888665555  Borg
453453453  Park
987987987  Jabbar
666884444  Narayan
123456789  Smith
987654321  Wallace
333445555  Wong
999887777  Zelaya
SSN -> LName

STU_ADVISOR (3NF)
SSN        Major  Advisor     Appt_Dt
888665555  CIS    Patel       2/01/2005
453453453  ACCT   Adams       2/02/2005
987987987  FIN    Al-Safir    2/01/2005
666884444  CIS    Patel       3/30/2005
123456789  ACCT   Adams       4/29/2005
123456789  MGT    Fazio       4/30/2005
987654321  CIS    Brathwaite  3/30/2005
333445555  ACCT   Zafiris     3/31/2005
999887777  ACCT   Zhou        4/11/2005
999887777  CIS    Patel       3/19/2005
SSN, Major -> Advisor, Appt_Dt
Advisor -> Major
(Boyce-Codd Normal Form, continued)
In STU_ADVISOR, we have: SSN, Major -> Advisor, Appt_Dt and Advisor -> Major
where Advisor is a determinant that is not part of a key, and there is no transitive dependency.
STU_ADVISOR (3 NF)
SSN Major Advisor Appt_Dt
888665555 CIS Patel 2/01/2005
453453453 ACCT Adams 2/02/2005
987987987 FIN Al-Safir 2/01/2005
666884444 CIS Patel 3/30/2005
123456789 ACCT Adams 4/29/2005
123456789 MGT Fazio 4/30/2005
987654321 CIS Brathwaite 3/30/2005
333445555 ACCT Zafiris 3/31/2005
999887777 ACCT Zhou 4/11/2005
999887777 CIS Patel 3/19/2005
To achieve BCNF, we create relations where the determinants are keys. To do this, we perform a BCNF swap: promote the
non-key determinant to replace the field it determines. In this case, because Advisor -> Major, we replace the (SSN, Major)
composite key with (SSN, Advisor):
STU_ADVISOR (BCNF)
SSN        Advisor     Appt_Dt
888665555  Patel       2/01/2005
453453453  Adams       2/02/2005
987987987  Al-Safir    2/01/2005
666884444  Patel       3/30/2005
123456789  Adams       4/29/2005
123456789  Fazio       4/30/2005
987654321  Brathwaite  3/30/2005
333445555  Zafiris     3/31/2005
999887777  Zhou        4/11/2005
999887777  Patel       3/19/2005
SSN, Advisor -> Appt_Dt

ADVISOR (BCNF)
Advisor     Major
Adams       ACCT
Al-Safir    FIN
Brathwaite  CIS
Patel       CIS
Fazio       MGT
Zafiris     ACCT
Zhou        ACCT
Advisor -> Major

1) We remove Major from the original determinant because it is dependent on Advisor; 2) only Appt_Dt is dependent on the
composite key in STU_ADVISOR, for the M:N relationship.
VI. Fourth Normal Form (4NF)
A. A relation is in fourth normal form if it is in BCNF and it contains no
multivalued dependencies.
1. Multivalued Dependency: A type of dependency where the
determinant can determine more than one value.
B. More formally, there are 3 criteria:
1. There must be at least 3 attributes in the relation. Call them A,
B, and C, for example.
2. Given A, one can determine multiple values of B. Given A, one
can determine multiple values of C. We represent this by:
A -->> B
A -->> C
3. B and C are independent of one another.
C. Implications of this:
1. There are no regular functional dependencies
2. Key Integrity: all three attributes are needed to create a unique
key
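The independence criterion in B.3 can be tested directly: for each value of A, every B value must appear with every C value. The Python sketch below is an illustration only; the `mvd_holds` helper and the sample values a1, b1, c1 and so on are made up for the example.

```python
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check A -->> B (and so A -->> C) in a three-attribute relation."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        # Independence means the (B, C) pairs form the full cross product.
        if pairs != set(product(b_values, c_values)):
            return False
    return True


# Hypothetical relation R(A, B, C): a1 is paired with every b/c combination.
rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
]

independent = mvd_holds(rows, "A", "B", "C")
# Dropping one combination breaks the cross product for a1.
broken = mvd_holds(rows[:-1], "A", "B", "C")
```

The four rows needed to record two B values and two C values for a single A value show the redundancy that 4NF removes.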
Unnormalized STUDENT Relation
Assumptions
SSN, CrsCode, Activities --> (there are no dependents; this is a composite key)
- students can take > 1 class
- students can participate in > 1 activity
- CrsCodes and Activities are independent of each other.
- Note: There will be other attributes connected to the courses students take
and the activities in which they participate, but we leave them out to simplify
the analysis of this situation.
F. Problems:
1. Insertion Anomaly: We cannot enter a null value into the ACTIVITY
attribute, because it is part of the key. If we want to add a course for a
student who does not participate in an activity, we have to duplicate a
previous activity the student has taken.
2. Deletion Anomaly: If a student drops a course, we may lose information
about an activity. For example, if the student with SSN 987654321 drops
course 1212, we will lose the information that she was in the Student
Government.
G. Solution:
We have two themes (course grades and activities), so we create
two separate relations, one for each dependency:
SSN, CrsCode -->
SSN, Activity -->
(Remember that there may be dependent attributes in these
dependencies, but they are omitted for clarity.)
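The effect of the split on the deletion anomaly can be sketched in Python. The rows below are hypothetical: apart from student 987654321, course 1212, and Student Government, which the text mentions, the values (including "Chess Club") are invented for illustration, and `split_4nf` is an assumed helper name.

```python
# Hypothetical rows of the combined relation (SSN, CrsCode, Activity).
enrollment_activity = [
    {"SSN": "987654321", "CrsCode": "1212", "Activity": "Student Government"},
    {"SSN": "333445555", "CrsCode": "2345", "Activity": "Chess Club"},
    {"SSN": "333445555", "CrsCode": "1212", "Activity": "Chess Club"},
]


def split_4nf(rows):
    """Split into one relation per theme, dropping duplicate tuples."""
    student_course = sorted({(r["SSN"], r["CrsCode"]) for r in rows})
    student_activity = sorted({(r["SSN"], r["Activity"]) for r in rows})
    return student_course, student_activity


student_course, student_activity = split_4nf(enrollment_activity)

# The student drops course 1212: only the course relation changes.
student_course.remove(("987654321", "1212"))
```

After the drop, the activity relation still records that the student is in the Student Government, and a course can be added for a student with no activities because Activity is no longer part of that key.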