Normalization
Normalization
Definition
• R.x → R.y (x: determinant, y: dependent)
• x functionally determines y in a relation R
• or y is functionally dependent on x
• if each x-value in the relation is associated
with only one y-value at any one time
• x and y may be composite attributes
• the same y-value may be associated with more than one x-value
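The definition above can be checked against sample data. The following is a minimal sketch, assuming a relation instance is stored as a list of dicts; the helper name `fd_holds` and the sample rows are made up for illustration. A passing check only shows that the instance does not contradict the FD; whether the FD really holds is a statement about the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data, not taken from the slides
rows = [
    {"SSN": "111", "PNO": 1, "ENAME": "Kim", "HOURS": 10},
    {"SSN": "111", "PNO": 2, "ENAME": "Kim", "HOURS": 20},
    {"SSN": "222", "PNO": 1, "ENAME": "Lee", "HOURS": 10},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))         # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))         # False: SSN 111 has 10 and 20
print(fd_holds(rows, ["SSN", "PNO"], ["HOURS"]))  # True
```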
Functional Dependencies
Example 1
• COMPANY (SSN, PNO, HOURS, BONUS, ENAME,
PHONE, PNAME, {PLOCATIONS})
• SSN → ENAME, PHONE
• PNO → PNAME, {PLOCATIONS}
• HOURS → BONUS
• {SSN, PNO} → HOURS (not BONUS)
• {PLOCATIONS} is a multi-valued attribute here
• {SSN, PNO} is the primary key
Functional Dependencies
Example 2
• REGISTRATION (SSN, SNAME, SADDRESS, STATUS,
COURSEID, COURSE#, CNAME, CDESC, SECTION#,
YEAR, GRADE, INSTRUCTOR, {CLASSROOMS})
• Functional Dependencies:
SSN → SNAME, SADDRESS, STATUS
COURSEID → COURSE#, SECTION#, YEAR,
INSTRUCTOR, {CLASSROOMS}
{COURSE#, YEAR} → CNAME, CDESC
{SSN, COURSEID} → GRADE
Functional Dependencies
Example 3
• ORDER (ORDERID, DATE, CUSTID, CNAME, CTYPE,
DISCOUNT, ITEMNO, ITEMNAME, ITEMPRICE, QTY)
• Functional Dependencies:
ORDERID → DATE, CUSTID
CUSTID → CNAME, CTYPE
CTYPE → DISCOUNT
ITEMNO → ITEMNAME, ITEMPRICE
{ORDERID, ITEMNO} → QTY
Functional Dependencies
Full Functional Dependency
• a stricter definition of FD (vs. partial FD)
• y is fully functionally dependent on x if it is
functionally dependent on all of (composite
candidate key) x, not just on a subset
• R (SSN, PNO, HOURS, ENAME)
partial FD: SSN → ENAME
full FD: {SSN, PNO} → HOURS
Functional Dependencies
Transitive Functional Dependency
• y is transitively functionally dependent on x
(candidate key) if x functionally determines z
(neither a candidate key nor a subset of one) and z
functionally determines y
• x → y if x → z and z → y
• e.g., {SSN, PNO} → HOURS and HOURS →
BONUS, then {SSN, PNO} → BONUS
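Transitivity is what the standard attribute-closure computation applies repeatedly. The following is a minimal sketch, assuming FDs are given as (determinant, dependent) pairs of Python sets; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents follow
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"SSN", "PNO"}, {"HOURS"}),
    ({"HOURS"}, {"BONUS"}),
]

# BONUS is reached through HOURS, so {SSN, PNO} → BONUS holds transitively
print(sorted(closure({"SSN", "PNO"}, fds)))
# ['BONUS', 'HOURS', 'PNO', 'SSN']
```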
Normalization
Normalization
• a process of analyzing relations in order to
meet increasingly more stringent requirements
• a process of reducing unnecessary
redundancies
• a relation schema is said to be in a normal
form when it satisfies certain desirable
properties stated in terms of functional
dependencies and key constraints
Normalization
Leads to progressively better
groupings (higher normal forms)
1. identify functional dependencies of a relation
2. determine whether FDs meet a normal form
3. if a relation is not in a specific NF, split the
table to meet the normal form
4. repeat steps 2-3 for higher normal forms
Normalization
Review on Keys
• superkey: a set of attributes which will
uniquely identify each tuple in a relation
• candidate key: a minimal superkey
• primary key: a chosen candidate key
• secondary key: any of the remaining candidate keys
• prime attribute: an attribute that is a part of
a candidate key (key column)
• nonprime attribute: an attribute that is not part of
any candidate key (nonkey column)
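These definitions can be turned into a small brute-force search: a superkey is an attribute set whose closure is the whole relation, and a candidate key is a superkey with no smaller superkey inside it. The sketch below assumes the COMPANY FDs from Example 1 with the multi-valued {PLOCATIONS} left out for simplicity; the function names are illustrative, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Try attribute sets from smallest to largest and keep the minimal superkeys."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(all_attrs):
                keys.append(attrs)
    return keys

# COMPANY without {PLOCATIONS} (an assumption made to keep attributes atomic)
R = {"SSN", "PNO", "HOURS", "BONUS", "ENAME", "PHONE", "PNAME"}
fds = [
    ({"SSN"}, {"ENAME", "PHONE"}),
    ({"PNO"}, {"PNAME"}),
    ({"HOURS"}, {"BONUS"}),
    ({"SSN", "PNO"}, {"HOURS"}),
]

keys = candidate_keys(R, fds)
prime = set().union(*keys)   # attributes that appear in some candidate key
nonprime = R - prime
print(keys, sorted(prime), sorted(nonprime))
```

Here the only candidate key found is {SSN, PNO}, so SSN and PNO are the prime attributes and the other five attributes are nonprime.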
Normalization
Functional Dependency Type by Keys
• ‘whole (candidate) key → nonprime attribute’:
full FD (no violation)
• ‘partial key → nonprime attribute’: partial FD
(violation of 2NF)
• ‘nonprime attribute → nonprime attribute’:
transitive FD (violation of 3NF)
• ‘not a whole key → prime attribute’: violation
of BCNF
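The four cases above can be expressed as a small classifier. This is a sketch under stated assumptions: the candidate keys are already known, each FD is non-trivial and has a single dependent attribute, and any determinant that is neither a superkey nor part of a key is treated as the transitive case. The function name `classify_fd` and the STUDENT/COURSE/INSTRUCTOR relation are hypothetical.

```python
def classify_fd(lhs, rhs_attr, candidate_keys):
    """Classify a non-trivial FD lhs → rhs_attr against the candidate keys."""
    lhs = set(lhs)
    prime = set().union(*candidate_keys)
    if any(key <= lhs for key in candidate_keys):
        return "determinant contains a whole key (no violation)"
    if rhs_attr in prime:
        return "violation of BCNF"
    if any(lhs < key for key in candidate_keys):
        return "partial FD (violation of 2NF)"
    return "transitive FD (violation of 3NF)"

company_keys = [{"SSN", "PNO"}]
print(classify_fd({"SSN", "PNO"}, "HOURS", company_keys))
print(classify_fd({"SSN"}, "ENAME", company_keys))
print(classify_fd({"HOURS"}, "BONUS", company_keys))

# Hypothetical relation with two overlapping candidate keys
teach_keys = [{"STUDENT", "COURSE"}, {"STUDENT", "INSTRUCTOR"}]
print(classify_fd({"INSTRUCTOR"}, "COURSE", teach_keys))
```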
Normalization
Good Decomposition
• dependency preserving decomposition
- it is undesirable to lose functional
dependencies during decomposition
• lossless join decomposition
- join of decomposed relations should be able
to create the original relation (no spurious
tuples)
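To see what spurious tuples look like, a decomposition can be tried on sample data by projecting and then re-joining. The following is a minimal sketch with made-up rows and illustrative helper names; it only tests one instance, whereas the lossless-join property of a decomposition is a schema-level guarantee.

```python
def project(rows, attrs):
    """Projection with duplicates removed; each tuple is a frozenset of (attr, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Combine tuples that agree on every attribute they share."""
    joined = set()
    for l in left:
        for r in right:
            dl, dr = dict(l), dict(r)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined

def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original tuples."""
    original = project(rows, sorted(set(attrs1) | set(attrs2)))
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical sample data for R (SSN, PNO, HOURS, ENAME)
rows = [
    {"SSN": "111", "PNO": 1, "HOURS": 10, "ENAME": "Kim"},
    {"SSN": "111", "PNO": 2, "HOURS": 20, "ENAME": "Kim"},
    {"SSN": "222", "PNO": 1, "HOURS": 20, "ENAME": "Lee"},
]

# Split on SSN, which determines ENAME: the join restores the original
print(rejoins_exactly(rows, ["SSN", "ENAME"], ["SSN", "PNO", "HOURS"]))   # True

# Split on HOURS, which determines neither side: the join adds spurious tuples
print(rejoins_exactly(rows, ["SSN", "HOURS"], ["PNO", "HOURS", "ENAME"]))  # False
```

In the second split, the join on HOURS = 20 pairs SSN 111 with Lee's row and SSN 222 with Kim's row, producing two tuples that were never in the original relation.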
Normalization
Example 1: Company relation
• COMPANY (SSN, PNO, HOURS, BONUS, ENAME,
PHONE, PNAME, {PLOCATIONS})
• SSN → ENAME, PHONE
PNO → PNAME, {PLOCATIONS}
HOURS → BONUS
{SSN, PNO} → HOURS
• what if (SSN, PNO, PLOCATION)
Normal Forms
1NF (First Normal Form)
• a relation R is in 1NF if and only if it has only
single-valued attributes (atomic values)
• COMPANY (SSN, PNO, HOURS, BONUS,
ENAME, PHONE, PNAME, {PLOCATIONS})
• COMPANY relation is not in 1NF
- PLOCATIONS is a multi-valued attribute
Normal Forms
1NF (First Normal Form)
• solution: decompose the relation by creating a
new relation for the multi-valued attribute
• how to decompose a relation not in 1NF
- take out multi-valued attribute along with (a
copy of) its determinant and create a new
relation
- original relation should keep the determinant
- key of new relation: combination of multi-
valued attribute and its determinant
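The decomposition steps above can be sketched on data as follows. This assumes rows are dicts and the multi-valued attribute is stored as a list; the function name `split_multivalued` and the sample project rows are hypothetical, and a project-only relation is used instead of the full COMPANY relation to keep the example short.

```python
def split_multivalued(rows, determinant, mv_attr):
    """Move a multi-valued attribute into a new relation keyed by (determinant, value)."""
    base = []
    new_relation = set()
    for row in rows:
        # The original relation keeps the determinant and drops the multi-valued attribute
        base.append({a: v for a, v in row.items() if a != mv_attr})
        # The new relation gets one atomic (determinant, value) tuple per value
        for value in row[mv_attr]:
            new_relation.add((row[determinant], value))
    return base, sorted(new_relation)

# Hypothetical sample data
projects = [
    {"PNO": 1, "PNAME": "Alpha", "PLOCATIONS": ["Austin", "Dallas"]},
    {"PNO": 2, "PNAME": "Beta", "PLOCATIONS": ["Austin"]},
]

base, project_locations = split_multivalued(projects, "PNO", "PLOCATIONS")
print(base)
print(project_locations)
```

Each tuple in `project_locations` holds a single location, and the pair (PNO, PLOCATION) serves as the key of the new relation.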
Normal Forms
1NF (First Normal Form)
• violation: PNO → {PLOCATIONS} (multi-valued)
• before decomposition
COMPANY (SSN, PNO, HOURS, BONUS, ENAME,
PHONE, PNAME, {PLOCATIONS})
Normalized result:
Credit (CourseNo, CDept, Credit)
Course (CourseNo, SecNo, Semester, Year, Instructor, ClassTime, RoomNo, NoOfStudent)
Max_class_size (CourseNo, SecNo, ClassSizeMax)