Normalization

The document defines functional dependencies and provides examples. It then discusses various normal forms, including 1NF, 2NF, 3NF, and BCNF. The goal of normalization is to reduce data redundancy and inconsistencies. Higher normal forms have stricter rules, and eliminating violations involves decomposing relations into smaller relations until all dependencies meet the normal form.

Uploaded by

ebayoffer150
Copyright
© Attribution Non-Commercial (BY-NC)

Functional Dependencies

➢ Definition
• R.x → R.y (x: determinant, y: dependent)
• x functionally determines y in a relation R
• or y is functionally dependent on x
• if each x-value in the relation is associated with only one y-value at any one time
• x and y may be composite attributes
• y may be associated with more than one x
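The definition above can be tested against sample data. The following is a minimal sketch, assuming a relation is held as a list of dicts; the function name `fd_holds` and the sample rows are made up for illustration and are not part of the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts).

    Each lhs value may be associated with only one rhs value.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (not from the slides)
rows = [
    {"SSN": "111", "PNO": 1, "HOURS": 10, "ENAME": "Ann"},
    {"SSN": "111", "PNO": 2, "HOURS": 20, "ENAME": "Ann"},
    {"SSN": "222", "PNO": 1, "HOURS": 10, "ENAME": "Bob"},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))         # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))         # False: SSN 111 has two HOURS values
print(fd_holds(rows, ["SSN", "PNO"], ["HOURS"]))  # True
```

Note that a check like this can only refute an FD on the data at hand; whether an FD really holds is a statement about the meaning of the attributes, not about one sample.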
Functional Dependencies
➢ Example 1
• COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
• SSN → ENAME, PHONE
• PNO → PNAME, {PLOCATIONS}
• HOURS → BONUS
• {SSN, PNO} → HOURS (not BONUS)
• {PLOCATIONS} is a multi-valued attribute here
• {SSN, PNO} is the primary key
Functional Dependencies
➢ Example 2
• REGISTRATION (SSN, SNAME, SADDRESS, STATUS, COURSEID, COURSE#, CNAME, CDESC, SECTION#, YEAR, GRADE, INSTRUCTOR, {CLASSROOMS})
• Functional Dependencies:
SSN → SNAME, SADDRESS, STATUS
COURSEID → COURSE#, SECTION#, YEAR, INSTRUCTOR, {CLASSROOMS}
{COURSE#, YEAR} → CNAME, CDESC
{SSN, COURSEID} → GRADE
Functional Dependencies
➢ Example 3
• ORDER (ORDERID, DATE, CUSTID, CNAME, CTYPE, DISCOUNT, ITEMNO, ITEMNAME, ITEMPRICE, QTY)
• Functional Dependencies:
ORDERID → DATE, CUSTID
CUSTID → CNAME, CTYPE
CTYPE → DISCOUNT
ITEMNO → ITEMNAME, ITEMPRICE
{ORDERID, ITEMNO} → QTY
Functional Dependencies
➢ Full Functional Dependency
• stricter definition of FD (vs. partial FD)
• y is fully functionally dependent on x if it is functionally dependent on all of (composite candidate key) x, not just on a proper subset
• R (SSN, PNO, HOURS, ENAME)
partial FD: SSN → ENAME
full FD: {SSN, PNO} → HOURS
Functional Dependencies
➢ Transitive Functional Dependency
• y is transitively functionally dependent on x (candidate key) if x functionally determines z (not a candidate key or a subset of one) and z functionally determines y
• x → y if x → z and z → y
• e.g.) if {SSN, PNO} → HOURS and HOURS → BONUS, then {SSN, PNO} → BONUS
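The transitive rule can be applied mechanically by computing an attribute closure: start from a set of attributes and keep adding everything the FDs let you derive. This is a minimal sketch, assuming FDs are encoded as (left side, right side) tuples; the name `closure` and that encoding are illustrative choices, and {PLOCATIONS} is left out because it is multi-valued.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the COMPANY example, without the multi-valued {PLOCATIONS}
FDS = [
    (("SSN",), ("ENAME", "PHONE")),
    (("PNO",), ("PNAME",)),
    (("HOURS",), ("BONUS",)),
    (("SSN", "PNO"), ("HOURS",)),
]

# BONUS is reached through HOURS, i.e. transitively
print(sorted(closure({"SSN", "PNO"}, FDS)))
# ['BONUS', 'ENAME', 'HOURS', 'PHONE', 'PNAME', 'PNO', 'SSN']
print(sorted(closure({"SSN"}, FDS)))
# ['ENAME', 'PHONE', 'SSN']
```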
Normalization
➢ Normalization
• a process of analyzing relations in order to
meet increasingly more stringent requirements
• a process of reducing unnecessary
redundancies
• a relation schema is said to be in a normal
form when it satisfies certain desirable
properties including functional dependencies
and key constraints
Normalization
➢ Will lead to progressively better
groupings, or higher normal forms
1. identify functional dependencies of a relation
2. determine whether FDs meet a normal form
3. if a relation is not in a specific NF, split the
table to meet the normal form
4. repeat steps 2-3 for higher normal forms
Normalization
➢ Review on Keys
• superkey: a set of attributes that uniquely identifies each tuple in a relation
• candidate key: a minimal superkey
• primary key: a chosen candidate key
• secondary key: any candidate key not chosen as the primary key
• prime attribute: an attribute that is part of a candidate key (key column)
• nonprime attribute: an attribute that is not part of any candidate key (nonkey column)
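The key definitions above can be turned into a brute-force search: a set of attributes is a superkey if its closure covers the whole relation, and a candidate key if no smaller key is contained in it. This is a minimal sketch for small schemas only (it tries every subset); the helper names are illustrative, and the FDs are those of the COMPANY example without the multi-valued {PLOCATIONS}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal superkeys, trying smaller subsets first."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attributes):
                keys.append(subset)
    return keys


ATTRS = {"SSN", "PNO", "HOURS", "BONUS", "ENAME", "PHONE", "PNAME"}
FDS = [
    (("SSN",), ("ENAME", "PHONE")),
    (("PNO",), ("PNAME",)),
    (("HOURS",), ("BONUS",)),
    (("SSN", "PNO"), ("HOURS",)),
]

keys = candidate_keys(ATTRS, FDS)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['PNO', 'SSN']]
print(sorted(ATTRS - prime))      # nonprime: ['BONUS', 'ENAME', 'HOURS', 'PHONE', 'PNAME']
```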
Normalization
➢ Functional Dependency Type by Keys
• ‘whole (candidate) key → nonprime attribute’: full FD (no violation)
• ‘partial key → nonprime attribute’: partial FD (violation of 2NF)
• ‘nonprime attribute → nonprime attribute’: transitive FD (violation of 3NF)
• ‘not a whole key → prime attribute’: violation of BCNF
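The four cases above can be written as a small decision function. This is a rough sketch that mirrors the slide's simplified rules rather than the full textbook definitions; the name `classify` is hypothetical, and the candidate keys are assumed to be known already and are passed in.

```python
def classify(lhs, rhs_attr, candidate_keys):
    """Classify the FD lhs -> rhs_attr using the four cases on the slide."""
    lhs = set(lhs)
    prime = set().union(*candidate_keys)
    if any(key <= lhs for key in candidate_keys):
        return "full FD (no violation)"        # determinant contains a whole key
    if rhs_attr in prime:
        return "violation of BCNF"             # not a whole key -> prime attribute
    if any(lhs < key for key in candidate_keys):
        return "partial FD (violation of 2NF)" # proper part of a key -> nonprime
    return "transitive FD (violation of 3NF)"  # remaining case: nonprime -> nonprime


company_keys = [{"SSN", "PNO"}]
print(classify({"SSN", "PNO"}, "HOURS", company_keys))  # full FD (no violation)
print(classify({"SSN"}, "ENAME", company_keys))         # partial FD (violation of 2NF)
print(classify({"HOURS"}, "BONUS", company_keys))       # transitive FD (violation of 3NF)

# SHIPPING (SNO, SNAME, PNO, COST) has two overlapping candidate keys
shipping_keys = [{"SNO", "PNO"}, {"SNAME", "PNO"}]
print(classify({"SNO"}, "SNAME", shipping_keys))        # violation of BCNF
```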
Normalization
➢ Good Decomposition
• dependency preserving decomposition
- it is undesirable to lose functional
dependencies during decomposition
• lossless join decomposition
- join of decomposed relations should be able
to create the original relation (no spurious
tuples)
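For a split into exactly two relations, the lossless-join property can be checked with a standard test: the attributes shared by the two relations must functionally determine all of one of them. This is a minimal sketch under that assumption (it does not cover splits into three or more relations); the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if joining r1 and r2 cannot create spurious tuples."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    # The shared attributes must determine all of r1 or all of r2
    return set(r1) <= reach or set(r2) <= reach


# FDs over (SSN, PNO, HOURS, BONUS) from the COMPANY example
FDS = [
    (("SSN", "PNO"), ("HOURS",)),
    (("HOURS",), ("BONUS",)),
]

# Shared attribute HOURS determines (HOURS, BONUS): lossless
print(lossless_binary({"SSN", "PNO", "HOURS"}, {"HOURS", "BONUS"}, FDS))  # True
# Shared attribute PNO determines neither side: join may add spurious tuples
print(lossless_binary({"SSN", "PNO", "HOURS"}, {"PNO", "BONUS"}, FDS))    # False
```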
Normalization
➢ Example 1: Company relation
• COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
• SSN → ENAME, PHONE
PNO → PNAME, {PLOCATIONS}
HOURS → BONUS
{SSN, PNO} → HOURS
• what if (SSN, PNO, PLOCATION)?
Normal Forms
➢ 1NF (First Normal Form)
• a relation R is in 1NF if and only if it has only
single-valued attributes (atomic values)
• COMPANY (SSN, PNO, HOURS, BONUS,
ENAME, PHONE, PNAME, {PLOCATIONS})
• COMPANY relation is not in 1NF
- PLOCATIONS is a multi-valued attribute
Normal Forms
➢ 1NF (First Normal Form)
• solution: decompose the relation by creating a new relation for the multi-valued attribute
• how to decompose a relation not in 1NF
- take out the multi-valued attribute along with (a copy of) its determinant and create a new relation
- original relation should keep the determinant
- key of new relation: combination of multi-valued attribute and its determinant
Normal Forms
➢ 1NF (First Normal Form)
• violation: PNO → {PLOCATIONS} (multi-valued)
• before decomposition:
COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
• after decomposition:
COMPANY2 (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME)
PLOCATIONS (PNO, PLOCATION)
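The 1NF step can be sketched on data: the multi-valued column is removed from the original rows and moved, one value per row, into a new relation together with a copy of its determinant. The function name `split_multivalued` and the sample rows are made up for illustration; only a few COMPANY columns are shown to keep the rows short.

```python
def split_multivalued(rows, determinant, multi, single_name):
    """Move a multi-valued column into its own relation (1NF step)."""
    main = []
    pairs = set()
    for row in rows:
        # The original relation keeps everything except the multi-valued column
        main.append({k: v for k, v in row.items() if k != multi})
        # The new relation gets one (determinant, value) pair per value
        for value in row[multi]:
            pairs.add((row[determinant], value))
    extra = [{determinant: d, single_name: v} for d, v in sorted(pairs)]
    return main, extra


# Hypothetical sample rows (not from the slides)
company = [
    {"SSN": "111", "PNO": 1, "HOURS": 10, "PLOCATIONS": ["Austin", "Boston"]},
    {"SSN": "222", "PNO": 1, "HOURS": 15, "PLOCATIONS": ["Austin", "Boston"]},
    {"SSN": "111", "PNO": 2, "HOURS": 20, "PLOCATIONS": ["Chicago"]},
]

company2, plocations = split_multivalued(company, "PNO", "PLOCATIONS", "PLOCATION")
print(company2[0])  # {'SSN': '111', 'PNO': 1, 'HOURS': 10}
print(plocations)
# [{'PNO': 1, 'PLOCATION': 'Austin'}, {'PNO': 1, 'PLOCATION': 'Boston'},
#  {'PNO': 2, 'PLOCATION': 'Chicago'}]
```

The key of the new relation is the pair (PNO, PLOCATION), which is why duplicates of the same pair are collapsed.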
Normal Forms
➢ 2NF (Second Normal Form)
• A relation R is in 2NF if and only if it is in 1NF and every nonprime attribute depends on a whole key, not on a proper subset of a key
• All nonprime attributes of R must be fully functionally dependent on the whole key of the relation, not a part of the key
• No violation is possible if every key is a single attribute or if there is no nonprime attribute
Normal Forms
➢ 2NF (Second Normal Form)
• Violation: partial key → nonprime attribute
COMPANY2 (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME)
SSN → ENAME, PHONE
PNO → PNAME
• Decomposition: move the nonprime attribute(s), a copy of their determinant, and any further dependents into a new relation
• Key of new relation: determinant attribute
Normal Forms
➢ 2NF (Second Normal Form)
• violation: SSN → ENAME, PHONE; PNO → PNAME
• before decomposition:
COMPANY2 (SSN, PNO, HOURS, BONUS, ENAME,
PHONE, PNAME)
• after decomposition:
COMPANY3 (SSN, PNO, HOURS, BONUS)
EMPLOYEE (SSN, ENAME, PHONE)
PROJECT (PNO, PNAME)
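On data, this decomposition is a set of projections with duplicates removed. The sketch below is illustrative only: the helper `project` and the sample rows are made up, and `project_rel` is named that way simply to avoid clashing with the helper.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


# Hypothetical sample rows for COMPANY2 (not from the slides)
company2 = [
    {"SSN": "111", "PNO": 1, "HOURS": 10, "BONUS": 100,
     "ENAME": "Ann", "PHONE": "555-0101", "PNAME": "Alpha"},
    {"SSN": "111", "PNO": 2, "HOURS": 20, "BONUS": 200,
     "ENAME": "Ann", "PHONE": "555-0101", "PNAME": "Beta"},
    {"SSN": "222", "PNO": 1, "HOURS": 10, "BONUS": 100,
     "ENAME": "Bob", "PHONE": "555-0102", "PNAME": "Alpha"},
]

company3 = project(company2, ["SSN", "PNO", "HOURS", "BONUS"])
employee = project(company2, ["SSN", "ENAME", "PHONE"])
project_rel = project(company2, ["PNO", "PNAME"])

# Ann's name and phone were stored twice in COMPANY2 and once in EMPLOYEE
print(len(company3), len(employee), len(project_rel))  # 3 2 2
```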
1NF and 2NF
Original:
COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
1NF:
COMPANY2 (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME)
PLOCATIONS (PNO, PLOCATION)
2NF:
COMPANY3 (SSN, PNO, HOURS, BONUS)
EMPLOYEE (SSN, ENAME, PHONE)
PROJECT (PNO, PNAME)
Normal Forms
➢ 3NF (Third Normal Form)
• A relation R is in 3NF if and only if it is in 2NF and no nonprime attribute depends on another nonprime attribute
• All nonprime attributes of R are fully dependent on every key of R.
• All nonprime attributes of R must be non-transitively functionally dependent on a key of the relation
• Violation: nonprime attribute → nonprime attribute
• Decomposition: create a new relation
Normal Forms
➢ 3NF (Third Normal Form)
• violation: HOURS → BONUS
• primary key of new relation: determinant (HOURS)
• before decomposition:
COMPANY3 (SSN, PNO, HOURS, BONUS)
• after decomposition (determinant copy):
WORKS (SSN, PNO, HOURS)
BONUS (HOURS, BONUS)
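A quick way to see that this split loses nothing is to project the original rows onto the two new relations and join them back on the copied determinant. This is a minimal sketch with made-up rows; `project` and `natural_join` are illustrative helpers, and `bonus_rel` stands for the BONUS relation.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Combine rows that agree on every attribute the two sides share."""
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


# Hypothetical sample rows for COMPANY3 (not from the slides)
company3 = [
    {"SSN": "111", "PNO": 1, "HOURS": 10, "BONUS": 100},
    {"SSN": "111", "PNO": 2, "HOURS": 20, "BONUS": 200},
    {"SSN": "222", "PNO": 1, "HOURS": 10, "BONUS": 100},
]

works = project(company3, ["SSN", "PNO", "HOURS"])
bonus_rel = project(company3, ["HOURS", "BONUS"])
joined = natural_join(works, bonus_rel)

print(len(bonus_rel))                                # 2: each HOURS value is stored once
print(len(joined) == len(company3)
      and all(row in joined for row in company3))    # True: no lost or spurious tuples
```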
Normal Forms
➢ 3NF (Third Normal Form)
• COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
➢ decomposition of COMPANY for 3NF (final)
EMPLOYEE (SSN, ENAME, PHONE)
PROJECT (PNO, PNAME)
PLOCATIONS (PNO, PLOCATION)
WORKS (SSN, PNO, HOURS)
BONUS (HOURS, BONUS)
1NF, 2NF, 3NF
Original:
COMPANY (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME, {PLOCATIONS})
1NF:
COMPANY2 (SSN, PNO, HOURS, BONUS, ENAME, PHONE, PNAME)
PLOCATIONS (PNO, PLOCATION)
2NF:
COMPANY3 (SSN, PNO, HOURS, BONUS)
EMPLOYEE (SSN, ENAME, PHONE)
PROJECT (PNO, PNAME)
3NF:
EMPLOYEE (SSN, ENAME, PHONE)
PROJECT (PNO, PNAME)
PLOCATIONS (PNO, PLOCATION)
WORKS (SSN, PNO, HOURS)
BONUS (HOURS, BONUS)
Normal Forms
➢ 3NF (Third Normal Form)
• SUPPLY (SNAME, STREET, CITY, STATE, TAXRATE)
SNAME → STREET, CITY, STATE
STATE → TAXRATE (violates 3NF)
• solution: decompose the relation
SUPPLIER (SNAME, STREET, CITY, STATE)
TAXINFO (STATE, TAXRATE)
Normal Forms
➢ BCNF (Boyce-Codd Normal Form)
• a relation R is in BCNF if and only if every determinant (R.x) is a candidate key
• stricter version of 3NF
• a relation in BCNF is also in 3NF
• a relation in BCNF will not produce any update anomalies caused by functional dependencies
Normal Forms
➢ BCNF (Boyce-Codd Normal Form)
• violation: not a (whole) key → prime attribute
• SHIPPING (SNO, SNAME, PNO, COST): in 3NF
candidate keys: {SNO, PNO}, {SNAME, PNO}
{SNO, PNO} → COST (key → nonprime)
{SNAME, PNO} → COST (key → nonprime)
SNO → SNAME (partial key → prime)
SNAME → SNO (partial key → prime)
Normal Forms
➢ BCNF (Boyce-Codd Normal Form)
• solution a (SNO as the primary key)
SHIPPER (SNO, SNAME)
SHIPPINGCOST (SNO, PNO, COST)
• or solution b (SNAME as the primary key)
SHIPPER (SNAME, SNO)
SHIPPINGCOST (SNAME, PNO, COST)
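The BCNF condition can be checked with attribute closures: an FD is a violation when its determinant does not determine the whole relation. This is a minimal sketch, assuming each relation is checked only against the FDs that apply to it; the function name `bcnf_violations` and the per-relation FD lists for solution a are written out by hand as an illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not attributes <= closure(lhs, fds)
    ]


shipping_fds = [
    (("SNO", "PNO"), ("COST",)),
    (("SNAME", "PNO"), ("COST",)),
    (("SNO",), ("SNAME",)),
    (("SNAME",), ("SNO",)),
]
print(bcnf_violations({"SNO", "SNAME", "PNO", "COST"}, shipping_fds))
# [(('SNO',), ('SNAME',)), (('SNAME',), ('SNO',))]

# Solution a: both relations pass
shipper_fds = [(("SNO",), ("SNAME",)), (("SNAME",), ("SNO",))]
cost_fds = [(("SNO", "PNO"), ("COST",))]
print(bcnf_violations({"SNO", "SNAME"}, shipper_fds))    # []
print(bcnf_violations({"SNO", "PNO", "COST"}, cost_fds)) # []
```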
An example in 3NF but not BCNF
Normal Forms
➢ Summary
• 1NF: remove multi-valued attributes
• 2NF: remove partial dependencies
• 3NF: remove transitive dependencies
• BCNF: remove remaining anomalies from FDs
• 4NF: remove multi-valued dependencies
• 5NF: project-join NF
• DKNF: absolute NF (ideal, but not practical)
Normal Forms
➢ Exercise 1
➢ COURSE = {CourseNo, SecNo, CDept, Credit, Instructor, Semester, Year, ClassTime, RoomNo, ClassSizeMax, NoOfStudents}
➢ Functional dependency
[Diagram: attribute row CourseNo, SecNo, CDept, Credit, Instructor, Semester, Year, ClassTime, RoomNo, ClassSizeMax, NoOfStudents]
Example FD: CourseNo → CDept, Credit
Exercise Answer
➢ COURSE = {CourseNo, SecNo, CDept, Credit, Instructor, Semester, Year, ClassTime, RoomNo, ClassSizeMax, NoOfStudents}

Primary Key: (CourseNo, SecNo, Semester, Year)
Secondary Key: (ClassTime, RoomNo, Semester, Year)

FD1: CourseNo → CDept, Credit (violates 2NF)
FD2: {CourseNo, SecNo} → ClassSizeMax (violates 2NF)
FD3: {CourseNo, SecNo, Semester, Year} → Instructor, ClassTime, RoomNo, NoOfStudents
FD4: {ClassTime, RoomNo, Semester, Year} → Instructor, CourseNo, SecNo, NoOfStudents

Normalized result:
Credit (CourseNo, CDept, Credit)
Course (CourseNo, SecNo, Semester, Year, Instructor, ClassTime, RoomNo, NoOfStudents)
Max_class_size (CourseNo, SecNo, ClassSizeMax)
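The stated keys can be double-checked by computing closures over FD1 to FD4: both the primary and the secondary key should reach every attribute of COURSE, while the partial determinants of FD1 and FD2 should not. This is a minimal sketch; the `closure` helper and the tuple encoding of the FDs are illustrative choices, and the attribute is spelled NoOfStudents throughout.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


COURSE = {"CourseNo", "SecNo", "CDept", "Credit", "Instructor", "Semester",
          "Year", "ClassTime", "RoomNo", "ClassSizeMax", "NoOfStudents"}

FDS = [
    (("CourseNo",), ("CDept", "Credit")),                          # FD1
    (("CourseNo", "SecNo"), ("ClassSizeMax",)),                    # FD2
    (("CourseNo", "SecNo", "Semester", "Year"),
     ("Instructor", "ClassTime", "RoomNo", "NoOfStudents")),       # FD3
    (("ClassTime", "RoomNo", "Semester", "Year"),
     ("Instructor", "CourseNo", "SecNo", "NoOfStudents")),         # FD4
]

primary = {"CourseNo", "SecNo", "Semester", "Year"}
secondary = {"ClassTime", "RoomNo", "Semester", "Year"}

print(closure(primary, FDS) == COURSE)    # True
print(closure(secondary, FDS) == COURSE)  # True
# A part of the primary key determines only some attributes (the 2NF violations)
print(sorted(closure({"CourseNo", "SecNo"}, FDS)))
# ['CDept', 'ClassSizeMax', 'CourseNo', 'Credit', 'SecNo']
```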
