Lec 04

The document discusses normal forms in database design. It defines functional dependency as a constraint between attribute sets where values are determined by other attributes. The document covers first normal form (1NF), which disallows multi-valued and composite attributes, and second normal form (2NF) where non-key attributes must be fully functionally dependent on the primary key. Decomposing relations can solve issues like insertion anomalies but may still leave problems, requiring decomposition to third normal form.

Uploaded by

dark lord
Copyright
© © All Rights Reserved
CS3402 : Chapter 4

Normal Forms

CS3402 1
Functional Dependency
• Functional dependency (FD) is a constraint between two sets of attributes from the database

• Formal definition:
  - Let R be a relation schema, and α ⊆ R, β ⊆ R (i.e., α and β are sets of R’s attributes). We say that α functionally determines β, written α → β, if:
  - In any relation instance r(R), for all pairs of tuples t1 and t2 in r, we have:
      (t1[α] = t2[α]) ⇒ (t1[β] = t2[β])
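The definition can be checked directly against a concrete relation instance. The following is a minimal sketch, assuming tuples are stored as Python dicts; the function name `fd_holds` and the sample rows (modelled on the FIRST relation from the 1NF slides, with made-up P# and Qty values) are illustrative assumptions. Note that an instance can only refute an FD; it cannot prove that the FD holds for every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by the instance `rows`."""
    seen = {}  # value of t[lhs] -> value of t[rhs] first seen with it
    for t in rows:
        left = tuple(t[a] for a in lhs)
        right = tuple(t[a] for a in rhs)
        # setdefault stores `right` the first time, then returns the stored value
        if seen.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical sample instance of FIRST(S#, Country, City, P#, Qty)
first = [
    {"S#": "S3", "Country": "USA", "City": "N.Y.", "P#": "P1", "Qty": 10},
    {"S#": "S3", "Country": "USA", "City": "N.Y.", "P#": "P2", "Qty": 20},
    {"S#": "S5", "Country": "USA", "City": "L.A.", "P#": "P1", "Qty": 30},
]

print(fd_holds(first, ["S#"], ["City"]))       # True
print(fd_holds(first, ["Country"], ["City"]))  # False: USA occurs with N.Y. and L.A.
print(fd_holds(first, ["S#", "P#"], ["Qty"]))  # True
```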

Inference Rules for FDs

IR1 (reflexive rule) If Y is a subset of X, then X → Y

IR2 (augmentation rule) If X → Y, then XZ → YZ

IR3 (transitive rule) If X → Y and Y → Z, then X → Z

IR4 (decomposition rule) If X → YZ, then X → Y and X → Z

IR5 (union rule) If X → Y and X → Z, then X → YZ

IR6 (pseudotransitive rule) If X → Y and WY → Z, then WX → Z
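IR4, IR5 and IR6 are not independent rules: each follows from IR1 to IR3. The derivations below are a standard sketch, written here in LaTeX for compactness; juxtaposition such as XY denotes the union of the attribute sets.

```latex
\begin{align*}
\textbf{IR4:}\quad & YZ \to Y && \text{(IR1, since } Y \subseteq YZ\text{)} \\
                   & X \to YZ,\; YZ \to Y \;\Rightarrow\; X \to Y && \text{(IR3; } X \to Z \text{ is symmetric)} \\[4pt]
\textbf{IR5:}\quad & X \to Y \;\Rightarrow\; X \to XY && \text{(IR2, augment with } X\text{; } XX = X\text{)} \\
                   & X \to Z \;\Rightarrow\; XY \to YZ && \text{(IR2, augment with } Y\text{)} \\
                   & X \to XY,\; XY \to YZ \;\Rightarrow\; X \to YZ && \text{(IR3)} \\[4pt]
\textbf{IR6:}\quad & X \to Y \;\Rightarrow\; WX \to WY && \text{(IR2, augment with } W\text{)} \\
                   & WX \to WY,\; WY \to Z \;\Rightarrow\; WX \to Z && \text{(IR3)}
\end{align*}
```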

Inference Rules for FDs
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
  - Note that both X and X+ are sets of attributes

• If X+ consists of all attributes of R, X is a superkey for R
  - From the value of X, we can determine the values of the whole tuple

• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
  - Start from X and keep adding attributes until X+ is found
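The computation of X+ can be sketched as a simple fixpoint loop. This is a minimal illustration, assuming each FD is represented as a pair of Python sets `(lhs, rhs)`; the function name `closure` and the FD set `F` (taken from the FIRST example in the 1NF slides) are illustrative choices.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set `attrs` under the FDs in `fds`."""
    result = set(attrs)  # X is always contained in X+ (reflexivity)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains lhs, everything in rhs is determined too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = {"S#", "Country", "City", "P#", "Qty"}
F = [({"S#"}, {"City"}), ({"City"}, {"Country"}), ({"S#", "P#"}, {"Qty"})]

print(sorted(closure({"S#"}, F)))     # ['City', 'Country', 'S#']
print(closure({"S#", "P#"}, F) == R)  # True, so {S#, P#} is a superkey
```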

Inference Rules for FDs
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.

• Two sets of FDs F and G are equivalent if:
  - Every FD in F can be inferred from G, and
  - Every FD in G can be inferred from F
  - Hence, F and G are equivalent if F+ = G+
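Equivalence can be tested without enumerating F+ or G+: an FD X → Y can be inferred from F exactly when Y is contained in the closure of X under F. The sketch below assumes FDs are pairs of Python sets; the names `covers` and `equivalent` and the example sets F, G and H are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g have the same closure, i.e. F+ = G+."""
    return covers(f, g) and covers(g, f)


F = [({"S#"}, {"City"}), ({"City"}, {"Country"})]
G = [({"S#"}, {"City", "Country"}), ({"City"}, {"Country"})]
H = [({"S#"}, {"City"})]

print(equivalent(F, G))  # True
print(equivalent(F, H))  # False: City -> Country cannot be inferred from H
```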

Relational Database Design
• Logical / Conceptual DB Design
  - Schema
    what relations (tables) are needed?
    what should their attributes be?

• What is a “bad” DB Design?
  - Repetition of data/information
  - Potential inconsistency
  - Inability to represent certain information
  - Loss of data/information

Relational Database Design
• Introduction (cont’d)
  - Normalization theory
    - based on functional dependencies
      * universe of relations
      * 1st Normal Form (1NF)
      * 2NF ⊂ 1NF
      * 3NF ⊂ 2NF
      * BCNF ⊂ 3NF
      * 4NF ⊂ BCNF
      * 5NF ⊂ 4NF
      * ...

Normalization
• Normalization
  - Proposed by Codd (1972)
  - Takes a relation schema through a series of tests based on FDs and primary keys to certify whether it satisfies a certain normal form
  - Decomposes a relation into several related relations to achieve:
    - Minimizing redundancy
    - Minimizing the insertion, deletion and update anomalies

Normalization
• Normalization requires two properties

  - Non-additive or lossless join (extremely important)
    - Decomposition is reversible and no information is lost
    - No spurious tuples (tuples that should not exist) should be generated by doing a natural join of the decomposed relations

  - Preservation of the functional dependencies
    - Ensure each functional dependency is represented in some individual relation or could be inferred from other dependencies after decomposition.

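Lossless Decomposition

Original relation:
S#   Country  City
S3   USA      N.Y.
S5   USA      L.A.

Reversible decomposition:
S#   Country        S#   City
S3   USA            S3   N.Y.
S5   USA            S5   L.A.

Natural join gives back the original relation:
S#   Country  City
S3   USA      N.Y.
S5   USA      L.A.

Non-reversible decomposition:
S#   Country        Country  City
S3   USA            USA      N.Y.
S5   USA            USA      L.A.

Natural join generates spurious tuples (USA, N.Y., S5) and (USA, L.A., S3):
Country  City  S#
USA      N.Y.  S3
USA      N.Y.  S5
USA      L.A.  S3
USA      L.A.  S5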
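The two decompositions can be replayed in a few lines of code. This is a minimal sketch, assuming relations are lists of dicts; the helpers `project`, `natural_join` and `as_set` are illustrative and are not part of any database library.

```python
def project(rel, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Natural join of two non-empty relations on their common attributes."""
    common = set(r[0]) & set(s[0])
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}
                if merged not in out:
                    out.append(merged)
    return out


def as_set(rel):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(t.items()) for t in rel}


S = [
    {"S#": "S3", "Country": "USA", "City": "N.Y."},
    {"S#": "S5", "Country": "USA", "City": "L.A."},
]

# Decomposition sharing S#: the join reproduces the original relation
lossless = natural_join(project(S, ["S#", "Country"]), project(S, ["S#", "City"]))
# Decomposition sharing only Country: the join adds spurious tuples
lossy = natural_join(project(S, ["S#", "Country"]), project(S, ["Country", "City"]))

print(as_set(lossless) == as_set(S))  # True
print(len(lossy))                     # 4 tuples, 2 of them spurious
```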

First Normal Form with Primary Key
• First normal form (1NF)
  - Disallows multivalued attributes, composite attributes and their combinations
  - The domain of an attribute must contain only atomic (simple and indivisible) values

• DEPARTMENT: each department can have a number of locations
  - 1NF: DEPT_LOCATIONS(Dnumber, Dlocation)
• 1NF also disallows multivalued attributes that are themselves composite
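Moving a multivalued attribute into its own relation, keyed by the original primary key, can be sketched as follows. The sample DEPARTMENT rows, the `Dname` attribute, and the helper name `split_multivalued` are assumptions made for illustration; only DEPT_LOCATIONS(Dnumber, Dlocation) comes from the slide.

```python
def split_multivalued(rows, key, multi_attr, single_attr):
    """Split the multivalued attribute `multi_attr` out into its own 1NF relation.

    Returns (base, detail): `base` is `rows` without the multivalued attribute,
    `detail` has one tuple (key, single_attr) per value of the multivalued attribute.
    """
    base = [{a: v for a, v in t.items() if a != multi_attr} for t in rows]
    detail = [{key: t[key], single_attr: v} for t in rows for v in t[multi_attr]]
    return base, detail


# Hypothetical non-1NF DEPARTMENT instance: Dlocations holds a set of values
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]

dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations", "Dlocation")
for row in dept_locations:
    print(row)  # one atomic (Dnumber, Dlocation) pair per line
```

The primary key of the new relation is the combination {Dnumber, Dlocation}, since one department can appear with several locations.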

Figure 14.10 Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.

Problems with 1NF
• Example:
  FIRST(S#, Country, City, P#, Qty)
  What’s the primary key? (S#, P#)
  Its functional dependency diagram shows:
    {S#, P#} → Qty
    S# → City
    S# → Country
    City → Country

Problems with 1NF
• Problems with 1NF
  - Insert Anomalies
    Inability to represent certain information:
    E.g., cannot enter “Supplier and City” information until the Supplier supplies at least one product
  - Delete Anomalies
    Deleting the “only tuple” for a supplier will destroy all the information about that supplier
  - Update Anomalies
    “S# and City” are stored redundantly for each P#, which may cause inconsistency when updating a tuple
Solution: 1NF->2NF
• Possible Solution:
  Replace the original table by two sub-tables
    SECOND(S#, City, Country)
    SP(S#, P#, Qty)
  and now, their FD diagrams show:
    SECOND: S# → City, S# → Country, City → Country
    SP: {S#, P#} → Qty

Second Normal Form with Primary Key
• Consider a functional dependency X → Y

• Full functional dependency
  - If removal of any attribute A from X means that the dependency does not hold any more
  - E.g., {S#, P#} → Qty

• Partial functional dependency
  - If some attribute A belonging to X can be removed from X and the dependency still holds
  - E.g., {S#, P#} → Country, as S# → Country
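Whether a dependency is full or partial can be decided with attribute closures: X → Y is partial exactly when Y is still determined after dropping some single attribute from X. The sketch below assumes FDs are pairs of Python sets; `is_full_fd` is an illustrative name, and F is the FD set of the FIRST example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no attribute can be removed from lhs."""
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    # Dropping one attribute at a time is enough: every proper subset of lhs
    # is contained in some lhs - {a}, and closures only grow with their input.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [({"S#"}, {"City"}), ({"City"}, {"Country"}), ({"S#", "P#"}, {"Qty"})]

print(is_full_fd({"S#", "P#"}, {"Qty"}, F))      # True: full dependency
print(is_full_fd({"S#", "P#"}, {"Country"}, F))  # False: S# alone determines Country
```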

Second Normal Form with Primary Key
• A relation R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R

• An attribute of R is called a prime (key) attribute of R if it is a member of some candidate key of R. Otherwise it is nonprime.

• In FIRST(S#, Country, City, P#, Qty):
  - Non-prime (non-key) attributes: Country, City, Qty
  - Primary key: {S#, P#}

Second Normal Form with Primary Key
• The nonprime attribute Country violates 2NF because of S# → Country
  - Similarly, City violates 2NF because of S# → City

• If a relation is not in 2NF, it can be normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent.

• Solution: SP(P#, S#, Qty); SECOND(S#, City, Country)
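The 2NF test on this slide can be sketched as a search for nonprime attributes that are determined by a proper part of the primary key. This is a simplified illustration: it assumes the primary key is the only candidate key, and the name `violations_2nf` and the set-based FD representation are illustrative choices.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_2nf(attrs, key, fds):
    """Map each proper part of `key` to the nonprime attributes it determines.

    Assumes `key` is the only candidate key, so nonprime = attrs - key.
    """
    nonprime = attrs - key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) & nonprime
            if dependents:
                found[part] = dependents
    return found


F = [({"S#"}, {"City"}), ({"City"}, {"Country"}), ({"S#", "P#"}, {"Qty"})]

FIRST = {"S#", "Country", "City", "P#", "Qty"}
print(violations_2nf(FIRST, {"S#", "P#"}, F))  # S# alone determines City and Country

# After decomposition, neither relation has a partial dependency
print(violations_2nf({"S#", "P#", "Qty"}, {"S#", "P#"}, [({"S#", "P#"}, {"Qty"})]))  # {}
print(violations_2nf({"S#", "City", "Country"}, {"S#"}, F[:2]))                      # {}
```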

Problem with 2NF

• Still, there are problems with 2NF, here in SECOND(S#, City, Country):
  - Insert Anomalies
    Cannot insert “City and Country” information until some Supplier in that city exists
  - Delete Anomalies
    Deleting the “only tuple” for a particular city will destroy all the information about that city
  - Update Anomalies
    If a supplier moves to another city, City and Country must be updated together; otherwise City and Country become inconsistent
Solution: 2NF->3NF
• Possible Solution:
  Replace the SECOND table by two sub-tables
    SC(S#, City)
    CC(City, Country)
  and still keep the table SP(S#, P#, Qty). Now the FD diagrams for the new tables show:
    SP: {S#, P#} → Qty
    SC: S# → City
    CC: City → Country
Third Normal Form with Primary Key
• 3NF is based on the concept of transitive dependency

• A functional dependency X → Y in a relation R is a transitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold
  - X → Z → Y (Z: a non-prime attribute or a set including non-prime attributes)

• E.g., dependency S# → Country is transitive through City
  - S# → City and City → Country, and City is neither a key itself nor a subset of the key
  - S# → City → Country
Third Normal Form with Primary Key
• According to Codd’s original definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key

• To normalize R into 3NF, R should be decomposed to break the transitive dependency.
  - 3NF: SC(S#, City), CC(City, Country), SP(S#, P#, Qty)
  - FDs: S# → City in SC; City → Country in CC; {S#, P#} → Qty in SP
Normal Forms Defined Informally
• 1st normal form
  - All attributes are simple (no composite or multivalued attributes)
• 2nd normal form
  - All non-prime attributes depend on the whole key (no partial dependency)
• 3rd normal form
  - All non-prime attributes depend only on the key (no transitive dependency)

• In general, 3NF is desirable and powerful enough
  - But there are still cases where a stronger normal form is needed

Test and Remedy for Normalization

General Definitions of 2NF and 3NF
• Consider all candidate keys of a relation (not just a defined primary key)

• LOTS(Property_id#, County_name, Lot#, Area, Price, Tax_rate) describes parcels of land for sale in various counties of a state
  - Two candidate keys: Property_id# and {County_name, Lot#}
  - Property_id# is the primary key
• LOTS violates 2NF due to FD3, as Tax_rate is partially dependent on the candidate key {County_name, Lot#}
  - FD3: County_name → Tax_rate (partial dependency)

General Definition of Second Normal Form

General Definition of Third Normal Form

• Superkey => includes a key
• Prime attributes => attributes in a key

• LOTS2 is in 3NF
• FD4 (Area → Price) in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime attribute in LOTS1
• Equivalently, LOTS1 violates 3NF because Price is transitively dependent on each of the candidate keys of LOTS1 via the nonprime attribute Area
• To normalize LOTS1 into 3NF, move Price into a new relation LOTS1B{Area, Price}; the rest of LOTS1 (still including Area) becomes LOTS1A
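The general 3NF test (for every nontrivial X → A, either X is a superkey or A is prime) can be sketched with a brute-force candidate-key search. This is an illustration under stated assumptions: LOTS1 is taken to be LOTS without Tax_rate, the FD list is assumed to contain all FDs that apply to LOTS1, and the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of `attrs` (exponential search)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            is_superkey = closure(candidate, fds) >= attrs
            # Smaller keys were found first, so any superset of one is not minimal
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) for FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # lhs is a superkey
        for a in sorted(rhs - lhs):  # skip trivial parts of the FD
            if a not in prime:
                bad.append((lhs, a))
    return bad


# Assumed schema of LOTS1 and its FDs (FD1, FD2, FD4)
LOTS1 = {"Property_id#", "County_name", "Lot#", "Area", "Price"}
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price"}),
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price"}),
    ({"Area"}, {"Price"}),
]

print(candidate_keys(LOTS1, FDS))  # Property_id# and {County_name, Lot#}
print(violations_3nf(LOTS1, FDS))  # only Area -> Price is reported
```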

Boyce-Codd Normal Form
• BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF
  - Every relation in BCNF is also in 3NF
  - BUT a relation in 3NF is not necessarily in BCNF

• Difference:
  - For a nontrivial FD X → A, 3NF requires that X is a superkey or that A is a prime attribute
  - The condition which allows A to be prime is absent from BCNF, so X must be a superkey

Boyce-Codd Normal Form
• Suppose we have thousands of lots in the relation but the lots are from only two counties: A and B
• Lot sizes in A are only 0.5, 0.6, 0.7, 0.8, 0.9 acres whereas lots in B are 1.1, 1.2, …, 1.9 acres
• Then, FD5: Area → County_name
• FD5 satisfies 3NF in LOTS1A because County_name is a prime attribute
• FD5 violates BCNF in LOTS1A because Area is not a superkey of LOTS1A
• Create a new relation R(Area, County_name). The county for each lot area is then recorded once in R(Area, County_name) instead of once per lot, which reduces redundancy
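The BCNF condition is simpler to test than 3NF because no prime attributes are involved: every nontrivial FD must have a superkey on its left side. The sketch below assumes LOTS1A consists of Property_id#, County_name, Lot# and Area, and that the listed FDs are all the FDs that apply to it; `violations_bcnf` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(attrs, fds):
    """List the nontrivial FDs whose left side is not a superkey of `attrs`."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = rhs - lhs
        if nontrivial and not closure(lhs, fds) >= attrs:
            bad.append((lhs, nontrivial))
    return bad


# Assumed schema of LOTS1A with FD1, FD2 and FD5 restricted to its attributes
LOTS1A = {"Property_id#", "County_name", "Lot#", "Area"}
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area"}),
    ({"County_name", "Lot#"}, {"Property_id#", "Area"}),
    ({"Area"}, {"County_name"}),  # FD5
]

print(violations_bcnf(LOTS1A, FDS))  # only FD5 (Area -> County_name) is reported

# The new relation R(Area, County_name) has Area as its key, so FD5 is fine there
print(violations_bcnf({"Area", "County_name"}, [({"Area"}, {"County_name"})]))  # []
```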

County_name is a prime attribute

Boyce-Codd Normal Form
• Some notes on BCNF:
  - BCNF is stronger than 3NF
  - BCNF is useful when a relation has:
    multiple candidate keys, and
    these candidate keys are composite ones, and
    they overlap on some common attributes
  - BCNF reduces to 3NF if the above conditions do not apply

References
• 6e
  - Ch. 14, p. 487-515
  - Ch. 15, p. 529-533
