LN344 0910 Normal Forms

The document outlines the principles of functional dependency and normalization in relational database design, emphasizing the importance of reducing redundancy, avoiding update anomalies, and ensuring clear semantics of attributes. It covers various normal forms, including 1NF, 2NF, 3NF, and BCNF, and provides guidelines for achieving a good database schema. Additionally, it discusses functional dependencies and inference rules that help define the quality of relational designs.

Functional Dependency & Normalization

Lecture 09-10

BIL344 Database Systems


Mustafa Sert
Assoc. Prof.
[email protected]

Department of Computer Engineering, Başkent University


Ankara 06810 TURKEY
Lecture Outline
 Informal Design Guidelines for Relational Databases
 Semantics of the Relation Attributes
 Redundant Information in Tuples and Update Anomalies
 Null Values in Tuples
 Functional Dependencies (FDs)
 Definition of FD
 Inference Rules for FDs
 Normal Forms
 1st Normal Form
 2nd Normal Form
 3rd Normal Form
 Boyce-Codd Normal Form (BCNF)

BIL344 Database Systems (2012) | Mustafa Sert, Ph.D.


Informal Design Guidelines for Relational
Databases
 Four informal guidelines for Relational database
design to make it a “good” design:
 Making sure that the semantics of the attributes is clear
in the schema
 Reducing the redundant information in tuples
 Reducing the NULL values in tuples
 Disallowing the possibility of generating spurious tuples

 + the formal guidelines to measure the quality of a
database schema
 Formal concepts of functional dependencies and normal
forms, such as 1NF, 2NF, 3NF, and BCNF



Semantics of the Relation Attributes

 Each tuple in a relation should represent one entity or
relationship instance
 Only foreign keys should be used to refer to other entities
 Entity and relationship attributes should be kept apart as
much as possible
 Design a schema that can be explained easily relation by
relation. The semantics of attributes should be easy to
interpret.



[Figure: A sample database schema for a COMPANY]

Redundant Information in Tuples and Update
Anomalies

 Mixing attributes of multiple entities may cause
problems
 Information is stored redundantly, wasting storage
 Problems with update anomalies:
◼ Insertion anomalies
◼ Deletion anomalies
◼ Modification anomalies

[Figure: example relation instances with the redundant values marked "Redundancy"]


EXAMPLE OF AN UPDATE ANOMALY

Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
 Update Anomaly
◼ Changing the name of project number P1 from “Billing” to
“Customer-Accounting” requires this update to be made in the tuples of all
100 employees working on project P1
 Insert Anomaly
◼ Cannot insert a project unless an employee is assigned to it.
◼ Conversely, cannot insert an employee unless he/she is assigned to a
project.



EXAMPLE OF AN UPDATE ANOMALY (2)

 Delete Anomaly
◼ When a project is deleted, all the employees who work on that
project are deleted with it. Alternatively, if an employee is
the sole employee on a project, deleting that employee also
deletes the corresponding project.
 Design a schema that does not suffer from the
insertion, deletion and update anomalies. If there
are any present, then note them so that applications
can be made to take them into account
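
The anomalies above can be illustrated with a small sketch. The rows, names, and helper function below are hypothetical sample data chosen for illustration, not taken from the lecture; the list mirrors the single-table EMP_PROJ design, and the dictionary mirrors a design where projects are stored separately.

```python
# Hypothetical EMP_PROJ rows: (emp_no, proj_no, ename, pname, no_hours)
emp_proj = [
    ("E1", "P1", "Ayse", "Billing", 10),
    ("E2", "P1", "Can", "Billing", 20),
    ("E3", "P2", "Deniz", "Payroll", 15),
]


def rename_project(rows, proj_no, new_name):
    """Rename a project in the single-table design.

    Returns the new rows and the number of tuples that had to be touched.
    """
    touched = 0
    out = []
    for emp_no, p_no, ename, pname, hours in rows:
        if p_no == proj_no:
            pname = new_name
            touched += 1
        out.append((emp_no, p_no, ename, pname, hours))
    return out, touched


# Modification anomaly: one logical change touches one tuple per assigned employee.
renamed, touched = rename_project(emp_proj, "P1", "Customer-Accounting")
print(touched)  # 2

# With projects kept in their own relation, the same change is a single update.
projects = {"P1": "Billing", "P2": "Payroll"}
projects["P1"] = "Customer-Accounting"

# Deletion anomaly: removing the only employee on P2 also loses the project name.
remaining = [row for row in emp_proj if row[0] != "E3"]
project_names_left = {row[3] for row in remaining}
print("Payroll" in project_names_left)  # False
```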



Null Values in Tuples

 Relations should be designed such that their tuples will
have as few NULL values as possible
 Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
 Reasons for nulls:
◼ Attribute not applicable or invalid
◼ Attribute value unknown (may exist)
◼ Value known to exist, but unavailable



Spurious (Wrong) Tuples

 Bad designs for a relational database may result in
erroneous results for certain JOIN operations
 The "lossless join" property is used to guarantee
meaningful results for join operations
 The relations should be designed to satisfy the lossless
join condition. No spurious tuples should be generated
by doing a natural-join of any relations



Spurious Tuples (2)



Spurious Tuples (3)
 * indicates the wrong (spurious) information that is not valid.
 That is, the natural join of EMP_PROJ1 and EMP_LOCS produced
additional tuples that were not in EMP_PROJ

Spurious (Wrong) Tuples (4)
 As a result
 Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is
undesirable because when we JOIN them back using
NATURAL_JOIN, we do NOT get the correct original
information
 This is because Plocation is the attribute that relates
EMP_LOCS and EMP_PROJ1, and Plocation is neither a primary
key nor a foreign key in either EMP_LOCS or EMP_PROJ1
 Guideline: design relation schemas so that they can be
joined with equality conditions on attributes that are
appropriately related (primary-key, foreign-key) pairs in a
way that guarantees that no spurious tuples are generated.
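
A minimal sketch of how such a decomposition produces spurious tuples. The attribute lists assumed here (EMP_LOCS with ename and plocation, EMP_PROJ1 with ssn, pnumber and plocation) and all sample rows are assumptions for illustration; the slides do not list them. The helper natural_join is likewise a hypothetical function, written only for this example.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts.

    Tuples are combined when they agree on every shared attribute name.
    """
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    joined = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


# Hypothetical original relation: two employees on different projects
# that happen to share a location.
emp_proj = [
    {"ssn": "1", "ename": "Smith", "pnumber": 1, "plocation": "Ankara"},
    {"ssn": "2", "ename": "Wong", "pnumber": 2, "plocation": "Ankara"},
]

# Decompose by projection; plocation is the only shared attribute.
emp_locs = [{"ename": t["ename"], "plocation": t["plocation"]} for t in emp_proj]
emp_proj1 = [
    {"ssn": t["ssn"], "pnumber": t["pnumber"], "plocation": t["plocation"]}
    for t in emp_proj
]

rejoined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in rejoined if t not in emp_proj]
print(len(emp_proj), len(rejoined), len(spurious))  # 2 4 2
```

Because plocation is not a key of either projection, every tuple with location "Ankara" on one side matches every such tuple on the other, which is where the two extra tuples come from.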



Functional Dependencies

 Functional dependencies (FDs) are used to specify
formal measures of the "goodness" of relational
designs
 FDs and keys are used to define normal forms for
relations
 FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes



Functional Dependencies (2)

 A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique value for Y
 X → Y holds if whenever two tuples have the same value for
X, they must have the same value for Y
If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R)
 X → Y in R specifies a constraint on all relation instances r(R)
 FDs are derived from the real-world constraints on the
attributes



Examples of FD constraints

 Social Security Number determines employee name
SSN → ENAME
 Project Number determines project name and location
PNUMBER → {PNAME, PLOCATION}
 Employee SSN and project number together determine the
hours per week that the employee works on the
project
{SSN, PNUMBER} → HOURS
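
The definition "if t1[X]=t2[X], then t1[Y]=t2[Y]" can be tested on a concrete relation instance. The sketch below assumes a relation stored as a list of dicts; the function name fd_holds and the sample rows are hypothetical and only serve the illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # The first Y value seen for an X value is remembered; any later
        # tuple with the same X but a different Y violates the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "PNAME": "ProductX",
     "PLOCATION": "Ankara", "HOURS": 32.5},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "PNAME": "ProductY",
     "PLOCATION": "Izmir", "HOURS": 7.5},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 2, "PNAME": "ProductY",
     "PLOCATION": "Izmir", "HOURS": 10.0},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))                  # True
print(fd_holds(rows, ["PNUMBER"], ["PNAME", "PLOCATION"]))  # True
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))       # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))                  # False
```

A check like this can only refute an FD. An FD is a constraint on every legal instance of the schema, so one instance that happens to satisfy it does not prove that it holds.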



Functional Dependencies (3)

 An FD is a property of the attributes in the schema R
 The constraint must hold on every relation instance
r(R)
 If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct
tuples with t1[K]=t2[K])



An Example
 Example relation instance:
A   B   C   D
a1  b1  c1  d1
a1  b2  c2  d2
a2  b2  c2  d3
a3  b3  c4  d3
 The following FDs may hold on the example relation:
 B → C
 C → B
 The following do NOT hold:
 A → B
 B → A
 D → C
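
The claims about the example relation can be checked mechanically. The sketch below encodes the four tuples of the example and tests every FD between two single attributes; the helper name fd_holds is an assumption introduced for this illustration.

```python
from itertools import permutations

# The four tuples of the example relation.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c4", "D": "d3"},
]


def fd_holds(rows, x, y):
    """Return True if the single-attribute FD x -> y is satisfied by rows."""
    seen = {}
    for t in rows:
        if seen.setdefault(t[x], t[y]) != t[y]:
            return False
    return True


# Try every ordered pair of distinct attributes.
holding = sorted(
    f"{x} -> {y}" for x, y in permutations("ABCD", 2) if fd_holds(rows, x, y)
)
print(holding)  # ['B -> C', 'C -> B']
```

Only B → C and C → B survive. For instance, A → B fails because the two tuples with a1 have different B values (b1 and b2). The slide says these FDs "may hold" because a single instance cannot establish an FD for the schema.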



Inference Rules for FDs

 Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
 Armstrong's inference rules
A1. (Reflexive) If Y ⊆ X, then X → Y
A2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
A3. (Transitive) If X → Y and Y → Z, then X → Z



Additional Useful Inference Rules

 Decomposition
 If X → YZ, then X → Y and X → Z
 Union
 If X → Y and X → Z, then X → YZ
 Pseudotransitivity
 If X → Y and WY → Z, then WX → Z
 Closure of a set F of FDs is the set F+ of all FDs that
can be inferred from F
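
Enumerating F+ directly is rarely practical. A common shortcut, not covered on these slides, is the attribute closure X+: the set of all attributes determined by X under F. An FD X → Y is in F+ exactly when Y ⊆ X+. The sketch below is one minimal way to compute it; the function name closure and the set-based encoding of FDs are assumptions, while the FDs themselves are the ones given earlier as examples.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, the right
            # side is determined too (transitivity plus union).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, F)))
# ['ENAME', 'SSN']
print(sorted(closure({"SSN", "PNUMBER"}, F)))
# ['ENAME', 'HOURS', 'PLOCATION', 'PNAME', 'PNUMBER', 'SSN']
```

Since the closure of {SSN, PNUMBER} contains every attribute, {SSN, PNUMBER} functionally determines the whole relation, which is what makes it a key candidate.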



Introduction to Normalization

 Normalization: Process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
 Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form
 2NF, 3NF, and BCNF are based on the keys and FDs of a relation
schema
 4NF is based on keys and multi-valued dependencies



First Normal Form

 Disallows composite attributes, multivalued
attributes, and nested relations, i.e., attributes whose
values for an individual tuple are non-atomic
 Considered to be part of the definition of relation
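
As a rough illustration of removing a multivalued attribute, the sketch below flattens a list-valued attribute into one tuple per value. The DEPARTMENT rows, attribute names, and the helper to_1nf are hypothetical and not taken from the slides.

```python
# Hypothetical relation with a multivalued attribute (not in 1NF).
department = [
    {"dnumber": 5, "dname": "Research", "dlocations": ["Ankara", "Izmir"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Istanbul"]},
]


def to_1nf(rows, multivalued):
    """Emit one tuple per value of the multivalued attribute."""
    flat = []
    for t in rows:
        for value in t[multivalued]:
            row = {k: v for k, v in t.items() if k != multivalued}
            row[multivalued] = value  # now a single atomic value
            flat.append(row)
    return flat


dept_1nf = to_1nf(department, "dlocations")
for row in dept_1nf:
    print(row)
print(len(dept_1nf))  # 3
```

Flattening in place repeats dname for every location, so it trades the 1NF violation for redundancy. Moving the multivalued attribute into its own relation together with the primary key avoids that repetition.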

Second Normal Form

 Uses the concepts of FDs and primary key
 Definitions:
 Prime attribute - an attribute that is a member of the
primary key K
 Full functional dependency - an FD Y → Z where
removing any attribute from Y means the FD no longer
holds; otherwise it is a partial dependency.



Examples
Second Normal Form

 {SSN, PNUMBER} → HOURS is a full FD since neither
SSN → HOURS nor PNUMBER → HOURS hold
 {SSN, PNUMBER} → ENAME is not a full FD (it is called a
partial dependency) since SSN → ENAME also holds
 A relation schema R is in second normal form (2NF) if
every non-prime attribute A in R is fully functionally
dependent on the primary key
 R can be decomposed into 2NF relations via the process of
2NF normalization
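
One way to look for partial dependencies is to compute the attribute closure of every proper subset of the primary key and see which non-prime attributes it already determines. The sketch below assumes the primary key is the only candidate key, so the non-prime attributes are simply those outside it; the helper names closure and partial_dependencies are hypothetical, and the FDs are the examples used earlier.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of an attribute set under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of key to the non-prime attributes it determines."""
    non_prime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found[subset] = determined
    return found


attributes = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

partial = partial_dependencies({"SSN", "PNUMBER"}, attributes, F)
for subset, attrs in sorted(partial.items()):
    print(subset, "->", sorted(attrs))
# ('PNUMBER',) -> ['PLOCATION', 'PNAME']
# ('SSN',) -> ['ENAME']
```

HOURS does not appear in the result, so it is fully dependent on the key, while ENAME, PNAME and PLOCATION are the attributes that a 2NF decomposition would move into separate relations.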

Third Normal Form
 Definition
 Transitive functional dependency: an FD X → Y is transitive if there is a set of
attributes Z in R that is neither a candidate key nor a subset
of any key of R, and both X → Z and Z → Y hold.
 Examples:
 SSN → DMGRSSN is a transitive FD since
SSN → DNUMBER and DNUMBER → DMGRSSN hold
 SSN → ENAME is non-transitive since there is no set of
attributes X where SSN → X and X → ENAME



3rd Normal Form

A relation schema R is in third normal form (3NF) if it
satisfies 2NF and NO non-prime attribute of R is
transitively dependent on the primary key
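
A sketch of how a transitive dependency on the primary key might be detected. It assumes a relation with a single candidate key and only inspects the FDs as listed, not all of F+, so it is a simplification rather than a complete 3NF test. The relation holds only the attributes named in the examples above (SSN, ENAME, DNUMBER, DMGRSSN); the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of an attribute set under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(key, attributes, fds):
    """Return (Z, A) pairs where non-prime A is reached from the key through Z.

    Z must be neither a superkey nor a subset of the key, as in the definition.
    """
    attributes, key = set(attributes), set(key)
    non_prime = attributes - key
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        if is_superkey or lhs <= key:
            continue
        for a in sorted((rhs & non_prime) - lhs):
            found.append((tuple(sorted(lhs)), a))
    return found


attributes = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "DNUMBER"}),
    ({"DNUMBER"}, {"DMGRSSN"}),
]

print(transitive_dependencies({"SSN"}, attributes, F))
# [(('DNUMBER',), 'DMGRSSN')]
```

The result says DMGRSSN depends on SSN only through DNUMBER, so the relation is not in 3NF; splitting off a relation keyed by DNUMBER removes the transitive dependency.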

BCNF (Boyce-Codd Normal Form)

 A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever a nontrivial FD X → A holds in R, then X is
a superkey of R
 Each normal form is strictly stronger than the previous one:
◼ Every 2NF relation is in 1NF
◼ Every 3NF relation is in 2NF
◼ Every BCNF relation is in 3NF
 There exist relations that are in 3NF but not in BCNF
 The goal is to have each relation in BCNF (or 3NF)



We assume that there are two candidate keys: (1) property_id and (2)
{county_name, lot#}

The relation is in 3NF, but not in BCNF!
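
The difference between 3NF and BCNF can be sketched in code. The relation's attributes and FDs are not shown in the text above, so the sketch assumes a relation with attributes property_id, county_name, lot_no and area, and in particular assumes the FD area → county_name; these are illustrative assumptions only. It also uses the general 3NF test (for every nontrivial FD X → A, X is a superkey or A is prime), a standard formulation that these slides do not spell out.

```python
def closure(attrs, fds):
    """Closure of an attribute set under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(attributes, fds, candidate_keys):
    """Return (bcnf_violations, third_nf_violations) among the listed FDs."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)
    bcnf, third = [], []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the nontrivial part of the right-hand side
        if not extra or closure(lhs, fds) >= attributes:
            continue  # trivial FD, or lhs is a superkey: fine for both forms
        bcnf.append((lhs, rhs))
        if not extra <= prime:
            third.append((lhs, rhs))  # a non-prime attribute is involved
    return bcnf, third


attributes = {"property_id", "county_name", "lot_no", "area"}
keys = [{"property_id"}, {"county_name", "lot_no"}]
F = [
    ({"property_id"}, {"county_name", "lot_no", "area"}),
    ({"county_name", "lot_no"}, {"property_id", "area"}),
    ({"area"}, {"county_name"}),  # assumed FD, not stated in the text above
]

bcnf_bad, third_bad = violations(attributes, F, keys)
print(bcnf_bad)   # [({'area'}, {'county_name'})]
print(third_bad)  # []
```

Under these assumptions, area is not a superkey, which breaks BCNF, but county_name is part of a candidate key, so 3NF is not violated.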



Exercise

 Given the relation
Book(Book_title, Authorname, Book_type, Listprice,
Author_affil, Publisher)
The FDs are
Book_title → Publisher, Book_type
Book_type → Listprice
Authorname → Author_affil

 What normal form is the relation in?



Exercise
 Solution
 {Book_title, Authorname} is the key for this relation. The relation is in 1NF
 Publisher and Book_type depend on Book_title alone, and Author_affil depends
on Authorname alone, so by the full functional dependency rule this
relation is NOT in 2NF
 2NF Decomposition:
◼ Book0(Book_title, Authorname)
◼ Book1(Book_title, Publisher, Book_type, Listprice)
◼ Book2(Authorname, Author_affil)
 3NF Decomposition:
◼ Book0(Book_title, Authorname)
◼ Book1-1(Book_title, Publisher, Book_type)
◼ Book1-2(Book_type, Listprice)
◼ Book2(Authorname, Author_affil)
 This decomposition removes the transitive dependency of Listprice on Book_title
 The resulting relations are also in BCNF
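
The solution can be double-checked with attribute closures. The sketch below confirms the key and tests each relation for BCNF using only the listed FDs that fall entirely inside it. That shortcut is enough for this exercise but is a simplification: in general the FDs of F+ must be projected onto each relation. The helper names closure and is_bcnf are hypothetical.

```python
def closure(attrs, fds):
    """Closure of an attribute set under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """BCNF test restricted to the listed FDs that lie inside relation."""
    local = [(lhs, rhs) for lhs, rhs in fds if lhs | rhs <= relation]
    return all(closure(lhs, local) >= relation for lhs, _ in local)


book = {"Book_title", "Authorname", "Book_type", "Listprice",
        "Author_affil", "Publisher"}
F = [
    ({"Book_title"}, {"Publisher", "Book_type"}),
    ({"Book_type"}, {"Listprice"}),
    ({"Authorname"}, {"Author_affil"}),
]

# The proposed key determines every attribute; neither half does on its own.
print(closure({"Book_title", "Authorname"}, F) == book)  # True
print(closure({"Book_title"}, F) == book)                # False
print(closure({"Authorname"}, F) == book)                # False

decomposition = {
    "Book0": {"Book_title", "Authorname"},
    "Book1-1": {"Book_title", "Publisher", "Book_type"},
    "Book1-2": {"Book_type", "Listprice"},
    "Book2": {"Authorname", "Author_affil"},
}
results = {name: is_bcnf(rel, F) for name, rel in decomposition.items()}
print(results)
# {'Book0': True, 'Book1-1': True, 'Book1-2': True, 'Book2': True}

# The intermediate 2NF relation Book1 still fails because of Book_type -> Listprice.
book1 = {"Book_title", "Publisher", "Book_type", "Listprice"}
print(is_bcnf(book1, F))  # False
```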