0% found this document useful (0 votes)
5 views37 pages

Normalization Review

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views37 pages

Normalization Review

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 37

Normalization

 Normalization theory is based on the


observation that relations with certain
properties are more effective in
inserting, updating and deleting data
than other sets of relations containing
the same data
 Normalizationis a multi-step process
beginning with an “unnormalized”relation
 Hospitalexample from Atre, S. Data Base:
Structured Techniques for Design,
Performance, and Management.

IS 257 – Fall 2008


Normal Forms

First Normal Form (1NF)


Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

IS 257 – Fall 2008


Normalization

Functional
dependency
No transitive
of nonkey
dependency
attributes on
between
the primary
nonkey Boyce- key - Atomic
attributes
Codd and values only

All
Higher Full
determinants Functional
are candidate dependency
keys - Single of nonkey
multivalued attributes on
dependency the primary
key

IS 257 – Fall 2008


Functional
Dependencies
Functional dependencies (FDs) are
used to specify formal measures of
the "goodness" of relational designs
FDs and keys are used to define

normal forms for relations
FDs are constraints that are
derived from the meaning and
interrelationships of the data
attributes
Functional
Dependency definition
 A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique
value for Y
 X Y holds if whenever two tuples have the same
value for X, they must have the same value for Y
If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance
r(R)

 X  Y in R specifies a constraint on all relation


instances r(R)
 FDs are derived from the real-world constraints on
the attributes
Examples of FD
constraints
 Social
Security Number determines
employee name
SSN  ENAME
 Project
Number determines project
name and location
PNUMBER  {PNAME, PLOCATION}
 EmployeeSSN and project number
determines the hours per week that the
employee works on the project
{SSN, PNUMBER}  HOURS
Functional
Dependencies and
Keys
An FD is a property of the attributes
in the schema R
The constraint must hold on every
relation instance r(R)
 If K is a key of R, then K
functionally determines all
attributes in R (since we never have
two distinct tuples with t1[K]=t2[K])
Inference Rules for
FDs
 Given a set of FDs F, we can infer
additional FDs that hold whenever the
FDs in F hold
 Armstrong's inference rules
A1. (Reflexive) If Y subset-of X, then X  Y
A2. (Augmentation) If X  Y, then XZ  YZ
(Notation: XZ stands for X U Z)
A3. (Transitive) If X  Y and Y  Z, then X 
Z
 A1,A2, A3 form a sound and complete
set of inference rules
Additional Useful
Inference Rules
 Decomposition
 If X  YZ, then X  Y and X  Z
 Union
 If X  Y and X  Z, then X  YZ
 Psuedotransitivity
 If X  Y and WY  Z, then WX  Z
 Closure of a set F of FDs is the set F+
of all FDs that can be inferred from F
Introduction to
Normalization
 Normalization: Process of
decomposing unsatisfactory "bad"
relations by breaking up their attributes
into smaller relations
 Normal form: Condition using keys
and FDs of a relation to certify whether
a relation schema is in a particular
normal form
 2NF, 3NF, BCNF based on keys and FDs of a
relation schema
 4NF based on keys, multi-valued
dependencies
Unnormalized Relations

First step in normalization is to


convert the data into a two-
dimensional table
In unnormalized relations data
can repeat within a column

IS 257 – Fall 2008


Unnormalized Relation
Patient # Surgeon # Surg. date Patient Name Patient Addr Surgeon Surgery Postop drug
Drug side effects

Gallstone
s removal;
Jan 1, 15 New St. Beth Little Kidney
145 1995; June New York, Michael stones Penicillin, rash
1111 311 12, 1995 John White NY Diamond removal none- none

Eye
Charles Cataract
Apr 5, Field removal
243 1994 May 10 Main St. Patricia Thrombos Tetracyclin Fever
1234 467 10, 1995 Mary Jones Rye, NY Gold is removal e none none
Dogwood
Lane Open
Jan 8, Harrison, David Heart Cephalosp
2345 189 1996 Charles Brown NY Rosen Surgery orin none
55 Boston
Post Road,
Nov 5, Chester, Cholecyst
4876 145 1995 Hal Kane CN Beth Little ectomy Demicillin none
Blind Brook Gallstone
May 10, Mamaronec s
5123 145 1995 Paul Kosher k, NY Beth Little Removal none none
Eye
Cornea
Replacem
Apr 5, Hilton Road ent Eye
1994 Dec Larchmont, Charles cataract Tetracyclin
6845 243 15, 1984 Ann Hood NY Field removal e Fever

IS 257 – Fall 2008


First Normal Form

To move to First Normal Form a


relation must contain only atomic
values at each row and column.
 No repeating groups
 A column or set of columns is called a
Candidate Key when its values can
uniquely identify the row in the
relation.

IS 257 – Fall 2008


First Normal Form
Patient # Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name Surgery Drug adminSide Effects

15 New St.
New York, Gallstone
1111 145 01-Jan-95 John White NY Beth Little s removal Penicillin rash
15 New St. Kidney
New York, Michael stones
1111 311 12-Jun-95 John White NY Diamond removal none none
Eye
10 Main St. Cataract Tetracyclin
1234 243 05-Apr-94 Mary Jones Rye, NY Charles Field removal e Fever

10 Main St. Thrombos


1234 467 10-May-95 Mary Jones Rye, NY Patricia Gold is removal none none
Dogwood
Lane Open
Charles Harrison, Heart Cephalosp
2345 189 08-Jan-96 Brown NY David Rosen Surgery orin none

55 Boston
Post Road,
Chester, Cholecyst
4876 145 05-Nov-95 Hal Kane CN Beth Little ectomy Demicillin none

Blind Brook Gallstone


Mamaronec s
5123 145 10-May-95 Paul Kosher k, NY Beth Little Removal none none
Eye
Hilton Road Cornea
Larchmont, Replacem Tetracyclin
6845 243 05-Apr-94 Ann Hood NY Charles Field ent e Fever

Hilton Road Eye


Larchmont, cataract
6845 243 15-Dec-84 Ann Hood NY Charles Field removal none none

IS 257 – Fall 2008


1NF Storage Anomalies
 Insertion: A new patient has not yet undergone surgery --
hence no surgeon # -- Since surgeon # is part of the key, we
cannot insert.
 Insertion: If a surgeon is newly hired and has not operated
yet -- there will be no way to include that person in the
database.
 Update: If a patient comes in for a new procedure, and has
moved, we need to change multiple address entries.
 Deletion (type 1): Deleting a patient record may also delete
all info about a surgeon.
 Deletion (type 2): When there are functional dependencies
(like side effects and drug) changing one item eliminates
other information.

IS 257 – Fall 2008


Second Normal Form

A relation is said to be in Second


Normal Form when every non-key
attribute is fully functionally
dependent on the primary key.
 That is, every non-key attribute needs
the full primary key for unique
identification

IS 257 – Fall 2008


Why is this not in 2NF?
Patient # Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name Surgery Drug adminSide Effects

15 New St.
New York, Gallstone
1111 145 01-Jan-95 John White NY Beth Little s removal Penicillin rash
15 New St. Kidney
New York, Michael stones
1111 311 12-Jun-95 John White NY Diamond removal none none
Eye
10 Main St. Cataract Tetracyclin
1234 243 05-Apr-94 Mary Jones Rye, NY Charles Field removal e Fever

10 Main St. Thrombos


1234 467 10-May-95 Mary Jones Rye, NY Patricia Gold is removal none none
Dogwood
Lane Open
Charles Harrison, Heart Cephalosp
2345 189 08-Jan-96 Brown NY David Rosen Surgery orin none

55 Boston
Post Road,
Chester, Cholecyst
4876 145 05-Nov-95 Hal Kane CN Beth Little ectomy Demicillin none

Blind Brook Gallstone


Mamaronec s
5123 145 10-May-95 Paul Kosher k, NY Beth Little Removal none none
Eye
Hilton Road Cornea
Larchmont, Replacem Tetracyclin
6845 243 05-Apr-94 Ann Hood NY Charles Field ent e Fever

Hilton Road Eye


Larchmont, cataract
6845 243 15-Dec-84 Ann Hood NY Charles Field removal none none

IS 257 – Fall 2008


Second Normal Form

Patient # Patient Name Patient Address


15 New St. New
1111 John White York, NY
10 Main St. Rye,
1234 Mary Jones NY
Charles Dogwood Lane
2345 Brown Harrison, NY
55 Boston Post
4876 Hal Kane Road, Chester,
Blind Brook
5123 Paul Kosher Mamaroneck, NY
Hilton Road
6845 Ann Hood Larchmont, NY

IS 257 – Fall 2008


Second Normal Form

Surgeon # Surgeon Name

145 Beth Little

189 David Rosen

243 Charles Field

311 Michael Diamond

467 Patricia Gold

IS 257 – Fall 2008


Second Normal Form
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
Gallstones
1111 145 01-Jan-95 removal
Kidney Penicillin rash
stones
1111 311 12-Jun-95 removal none none
Eye Cataract
1234 243 05-Apr-94 removal Tetracycline Fever
Thrombosis
1234 467 10-May-95 removal none none
Open Heart Cephalospori
2345 189 08-Jan-96 Surgery n none
Cholecystect
4876 145 05-Nov-95 omy Demicillin none
Gallstones
5123 145 10-May-95 Removal none none
Eye cataract
6845 243 15-Dec-84 removal none none
Eye Cornea
6845 243 05-Apr-94 Replacement Tetracycline Fever

IS 257 – Fall 2008


1NF Storage Anomalies Removed

 Insertion:
Can now enter new patients
without surgery.
 Insertion:
Can now enter Surgeons who
have not operated.
 Deletion (type 1): If Charles Brown dies,
the corresponding tuples from Patient and
Surgery tables can be deleted without
losing information on David Rosen.
 Update:If John White comes in for third
time, and has moved, we only need to
change the Patient table

IS 257 – Fall 2008


2NF Storage Anomalies
 Insertion: Cannot enter the fact that a
particular drug has a particular side effect
unless it is given to a patient.
 Deletion:If John White receives some other
drug because of the penicillin rash, and a
new drug and side effect are entered, we
lose the information that penicillin can
cause a rash
 Update:If drug side effects change (a new
formula) we have to update multiple
occurrences of side effects.
IS 257 – Fall 2008
Third Normal Form
 A relation is said to be in Third Normal
Form if there is no transitive functional
dependency between non-key attributes
 When one non-key attribute can be determined
with one or more non-key attributes there is
said to be a transitive functional dependency.
 The side effect column in the Surgery
table is determined by the drug
administered
 Side
effect is transitively functionally
dependent on drug so Surgery is not 3NF

IS 257 – Fall 2008


Why is this not in 3NF?
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
Gallstones
1111 145 01-Jan-95 removal
Kidney Penicillin rash
stones
1111 311 12-Jun-95 removal none none
Eye Cataract
1234 243 05-Apr-94 removal Tetracycline Fever
Thrombosis
1234 467 10-May-95 removal none none
Open Heart Cephalospori
2345 189 08-Jan-96 Surgery n none
Cholecystect
4876 145 05-Nov-95 omy Demicillin none
Gallstones
5123 145 10-May-95 Removal none none
Eye cataract
6845 243 15-Dec-84 removal none none
Eye Cornea
6845 243 05-Apr-94 Replacement Tetracycline Fever

IS 257 – Fall 2008


Third Normal Form

Patient # Surgeon # Surgery Date Surgery Drug Admin

1111 145 01-Jan-95 Gallstones removal Penicillin


Kidney stones
1111 311 12-Jun-95 removal none

1234 243 05-Apr-94 Eye Cataract removal Tetracycline

1234 467 10-May-95 Thrombosis removal none

2345 189 08-Jan-96 Open Heart Surgery Cephalosporin

4876 145 05-Nov-95 Cholecystectomy Demicillin

5123 145 10-May-95 Gallstones Removal none

6845 243 15-Dec-84 Eye cataract removal none


Eye Cornea
6845 243 05-Apr-94 Replacement Tetracycline

IS 257 – Fall 2008


Third Normal Form

Drug Admin Side Effects


Cephalosporin none
Demicillin none
none none
Penicillin rash
Tetracycline Fever

IS 257 – Fall 2008


2NF Storage Anomalies Removed

Insertion: We can now enter the


fact that a particular drug has a
particular side effect in the Drug
relation.
Deletion: If John White receives
some other drug as a result of the
rash from penicillin, the information
on penicillin and rash is maintained.
Update: The side effects for each
drug appear only once.
IS 257 – Fall 2008
Boyce-Codd Normal Form
Most 3NF relations are also BCNF
relations.
A 3NF relation is NOT in BCNF if:
 Candidate keys in the relation are
composite keys (they are not single
attributes)
 There is more than one candidate key
in the relation, and
 The keys are not disjoint, that is, some
attributes in the keys are common

IS 257 – Fall 2008


Fourth Normal Form

Any relation is in Fourth Normal


Form if it is BCNF and any
multivalued dependencies are
trivial
Eliminate non-trivial multivalued
dependencies by projecting into
simpler tables

IS 257 – Fall 2008


Fifth Normal Form

Arelation is in 5NF if every join


dependency in the relation is
implied by the keys of the relation
Implies that relations that have
been decomposed in previous
normal forms can be recombined
via natural joins to recreate the
original relation.

IS 257 – Fall 2008


Effectiveness and Efficiency Issues
for DBMS
 Focus on the relational model
 Any column in a relational database
can be searched for values.
 To improve efficiency indexes using
storage structures such as BTrees
and Hashing are used
 But many useful functions are not
indexable and require complete
scans of the the database

IS 257 – Fall 2008


Example: Text Fields
In conventional RDBMS, when a
text field is indexed, only exact
matching of the text field contents
(or Greater-than and Less-than).
 Can search for individual words using
pattern matching, but a full scan is
required.
Text searching is still done best
(and fastest) by specialized text
search programs (Search Engines)

IS 257 – Fall 2008


Normalization

 Normalization is performed to
reduce or eliminate Insertion,
Deletion or Update anomalies.
 However, a completely normalized
database may not be the most
efficient or effective
implementation.
“Denormalization”is sometimes
used to improve efficiency.
IS 257 – Fall 2008
Normalizing to death

Normalization splits database


information across multiple tables.
To retrieve complete information
from a normalized database, the
JOIN operation must be used.
JOIN tends to be expensive in
terms of processing time, and very
large joins are very expensive.

IS 257 – Fall 2008


Downward Denormalization
Customer Customer
Before: After:
ID ID
Address Address
Name Name
Telephone Telephone

Order Order
Order No Order No
Date Taken Date Taken
Date Dispatched Date Dispatched
Date Invoiced Date Invoiced
Cust ID Cust ID
Cust Name
IS 257 – Fall 2008
Upward Denormalization
Order Order
Order No Order No
Date Taken Date Taken
Date Dispatched Date Dispatched
Date Invoiced Date Invoiced
Cust ID Cust ID
Cust Name Cust Name
Order Price
Order Item
Order No Order Item
Item No Order No
Item Price Item No
Num Ordered Item Price
Num Ordered

IS 257 – Fall 2008


Denormalization

Usually driven by the need to


improve query speed
Query speed is improved at the
expense of more complex or
problematic DML (Data
manipulation language) for
updates, deletions and insertions.

IS 257 – Fall 2008

You might also like