
Unit 3


Normalization

• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets of
relations containing the same data
• Normalization is a multi-step process beginning
with an “unnormalized” relation
– Hospital example from Atre, S. Data Base: Structured
Techniques for Design, Performance, and
Management.

IS 257 – Fall 2008


Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)



Normalization

(Diagram: the requirement added at each normal form)
• First Normal Form: functional dependency of nonkey
attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey
attributes on the primary key
• Third Normal Form: no transitive dependency between
nonkey attributes
• Boyce-Codd and higher: all determinants are candidate
keys; single multivalued dependency



Functional Dependencies
• Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
• FDs and keys are used to define normal forms
for relations
• FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes
Functional Dependency definition

• A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique value for Y
• X → Y holds if whenever two tuples have the same value for
X, they must have the same value for Y
If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R)
• X → Y in R specifies a constraint on all relation instances
r(R)
• FDs are derived from the real-world constraints on the
attributes
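The tuple-based definition above can be checked mechanically on a single relation instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name fd_holds and the sample rows are illustrative and not part of the original slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance.

    Whenever two rows agree on the lhs attributes they must also agree
    on the rhs attributes: t1[X] = t2[X] implies t1[Y] = t2[Y].
    """
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


# Hypothetical sample rows (invented for illustration).
works_on = [
    {"SSN": "111", "ENAME": "Ann", "PNUMBER": 1, "HOURS": 10},
    {"SSN": "111", "ENAME": "Ann", "PNUMBER": 2, "HOURS": 20},
    {"SSN": "222", "ENAME": "Bob", "PNUMBER": 1, "HOURS": 5},
]

ssn_to_name = fd_holds(works_on, ["SSN"], ["ENAME"])
ssn_to_hours = fd_holds(works_on, ["SSN"], ["HOURS"])
pair_to_hours = fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"])
```

A check like this can only refute an FD. An FD is a constraint on every instance r(R), so passing on one instance does not establish it.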
Examples of FD constraints
• Social Security Number determines employee name
SSN → ENAME
• Project Number determines project name and
location
PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number determines the
hours per week that the employee works on the
project
{SSN, PNUMBER} → HOURS
Functional Dependencies and Keys
• An FD is a property of the attributes in the
schema R
• The constraint must hold on every relation
instance r(R)
• If K is a key of R, then K functionally
determines all attributes in R (since we never
have two distinct tuples with t1[K]=t2[K])
Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
• Armstrong's inference rules
A1. (Reflexive) If Y subset-of X, then X → Y
A2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X U Z)
A3. (Transitive) If X → Y and Y → Z, then X → Z
• A1, A2, A3 form a sound and complete set of
inference rules
Additional Useful Inference Rules
• Decomposition
– If X → YZ, then X → Y and X → Z
• Union
– If X → Y and X → Z, then X → YZ
• Pseudotransitivity
– If X → Y and WY → Z, then WX → Z
• Closure of a set F of FDs is the set F+ of all FDs that
can be inferred from F
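Enumerating all of F+ is rarely practical. A common shortcut is to compute the closure X+ of an attribute set X and use the fact that X → Y is in F+ exactly when Y is a subset of X+. The sketch below assumes FDs are written as (left, right) pairs of Python sets; the names closure and implies are illustrative, and the FD set is the one from the examples of FD constraints.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

key_closure = closure({"SSN", "PNUMBER"}, F)

# Pseudotransitivity on abstract attributes: X -> Y and WY -> Z give WX -> Z.
G = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]
pseudo = implies(G, {"W", "X"}, {"Z"})
```

Here key_closure contains all six attributes, which is another way of saying that {SSN, PNUMBER} functionally determines every attribute mentioned in F.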
Introduction to Normalization
• Normalization: Process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
• Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form
– 2NF, 3NF, BCNF based on keys and FDs of a relation schema
– 4NF based on keys, multi-valued dependencies
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column



Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever



First Normal Form
• To move to First Normal Form a relation must
contain only atomic values at each row and
column.
– No repeating groups
– A column or set of columns is called a Candidate
Key when its values can uniquely identify the row
in the relation.
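Removing repeating groups amounts to emitting one row per group member and copying the single-valued columns into each row. The sketch below assumes an unnormalized record is held as a dictionary whose repeating columns are parallel lists; the function name and the column names are illustrative, and the values come from patient 1111 in the hospital example.

```python
def to_first_normal_form(record, repeating):
    """Expand parallel lists in the `repeating` columns into atomic rows."""
    lengths = {len(record[col]) for col in repeating}
    if len(lengths) != 1:
        raise ValueError("repeating columns must have the same length")
    rows = []
    for i in range(lengths.pop()):
        # Copy the single-valued columns, then pick the i-th group member.
        row = {k: v for k, v in record.items() if k not in repeating}
        for col in repeating:
            row[col] = record[col][i]
        rows.append(row)
    return rows


patient = {
    "patient_no": 1111,
    "patient_name": "John White",
    "surgeon_no": [145, 311],
    "surgery_date": ["01-Jan-95", "12-Jun-95"],
    "surgery": ["Gallstones removal", "Kidney stones removal"],
}

rows = to_first_normal_form(
    patient, ["surgeon_no", "surgery_date", "surgery"]
)

# A candidate key must identify each row uniquely.
key_values = {(r["patient_no"], r["surgeon_no"], r["surgery_date"]) for r in rows}
key_is_unique = len(key_values) == len(rows)
```

Each resulting row holds only atomic values, at the cost of repeating the patient's name in every row.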



First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none



1NF Storage Anomalies
• Insertion: A new patient who has not yet undergone surgery
has no surgeon #. Since surgeon # is part of the key, the
patient cannot be inserted.
• Insertion: If a surgeon is newly hired and has not operated
yet, there is no way to include that person in the
database.
• Update: If a patient comes in for a new procedure and has
moved, we need to change multiple address entries.
• Deletion (type 1): Deleting a patient record may also delete
all information about a surgeon.
• Deletion (type 2): When there are functional dependencies
(such as side effects depending on drug), changing one item
eliminates other information.



Second Normal Form
• A relation is said to be in Second Normal
Form when every non-key attribute is fully
functionally dependent on the primary key.
– That is, every non-key attribute needs the full
primary key for unique identification
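A partial dependency exists when some proper subset of the primary key already determines a non-key attribute. The sketch below looks for such attributes using attribute closure. The function names are illustrative, and both the key (patient #, surgeon #, surgery date) and the FD set are an assumed reading of the hospital example rather than something the slides state outright.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partially_dependent(key, attributes, fds):
    """Non-key attributes determined by a proper subset of the key."""
    key = set(key)
    non_key = set(attributes) - key
    offenders = set()
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            offenders |= closure(subset, fds) & non_key
    return offenders


# Assumed key and FDs for the 1NF hospital relation.
KEY = {"patient_no", "surgeon_no", "surgery_date"}
ATTRS = KEY | {
    "patient_name", "patient_addr", "surgeon_name",
    "surgery", "drug", "side_effects",
}
FDS = [
    ({"patient_no"}, {"patient_name", "patient_addr"}),
    ({"surgeon_no"}, {"surgeon_name"}),
    (KEY, {"surgery", "drug", "side_effects"}),
]

offenders = partially_dependent(KEY, ATTRS, FDS)
```

Under these assumptions the offending attributes are the patient's name and address and the surgeon's name, which are the columns a 2NF decomposition would move into tables of their own.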



Why is this not in 2NF?
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none



Second Normal Form
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY



Second Normal Form
Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold



Second Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever



1NF Storage Anomalies Removed

• Insertion: Can now enter new patients without
surgery.
• Insertion: Can now enter Surgeons who have not
operated.
• Deletion (type 1): If Charles Brown dies, the
corresponding tuples from Patient and Surgery tables
can be deleted without losing information on David
Rosen.
• Update: If John White comes in for third time, and
has moved, we only need to change the Patient table



2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular drug
has a particular side effect unless it is given to a
patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug and
side effect are entered, we lose the information that
penicillin can cause a rash
• Update: If drug side effects change (a new formula)
we have to update multiple occurrences of side
effects.



Third Normal Form
• A relation is said to be in Third Normal Form if it is in
Second Normal Form and there is no transitive functional
dependency between non-key attributes
– When one non-key attribute can be determined by one
or more other non-key attributes, there is said to be a
transitive functional dependency.
• The side effect column in the Surgery table is
determined by the drug administered
– Side effect depends on the key only transitively, through
drug, so Surgery is not in 3NF
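Removing the transitive dependency means projecting the drug and side-effect columns into their own relation and keeping only the drug in Surgery. The sketch below does that on four rows taken from the Surgery table of the hospital example and then joins the projections back together. The helper project and the column names are illustrative.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples (order kept)."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out


surgery = [
    {"patient_no": 1111, "surgeon_no": 145, "date": "01-Jan-95",
     "drug": "Penicillin", "side_effect": "rash"},
    {"patient_no": 1234, "surgeon_no": 243, "date": "05-Apr-94",
     "drug": "Tetracycline", "side_effect": "Fever"},
    {"patient_no": 6845, "surgeon_no": 243, "date": "05-Apr-94",
     "drug": "Tetracycline", "side_effect": "Fever"},
    {"patient_no": 2345, "surgeon_no": 189, "date": "08-Jan-96",
     "drug": "Cephalosporin", "side_effect": "none"},
]

# Surgery keeps the drug; the drug -> side effect fact moves to Drug.
surgery_3nf = project(surgery, ["patient_no", "surgeon_no", "date", "drug"])
drug = project(surgery, ["drug", "side_effect"])

# Joining on drug restores the original rows, so nothing was lost.
side_effect_of = {d["drug"]: d["side_effect"] for d in drug}
rejoined = [dict(r, side_effect=side_effect_of[r["drug"]]) for r in surgery_3nf]
```

The Tetracycline/Fever pairing is stored once in drug even though two surgeries used it, which is what removes the update anomaly.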



Why is this not in 3NF?
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever



Third Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline



Third Normal Form

Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever



2NF Storage Anomalies Removed

• Insertion: We can now enter the fact that a
particular drug has a particular side effect in
the Drug relation.
• Deletion: If John White receives some other
drug as a result of the rash from penicillin, the
information on penicillin and rash is
maintained.
• Update: The side effects for each drug appear
only once.



Boyce-Codd Normal Form
• A relation is in BCNF when every determinant is a
candidate key.
• Most 3NF relations are also BCNF relations.
• A 3NF relation can fail to be in BCNF only if:
– Candidate keys in the relation are composite keys
(they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some attributes
in the keys are common



07/31/24
Boyce Codd Normal Form (BCNF)
Example 2 - Movie (Not in BCNF)
Scheme → {MovieTitle, MovieID, PersonName, Role, Payment}
1. Key1 → {MovieTitle, PersonName}
2. Key2 → {MovieID, PersonName}
3. Both role and payment functionally depend on both candidate keys thus 3NF
4. {MovieID} → {MovieTitle}
5. The dependency between MovieID and MovieTitle violates BCNF
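The BCNF condition can be tested by checking, for each non-trivial FD, whether its left side is a superkey, that is, whether its closure covers the whole scheme. The sketch below applies this to the Movie example. The function names are illustrative, and the FD set is an assumed reading of the example: Key2 determines Role and Payment, and MovieID determines MovieTitle.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the FDs whose determinant is not a superkey of the scheme."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= attributes:
            violations.append((lhs, rhs))
    return violations


MOVIE = {"MovieTitle", "MovieID", "PersonName", "Role", "Payment"}
MOVIE_FDS = [
    ({"MovieID", "PersonName"}, {"Role", "Payment"}),  # assumed from Key2
    ({"MovieID"}, {"MovieTitle"}),
]

violations = bcnf_violations(MOVIE, MOVIE_FDS)

# A relation holding only MovieID and MovieTitle has no violation.
title_violations = bcnf_violations(
    {"MovieID", "MovieTitle"}, [({"MovieID"}, {"MovieTitle"})]
)
```

Checking only the listed FDs is enough for the original scheme. For a relation produced by decomposition, the FDs that apply to its attributes have to be supplied explicitly, as in the second call.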

Example 3 - Consulting (Not in BCNF)
Scheme → {Client, Problem, Consultant}
1. Key1 → {Client, Problem}
2. Key2 → {Client, Consultant}
3. No non-key attribute hence 3NF
4. {Client, Problem} → {Consultant}
5. {Client, Consultant} → {Problem}
6. A dependency between attributes belonging to the keys whose determinant is not
itself a candidate key (for example {Consultant} → {Problem}, if each consultant
handles one problem) violates BCNF
BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme → {MovieTitle, MovieID, PersonName, Role, Payment}
First attempt:
New Scheme → {MovieID, PersonName, Role, Payment}
New Scheme → {MovieTitle, PersonName}
• Loses the dependency {MovieID} → {MovieTitle}
Second attempt:
New Scheme → {MovieID, PersonName, Role, Payment}
New Scheme → {MovieID, MovieTitle}
• Keeps the {MovieID} → {MovieTitle} dependency
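The two decompositions of the Movie scheme can also be compared with the standard test for a two-way split: it is lossless when the attributes the two schemes share functionally determine all of one of them. The sketch below applies that test. The function names are illustrative and the FD set is the same assumed reading of the Movie example (Key2 determines Role and Payment, MovieID determines MovieTitle).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 & R2) must determine R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


MOVIE_FDS = [
    ({"MovieID", "PersonName"}, {"Role", "Payment"}),  # assumed from Key2
    ({"MovieID"}, {"MovieTitle"}),
]

cast = {"MovieID", "PersonName", "Role", "Payment"}

# The schemes share only PersonName, which determines nothing else.
first_attempt = is_lossless(cast, {"MovieTitle", "PersonName"}, MOVIE_FDS)

# The schemes share MovieID, which determines MovieTitle.
second_attempt = is_lossless(cast, {"MovieID", "MovieTitle"}, MOVIE_FDS)
```

Under these assumptions the split on PersonName fails the test and the split on MovieID passes it, so the first attempt also cannot be joined back reliably, in addition to losing the dependency.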

Example 3 (Convert to BCNF)
Old Scheme → {Client, Problem, Consultant}
New Scheme → {Client, Consultant}
New Scheme → {Client, Problem}
Multivalued Dependency
• Multivalued dependency occurs in the
situation where there are multiple
independent multivalued attributes in a single
table.
• A multivalued dependency is a complete
constraint between two sets of attributes in a
relation.
• It requires that certain tuples be present in a
relation.
• This dependence can be represented like this:
– car_model →→ maf_year
– car_model →→ colour

Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one relationships
between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.

Example (Not in 4NF)
Scheme → {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie
4. Violates 4NF

Movie | ScreeningCity | Genre
Hard Code | Los Angeles | Comedy
Hard Code | New York | Comedy
Bill Durham | Santa Cruz | Drama
Bill Durham | Durham | Drama
The Code Warrier | New York | Horror


4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entities.

Example 1 (Convert to 4NF)
Old Scheme → {MovieName, ScreeningCity, Genre}
New Scheme → {MovieName, ScreeningCity}
New Scheme → {MovieName, Genre}

Movie | Genre
Hard Code | Comedy
Bill Durham | Drama
The Code Warrier | Horror

Movie | ScreeningCity
Hard Code | Los Angeles
Hard Code | New York
Bill Durham | Santa Cruz
Bill Durham | Durham
The Code Warrier | New York


Fourth Normal Form (4NF)
Example 2 (Not in 4NF)
Scheme → {Manager, Child, Employee}
1. Primary Key → {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF Violated

Manager | Child | Employee
Jim | Beth | Alice
Mary | Bob | Jane
Mary | NULL | Adam

Example 3 (Not in 4NF)
Scheme → {Employee, Skill, ForeignLanguage}
1. Primary Key → {Employee, Skill, Language}
2. Each employee can speak multiple languages
3. Each employee can have multiple skills
4. Thus violates 4NF

Employee | Skill | Language
1234 | Cooking | French
1234 | Cooking | German
1453 | Carpentry | Spanish
1453 | Cooking | Spanish
2345 | Cooking | Spanish
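A multivalued dependency X →→ Y requires that, for each X value, every Y value seen with it appears combined with every value of the remaining attributes. The sketch below checks this on one instance, using the Employee/Skill/Language rows from Example 3. The function name mvd_holds is illustrative, and the extra row added at the end is hypothetical.

```python
def mvd_holds(rows, lhs, rhs, attributes):
    """Check X ->-> Y on one instance by counting combinations per X value."""
    rest = [a for a in attributes if a not in lhs and a not in rhs]
    groups = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        z = tuple(row[a] for a in rest)
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    # Every Y value must be paired with every Z value inside each group.
    return all(len(pairs) == len(ys) * len(zs)
               for ys, zs, pairs in groups.values())


ATTRS = ["Employee", "Skill", "Language"]
rows = [
    {"Employee": 1234, "Skill": "Cooking", "Language": "French"},
    {"Employee": 1234, "Skill": "Cooking", "Language": "German"},
    {"Employee": 1453, "Skill": "Carpentry", "Language": "Spanish"},
    {"Employee": 1453, "Skill": "Cooking", "Language": "Spanish"},
    {"Employee": 2345, "Skill": "Cooking", "Language": "Spanish"},
]

holds = mvd_holds(rows, ["Employee"], ["Skill"], ATTRS)

# Hypothetical extra row: 1453 now speaks French, but only the Cooking
# combination is recorded, so the Carpentry/French tuple is missing.
broken = rows + [{"Employee": 1453, "Skill": "Cooking", "Language": "French"}]
still_holds = mvd_holds(broken, ["Employee"], ["Skill"], ATTRS)
```

The second check shows the sense in which an MVD requires certain tuples to be present: adding one fact obliges the relation to store every combination it implies.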
4NF - Decomposition
Example 2 (Convert to 4NF)
Old Scheme → {Manager, Child, Employee}
New Scheme → {Manager, Child}
New Scheme → {Manager, Employee}

Manager | Child
Jim | Beth
Mary | Bob

Manager | Employee
Jim | Alice
Mary | Jane
Mary | Adam

Example 3 (Convert to 4NF)
Old Scheme → {Employee, Skill, ForeignLanguage}
New Scheme → {Employee, Skill}
New Scheme → {Employee, ForeignLanguage}

Employee | Skill
1234 | Cooking
1453 | Carpentry
1453 | Cooking
2345 | Cooking

Employee | Language
1234 | French
1234 | German
1453 | Spanish
2345 | Spanish
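One way to see that the Example 3 decomposition keeps all the information is to project the original rows onto the two new schemes and join the projections back on Employee. The sketch below does this with plain Python sets. The helper names project and natural_join are illustrative, and the rows are those of the Employee/Skill/Language table.

```python
def project(rows, columns):
    """Project a list of dict rows onto columns as a set of tuples."""
    return {tuple(row[c] for c in columns) for row in rows}


def natural_join(left, left_cols, right, right_cols):
    """Natural join of two tuple sets on the columns they share."""
    common = [c for c in left_cols if c in right_cols]
    out_cols = list(left_cols) + [c for c in right_cols if c not in common]
    result = set()
    for l in left:
        lrow = dict(zip(left_cols, l))
        for r in right:
            rrow = dict(zip(right_cols, r))
            if all(lrow[c] == rrow[c] for c in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[c] for c in out_cols))
    return out_cols, result


rows = [
    {"Employee": 1234, "Skill": "Cooking", "Language": "French"},
    {"Employee": 1234, "Skill": "Cooking", "Language": "German"},
    {"Employee": 1453, "Skill": "Carpentry", "Language": "Spanish"},
    {"Employee": 1453, "Skill": "Cooking", "Language": "Spanish"},
    {"Employee": 2345, "Skill": "Cooking", "Language": "Spanish"},
]

skills = project(rows, ["Employee", "Skill"])
languages = project(rows, ["Employee", "Language"])

columns, rejoined = natural_join(
    skills, ["Employee", "Skill"], languages, ["Employee", "Language"]
)
original = project(rows, ["Employee", "Skill", "Language"])
```

For these rows the join returns exactly the five original tuples, while each skill and each language is now stored once per employee.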


Lossy Decomposition
• A decomposition is lossy when a relation is
decomposed into two or more relational
schemas and the original relation cannot be
recovered by joining them: the join returns
extra (spurious) tuples, so the information
about which tuples were really present is lost.
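A small instance makes the effect of a lossy decomposition visible. The sketch below splits a three-column movie relation on PersonName, then on MovieID, and joins each pair of projections back together. The rows are hypothetical: the movie titles are borrowed from the 4NF example and the person name is invented.

```python
# Hypothetical rows: (MovieID, MovieTitle, PersonName).
rows = {
    (1, "Hard Code", "Ann"),
    (2, "Bill Durham", "Ann"),
}

# Split that shares only PersonName between the two projections.
cast = {(movie_id, person) for movie_id, _, person in rows}
titles_by_person = {(title, person) for _, title, person in rows}

rejoined = {
    (movie_id, title, person)
    for movie_id, person in cast
    for title, other_person in titles_by_person
    if person == other_person
}
spurious = rejoined - rows  # tuples that were never in the original

# Split that shares MovieID, which determines MovieTitle.
titles_by_id = {(movie_id, title) for movie_id, title, _ in rows}
rejoined_on_id = {
    (movie_id, title, person)
    for movie_id, person in cast
    for other_id, title in titles_by_id
    if movie_id == other_id
}
```

Joining on PersonName yields four tuples instead of two, so it is no longer possible to tell which movie ID goes with which title. Joining on MovieID returns the original relation.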

Fifth Normal Form
• A relation is in 5NF if every join dependency in
the relation is implied by the keys of the
relation
• Implies that relations that have been
decomposed in previous normal forms can be
recombined via natural joins to recreate the
original relation.



Effectiveness and Efficiency Issues for DBMS

• Focus on the relational model
• Any column in a relational database can be
searched for values.
• To improve efficiency, indexes built on storage
structures such as B-trees and hashing are
used
• But many useful functions are not indexable
and require complete scans of the
database
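The difference between an indexed lookup and a complete scan can be illustrated with an in-memory hash index. This is only a toy sketch, since real DBMS index structures are far more elaborate. The function names are illustrative and the rows come from the Patient table of the hospital example.

```python
from collections import defaultdict


def build_hash_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def scan(rows, predicate):
    """Complete scan: test every row against an arbitrary predicate."""
    return [row for row in rows if predicate(row)]


patients = [
    {"patient_no": 1111, "name": "John White", "addr": "15 New St. New York, NY"},
    {"patient_no": 1234, "name": "Mary Jones", "addr": "10 Main St. Rye, NY"},
    {"patient_no": 4876, "name": "Hal Kane", "addr": "55 Boston Post Road, Chester, CN"},
    {"patient_no": 6845, "name": "Ann Hood", "addr": "Hilton Road Larchmont, NY"},
]

# Equality search on an indexed column: one dictionary lookup.
by_number = build_hash_index(patients, "patient_no")
positions = by_number.get(1234, [])

# Substring search inside the address: the hash index cannot help,
# so every row has to be examined.
on_a_road = scan(patients, lambda row: "Road" in row["addr"])
```

The equality search touches one index entry regardless of table size, while the substring search costs time proportional to the number of rows.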
