Normalization 2011
GIS Applications
Spring 2011

Objectives
Minimize unnecessary redundancies
Support general purpose query processing
Minimize unwanted side effects of database updates
Inadvertent deletions or insertion errors

The fourth and fifth normal forms are relatively rare
Normal forms are guidelines, not strict rules. A design can deviate from them to meet practical requirements.
Normal forms are based on the analysis of functional dependencies among attributes.

Saying that column Y is functionally dependent upon column X is the same as saying that the values of column X identify the values of column Y.
Such a functional dependency is denoted as X → Y
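
The definition of X → Y can be tested against sample data. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name `holds_fd` and the column names are illustrative and not part of the slides. A sample can only refute a dependency; whether it holds in general is a design decision.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # setdefault stores y the first time x appears and returns the stored value afterwards
        if seen.setdefault(x, y) != y:
            return False
    return True


# Sample rows with illustrative column names.
ROWS = [
    {"student": 38214, "course": "IS 350", "title": "Database"},
    {"student": 38214, "course": "IS 465", "title": "Sys Anal"},
    {"student": 69173, "course": "IS 465", "title": "Sys Anal"},
]

print(holds_fd(ROWS, ["course"], ["title"]))    # True: course -> title
print(holds_fd(ROWS, ["student"], ["course"]))  # False: student 38214 appears with two courses
```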

The issue of keys
The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key and nothing but the key.

Candidate and Primary Keys
Superkey: a set of one or more attributes that uniquely identifies a specific instance of an entity
Candidate key: any subset of the attributes of a superkey that is also a superkey and not reducible to another superkey
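
The two definitions can be made concrete with attribute closure: a set of attributes is a superkey when its closure under the functional dependencies covers every attribute, and a candidate key is a superkey that contains no smaller superkey. This is a sketch under assumptions: the attribute names and the dependency list `FDS` are an illustrative encoding of a student/course relation, and the brute-force search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Superkeys that contain no smaller superkey, found smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superkey, but reducible to a key already found
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Assumed dependencies for an illustrative student/course relation.
ATTRIBUTES = {"student", "course", "title", "instructor", "location", "grade"}
FDS = [
    ({"student", "course"}, {"grade"}),
    ({"course"}, {"title", "instructor"}),
    ({"instructor"}, {"location"}),
]

print([sorted(key) for key in candidate_keys(ATTRIBUTES, FDS)])  # [['course', 'student']]
```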

Full Functional Dependence
Applies to tables with composite keys
Column Y in relational table R is fully functionally dependent on X of R if it is functionally dependent on X and not functionally dependent upon any proper subset of X.
Full functional dependence means that when a primary key is composite, then the other columns must be identified by the entire key and not just some of the columns that make up the key.

Foreign Keys
A foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table.
The columns in the referencing table must refer to the primary key or another candidate key in the referenced table.
A foreign key completes a relationship by identifying the parent entity.
Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity.
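
Referential integrity can be illustrated as a check that every foreign key value in the referencing table matches a key value in the referenced table. The sketch below is not how a database engine enforces the constraint; the function name `orphaned_rows`, the table contents, and the course number "XX 999" are hypothetical.

```python
def orphaned_rows(referencing, fk_cols, referenced, key_cols):
    """Rows of the referencing table whose foreign key has no match in the referenced table."""
    valid_keys = {tuple(row[c] for c in key_cols) for row in referenced}
    return [row for row in referencing
            if tuple(row[c] for c in fk_cols) not in valid_keys]


# Referenced (parent) table: course is its primary key.
COURSES = [
    {"course": "IS 350", "title": "Database"},
    {"course": "IS 465", "title": "Sys Anal"},
]

# Referencing (child) table: course is a foreign key into COURSES.
ENROLLMENTS = [
    {"student": 38214, "course": "IS 350", "grade": "A"},
    {"student": 69173, "course": "IS 465", "grade": "A"},
    {"student": 69173, "course": "XX 999", "grade": "B"},  # hypothetical dangling reference
]

for row in orphaned_rows(ENROLLMENTS, ["course"], COURSES, ["course"]):
    print("violates referential integrity:", row)
```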

Un-Normalized Relations
[Figure: Student entity with attributes Student#, Name, Major and Course#, with relationships labeled one-to-one and one-to-many]

Normalized relations: First Normal Form
A relation is in first normal form if the underlying domains contain only atomic values
There are no repeating groups within a tuple
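
As a rough illustration of removing a repeating group, the sketch below flattens records that hold a list of courses into rows with one atomic course value each. The record layout, the function name `to_first_normal_form`, and the student names and majors are hypothetical.

```python
# Un-normalized records: "courses" is a repeating group inside each record.
UNNORMALIZED = [
    {"student": 38214, "name": "Student A", "major": "IS", "courses": ["IS 350", "IS 465"]},
    {"student": 69173, "name": "Student B", "major": "IS", "courses": ["IS 465", "PM 300"]},
]


def to_first_normal_form(records):
    """Emit one row per (student, course) pair so every column holds a single atomic value."""
    rows = []
    for rec in records:
        for course in rec["courses"]:
            rows.append({
                "student": rec["student"],
                "name": rec["name"],
                "major": rec["major"],
                "course": course,
            })
    return rows


for row in to_first_normal_form(UNNORMALIZED):
    print(row)
```

The flattened rows repeat the name and major for every course, which is the redundancy the later normal forms address.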

Insert anomaly
Insertion of a new course cannot occur until a student has registered for the course, since Student # is part of the composite key

Update anomaly
Changing a course title or course number requires searching all tuples to find every occurrence of a course number or title

Student #   Course #   Course Title   Instructor Name   Instructor Location   Grade
38214       IS 350     Database       Codd              B104                  A
38214       IS 465     Sys Anal       Kemp              B213                  C
69173       IS 465     Sys Anal       Kemp              B213                  A
69173       PM 300     Op Res         Lewis             D317                  B
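
Both anomalies can be reproduced on the table above. This is a small sketch, assuming the relation is held as a list of tuples keyed on (student, course); the helper names `insert_row` and `rename_course` are illustrative.

```python
# (student, course, title, instructor, location, grade); key is (student, course)
ENROLLMENT = [
    (38214, "IS 350", "Database", "Codd", "B104", "A"),
    (38214, "IS 465", "Sys Anal", "Kemp", "B213", "C"),
    (69173, "IS 465", "Sys Anal", "Kemp", "B213", "A"),
    (69173, "PM 300", "Op Res", "Lewis", "D317", "B"),
]


def insert_row(rows, row):
    """Return a new relation with row added; reject rows with an incomplete or duplicate key."""
    student, course = row[0], row[1]
    if student is None or course is None:
        raise ValueError("every part of the composite key (student, course) needs a value")
    if any(r[0] == student and r[1] == course for r in rows):
        raise ValueError("duplicate key")
    return rows + [row]


def rename_course(rows, course, new_title):
    """Return the updated relation and the number of tuples that had to change."""
    updated = [(s, c, new_title if c == course else t, i, loc, g)
               for (s, c, t, i, loc, g) in rows]
    touched = sum(1 for r in rows if r[1] == course)
    return updated, touched


# Insert anomaly: a course with no registered student has no value for student.
try:
    insert_row(ENROLLMENT, (None, "QM 440", "Op Res", "Kemp", "B213", None))
except ValueError as err:
    print("insert rejected:", err)

# Update anomaly: one logical change touches every tuple that mentions the course.
_, touched = rename_course(ENROLLMENT, "IS 465", "Systems Analysis")
print("tuples changed:", touched)  # 2
```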

Second Normal Form
A relation is in second normal form if it is in 1NF and every non-key attribute is fully dependent on the primary key

Student #   Course #   Course Title   Instructor Name   Instructor Location   Grade
38214       IS 350     Database       Codd              B104                  A
38214       IS 465     Sys Anal       Kemp              B213                  C
69173       IS 465     Sys Anal       Kemp              B213                  A
69173       PM 300     Op Res         Lewis             D317                  B

Course Title, Instructor Name and Instructor Location are partially dependent on the primary key (only on Course#)
[Figure: Course#, Course Title, Instructor Name, Instructor Location]

To convert from first to second normal form, remove partial dependencies
Create 2 new relations, one with the attributes fully dependent on the primary key, the other with the attributes that were only partially dependent

Student #   Course #   Grade
38214       IS 350     A
38214       IS 465     C
69173       IS 465     B
69173       PM 300     C

Course #   Course Title   Instructor Name   Instructor Location
IS 350     Database       Codd              B104
IS 465     Sys Anal       Kemp              B213
PM 300     Prod man       Lewis             D317
QM 440     Op Res         Kemp              B213

Courses are independent of Student # and so can be inserted or deleted independently; only a single tuple needs to be updated in the course relation
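
The conversion to second normal form amounts to two projections of the first normal form relation. The sketch below, with illustrative column names and a hypothetical helper `project`, builds both projections from the four-row student/course table and confirms that joining them on the course number recovers the original rows.

```python
COLUMNS = ("student", "course", "title", "instructor", "location", "grade")
FIRST_NF = [
    (38214, "IS 350", "Database", "Codd", "B104", "A"),
    (38214, "IS 465", "Sys Anal", "Kemp", "B213", "C"),
    (69173, "IS 465", "Sys Anal", "Kemp", "B213", "A"),
    (69173, "PM 300", "Op Res", "Lewis", "D317", "B"),
]


def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate tuples."""
    idx = [columns.index(c) for c in keep]
    out = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in out:
            out.append(t)
    return out


# Attributes fully dependent on the whole key (student, course).
enrollment = project(FIRST_NF, COLUMNS, ("student", "course", "grade"))
# Attributes that depended only on part of the key (course).
course = project(FIRST_NF, COLUMNS, ("course", "title", "instructor", "location"))

# Joining the two relations on course gives back the original tuples.
course_by_id = {c[0]: c[1:] for c in course}
rejoined = [(s, c) + course_by_id[c] + (g,) for (s, c, g) in enrollment]

print(len(enrollment), len(course))  # 4 3
print(rejoined == FIRST_NF)          # True
```

Each course now appears once in `course`, so a title change touches a single tuple.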

Update anomaly

Course #   Course Title   Instructor Name   Instructor Location
IS 350     Database       Codd              B104
IS 465     Sys Anal       Kemp              B213
PM 300     Prod man       Lewis             D317
QM 440     Op Res         Kemp              B213

To update instructor information the entire relation must be searched, since instructor information occurs more than once.

Third Normal Form
A relation is in third normal form if it is in 2NF and contains no transitive dependencies
Every non-key attribute is fully dependent on the primary key and there are no transitive dependencies
Non-key attributes that participate in the transitive dependency form a new relation

Instructor Name   Instructor Location
Codd              B104
Kemp              B213
Lewis             D317

Course #   Course Title   Instructor Name
IS 350     Database       Codd
IS 465     Sys Anal       Kemp
PM 300     Prod Mang      Lewis
QM 440     Op Res         Kemp

Foreign key: a non-key attribute in one relation that serves as a primary key in another relation
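
A transitive dependency can be located mechanically once the functional dependencies are written down. This is a sketch under assumptions: the course relation is taken to have the single key {course}, the dependency list `FDS` is an illustrative encoding, and the helper names are hypothetical. It flags dependencies whose left side is not a superkey, then splits the relation on the first one found.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(attributes, key, fds):
    """Dependencies X -> Y where X is not a superkey and Y contains non-key attributes."""
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        nonkey_rhs = rhs - key - lhs
        if not is_superkey and nonkey_rhs:
            found.append((lhs, nonkey_rhs))
    return found


def split_on(attributes, violation):
    """Move the dependent attributes into a new relation keyed on the determinant."""
    lhs, rhs = violation
    return attributes - rhs, lhs | rhs


ATTRIBUTES = {"course", "title", "instructor", "location"}
KEY = {"course"}
FDS = [
    ({"course"}, {"title", "instructor"}),
    ({"instructor"}, {"location"}),  # course -> instructor -> location is transitive
]

violations = transitive_violations(ATTRIBUTES, KEY, FDS)
remaining, new_relation = split_on(ATTRIBUTES, violations[0])
print(sorted(remaining))     # ['course', 'instructor', 'title']
print(sorted(new_relation))  # ['instructor', 'location']
```

The instructor attribute stays in the course relation, where it acts as the foreign key into the new instructor relation.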

Each advisor advises only one major

Student #   Major     Advisor
123         Physics   Einstein
123         Music     Mozart
456         Biol      Darwin
789         Physics   Bohr
999         Physics   Einstein

There are 2 possible candidate keys: Student #-Major or Student #-Advisor, and they are overlapping.
Attributes that are part of a candidate key are dependent on part of another candidate key.

Need to project into 2 new relations

Student #   Advisor
123         Einstein
123         Mozart
456         Darwin
789         Bohr
999         Einstein

Advisor    Major
Einstein   Physics
Mozart     Music
Darwin     Biol
Bohr       Physics
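
Overlapping candidate keys with a dependency such as Advisor → Major are the situation usually treated under Boyce-Codd normal form. The sketch below, with illustrative variable names, projects the advising table into the two relations and checks that joining them on advisor loses no information.

```python
# (student, major, advisor)
ADVISING = [
    (123, "Physics", "Einstein"),
    (123, "Music", "Mozart"),
    (456, "Biol", "Darwin"),
    (789, "Physics", "Bohr"),
    (999, "Physics", "Einstein"),
]

# The two projections; set comprehensions remove duplicate tuples.
student_advisor = sorted({(s, a) for (s, m, a) in ADVISING})
advisor_major = sorted({(a, m) for (s, m, a) in ADVISING})

# Advisor -> Major holds, so each advisor maps to exactly one major.
major_of = dict(advisor_major)
assert len(major_of) == len(advisor_major)

# Natural join on advisor rebuilds the original relation.
rejoined = sorted((s, major_of[a], a) for (s, a) in student_advisor)

print(len(student_advisor), len(advisor_major))  # 5 4
print(rejoined == sorted(ADVISING))              # True
```

Einstein's major is now stored once instead of once per advised student.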

Fourth Normal Form
Removes multi-valued dependencies
Multi-valued dependency: when 3 attributes (A, B, C) exist in a relation and for each value of A there is a well-defined set of values for B and a well-defined set of values for C, yet B and C are independent of each other

Car        Color    Doors
Outback    Gray     2
Outback    Gray     4
Outback    Navy     4
Outback    Navy     2
Forester   Silver   2
Forester   Silver   4

Several redundancies exist in the relation
Can generate deletion and update anomalies
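
Because color and doors are independent of each other for a given car, the relation can be split into a car/color relation and a car/doors relation without losing information. The following is a sketch with illustrative variable names that builds both projections and confirms that pairing every color with every door count per car reproduces the six original rows.

```python
# (car, color, doors)
CARS = [
    ("Outback", "Gray", 2),
    ("Outback", "Gray", 4),
    ("Outback", "Navy", 4),
    ("Outback", "Navy", 2),
    ("Forester", "Silver", 2),
    ("Forester", "Silver", 4),
]

# One relation per multi-valued dependency: car ->> color and car ->> doors.
car_color = sorted({(car, color) for car, color, _ in CARS})
car_doors = sorted({(car, doors) for car, _, doors in CARS})

# Joining on car pairs every color of a car with every door count of that car.
rejoined = sorted(
    (car, color, doors)
    for car, color in car_color
    for other_car, doors in car_doors
    if car == other_car
)

print(len(car_color), len(car_doors))  # 3 4
print(rejoined == sorted(CARS))        # True
```

Adding a new color for the Outback becomes one new tuple in `car_color` instead of one tuple per door count.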

Limits of Normalization
There can be a number of cases where there is a compelling need for non-first-normal-form structures.
Spatial data objects are one of them
The object-relational model supports the ability to implement non-first-normal-form structures
arrays
nested tables

References
1. E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.
2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial on the relational model and normalization.
3. E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems (Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first formal treatment of second and third normal forms.
4. C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981.