
Normalization 2011

Normalization is a systematic approach to organizing data in a database to minimize redundancy and inconsistencies. It involves decomposing tables to satisfy certain normal forms, including first normal form, which requires that each attribute contain a single value and that there be no repeating groups. The goal of normalization is to ensure that every non-key attribute is fully dependent on the primary key. This helps reduce data anomalies upon updates or deletions.

Uploaded by

vivek

Normalization
GIS Applications
Spring 2011

Normalization

• Normalization is a foundation for relational database design
• Systematic approach to efficiently organize data in a database

Objectives
• Minimize unnecessary redundancies
• Support general purpose query processing
• Minimize unwanted side effects of database updates (inadvertent deletions or insertion errors)
Normalization

• Normalization theory is based on the concept of normal forms.
• A relational table is in a particular normal form if it satisfies a certain set of constraints.
• The first three normal forms, defined by E. F. Codd, are typically required of all tables in a relational database.
• The fourth and fifth normal forms are relatively rare.
• Normal forms are guidelines, not strict rules. One can deviate from them to meet practical requirements.
• Based on the analysis of functional dependencies among attributes.

Functional Dependence

• The concept of functional dependence is the basis for the first four normal forms.
• Given a relation R, a column Y of R is said to be functionally dependent upon column X of R if and only if each value of X in R is associated with precisely one value of Y at any given time.
• Saying that column Y is functionally dependent upon X is the same as saying the values of column X identify the values of column Y.
• Such a functional dependency is denoted as X → Y.
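The definition of functional dependence can also be written formally. The following is a sketch, where the notation t[X] for "the value of tuple t on column(s) X" is an assumption introduced here and does not appear on the slides:

```latex
X \rightarrow Y \ \text{holds in}\ R
\iff
\forall\, t_1, t_2 \in R:\;
t_1[X] = t_2[X] \;\Rightarrow\; t_1[Y] = t_2[Y]
```

Read aloud: whenever two tuples agree on X, they must also agree on Y.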
The issue of keys

The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key and nothing but the key.

Candidate and Primary Keys

Superkey – a set of one or more attributes that uniquely identifies a specific instance of an entity

Candidate key – any subset of the attributes of a superkey that is also a superkey and not reducible to another superkey

Primary key – a selection from the set of candidate keys, used to index a relation

Dog registry: (dog_name, owner_name, address, phone)

Field test: (field#, year, crop)

Catalog Order: (catalog_item #, customer #, billing_address, shipping_address)
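The superkey and candidate key definitions above can be explored on sample data. The following is a minimal Python sketch, not a definitive method: the helper names and the sample rows for the "Field test" relation are hypothetical, and uniqueness in a sample only suggests a key, since a real key is a rule about all possible data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values on attrs (in this sample)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, columns):
    """Return the minimal attribute sets that are unique in the sample rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A set containing an already found (smaller) key is a superkey,
            # but it is reducible, so it is not a candidate key.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical sample data for the "Field test" relation.
field_tests = [
    {"field#": 1, "year": 2010, "crop": "corn"},
    {"field#": 1, "year": 2011, "crop": "wheat"},
    {"field#": 2, "year": 2010, "crop": "corn"},
    {"field#": 2, "year": 2011, "crop": "corn"},
]

keys = candidate_keys(field_tests, ["field#", "year", "crop"])
print(keys)
```

With this sample, no single column is unique, and the only minimal unique combination is (field#, year).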

Primary Keys

Every relation (entity) must have a primary key

To qualify as a primary key, an attribute must have the following properties:
• it must have a non-null value for each instance of the entity
• the value must be unique for each instance of an entity
• the values must not change or become null during the life of each entity instance

Composite Keys

Often more than one attribute is required to uniquely identify an entity. A primary key made up of more than one attribute is known as a composite key.

Student # | Student Name | Major
38214     | Bright       | IS
38214     | Bright       | EE
69173     | Smith        | PHY
Full Functional Dependence

• Applies to tables with composite keys
• Column Y in relational table R is fully functionally dependent on X of R if it is functionally dependent on X and not functionally dependent upon any proper subset of X.
• Full functional dependence means that when a primary key is composite, the other columns must be identified by the entire key and not just some of the columns that make up the key.

Foreign Keys

• A foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table.
• The columns in the referencing table must refer to the primary key or another candidate key of the referenced table.
• A foreign key completes a relationship by identifying the parent entity.
• Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity.
• Every relationship in the model must be supported by a foreign key.
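Referential integrity can be illustrated with a small check that every foreign-key value in the referencing table matches a key value in the referenced table. This is a minimal Python sketch; the function name, the column names and the sample rows are hypothetical and only serve the illustration.

```python
def dangling_references(referencing, fk, referenced, key):
    """Return the referencing rows whose foreign-key value has no match."""
    known = {tuple(row[c] for c in key) for row in referenced}
    return [row for row in referencing
            if tuple(row[c] for c in fk) not in known]

# Hypothetical data: enrollments reference students through "student_no".
students = [
    {"student_no": 38214, "name": "Bright"},
    {"student_no": 69173, "name": "Smith"},
]
enrollments = [
    {"student_no": 38214, "course_no": "IS 350"},
    {"student_no": 69173, "course_no": "PM 300"},
    {"student_no": 55555, "course_no": "IS 465"},  # no such student
]

orphans = dangling_references(enrollments, ["student_no"],
                              students, ["student_no"])
print(orphans)
```

A database system that enforces foreign keys would reject the third enrollment at insert time; the sketch only detects such rows after the fact.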

Steps in Normalization

• Assemble data items from user views
• Convert to un-normalized relations
• Convert to first normal form (1NF)
• Convert to second normal form (2NF)
• Convert to third normal form (3NF)

Should result in simple relations that correspond to entities or associations between entity classes.

Normalized tables, when recombined (joined), should convey exactly the same information as the original table.

Un-Normalized Relations

Un-normalized relations contain one or more repeating groups – multiple values at the intersection of rows and columns

Student # | Student Name | Major | Course # | Course title    | Instructor name | Instructor Location | Grade
38214     | Bright       | IS    | IS 350   | Databases       | Codd            | B104                | A
          |              |       | IS 465   | System Analysis | Kemp            | B213                | C
69173     | Jones        | PM    | IS 465   | System Analysis | Kemp            | B213                | A
          |              |       | PM 300   | Prod Mang       | Lewis           | D317                | B
          |              |       | QM 440   | Op Res          | Kemp            | B213                | C

Contains redundant information, e.g. IS 465 appears in more than one row
Un-Normalized Relations

[Diagram: Student #, Student Name, Major and Course #, with the relationships between them labeled one-to-one, one-to-one? and one-to-many]

Need to address the one-to-many relationships

Normalized relations: First Normal Form

• A relation is in first normal form if the underlying domains contain only atomic values
• There are no repeating groups within a tuple
• Most relational systems require a database to be in 1NF
First Normal Form

Remove repeating groups and form 2 new relations – migrate the primary key, and ensure there is a valid new primary key

Student # | Student Name | Major
38214     | Bright       | IS
69173     | Smith        | PM

Student # | Course # | Course Title | Instructor name | Instructor Location | Grade
38214     | IS 350   | Database     | Codd            | B104                | A
38214     | IS 465   | Sys Anal     | Kemp            | B213                | C
69173     | IS 465   | Sys Anal     | Kemp            | B213                | A
69173     | PM 300   | Op Res       | Lewis           | D317                | B

Identification of Primary Key

[Same two relations as under First Normal Form]
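The step of removing repeating groups can be sketched in Python. This is an illustration under stated assumptions, not the slides' own procedure: the nested-dictionary input layout, the field names and the function name are hypothetical, while the values follow the 1NF tables above.

```python
# Un-normalized input: each student record carries a repeating group of courses.
unnormalized = [
    {"student_no": 38214, "name": "Bright", "major": "IS", "courses": [
        {"course_no": "IS 350", "title": "Database", "instructor": "Codd",
         "location": "B104", "grade": "A"},
        {"course_no": "IS 465", "title": "Sys Anal", "instructor": "Kemp",
         "location": "B213", "grade": "C"},
    ]},
    {"student_no": 69173, "name": "Smith", "major": "PM", "courses": [
        {"course_no": "IS 465", "title": "Sys Anal", "instructor": "Kemp",
         "location": "B213", "grade": "A"},
        {"course_no": "PM 300", "title": "Op Res", "instructor": "Lewis",
         "location": "D317", "grade": "B"},
    ]},
]

def to_first_normal_form(records):
    """Split nested records into two flat relations with atomic values."""
    students, enrollments = [], []
    for rec in records:
        # The non-repeating attributes form the student relation.
        students.append({k: v for k, v in rec.items() if k != "courses"})
        for course in rec["courses"]:
            # Migrate the parent's primary key into every child row, so that
            # (student_no, course_no) can serve as the new composite key.
            enrollments.append({"student_no": rec["student_no"], **course})
    return students, enrollments

students, enrollments = to_first_normal_form(unnormalized)
for row in enrollments:
    print(row)
```

Each repeating group becomes one row per course, and the migrated student number ties it back to its parent row.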
Insert anomaly

Insertion of a new course cannot occur until a student has registered for the course, since Student # is part of the composite key

Update anomaly

Changing a course title or course number requires searching all tuples to find every occurrence of a course number or title

Student # | Course # | Course Title | Instructor name | Instructor Location | Grade
38214     | IS 350   | Database     | Codd            | B104                | A
38214     | IS 465   | Sys Anal     | Kemp            | B213                | C
69173     | IS 465   | Sys Anal     | Kemp            | B213                | A
69173     | PM 300   | Op Res       | Lewis           | D317                | B
Deletion anomaly

Dropping the only student registered for a course removes the last tuple for that course, so the associated course and instructor information is lost

Functional Dependencies

Student # | Course # | Course Title | Instructor name | Instructor Location | Grade
38214     | IS 350   | Database     | Codd            | B104                | A
38214     | IS 465   | Sys Anal     | Kemp            | B213                | C
69173     | IS 465   | Sys Anal     | Kemp            | B213                | A
69173     | PM 300   | Op Res       | Lewis           | D317                | B

Course # → Course Title
Course # → Instructor Name
Course # → Instructor Location
Student #, Course # → Grade
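The four dependencies listed for this table can be checked mechanically against its rows. This is a minimal Python sketch: the `holds` helper and the short column names are assumptions made for the illustration, and a check on sample rows can only refute a dependency, never prove that it holds for all future data.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # setdefault stores y on first sight of x and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

COLUMNS = ("student", "course", "title", "instructor", "location", "grade")
DATA = [
    (38214, "IS 350", "Database", "Codd", "B104", "A"),
    (38214, "IS 465", "Sys Anal", "Kemp", "B213", "C"),
    (69173, "IS 465", "Sys Anal", "Kemp", "B213", "A"),
    (69173, "PM 300", "Op Res", "Lewis", "D317", "B"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

print(holds(rows, ["course"], ["title"]))
print(holds(rows, ["course"], ["instructor"]))
print(holds(rows, ["course"], ["location"]))
print(holds(rows, ["student", "course"], ["grade"]))
# Grade needs the whole key: IS 465 appears with both C and A.
print(holds(rows, ["course"], ["grade"]))
```

The first four checks succeed and the last one fails, which is what makes Grade the only attribute that depends on the full composite key.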
Second Normal Form

A relation is in second normal form if it is in 1NF and every non-key attribute is fully dependent on the primary key

Student # | Course # | Course Title | Instructor name | Instructor Location | Grade
38214     | IS 350   | Database     | Codd            | B104                | A
38214     | IS 465   | Sys Anal     | Kemp            | B213                | C
69173     | IS 465   | Sys Anal     | Kemp            | B213                | A
69173     | PM 300   | Op Res       | Lewis           | D317                | B

Course Title, Instructor Name and Instructor Location are partially dependent on the primary key (only on Course #)

Second Normal Form

To convert from first to second normal form – remove partial dependencies

Create 2 new relations, one with attributes fully dependent on the primary key, the other with attributes that were only partially dependent

Student # | Course # | Grade
38214     | IS 350   | A
38214     | IS 465   | C
69173     | IS 465   | B
69173     | PM 300   | C

Course # | Course Title | Instructor Name | Instructor Location
IS 350   | Database     | Codd            | B104
IS 465   | Sys Anal     | Kemp            | B213
PM 300   | Prod man     | Lewis           | D317
QM 440   | Op Res       | Kemp            | B213

Courses are independent of Student # and so can be inserted or deleted independently; only a single tuple needs to be updated in the course relation
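Removing partial dependencies amounts to projecting the 1NF relation onto two column sets and dropping duplicate tuples. This is a minimal Python sketch; the `project` helper and the short column names are assumptions, and the rows are those of the 1NF enrollment table shown earlier.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples (order kept)."""
    seen, result = set(), []
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value not in seen:
            seen.add(value)
            result.append(dict(zip(columns, value)))
    return result

COLUMNS = ("student", "course", "title", "instructor", "location", "grade")
DATA = [
    (38214, "IS 350", "Database", "Codd", "B104", "A"),
    (38214, "IS 465", "Sys Anal", "Kemp", "B213", "C"),
    (69173, "IS 465", "Sys Anal", "Kemp", "B213", "A"),
    (69173, "PM 300", "Op Res", "Lewis", "D317", "B"),
]
first_nf = [dict(zip(COLUMNS, r)) for r in DATA]

# Attributes that depend on the whole key (student, course).
grades = project(first_nf, ["student", "course", "grade"])
# Attributes that depend only on part of the key (course).
courses = project(first_nf, ["course", "title", "instructor", "location"])

# How many tuples mention the title of IS 465 before and after the split.
before = sum(1 for r in first_nf if r["course"] == "IS 465")
after = sum(1 for r in courses if r["course"] == "IS 465")
print(before, after)
```

After the split, a course title is stored in exactly one tuple of the course relation, so renaming a course touches one row instead of one row per enrolled student.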

Transitive dependencies

Course # | Course Title | Instructor Name | Instructor Location
IS 350   | Database     | Codd            | B104
IS 465   | Sys Anal     | Kemp            | B213
PM 300   | Prod man     | Lewis           | D317
QM 440   | Op Res       | Kemp            | B213

A non-key attribute is dependent on one or more non-key attributes

[Diagram: Course #, Course Title, Instructor Name and Instructor Location, with two relationships labeled one-to-one]

Insertion anomaly

Since instructor is dependent on Course # as primary key, no information about an instructor can be added until an instructor has been assigned to a course

Delete anomaly

Deleting data for a course results in deleting instructor information
Update anomaly

Course # | Course Title | Instructor Name | Instructor Location
IS 350   | Database     | Codd            | B104
IS 465   | Sys Anal     | Kemp            | B213
PM 300   | Prod man     | Lewis           | D317
QM 440   | Op Res       | Kemp            | B213

To update instructor information the entire relation must be searched, since instructor information occurs more than once.

Third Normal Form

• A relation is in third normal form if it is in 2NF and contains no transitive dependencies
• Every non-key attribute is fully dependent on the primary key and there are no transitive dependencies

Instructor Name | Instructor Location
Codd            | B104
Kemp            | B213
Lewis           | D317

Non-key attributes that participate in the transitive dependency form a new relation

Course # | Course title | Instructor Name
IS 350   | Database     | Codd
IS 465   | Sys Anal     | Kemp
PM 300   | Prod Mang    | Lewis
QM 440   | Op Res       | Kemp

Foreign key – a non-key attribute in one relation that serves as a primary key in another relation
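The move to third normal form, and the requirement that joining the pieces gives back the original table, can be sketched as follows. This is an illustration only: the `project` and `natural_join` helpers and the short column names are assumptions, and the rows are those of the 2NF course relation.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples (order kept)."""
    seen, result = set(), []
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value not in seen:
            seen.add(value)
            result.append(dict(zip(columns, value)))
    return result

def natural_join(left, right, on):
    """Join two lists of dict rows on the single shared column `on`."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

courses = [
    {"course": "IS 350", "title": "Database", "instructor": "Codd", "location": "B104"},
    {"course": "IS 465", "title": "Sys Anal", "instructor": "Kemp", "location": "B213"},
    {"course": "PM 300", "title": "Prod man", "instructor": "Lewis", "location": "D317"},
    {"course": "QM 440", "title": "Op Res", "instructor": "Kemp", "location": "B213"},
]

# The attributes in the transitive dependency form their own relation.
instructor_rel = project(courses, ["instructor", "location"])
# The remaining relation keeps the instructor name as a foreign key.
course_rel = project(courses, ["course", "title", "instructor"])

rejoined = natural_join(course_rel, instructor_rel, "instructor")
print(len(instructor_rel), rejoined == courses)
```

Kemp's location is now stored once rather than twice, and the join on the foreign key reproduces the four original rows.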

Boyce-Codd Normal Form

Occurs in the case of overlapping candidate keys

• Each student can major in several subjects
• For each major a student has one advisor
• Each major has several advisors
• Each advisor advises only one major

Student # | Major   | Advisor
123       | Physics | Einstein
123       | Music   | Mozart
456       | Biol    | Darwin
789       | Physics | Bohr
999       | Physics | Einstein

There are 2 possible candidate keys: Student #-Major or Student #-Advisor, and they are overlapping.

Boyce-Codd Normal Form

A relation is in Boyce-Codd normal form if it is in 3NF and there are no dependencies between attributes of its candidate keys, i.e. every determinant is a candidate key

Here, attributes that are part of a candidate key are dependent on part of another candidate key: Major depends on Advisor.

Need to project into 2 new relations

Student # | Advisor
123       | Einstein
123       | Mozart
456       | Darwin
789       | Bohr
999       | Einstein

Advisor  | Major
Einstein | Physics
Mozart   | Music
Darwin   | Biol
Bohr     | Physics
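The BCNF problem in the student/major/advisor relation can be made concrete by looking for a dependency whose left-hand side is not a superkey. This is a minimal Python sketch under stated assumptions: the helper names and short column names are hypothetical, and both checks are run against the five sample rows only.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_superkey(rows, attrs):
    """True if the values on attrs are unique across the (distinct) rows."""
    return len({tuple(row[a] for a in attrs) for row in rows}) == len(rows)

advising = [
    {"student": 123, "major": "Physics", "advisor": "Einstein"},
    {"student": 123, "major": "Music", "advisor": "Mozart"},
    {"student": 456, "major": "Biol", "advisor": "Darwin"},
    {"student": 789, "major": "Physics", "advisor": "Bohr"},
    {"student": 999, "major": "Physics", "advisor": "Einstein"},
]

# Both overlapping candidate keys identify every row.
print(is_superkey(advising, ["student", "major"]))
print(is_superkey(advising, ["student", "advisor"]))

# advisor -> major holds, yet advisor alone is not a superkey:
# a determinant that is not a candidate key, which BCNF forbids.
violates_bcnf = (holds(advising, ["advisor"], ["major"])
                 and not is_superkey(advising, ["advisor"]))
print(violates_bcnf)
```

Projecting into (Student #, Advisor) and (Advisor, Major) makes Advisor the key of the relation in which it determines Major.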
Fourth Normal Form

Removes multi-valued dependencies

Multi-valued dependency – when 3 attributes (A, B, C) exist in a relation and for each value of A there is a well defined set of values for B and a well defined set of values for C, yet B and C are independent of each other

[Diagram: Car, Color, Doors]

A record type should not contain two or more independent multi-valued facts about an entity.

Fourth Normal Form

Car      | Color  | Doors
Outback  | Gray   | 2
Outback  | Gray   | 4
Outback  | Navy   | 4
Outback  | Navy   | 2
Forester | Silver | 2
Forester | Silver | 4

Several redundancies exist in the relation

Can generate deletion and update anomalies

Project to 2 new relations

Car      | Color
Outback  | Gray
Outback  | Navy
Forester | Silver

Car      | Doors
Outback  | 2
Outback  | 4
Forester | 2
Forester | 4
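One way to see that Color and Doors are independent multi-valued facts about Car is to check that the relation equals the join of its two projections. This is a minimal Python sketch; the helper names and lowercase column names are assumptions, and the rows are those of the Car table.

```python
def project(rows, columns):
    """Project dict rows onto columns as a set of tuples (duplicates vanish)."""
    return {tuple(row[c] for c in columns) for row in rows}

def splits_losslessly(rows):
    """True if joining (car, color) with (car, doors) rebuilds rows exactly."""
    car_color = project(rows, ["car", "color"])
    car_doors = project(rows, ["car", "doors"])
    rejoined = {(car, color, doors)
                for car, color in car_color
                for other_car, doors in car_doors
                if car == other_car}
    return rejoined == project(rows, ["car", "color", "doors"])

cars = [
    {"car": "Outback", "color": "Gray", "doors": 2},
    {"car": "Outback", "color": "Gray", "doors": 4},
    {"car": "Outback", "color": "Navy", "doors": 4},
    {"car": "Outback", "color": "Navy", "doors": 2},
    {"car": "Forester", "color": "Silver", "doors": 2},
    {"car": "Forester", "color": "Silver", "doors": 4},
]

print(splits_losslessly(cars))
# Dropping one combination breaks the independence of color and doors,
# so the two projections no longer rebuild the relation.
print(splits_losslessly(cars[1:]))
```

The first check succeeds because every color of a car appears with every door count of that car; the second fails because the join would reintroduce the removed combination.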

Normalization Summary

Leads to simpler (to implement) applications and to more maintainable systems

Based on a set of rules that define normal forms – of which the first three are most important:

First normal form: All column values are atomic

Second normal form: All column values depend on the whole primary key: no partial dependencies

Third normal form: No column value depends on the value of any other column except the primary key – no transitive dependencies

Limits of Normalization

• Normalization rules are guidelines
• In certain circumstances 3NF or higher may not be desirable

Customer (Name, Street, City, State, Zipcode)

Does not meet 3NF (City and State depend on Zipcode)

Customer (Name, Street, Zipcode)
Location (Zipcode, City, State)

3NF may not be efficient in terms of regular queries

Need to apply judgment and common sense
Limits of Normalization

• There can be a number of cases where there is a compelling need for non-first-normal-form structures.
• Spatial data objects are one of them
• Object-relational model supports the ability to implement non-first-normal-form structures: arrays, nested tables

References

1. E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.
2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial on the relational model and normalization.
3. E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems (Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first formal treatment of second and third normal forms.
4. C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981.