Unit 3 Normalization
Relational Model
Main idea:
Table: relation
Column header: attribute
Row: tuple
Account
AccountId CustomerId Balance
150 20 11,000
160 23 2,300
180 23 32,000
Customer
Id Name Addr
20 Tom Irvine
23 Jane LA
32 Jack Riverside
• Database schema consists of
– a set of relation schema
– Account(AccountId, CustomerId, Balance)
– Customer(Id, Name, Addr)
– a set of constraints over the relation schema
– AccountId and CustomerId must be integers
– Name and Addr must be strings of characters
– Every CustomerId in Account must be one of the Ids in Customer
– etc.
NULL value
Key:
Minimal superkey (no proper subset is a superkey)
If more than one key: choose one as a primary key
Example:
Key 1: LogID (primary key)
Key 2: AccountId, Xact#
Superkeys: all supersets of the keys
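To make the key/superkey distinction concrete, here is a minimal Python sketch (all helper names are hypothetical) that tests attribute sets against the Account instance shown earlier. It can only examine one instance: a key is a property of the schema, so passing this test is necessary but not sufficient. For example, Balance happens to be unique in these three rows, yet nothing in the schema makes it a key.

```python
from itertools import combinations

# Account instance from the Relational Model example (Balance without commas).
ACCOUNT = [
    {"AccountId": 150, "CustomerId": 20, "Balance": 11000},
    {"AccountId": 160, "CustomerId": 23, "Balance": 2300},
    {"AccountId": 180, "CustomerId": 23, "Balance": 32000},
]

def is_superkey_in(rows, attrs):
    """True if no two rows agree on every attribute in attrs (instance-level check only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in sorted(attrs))
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key_in(rows, attrs):
    """A key is a minimal superkey: no proper subset of attrs may also be a superkey."""
    attrs = set(attrs)
    if not is_superkey_in(rows, attrs):
        return False
    return not any(
        is_superkey_in(rows, subset)
        for size in range(len(attrs))          # every proper subset size
        for subset in combinations(sorted(attrs), size)
    )
```

With this instance, {AccountId} passes as a key, {CustomerId} fails because 23 repeats, and {AccountId, Balance} is a superkey but not minimal.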
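There are two integrity rules that every relation should follow:
1. Entity Integrity (Rule 1): no attribute of the primary key may be NULL.
2. Referential Integrity (Rule 2): every foreign-key value must either be NULL or match a primary-key value of the referenced relation.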
Examples of Referential Integrity
Account
AccountId CustomerId Balance
150 20 11,000
160 23 2,300
180 23 32,000
Customer
Id Name Addr
20 Tom Irvine
23 Jane LA
32 Jack Riverside
Foreign key: Account.CustomerId references Customer.Id.
Student
Id Name Dept
1111 Mike ICS
2222 Harry CE
3333 Ford ICS
Dept
Name chair
ICS Tom
CE Jane
MATH Jack
Foreign key: Student.Dept references Dept.Name.
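Both examples above can be checked mechanically. This is a minimal Python sketch, assuming tuples are stored positionally and NULL is represented by None; the function `dangling` is hypothetical, and the foreign key Student.Dept → Dept.Name is inferred from the column names.

```python
# Instances from the referential-integrity examples.
ACCOUNT = [(150, 20, 11000), (160, 23, 2300), (180, 23, 32000)]                   # (AccountId, CustomerId, Balance)
CUSTOMER = [(20, "Tom", "Irvine"), (23, "Jane", "LA"), (32, "Jack", "Riverside")]  # (Id, Name, Addr)
STUDENT = [(1111, "Mike", "ICS"), (2222, "Harry", "CE"), (3333, "Ford", "ICS")]    # (Id, Name, Dept)
DEPT = [("ICS", "Tom"), ("CE", "Jane"), ("MATH", "Jack")]                          # (Name, chair)

def dangling(referencing, fk_pos, referenced, pk_pos):
    """Return referencing tuples whose foreign key matches no primary key (None = NULL is allowed)."""
    keys = {t[pk_pos] for t in referenced}
    return [t for t in referencing if t[fk_pos] is not None and t[fk_pos] not in keys]
```

Customer 32 (Jack) owns no account and MATH has no students; referential integrity only constrains the referencing side, so neither is a violation. A made-up student (4444, Ann, EE) would be reported, since no Dept tuple is named EE.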
Relational algebra is:
1. The formal description of how a relational database operates
2. An interface to the data stored in the database itself
3. The mathematics that underpins SQL operations
INTERSECTION of R and S
The intersection of R and S is the relation that contains all tuples that are in both R and S.
DIFFERENCE of R and S
The difference of R and S is the relation that contains all tuples that are in R but not in S.
Union, Intersection, Difference
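With relations modeled as Python sets of tuples, the three set operations map directly onto Python's set operators. A minimal sketch with hypothetical union-compatible relations R and S (same number of attributes, in the same order):

```python
# Hypothetical relations over the same attributes (Name, Dept).
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Tom", "ICS"), ("Mary", "Math")}

union = R | S          # tuples in R, in S, or in both
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
```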
Projection
$\pi_{ename,\,salary}(employee)$ keeps only the ename and salary attributes of each employee tuple.
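A sketch of projection, assuming tuples are dicts and using set semantics as relational algebra does; the employee rows and the name `project` are hypothetical. Because duplicates are removed, the two Tom tuples collapse into one.

```python
# Hypothetical employee instance: each tuple maps attribute name -> value.
employee = [
    {"ename": "Jack", "dept": "Physics", "salary": 50000},
    {"ename": "Tom", "dept": "ICS", "salary": 60000},
    {"ename": "Tom", "dept": "Math", "salary": 60000},
]

def project(relation, attrs):
    """π_attrs(relation): keep only attrs and drop duplicate tuples (set semantics)."""
    return {tuple(t[a] for a in attrs) for t in relation}

result = project(employee, ["ename", "salary"])
```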
Cartesian product Emp × Contact
E.Name Dept C.Name Addr
Jack Physics Jack Irvine
Jack Physics Tom LA
Jack Physics Mary Riverside
Tom ICS Jack Irvine
Tom ICS Tom LA
Tom ICS Mary Riverside
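The table above pairs every Emp tuple with every Contact tuple. A minimal sketch with `itertools.product` that reproduces its six rows in the same order (the name `cross` is hypothetical):

```python
from itertools import product

# Emp(E.Name, Dept) and Contact(C.Name, Addr), read off the product table.
emp = [("Jack", "Physics"), ("Tom", "ICS")]
contact = [("Jack", "Irvine"), ("Tom", "LA"), ("Mary", "Riverside")]

def cross(r, s):
    """R × S: concatenate every tuple of R with every tuple of S."""
    return [tr + ts for tr, ts in product(r, s)]

rows = cross(emp, contact)
```

The result has |Emp| × |Contact| = 2 × 3 = 6 tuples.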
JOIN Example
JOIN is used to combine related tuples from two
relations R and S.
In its simplest form, with no join condition, the JOIN operator is just the
cross product of the two relations; a join is written R ⋈ S.
$R \bowtie_{R.A > S.C} S$
R(A, B)
A B
3 4
5 7
S(C, D)
C D
2 7
6 8
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
3 4 2 7
5 7 2 7
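Since the join above is the cross product followed by a selection on R.A > S.C, it can be sketched as a filter over `itertools.product`. The name `theta_join` and the positional encoding (index 0 = A or C, index 1 = B or D) are assumptions of this example.

```python
from itertools import product

# R(A, B) and S(C, D) from the example, as tuples.
R = [(3, 4), (5, 7)]
S = [(2, 7), (6, 8)]

def theta_join(r, s, theta):
    """R ⋈_θ S = σ_θ(R × S): keep the concatenated tuples that satisfy theta."""
    return [tr + ts for tr, ts in product(r, s) if theta(tr, ts)]

# Condition R.A > S.C: compare position 0 of the R tuple with position 0 of the S tuple.
result = theta_join(R, S, lambda tr, ts: tr[0] > ts[0])
```

Any predicate over the two tuples can be passed as `theta`, including conjunctions of several comparisons.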
Theta-Join
$R \bowtie_{R.A > S.C,\ R.B \neq S.D} S$
R(A, B)
A B
3 4
5 7
S(C, D)
C D
2 7
6 8
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
3 4 2 7
Equi-Join
R(A, B)
A B
3 4
5 7
S(C, D)
C D
2 7
6 8
$R \bowtie_{R.B = S.D} S$
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
5 7 2 7
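For an equality condition, an implementation does not need to enumerate R × S. The sketch below swaps in a hash join (build an index on S.D, then probe it with each R.B), a common technique that the slides do not prescribe; it returns the same result as filtering the cross product. The function name is hypothetical.

```python
from collections import defaultdict

R = [(3, 4), (5, 7)]   # R(A, B)
S = [(2, 7), (6, 8)]   # S(C, D)

def equi_join(r, s, r_pos, s_pos):
    """Equi-join R.r_pos = S.s_pos via a hash index on S's join column."""
    index = defaultdict(list)
    for ts in s:
        index[ts[s_pos]].append(ts)          # build phase
    return [tr + ts for tr in r for ts in index.get(tr[r_pos], [])]  # probe phase

result = equi_join(R, S, 1, 1)   # R.B = S.D
```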
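Natural-Join
$R \bowtie S = \pi_L(R \bowtie_{R.A_1 = S.A_1, \ldots, R.A_k = S.A_k} S)$
where $A_1, \ldots, A_k$ are the attributes common to R and S, and L lists every attribute of R and S exactly once.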
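The formula can be read as code: equate every common attribute, then keep each attribute once. A minimal sketch, assuming tuples are dicts keyed by attribute name; R(A, B), S(B, C) and the name `natural_join` are hypothetical. If R and S share no attributes, the condition is vacuously true and the result degenerates to the cross product.

```python
# Hypothetical relations R(A, B) and S(B, C).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 2, "C": 6}, {"B": 7, "C": 8}]

def natural_join(r, s):
    """R ⋈ S: require equal values on every common attribute, keep each attribute once."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()           # A1, ..., Ak
            if all(tr[a] == ts[a] for a in common):  # R.Ai = S.Ai for every i
                result.append({**tr, **ts})          # projection L: shared attributes appear once
    return result

joined = natural_join(R, S)
```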
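Full Outer-Join
Pad null values for both the left and the right dangling tuples, i.e., tuples of R with no matching tuple in S and tuples of S with no matching tuple in R.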
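A minimal sketch of a natural full outer join, assuming dict tuples and None for NULL; the relations and the name `full_outer_join` are hypothetical. Attribute lists are passed explicitly so dangling tuples can be padded with every attribute they lack.

```python
def full_outer_join(r, s, r_attrs, s_attrs):
    """Matched tuples as in R ⋈ S, plus left and right dangling tuples padded with None."""
    common = set(r_attrs) & set(s_attrs)
    all_attrs = list(r_attrs) + [a for a in s_attrs if a not in common]
    matched_s = set()
    result = []
    for tr in r:
        hit = False
        for i, ts in enumerate(s):
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})
                matched_s.add(i)
                hit = True
        if not hit:  # left dangling tuple
            result.append({a: tr.get(a) for a in all_attrs})
    for i, ts in enumerate(s):
        if i not in matched_s:  # right dangling tuple
            result.append({a: ts.get(a) for a in all_attrs})
    return result

# Hypothetical R(A, B) and S(B, C).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 7, "C": 8}]
joined = full_outer_join(R, S, ["A", "B"], ["B", "C"])
```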
Joins Revised
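Tree representation of $\pi_{balance}(account \bowtie_{custssn = ssn} \sigma_{name = tom}(customer))$:
π balance
└── ⋈ custssn = ssn
    ├── account
    └── σ name = tom
        └── customer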
Example 1 (cont.)
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”
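$\pi_{balance}(account \bowtie_{ssn = custssn} \sigma_{name = tom}(customer))$
π balance
└── ⋈ ssn = custssn
    ├── account
    └── σ name = tom
        └── customer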
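The same query can be run in SQL for comparison. This sketch uses Python's built-in `sqlite3` with an in-memory database and hypothetical rows. The `DISTINCT` is needed to match relational algebra's set semantics: SQL returns a bag, so without it Tom's two equal balances would both appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (ssn INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE account (custssn INTEGER REFERENCES customer(ssn), balance INTEGER);
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO customer VALUES (1, 'tom', 'Irvine'), (2, 'jane', 'LA');
    INSERT INTO account VALUES (1, 500), (1, 500), (2, 900);
""")

# π_balance(account ⋈_{ssn=custssn} σ_{name='tom'}(customer)) with set semantics.
rows = conn.execute("""
    SELECT DISTINCT a.balance
    FROM account AS a JOIN customer AS c ON c.ssn = a.custssn
    WHERE c.name = 'tom'
""").fetchall()

# The same query without DISTINCT keeps duplicates (bag semantics).
bag = conn.execute("""
    SELECT a.balance
    FROM account AS a JOIN customer AS c ON c.ssn = a.custssn
    WHERE c.name = 'tom'
""").fetchall()
```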
Comparing RA and SQL
Relational algebra:
is closed (the result of every expression is a relation)
has a rigorous foundation
has simple semantics
is used for reasoning, query optimisation, etc.
SQL:
is a superset of relational algebra
has convenient formatting features, etc.
provides aggregate functions
has complicated semantics
is an end-user language.
Functional Dependencies and Normalization
Schema Normalization
Armstrong's Axioms (AAs)
• Reflexivity:
– If Y is a subset of X, then X → Y.
– Example: AB→A, ABC→AB, etc.
• Augmentation:
– If X→Y, then XZ→YZ.
– Example: If A→B, then AC→BC.
• Transitivity:
– If X→Y, and Y→Z, then X→Z.
– Example: If AB→C, and C→D, then AB→D.
More Rules Derived from AAs
Projectivity
If X→YZ, then X→Y and X→Z
Pseudo-Transitivity Rule:
If X→Y, WY→Z, then WX→Z
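Rather than chaining the axioms by hand, a derived FD X → Y can be tested by computing the attribute closure X+ and checking Y ⊆ X+; this standard algorithm is sound and complete with respect to Armstrong's axioms. A minimal sketch, assuming single-letter attribute names so that strings like "AB" denote attribute sets (function names are hypothetical):

```python
def closure(attrs, fds):
    """X+: every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) strings, e.g. ("AB", "C") for AB → C.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F implies X → Y exactly when Y ⊆ X+."""
    return set(rhs) <= closure(lhs, fds)
```

For example, `implies([("AB", "C"), ("C", "D")], "AB", "D")` confirms the transitivity example, and `implies([("X", "Y"), ("WY", "Z")], "WX", "Z")` confirms pseudo-transitivity.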
The Normalization Process
In relational databases the term normalization refers to a reversible, step-by-step process in which a given set of relations is decomposed into a set of smaller relations that have a progressively simpler and more regular structure.
Figure: the normalization process
Unnormalized relations
→ First normal form (functional dependency of nonkey attributes on the primary key; atomic values only)
→ Second normal form (full functional dependency of nonkey attributes on the primary key)
→ Third normal form (no transitive dependency between nonkey attributes)
→ Boyce-Codd and higher (all determinants are candidate keys; single multivalued dependency)
Normal Forms
Patient # | Surgeon # | Surgery date | Patient name | Patient address | Surgeon name | Surgery | Drug administered | Side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St., New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St., Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
First Normal Form
Third Normal Form
A relation is in 3NF iff it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.
A relation r(R) is in Third Normal Form (3NF) if and only if the following
conditions are satisfied simultaneously:
r(R) is already in 2NF.
No nonprime attribute is transitively dependent on the key.
R(A*, B, C) with A → B and B → C converts to R1(A*, B) and R2(B*, C).
* indicates the key or the determinant of the relation.
Third Normal Form
Using the general procedure, we will transform our 2NF
relation example to a 3NF relation.
The relation EMPLOYEE is not in 3NF because there is a
transitive dependency of a nonprime attribute on the primary
key of the relation.
In this case, the nonprime attribute Emp-Hrly-Rate is
transitively dependent on the key through the functional
dependency Emp-Dpt → Emp-Hrly-Rate.
To transform this relation into a 3NF relation, it is necessary to remove any transitive dependency of a nonprime attribute on the key.
This requires creating two new relations.
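The procedure can be sketched in Python. The slides do not list EMPLOYEE's full scheme here, so this example assumes a hypothetical EMPLOYEE(Emp-ID, Emp-Dpt, Emp-Hrly-Rate) with primary key Emp-ID and the dependency Emp-Dpt → Emp-Hrly-Rate from the text; the helper names are also hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_fds(schema, key, fds):
    """FDs X → Y with X not a superkey and Y containing nonprime attributes outside X.

    Assumes the relation is already in 2NF and `key` is its only candidate key.
    """
    nonprime = schema - key
    found = []
    for lhs, rhs in fds:
        moved = (rhs & nonprime) - lhs
        if closure(lhs, fds) != schema and moved:
            found.append((lhs, moved))
    return found

EMPLOYEE = {"Emp-ID", "Emp-Dpt", "Emp-Hrly-Rate"}   # hypothetical attribute list
KEY = {"Emp-ID"}                                    # assumed primary key
FDS = [({"Emp-ID"}, {"Emp-Dpt"}), ({"Emp-Dpt"}, {"Emp-Hrly-Rate"})]

transitive = transitive_fds(EMPLOYEE, KEY, FDS)
lhs, moved = transitive[0]
employee = EMPLOYEE - moved   # EMPLOYEE(Emp-ID, Emp-Dpt)
charges = lhs | moved         # CHARGES(Emp-Dpt, Emp-Hrly-Rate)
```

Emp-Dpt stays in both relations, so the two can be joined back on it.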
Third Normal Form
The scheme of the first relation, which we have named EMPLOYEE, is:
The scheme of the second relation, which we have named CHARGES, is:
R = { A, B, C, D }
F = { A → B, C → D }.
Key is {AC}.
Decomposition: { (A, B), (C, D), (A, C) }, with (A, C) introduced virtually.
Consider it a two-step decomposition:
1. Decompose R into R1 = (A, B), R2 = (A, C, D)
2. Decompose R2 into R3 = (C, D), R4 = (A, C)
This is a lossless join decomposition.
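Each step can be checked with the standard binary test: splitting R into R1 and R2 is lossless exactly when the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. A minimal sketch with an attribute-closure helper (function names are hypothetical):

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """R1, R2 is a lossless-join split iff (R1 ∩ R2)+ contains all of R1 or all of R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus

F = [(set("A"), set("B")), (set("C"), set("D"))]   # A → B, C → D
key_closure = closure(set("AC"), F)                # AC+ should be ABCD
step1 = lossless_binary(set("AB"), set("ACD"), F)  # R1 = (A, B), R2 = (A, C, D)
step2 = lossless_binary(set("CD"), set("AC"), F)   # R3 = (C, D), R4 = (A, C)
lossy = lossless_binary(set("AB"), set("CD"), F)   # no shared attribute
```

Splitting directly into (A, B) and (C, D) fails the test, because the two parts share no attributes and their join is a cross product.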
The redundancy has been eliminated, but the information about which
companies make which products and which of these products they
supply to which agents has been lost. The natural join of these two
projections will result in some spurious tuples (additional tuples which were
not present in the original relation).
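The original company/product/agent table is not reproduced in this section, so this sketch uses a hypothetical instance to show how joining only two projections produces a spurious tuple. Which two projections the original example used is not stated; this one joins (company, product) with (product, agent).

```python
# Hypothetical (company, product, agent) instance, for illustration only.
SUPPLY = {("C1", "P1", "A2"), ("C1", "P2", "A1"), ("C2", "P1", "A1"), ("C1", "P1", "A1")}

CP = {(c, p) for c, p, a in SUPPLY}   # which companies make which products
PA = {(p, a) for c, p, a in SUPPLY}   # which products are supplied to which agents

# Natural join CP ⋈ PA on the shared product attribute.
rejoined = {(c, p, a) for c, p in CP for p2, a in PA if p == p2}
spurious = rejoined - SUPPLY
```

The join yields five tuples; (C2, P1, A2) was not in the original relation.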
Fifth Normal Form
This table can be decomposed into its three projections without loss of
information, as demonstrated below.
If we take the natural join of these relations then we get the original
relation back. So this is the correct decomposition.
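With a hypothetical (company, product, agent) instance, joining all three projections removes the spurious tuple that a two-way join produces, because (C2, A2) does not occur in the (company, agent) projection. One instance can only illustrate this; 5NF requires the join dependency to hold for every legal instance.

```python
# Hypothetical (company, product, agent) instance, for illustration only.
SUPPLY = {("C1", "P1", "A2"), ("C1", "P2", "A1"), ("C2", "P1", "A1"), ("C1", "P1", "A1")}

CP = {(c, p) for c, p, a in SUPPLY}   # (company, product) projection
PA = {(p, a) for c, p, a in SUPPLY}   # (product, agent) projection
CA = {(c, a) for c, p, a in SUPPLY}   # (company, agent) projection

# CP ⋈ PA on product, then join with CA on both company and agent.
step = {(c, p, a) for c, p in CP for p2, a in PA if p == p2}
rejoined = {(c, p, a) for c, p, a in step if (c, a) in CA}
```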