Data Normalization
Normalization
Process for evaluating and correcting table structures to
minimize data redundancies.
“Reduces data anomalies”
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest.
Also,
any relation that is in BCNF is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
We consider a relation in BCNF to be fully normalized.
A design that has a lower normal form than another design has
more redundancy. Uncontrolled redundancy can lead to data
integrity problems.
Functional Dependencies
We say an attribute, B, has a functional dependency on
another attribute, A, if any two records that have
the same value for A must also have the same value for B.
We illustrate this as:
A → B
Example: Suppose we keep track of employee email
addresses, and we only track one email address for each
employee. Suppose each employee is identified by their
unique employee number. We say there is a functional
dependency of email address on employee number:
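EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
You might see FDs depicted in 3 different ways. Besides one FD per line, as above, a single determinant can point to all of its dependents:
EmpNum → EmpEmail, EmpFname, EmpLname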
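The definition can be tried out on sample data. The sketch below is a minimal illustration, not part of the original slides: the helper name `fd_holds` and the employee records (including the email addresses) are made up for the example. Note that a check like this can only show that an FD is violated by the data at hand; whether an FD truly holds is a rule about the business, not about one sample.

```python
def fd_holds(records, lhs, rhs):
    """Return True if no two records agree on lhs but differ on rhs.

    records: list of dicts; lhs, rhs: tuples of attribute names.
    (Hypothetical helper, written for this illustration.)
    """
    seen = {}
    for row in records:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up sample data: two employees share a last name.
employees = [
    {"EmpNum": 1, "EmpEmail": "ann@example.com", "EmpLname": "Lee"},
    {"EmpNum": 2, "EmpEmail": "bob@example.com", "EmpLname": "Lee"},
    {"EmpNum": 1, "EmpEmail": "ann@example.com", "EmpLname": "Lee"},
]

emp_determines_email = fd_holds(employees, ("EmpNum",), ("EmpEmail",))      # True
lname_determines_email = fd_holds(employees, ("EmpLname",), ("EmpEmail",))  # False
```

Here EmpNum → EmpEmail is consistent with the data, while EmpLname → EmpEmail is not, because two records with the same last name have different email addresses.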
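Determinant
In the functional dependency
EmpNum → EmpEmail
EmpNum is the determinant: the attribute on the left-hand side, whose value determines the value of EmpEmail.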
• Product table (Product_code, price, type)
What is the determinant?
Transitive dependency
Consider attributes A, B, and C, where
A → B and B → C.
Functional dependencies are transitive, which
means that we also have the functional dependency
A → C.
We say that C is transitively dependent on A
through B.
EmpNum → DeptNum
DeptNum → DeptName
Hence DeptName is transitively dependent on EmpNum through DeptNum.
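Transitivity can be worked out mechanically by computing the closure of a set of attributes: keep applying FDs whose left-hand side is already covered until nothing new is added. The following is a minimal sketch of that standard procedure; the function name `closure` and the tuple-based encoding of FDs are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left-hand side is already determined
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("EmpNum",), ("DeptNum",)),
    (("DeptNum",), ("DeptName",)),
]

emp_closure = closure({"EmpNum"}, fds)    # {'EmpNum', 'DeptNum', 'DeptName'}
dept_closure = closure({"DeptNum"}, fds)  # {'DeptNum', 'DeptName'}
```

DeptName appears in the closure of EmpNum even though no listed FD has EmpNum on the left and DeptName on the right, which is exactly the transitive dependency.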
• Table A (Student_id, Student_name, Prof_id,
Prof_name)
– How many functional dependencies are there in
the relation (table)? What are they?
– Where is the “transitive dependency”?
First Normal Form
The following is not in 1NF
Second Normal Form
A relation is in 2NF if it is in 1NF, and every non-key attribute
is fully dependent on each candidate key. (That is, we don’t
have any partial functional dependency.)
Consider this InvLine table (in 1NF):
InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)
InvNum, LineNum → ProdNum, Qty
InvNum → InvDate
There are two candidate keys. InvDate is the only
non-key attribute, and it is dependent on InvNum.
InvLine is not in 2NF since there is a partial
dependency of InvDate on InvNum.
InvLine is only in 1NF.
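A partial dependency can be found by checking whether any proper subset of a composite key already determines a non-key attribute. The sketch below is one way to do that under stated assumptions: it uses only the two FDs given for InvLine, takes {InvNum, LineNum} as the key being tested (the slide does not name its two candidate keys), and treats InvDate as the non-key attribute, as the slide states. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key_attrs, fds):
    """List (subset, attribute) pairs where a proper subset of key
    determines a non-key attribute, i.e. a 2NF violation."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            for attr in non_key_attrs:
                if attr in determined:
                    found.append((subset, attr))
    return found


fds = [
    (("InvNum", "LineNum"), ("ProdNum", "Qty")),
    (("InvNum",), ("InvDate",)),
]
key = ("InvNum", "LineNum")  # assumed key for this check

violations = partial_dependencies(key, ["InvDate"], fds)
# [(('InvNum',), 'InvDate')]
```

The single violation reported is InvDate depending on InvNum alone, which is the partial dependency that keeps InvLine out of 2NF.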
InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)
The above relation has redundancies: the invoice date is
repeated on each invoice line.
We can improve the database by decomposing the relation
into two relations:
(InvNum, LineNum, ProdNum, Qty)
(InvNum, InvDate)
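The decomposition can be illustrated as two projections of the original relation, followed by a join on InvNum to confirm that no information is lost. This is a small sketch with made-up invoice rows; the names `project`, `lines`, and `invoices` are hypothetical, since the slides do not name the two new relations.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in seen:
            seen.append(projected)
    return seen


# Made-up sample data: invoice 100 has two lines, so its date is stored twice.
inv_line = [
    {"InvNum": 100, "LineNum": 1, "ProdNum": "P1", "Qty": 2, "InvDate": "2024-01-05"},
    {"InvNum": 100, "LineNum": 2, "ProdNum": "P2", "Qty": 1, "InvDate": "2024-01-05"},
    {"InvNum": 101, "LineNum": 1, "ProdNum": "P1", "Qty": 5, "InvDate": "2024-01-06"},
]

lines = project(inv_line, ["InvNum", "LineNum", "ProdNum", "Qty"])
invoices = project(inv_line, ["InvNum", "InvDate"])  # one row per invoice

# Natural join on InvNum rebuilds the original rows
rejoined = [
    {**line, **inv}
    for line in lines
    for inv in invoices
    if line["InvNum"] == inv["InvNum"]
]
```

After the split, each invoice date is stored once (two rows in `invoices` instead of three copies in `inv_line`), and `rejoined` contains the same rows as the original relation.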
EmployeeDept (ename, ssn, bdate, address, dnumber, dname)
dnumber → dname
Third Normal Form
• A relation is in 3NF if the relation is in 1NF and all
determinants of non-key attributes are candidate keys.
That is, for any functional dependency X → Y, where Y is
a non-key attribute (or a set of non-key attributes), X is a
candidate key.
• This definition of 3NF differs from BCNF only in the
specification of non-key attributes: 3NF is weaker than BCNF,
which requires every determinant to be a candidate key.
• A relation in 3NF will not have any transitive dependencies
of a non-key attribute on a candidate key through another
non-key attribute.
Consider this Employee relation. Candidate keys are? …
Employee (EmpNum, EmpName, DeptNum, DeptName)
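The 3NF definition above can be applied almost literally: list the FDs, then flag any FD whose right-hand side includes a non-key attribute and whose determinant is not a candidate key. The sketch below does this for the Employee relation under assumptions that are not stated on the slide: EmpNum is taken as the only candidate key, and EmpNum → EmpName is assumed alongside the EmpNum → DeptNum and DeptNum → DeptName dependencies shown earlier. The function name is hypothetical, and the check looks only at the FDs as listed, not at ones implied by them.

```python
def third_nf_violations(fds, candidate_keys, non_key_attrs):
    """Return FDs X -> Y where Y includes a non-key attribute
    and X is not a candidate key (the slide's 3NF test)."""
    keys = [frozenset(k) for k in candidate_keys]
    violations = []
    for lhs, rhs in fds:
        touches_non_key = any(a in non_key_attrs for a in rhs)
        if touches_non_key and frozenset(lhs) not in keys:
            violations.append((lhs, rhs))
    return violations


# Assumed FDs and key for Employee (EmpNum, EmpName, DeptNum, DeptName)
fds = [
    (("EmpNum",), ("EmpName", "DeptNum")),
    (("DeptNum",), ("DeptName",)),
]
candidate_keys = [("EmpNum",)]
non_key_attrs = {"EmpName", "DeptNum", "DeptName"}

violations = third_nf_violations(fds, candidate_keys, non_key_attrs)
# [(('DeptNum',), ('DeptName',))]
```

Under these assumptions the only offending FD is DeptNum → DeptName: DeptNum determines a non-key attribute but is not a candidate key, which is the transitive dependency that 3NF rules out.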
Patient relation example
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St., New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St., Rye, NY | Charles Field; Patricia Gold | Eye cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye cornea replacement; Eye cataract removal | Tetracycline | Fever
Unnormalized Relation
The Patient relation above is unnormalized: one row can hold several surgeons, surgery dates, surgeries, drugs, and side effects for the same patient.
First Normal Form
• To move to First Normal Form a relation must
contain only atomic values at each row and
column.
– No repeating groups
– A column or set of columns is called a Candidate
Key when its values can uniquely identify the row
in the relation.
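Removing repeating groups amounts to writing one row per occurrence of the group and copying the single-valued attributes onto each row. The sketch below shows this for two patients from the example, using a subset of the columns. It assumes that the multiple values in a row line up by position (the first surgeon goes with the first date and the first surgery); the attribute names and the function name are chosen for this illustration.

```python
# Two patients from the example, with repeating groups stored as lists.
unnormalized = [
    {
        "PatientNum": 1111,
        "PatientName": "John White",
        "SurgeonNum": [145, 311],
        "SurgeryDate": ["01-Jan-95", "12-Jun-95"],
        "Surgery": ["Gallstones removal", "Kidney stones removal"],
    },
    {
        "PatientNum": 4876,
        "PatientName": "Hal Kane",
        "SurgeonNum": [145],
        "SurgeryDate": ["05-Nov-95"],
        "Surgery": ["Cholecystectomy"],
    },
]

REPEATING = ("SurgeonNum", "SurgeryDate", "Surgery")


def to_first_normal_form(rows, repeating):
    """Emit one row per occurrence of the repeating group."""
    flat = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if k not in repeating}
        # zip pairs the i-th surgeon with the i-th date and i-th surgery
        for values in zip(*(row[name] for name in repeating)):
            flat.append({**fixed, **dict(zip(repeating, values))})
    return flat


flat_rows = to_first_normal_form(unnormalized, REPEATING)
```

Patient 1111 now occupies two rows, one per surgery, and every cell holds a single atomic value. The patient's name is repeated on both rows, which is the kind of redundancy the higher normal forms address.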
First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St., New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St., New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St., Rye, NY | Charles Field | Eye cataract removal | Tetracycline | Fever
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none