Lecture 8 Normalization
Acknowledgement
Northeastern University.
• Other slides have been created based on the
Database system concepts book, 7th Edition.
9/26/24
Announcements
• Keys/Super keys
• Attribute closure
• Minimal cover
Today’s topics
• Normalization Objective
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
• Normal forms
• Overall purpose:
Database Design
• Necessary to concentrate on data characteristics
required to build database model.
constructs
• Five mapping steps involved:
– Strong entities
– Supertype/subtype relationships
– Weak entities
– Binary relationships
– Higher degree relationships
processes
• Includes all necessary technical specifications
• Steps laid out for conversion from old to new
system
• Training principles and methodologies are also
planned
– Submitted for management approval
Normalization
• Theory and process by which to evaluate and
Objectives of Normalization
Example Schema
Example Schema
What is this table about?
• Employees? Departments?
Types of modifications:
– Insertion
– Update
– Deletion
Insertion Anomaly
Difficult or impossible to insert a new row
• Add a new employee
– Unknown manager
– Typo in department/manager info
• Add a new department
– Requires at least one employee
Update Anomaly
Updates may result in logical inconsistencies
Deletion Anomaly
Deletion of data representing certain facts necessitates
deletion of data representing completely different facts
Bad Decomposition
CAR
ID Make Color
1 Toyota Blue
2 Audi Blue
3 Toyota Red
CAR1 CAR2
Bad decomposition
ID Make Color
1 Toyota Blue
1 Audi Blue
2 Toyota Blue
2 Audi Blue
3 Toyota Red
CAR1 CAR2
Additive Decomposition
CAR
ID Make Color
1 Toyota Blue
2 Audi Blue
3 Toyota Red
JOIN
ID Make Color
1 Toyota Blue
1 Audi Blue
2 Toyota Blue
2 Audi Blue
3 Toyota Red
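The spurious rows in the join can be reproduced with a short sketch. The schemas of CAR1 and CAR2 are not legible in these notes, so the split into CAR1(ID, Color) and CAR2(Make, Color) is an assumption, chosen because joining those two projections on Color gives exactly the five rows listed above.

```python
# Illustrative sketch: CAR1(ID, Color) and CAR2(Make, Color) are assumed schemas.
car = {(1, "Toyota", "Blue"), (2, "Audi", "Blue"), (3, "Toyota", "Red")}

# Project CAR onto the two assumed schemas.
car1 = {(cid, color) for cid, _, color in car}     # (ID, Color)
car2 = {(make, color) for _, make, color in car}   # (Make, Color)

# Natural join on the shared attribute Color.
joined = {(cid, make, c1) for cid, c1 in car1 for make, c2 in car2 if c1 == c2}

# Rows produced by the join that were never in CAR are spurious tuples.
spurious = joined - car
print(sorted(joined))
print("spurious:", sorted(spurious))
```

The join returns more rows than CAR had, which is why such a decomposition is called additive (lossy): the original relation can no longer be recovered.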
Normalization Process
• Submit a relational schema to a set of tests (related to
FDs) to certify whether it satisfies a normal form
Examples: 1NF?
Student(FirstName, LastName, Knowledge)
Examples: 1NF?
Assume, a video library maintains a database of movies rented
out. Without any normalization, all information is stored in one
Examples: 1NF
Full names    Physical address    Movies rented    Salutation
Examples 1NF?
1NF Violation
Important FD Definitions
Trivial FD: X → Y, Y ⊆ X
Transitive FD: X → Y and Y → Z ∴ X → Z
Example 2NF?
StudentID Course StudentAddress
1 COMP570 555 Huntington
Examples 2NF?
• Students(IDSt, StudentName, IDProf, ProfessorName,
Grade)
Students
IDSt StudentName IDProf ProfessorName Grade
1 Mueller 3 Schmid 5
2 Meier 2 Borner 4
3 Tobler 1 Bernasconi 3
Examples 2NF?
• All attributes are single-valued (1NF).
Students
IDSt StudentName
1 Mueller
2 Meier
3 Tobler
Professors
IDProf ProfessorName
1 Bernasconi
2 Borner
3 Schmid
Grade
IDSt IDProf Grade
1 3 5
2 2 4
3 1 3
Examples 2NF?
• Suppose a school wants to store data about teachers
and the subjects they teach. It creates the table shown
below. Since a teacher can teach more than one subject,
the table can have multiple rows for the same teacher.
Teacher
Teacher_id Subject Teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Examples 2NF?
Teacher(Teacher_id, Subject, Teacher_age)
F = {Teacher_id, Subject → Teacher_age; Teacher_id → Teacher_age}
• Only key is: {Teacher_id, Subject}
Teacher
Teacher_id Subject Teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
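As a quick sanity check, an FD can be tested against a table instance. The sketch below uses a hypothetical helper `fd_holds` (not part of the lecture) on the Teacher rows above. Keep in mind that an instance can only refute an FD; a passing check does not prove that the FD holds for every legal state.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if all rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

teacher = [
    {"Teacher_id": 111, "Subject": "Maths", "Teacher_age": 38},
    {"Teacher_id": 111, "Subject": "Physics", "Teacher_age": 38},
    {"Teacher_id": 222, "Subject": "Biology", "Teacher_age": 38},
    {"Teacher_id": 333, "Subject": "Physics", "Teacher_age": 40},
    {"Teacher_id": 333, "Subject": "Chemistry", "Teacher_age": 40},
]

# Teacher_age depends on Teacher_id alone, a proper subset of the key.
print(fd_holds(teacher, ["Teacher_id"], ["Teacher_age"]))  # True
# Subject alone does not determine Teacher_age (Physics: 38 and 40).
print(fd_holds(teacher, ["Subject"], ["Teacher_age"]))     # False
```

Because Teacher_age is determined by Teacher_id, which is only part of the key {Teacher_id, Subject}, the table has a partial dependency and is not in 2NF.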
Examples 2NF?
• To make the table comply with 2NF, we can break it into
two tables like this.
Teacher
Teacher_id Teacher_age
111 38
222 38
333 40
Teacher_Subject
Teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
• Relation is in 2NF?
– Trivially true (why?)
• List all non-trivial FDs for this relation state
{Year} → {Winner, Nationality}
{Winner} → {Nationality}
• What if we insert (1998, Jan Ullrich, USA)?
Exercise 2NF?
Patients(StaffNo, ApptDate, ApptTime, DentistName,
Exercise 2NF?
R(ABCDEGH)
F = {ABC → EG, A → D, E → GH, AB → H, BCE → AD}
SA: BC
IA: AE
Keys: ABC, BCE
R1(AD) F1 = {A ® D}
R2(ABCEGH)
F2 = {ABC → EG, E → GH, AB → H, BCE → A}
Keys: ABC, BCE
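The keys of R can be checked mechanically with attribute closure. The following is a minimal brute-force sketch; the function names `closure` and `candidate_keys` are illustrative, and the subset enumeration is only practical for small schemas like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate subsets by increasing size; keep the minimal superkeys."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

F = [("ABC", "EG"), ("A", "D"), ("E", "GH"), ("AB", "H"), ("BCE", "AD")]
keys = candidate_keys("ABCDEGH", F)
print(sorted("".join(sorted(k)) for k in keys))  # ['ABC', 'BCE']
```

With keys ABC and BCE, the FD A → D has a left side that is a proper subset of a key and a non-prime right side, which is the partial dependency that motivates splitting off R1(AD).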
Exercise 2NF?
R2(ABCEGH)
F2 = {ABC → EG, E → GH, AB → H, BCE → A}
conditions hold:
• Table must be in 2NF
• Transitive functional dependency of non-prime
attribute on any super key should be removed.
An attribute that is not part of any candidate key is
known as a non-prime attribute.
3NF Example
F = {Year → Winner, Nationality; Winner → Nationality}
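A 3NF test over these FDs can be sketched as follows. It assumes the relation is (Year, Winner, Nationality) with Year as the only candidate key, which follows from F since the closure of Year is the whole schema; the helper names are illustrative. Each FD X → A passes if X is a superkey or A is a prime attribute.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(schema, fds, prime):
    """List (lhs, attribute) pairs that break the 3NF condition."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == schema:
            continue  # lhs is a superkey, so this FD is allowed
        for attr in sorted(rhs - lhs):  # skip the trivial part of the FD
            if attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations

schema = {"Year", "Winner", "Nationality"}
fds = [({"Year"}, {"Winner", "Nationality"}), ({"Winner"}, {"Nationality"})]
prime = {"Year"}  # assumption: Year is the only candidate key

print(third_nf_violations(schema, fds, prime))  # [(['Winner'], 'Nationality')]
```

The reported violation is the transitive dependency Year → Winner → Nationality: Nationality is non-prime and is determined by Winner, which is not a superkey.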
Examples: 3NF
Suppose a company wants to store the complete
address of each employee, they create a table named
Examples: 3NF
A bank uses the following relation:
Exercises: 3NF
T
A B C D E
Examples: BCNF
Another Example
Decompose R Using X → Y
Replace R by relations with schemas:
1. R1 = X+
2. R2 = R – (X+ – X)
Project given FD’s F onto the two new relations
[Figure: R drawn as three regions, R – X+, X, and X+ – X. R1 covers X and X+ – X; R2 covers R – X+ and X.]
Examples: BCNF?
• Let’s take R = {A,B,C,D,E,G} and
F = {BC → D, CD → E}
• Key is {A,B,C,G}
• For example, use the FD BC → D to decompose R
into two relations R1 and R2
• X = BC and X+ = BCDE
• R1 = BCDE, R2 = ABCG
• Note that R1 ∩ R2 = X
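The decomposition step for this example can be sketched in a few lines. The helper names `closure` and `decompose` are illustrative. The sketch performs a single split only and does not project F onto the new relations or recurse on remaining violations.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def decompose(schema, fds, lhs):
    """Split schema on a violating FD with left side lhs: R1 = X+, R2 = R - (X+ - X)."""
    x = set(lhs)
    x_plus = closure(x, fds) & set(schema)
    r1 = x_plus
    r2 = set(schema) - (x_plus - x)
    return r1, r2

R = set("ABCDEG")
F = [("BC", "D"), ("CD", "E")]

r1, r2 = decompose(R, F, "BC")
print("".join(sorted(r1)))        # BCDE
print("".join(sorted(r2)))        # ABCG
print("".join(sorted(r1 & r2)))   # BC
```

The shared attributes BC form a key of R1, which is what makes this split lossless.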
beersLiked → manf}
• Pick BCNF violation name → addr
• Close the left side:
{name}+ = {name, addr, favBeer}
• Decomposed relations:
1. Drinkers1(name, addr, favBeer)
2. Drinkers2(name, beersLiked, manf)
2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
Notice: Drinkers1 tells us about drinkers,
Drinkers3 tells us about beers, and
Drinkers4 tells us the relationship between drinkers
and the beers they like
Compare with running example:
1. Drinkers(name, addr, phone)
2. Beers(name, manf)
3. Likes(drinker,beer)
Exercises: BCNF
• Suppose there is a company wherein employees work
BCNF– Motivation
There is one structure of FD’s that causes
An Unenforceable FD
An Unenforceable FD
Another Unenforceable FD
Departures(time, track, train)
Another Unenforceable FD
Another Unenforceable FD
F = {A → BC, C → DE}
Exercise
• The table shown in Figure below is susceptible to update
anomalies. Provide examples of insertion, deletion, and
Exercise
Multivalued dependencies
• A multivalued dependency (MVD) has the form
X ↠ Y, where X and Y are sets of attributes in a
relation R.
• X ↠ Y means that whenever two rows in R
agree on all the attributes of X, then we can swap
their Y components and get two rows that are also
in R
X Y Z
a b1 c1
a b2 c2
a b2 c1
a b1 c2
… … …
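The swap condition can be tested directly on a relation instance. The sketch below uses a hypothetical helper `mvd_holds` and the four example rows over (X, Y, Z). As with FDs, a check on one instance can refute an MVD but cannot prove it for every legal state.

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Check lhs ->> rhs on an instance: every Y-swap of two rows agreeing on lhs must exist."""
    rows = set(rows)
    x_idx = [attrs.index(a) for a in lhs]
    y_idx = {attrs.index(a) for a in rhs}
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in x_idx):
                # Take the Y component from u and everything else from t.
                swapped = tuple(u[i] if i in y_idx else t[i] for i in range(len(attrs)))
                if swapped not in rows:
                    return False
    return True

attrs = ("X", "Y", "Z")
r = {
    ("a", "b1", "c1"),
    ("a", "b2", "c2"),
    ("a", "b2", "c1"),
    ("a", "b1", "c2"),
}

print(mvd_holds(r, attrs, ["X"], ["Y"]))                        # True
print(mvd_holds(r - {("a", "b2", "c1")}, attrs, ["X"], ["Y"]))  # False
```

Removing any one of the four rows breaks X ↠ Y, because the remaining rows still demand the missing combination of Y and Z values.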
MVD examples
User (uid, gid, place)
• uid ↠ gid
• uid ↠ place
– Intuition: given uid, gid, and place are
“independent”
• uid, gid ↠ place
– Trivial: LHS ∪ RHS = all attributes of R
• uid, gid ↠ uid
– Trivial: LHS ⊇ RHS
• MVD complementation:
– If X ↠ Y, then X ↠ attrs(R) − X − Y
• MVD augmentation:
– If X ↠ Y and V ⊆ W, then XW ↠ YV
• MVD transitivity:
– If X ↠ Y and Y ↠ Z, then X ↠ Z − Y
– If X → Y, then X ↠ Y
• Coalescence:
– If X ↠ Y and Z ⊆ Y and there is some W disjoint
from Y such that W → Z, then X → Z
Procedure
• Start with the “if-part” of d, and treat them as
“seed” tuples in a relation
• Apply the given dependencies in D repeatedly
– If we apply an FD, we infer equality of two
symbols
– If we apply an MVD, we infer more tuples
95
Proof by chase
• In R(A, B, C, D), do A ↠ B and B ↠ C imply that
A ↠ C?
A → C?
A→B b1 = b2
B→C c1 = c2
• In general, with both MVD’s and FD’s, chase can
generate both new tuples and new equalities
Counterexample by chase
• In R(A, B, C, D), do A ↠ BC and CD → B imply that
A → B?
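The chase questions in this part of the lecture can be explored with a small sketch over R(A, B, C, D). The representation is an assumption made for illustration: rows are tuples of symbols, a dependency is a triple (kind, lhs, rhs), an MVD step adds a swapped tuple, and an FD step merges two symbols everywhere. The names `step` and `chase` are hypothetical.

```python
from itertools import product

ATTRS = ("A", "B", "C", "D")

def agree(t, u, attrs):
    return all(t[ATTRS.index(a)] == u[ATTRS.index(a)] for a in attrs)

def step(rows, deps):
    """Apply one applicable dependency once; return the new rows, or None at a fixpoint."""
    for kind, lhs, rhs in deps:
        idx = [ATTRS.index(a) for a in rhs]
        for t, u in product(sorted(rows), repeat=2):
            if not agree(t, u, lhs):
                continue
            if kind == "mvd":
                # MVD: infer the tuple with u's rhs values and t's other values.
                new = tuple(u[i] if i in idx else t[i] for i in range(len(ATTRS)))
                if new not in rows:
                    return rows | {new}
            else:
                # FD: infer equality of two symbols and merge them everywhere.
                for i in idx:
                    if t[i] != u[i]:
                        keep, drop = sorted((t[i], u[i]))
                        return {tuple(keep if v == drop else v for v in r) for r in rows}
    return None

def chase(rows, deps):
    rows = set(rows)
    while (nxt := step(rows, deps)) is not None:
        rows = nxt
    return rows

# Seed tuples: two rows that agree on A only.
seed = {("a", "b1", "c1", "d1"), ("a", "b2", "c2", "d2")}

# Do A ->> B and B ->> C imply A ->> C? Look for the C-swapped tuple.
proof = chase(seed, [("mvd", "A", "B"), ("mvd", "B", "C")])
print(("a", "b1", "c2", "d1") in proof)  # True, so A ->> C is implied

# Do A ->> BC and CD -> B imply A -> B? Check whether b1 and b2 were merged.
counter = chase(seed, [("mvd", "A", "BC"), ("fd", "CD", "B")])
print(sorted({r[1] for r in counter}))   # ['b1', 'b2'], so A -> B is not implied
```

In the second run the chase stops with b1 and b2 still distinct, so the final set of rows is itself a counterexample: it satisfies both given dependencies but violates A → B.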
4NF
A relation R is in Fourth Normal Form (4NF) if
Summary
• Philosophy behind BCNF, 4NF: Data should depend
on the key, the whole key, and nothing but the key!
– You could have multiple keys though
• Other normal forms
– 3NF: More relaxed than BCNF; will not remove
redundancy if doing so makes FDs harder to
enforce
– 2NF: Slightly more relaxed than 3NF
– 1NF: All column values must be atomic
Summary
• Normalization is the theory and process by
which to evaluate and improve relational
database design
– Makes the schema informative
– Minimizes information duplication
– Avoids modification anomalies
– Disallows spurious tuples