Lecture 8 Normalization

This document outlines a lecture on normalization in database design, presented by Assoc. Prof. Nguyen Thi Thuy Loan at International University, VNU-HCMC. It covers the objectives of normalization, the importance of minimizing data redundancy, and avoiding modification anomalies, while also detailing the database design and implementation process. Key concepts include functional dependencies, normal forms, and the significance of a well-structured relational schema.


9/26/24

International University, VNU-HCMC

School of Computer Science and Engineering

Lecture 8: Normalization

Instructor: Nguyen Thi Thuy Loan


https://nttloan.wordpress.com/

Acknowledgement

• The following slides are referenced from Northeastern University.
• Other slides have been created based on the Database System Concepts book, 7th Edition.

Announcements

• Reminder: Final project report
– Due May 3, no late days

Keys and FDs: review

• Functional dependencies
• Keys/superkeys
• Attribute closure
• Minimal cover

Today’s topics

• Normalization objectives
• Normal forms

Database Design and Implementation Process

[Figure: database design and implementation process; diagram not recovered]

The Database Initial Study

• Overall purpose:
– Analyze the company situation
– Define problems and constraints
– Define objectives
– Define scope and boundaries
• Interactive and iterative processes are required to complete the first phase of the DBLC (Database Life Cycle) successfully.

The Database Initial Study (cont’d)

• Analyze the company situation
– General conditions in which the company operates, its organizational structure, and its mission.
– Discover what the company’s operational components are, how they function, and how they interact.

Data Analysis and Requirements

• Discover data element characteristics
– Obtain characteristics from different sources
• Requires a thorough understanding of the company’s data types and their extent and uses.
• Take into account business rules
– Derived from the description of operations

Database Design

• Necessary to concentrate on the data characteristics required to build the database model.
• Two views of data within the system:
– Business view
  • Data as an information source
– Designer’s view
  • Data structure, access, and activities required to transform data into information

DBMS Software Selection

• Critical to the information system’s smooth operation
• Common factors affecting purchasing decisions:
– Cost
– DBMS features and tools
– Underlying model
– Portability
– DBMS hardware requirements

Map the conceptual model to the logical model

• Map the conceptual model to the chosen database constructs
• Five mapping steps involved:
– Strong entities
– Supertype/subtype relationships
– Weak entities
– Binary relationships
– Higher-degree relationships

Detailed Systems Design

• Designer completes the design of the system’s processes
• Includes all necessary technical specifications
• Steps laid out for conversion from the old to the new system
• Training principles and methodologies are also planned
– Submitted for management approval

Implementation and Loading

• Actually implement all design specifications from the previous phase:
– Install the DBMS
  • Virtualization: creates logical representations of computing resources independent of physical resources
– Create the database
– Load or convert the data

Normalization

• Theory and process by which to evaluate and improve relational database design
• Typically divides larger tables into smaller, less redundant tables
• Spans both logical and physical database design

Objectives of Normalization

• Make the schema informative
• Minimize information duplication
• Avoid modification anomalies
• Disallow spurious tuples

Straw Man Schema

[Figure: straw man schema; table not recovered]

Example Schema

[Figure: example schema; table not recovered]

Make the Schema Informative

• Design a relational schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types into a single relation; semantic ambiguities will result and the relation cannot be easily explained.
• Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.

Example Schema

What is this table about?
• Employees? Departments?

Minimize Information Duplication

• Avoid data redundancies
• Avoid excessive use of NULLs (e.g. fat tables)
– Wastes space
– Can make querying and understanding the information complicated and error-prone

Avoid Modification Anomalies

An undesired side effect resulting from an attempt to modify a table (that has not been sufficiently normalized)

Types of modifications:
– Insertion
– Update
– Deletion


International University, VNU-HCMC CS3200 – Database Design· · · Spring 2018· · · Derbinsky

Insertion Anomaly

Difficult or impossible to insert a new row
• Add a new employee
– Unknown manager
– Typo in department/manager info
• Add a new department
– Requires at least one employee

Update Anomaly

Updates may result in logical inconsistencies
• Change the department name/manager

Deletion Anomaly

Deletion of data representing certain facts necessitates deletion of data representing completely different facts
• Delete James E. Borg

Disallow Spurious Tuples

Avoid relational designs that match attributes across relations that are not (foreign key, primary key) combinations, because joining on such attributes may produce invalid tuples.

Bad Decomposition

CAR
ID  Make    Color
1   Toyota  Blue
2   Audi    Blue
3   Toyota  Red

CAR1
ID  Color
1   Blue
2   Blue
3   Red

CAR2
Make    Color
Toyota  Blue
Audi    Blue
Toyota  Red

The association between ID and Make is lost.

Bad decomposition

CAR1 JOIN CAR2
ID  Make    Color
1   Toyota  Blue
1   Audi    Blue
2   Toyota  Blue
2   Audi    Blue
3   Toyota  Red

CAR1
ID  Color
1   Blue
2   Blue
3   Red

CAR2
Make    Color
Toyota  Blue
Audi    Blue
Toyota  Red

Join returns more rows than the original relation


Additive Decomposition

CAR
ID  Make    Color
1   Toyota  Blue
2   Audi    Blue
3   Toyota  Red

JOIN
ID  Make    Color
1   Toyota  Blue
1   Audi    Blue
2   Toyota  Blue
2   Audi    Blue
3   Toyota  Red
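
The additive join above can be reproduced with a small sketch. The helper names `project` and `natural_join` are illustrative only and are not from the slides; relations are modeled as lists of dictionaries.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicates."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Natural join: combine rows that agree on every shared attribute."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                row = {**a, **b}
                if row not in out:
                    out.append(row)
    return out


car = [
    {"ID": 1, "Make": "Toyota", "Color": "Blue"},
    {"ID": 2, "Make": "Audi", "Color": "Blue"},
    {"ID": 3, "Make": "Toyota", "Color": "Red"},
]
car1 = project(car, ["ID", "Color"])
car2 = project(car, ["Make", "Color"])
joined = natural_join(car1, car2)

# Rows produced by the join that were never in CAR are spurious tuples.
spurious = [row for row in joined if row not in car]
print(len(car), len(joined), len(spurious))
```

Joining on Color, which is not a key of either projection, yields five rows, two of which are spurious.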

Lossless join decomposition

• Decompose relation R into relations S and T
• attrs(R) = attrs(S) ∪ attrs(T)
– S = π_{attrs(S)}(R)
– T = π_{attrs(T)}(R)
• The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = S ⋈ T
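
For a decomposition into two relations with only FD's known, a standard test is that the shared attributes must functionally determine all of S or all of T. The sketch below applies that test; the FD ID → Make, Color for the CAR relation is an assumption made for illustration (the slides do not state CAR's FDs), and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(s, t, fds):
    """True if (S ∩ T) determines all of S or all of T under fds."""
    reach = closure(s & t, fds)
    return s <= reach or t <= reach


# Assumed FD: ID is the key of CAR(ID, Make, Color).
fds = [({"ID"}, {"Make", "Color"})]

bad = is_lossless_binary({"ID", "Color"}, {"Make", "Color"}, fds)
good = is_lossless_binary({"ID", "Make"}, {"ID", "Color"}, fds)
print(bad, good)
```

Under that assumed FD, splitting on Color fails the test, while splitting on ID passes it.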

Normalization Process

• Submit a relational schema to a set of tests (related to FDs) to certify whether it satisfies a normal form
• If it does not pass, decompose into smaller relations that satisfy the normal form
– Must be non-additive (i.e. no spurious tuples!)
• The normal form of a relation refers to the highest normal form that it meets
• The normal form of a database refers to the lowest normal form that any relation meets
– Practically, a database is normalized if all relations are in at least 3NF

1NF – First Normal Form

• The domain of an attribute must include only atomic values, and the value of any attribute in a tuple must be a single value from the domain of that attribute
• No relations within relations or relations as attribute values within tuples
• Considered part of the formal definition of a relation in the basic (flat) relational model
– In other words, an implicit constraint

1NF – First Normal Form

• A relation is in first normal form if every attribute in every row can contain only one single (atomic) value.

Examples: 1NF?

Student(FirstName, LastName, Knowledge)

FirstName  LastName  Knowledge
Thomas     Mueller   Java, C++, PHP
Ursula     Meier     PHP, Java
Igor       Mueller   C++, Java

The attribute Knowledge can contain multiple values, and therefore the relation is not in first normal form.

But the attributes FirstName and LastName are atomic attributes that can contain only one value.

Examples: 1NF violation

Before (violates 1NF):
FirstName  LastName  Knowledge
Thomas     Mueller   Java, C++, PHP
Ursula     Meier     PHP, Java
Igor       Mueller   C++, Java

After (in 1NF):
FirstName  LastName  Knowledge
Thomas     Mueller   Java
Thomas     Mueller   C++
Thomas     Mueller   PHP
Ursula     Meier     PHP
Ursula     Meier     Java
Igor       Mueller   C++
Igor       Mueller   Java
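
The conversion to 1NF shown above amounts to emitting one row per atomic value. A minimal sketch, assuming the multi-valued Knowledge attribute is stored as a comma-separated string (the function name `to_1nf` is illustrative):

```python
students = [
    ("Thomas", "Mueller", "Java, C++, PHP"),
    ("Ursula", "Meier", "PHP, Java"),
    ("Igor", "Mueller", "C++, Java"),
]


def to_1nf(rows):
    """Emit one (first, last, skill) row per value in the Knowledge column."""
    flat = []
    for first, last, knowledge in rows:
        for skill in knowledge.split(","):
            flat.append((first, last, skill.strip()))
    return flat


flat = to_1nf(students)
for row in flat:
    print(row)
```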

Examples: 1NF?

Assume a video library maintains a database of movies rented out. Without any normalization, all information is stored in one table as shown below.

Full names | Physical address | Movies rented | Salutation
Janet Jones | First Street Plot No 4 | Pirates of the Caribbean; Clash of the Titans | Ms.
Robert Phil | 3rd Street 34 | Forgetting Sarah Marshall; Daddy’s Little Girls | Mr.
Robert Phil | 5th Avenue | Clash of the Titans | Mr.

Examples: 1NF

Full names | Physical address | Movies rented | Salutation
Janet Jones | First Street Plot No 4 | Pirates of the Caribbean | Ms.
Janet Jones | First Street Plot No 4 | Clash of the Titans | Ms.
Robert Phil | 3rd Street 34 | Forgetting Sarah Marshall | Mr.
Robert Phil | 3rd Street 34 | Daddy’s Little Girls | Mr.
Robert Phil | 5th Avenue | Clash of the Titans | Mr.

Examples 1NF?

[Figure: example table; not recovered]

1NF Violation

[Figure: 1NF violation example; not recovered]

Important FD Definitions

Trivial FD: X → Y, Y ⊆ X
Non-prime attribute: An attribute that does not occur in any key (opposite: prime)
Full FD: X → Y, ∀A ∈ X ((X – {A}) ↛ Y)
Transitive FD: X → Y and Y → Z ∴ X → Z

2NF – Second Normal Form

• 1NF AND every non-prime attribute is fully functionally dependent on the primary key.
– Must test all FDs whose LHS is part of the PK
• To fix, decompose into relations in which non-prime attributes are associated only with the part of the primary key on which they are fully functionally dependent.

2NF – Second Normal Form

• A relation is in second normal form if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

Example 2NF?

StudentID  Course   StudentAddress
1          COMP570  555 Huntington
1          COMP285  555 Huntington
2          COMP570  610 Huntington
3          COMP355  Louis Prang
3          COMP553  Louis Prang

{StudentID, Course} → {StudentAddress}
{StudentID} → {StudentAddress}

StudentID  StudentAddress
1          555 Huntington
2          610 Huntington
3          Louis Prang

StudentID  Course
1          COMP570
1          COMP285
2          COMP570
3          COMP355
3          COMP553

Examples 2NF?

• Students(IDSt, StudentName, IDProf, ProfessorName, Grade)
F = {IDProf → ProfessorName; IDSt → StudentName; IDSt, IDProf → Grade}
The attributes IDSt and IDProf together form the identification key.

Students
IDSt  StudentName  IDProf  ProfessorName  Grade
1     Mueller      3       Schmid         5
2     Meier        2       Borner         4
3     Tobler       1       Bernasconi     3

Examples 2NF?

• All attributes are single-valued (1NF).

Students
IDSt  StudentName
1     Mueller
2     Meier
3     Tobler

Professors
IDProf  ProfessorName
1       Bernasconi
2       Borner
3       Schmid

Grade
IDSt  IDProf  Grade
1     3       5
2     2       4
3     1       3

Examples 2NF?

• Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

Teacher
Teacher_id  Subject    Teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Examples 2NF?

Teacher(Teacher_id, Subject, Teacher_age)
F = {Teacher_id, Subject → Teacher_age; Teacher_id → Teacher_age}
• Only key is: {Teacher_id, Subject}

Teacher
Teacher_id  Subject    Teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Examples 2NF?

• To make the table comply with 2NF, we can break it into two tables like this.

Teacher
Teacher_id  Teacher_age
111         38
222         38
333         40

Teacher_Subject
Teacher_id  Subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry
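
The 2NF test used in the Teacher example can be sketched as a search for partial dependencies: proper subsets of the key whose closure reaches a non-prime attribute. This is a minimal sketch that assumes a single candidate key, as in the Teacher relation; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds):
    """Pairs (subset, attribute): a proper subset of the key determines a
    non-prime attribute. Assumes `key` is the only candidate key."""
    found = []
    ordered = sorted(key)
    for size in range(1, len(ordered)):
        for subset in combinations(ordered, size):
            for attr in sorted(closure(subset, fds) - set(key)):
                found.append((subset, attr))
    return found


fds = [
    (("Teacher_id", "Subject"), ("Teacher_age",)),
    (("Teacher_id",), ("Teacher_age",)),
]
violations = partial_dependencies({"Teacher_id", "Subject"}, fds)
print(violations)
```

An empty result would mean no partial dependency was found; here Teacher_id alone determines Teacher_age, which is why Teacher_age moves to its own table.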

2NF Can Suffer Update Anomalies

Year  Winner           Nationality
1994  Miguel Indurain  Spain
1995  Miguel Indurain  Spain
1996  Bjarne Riis      Denmark
1997  Jan Ullrich      Germany

• Relation is in 2NF?
– Trivially true (why?)
• List all non-trivial FDs for this relation state
{Year} → {Winner, Nationality}
{Winner} → {Nationality}
• What if we insert (1998, Jan Ullrich, USA)?
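
The effect of that insert can be checked mechanically. The sketch below tests whether an FD holds in a relation state; `fd_holds` is an illustrative helper, not part of the lecture material.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


winners = [
    {"Year": 1994, "Winner": "Miguel Indurain", "Nationality": "Spain"},
    {"Year": 1995, "Winner": "Miguel Indurain", "Nationality": "Spain"},
    {"Year": 1996, "Winner": "Bjarne Riis", "Nationality": "Denmark"},
    {"Year": 1997, "Winner": "Jan Ullrich", "Nationality": "Germany"},
]
before = fd_holds(winners, ["Winner"], ["Nationality"])

# The insert from the slide: same winner, different nationality.
winners.append({"Year": 1998, "Winner": "Jan Ullrich", "Nationality": "USA"})
after = fd_holds(winners, ["Winner"], ["Nationality"])
key_fd = fd_holds(winners, ["Year"], ["Winner", "Nationality"])
print(before, after, key_fd)
```

The key FD on Year still holds after the insert, so the key constraint alone does not catch the inconsistency in Winner → Nationality.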

Exercise 2NF?

Patients(StaffNo, ApptDate, ApptTime, DentistName, PatientNo, PatientName, SurgeryNo)

F = {StaffNo, ApptDate, ApptTime → PatientNo, PatientName;
StaffNo → DentistName;
PatientNo → PatientName, SurgeryNo;
StaffNo, ApptDate → SurgeryNo;
ApptDate, ApptTime, PatientNo → StaffNo, DentistName}

Exercise 2NF?

R(ABCDEGH)
F = {ABC → EG, A → D, E → GH, AB → H, BCE → AD}
SA: BC
IA: AE
Keys: ABC, BCE
R1(AD) F1 = {A → D}
R2(ABCEGH)
F2 = {ABC → EG, E → GH, AB → H, BCE → A}
Keys: ABC, BCE
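
The keys listed for R can be verified with a brute-force search over attribute subsets using attribute closure. This is a sketch for small schemas only (it is exponential in the number of attributes); attributes are single letters, so each side of an FD is written as a string.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # a smaller key is contained in it: not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


fds = [("ABC", "EG"), ("A", "D"), ("E", "GH"), ("AB", "H"), ("BCE", "AD")]
keys = ["".join(sorted(k)) for k in candidate_keys("ABCDEGH", fds)]
print(keys)
```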

Exercise 2NF?

R2(ABCEGH)
F2 = {ABC → EG, E → GH, AB → H, BCE → A}
Keys: ABC, BCE

R21(EGH) F21 = {E → GH}
R22(ABCE) F22 = {ABC → E}

3NF – Third Normal Form

• 2NF AND every non-prime attribute is non-transitively dependent on every key.

“A non-key field must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.”

• To fix, decompose into multiple relations, whereby the intermediate non-key attribute(s) functionally determine other non-prime attributes.

3NF – Third Normal Form

A table design is said to be in 3NF if both of the following conditions hold:
• The table must be in 2NF
• Transitive functional dependencies of non-prime attributes on any super key should be removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.

3NF – Third Normal Form

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X → Y, at least one of the following conditions holds:
• X is a super key of the table
• Y is a prime attribute of the table
An attribute that is part of one of the candidate keys is known as a prime attribute.
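
The two conditions above translate directly into a check over the given FDs. The sketch below finds candidate keys by brute force and reports each FD whose left side is not a superkey and whose right side contains a non-prime attribute. It checks only the FDs listed in F, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


def third_nf_violations(attrs, fds):
    """(lhs, attribute) pairs where lhs is not a superkey and the
    attribute on the right side is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # lhs is a superkey
        for attr in sorted(set(rhs) - set(lhs)):
            if attr not in prime:
                bad.append((tuple(lhs), attr))
    return bad


attrs = {"Year", "Winner", "Nationality"}
fds = [(("Year",), ("Winner", "Nationality")), (("Winner",), ("Nationality",))]
violations = third_nf_violations(attrs, fds)
print(violations)
```

For the winners relation used earlier in the lecture, the only reported violation is Winner → Nationality.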

3NF Example

F = {Year → Winner, Nationality; Winner → Nationality}

Year  Winner           Nationality
1994  Miguel Indurain  Spain
1995  Miguel Indurain  Spain
1996  Bjarne Riis      Denmark
1997  Jan Ullrich      Germany

Year  Winner
1994  Miguel Indurain
1995  Miguel Indurain
1996  Bjarne Riis
1997  Jan Ullrich

Winner           Nationality
Miguel Indurain  Spain
Bjarne Riis      Denmark
Jan Ullrich      Germany

Examples: 3NF

Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:

Employees(Emp_id, Emp_Name, Emp_zip, Emp_state, Emp_city, Emp_district)

F = {Emp_zip → Emp_state, Emp_city, Emp_district; Emp_id → Emp_zip; Emp_id → Emp_Name}

Only key is {Emp_id}

Examples: 3NF

A bank uses the following relation:

Vendors(ID, Name, Account_No, Bank_Code_No, Bank)

F = {ID → Name, Account_No, Bank_Code_No; Bank_Code_No → Bank}

Only key is {ID}

Exercises: 3NF

T(A, B, C, D, E)
[Figure: FDs of T; diagram not recovered]

Consider the schema for relation T, as well as all FDs. What is the normal form of T? If T violates 3NF, provide a 3NF decomposition that satisfies the FDs (including the primary key) and does not produce spurious tuples. Show and explain all steps of your analysis and decomposition (if applicable).

Boyce-Codd Normal Form (BCNF)

• We say a relation R is in BCNF if whenever X → Y is a nontrivial FD that holds in R, X is a superkey
o Remember: nontrivial means Y is not contained in X
o Remember: a superkey is any superset of a key (not necessarily a proper superset)

Boyce-Codd Normal Form (BCNF)

• It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF.
• A table complies with BCNF if:
o It is in 3NF and
o for every non-trivial functional dependency X → Y, X is a super key of the table.

Examples: BCNF

Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: F = {name → addr, favBeer; beersLiked → manf}
Only key is {name, beersLiked}
In each FD, the left side is not a superkey
Any one of these FD’s shows Drinkers is not in BCNF

Another Example

Beers(name, manf, manfAddr)
FD’s: F = {name → manf, manf → manfAddr}
Only key is {name}
name → manf does not violate BCNF, but manf → manfAddr does

Decomposition into BCNF

Given: relation R with FD’s F
Look among the given FD’s for a BCNF violation X → Y
– If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF
Compute X+
– Not all attributes, or else X is a superkey

Decompose R Using X → Y

Replace R by relations with schemas:
1. R1 = X+
2. R2 = R – (X+ – X)
Project the given FD’s F onto the two new relations

[Figure: R split into the regions R – X+, X, and X+ – X; R1 covers X and X+ – X, R2 covers R – X+ and X]

Examples: BCNF?

• Let’s take R = {A, B, C, D, E, G} and F = {BC → D, CD → E}
• Key is {A, B, C, G}
• For example, we use the FD BC → D to decompose R into two relations R1 and R2
• X = BC and X+ = BCDE
• R1 = BCDE, R2 = ABCG
• It means R1 ∩ R2 = X

Examples: BCNF Decomposition

Drinkers(name, addr, beersLiked, manf, favBeer)
F = {name → addr, name → favBeer, beersLiked → manf}
• Pick BCNF violation name → addr
• Close the left side:
{name}+ = {name, addr, favBeer}
• Decomposed relations:
1. Drinkers1(name, addr, favBeer)
2. Drinkers2(name, beersLiked, manf)

Examples: BCNF Decomposition

• We are not done; we need to check Drinkers1 and Drinkers2 for BCNF
• Projecting FD’s is easy here
• For Drinkers1(name, addr, favBeer), the relevant FD’s are F = {name → addr, name → favBeer}
Thus, {name} is the only key and Drinkers1 is in BCNF

Examples: BCNF Decomposition

For Drinkers2(name, beersLiked, manf), the only FD is beersLiked → manf, and the only key is {name, beersLiked}
– Violation of BCNF
{beersLiked}+ = {beersLiked, manf}, so we decompose Drinkers2 into:
1. Drinkers3(beersLiked, manf)
2. Drinkers4(name, beersLiked)

Examples: BCNF Decomposition

The resulting decomposition of Drinkers:
1. Drinkers1(name, addr, favBeer)
2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like
Compare with running example:
1. Drinkers(name, addr, phone)
2. Beers(name, manf)
3. Likes(drinker, beer)
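
The decomposition steps for Drinkers can be sketched as a recursive procedure. Instead of projecting FD's explicitly, this sketch searches each sub-relation for an attribute set X whose closure (restricted to that sub-relation) is neither X itself nor the whole sub-relation, then splits into X+ and R – (X+ – X). The search is exponential, so it only suits small schemas, and it may pick violations in a different order than the slides. Function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for a BCNF violation, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel on violations until every piece is in BCNF."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus                  # R1 = X+
    r2 = rel - (x_plus - x)      # R2 = R - (X+ - X)
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
fds = [
    (("name",), ("addr",)),
    (("name",), ("favBeer",)),
    (("beersLiked",), ("manf",)),
]
result = bcnf_decompose(drinkers, fds)
for schema in result:
    print(sorted(schema))
```

For this input the sketch arrives at the same three schemas as the slides.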

Exercises: BCNF

• Suppose there is a company wherein employees work in more than one department. They store the data like this:
• Employees(Emp_id, Emp_Nationality, Emp_Dept, Dept_type, Dept_no_of_emp)
• F = {Emp_id → Emp_Nationality; Emp_Dept → Dept_type, Dept_no_of_emp}
• Only key is {Emp_id, Emp_Dept}

BCNF – Motivation

There is one structure of FD’s that causes trouble when we decompose
AB → C and C → B
Example:
A = street address; B = city; C = post code
There are two keys, {A, B} and {A, C}
C → B is a BCNF violation, so we must decompose into AC, BC

We Cannot Enforce FD’s

The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB → C by checking FD’s in these decomposed relations
Example with A = street, B = city, and C = post code on the next slide

An Unenforceable FD

street      post
Campusvej   5230
Vestergade  5000

city    post
Odense  5230
Odense  5000

Join tuples with equal post codes

street      city    post
Campusvej   Odense  5230
Vestergade  Odense  5000

No FD’s were violated in the decomposed relations, and the FD street, city → post holds for the database as a whole

An Unenforceable FD

street       post
Hjallesevej  5230
Hjallesevej  5000

city    post
Odense  5230
Odense  5000

Join tuples with equal post codes

street       city    post
Hjallesevej  Odense  5230
Hjallesevej  Odense  5000

Although no FD’s were violated in the decomposed relations, the FD street, city → post is violated by the database as a whole.
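
The situation above can be replayed in a few lines: the local FD post → city holds in the (city, post) relation, yet the joined relation violates street, city → post. The helper `determines` is illustrative, and relations are modeled as lists of tuples.

```python
def determines(pairs):
    """True if the first component of each pair determines the second."""
    seen = {}
    return all(seen.setdefault(lhs, rhs) == rhs for lhs, rhs in pairs)


street_post = [("Hjallesevej", "5230"), ("Hjallesevej", "5000")]
city_post = [("Odense", "5230"), ("Odense", "5000")]

# Local check on the (city, post) relation: post -> city.
local_ok = determines([(post, city) for city, post in city_post])

# Join the two relations on equal post codes.
joined = [
    (street, city, post)
    for street, post in street_post
    for city, post2 in city_post
    if post == post2
]

# Global check on the join: street, city -> post.
global_ok = determines([((street, city), post) for street, city, post in joined])
print(local_ok, global_ok)
```

Detecting the violation requires the join, which is the cost of giving up dependency preservation.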

Another Unenforceable FD

Departures(time, track, train)
F = {time, track → train; train → track}
Two keys, {time, track} and {time, train}
train → track is a BCNF violation, so we must decompose into
Departures1(time, train)
Departures2(track, train)

Another Unenforceable FD

time   train
19:08  ICL54
19:16  IC852

track  train
4      ICL54
3      IC852

Join tuples with equal train code

time   track  train
19:08  4      ICL54
19:16  3      IC852

No FD’s were violated in the decomposed relations, and the FD time, track → train holds for the database as a whole

Another Unenforceable FD

time   train
19:08  ICL54
19:08  IC 42

track  train
4      ICL54
4      IC 42

Join tuples with equal train code

time   track  train
19:08  4      ICL54
19:08  4      IC 42

Although no FD’s were violated in the decomposed relations, the FD time, track → train is violated by the database as a whole.

Examples: Decomposition into BCNF

1. Let’s take R(ABCDE) and FD’s F = {A → BC, C → DE}
2. Let’s take R(ABCD) and FD’s F = {AB → C, B → D, C → A}

Exercise

• The table shown below is susceptible to update anomalies. Provide examples of insertion, deletion, and modification anomalies. Decompose it step by step to achieve BCNF if it is not already in BCNF.

Staff No | Dentist Name | Patient No | Patient Name | Appointment Date | Appointment Time | Surgery No
S1011 | Tony Smith | P100 | Gillian White | 12-Aug-03 | 10.00 | S10
S1011 | Tony Smith | P105 | Jill Bell | 13-Aug-03 | 12.00 | S15
S1024 | Helen Pearson | P108 | Ian Mackay | 12-Sep-03 | 10.00 | S10
S1024 | Helen Pearson | P108 | Ian Mackay | 14-Sep-03 | 10.00 | S10
S1032 | Robin Plevin | P105 | Jill Bell | 14-Oct-03 | 16.30 | S15
S1032 | Robin Plevin | P110 | John Walker | 15-Oct-03 | 18.00 | S13

Exercise

[Figure: exercise slide; content not recovered]

Multivalued dependencies

• A multivalued dependency (MVD) has the form X ↠ Y, where X and Y are sets of attributes in a relation R.
• X ↠ Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two rows that are also in R

X  Y   Z
a  b1  c1
a  b2  c2
a  b2  c1
a  b1  c2
…  …   …
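
The swap condition can be tested directly on a relation instance. The sketch below uses the four rows from the table above; `satisfies_mvd` is an illustrative helper, and rows are tuples in the column order given by `attrs`.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->> Y on an instance: for every pair of rows agreeing on X,
    the row taking its Y values from one and the rest from the other
    must also be present."""
    pos = {a: i for i, a in enumerate(attrs)}
    rel = set(rows)
    for t in rel:
        for u in rel:
            if all(t[pos[a]] == u[pos[a]] for a in x):
                swapped = tuple(
                    u[pos[a]] if a in y else t[pos[a]] for a in attrs
                )
                if swapped not in rel:
                    return False
    return True


attrs = ("X", "Y", "Z")
rows = [
    ("a", "b1", "c1"),
    ("a", "b2", "c2"),
    ("a", "b2", "c1"),
    ("a", "b1", "c2"),
]
full = satisfies_mvd(rows, attrs, {"X"}, {"Y"})
partial = satisfies_mvd(rows[:3], attrs, {"X"}, {"Y"})
print(full, partial)
```

Dropping the last row breaks the dependency, because swapping the Y components of the first two rows then produces a row that is missing.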

MVD examples

User(uid, gid, place)
• uid ↠ gid
• uid ↠ place
– Intuition: given uid, gid and place are “independent”
• uid, gid ↠ place
– Trivial: LHS ∪ RHS = all attributes of R
• uid, gid ↠ uid
– Trivial: LHS ⊇ RHS

Complete MVD + FD rules

• FD reflexivity, augmentation, and transitivity
• MVD complementation:
– If X ↠ Y, then X ↠ attrs(R) − X − Y
• MVD augmentation:
– If X ↠ Y and V ⊆ W, then XW ↠ YV
• MVD transitivity:
– If X ↠ Y and Y ↠ Z, then X ↠ Z − Y

Complete MVD + FD rules

• Replication (FD is MVD):
– If X → Y, then X ↠ Y
• Coalescence:
– If X ↠ Y and Z ⊆ Y and there is some W disjoint from Y such that W → Z, then X → Z

An elegant solution: chase

• Given a set of FD’s and MVD’s 𝒟, does another dependency d (FD or MVD) follow from 𝒟?
Procedure
• Start with the “if-part” of d, and treat them as “seed” tuples in a relation
• Apply the given dependencies in 𝒟 repeatedly
– If we apply an FD, we infer equality of two symbols
– If we apply an MVD, we infer more tuples

An elegant solution: chase

• If we infer the “then-part” of d, we have a proof
• Otherwise, if nothing more can be inferred, we have a counterexample

Proof by chase

• In R(A, B, C, D), do A ↠ B and B ↠ C imply that A ↠ C?

[Figure: chase tableau; not recovered]

Another proof by chase

• In R(A, B, C, D), do A → B and B → C imply that A → C?

A → B  b1 = b2
B → C  c1 = c2
• In general, with both MVD’s and FD’s, chase can generate both new tuples and new equalities

Counterexample by chase

• In R(A, B, C, D), do A ↠ BC and CD → B imply that A → B?

[Figure: chase tableau; not recovered]
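
The three chase questions above can be worked through with a small sketch of the procedure. It seeds two tuples that agree exactly on the left side of the goal, applies FD's by renaming symbols and MVD's by adding tuples until nothing changes, and then looks for the then-part. The encoding (single-letter attributes, dependencies as `(kind, lhs, rhs)` triples, symbols such as `b1`) is an assumption made for illustration.

```python
def chase(attrs, deps, goal):
    """Return True if goal follows from deps, by chasing two seed tuples.

    Each dependency is (kind, lhs, rhs) with kind "fd" or "mvd"; lhs and
    rhs are strings of single-letter attribute names.
    """
    kind, gx, gy = goal
    pos = {a: i for i, a in enumerate(attrs)}
    # Seed tuples share a symbol exactly on the goal's left-hand side.
    t1 = tuple(a.lower() if a in gx else a.lower() + "1" for a in attrs)
    t2 = tuple(a.lower() if a in gx else a.lower() + "2" for a in attrs)
    rel = {t1, t2}

    def step():
        """Apply one dependency once; return False when nothing applies."""
        nonlocal rel, t1, t2
        for dkind, x, y in deps:
            for t in rel:
                for u in rel:
                    if any(t[pos[a]] != u[pos[a]] for a in x):
                        continue
                    if dkind == "fd":
                        for a in y:
                            old, new = t[pos[a]], u[pos[a]]
                            if old != new:
                                # FD: the two symbols must be equal.
                                def ren(row):
                                    return tuple(new if v == old else v for v in row)
                                rel = {ren(r) for r in rel}
                                t1, t2 = ren(t1), ren(t2)
                                return True
                    else:
                        # MVD: Y values from u, everything else from t.
                        w = tuple(u[pos[a]] if a in y else t[pos[a]] for a in attrs)
                        if w not in rel:
                            rel = rel | {w}
                            return True
        return False

    while step():
        pass

    if kind == "fd":
        return all(t1[pos[a]] == t2[pos[a]] for a in gy)
    target = tuple(t1[pos[a]] if a in gx or a in gy else t2[pos[a]] for a in attrs)
    return target in rel


proof_mvd = chase("ABCD", [("mvd", "A", "B"), ("mvd", "B", "C")], ("mvd", "A", "C"))
proof_fd = chase("ABCD", [("fd", "A", "B"), ("fd", "B", "C")], ("fd", "A", "C"))
counter = chase("ABCD", [("mvd", "A", "BC"), ("fd", "CD", "B")], ("fd", "A", "B"))
print(proof_mvd, proof_fd, counter)
```

When the function returns False, the final set of tuples is itself a counterexample instance: it satisfies the given dependencies but not the goal.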

4NF

A relation R is in Fourth Normal Form (4NF) if
• For every non-trivial MVD X ↠ Y in R, X is a superkey
• That is, all FD’s and MVD’s follow from “key → other attributes” (i.e., no MVD’s and no FD’s besides key functional dependencies)
• 4NF is stronger than BCNF, because every FD is also an MVD

4NF decomposition algorithm

• Find a 4NF violation: a non-trivial MVD X ↠ Y in R where X is not a superkey
• Decompose R into R1 and R2, where
– R1 has attributes X ∪ Y
– R2 has attributes X ∪ Z (where Z contains the attributes of R not in X or Y)
• Repeat until all relations are in 4NF
• Almost identical to the BCNF decomposition algorithm
• Any decomposition on a 4NF violation is lossless

4NF decomposition example

[Figure: 4NF decomposition example; not recovered]

Summary

• Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key!
– You could have multiple keys though
• Other normal forms
– 3NF: More relaxed than BCNF; will not remove redundancy if doing so makes FDs harder to enforce
– 2NF: Slightly more relaxed than 3NF
– 1NF: All column values must be atomic

Summary

• Normalization is the theory and process by which to evaluate and improve relational database design
– Makes the schema informative
– Minimizes information duplication
– Avoids modification anomalies
– Disallows spurious tuples
• Make sure all your relations are at least 3NF!
– Higher normal forms exist
– We may reduce the normal form (denormalize) during physical design

Thank you for your attention!