Schema Refinement and Normal Forms: Also Used

The document discusses schema refinement and normal forms in database design. It describes how normal forms like 2NF, 3NF, and BCNF help avoid certain types of redundancy and anomalies. The key points are: - Normal forms help determine if decomposing a relation will reduce redundancy based on its functional dependencies. - 2NF and 3NF aim to eliminate partial and transitive dependencies respectively, though 3NF allows some redundancy. - BCNF most strictly eliminates non-trivial functional dependencies other than key constraints to fully remove redundancy based on dependencies. - Decomposing a relation into smaller relations can help achieve normal forms to reduce redundancy, though some queries may become more complex and information may need re

Uploaded by

ramuappala
Copyright
© Attribution Non-Commercial (BY-NC)

Schema Refinement and Normal Forms
Chapter 19
Also used:
www.cs.rpi.edu/~zaki/cs4380/Spring02/lectures/lecture10.ppt
http://www.bkent.net/Doc/simple5.htm
Wikipedia

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke


Normal Forms
• Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed!
• If a relation is in a certain normal form (BCNF, 3NF, etc.), it is known that certain kinds of problems are avoided or minimized. This can be used to help us decide whether decomposing the relation will help.
• Role of FDs in detecting redundancy:
  - Consider a relation R with 3 attributes, ABC.
    * No FDs hold: there is no redundancy here.
    * Given A → B: several tuples could have the same A value, and if so, they'll all have the same B value!

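The role of an FD such as A → B can be made concrete by checking it against a relation instance. The sketch below is only an illustration: the helper name `fd_holds` and the sample rows are assumptions, not part of the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first tuple with this lhs value fixes the rhs value; later ones must agree.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical instance of R(A, B, C): tuples that share A also share B (redundancy).
r = [
    {"A": 1, "B": 10, "C": "x"},
    {"A": 1, "B": 10, "C": "y"},
    {"A": 2, "B": 20, "C": "x"},
]

print(fd_holds(r, ["A"], ["B"]))  # True: B is repeated for every tuple with the same A
print(fd_holds(r, ["A"], ["C"]))  # False
```

A check like this can only refute an FD on a given instance; whether the FD holds for the schema is a design decision.
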
Second Normal Form
• Partial dependency: Y → A is a partial dependency if Y is a proper subset of a candidate key and A is not part of any candidate key (A is non-prime).
• Ideally A should depend on the whole key (a fact about the whole key).
• Only relevant when we have a composite key.
• A relation conforms to second normal form if it does not contain any partial dependencies.
[Figure: the key, its proper subset Y, and the attribute A]


2NF Example
• Completed(studentId, courseId, department, grade)
  - Here studentId → department is a partial dependency, since the key is the composite (studentId, courseId).
• Inventory(PART, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS)
  - WAREHOUSE → WAREHOUSE-ADDRESS is a partial dependency.
• These result in redundancy and insert, delete, and update anomalies. (How can we fix these two relations?)

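The two partial dependencies above can be found mechanically. The following is a minimal sketch: the function name `partial_dependencies` is hypothetical, and the FDs from the full key to grade and to QUANTITY are assumptions added so that each key actually determines its relation. Only the listed FDs are inspected, not their full closure.

```python
def partial_dependencies(candidate_keys, fds):
    """List (lhs, attribute) pairs where a non-prime attribute depends on
    a proper subset of some candidate key."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        # "<" on frozensets means "is a proper subset of".
        if not any(lhs < key for key in candidate_keys):
            continue
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                found.append((set(lhs), attr))
    return found


completed_keys = [frozenset({"studentId", "courseId"})]
completed_fds = [
    (frozenset({"studentId", "courseId"}), frozenset({"grade"})),  # assumed
    (frozenset({"studentId"}), frozenset({"department"})),
]
inventory_keys = [frozenset({"PART", "WAREHOUSE"})]
inventory_fds = [
    (frozenset({"PART", "WAREHOUSE"}), frozenset({"QUANTITY"})),  # assumed
    (frozenset({"WAREHOUSE"}), frozenset({"WAREHOUSE-ADDRESS"})),
]

print(partial_dependencies(completed_keys, completed_fds))
print(partial_dependencies(inventory_keys, inventory_fds))
```
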
Transitive Dependency
• Happens when an attribute is not directly related to the key.
• If K is the key, X is not a key, and X → Y, then K → Y is inferred only transitively, so Y is transitively dependent on the key.
• It happens when a non-key field is a fact about another non-key field, e.g. (EMPLOYEE, DEPARTMENT, LOCATION).
• Below, {Tournament, Year} → Winner and Winner → Winner Date of Birth.

Tournament           | Year | Winner         | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open       | 1999 | Bob Albertson  | 28 September 1968
Des Moines Masters   | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Boyce-Codd Normal Form (BCNF)
• Reln R with FDs F is in BCNF if, for all X → A in F+:
  - A ∈ X (called a trivial FD), or
  - X contains a key for R.
• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.
  - No redundancy in R that can be predicted using FDs alone.
• A non-key field must provide a fact about the key, the whole key, and nothing but the key.
• There are no transitive dependencies.
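The BCNF condition can be tested with attribute closures. The sketch below is illustrative only: `fd`, `closure`, and `bcnf_violations` are hypothetical helper names, and attributes are single letters. It uses the SNLRWH relation with S → SNLRWH and R → W from later in these slides. For the original relation R it is enough to inspect the FDs in F itself; that shortcut does not carry over to the pieces of a decomposition.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation, fds):
    """FDs in fds that are non-trivial and whose left side is not a superkey."""
    relation = frozenset(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]


fds = [fd("S", "SNLRWH"), fd("R", "W")]
for lhs, rhs in bcnf_violations("SNLRWH", fds):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))  # R -> W
```
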


Third Normal Form (3NF)
• Reln R with FDs F is in 3NF if, for all X → A in F+:
  - A ∈ X (called a trivial FD), or
  - X contains a key for R, or
  - A is part of some key for R.
• Minimality of a key is crucial in the third condition above!
• If R is in BCNF, it is obviously in 3NF.
• If R is in 3NF, some redundancy is possible. It is a compromise, used when BCNF is not achievable (e.g., no "good" decomposition, or performance considerations).
  - A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible (unlike BCNF).

What Does 3NF Achieve?
• If 3NF is violated by X → A, one of the following holds:
  - X is a proper subset of some key K (partial dependency).
    * We store (X, A) pairs redundantly.
  - X is not a proper subset of any key (transitive dependency).
    * There is a chain of FDs K → X → A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value.
• But: even if a reln is in 3NF, some problems could arise.
• Thus, 3NF is indeed a compromise relative to BCNF.


3NF Shortcomings
• Assume credit card info is also recorded in Reserves:
  - Reserves(Sid, Bid, Day, CreditCardNo), SBDC in short
• S → C and C → S, therefore the candidate keys are SBD and CBD.
• So every attribute is part of some key, in which case 3NF is guaranteed.
• Yet, for each reservation of sailor S, the same (S, C) pair is stored redundantly.
• Thus, 3NF is indeed a compromise relative to BCNF.
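The SBDC example can be verified by brute force: enumerate candidate keys, then test each FD against the 3NF and BCNF conditions. This is a sketch under assumptions; the helper names are hypothetical, attributes are single letters, and only the FDs listed in F are inspected.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure covers the relation."""
    relation = frozenset(relation)
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):  # smallest sets first, so keys stay minimal
        for combo in combinations(attrs, size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= relation:
                keys.append(cand)
    return keys


def normal_form_report(relation, fds):
    """Return (candidate keys, in_3nf, in_bcnf) for relation under fds."""
    relation = frozenset(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue  # left side is a superkey: fine for both forms
        for attr in rhs - lhs:
            in_bcnf = False
            if attr not in prime:
                in_3nf = False
    return keys, in_3nf, in_bcnf


fds = [(frozenset("S"), frozenset("C")), (frozenset("C"), frozenset("S"))]
keys, in_3nf, in_bcnf = normal_form_report("SBDC", fds)
print(sorted("".join(sorted(k)) for k in keys))  # ['BCD', 'BDS']
print(in_3nf, in_bcnf)  # True False
```
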


Tutor ID | Tutor Soc. Security Num. | Student ID
1078     | 088-51-0074              | 31850
1078     | 088-51-0074              | 37921
1293     | 096-77-4146              | 46224
1480     | 072-21-2223              | 31850

• The table shows which tutors are assigned to which students.
• Candidate keys are {Tutor ID, Student ID} and {Tutor SSN, Student ID}.
• 2NF prohibits partial functional dependencies of non-prime attributes on candidate keys.
• 3NF prohibits transitive functional dependencies of non-prime attributes on candidate keys. This table is in 3NF since it has no non-prime attributes.
• The dependency of Tutor ID on Tutor Social Security Number violates BCNF, since Tutor SSN alone is not a candidate key.


Decomposition of a Relation Scheme
• Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:
  - Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
  - Every attribute of R appears as an attribute of one of the new relations.
• Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.


Problems due to R → W:
• Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
• Insertion anomaly: What if we want to insert an employee and don't know the hourly wage for his rating?
• Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Hourly Worker (SNLRWH):
S           | N         | L  | R | W  | H
123-22-3666 | Attishoo  | 48 | 8 | 10 | 40
231-31-5368 | Smiley    | 22 | 8 | 10 | 30
131-24-3650 | Smethurst | 35 | 5 | 7  | 30
434-26-3751 | Guldu     | 35 | 5 | 7  | 32
612-67-4134 | Madayan   | 35 | 8 | 10 | 40

Will 2 smaller tables be better?

Hourly_Emps2:
S           | N         | L  | R | H
123-22-3666 | Attishoo  | 48 | 8 | 40
231-31-5368 | Smiley    | 22 | 8 | 30
131-24-3650 | Smethurst | 35 | 5 | 30
434-26-3751 | Guldu     | 35 | 5 | 32
612-67-4134 | Madayan   | 35 | 8 | 40

Wages:
R | W
8 | 10
5 | 7

Example Decomposition
• Decompositions should be used only when needed.
  - SNLRWH has FDs S → SNLRWH and R → W.
  - The second FD causes a violation of 3NF; W values are repeatedly associated with R values. The easiest way to fix this is to create a relation RW to store these associations, and to remove W from the main schema:
    * i.e., we decompose SNLRWH into SNLRH and RW.
• The information to be stored consists of SNLRWH tuples. If we just store the projections of these tuples onto SNLRH and RW, are there any potential problems that we should be aware of?

Problems with Decompositions
• There are three potential problems to consider:
  (1) Some queries become more expensive.
    * e.g., How much did sailor Joe earn? (salary = W*H)
  (2) Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
    * Fortunately, not in the SNLRWH example.
  (3) Checking some dependencies may require joining the instances of the decomposed relations.
    * Fortunately, not in the SNLRWH example.
• Tradeoff: Must consider these issues vs. redundancy.

Lossless Join Decompositions
• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  - π_X(r) ⋈ π_Y(r) = r
• It is always true that r ⊆ π_X(r) ⋈ π_Y(r).
  - In general, the other direction does not hold! If it does, the decomposition is lossless-join.
• The definition extends to decomposition into 3 or more relations in a straightforward way.
• It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)

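The definition π_X(r) ⋈ π_Y(r) = r can be tried out directly on an instance. The sketch below is only illustrative: `project`, `natural_join`, `is_lossless_on`, and `table` are hypothetical helpers, and attributes are single letters. It uses the SNLRWH instance and the three-tuple ABC instance that appear in these slides. Passing on one instance does not prove the decomposition lossless in general; failing on one instance does refute it.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Natural join on the attributes the two sides share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def is_lossless_on(rows, x, y):
    """True if joining the projections onto x and y gives back exactly rows."""
    rejoined = natural_join(project(rows, x), project(rows, y))
    canon = lambda rs: {frozenset(row.items()) for row in rs}
    return canon(rejoined) == canon(rows)


def table(header, *records):
    return [dict(zip(header, rec)) for rec in records]


hourly_emps = table(
    "SNLRWH",
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
)
abc = table("ABC", (1, 2, 3), (4, 5, 6), (7, 2, 8))

print(is_lossless_on(hourly_emps, "SNLRH", "RW"))  # True
print(is_lossless_on(abc, "AB", "BC"))  # False: the join adds (1,2,8) and (7,2,3)
```
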
Is this decomposition lossless?

Tournament           | Year | Winner         | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open       | 1999 | Bob Albertson  | 28 September 1968
Des Moines Masters   | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Tournament           | Year | Winner
Indiana Invitational | 1998 | Al Fredrickson
Cleveland Open       | 1999 | Bob Albertson
Des Moines Masters   | 1999 | Al Fredrickson
Indiana Invitational | 1999 | Chip Masterson

Winner         | Winner Date of Birth
Al Fredrickson | 21 July 1975
Bob Albertson  | 28 September 1968
Chip Masterson | 14 March 1977


Is this decomposition lossless?

StudentId | CourseId | Dept | Grade
12        | CS101    | IE   | B-
12        | CS102    | IE   | C+
12        | IE201    | IE   | C-
7         | CS201    | CS   | A
7         | CS352    | CS   | D

StudentId | CourseId | Grade
12        | CS101    | B-
12        | CS102    | C+
12        | IE201    | C-
7         | CS201    | A
7         | CS352    | D

StudentId | Dept
12        | IE
7         | CS


A lossy decomposition

ABC:
A | B | C
1 | 2 | 3
4 | 5 | 6
7 | 2 | 8

AB:
A | B
1 | 2
4 | 5
7 | 2

BC:
B | C
2 | 3
5 | 6
2 | 8

Joining AB and BC produces tuples that did not originally exist in ABC:
A | B | C
1 | 2 | 3
4 | 5 | 6
7 | 2 | 8
1 | 2 | 8
7 | 2 | 3

Under what circumstances is a decomposition lossless?

When is a Join Lossless?
• The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains:
  - X ∩ Y → X, or
  - X ∩ Y → Y
• In particular, the decomposition of R into UV and R - V is lossless-join if U → V holds over R.
[Tables: ABC, its projections AB and BC, and their join, as on the previous slide]

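The test above reduces to one attribute closure: compute the closure of X ∩ Y and see whether it covers X or Y. A minimal sketch, assuming single-letter attributes and hypothetical helper names `closure` and `is_lossless`:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless(x, y, fds):
    """Binary lossless-join test: (X ∩ Y) -> X or (X ∩ Y) -> Y must be in F+."""
    x, y = frozenset(x), frozenset(y)
    common = closure(x & y, fds)
    return x <= common or y <= common


hourly_fds = [(frozenset("S"), frozenset("SNLRWH")), (frozenset("R"), frozenset("W"))]
print(is_lossless("SNLRH", "RW", hourly_fds))  # True: R -> RW

abc_fds = [(frozenset("A"), frozenset("B"))]
print(is_lossless("AB", "BC", abc_fds))  # False: B determines neither AB nor BC
```
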
Dependency Preserving Decomposition
• Consider the Contracts table CSJDPQV:
  - (Contract, Supplier, Project, Dept, Part, Quantity, Value)
  - C is the key.
  - JP → C: every project purchases a given part using the same contract.
  - SD → P: a department purchases at most one part from a supplier.
• BCNF decomposition: CSJDQV and SDP
  - Problem: Checking JP → C requires a join!
• A decomposition is dependency preserving if each constraint can be checked by looking at one table only.

Dependency Preserving Decomposition
• Dependency preserving decomposition (intuitive):
  - If R is decomposed into X, Y and Z, and we enforce the FDs that hold on X, on Y and on Z, then all FDs that were given to hold on R must also hold. (Avoids Problem (3).)
• Projection of a set of FDs F: If R is decomposed into X, ..., the projection of F onto X (denoted F_X) is the set of FDs U → V in F+ (closure of F) such that U, V are in X.
• Given F = {A → B, A → C, D → A, BEF} and ABCDEF decomposed into ABC and DEF:
  F_ABC = {A → B, A → C}+, F_DEF = {}

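The projection F_X can be computed by taking the closure of every proper subset U of X and keeping the attributes that fall inside X. The sketch below is exponential in the size of X and is meant only for small examples; the name `project_fds` is hypothetical, attributes are single letters, and the result is a cover of F_X rather than every FD in it. It uses ABC with A → B, B → C, C → A, an example that appears in these slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def project_fds(fds, x):
    """FDs U -> V of F+ with U and V inside x, one per proper subset U of x."""
    x = frozenset(x)
    attrs = sorted(x)
    projected = []
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            u = frozenset(combo)
            v = (closure(u, fds) & x) - u  # what U determines inside X, minus U itself
            if v:
                projected.append((u, v))
    return projected


def show(fds):
    return ["".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds]


F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("C"), frozenset("A")),
]
print(show(project_fds(F, "AB")))  # ['A->B', 'B->A']
print(show(project_fds(F, "BC")))  # ['B->C', 'C->B']
```

Note that B → A is not in F itself; it shows up only because the projection is taken over F+.
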
Dependency Preserving Decompositions (Contd.)
• Decomposition of R into X and Y is dependency preserving if (F_X ∪ F_Y)+ = F+
  - i.e., if we consider only dependencies in the closure F+ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F+.
• Important to consider F+, not F, in this definition:
  - ABC, A → B, B → C, C → A, decomposed into AB and BC.
  - Is this dependency preserving? Is C → A preserved?


Dependency Preserving Decompositions (Contd.)
• ABC, A → B, B → C, C → A, decomposed into AB and BC.
  - Is this dependency preserving? Is C → A preserved?
• F+ = {A → B, B → C, C → A, A → C, B → A, C → B, …}
  - F_AB = {A → B, B → A}, F_BC = {B → C, C → B}
  - (F_AB ∪ F_BC)+ includes C → A due to C → B, B → A
• Dependency preserving does not imply lossless join:
  - ABC, A → B, decomposed into AB and BC.
• And vice-versa! (Example?)

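Dependency preservation can be checked without materializing F+. For each FD X → Y, grow X by repeatedly closing its restriction to each component relation; the FD is preserved if Y ends up covered. This is a sketch of that standard test, with hypothetical helper names and single-letter attributes. The two runs use the ABC example above and the Contracts decomposition into CSJDQV and SDP.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def unpreserved(fds, parts):
    """Return the FDs in fds that cannot be enforced table by table."""
    parts = [frozenset(p) for p in parts]
    missing = []
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable using only FDs that live inside this part.
                gained = (closure(reach & part, fds) & part) - reach
                if gained:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            missing.append((lhs, rhs))
    return missing


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


abc = [fd("A", "B"), fd("B", "C"), fd("C", "A")]
print(unpreserved(abc, ["AB", "BC"]))  # []: C -> A follows from C -> B and B -> A

contracts = [fd("C", "CSJDPQV"), fd("JP", "C"), fd("SD", "P")]
lost = unpreserved(contracts, ["CSJDQV", "SDP"])
print(["".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in lost])  # ['JP->C']
```
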
Decomposition into BCNF
• Consider relation R with FDs F. If X → Y violates BCNF, decompose R into R - Y and XY.
  - Repeated application of this idea will give us a collection of relations that are in BCNF; the result is a lossless-join decomposition, and the process is guaranteed to terminate.
  - e.g., CSJDPQV, key C, JP → C, SD → P, J → S
  - To deal with SD → P, decompose into SDP, CSJDQV.
  - To deal with J → S, decompose CSJDQV into JS and CJDQV.
• In general, several dependencies may cause violation of BCNF. The order in which we "deal with" them could lead to very different sets of relations!

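The repeated splitting can be sketched as a short recursive function. This is an illustration under assumptions, not the textbook's code: the helper names are hypothetical, attributes are single letters, and violations are searched for first among the left sides of the given FDs in the order listed, then among all attribute subsets (exponential, fine for small schemas). With the FDs listed in the order below, it reproduces the SDP, JS, CJDQV result from the slide; a different order can give a different decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(rel, fds):
    """Return (X, Y) with X -> Y non-trivial inside rel and X not a superkey of rel."""
    candidates = [lhs for lhs, _ in fds if lhs <= rel]
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        candidates += [frozenset(c) for c in combinations(attrs, size)]
    for x in candidates:
        reach = closure(x, fds) & rel
        if reach != x and reach != rel:
            return x, reach - x
    return None


def bcnf_decompose(rel, fds):
    """Split rel into XY and rel - Y until no BCNF violation remains."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    return bcnf_decompose(x | y, fds) + bcnf_decompose(rel - y, fds)


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


fds = [fd("C", "CSJDPQV"), fd("JP", "C"), fd("SD", "P"), fd("J", "S")]
result = bcnf_decompose(frozenset("CSJDPQV"), fds)
print(["".join(sorted(r)) for r in result])  # ['DPS', 'JS', 'CDJQV']
```
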
BCNF and Dependency Preservation
• In general, there may not be a dependency preserving decomposition into BCNF.
  - e.g., CSZ, CS → Z, Z → C
  - Can't decompose while preserving the 1st FD; not in BCNF.
• Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP → C, SD → P and J → S).
  - However, it is a lossless join decomposition.
  - In this case, adding JPC to the collection of relations gives us a dependency preserving decomposition.
    * JPC tuples are stored only for checking the FD! (Redundancy!)

Decomposition into 3NF
• Obviously, the algorithm for lossless join decomposition into BCNF can be used to obtain a lossless join decomposition into 3NF (typically, it can stop earlier).
• To ensure dependency preservation, one idea:
  - If X → Y is not preserved, add relation XY.
  - The problem is that XY may violate 3NF! e.g., consider the addition of CJP to "preserve" JP → C. What if we also have J → C? (partial dependency)
• Refinement: Instead of the given set of FDs F, use a minimal cover for F.


Minimal Cover for a Set of FDs
• Minimal cover G for a set of FDs F:
  - Closure of F = closure of G.
  - The right-hand side of each FD in G is a single attribute.
  - If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.
• Intuitively, every FD in G is needed, and "as small as possible" in order to get the same closure as F.
• Being small is in two respects:
  - The right side is small (only one attribute).
  - The left side is small (you cannot delete any attribute).


Minimal Cover for a Set of FDs
• Minimal cover, general method:
  - Decompose the right sides (i.e., A → BC into A → B, A → C).
  - Minimize the left sides.
  - Delete redundant FDs.
• Intuitively, every FD in G is needed, and "as small as possible" in order to get the same closure as F.
• e.g., A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover:
  - A → B, ACD → E, EF → G and EF → H
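The three steps of the general method translate almost line by line into code. The sketch below is one possible rendering, with hypothetical function names and single-letter attributes. A set of FDs can have several minimal covers, and which one comes out depends on the order in which attributes and FDs are tried; on the example above this order yields the cover given on the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def minimal_cover(fds):
    # Step 1: decompose the right sides into single attributes.
    g = [(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: minimize the left sides, one attribute at a time.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and rhs <= closure(smaller, g):
                g[i] = (smaller, rhs)
    # Step 3: delete FDs that follow from the remaining ones.
    g = list(dict.fromkeys(g))  # drop exact duplicates, keep order
    for item in list(g):
        rest = [other for other in g if other != item]
        if item[1] <= closure(item[0], rest):
            g = rest
    return g


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "B"), fd("ABCD", "E"), fd("EF", "GH"), fd("ACDF", "EG")]
cover = minimal_cover(F)
print(["".join(sorted(l)) + "->" + "".join(r) for l, r in cover])
# ['A->B', 'ACD->E', 'EF->G', 'EF->H']
```
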


3NF Decomposition
• Produce the minimal cover.
• Apply the same algorithm as for BCNF.
• For every X → A not preserved, add a new relation XA.
  - XA is in 3NF (why?)
• So, a lossless, dependency preserving decomposition is possible for 3NF, but not for BCNF.


Refining an ER Diagram
• 1st diagram translated:
  Workers(S,N,L,D,S)
  Departments(D,M,B)
  - Lots associated with workers.
• Suppose all workers in a dept are assigned the same lot: D → L
• Redundancy; fixed by:
  Workers2(S,N,D,S)
  Dept_Lots(D,L)
• Can fine-tune this:
  Workers2(S,N,D,S)
  Departments(D,M,B,L)
[Diagram, before: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
[Diagram, after: Employees (ssn, name), Works_In (since), Departments (did, dname, budget, lot)]

Multivalued dependency

Course | Book    | Lecturer
CS101  | Savitch | David
CS101  | Deitel  | David
CS101  | Savitch | Markus
CS101  | Deitel  | Markus


Multivalued dependency

Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a multivalued dependency; if we were to add a new book to the CS101 course, we would have to add one record for each of the lecturers on that course, and vice versa. Put formally, there are two multivalued dependencies in this relation: {course} ↠ {book} and, equivalently, {course} ↠ {lecturer}.


Deliveries

Restaurant | Food  | Area
Kebap 39   | Doner | Cankaya
Kebap 39   | Doner | Etlik
Kebap 39   | Doner | Bilkent
Kebap 39   | Fish  | Cankaya
Kebap 39   | Fish  | Etlik
Kebap 39   | Fish  | Bilkent

Note that π_{Food, Area}(R) = π_{Food}(R) × π_{Area}(R), where R = σ_{Restaurant = 'Kebap 39'}(Deliveries).

Employee ↠ Skill | Language

Employee | Skill  | Language
Ali      | Cook   | Turkish
Ali      | Cook   | French
Ali      | Cook   | English
Ali      | Driver | Turkish
Ali      | Driver | French
Ali      | Driver | English


Multivalued dependency
Let R be a relation schema XYZ. The multivalued dependency X ↠ Y holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], there exist tuples t3 and t4 in r such that
  t1[X] = t2[X] = t3[X] = t4[X]
  t3[Y] = t1[Y]
  t3[Z] = t2[Z]
  t4[Y] = t2[Y]
  t4[Z] = t1[Z]

   | X  | Y  | Z
t1 | x1 | y1 | z1
t2 | x1 | y2 | z2
t3 | x1 | y1 | z2
t4 | x1 | y2 | z1

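The definition can be checked on an instance by constructing t3 for every ordered pair (t1, t2) that agrees on X; t4 is covered when the same pair is visited in the other order. This is a minimal sketch with a hypothetical function name `mvd_holds`, run on the Course/Book/Lecturer table from these slides.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on an instance given as a list of dicts with identical keys."""
    if not rows:
        return True
    attrs = list(rows[0].keys())
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes its Y values from t1 and everything else from t2.
                t3 = {**t2, **{a: t1[a] for a in y}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


course = [
    {"Course": "CS101", "Book": "Savitch", "Lecturer": "David"},
    {"Course": "CS101", "Book": "Deitel", "Lecturer": "David"},
    {"Course": "CS101", "Book": "Savitch", "Lecturer": "Markus"},
    {"Course": "CS101", "Book": "Deitel", "Lecturer": "Markus"},
]

print(mvd_holds(course, ["Course"], ["Book"]))      # True
print(mvd_holds(course[:3], ["Course"], ["Book"]))  # False: (Deitel, Markus) is missing
```
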
Join Dependency
Keep a record of which agent sells which product for which company.

AGENT | COMPANY | PRODUCT
Smith | Ford    | car
Smith | GM      | truck

Although GM also sells cars, Smith does not sell them.


Let's say we know that if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

Agent | Company | Product
Smith | Ford    | car
Smith | Ford    | truck
Smith | MAN     | bus
Smith | GM      | motorcycle
Ali   | MAN     | truck
Ali   | Ford    | car

• What are the violations of the rule?
• Can we say Agent ↠ Company | Product?

Not a join dependency because…
• Although Smith sells trucks, Smith works with MAN, and MAN appears to produce trucks, Smith does not sell MAN trucks.
• Ali works with Ford, Ford produces trucks, Ali sells trucks, but Ali does not sell Ford trucks.


Agent | Company | Product
Smith | Ford    | car
Smith | GM      | bus
Smith | Ford    | truck
Ali   | MAN     | truck
Ali   | Ford    | truck

A relation can be losslessly decomposed if a JD is present:

Agent | Comp.
Smith | Ford
Smith | GM
Ali   | MAN
Ali   | Ford

Comp. | Product
Ford  | car
Ford  | truck
MAN   | truck
GM    | bus

Agent | Product
Smith | car
Smith | truck
Smith | bus
Ali   | truck
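Whether an instance satisfies the three-way join dependency can be checked by joining its three binary projections and looking for tuples that were not in the original. The sketch below uses a hypothetical function name `spurious_tuples` and the two Agent/Company/Product tables from these slides, with capitalization normalized.

```python
def spurious_tuples(rows):
    """Join the (agent, company), (company, product) and (agent, product)
    projections and return the tuples that were not in the original relation."""
    rel = set(rows)
    ac = {(a, c) for a, c, p in rel}
    cp = {(c, p) for a, c, p in rel}
    ap = {(a, p) for a, c, p in rel}
    joined = {
        (a, c, p)
        for (a, c) in ac
        for (c2, p) in cp
        if c == c2 and (a, p) in ap
    }
    return joined - rel


violating = [
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "MAN", "bus"),
    ("Smith", "GM", "motorcycle"),
    ("Ali", "MAN", "truck"),
    ("Ali", "Ford", "car"),
]
satisfying = [
    ("Smith", "Ford", "car"),
    ("Smith", "GM", "bus"),
    ("Smith", "Ford", "truck"),
    ("Ali", "MAN", "truck"),
    ("Ali", "Ford", "truck"),
]

print(sorted(spurious_tuples(violating)))
# [('Ali', 'Ford', 'truck'), ('Smith', 'MAN', 'truck')]
print(spurious_tuples(satisfying))  # set(): the three projections join back losslessly
```

The two spurious tuples for the first table are exactly the two rule violations listed on the "Not a join dependency because…" slide.
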


Psychiatrist | Insurer   | Condition
Dr. James    | Healthco  | Anxiety
Dr. James    | Healthco  | Depression
Dr. Kendrick | Friendly  | OCD
Dr. Kendrick | Friendly  | Anxiety
Dr. Kendrick | Friendly  | Depression
Dr. Kendrick | Friendly  | Mood Disorder
Dr. Lowe     | Friendly  | Anxiety
Dr. Lowe     | Friendly  | Schizophrenia
Dr. Lowe     | Healthco  | Anxiety
Dr. Lowe     | Healthco  | Dementia
Dr. Lowe     | Victorian | Conversion Disorder


Psychiatrist | Condition
Dr. James    | Anxiety
Dr. James    | Depression
Dr. Kendrick | OCD
Dr. Kendrick | Anxiety
Dr. Kendrick | Depression
Dr. Kendrick | Mood Disorder
Dr. Lowe     | Schizophrenia
Dr. Lowe     | Anxiety
Dr. Lowe     | Dementia
Dr. Lowe     | Conversion Disorder

Psychiatrist | Insurer
Dr. James    | Healthco
Dr. Kendrick | Friendly
Dr. Lowe     | Friendly
Dr. Lowe     | Healthco
Dr. Lowe     | Victorian

Insurer   | Condition
Healthco  | Anxiety
Healthco  | Depression
Healthco  | Dementia
Friendly  | OCD
Friendly  | Anxiety
Friendly  | Depression
Friendly  | Mood Disorder
Friendly  | Schizophrenia
Victorian | Conversion Disorder
