DataBases
DataBase Design
Normalization Process
DataBase course notes 8
Database Design
Conceptual Data Modeling
Logical Database Design
Normalization Process
Implementing Base Table Structures
NORMALIZATION PROCESS
Normalization
process of taking entities and attributes
that have been discovered and making
them suitable for the relational database
process does this by removing
redundancies and shaping data in manner
that the relational engine desires
Normalization
based on a set of levels, each of which
achieves a level of correctness or adherence to
a particular set of rules
rules formally known as forms, normal forms
First Normal Form (1NF)
which eliminates data redundancy and continues
through to
Fifth Normal Form (5NF)
which deals with decomposition of ternary
relationships
Normalization
each level of normalization indicates an
increasing degree of adherence to the
recognized standards of database design
as you increase degree of normalization of
your data, you'll naturally tend to create an
increasing number of tables of decreasing
width (fewer columns)
Why Normalize?
eliminate data that's duplicated, reducing the chance it won't match
when you need it
avoid unnecessary coding needed to keep duplicated
data in sync
keep tables thin, increasing the number of values that will fit
on a page (8K) and decreasing the number of reads that will be
needed
maximizing use of clustered indexes allows for
optimum data access and joins
lowering number of indexes per table - indexes are
costly to maintain
Eliminating duplicated data
any piece of data that occurs more than once in
the database => increased probability for errors
to happen
Eliminating anomalies
INSERT
DELETE
UPDATE
Easy to keep database consistent;
Easy to preserve the integrity of the database
Functional dependencies
Consider R(A1, A2, ..., An) a relation schema
and X, Y ⊆ {A1, A2, ..., An}
Definition:
The attribute X functionally determines the attribute Y, X -> Y,
if and only if for any value of X, there is only one value
of Y corresponding to X.
The functional dependency X -> Y is total if there isn't any
Z, Z ⊂ X, such that Z -> Y; otherwise, it is partial
Observations:
If X -> Y, then, for any Z, Z ⊆ Y, we have: X -> Z
If X -> Y and X is a simple attribute, then Y is totally
(functionally) dependent on X.
If Y is totally dependent on Z, then we have X -> Y for
every composed attribute X that contains Z.
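The definition above can be tested against a concrete set of rows. The following is a minimal Python sketch; the function name `holds` and the sample `students` rows are illustrative assumptions, not part of the notes. Note that sample data can only refute a dependency: a dependency is a statement about every legal instance of the schema.

```python
def holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows.

    rows is a list of dicts; x and y are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # The same X value must always map to the same Y value.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical sample data, not taken from the notes.
students = [
    {"id": 1, "name": "Ana", "city": "Cluj"},
    {"id": 2, "name": "Dan", "city": "Cluj"},
    {"id": 3, "name": "Ana", "city": "Iasi"},
]

print(holds(students, ("id",), ("name",)))    # id -> name holds here
print(holds(students, ("name",), ("city",)))  # "Ana" maps to two cities
```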
Armstrong's axioms
A1 (Reflexivity)
If Y ⊆ X => X -> Y
A2 (Augmentation)
If X -> Y => XZ -> YZ
A3 (Transitivity)
If X -> Y and Y -> Z => X -> Z
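In practice the axioms are usually applied through the attribute closure: the set of all attributes determined by a given set. The sketch below uses the standard closure algorithm in Python; the names `closure` and `implies` and the sample dependencies over A, B, C, D are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already determined, rhs is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependencies: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))
print(implies(fds, {"A"}, {"C"}))            # follows by transitivity
print(implies(fds, {"A", "D"}, {"B", "D"}))  # follows by augmentation
print(implies(fds, {"A", "B"}, {"A"}))       # follows by reflexivity
print(implies(fds, {"C"}, {"A"}))            # does not follow
```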
Process of Normalization
take entities that are complex and extract
simpler entities from them
continues until every table in database
represents one thing (simple entity) and
every column describes that thing
3 categories of normalization steps
entity and attribute shape
relationships between attributes
multi-valued and join dependencies in
entities
Entity and attribute shape
First Normal Form
all attributes must be atomic, that is, only a
single value represented in a single
attribute in a single instance of an entity
all instances of an entity must contain the
same number of values
all instances of an entity must be different
First Normal Form
violations =>
data handling not optimal - having to
decode multiple values stored where
a single one should be
having duplicated rows that cannot be
distinguished from one another
for example, consider a group of data like 1, 2, 3, 5, 7
it likely represents five separate values
a good test of atomicity is to consider whether you would ever need to
deal with part of a column without the other parts of the data in
that same column
if the list 1, 2, 3, 5, 7 is always treated as a single value, it might
be acceptable to store the value in a single column
if you might need to deal with the value 3 individually, then
the list is definitely not in First Normal Form
if there is no plan to use list elements individually, you should
consider whether it is still better to store each value individually to
allow for possible future usage
E-Mail Addresses
name1@domain1.com
AccountName: name1
Domain: domain1.com
E-Mail Addresses
if all you'll ever do is send e-mail, then
a single column is perfectly acceptable
if you need to consider what domains you
have e-mail addresses stored for =>
you must access the individual parts, and then it's a
completely different matter
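To make the second case concrete, here is a small Python sketch that splits an address into the AccountName and Domain parts and then counts addresses per domain. The helper name `split_email` and every address except name1@domain1.com are assumptions added for illustration.

```python
from collections import Counter


def split_email(address):
    """Split an address into (account_name, domain) at the last '@'."""
    account_name, sep, domain = address.rpartition("@")
    if not sep or not account_name or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return account_name, domain


# Only the first address appears in the notes; the rest are made up.
addresses = ["name1@domain1.com", "name2@domain1.com", "name3@domain2.org"]

# With the domain stored separately, this question needs no string parsing.
domains = Counter(split_email(a)[1] for a in addresses)
print(split_email("name1@domain1.com"))
print(domains)
```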
Telephone Numbers
AAA-EEE-NNNN (XXXX):
AAA area code - indicates calling area located
within a state
EEE exchange - indicates a set of numbers
within an area code
NNNN number - used to make individual phone
numbers unique
XXXX extension - number that must be dialed
after connecting
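A number stored in the single-column format above has to be decoded every time one part is needed. The Python sketch below shows that decoding step with a regular expression; the function name `parse_phone`, the dictionary keys, and the assumption that the extension is one to four digits in parentheses are illustrative choices, not part of the notes.

```python
import re

# AAA-EEE-NNNN with an optional " (XXXX)" extension (assumed 1 to 4 digits).
PHONE = re.compile(r"(\d{3})-(\d{3})-(\d{4})(?: \((\d{1,4})\))?")


def parse_phone(text):
    """Return the four parts of a phone number as a dict."""
    match = PHONE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized phone number: {text!r}")
    area, exchange, number, extension = match.groups()
    return {
        "area_code": area,
        "exchange": exchange,
        "number": number,
        "extension": extension,  # None when no extension is given
    }


# Hypothetical sample numbers.
print(parse_phone("615-555-1234 (0042)"))
print(parse_phone("615-555-1234"))
```

Storing each part in its own column makes this parsing unnecessary whenever, for example, all numbers in one area code are needed.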
Mailing Addresses
All instances in entity contain
same number of values
entities have a fixed number of attributes
and tables have a fixed number of columns
entities should be designed such that
every attribute has a fixed number of
values associated with it
example of a violation of this rule in entities that
have several attributes with same base name
suffixed (or prefixed) with a number, such as
Payment1, Payment2, and so on
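The usual fix for numbered columns is to move them into rows of a separate entity. The following Python sketch shows the reshaping; the column names `CustomerId`, `PaymentNumber`, and `Amount`, and the sample rows, are assumptions made for illustration.

```python
def split_payments(customer_rows):
    """Turn Payment1, Payment2, ... columns into one row per payment."""
    payments = []
    for row in customer_rows:
        for column, value in row.items():
            if column.startswith("Payment") and value is not None:
                # The numeric suffix becomes an ordinary data value.
                number = int(column[len("Payment"):])
                payments.append({
                    "CustomerId": row["CustomerId"],
                    "PaymentNumber": number,
                    "Amount": value,
                })
    return payments


# Hypothetical rows in the violating shape; None marks an unused slot.
customers = [
    {"CustomerId": 1, "Payment1": 100.0, "Payment2": 50.0},
    {"CustomerId": 2, "Payment1": 75.0, "Payment2": None},
]

payments = split_payments(customers)
for payment in payments:
    print(payment)
```

In the reshaped form, a customer can have any number of payments without adding columns or leaving empty slots.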
Programming Anomalies
avoided by First Normal Form
modifying lists in single column
modifying multipart values
dealing with a variable number of facts in
an instance
Clues that design is not in First
Normal Form
string data that contains separator-type
characters
attribute names with numbers at the end
tables with no or poorly defined keys
Relationships Between
Attributes
Second Normal Form
relationships between non-key attributes and part of
the primary key
Third Normal Form
relationships between non-key attributes
BCNF (Boyce-Codd Normal Form)
relationships between non-key attributes and any key
Non-key attributes must provide a detail about the key,
the whole key, and nothing but the key.
Second Normal Form
entity must be in First Normal Form.
each attribute must be a fact describing
the entire key
technically relevant only when a composite
key (a key composed of two or more
columns) exists in the entity
Definition: A relation R is in the second
normal form (2NF) if it is in 1NF and every
non-key attribute is totally dependent on every
key of the relation
Each non-key attribute must
describe entire key
BookIsbnNumber attribute uniquely identifies
book
AuthorSocialSecurityNumber uniquely identifies
author
two columns create key that uniquely identifies
an author for book
BookTitle describes book
but doesn't describe author at all
AuthorFirstName and AuthorLastName, describe
author, but not book
BookIsbnNumber -> BookTitle
AuthorSocialSecurityNumber -> AuthorFirstName
AuthorSocialSecurityNumber -> AuthorLastName
BookIsbnNumber, AuthorSocialSecurityNumber -> RoyaltyPercentage
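Reading the list above as functional dependencies, the Second Normal Form violations can be found mechanically: any non-key attribute determined by a proper subset of the composite key is only partially dependent on the key. The Python sketch below is one way to do that; the function names are assumptions, and the closure routine is the standard algorithm rather than anything specific to these notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each non-key attribute to a proper key subset determining it."""
    found = {}
    key = sorted(key)
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in closure(subset, fds) - set(key):
                found.setdefault(attr, set(subset))
    return found


fds = [
    ({"BookIsbnNumber"}, {"BookTitle"}),
    ({"AuthorSocialSecurityNumber"}, {"AuthorFirstName", "AuthorLastName"}),
    ({"BookIsbnNumber", "AuthorSocialSecurityNumber"}, {"RoyaltyPercentage"}),
]
key = {"BookIsbnNumber", "AuthorSocialSecurityNumber"}

violations = partial_dependencies(key, fds)
for attr in sorted(violations):
    print(attr, "depends only on", sorted(violations[attr]))
```

RoyaltyPercentage does not show up in the result because it needs both key columns, so it is the only attribute that belongs in the book-author entity.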
Programming problems avoided
all programming issues that arise with
Second Normal Form (as well as Third
and Boyce-Codd Normal Forms) deal with
functional dependencies that can end up
corrupting data
same author's information would have to
be duplicated amongst all books
cannot delete only book and keep author
around
cannot insert only author without book
Anomalies
UPDATE
duplicate data, have to update multiple rows
INSERT
cannot insert data for an entity without
relationship to any other entity
DELETE
cannot delete data for an entity without risk of
losing info about related entity
Clues that entity is not in
Second Normal Form
repeating key attribute name prefixes,
indicating that values are probably
describing some additional entity
data in repeating groups, showing signs of
functional dependencies between
attributes
composite keys without foreign key, which
might be sign you have key values that
identify multiple things
Third Normal Form
entity must be in Second Normal Form.
non-key attributes cannot describe other
non-key attributes
Definition: A relation R is in the third
normal form (3NF) if it is in 2NF and none
of the non-key attributes is functionally
dependent on another non-key attribute of
the relation.
non-key attributes cannot describe
other non-key attributes
PublisherName -> PublisherCity
Title defines title for the book defined by
BookIsbnNumber
Price indicates price of the book
PublisherName describes the book's publisher
PublisherCity also sort of describes something
about the book, in that it tells where the
publisher was located
but this doesn't make sense in this context, because
location of publisher is directly dependent on
what publisher is represented by PublisherName
Anomalies
INSERT
- cannot register a publisher unless there is a book that belongs
to that publisher
DELETE
- if we delete the only book of a certain publisher, we lose
all the information referring to that publisher
UPDATE
- the information referring to a certain publisher is redundant;
if we want to update the information of a publisher, we must
perform the same operation for all the books that belong to that
publisher
Publisher entity has data concerning only
the publisher
Book entity has book information
now if we want to add information to our
schema concerning the publisher, contact
information or address, it's obvious where we
add that information
City attribute clearly identifying publisher
not the book
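The effect of the split can be tried out with an in-memory SQLite database from Python. This is only a sketch: the column types, the sample publishers and books, and the exact table layout are assumptions, with only the attribute names taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Publisher (
        PublisherName TEXT PRIMARY KEY,
        City          TEXT NOT NULL
    );
    CREATE TABLE Book (
        BookIsbnNumber TEXT PRIMARY KEY,
        Title          TEXT NOT NULL,
        Price          REAL NOT NULL,
        PublisherName  TEXT NOT NULL REFERENCES Publisher(PublisherName)
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Publisher VALUES ('Acme Press', 'Boston')")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    ("111", "First Book", 10.0, "Acme Press"),
    ("222", "Second Book", 12.5, "Acme Press"),
])

# INSERT: a publisher can be registered before it has any book.
conn.execute("INSERT INTO Publisher VALUES ('Solo House', 'Denver')")

# UPDATE: the city is stored once, so one row changes for all books.
conn.execute(
    "UPDATE Publisher SET City = 'Chicago' WHERE PublisherName = 'Acme Press'"
)
book_cities = conn.execute("""
    SELECT b.Title, p.City
    FROM Book b JOIN Publisher p ON p.PublisherName = b.PublisherName
    ORDER BY b.Title
""").fetchall()
print(book_cities)

# DELETE: removing every book keeps the publisher information.
conn.execute("DELETE FROM Book")
publishers = conn.execute(
    "SELECT PublisherName, City FROM Publisher ORDER BY PublisherName"
).fetchall()
print(publishers)
```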
Clues that entities are not in
Third Normal Form
multiple attributes with same prefix
much like Second Normal Form, only this time
not in the key
repeating groups of data
summary data that refers to data in a
different entity altogether
Price in Invoice as
SUM(Quantity*ProductCost) from LineItems
Boyce-Codd Normal Form
Ray Boyce, Edgar F. Codd
entity is in First Normal Form.
all attributes are fully dependent on a key
every determinant is a key
Entity in BCNF if every
Determinant is key
Determinant: any attribute or
combination of attributes on which any
other attribute or combination of attributes
is functionally dependent.
BCNF extends previous normal forms by
saying that each entity might have many
keys, and all attributes must be dependent
on one of these keys
Third Normal Form table which does not have
multiple overlapping candidate keys is
guaranteed to be in BCNF
Third Normal Form table with two or more
overlapping candidate keys may or may not
be in BCNF
Definition: A relation R is in the Boyce-Codd
Normal Form (BCNF) if, for every functional
dependency X -> A from R, where A is an attribute
that doesn't belong to X, X is a key or includes a
key from R.
Court Bookings
Court  Start Time  End Time  Rate Type
1      09:30       10:30     SAVER
1      11:00       12:00     SAVER
1      14:00       15:30     STANDARD
2      10:00       11:30     PREMIUM-B
2      11:30       13:30     PREMIUM-B
2      15:00       16:30     PREMIUM-A
Court Bookings
hard court (Court1) and grass court (Court2)
booking defined by Court and period for
which the Court is reserved
booking has Rate Type associated
SAVER for hard made by members
STANDARD for hard made by non-members
PREMIUM-A for grass made by members
PREMIUM-B for grass made by non-members
Court Bookings - candidate keys
{Court, Start Time}
{Court, End Time}
{Rate Type, Start Time}
{Rate Type, End Time}
table adheres to both 2NF and 3NF
table does not adhere to BCNF
because of dependency Rate Type ->
Court, in which the determining attribute
(Rate Type) is neither a candidate key, nor
a superset of a candidate key
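The BCNF test ("every determinant is a key") can be run mechanically once the dependencies are written down. In the Python sketch below, the three dependencies are an assumption reconstructed from the candidate keys and the Rate Type rule described above; the function names are also illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is no superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        if not trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


attrs = {"Court", "StartTime", "EndTime", "RateType"}
# Assumed dependencies for the Court Bookings table.
fds = [
    ({"Court", "StartTime"}, {"EndTime", "RateType"}),
    ({"Court", "EndTime"}, {"StartTime", "RateType"}),
    ({"RateType"}, {"Court"}),
]

print(bcnf_violations(attrs, fds))
# {RateType, StartTime} determines everything, so it is a key as well.
print(closure({"RateType", "StartTime"}, fds) == attrs)
```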
Rate Types
Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No
Court Bookings
Court  Start Time  End Time  Member Flag
1      09:30       10:30     Yes
1      11:00       12:00     Yes
1      14:00       15:30     No
2      10:00       11:30     No
2      11:30       13:30     No
2      15:00       16:30     Yes
candidate keys for Rate Types table are
{Rate Type} and {Court, Member Flag}
candidate keys for Court Bookings table
are {Court, Start Time} and {Court, End
Time}
both tables are in BCNF
having one Rate Type associated with two
different Courts is now impossible
anomaly affecting original table has been
eliminated
Multivalue Dependencies
Third Normal Form is generally considered
pinnacle of proper database design
serious problems might still remain in
logical design
Definition
We say that there exists a multi-value dependency of
the attribute Z on Y, or that Y performs a multidetermination
on Z, Y ->-> Z, if, for all values x1, x2, y,
z1, z2, where x1 ≠ x2, z1 ≠ z2, such that the tuples (x1, y, z1)
and (x2, y, z2) belong to R, then also the tuples (x1, y, z2)
and (x2, y, z1) belong to R.
Fourth Normal Form
entity must be in BCNF
there must not be more than one
multivalue dependency between an
attribute and the key of the entity
Definition: A relation R is in the fourth
normal form if, for every non-trivial multivalue dependency
X ->-> Y, X is a key or includes a key of R.
Fourth Normal Form
table is in 4NF if and only if, for every one
of its non-trivial multivalued dependencies
X ->-> Y, X is a superkey, that is, X is either a
candidate key or a superset thereof
Fourth Normal Form violations
ternary relationships
lurking multivalued attributes
Restaurant        Pizza Variety  Delivery Area
A1 Pizza          Thick Crust    Springfield
A1 Pizza          Thick Crust    Shelbyville
A1 Pizza          Thick Crust    Capital City
A1 Pizza          Stuffed Crust  Springfield
A1 Pizza          Stuffed Crust  Shelbyville
A1 Pizza          Stuffed Crust  Capital City
Elite Pizza       Thin Crust     Capital City
Elite Pizza       Stuffed Crust  Capital City
Vincenzo's Pizza  Thick Crust    Springfield
Vincenzo's Pizza  Thick Crust    Shelbyville
Vincenzo's Pizza  Thin Crust     Springfield
Vincenzo's Pizza  Thin Crust     Shelbyville
table has no non-key attributes
meets all normal forms up to BCNF
not in 4NF because of the non-trivial multivalued
dependencies
{Restaurant} ->-> {Pizza Variety}
{Restaurant} ->-> {Delivery Area}
splitting the table eliminates possibility of anomalies
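Whether a multivalued dependency holds in a given table can be checked directly: for each restaurant, the rows must contain every combination of its pizza varieties with its delivery areas. The Python sketch below applies that test to the twelve rows of the pizza table; the function name `mvd_holds` is an assumption, and the check is equivalent to the tuple-swapping definition given earlier.

```python
from collections import defaultdict


def mvd_holds(rows, x, y):
    """Check the multivalued dependency x ->-> y on a list of dict rows."""
    rest = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        rest_val = tuple(row[a] for a in rest)
        groups[x_val].add((y_val, rest_val))
    for pairs in groups.values():
        y_vals = {pair[0] for pair in pairs}
        rest_vals = {pair[1] for pair in pairs}
        # Every y value must appear with every value of the other columns.
        if len(pairs) != len(y_vals) * len(rest_vals):
            return False
    return True


COLUMNS = ("Restaurant", "Pizza Variety", "Delivery Area")
DATA = [
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

print(mvd_holds(rows, ["Restaurant"], ["Pizza Variety"]))

# If A1 Pizza stopped delivering Stuffed Crust to Capital City, varieties
# would depend on the area and the dependency would no longer hold.
without_one = rows[:5] + rows[6:]
print(mvd_holds(without_one, ["Restaurant"], ["Pizza Variety"]))
```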
Anomalies
INSERT
If we add a certain kind of pizza, delivered by a certain
restaurant, then we have to repeat this information for
every delivery area corresponding to that restaurant
DELETE
If we delete the information that corresponds to the only pizza
delivered by a certain restaurant, then we have to delete the
information that refers to all the areas that restaurant is delivering to.
UPDATE
If we want to update the name of the pizza delivered by a certain
restaurant, then we have to update this name for all the
corresponding delivery areas of that restaurant
Restaurant        Pizza Variety
A1 Pizza          Thick Crust
A1 Pizza          Stuffed Crust
Elite Pizza       Thin Crust
Elite Pizza       Stuffed Crust
Vincenzo's Pizza  Thick Crust
Vincenzo's Pizza  Thin Crust
Restaurant        Delivery Area
A1 Pizza          Springfield
A1 Pizza          Shelbyville
A1 Pizza          Capital City
Elite Pizza       Capital City
Vincenzo's Pizza  Springfield
Vincenzo's Pizza  Shelbyville
4th NORMAL FORM (4NF) - OK
in contrast, if pizza varieties offered by restaurant
sometimes did legitimately vary from one delivery area to
another, the original three-column table would satisfy
4NF
Fifth Normal Form
not every ternary relationship can be broken
down into two entities related to a third
aim of 5NF is to ensure that any ternary
relationships that still exist in 4NF, can be
decomposed into entities without loss of
information
eliminates problems with update anomalies due
to multivalue dependencies
Decomposition
R = (Professor, Discipline, Language), assumed to be in the fourth normal
form
R1 = (Professor, Discipline)
R2 = (Professor, Language)
R1 |><| R2 ≠ R
R3 = (Discipline, Language)
R1 |><| R2 |><| R3 = R
Join Dependency: Consider R(A1, A2, ..., An) a relation schema and
R1, R2, ..., Rk subsets of {A1, A2, ..., An}. There is a join dependency
called *(R1, R2, ..., Rk) if and only if any instantiation r of R is the
result of joining its projections on R1, R2, ..., Rk,
r = π_R1(r) |><| π_R2(r) |><| ... |><| π_Rk(r)
=> *(R1, R2, R3) is a join dependency on the relation R
A relation is in 5NF if and only if the join
dependencies that exist in the relation are
implied by a key of the relation
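A join dependency can be checked on sample data by projecting the relation and joining the projections back together. The Python sketch below does this for the Professor, Discipline, Language example; the four sample rows and all function names are assumptions chosen so that the two-way join produces a spurious row while the three-way join reconstructs the relation.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    unique = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(items) for items in unique]


def natural_join(left, right):
    """Join two lists of dict rows on their shared attribute names."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-independent view of a relation, used for comparison."""
    return {frozenset(row.items()) for row in rows}


def join_dependency_holds(rows, components):
    """True if joining the projections on components gives rows back."""
    result = project(rows, components[0])
    for attrs in components[1:]:
        result = natural_join(result, project(rows, attrs))
    return as_set(result) == as_set(rows)


# Hypothetical instance of R(Professor, Discipline, Language).
r = [
    {"Professor": "Popescu", "Discipline": "Databases", "Language": "French"},
    {"Professor": "Popescu", "Discipline": "Algorithms", "Language": "English"},
    {"Professor": "Ionescu", "Discipline": "Databases", "Language": "English"},
    {"Professor": "Popescu", "Discipline": "Databases", "Language": "English"},
]
R1 = ["Professor", "Discipline"]
R2 = ["Professor", "Language"]
R3 = ["Discipline", "Language"]

print(join_dependency_holds(r, [R1, R2]))      # two projections lose information
print(join_dependency_holds(r, [R1, R2, R3]))  # three projections do not
```

On this instance the two-way join adds the row (Popescu, Algorithms, French), which is not in R; the third projection filters it out again.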
Evidence(Professor, Student, Discipline, Language,
Mark)
Key: Student, Discipline
decomposed, without loss of information, in
SDP(Student, Discipline, Professor)
SDL(Student, Discipline, Language)
SDM (Student, Discipline, Mark)
Denormalization
used primarily to improve performance in cases
where over-normalized structures are causing
overhead to query processor
always consider whether a slightly slower (but 100 percent
accurate) application is not preferable to a faster
application of lower accuracy
during logical modeling, we should never step
back from our normalized structures to
performance-tune our applications proactively