Chapter 5 -Database
12/29/2024 1
Outline
• 1 Informal Design Guidelines for Relational Databases
  - 1.1 Semantics of the Relation Attributes
  - 1.2 Redundant Information in Tuples and Update Anomalies
  - 1.3 Null Values in Tuples
  - 1.4 Spurious Tuples
• 2 Functional Dependencies (FDs)
  - 2.1 Definition of FD
  - 2.2 Inference Rules for FDs
  - 2.3 Equivalence of Sets of FDs
  - 2.4 Minimal Sets of FDs
• 3 Normal Forms Based on Primary Keys
  - 3.1 Normalization of Relations
  - 3.2 Practical Use of Normal Forms
  - 3.3 Definitions of Keys and Attributes Participating in Keys
  - 3.4 First Normal Form
  - 3.5 Second Normal Form
  - 3.6 Third Normal Form
Informal Design Guidelines for Relational Databases (1)
• What is relational database design?
  - The grouping of attributes to form "good" relation schemas
• Two levels of relation schemas
  - The logical "user view" level
    - Represents how users and applications perceive the data: the entities, attributes, relationships, and constraints they work with.
  - The storage "base relation" level
    - Describes how the data is actually stored and managed in the database system.
• Design is concerned mainly with base relations
• What are the criteria for "good" base relations?
Informal Design Guidelines for Relational Databases (2)
• We first discuss informal guidelines for good relational design
• Then we discuss the formal concepts of functional dependencies and normal forms
  - 1NF (First Normal Form)
  - 2NF (Second Normal Form)
  - 3NF (Third Normal Form)
  - BCNF (Boyce-Codd Normal Form)
1.1 Semantics of the Relation Attributes
• GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
  - Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
  - Only foreign keys should be used to refer to other entities
  - Entity and relationship attributes should be kept apart as much as possible
• Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.
[Figure: Simplified COMPANY relational database schema]
1.2 Redundant Information in Tuples and Update Anomalies
• Information is stored redundantly
  - Wastes storage
  - Causes problems with update anomalies
    - Insertion anomalies
    - Deletion anomalies
    - Modification anomalies
• Update anomaly (a deviation from the expected effect of an update):
  - Occurs when inserting, deleting, or modifying data in a table leaves the database inconsistent or has undesired side effects.
  - Typically arises when a single piece of information is stored in several places and only some of the copies are changed.
• E.g., consider the relation:
  - EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
  - Changing the name of project number P1 from "Billing" to "Customer-Accounting" requires the update to be made in the tuples of all 100 employees working on project P1; if any tuple is missed, P1 ends up with two names.
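The renaming problem above can be sketched in a few lines of Python. This is only an illustration: the rows, the employee names, and the helper functions `rename_project` and `names_for` are hypothetical, and the `limit` parameter exists purely to simulate an update that reaches only some of the redundant copies.

```python
# Hypothetical rows of EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    {"emp": "E1", "proj": "P1", "ename": "Abebe", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "ename": "Sara", "pname": "Billing", "hours": 20},
    {"emp": "E2", "proj": "P2", "ename": "Sara", "pname": "Payroll", "hours": 5},
]


def rename_project(rows, proj, new_name, limit=None):
    """Rename a project in place; `limit` simulates a partial update."""
    changed = 0
    for row in rows:
        if row["proj"] == proj and (limit is None or changed < limit):
            row["pname"] = new_name
            changed += 1
    return changed


def names_for(rows, proj):
    """Return every distinct name currently stored for one project number."""
    return {row["pname"] for row in rows if row["proj"] == proj}


# Only one of the two P1 rows is updated, so P1 now carries two names.
rename_project(emp_proj, "P1", "Customer-Accounting", limit=1)
inconsistent = names_for(emp_proj, "P1")
print(inconsistent)
```

If the project name were stored once, in a separate PROJECT relation, the rename would touch a single tuple and this inconsistency could not arise.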
Cont…
• Consider the relation:
  - EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Insert anomaly:
  - Cannot insert a project unless an employee is assigned to it.
  - Conversely, cannot insert an employee unless he/she is assigned to a project.
Cont…
• Consider the relation:
  - EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Delete anomaly:
  - When a project is deleted, the tuples of all employees who work on that project are deleted too, so the data of any employee who works on no other project is lost.
  - Alternatively, if an employee is the only employee on a project, deleting that employee also deletes the corresponding project.
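A small sketch of the delete anomaly, using made-up EMP_PROJ tuples (the employee and project values and the helper `known_projects` are assumptions for illustration only):

```python
# Hypothetical tuples of EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    ("E1", "P1", "Abebe", "Billing", 10),
    ("E2", "P1", "Sara", "Billing", 20),
    ("E3", "P2", "Kebede", "Payroll", 5),
]


def known_projects(rows):
    """Return the (Proj#, Pname) pairs that the relation still records."""
    return {(proj, pname) for _, proj, _, pname, _ in rows}


before = known_projects(emp_proj)

# Delete employee E3, who is the only employee assigned to project P2.
emp_proj = [row for row in emp_proj if row[0] != "E3"]

after = known_projects(emp_proj)
lost = before - after  # project facts that disappeared as a side effect
print(lost)
```

The delete was meant to remove one employee, yet the only record that project P2 is called "Payroll" went with it.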
Modification Anomalies
• Modification anomalies happen when changing data in a table results in inconsistencies or makes data accuracy difficult to maintain.
• They occur when the same fact is stored redundantly in a table, so that one logical change must be applied to several tuples.
• E.g., suppose a Product table has attributes (ProductID, ProductName, Category) and the same product appears in several rows. If the product's category is updated in only some of those rows, the table holds different categories for the same product.
[Figure 10.3: Two relation schemas suffering from update anomalies]
[Figure 10.4: Example states for EMP_DEPT and EMP_PROJ]
• A natural join is an operation in relational databases that combines two tables based on columns with the same name and data type.
• It automatically matches and merges rows from the two tables where the values in the common columns are equal.
SELECT *
FROM Customers
NATURAL JOIN Orders;
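To make the matching rule concrete, here is a minimal Python sketch of a natural join over lists of dictionaries. It is not how a DBMS implements the operation; the function `natural_join` and the column names `customer_id`, `name`, and `order_id` are assumptions chosen for the example.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every column name they share."""
    if not r or not s:
        return []
    common = sorted(set(r[0]) & set(s[0]))  # columns present in both tables
    result = []
    for t1 in r:
        for t2 in s:
            if all(t1[c] == t2[c] for c in common):
                result.append({**t1, **t2})  # common columns appear once
    return result


# Hypothetical contents for the Customers and Orders tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
]

joined = natural_join(customers, orders)
for row in joined:
    print(row)
```

Customer 2 has no matching order, so it does not appear in the result; only rows that agree on `customer_id` are merged.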
Guideline to Redundant Information in Tuples and Update Anomalies
• GUIDELINE 2:
  - Design a schema that does not suffer from insertion, deletion, and update anomalies.
  - If there are any anomalies present, note them so that applications can be made to take them into account.
1.3 Null Values in Tuples
• GUIDELINE 3:
  - Relations should be designed such that their tuples will have as few NULL values as possible
  - Attributes that are frequently NULL could be placed in separate relations
• Reasons for nulls:
  - Attribute not applicable or invalid
  - Attribute value unknown (may exist)
  - Value known to exist, but unavailable
1.4 Spurious Tuples
• Spurious tuples are incorrect tuples that appear in the result of a join but do not correspond to any fact in the original data.
• They arise from improper database design: when a relation is decomposed badly and the pieces are joined on attributes that are neither a primary key nor a foreign key, the join produces combinations that were never stored.
• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results for join operations
• GUIDELINE 4:
  - The relations should be designed to satisfy the lossless join condition.
  - No spurious (fake) tuples should be generated by doing a natural join of any relations.
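The sketch below shows how a join can generate spurious tuples. The relation, its three attributes (employee, project, location), and the data are invented for illustration; the decomposition deliberately shares only the location attribute, which is not a key of either piece.

```python
# Hypothetical relation R(Emp, Proj, Loc): who works on which project, and where.
original = {
    ("E1", "P1", "Addis Ababa"),
    ("E2", "P2", "Addis Ababa"),
}

# Decompose R into two projections that share only Loc.
emp_loc = {(emp, loc) for emp, _, loc in original}
proj_loc = {(proj, loc) for _, proj, loc in original}

# Natural join of the two projections on the common attribute Loc.
rejoined = {
    (emp, proj, loc1)
    for emp, loc1 in emp_loc
    for proj, loc2 in proj_loc
    if loc1 == loc2
}

# Tuples produced by the join that were never in the original relation.
spurious = rejoined - original
print(sorted(spurious))
```

The join returns four tuples although only two facts were stored, so this decomposition does not have the lossless join property.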
Cont…
• There are two important properties of decompositions:
  a) Non-additive (lossless) join
  b) Preservation of the functional dependencies
• Note that:
  - Property (a) is extremely important and cannot be sacrificed.
  - Property (b) is less stringent and may be sacrificed.
2.1 Functional Dependencies
• Functional dependencies are constraints in a relational database that describe the relationship between attributes within a table. A functional dependency is denoted as X→Y, where X and Y are sets of attributes in a table.
• In simple terms, a functional dependency X→Y means that each value of X is associated with exactly one value of Y.
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
• Example: In a table with attributes A, B, and C, if A uniquely determines B (i.e., A→B), then for every value of A there is only one corresponding value of B.
Cont…
• X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y
  - For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• Written as X -> Y; can be displayed graphically on a relation schema as in the figures (denoted by an arrow from X to Y).
• FDs are derived from the real-world constraints on the attributes
Examples of FD constraints (1)
• Social security number determines employee name
  - SSN -> ENAME
• Project number determines project name and location
  - PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number together determine the hours per week that the employee works on the project
  - {SSN, PNUMBER} -> HOURS
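As a rough illustration, the definition "equal X values imply equal Y values" can be tested against one relation instance. The function `fd_holds` and the sample rows are assumptions made for this sketch. A passing test on one instance does not prove an FD, which must hold in every legal instance; a failing test does refute it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this one relation instance."""
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value paired with two different Y values
    return True


# Hypothetical instance using the attribute names from the examples above.
rows = [
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Abebe", "HOURS": 10},
    {"SSN": "111", "PNUMBER": 2, "ENAME": "Abebe", "HOURS": 5},
    {"SSN": "222", "PNUMBER": 1, "ENAME": "Sara", "HOURS": 12},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))             # SSN -> ENAME
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))  # {SSN, PNUMBER} -> HOURS
print(fd_holds(rows, ["SSN"], ["HOURS"]))             # SSN -> HOURS
```

In this instance the first two dependencies are satisfied, while SSN -> HOURS is violated because employee 111 has two different HOURS values.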
Cont…
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R
  - (since we never have two distinct tuples with t1[K] = t2[K])
2.2 Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
• Armstrong's inference rules:
  - IR1. (Reflexive) If Y subset-of X, then X -> Y
  - IR2. (Augmentation) If X -> Y, then XZ -> YZ
    - (Notation: XZ stands for X U Z)
  - IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
Cont…
• Some additional inference rules that are useful:
  - Decomposition: If X -> YZ, then X -> Y and X -> Z
  - Union: If X -> Y and X -> Z, then X -> YZ
  - Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z
Inference Rules for FD…
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
  - Example: If A→B is an FD, then B is in the closure of A; if it is the only FD in F, A+ = {A, B}.
• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
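A minimal sketch of the attribute-closure computation follows. It uses the usual fixed-point loop rather than applying IR1 to IR3 one at a time; the function name `closure` and the representation of each FD as a pair of Python sets are assumptions of this sketch. The FDs are the three examples given earlier.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)  # X is always in its own closure (reflexivity)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, fds)))
print(sorted(closure({"SSN", "PNUMBER"}, fds)))
```

Here {SSN}+ contains only SSN and ENAME, while {SSN, PNUMBER}+ contains all six attributes, which is why {SSN, PNUMBER} can serve as a key.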
2.3 Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
  - Every FD in F can be inferred from G, and
  - Every FD in G can be inferred from F
  - Hence, F and G are equivalent if F+ = G+
• Definition (Covers):
  - F covers G if every FD in G can be inferred from F
    - (i.e., if G+ subset-of F+)
  - F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
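One way to check equivalence, sketched below, is to test "F covers G" by computing, for each FD X -> Y in G, the closure of X under F and checking that it contains Y. The function names and the sample sets F and G are assumptions chosen for illustration, with each FD written as a pair of Python sets.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f covers g and g covers f."""
    return covers(f, g) and covers(g, f)


# Hypothetical FD sets over attributes A, C, D, E, H.
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]

print(equivalent(F, G))
```

F and G look different but imply exactly the same dependencies, so the check reports them as equivalent. Dropping FDs from G, for example keeping only A -> CD, leaves a set that F still covers but that no longer covers F.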
2.4 Minimal Sets of FDs (1)
• A minimal set of FDs eliminates redundant dependencies while preserving the same closure (the same implied constraints).
• Finding minimal sets of FDs is important in database design, as it helps simplify the schema without losing essential information.
• A set of FDs F is minimal if it satisfies the following conditions:
  1. Every dependency in F has a single attribute for its right-hand side.
  2. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
  3. We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
Minimal Sets of FDs (2)
• Every set of FDs has an equivalent minimal set
• There can be several equivalent minimal sets
• A minimal set equivalent to F can be computed by making every right-hand side a single attribute, removing extraneous left-hand-side attributes, and then removing redundant FDs; which minimal set results depends on the order in which the FDs are processed
• To synthesize (create) a set of relations, we assume that we start with a set of dependencies that is a minimal set
3. Normal Forms
• Normal forms in the context of relational databases
are rules or guidelines that help organize database
tables and reduce redundancy, anomalies, and
inconsistencies.
3.1 Normalization of Relations
• Normalization:
  - The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Normalization of Relations (2)
• 1NF, 2NF, 3NF, BCNF
  - based on keys and FDs of a relation schema
• 4NF
  - based on keys and multivalued dependencies (MVDs)
• 5NF
  - based on keys and join dependencies (JDs)
• Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation)
3.2 Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
• The practical utility of the higher normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
• Database designers need not normalize to the highest possible normal form
  - (usually up to 3NF, BCNF, or 4NF)
• Denormalization:
  - A database optimization technique used to improve query performance by reducing the number of joins needed to retrieve data
  - The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
3.3 Definitions of Keys and Attributes Participating in Keys (1)
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
• A key K is a superkey with the additional property that removing any attribute from K leaves a set that is no longer a superkey
Cont…
• If a relation schema has more than one key, each is called a candidate key.
  - One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A prime attribute must be a member of some candidate key
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
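The definitions above can be exercised with a brute-force sketch: a set of attributes is a superkey when its closure is the whole schema, and a candidate key is a superkey with no smaller superkey inside it. The functions, the FD representation as pairs of sets, and the six-attribute schema are assumptions for illustration; the enumeration is exponential and only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Enumerate candidate keys by trying attribute subsets, smallest first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys


attrs = {"SSN", "PNUMBER", "ENAME", "PNAME", "PLOCATION", "HOURS"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(attrs, fds)
prime = set().union(*keys)  # attributes that appear in some candidate key
nonprime = attrs - prime
print(keys, sorted(prime), sorted(nonprime))
```

For this schema the only candidate key is {SSN, PNUMBER}, so SSN and PNUMBER are the prime attributes and the other four are nonprime.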
3.4 First Normal Form
• Definition: a table (relation) is in 1NF if
  - There are no duplicated rows in the table (each row has a unique identifier).
  - Each cell is single-valued (i.e., there are no repeating groups).
  - Entries in a column (attribute, field) are of the same kind.
• 1NF disallows
  - composite attributes
  - multivalued attributes
  - nested relations (attributes whose values for an individual tuple are non-atomic)
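One common way to remove a multivalued attribute is to repeat the row once per value. The sketch below shows that approach on a hypothetical department table with a list-valued `dlocations` column; the table, the column names, and the helper `to_1nf` are assumptions for illustration. Moving the multivalued attribute into its own relation is an alternative that avoids the repeated department names.

```python
# Hypothetical non-1NF rows: dlocations holds a list of values per department.
dept = [
    {"dnumber": 5, "dname": "Research", "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]


def to_1nf(rows, multivalued):
    """Emit one row per value of the multivalued column, so every cell is atomic."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = {k: v for k, v in row.items() if k != multivalued}
            new_row[multivalued] = value
            flat.append(new_row)
    return flat


dept_1nf = to_1nf(dept, "dlocations")
for row in dept_1nf:
    print(row)
```

After flattening, `dnumber` alone no longer identifies a row; the key of the 1NF table becomes the combination of `dnumber` and `dlocations`.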
[Figure 10.9: Normalizing nested relations into 1NF]
3.5 Second Normal Form (1)
• Definition: a table (relation) is in 2NF if
  - It is in 1NF, and
  - All non-key attributes are fully functionally dependent on the entire primary key (i.e., there is no partial dependency).
• Uses the concepts of FDs and primary key
  - Prime attribute: an attribute that is a member of the primary key K.
  - Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds.
• Examples:
  - If a table has a composite primary key {StudentID, CourseID}, an attribute such as CourseName that depends on CourseID alone is only partially dependent on the key and violates 2NF; it belongs in a separate table keyed by CourseID.
  - {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
  - {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
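Partial dependencies can be found mechanically: for each proper subset of the primary key, compute its closure and see which non-key attributes it already determines. The sketch below does this for the SSN/PNUMBER example; the function names and the representation of FDs as pairs of sets are assumptions of this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(primary_key, attrs, fds):
    """List (key subset, non-key attributes it determines) for proper key subsets."""
    non_key = set(attrs) - set(primary_key)
    found = []
    for size in range(1, len(primary_key)):
        for combo in combinations(sorted(primary_key), size):
            determined = closure(set(combo), fds) & non_key
            if determined:
                found.append((set(combo), determined))
    return found


attrs = {"SSN", "PNUMBER", "ENAME", "PNAME", "PLOCATION", "HOURS"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

violations = partial_dependencies({"SSN", "PNUMBER"}, attrs, fds)
for subset, determined in violations:
    print(sorted(subset), "->", sorted(determined))
```

An empty result would mean the relation is in 2NF with respect to that primary key. Here ENAME depends on SSN alone and PNAME, PLOCATION depend on PNUMBER alone, so only HOURS is fully dependent on the key.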
Cont…
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
Cont…
Example for 2NF: EMP_PROJ is decomposed into
• PROJECT(ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
• EMP_PROJ(EmpID, ProjNo, Incentive)
3.6 Third Normal Form
• Definition: a table (relation) is in 3NF if
  - It is in 2NF, and
  - No non-primary-key attribute is transitively dependent on the primary key.
• This means that non-key attributes should not depend on other non-key attributes within the same table.
• Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z, where Y is not a candidate key
Cont…
• Consider the following example:
• In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID]. This is a transitive functional dependency, so the structure does not satisfy third normal form.
• To bring this table to third normal form, we split the table into two as follows:
Cont…
• Now all non-key attributes are fully functionally dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] depend only on [Book ID]. In [TABLE_GENRE], [Genre Type] depends only on [Genre ID].
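The split into [TABLE_BOOK] and [TABLE_GENRE] can be sketched as two projections of the original table, followed by a join back on [Genre ID] to confirm that nothing is lost. The rows and their values are invented, and the snake_case column names and the helper `project` are assumptions of this sketch.

```python
# Hypothetical rows of the original table (Book ID, Genre ID, Genre Type, Price).
books = [
    {"book_id": 1, "genre_id": 1, "genre_type": "Gardening", "price": 25.99},
    {"book_id": 2, "genre_id": 2, "genre_type": "Sports", "price": 14.99},
    {"book_id": 3, "genre_id": 1, "genre_type": "Gardening", "price": 10.00},
]


def project(rows, columns):
    """Project onto `columns`, dropping duplicate rows and keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


table_book = project(books, ["book_id", "genre_id", "price"])
table_genre = project(books, ["genre_id", "genre_type"])

# Join back on genre_id: each book picks up the genre type of its genre.
genre_of = {g["genre_id"]: g["genre_type"] for g in table_genre}
rejoined = [{**b, "genre_type": genre_of[b["genre_id"]]} for b in table_book]

print(table_genre)
print(rejoined == books)
```

Each genre type is now stored once in `table_genre`, and because `genre_id` is the key of that table, the join reproduces the original rows without spurious tuples.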
4 Other Levels of Normalization
BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a non-trivial FD X -> A holds in R, then X is a superkey of R
  - BCNF is a stricter form of 3NF, in which every determinant is a superkey.
  - It rules out every non-trivial FD whose left-hand side is not a superkey, including FDs where a non-key attribute determines part of a key.
• Example: Consider a table with columns EmployeeID, ProjectID, and ProjectName whose key is {EmployeeID, ProjectID}. Because ProjectName depends only on ProjectID, which is not a superkey, the table is not in BCNF; moving ProjectName to a table keyed by ProjectID gives BCNF relations.
• Each normal form is strictly stronger than the previous one
  - Every 2NF relation is in 1NF
  - Every 3NF relation is in 2NF
  - Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
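The BCNF condition can be checked by computing the closure of each FD's left-hand side and flagging the non-trivial FDs whose left-hand side is not a superkey. The sketch below applies this to a hypothetical relation TEACH(Student, Course, Instructor), a commonly used illustration of a relation that is in 3NF but not in BCNF; the function names and the FD representation are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs in `fds` whose left-hand side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if closure(lhs, fds) != set(attrs):
            bad.append((lhs, rhs))
    return bad


# Hypothetical schema: each instructor teaches one course, and a student
# has one instructor per course.
attrs = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

Instructor -> Course is flagged because Instructor alone does not determine Student. The relation is still in 3NF, since Course belongs to the candidate key {Student, Course} and is therefore a prime attribute.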
Cont…
Fourth Normal Form (4NF)
• Isolate independent multiple relationships: a table should not store two or more independent many-to-many (multivalued) facts about the same entity.
• Definition: a table is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies other than those whose left-hand side is a superkey.