
DBMS 3

The document discusses schema refinement and normalization in database management systems, focusing on the issues caused by redundancy such as update, insertion, and deletion anomalies. It explains the use of functional dependencies to identify and resolve these problems through decomposition into smaller relations, and outlines various normal forms (1NF, 2NF, 3NF, BCNF, etc.) that help ensure data integrity and minimize redundancy. Additionally, it covers reasoning about functional dependencies and the process of normalization to create well-structured relations.

Uploaded by

Lahari

DATABASE MANAGEMENT SYSTEMS

UNIT – III

 SCHEMA REFINEMENT

 NORMAL FORMS
UNIT – III

SCHEMA REFINEMENT
Introduction to schema refinement
Functional dependencies
Reasoning about FDs
NORMAL FORMS
1NF, 2NF, 3NF, BCNF
Properties of decompositions, normalization, schema refinement in database design
Other kinds of dependencies: 4NF, 5NF, DKNF
Case studies
The Evils of Redundancy

 Redundancy is at the root of several problems associated with relational schemas:
    redundant storage, insert/delete/update anomalies
 Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
 Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
 Decomposition should be used judiciously:
    Is there reason to decompose a relation?
    What problems (if any) does the decomposition cause?
INTRODUCTION TO SCHEMA REFINEMENT

Problems Caused by Redundancy

Storing the same information redundantly, that is, in more than one place within a database, can lead to several problems:
 Redundant storage: Some information is stored repeatedly.
 Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.
 Insertion anomalies: It may not be possible to store some information unless some other information is stored as well.
 Deletion anomalies: It may not be possible to delete some information without losing some other information as well.


INTRODUCTION TO SCHEMA REFINEMENT

Problems Caused by Redundancy (cont.)

 Consider a relation obtained by translating a variant of the Hourly_Emps entity set:
    Ex: Hourly_Emps(ssn, name, lot, rating, hourly wages, hours worked)
 The key for Hourly_Emps is ssn.
 In addition, suppose that the hourly wages attribute is determined by the rating attribute. That is, for a given rating value, there is only one permissible hourly wages value.
 This IC is an example of a functional dependency.
 It leads to possible redundancy in the relation Hourly_Emps.


Use of Decomposition

 Intuitively, redundancy arises when a relational schema forces an association between attributes that is not natural.
 Functional dependencies (a kind of IC) can be used to identify such situations and to suggest refinements to the schema.
 The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of smaller relations.
 Each of the smaller relations contains a subset of the attributes of the original relation.
 We refer to this process as decomposition of the larger relation into the smaller relations.
Use of Decomposition (cont.)

 We can deal with the redundancy in Hourly_Emps by decomposing it into two relations:
    Hourly_Emps2(ssn, name, lot, rating, hours worked)
    Wages(rating, hourly wages)

Wages:
rating  hourly wages
8       10
5       7
Use of Decomposition (cont.)

Hourly_Emps2:
ssn          name       lot  rating  hours worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40
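
The decomposition into Hourly_Emps2 and Wages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relations are held as in-memory sets of tuples, the wage values (8 → 10, 5 → 7) come from the Wages table above, and the names `HOURLY_EMPS`, `decompose`, and `natural_join` are hypothetical helpers, not part of any DBMS.

```python
# Hypothetical in-memory copy of the Hourly_Emps instance.
# Column order: ssn, name, lot, rating, hourly wages, hours worked.
HOURLY_EMPS = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]


def decompose(rows):
    """Project Hourly_Emps onto Hourly_Emps2 and Wages (duplicates removed)."""
    hourly_emps2 = {
        (ssn, name, lot, rating, hours)
        for ssn, name, lot, rating, _, hours in rows
    }
    wages = {(rating, wage) for _, _, _, rating, wage, _ in rows}
    return hourly_emps2, wages


def natural_join(hourly_emps2, wages):
    """Join the two projections on the shared attribute rating."""
    return {
        (ssn, name, lot, rating, wage, hours)
        for ssn, name, lot, rating, hours in hourly_emps2
        for w_rating, wage in wages
        if rating == w_rating
    }


emps2, wages = decompose(HOURLY_EMPS)
print(sorted(wages))                                   # one row per rating
print(natural_join(emps2, wages) == set(HOURLY_EMPS))  # join rebuilds the original
```

Each rating's wage is now stored once in `wages`, and for this instance the join of the two projections returns exactly the original tuples.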
Problems related to Decomposition

 Unless we are careful, decomposing a relation schema can create more problems than it solves.
 Two important questions must be asked repeatedly:
   1. Do we need to decompose a relation?
   2. What problems (if any) does a given decomposition cause?
 To help with the first question, several normal forms have been proposed for relations.
 If a relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise.
FUNCTIONAL DEPENDENCIES (FDs)

 A Functional Dependency (FD) X → Y (read as X determines Y), where X ⊆ R and Y ⊆ R, holds over relation R if, for every allowable instance r of R:
    t1 ∈ r, t2 ∈ r, πX(t1) = πX(t2) implies πY(t1) = πY(t2)
    i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
 An FD is a statement about all allowable relations.
    It must be identified based on the semantics of the application.
    Given some allowable instance r1 of R, we can check whether it violates some FD f, but we cannot tell whether f holds over R!
 K is a candidate key for R means that K → R.
    However, K → R alone does not require K to be minimal; it only says that K is a superkey.
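
The point that an instance can refute an FD but never prove it can be illustrated with a small check. This is a minimal sketch, assuming an instance is given as a list of Python dicts; `violates_fd` and the sample instances `r1` and `r2` are hypothetical names made up for the illustration.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs.

    rows is a list of dicts (attribute name -> value); lhs and rhs are
    lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x, y) != y:
            return True
    return False


# Hypothetical instances with attributes rating (R) and hourly wage (W).
r1 = [{"R": 8, "W": 10}, {"R": 5, "W": 7}, {"R": 8, "W": 10}]
r2 = r1 + [{"R": 5, "W": 9}]

print(violates_fd(r1, ["R"], ["W"]))  # False: r1 does not violate R -> W
print(violates_fd(r2, ["R"], ["W"]))  # True: rating 5 maps to both 7 and 9
```

A result of `False` only says that this particular instance is consistent with the FD; whether the FD holds over R still has to come from the semantics of the application.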
FUNCTIONAL DEPENDENCIES (FDs) - Examples

Consider the schema:
Student(studName, rollNo, sex, dept, hostelName, roomNo)

Since rollNo is a key, rollNo → {studName, sex, dept, hostelName, roomNo}.
Suppose that each student is given a hostel room exclusively; then hostelName, roomNo → rollNo.
Suppose boys and girls are accommodated in separate hostels; then hostelName → sex.
FDs are additional constraints that can be specified by designers.
Trivial / Non - Trivial FDs

 An FD X → Y where Y ⊆ X is called a trivial FD; it always holds.
 An FD X → Y where Y ⊈ X is called a non-trivial FD.
 An FD X → Y where X ∩ Y = Ø is called a completely non-trivial FD.
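
The three cases can be told apart with two set tests. The sketch below assumes attributes are given as Python collections of names; `classify_fd` is a hypothetical helper, and the sample FDs reuse attribute names from the Student schema purely for illustration.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X and Y."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "non-trivial"             # Y is not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y are disjoint


print(classify_fd({"rollNo", "dept"}, {"dept"}))         # trivial
print(classify_fd({"rollNo", "dept"}, {"dept", "sex"}))  # non-trivial
print(classify_fd({"hostelName"}, {"sex"}))              # completely non-trivial
```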
FUNCTIONAL DEPENDENCIES (FDs) cont.

Example: Constraints on Entity Set

 Consider the relation obtained from Hourly_Emps:
    Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked)
 Notation: We will denote this relation schema by listing the attributes: SNLRWH
    This is really the set of attributes {S, N, L, R, W, H}.
    Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).
 Some FDs on Hourly_Emps:
    ssn is the key: S → SNLRWH
    rating determines hrly_wages: R → W
Example (Contd.)

Problems due to R → W:
 Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
 Insertion anomaly: What if we want to insert an employee and don't know the hourly wage for his rating?
 Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Hourly_Emps (SNLRWH):
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Hourly_Emps2 (SNLRH):
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

Wages (RW):
R  W
8  10
5  7
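
The deletion anomaly can be made concrete with a trimmed copy of the data. This is a sketch under assumptions: only the ssn, rating, and wage columns are kept, only three of the five employees are included, and `single`, `emps2`, `wages`, and `wage_for_rating` are hypothetical names.

```python
# Single-table design, trimmed to (ssn, rating, hourly wage).
single = [("123-22-3666", 8, 10), ("131-24-3650", 5, 7), ("434-26-3751", 5, 7)]

# Decomposed design: employees keep only the rating; wages live in their own table.
emps2 = [("123-22-3666", 8), ("131-24-3650", 5), ("434-26-3751", 5)]
wages = {8: 10, 5: 7}


def wage_for_rating(rows, rating):
    """Look up the wage for a rating in the single-table design, or None."""
    for _, r, w in rows:
        if r == rating:
            return w
    return None


# Delete every employee with rating 5 in both designs.
single_after = [row for row in single if row[1] != 5]
emps2_after = [row for row in emps2 if row[1] != 5]

print(wage_for_rating(single_after, 5))  # None: the wage for rating 5 is gone
print(wages.get(5))                      # 7: the Wages table still records it
```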
Constraints on a Relationship Set

 Suppose that we have entity sets Parts, Suppliers, and Departments, as well as a relationship set Contracts that involves all of them. We refer to the schema for Contracts as CQPSD.
 A contract with contract id C specifies that a supplier S will supply some quantity Q of a part P to a department D.
 We might have a policy that a department purchases at most one part from any given supplier.
 Thus, if there are several contracts between the same supplier and department, we know that the same part must be involved in all of them. This constraint is an FD, DS → P.
Reasoning about Functional Dependencies (FDs)

 Given some FDs, we can usually infer additional FDs:
    ssn → did, did → lot implies ssn → lot
 An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
    F+ = closure of F is the set of all FDs that are implied by F.
 Armstrong's Axioms (X, Y, Z are sets of attributes):
    Reflexivity: If X ⊆ Y, then Y → X
    Augmentation: If X → Y, then XZ → YZ for any Z
    Transitivity: If X → Y and Y → Z, then X → Z
 These are sound and complete inference rules for FDs!
Reasoning About FDs (Contd.)

 A couple of additional rules (that follow from Armstrong's Axioms):
    Union: If X → Y and X → Z, then X → YZ
    Decomposition: If X → YZ, then X → Y and X → Z
 Example: Contracts(cid, sid, jid, did, pid, qty, value), and:
    C is the key: C → CSJDPQV
    Project purchases each part using a single contract: JP → C
    Dept purchases at most one part from a supplier: SD → P
 JP → C, C → CSJDPQV imply JP → CSJDPQV
 SD → P implies SDJ → JP
 SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
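
The three inference steps for Contracts can be replayed mechanically. The sketch below is an illustration only: it assumes an FD is represented as a pair of frozensets of single-letter attribute names, and `fd`, `augment`, and `transitive` are hypothetical helpers for the augmentation and transitivity rules.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def augment(f, z):
    """Augmentation: from X -> Y derive XZ -> YZ."""
    x, y = f
    return (x | frozenset(z), y | frozenset(z))


def transitive(f, g):
    """Transitivity: from X -> Y and Y -> Z derive X -> Z."""
    (x, y), (y2, z) = f, g
    if y != y2:
        raise ValueError("right side of f must equal left side of g")
    return (x, z)


key = fd("C", "CSJDPQV")   # C is the key
jp_c = fd("JP", "C")       # project buys each part with a single contract
sd_p = fd("SD", "P")       # dept buys at most one part from a supplier

jp_all = transitive(jp_c, key)        # JP -> CSJDPQV
sdj_jp = augment(sd_p, "J")           # SDJ -> JP
sdj_all = transitive(sdj_jp, jp_all)  # SDJ -> CSJDPQV

print(sdj_all == fd("SDJ", "CSJDPQV"))  # True
```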
Reasoning About FDs (Contd.)

 Computing the closure of a set of FDs can be expensive. (The size of the closure is exponential in the number of attributes!)
 Typically, we just want to check if a given FD X → Y is in the closure of a set of FDs F. An efficient check:
    Compute the attribute closure of X (denoted X+) wrt F:
       the set of all attributes A such that X → A is in F+
       There is a linear time algorithm to compute this.
    Check if Y is in X+
 Does F = {A → B, B → C, CD → E} imply A → E?
    i.e., is A → E in the closure F+? Equivalently, is E in A+?
    Here A+ = {A, B, C}: CD → E never applies because D is not in A+, so E is not in A+ and F does not imply A → E.
Closure of a Set of FDs

 The set of all FDs implied by a given set F of FDs is called the closure of F and is denoted as F+.
 An important question is how we can infer, or compute, the closure of a given set F of FDs.
 The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs.
 We use X, Y, and Z to denote sets of attributes over a relation schema R:
Closure of a Set of FDs (or Armstrong’s Inference Rules)

 Reflexive Rule:
    F ⊨ {X → Y | Y ⊆ X} for any X. (Trivial FDs)
 Augmentation Rule:
    {X → Y} ⊨ {XZ → YZ}, Z ⊆ R. Here XZ denotes X ∪ Z.
 Transitive Rule:
    {X → Y, Y → Z} ⊨ {X → Z}
 Armstrong's Axioms are sound in that they generate only FDs in F+ when applied to a set F of FDs.
 They are complete in that repeated application of these rules will generate all FDs in the closure F+.
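
For a very small schema, F+ can be computed by literally applying the three rules until nothing new appears. This brute-force sketch is meant only to show how quickly the closure grows; it assumes single-letter attribute names, counts FDs with an empty left or right side as well, and uses the hypothetical helpers `powerset` and `fd_closure`. It is not a practical algorithm.

```python
from itertools import combinations


def powerset(attrs):
    """All subsets of attrs, each as a frozenset."""
    items = sorted(attrs)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def fd_closure(attrs, fds):
    """Compute F+ over attrs by applying Armstrong's Axioms to a fixed point."""
    subsets = powerset(attrs)
    # Reflexivity: X -> Y for every Y that is a subset of X.
    closure = {(x, y) for x in subsets for y in subsets if y <= x}
    closure |= {(frozenset(x), frozenset(y)) for x, y in fds}
    changed = True
    while changed:
        new = set()
        # Augmentation: X -> Y gives XZ -> YZ for every Z.
        for x, y in closure:
            for z in subsets:
                new.add((x | z, y | z))
        # Transitivity: X -> Y and Y -> Z give X -> Z.
        for x, y in closure:
            for y2, z in closure:
                if y == y2:
                    new.add((x, z))
        changed = not new <= closure
        closure |= new
    return closure


f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print(len(f_plus))                                  # size of F+ for R = ABC
print((frozenset("A"), frozenset("C")) in f_plus)   # True: A -> C is implied
```

With only three attributes and two given FDs, F+ already contains 43 of the 64 possible FDs, which is why the attribute closure is normally used instead.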
Closure of a Set of FDs (or Armstrong’s Inference Rules)

 It is convenient to use some additional rules while reasoning about F+:
 Union or Additive Rule:
    {X → Y, X → Z} ⊨ {X → YZ}
 Decomposition or Projective Rule:
    {X → YZ} ⊨ {X → Y, X → Z}
 Pseudo Transitive Rule:
    {X → Y, WY → Z} ⊨ {WX → Z}
Attribute Closure

 If we just want to check whether a given dependency, say, X → Y, is in the closure of a set F of FDs, we can do so efficiently without computing F+.
 We first compute the attribute closure X+ with respect to F, which is the set of attributes A such that X → A can be inferred using the Armstrong Axioms.
 The algorithm for computing the attribute closure of a set X of attributes is:
    closure = X;
    repeat until there is no change: {
       if there is an FD U → V in F such that U ⊆ closure,
       then set closure = closure ∪ V
    }
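
The algorithm above translates almost line for line into Python. This is a minimal sketch, assuming single-letter attribute names and FDs given as (left side, right side) pairs of strings; `attribute_closure` and `implies` are hypothetical names. It is the simple repeat-until-no-change version, not the linear time variant.

```python
def attribute_closure(attrs, fds):
    """Return X+ for the attribute set attrs under the FDs in fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply U -> V when U is contained in the closure and V adds something.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """Check whether lhs -> rhs is in F+ by testing rhs against lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))  # ['A', 'B', 'C']
print(implies(F, "A", "E"))               # False: D is never reached from A
print(implies(F, "AD", "E"))              # True: AD+ contains C and D, so CD -> E applies
```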
Database Normalization

 The main goal of Database Normalization is to restructure the logical data model of a database to:
    Eliminate redundancy
    Organize data efficiently
    Reduce the potential for data anomalies


Database Normalization definitions

 Normalization describes how to take a raw collection of data and break it up into more logical units or tables, in order to reduce the occurrence of redundant data in the database. This process of reducing data redundancy is referred to as Normalization.
 Normalization is a body of rules addressing analysis and conversion of data structures into relations that exhibit more desirable properties of internal consistency, minimal redundancy and maximum stability.
Database Normalization definitions

 Normalization is the process by which attributes are grouped together to form a well-structured relation.
 We focus on the characteristics of a good relation:
    Analyzing sample relations
    Identifying design flaws
    Learning how to eliminate them
   This is called normalizing a relation.
 Normalization is a process of decomposing relations to produce smaller, well-structured relations.
 Normalization is a tool to validate and improve a logical design, so that it satisfies certain constraints that avoid unnecessary duplication of data.
