
UNIT-IV

I. PITFALLS IN RELATIONAL DATABASE DESIGN


• Relational database design requires that we find a "good" collection of relation schemas. A bad design may lead to:
  o Repetition of information
  o Inability to represent certain information
• Design goals:
  o Avoid redundant data
  o Ensure that relationships among attributes are represented
  o Facilitate the checking of updates for violations of database integrity constraints


Example

• Consider the relation schema:
  Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
• Redundancy:
  o Data for branch-name, branch-city, and assets are repeated for each loan that a branch makes
  o Wastes space
  o Complicates updating, introducing the possibility of inconsistency in the assets value
• Null values:
  o Cannot store information about a branch if no loans exist
  o Can use null values, but they are difficult to handle

II. ANOMALIES
REDUNDANT INFORMATION IN TUPLES AND UPDATE ANOMALIES

• One of the goals of schema design is to minimize the storage space used by the base relations.
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
  o Mixing attributes of multiple entities may cause problems.
• Only foreign keys should be used to refer to other entities.
• Entity and relationship attributes should be kept apart as much as possible.
• Redundant information leads to wasted storage and inconsistency.
• Problems with update anomalies:
  o Insertion anomalies
  o Deletion anomalies
  o Modification anomalies

Consider the relation in Figure 1.35.

EMP_PROJ(Ssn, Pnumber, Ename, Pname, Hours)

• Update Anomaly:
  Changing the name of project number 20 from "Reorganization" to "Customer-Accounting" may cause this update to be made for all 100 employees working on project 20.
• Insert Anomaly:
  Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.
• Deletion Anomaly:
  When a project is deleted, all the employees who work on that project are deleted with it. Alternately, if an employee is the sole employee on a project, deleting that employee deletes the corresponding project as well.

GOAL

Design a schema that does not suffer from insertion, deletion and update anomalies.

Design Goals:
1. Avoid redundant data
2. Ensure that relationships among attributes are represented
3. Facilitate the checking of updates for violation of database integrity constraints.

III. FUNCTIONAL DEPENDENCIES


Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational database designs. FDs and keys are used to define normal forms for relations.

• The whole database is described by a single universal relation schema R = {A1, A2, ..., An}.
• A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
• The values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
• The values of the X component of a tuple uniquely (or functionally) determine the values of the Y component.
• If X is a candidate key of R (that is, there cannot be more than one tuple with a given X-value in any relation instance r(R)), then X → Y holds for any subset of attributes Y of R.
• The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
• If X → Y holds in R, this does not say whether or not Y → X holds in R.
• Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states (or legal extensions) of R.
Examples of FD

Consider the relation schema EMP_PROJ in Figure 1.35, from the semantics of the
attributes and the relation, the following functional dependencies should hold:
• FD1. SSN → ENAME
  o Social security number determines employee name.
• FD2. PNUMBER → {PNAME, PLOCATION}
  o Project number determines project name and location.
• FD3. {SSN, PNUMBER} → HOURS
  o The combination of employee SSN and project number determines the hours per week that the employee works on the project.

If K is a key of R, then K functionally determines all attributes in R.
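The definition above can be tested directly on a relation state. Below is a minimal Python sketch (the function name, row encoding, and sample values are illustrative, not from the text); note that a relation state can only refute an FD, never prove that it holds for the schema in general.

# Sketch: test whether X -> Y holds in a given relation state r(R),
# straight from the definition: t1[X] = t2[X] implies t1[Y] = t2[Y].
def fd_holds(rows, X, Y):
    seen = {}                                    # maps each X-value to its Y-value
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False                         # two tuples agree on X, differ on Y
        seen[x_val] = y_val
    return True

emp_proj = [                                     # illustrative EMP_PROJ tuples
    {"Ssn": "123", "Pnumber": 1, "Ename": "Smith", "Hours": 32.5},
    {"Ssn": "123", "Pnumber": 2, "Ename": "Smith", "Hours": 7.5},
]
print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))    # True (FD1-style check)
print(fd_holds(emp_proj, ["Ssn"], ["Hours"]))    # False: Ssn alone does not determine Hours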

Fig. 1.35. EMP_DEPT and EMP_PROJ Relations

Inference Rules for FDs

• IR1 (Reflexive): If Y is a subset of X, then X → Y.
• IR2 (Augmentation): If X → Y, then XZ → YZ.
  (Notation: XZ stands for X ∪ Z.)
• IR3 (Transitive): If X → Y and Y → Z, then X → Z.

IR1, IR2, and IR3 form a sound and complete set of inference rules: the complete set of all possible dependencies can be deduced from IR1, IR2, and IR3 (completeness property).

Some additional inference rules that are useful:

• (Decomposition) If X → YZ, then X → Y and X → Z.
• (Union) If X → Y and X → Z, then X → YZ.
• (Pseudotransitivity) If X → Y and WY → Z, then WX → Z.
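The following minimal Python sketch shows the practical effect of applying these rules repeatedly: it computes the attribute closure X+ under a set of FDs, an algorithm revisited later in this unit. The encoding of each FD as a (lhs, rhs) pair of frozensets is my own choice, not from the text; later sketches in this unit reuse this function.

# Sketch: compute X+, the set of attributes functionally determined by X,
# by repeatedly applying the FDs (the combined effect of IR1-IR3).
def closure(X, fds):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:                # each FD is (frozenset, frozenset)
            if lhs <= result and not rhs <= result:
                result |= rhs               # lhs is covered, so rhs is determined
                changed = True
    return result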

IV. NORMALIZATION

The normalization process, as first proposed by Codd (1972a), takes a relation schema through
a series of tests to certify whether it satisfies a certain normal form.

Definition: The normal form of a relation refers to a defined standard structure for relational
databases in which a relation may not be nested within another relation.

Normalization of data:

• Definition: Normalization is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies.
• Unsatisfactory relation schemas that do not meet certain conditions (the normal form tests) are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties.
• Normalization must confirm the existence of additional properties:
  o The nonadditive join or lossless join property, which guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
  o The dependency preservation property, which ensures that each functional dependency is represented in some individual relation resulting after decomposition.
• This process decomposes unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

LIST OF NORMAL FORMS


(i) 1NF, based on atomic values.
(ii) 2NF, based on keys and FDs of a relation schema.
(iii) 3NF and BCNF, based on keys and FDs of a relation schema.
(iv) 4NF, based on keys and multivalued dependencies (MVDs).
(v) 5NF, based on keys and join dependencies (JDs).

The database designers need not normalize to the highest possible normal form. (Usually
up to 3NF, BCNF or 4NF)

V. NEED FOR DECOMPOSITION


WHAT IS DECOMPOSITION?
• Decomposition is the process of breaking down a table into multiple tables in a database.
• It replaces a relation with a collection of smaller relations.
• It should always be lossless, which confirms that the information in the original relation can be accurately reconstructed from the decomposed relations.
• If a relation is not decomposed properly, it may lead to problems like loss of information.

VI. DESIRABLE PROPERTIES OF DECOMPOSITION


Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy

1. LOSSLESS DECOMPOSITION
• Decomposition must be lossless: information must not get lost from the relation that is decomposed.
• It guarantees that the join of the decomposed relations results in the same relation as was decomposed.

Example:
Let E be a relation schema with instance e, decomposed into E1, E2, E3, ..., En with instances e1, e2, e3, ..., en. If e1 ⋈ e2 ⋈ e3 ... ⋈ en = e, then it is called a 'Lossless Join Decomposition'.

• In other words, if the natural join of all the decomposed relations gives back the original relation, the decomposition is said to be a lossless join decomposition.

Example: <Employee_Department> Table


Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource

• Decompose the above relation into two relations to check whether the decomposition is lossless or lossy.
• We now decompose the relation into the Employee and Department relations.

Relation 1 : <Employee> Table

Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Bangalore 25000

• Employee Schema contains (Eid, Ename, Age, City, Salary).

Relation 2 : <Department> Table

Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource

• Department Schema contains (Deptid, Eid, DeptName).


• So, the above decomposition is a lossless join decomposition, because the two relations contain one common field, 'Eid', and therefore a join is possible.
• Now apply a natural join on the decomposed relations:
  Employee ⋈ Department

Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource

Hence, the decomposition is Lossless Join Decomposition.

• If the <Employee> table contains (Eid, Ename, Age, City, Salary) and the <Department> table contains only (Deptid, DeptName), then it is not possible to join the two relations, because there is no common column between them. The decomposition then becomes a lossy join decomposition.
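A minimal Python sketch of the lossless case just described (plain tuples stand in for tables; all names and values are illustrative): project the original relation onto the two schemas, natural-join the projections on the shared Eid column, and compare with the original.

# Sketch: verify the Employee/Department decomposition is lossless by
# re-joining the projections on the common column Eid.
rows = {
    ("E001", "ABC", 29, "Pune",   20000, "D001", "Finance"),
    ("E002", "PQR", 30, "Pune",   30000, "D002", "Production"),
    ("E003", "LMN", 25, "Mumbai",  5000, "D003", "Sales"),
}
employee   = {(eid, en, age, city, sal) for (eid, en, age, city, sal, _, _) in rows}
department = {(did, eid, dn) for (eid, _, _, _, _, did, dn) in rows}

joined = {(eid, en, age, city, sal, did, dn)
          for (eid, en, age, city, sal) in employee
          for (did, eid2, dn) in department
          if eid == eid2}
print(joined == rows)   # True: the natural join reproduces the original relation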
2. DEPENDENCY PRESERVATION
• Dependency is an important constraint on the database.
• Every dependency must be satisfied by at least one decomposed table.
• If we decompose a relation R into relations R1 and R2, all dependencies of R must either be part of R1 or R2, or be derivable from the combination of the FDs of R1 and R2.

For example (see the sketch below), a relation R(A, B, C, D) with FD set {A → BC} decomposed into R1(ABC) and R2(AD) is dependency preserving, because the FD A → BC is part of R1(ABC).
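A minimal Python sketch of the simple case used in this example. This is a sufficient test only: an FD is directly preserved when all of its attributes fall inside one decomposed schema; the general test projects the FDs onto each schema using attribute closures. Names are illustrative.

# Sketch: an FD X -> Y is directly preserved if X union Y fits inside
# some decomposed schema. Sufficient, but not necessary in general.
def directly_preserved(fd, schemas):
    lhs, rhs = fd
    return any(lhs | rhs <= schema for schema in schemas)

fd = (frozenset({"A"}), frozenset({"B", "C"}))          # A -> BC
print(directly_preserved(fd, [set("ABC"), set("AD")]))  # True: held by R1(ABC)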

3. LACK OF DATA REDUNDANCY


• Lack of data redundancy is also known as repetition of information.
• A proper decomposition should not suffer from any data redundancy.
• Careless decomposition may cause problems with the data.
• The lack-of-data-redundancy property is achieved by the normalization process.

VII. DEFINITIONS OF KEYS AND ATTRIBUTES PARTICIPATING IN


KEYS
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys or alternate keys.
• A prime attribute is an attribute that is part of a candidate key.
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key. It is a non-key column.

VIII. FIRST NORMAL FORM (1NF)


• 1NF states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
• It disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
• The relation schema DEPARTMENT, in which the Dlocations attribute takes multiple values, is not in 1NF, as shown in Fig. 1.36.

Fig. 1.36. A relation schema that is not in 1NF

Fig. 1.37. Relation 'DEPARTMENT' not in 1NF, and relation 'DEPARTMENT' in 1NF

There are three main techniques to achieve first normal form:

First technique:
1. Remove the attribute Dlocations and place it in a separate relation DEPT_LOCATIONS,
along with the primary key Dnumber.
2. The primary key of this relation is the combination {Dnumber, Dlocation}.
3. A distinct tuple in DEPT_LOCATIONS exists for each location of a department.
4. This decomposes the non-1NF relation into two 1NF relations.
Second technique:
1. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a department, as depicted in Fig. 1.37.
2. The primary key becomes the combination {Dnumber, Dlocation}.
3. Disadvantage: it introduces redundancy in the relation.

Third technique:
1. If a maximum number of values is known for the attribute—for example, if it is known that
at most three locations can exist for a department—replace the Dlocations attribute by three
atomic attributes: Dlocation1, Dlocation2, and Dlocation3.
2. Disadvantage: Introducing NULL values if most departments have fewer than three
locations.

The first technique is considered best because it does not suffer from redundancy and is completely general, with no limit placed on the maximum number of values. A small sketch of this technique follows.
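A minimal Python sketch of the first technique (attribute names follow the text; the sample location values are illustrative): the multivalued Dlocations attribute is flattened into a separate DEPT_LOCATIONS relation keyed by {Dnumber, Dlocation}.

# Sketch: decompose a non-1NF DEPARTMENT (multivalued Dlocations) into
# two 1NF relations, per the first technique.
department = [
    {"Dname": "Research",       "Dnumber": 5, "Dlocations": ["Bellaire", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dlocations": ["Stafford"]},
]

dept = [(d["Dname"], d["Dnumber"]) for d in department]
dept_locations = [(d["Dnumber"], loc)
                  for d in department for loc in d["Dlocations"]]
print(dept_locations)   # one atomic tuple per (Dnumber, Dlocation) pair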

IX. SECOND NORMAL FORM (2NF)


• 2NF uses the concepts of FDs and the primary key.
• It is based on full functional dependency.
• Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD does not hold any more.
  o Example: {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
• Partial functional dependency: an FD Y → Z where removal of some attribute from Y means that the dependency still holds.
  o Example: {SSN, PNUMBER} → ENAME is a partial dependency, since SSN → ENAME also holds.

Definition of 2NF:

• A relation schema R is in Second Normal Form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF normalization.
Example I (Ref. Fig. 1.35)

With EMP_PROJ(Ssn, Pnumber, Ename, Pname, Hours), the FDs are:

FD1: {Ssn, Pnumber} → Hours (a full FD)
FD2: Ssn → Ename (a partial FD)
FD3: Pnumber → {Pname, Plocation} (a partial FD)

The primary key is {Ssn, Pnumber}.

• The non-prime attributes Ename, Pname, and Plocation are not fully functionally dependent on the primary key.
• Each Ssn and Pnumber value can appear multiple times in EMP_PROJ, but this is acceptable since the combination is the primary key.
• The redundancy of Ename, Pname, and Plocation is, however, avoidable by decomposing the original relation EMP_PROJ.
• The decomposition is represented in Fig. 1.38 as follows.

EMP_PROJ(Ssn, Pnumber, Ename, Pname, Plocation, Hours)
    FD1: {Ssn, Pnumber} → Hours
    FD2: Ssn → Ename
    FD3: Pnumber → {Pname, Plocation}

Decomposed 2NF relations:
    R1(Ssn, Pnumber, Hours)
    R2(Ssn, Ename)
    R3(Pnumber, Pname, Plocation)

Fig. 1.38. EMP_PROJ decomposed into the 2NF relations R1, R2, R3

Reasons:

(i) {Ssn, Pnumber} → Hours is a full FD, since neither Ssn → Hours nor Pnumber → Hours holds.

(ii) Ssn → Ename is not a full FD (it is a partial dependency).

(iii) Pnumber → Pname is not a full FD (it is a partial dependency).

Example - II

Second normal form says that every non-prime attribute should be fully functionally dependent on the primary key. That is, if X → A holds, then there should be no proper subset Y of X for which Y → A also holds.

Fig. 1.39. Relation not in 2NF

In the Student_Project relation (Ref. Fig. 1.39), the prime key attributes are Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e., Stu_Name and Proj_Name, must be dependent on both of them together and not on either of the prime key attributes individually. But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second Normal Form. So the above relation is decomposed as follows.

Fig. 1.40. Relation in 2NF

We break the relation into two relations, as in Fig. 1.40, to bring it into 2NF. There then exists no partial dependency, and the relation is in 2NF. A sketch for detecting partial dependencies follows.
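A minimal Python sketch that flags partial dependencies: non-prime attributes already determined by a proper subset of the primary key. It reuses the closure() routine from the inference-rules sketch; the FD encoding and function name are my own.

# Sketch: list (key-subset, attribute) pairs that witness a 2NF violation.
from itertools import combinations

def partial_dependencies(key, nonprime, fds):
    found = []
    for r in range(1, len(key)):
        for sub in combinations(sorted(key), r):
            for a in closure(set(sub), fds) & nonprime:
                found.append((set(sub), a))
    return found

fds = [
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"Ssn"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
]
print(partial_dependencies({"Ssn", "Pnumber"},
                           {"Ename", "Pname", "Plocation", "Hours"}, fds))
# e.g. ({'Pnumber'}, 'Pname'), ({'Pnumber'}, 'Plocation'), ({'Ssn'}, 'Ename');
# Hours is not flagged, since only the full key determines it.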

X. THIRD NORMAL FORM (3NF)


• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
• Transitive functional dependency: an FD X → Z that can be derived from two FDs X → Y and Y → Z.

Definition of 3NF:

A relation schema R is in Third Normal Form (3NF) if every non-prime attribute A in R is non-transitively dependent on every superkey of R.
Example - I:

EMP(SSN, ENAME, DOB, ADDRESS, DNUMBER, DMGRSSN, DLOC)

The FDs are:

FD1: SSN → {ENAME, DOB, ADDRESS, DNUMBER}
FD2: DNUMBER → {DMGRSSN, DLOC}

Here SSN → DNUMBER → DMGRSSN is a transitive dependency, so EMP can be decomposed into:

R1(SSN, ENAME, DOB, ADDRESS, DNUMBER)
R2(DNUMBER, DMGRSSN, DLOC)

Fig. 1.41. The relation EMP with FD1 and FD2, decomposed into the 3NF relations R1 and R2

Example II :

Fig. 1.42. Relation not in 3NF

Fig. 1.42 above depicts the Student_detail relation, in which Stu_ID is the key and the only prime attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
Fig. 1.43. Relation in 3NF

We break the relation into two relations, as in Fig. 1.43, to bring it into 3NF. A sketch of the standard 3NF test follows.
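A minimal Python sketch of the standard 3NF test: for every nontrivial FD X → A, either X is a superkey or A is a prime attribute. It reuses the closure() routine sketched earlier; the attribute names follow this example, with Stu_Name added as an illustrative non-key column.

# Sketch: 3NF check. A violation is exactly a (transitive or partial)
# dependency of a non-prime attribute on a non-superkey determinant.
def is_3nf(attrs, fds, prime):
    for lhs, rhs in fds:
        for a in rhs - lhs:                          # skip trivial parts
            if not closure(lhs, fds) >= attrs and a not in prime:
                return False                         # violating dependency found
    return True

attrs = {"Stu_ID", "Stu_Name", "Zip", "City"}
fds = [(frozenset({"Stu_ID"}), frozenset({"Stu_Name", "Zip"})),
       (frozenset({"Zip"}), frozenset({"City"}))]
print(is_3nf(attrs, fds, prime={"Stu_ID"}))          # False: Zip -> City violates 3NF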

XI. BOYCE CODD NORMAL FORM (BCNF)

Definition: A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R. Equivalently, a table is in Boyce-Codd Normal Form (BCNF) if and only if it is in 3NF and every determinant is a candidate key.

Trivial Functional Dependency
• Trivial: If an FD X → Y holds where Y is a subset of X, it is called a trivial FD. Trivial FDs always hold.
• Non-trivial: If an FD X → Y holds where Y is not a subset of X, it is called a non-trivial FD.
• Completely non-trivial: If an FD X → Y holds where X ∩ Y = ∅, it is called a completely non-trivial FD.

Example:

Consider a relation TEACH (Ref. Fig. 1.44) with the following dependencies:

FD1: {Student, Course} → Instructor
FD2: Instructor → Course

• {Student, Course} is a candidate key for this relation.
• Every determinant should be a candidate key, but in FD2 {Instructor} is not a candidate key. Hence this relation is in 3NF but not in BCNF.
• Decomposition of this relation schema into two schemas is not straightforward, because it may be decomposed into one of the following three possible pairs:
  1. {Student, Instructor} and {Student, Course}
  2. {Course, Instructor} and {Course, Student}
  3. {Instructor, Course} and {Instructor, Student}
• All three decompositions lose the functional dependency FD1.
• Of the three, only the third decomposition will not generate spurious tuples after a join (hence it has the nonadditivity property).
• A relation not in BCNF should be decomposed so as to meet this property. Nonadditive decomposition is a must during normalization.
• The BCNF relations for Fig. 1.44 are shown in Fig. 1.44(a).

Fig. 1.44. A relation TEACH that is in 3NF, but not in BCNF

Ins_Course(Instructor, Course)

Ins_Student(Instructor, Student)

Fig. 1.44(a). The relations Ins_Course and Ins_Student in BCNF
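As a cross-check, here is a minimal Python sketch of the BCNF test on TEACH, reusing the closure() routine sketched in the inference-rules section (the FD encoding is my own).

# Sketch: BCNF test -- the left side of every nontrivial FD must be a superkey.
attrs = {"Student", "Course", "Instructor"}
fds = [(frozenset({"Student", "Course"}), frozenset({"Instructor"})),
       (frozenset({"Instructor"}), frozenset({"Course"}))]

for lhs, rhs in fds:
    if not rhs <= lhs and not closure(lhs, fds) >= attrs:
        print(set(lhs), "->", set(rhs), "violates BCNF")
# Prints {'Instructor'} -> {'Course'}, motivating the decomposition above.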

XII. FOURTH NORMAL FORM (4NF)


• A relation schema R is in 4NF with respect to a set of dependencies F if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.

Formal Definition of Multivalued Dependency (MVD):

The MVD X →→ Y is said to hold for R(X, Y, Z) if, whenever t1 and t2 are two rows in R that have the same values for the attributes X (that is, t1[X] = t2[X]), then R also contains rows t3 and t4 such that:

t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y]
t3[Z] = t2[Z] and t4[Z] = t1[Z]

• Consider the EMP relation in Fig. 1.45. EMP is not in 4NF because the nontrivial MVDs Ename →→ Pname and Ename →→ Dname hold, and Ename is not a superkey of EMP.

Fig. 1.45. The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname

• Decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in Fig. 1.46.
• Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs Ename →→ Pname in EMP_PROJECTS and Ename →→ Dname in EMP_DEPENDENTS are trivial MVDs.
• No other nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS, and no FDs hold in these relation schemas either.

Fig. 1.46. Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS
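The MVD definition above can be checked mechanically. Below is a minimal Python sketch (the function name, row encoding, and sample tuples are illustrative, in the spirit of Fig. 1.45): for every pair of rows agreeing on X, the two swapped rows t3 and t4 must also be present.

# Sketch: test X ->> Y for R(X, Y, Z) straight from the definition.
def mvd_holds(rows, X, Y, Z):
    proj = lambda t, A: tuple(t[a] for a in A)
    table = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in rows}
    return all((x1, y1, z2) in table          # the required "swapped" tuple t3
               for (x1, y1, z1) in table
               for (x2, y2, z2) in table
               if x1 == x2)

emp = [   # illustrative EMP tuples: one employee, two projects, two dependents
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]
print(mvd_holds(emp, ["Ename"], ["Pname"], ["Dname"]))   # True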

XIII. EQUIVALENCE OF SETS OF FDS


• Two sets of FDs F and G are equivalent if:
  o every FD in F can be inferred from G, and
  o every FD in G can be inferred from F.
• Hence, F and G are equivalent if F+ = G+.

Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+).

F and G are equivalent if F covers G and G covers F. There is an algorithm for checking the equivalence of sets of FDs; a sketch follows.
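A minimal Python sketch of that check, built on the attribute-closure routine sketched earlier (the helper names and the tiny F and G are my own): F covers G exactly when every FD in G follows from F.

# Sketch: closure-based coverage and equivalence tests for FD sets.
def covers(F, G):
    return all(rhs <= closure(lhs, F) for lhs, rhs in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
G = [(frozenset("A"), frozenset("BC"))]
print(covers(F, G), covers(G, F))   # True False: F covers G, but G does not
                                    # cover F (B -> C is not inferable from G)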

Minimal Sets of FDs


• A set of FDs F is minimal if it satisfies the following conditions:
  o Every dependency in F has a single attribute for its right-hand side.
  o We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
  o We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
• Every set of FDs has an equivalent minimal set.
• There can be several equivalent minimal sets.
• There is no simple algorithm for computing a minimal set of FDs that is equivalent to a given set F of FDs.
• The closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
• The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
• X+ can be calculated by repeatedly applying IR1, IR2, and IR3 using the FDs in F.
Closure of a Set of Functional Dependencies

• Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
  o E.g., if A → B and B → C, then we can infer that A → C.
• The set of all functional dependencies logically implied by F is the closure of F.
• We denote the closure of F by F+.
• We can find all of F+ by applying Armstrong's Axioms:
  o if β ⊆ α, then α → β (reflexivity)
  o if α → β, then γα → γβ (augmentation)
  o if α → β and β → γ, then α → γ (transitivity)
• These rules are:
  o sound (they generate only functional dependencies that actually hold), and
  o complete (they generate all functional dependencies that hold).


Example

• R = (A, B, C, G, H, I)
  F = {A → B, A → C, CG → H, CG → I, B → H}
• Some members of F+:
  o A → H
    by transitivity from A → B and B → H
  o AG → I
    by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
  o CG → HI
    from CG → H and CG → I: the "union rule" can be inferred from
    – the definition of functional dependencies, or
    – augmentation of CG → I to infer CG → CGI, augmentation of CG → H to infer CGI → HI, and then transitivity



Procedure for Computing F+

• To compute the closure of a set of functional dependencies F:

  F+ = F
  repeat
      for each functional dependency f in F+
          apply the reflexivity and augmentation rules on f
          add the resulting functional dependencies to F+
      for each pair of functional dependencies f1 and f2 in F+
          if f1 and f2 can be combined using transitivity
              then add the resulting functional dependency to F+
  until F+ does not change any further

NOTE: We will see an alternative procedure for this task later



Closure of Functional Dependencies (Cont.)

• We can further simplify manual computation of F+ by using the following additional rules:
  o If α → β holds and α → γ holds, then α → βγ holds (union)
  o If α → βγ holds, then α → β holds and α → γ holds (decomposition)
  o If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
• The above rules can be inferred from Armstrong's axioms.


Closure of Attribute Sets

• Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:

  α → β is in F+  ⇔  β ⊆ α+

• Algorithm to compute α+, the closure of α under F:

  result := α;
  while (changes to result) do
      for each β → γ in F do
          begin
              if β ⊆ result then result := result ∪ γ
          end



Example of Attribute Set Closure

• R = (A, B, C, G, H, I)
• F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+:
  1. result = AG
  2. result = ABCG (A → C and A → B)
  3. result = ABCGH (CG → H and CG ⊆ ABCG)
  4. result = ABCGHI (CG → I and CG ⊆ ABCGH)
• Is AG a candidate key?
  1. Is AG a superkey? Does AG → R hold, i.e., is (AG)+ ⊇ R?
  2. Is any proper subset of AG a superkey?
     Does A → R hold, i.e., is (A)+ ⊇ R?
     Does G → R hold, i.e., is (G)+ ⊇ R?
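For concreteness, here is the same computation carried out with the closure() routine sketched earlier in this unit (the FD encoding is my own):

# Sketch: reproducing the (AG)+ example with closure().
fds = [(frozenset("A"),  frozenset("B")),
       (frozenset("A"),  frozenset("C")),
       (frozenset("CG"), frozenset("H")),
       (frozenset("CG"), frozenset("I")),
       (frozenset("B"),  frozenset("H"))]
R = set("ABCGHI")

print(closure(set("AG"), fds) == R)   # True:  (AG)+ = ABCGHI, so AG is a superkey
print(closure(set("A"), fds) == R)    # False: (A)+ = {A, B, C, H}
print(closure(set("G"), fds) == R)    # False: (G)+ = {G}
# Neither A nor G alone is a superkey, so AG is a candidate key.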


Uses of Attribute Closure

There are several uses of the attribute closure algorithm:

• Testing for superkey:
  o To test if α is a superkey, we compute α+ and check whether α+ contains all attributes of R.
• Testing functional dependencies:
  o To check whether a functional dependency α → β holds (or, in other words, is in F+), just check whether β ⊆ α+.
  o That is, we compute α+ using attribute closure, and then check whether it contains β.
  o This is a simple, cheap, and very useful test.
• Computing the closure of F:
  o For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
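The first two uses translate directly into code. A minimal sketch reusing closure() and the fds set from the example above (the helper names are mine):

# Sketch: superkey test and FD-membership test as thin wrappers around closure().
def is_superkey(alpha, attrs, fds):
    return closure(set(alpha), fds) >= set(attrs)

def fd_in_closure(alpha, beta, fds):
    return set(beta) <= closure(set(alpha), fds)   # alpha -> beta in F+ iff beta ⊆ alpha+

print(is_superkey("AG", "ABCGHI", fds))    # True
print(fd_in_closure("CG", "HI", fds))      # True: CG -> HI is in F+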


XIV. DENORMALIZATION

Denormalization is the process of storing the join of higher normal form relations as a base relation that is in a lower normal form. It is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. It is also a means of addressing performance or scalability in relational database software.

Denormalization is a strategy used on a previously normalized database to increase performance. The idea behind it is to add redundant data where we think it will help us most. We can use extra attributes in an existing table, add new tables, or even create instances of existing tables. The usual goal is to decrease the running time of select queries by making data more accessible to the queries or by generating summarized reports in separate tables. This process can bring some new problems, which are discussed later.

A normalized database is the starting point for the denormalization process. It is important to differentiate between a database that has not been normalized and a database that was normalized first and then denormalized later. The second one is acceptable; the first is often the result of bad database design or a lack of knowledge.

Methods of Denormalization
A few denormalization methods are discussed below.
1. Adding redundant columns
2. Adding derived columns
3. Collapsing the tables
4. Snapshots
5. VARRAYS
6. Materialized views

Adding redundant columns
A redundant column that is frequently used in joins is added to the main table; the other table is retained as it is. For example, consider the EMPLOYEE and DEPT tables. Suppose we have to generate a report showing employee details along with the department name. Here we would need to join EMPLOYEE with DEPT just to get the department name; adding a redundant department-name column to EMPLOYEE avoids this join.
Adding derived columns
Suppose we have a STUDENT table with student details such as ID, name, address, and course, and another table MARKS with the student's internal marks in different subjects. We need to generate a report for each individual student showing his details, total marks, and grade. In this case, we have to query the STUDENT table, then join the MARKS table to calculate the total of the marks in the different subjects, and decide the grade in the select query before printing the report. Storing the total (and grade) as derived columns in STUDENT avoids this computation at retrieval time.
Collapsing the tables
In this method, frequently joined tables are combined into one table to reduce the joins among the tables, which increases the performance of retrieval queries. Collapsing the redundant columns into one table may cause redundancy in the table, but this is ignored as long as it does not affect the meaning of other records in the table.
For example, after denormalization of STUDENT and ADDRESS, the result should contain all the students with their correct addresses; it should not lead to wrong addresses for students.
In addition to collapsing the tables, we can duplicate or even split a table if doing so increases the performance of the query. But duplicating and splitting are not, strictly speaking, methods of denormalization.
Snapshots
This is one of the earliest methods of creating data redundancy. In this method, the database tables are duplicated and stored on various database servers. They are refreshed at specific time intervals to maintain consistency among the database server tables. With this method, users located at different places are able to access the servers nearer to them and hence retrieve data quickly; they need not access tables located on remote servers. This helps in faster access.
VARRAYS
In this method, tables are created as VARRAY tables, where repeating groups of columns are stored in a single table. The VARRAY method violates the condition of 1NF: according to 1NF, each column value should be atomic, but this method allows the same kind of data to be stored in multiple columns of each record.
Consider the example of STUDENT and MARKS, where the MARKS table holds the marks for 3 subjects for each student.
Materialized Views
Materialized views are similar to tables in that all the columns and derived values are pre-calculated and stored. Hence, if a query matches the query used to define a materialized view, the query is answered from the materialized view. Since the view already holds all the columns resulting from the join along with the pre-calculated values, there is no need to calculate the values again, which reduces the time consumed by the query.
The only problem with a materialized view is that it does not get refreshed like other views when the table data changes. We have to refresh it explicitly to get correct data in the materialized view.
Advantages and Disadvantages of Denormalization
Advantages of Denormalization
• It minimizes table joins.
• It reduces the number of foreign keys and indexes. This helps in saving memory usage and reduces data manipulation time.
• If any aggregated columns are used for denormalization, these computations are carried out at data manipulation time rather than at retrieval time. For example, if we use 'total marks' as the denormalized column, the total is calculated and updated when the other related column entries (say, the student details and his marks) are inserted. Hence, when we query the STUDENT table for his details and marks, we need not calculate his total, which saves retrieval time.
• It reduces the number of tables in the database. As the number of tables increases, the mapping increases, joins increase, memory space increases, and so on.
Disadvantages of Denormalization
• Although it supports faster retrieval, it slows down data manipulation. If a column is frequently updated, denormalization reduces the speed of updates.
• If there is any change in the requirements, we need to analyze the data and tables again to understand the performance. Hence denormalization is specific to the requirement or application in question.
• The complexity of the code and the number of tables depend on the requirement/application; denormalization can increase or decrease the number of tables. There is a chance that the code will get more complex because of the redundancy in the tables. Hence it needs thorough analysis of the requirements, queries, data, etc.
Difference between Normalization and Denormalization:
• Normalization and denormalization are two processes that are completely opposite to each other.
• Normalization is the process of dividing larger tables into smaller ones to reduce redundant data, while denormalization is the process of adding redundant data to optimize performance.
• Normalization is carried out to prevent database anomalies.
• Denormalization is usually carried out to improve the read performance of the database, but due to the additional constraints used for denormalization, writes (i.e., insert, update, and delete operations) can become slower. Therefore, a denormalized database can offer worse write performance than a normalized database.
• It is often recommended that you should "normalize until it hurts, denormalize until it works."
