DBMS Module 4
Module 4
NORMALIZATION
Each relation schema consists of a number of attributes, and the relational database schema consists of
a number of relation schemas. There are two levels at which we can discuss the "goodness" of relation
schemas:
a) Logical (or conceptual) level:
➢ It discusses how users interpret the relation schemas and the meaning of their attributes.
➢ Having good relation schemas at this level enables users to understand clearly the meaning of the
data in the relations, and hence to formulate their queries correctly.
➢ At this level we are interested in schemas of both base relations and views (virtual relations).
➢ The meaning of the EMPLOYEE relation schema is quite simple: Each tuple represents an
employee, with values for the employee's name (ENAME), social security number (SSN), birth
date (BDATE), and address (ADDRESS), and the number of the department that the employee
works for (DNUMBER). The DNUMBER attribute is a foreign key that represents an implicit
relationship between EMPLOYEE and DEPARTMENT.
➢ The semantics of the DEPARTMENT and PROJECT schemas are also straightforward: Each
DEPARTMENT tuple represents a department entity, and each PROJECT tuple represents a
project entity. The attribute DMGRSSN of DEPARTMENT relates a department to the
employee who is its manager, while DNUM of PROJECT relates a project to its controlling
department; both are foreign key attributes. The ease with which the meaning of a relation's
attributes can be explained is an informal measure of how well the relation is designed.
Each tuple in DEPT_LOCATIONS gives a department number (DNUMBER) and one of the
locations of the department (DLOCATION). Each tuple in WORKS_ON gives an employee social
security number (SSN), the project number of one of the projects that the employee works on
(PNUMBER), and the number of hours per week that the employee works on that project (HOURS).
Figure 4.2: Example database state for the relational database schema of Figure 4.1.
Examples of violating Guideline 1: The relation schemas in Figures 4.3a and 4.3b also have clear
semantics. A tuple in the EMP_DEPT relation schema of Figure 4.3a represents a single employee
but includes additional information, namely the name (DNAME) of the department for which the
employee works and the social security number (DMGRSSN) of the department manager. For the
EMP_PROJ relation of Figure 4.3b, each tuple relates an employee to a project but also includes the
employee name (ENAME), project name (PNAME), and project location (PLOCATION).
FIGURE 4.4 Example states for EMP_DEPT and EMP_PROJ resulting from applying
NATURAL JOIN to the relations in Figure 4.2.
➢ The various update anomalies in a non-normalized database (DB) can be classified into
a) Insertion anomalies
b) Deletion anomalies
c) Modification anomalies.
a) Insertion Anomalies: Insertion anomalies can be differentiated into two types, illustrated
by the following examples based on the EMP_DEPT relation.
➢ To insert a new employee tuple into EMP_DEPT, we must include either all the
attribute values for the department that the employee works for, or nulls (if the
employee does not work for a department as yet).
➢ For example, to insert a new tuple for an employee who works in department number
5, we must enter the attribute values of department 5 correctly so that they are
consistent with values for department 5 in other tuples in EMP_DEPT.
➢ In the design of Figure 4.2, we do not have to worry about this consistency problem
because we enter only the department number in the employee tuple; all other
attribute values of department 5 are recorded only once in the database, as a single
tuple in the DEPARTMENT relation.
c) Modification Anomalies:
➢ In EMP_DEPT, if we change the value of one of the attributes of a particular
department, say the manager of department 5, we must update the tuples of all
Prof. S Mamatha Jajur, Prof Soumya N G, Prof. Ancy Thomas Page 5
Database Management Systems (21CS53) CS&E, RNSIT
employees who work in that department; otherwise, the database will become
inconsistent. If we fail to update some tuples, the same department will be shown to
have two different values for manager in different employee tuples, which would be
wrong.
Based on the preceding three anomalies, we can state the guideline that follows:
GUIDELINE 2: Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any anomalies are present, note them
clearly and make sure that the programs that update the database will operate correctly.
➢ If many of the attributes do not apply to all tuples in the relation, we end up with many
nulls in those tuples.
➢ This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN operations at the
logical level.
➢ Another problem with nulls is how to account for them when aggregate operations such
as COUNT or SUM are applied.
FIGURE 4.5 Particularly poor design for the EMP_PROJ relation of Figure 4.3b. (a) The two relation
schemas EMP_LOCS and EMP_PROJ1. (b) The result of projecting the extension of EMP_PROJ
from Figure 4.4 onto the relations EMP_LOCS and EMP_PROJ1.
➢ If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result produces
many more tuples than the original set of tuples in EMP_PROJ. In Figure 4.6, the result of applying the
join to only the tuples above the dotted lines in Figure 4.5b is shown.
FIGURE 4.6: Result of applying NATURAL JOIN to the tuples above the dotted lines in
EMP_PROJ1 and EMP_LOCS of Figure 4.5. Generated spurious tuples are
marked by asterisks.
➢ Additional tuples in Figure 4.6 that were not in EMP_PROJ (Figure 4.4) are called spurious
tuples because they represent spurious or wrong information that is not valid.
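The effect can be reproduced on a very small example. The sketch below is only an illustration: the two EMP_PROJ tuples are hypothetical values (not the state of Figure 4.4), and the helper names project and natural_join are assumptions introduced here. It projects EMP_PROJ onto EMP_LOCS and EMP_PROJ1, joins the projections back on the shared attribute PLOCATION, and reports the tuples that were not in the original relation.

```python
# Hypothetical two-tuple state of EMP_PROJ (illustrative values only).
ATTRS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
EMP_PROJ = [
    dict(zip(ATTRS, ("111", 1, 10.0, "Smith", "ProductX", "Bellaire"))),
    dict(zip(ATTRS, ("222", 2, 20.0, "English", "ProductY", "Bellaire"))),
]

def project(rows, attrs):
    """Projection with duplicate elimination; a tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r, s):
    """Combine every pair of tuples that agree on all attributes they share."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result

original = project(EMP_PROJ, ATTRS)
emp_locs = project(EMP_PROJ, ("ENAME", "PLOCATION"))
emp_proj1 = project(EMP_PROJ, ("SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"))
joined = natural_join(emp_proj1, emp_locs)
spurious = joined - original  # tuples produced by the join but absent from EMP_PROJ

print(len(original), len(joined), len(spurious))  # 2 4 2
```

Because both employees work at the same location, joining on PLOCATION pairs each employee name with each project, so the join returns every original tuple plus two spurious ones.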
We can now informally state another design guideline as follows:
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on
attributes that are either primary keys or foreign keys in a way that guarantees that no spurious
tuples are generated.
Definition: A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuples that can form a relation state ‘r’ of ‘R’. The
constraint is that, for any two tuples ‘t1’ and ‘t2’ in ‘r’ that have t1[X] = t2[X], they must also have
t1[Y] = t2[Y].
➢ This means that the values of the ‘Y’ component of a tuple in ‘r’ depend on, or are determined by, the
values of the ‘X’ component; alternatively, the values of the ‘X’ component of a tuple uniquely (or
functionally) determine the values of the Y component.
➢ We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on X.
Ex: Vehicle_id → State (by knowing the vehicle id, it is possible to determine the state that the
vehicle belongs to).
➢ The abbreviation for functional dependency is FD or f.d.
➢ The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
➢ Thus, ‘X’ functionally determines ‘Y’ in a relation schema ‘R’ if, and only if, whenever
two tuples of r(R) agree on their X-value, they must necessarily agree on their Y-value.
➢ Consider the relation schema EMP_PROJ and EMP_DEPT given below. From the semantics of the
attributes, we know that the following functional dependencies should hold:
a. SSN → ENAME
b. PNUMBER → {PNAME, PLOCATION}
c. {SSN, PNUMBER} → HOURS
These functional dependencies specify that (a) the value of an employee's social security number
(SSN) uniquely determines the employee name (ENAME), (b) the value of a project's number
(PNUMBER) uniquely determines the project name (PNAME) and location (PLOCATION), and
(c) a combination of SSN and PNUMBER values uniquely determines the
number of hours the employee currently works on the project per week (HOURS).
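The definition can be turned directly into a check on a relation state. The following is a minimal sketch, assuming a relation state is a list of dictionaries; the function name fd_holds and the sample tuples are hypothetical and serve only to illustrate the test t1[X] = t2[X] implies t1[Y] = t2[Y].

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical EMP_PROJ state (illustrative values only).
ATTRS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
rows = [dict(zip(ATTRS, t)) for t in [
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
]]

print(fd_holds(rows, ["SSN"], ["ENAME"]))                   # True
print(fd_holds(rows, ["PNUMBER"], ["PNAME", "PLOCATION"]))  # True
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))        # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))                   # False
```

Note that a single state can only refute a functional dependency. An FD is a property of the schema, so a check like this passing on one state does not prove that the FD holds for every legal state.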
➢ Therefore, formally it is useful to define a concept called closure that includes all possible
dependencies that can be inferred from the given set F.
Definition of closure: Formally, the set of all dependencies that include F as well as all dependencies that
can be inferred from F is called the closure of F; it is denoted by F+.
➢ For example, suppose that we specify the following set F of obvious functional dependencies on the
relation schema of EMP_DEPT:
F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
Some of the additional functional dependencies that we can infer from F are the following:
SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME
➢ A set of inference rules can be used to infer new dependencies from a given set of dependencies.
➢ We use the notation F |= X → Y to denote that the functional dependency X→Y is inferred from the set
of functional dependencies F.
a) IR1 (reflexive rule):
If X ⊇ Y, then X → Y
Ex:- X = {SSN, FNAME, LNAME}, Y = {FNAME, LNAME}
Therefore {SSN, FNAME, LNAME} → {FNAME, LNAME}
b) IR2 (augmentation rule):
➢ The augmentation rule (IR2) says that adding the same set of attributes to both the left- and
right-hand sides of a dependency results in another valid dependency.
{X → Y} |= XZ → YZ
c) IR3 (transitive rule):
{X → Y, Y → Z} |= X → Z
Ex:- X = {SSN}, Y = {DNUMBER}, Z = {DNAME}
Therefore {SSN} → {DNAME}
➢ IR4 (decomposition, or projective, rule):
➢ The decomposition rule (IR4) says that we can remove attributes from the right-hand side of a
dependency.
➢ Applying this rule repeatedly can decompose the FD X → {A1, A2, ..., An} into the set of
dependencies {X → A1, X → A2, ..., X → An}.
{X → YZ} |= X → Y
Ex:- X = {SSN}, Y = {FNAME}, Z = {LNAME}
Therefore {SSN} → {FNAME}
➢ IR5 (union, or additive, rule):
➢ The union rule (IR5) allows us to do the opposite; we can combine a set of dependencies
{X → A1, X → A2, ..., X → An} into the single FD X → {A1, A2, ..., An}.
{X → Y, X → Z} |= X → YZ
Ex:- X = {SSN}, Y = {FNAME}, Z = {LNAME}
Therefore {SSN} → {FNAME, LNAME}
➢ IR6 (pseudotransitive rule):
{X → Y, WY → Z} |= WX → Z
Ex:- X = {SSN}, W = {DNAME}, Y = {DNAME}, Z = {MGRSSN}
Therefore {DNAME, SSN} → {MGRSSN}
PROOF OF INFERENCE RULES:
➢ Each of the inference rules can be proved from the definition of functional dependency, either by
direct proof or by contradiction.
➢ A proof by contradiction assumes that the rule does not hold and shows that this is not possible.
PROOF OF IR1:
Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some relation instance ‘r’ of ‘R’ such that t1[X]
= t2[X]. Then t1[Y] = t2[Y] because X ⊇ Y; hence, X → Y must hold in ‘r’.
PROOF OF IR2 (by contradiction):
Assume that X → Y holds in a relation instance ‘r’ of ‘R’ but that XZ → YZ does not hold.
Then there must exist two tuples ‘t1’ and ‘t2’ in ‘r’ such that:
1. t1[X] = t2[X]
2. t1[Y] = t2[Y]
3. t1[XZ] = t2[XZ]
4. t1[YZ] ≠ t2[YZ].
This is not possible because from (1) and (3) we deduce (5) t1[Z] = t2[Z], and from (2) and (5) we deduce
(6) t1[YZ] = t2[YZ], contradicting (4).
PROOF OF IR3:
Assume that (1) X → Y and (2) Y → Z both hold in a relation ‘r’. Then for any two tuples ‘t1’ and
‘t2’ in ‘r’ such that t1[X] = t2[X], we must have (3) t1[Y] = t2[Y], from assumption (1);
hence we must also have (4) t1[Z] = t2[Z], from (3) and assumption (2); hence X → Z must hold
in ‘r’.
➢ The set of dependencies F+, which we called the closure of F, can be determined from F by using only
inference rules IR1 through IR3.
➢ Inference rules IR1 through IR3 are known as Armstrong's inference rules.
➢ A systematic way to determine these additional functional dependencies is:
First determine each set of attributes ‘X’ that appears as a left-hand side of some functional
dependency in F, and then determine the set of all attributes that are dependent on X.
Definition: For each set of attributes ‘X’ that appears as a left-hand side of some functional dependency in ‘F’,
we determine the set ‘X+’ of attributes that are functionally determined by ‘X’ based on ‘F’; here ‘X+’
is called the closure of X under F. Algorithm 4.1 can be used to calculate ‘X+’.
Algorithm 4.1: Determining X+, the Closure of X under F
X+ := X;
repeat
    old X+ := X+;
    for each functional dependency Y → Z in F do
        if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = old X+);
a) Algorithm 4.1 starts by setting X+ to all the attributes in X. By IR1, we know that all these
attributes are functionally dependent on X.
b) Using inference rules IR3 and IR4, we add attributes to X+, using each functional dependency in
F.
c) We keep going through all the dependencies in F (the repeat loop) until no more attributes are
added to X+ during a complete cycle (of the for loop) through the dependencies in F.
For example, consider the relation schema EMP_PROJ. From the semantics of the attributes, we
specify the following set F of functional dependencies that should hold on EMP_PROJ;
a. SSN → ENAME
b. PNUMBER → {PNAME, PLOCATION}
c. {SSN, PNUMBER} → HOURS
Using Algorithm 4.1, we calculate the following closure sets with respect to F:
{SSN}+ = {SSN, ENAME}
{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}
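Algorithm 4.1 translates almost line for line into code. The following is a minimal sketch, assuming each functional dependency is stored as a pair of frozensets (left-hand side, right-hand side); the function name closure and this representation are choices made for illustration, not part of the algorithm itself.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the FDs in fds."""
    result = set(attrs)          # X+ := X
    changed = True
    while changed:               # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:     # for each Y -> Z in F
            if lhs <= result and not rhs <= result:
                result |= rhs    # X+ := X+ U Z
                changed = True
    return result

# The set F specified on EMP_PROJ above.
F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

print(sorted(closure({"SSN"}, F)))      # ['ENAME', 'SSN']
print(sorted(closure({"PNUMBER"}, F)))  # ['PLOCATION', 'PNAME', 'PNUMBER']
print(sorted(closure({"SSN", "PNUMBER"}, F)))
# ['ENAME', 'HOURS', 'PLOCATION', 'PNAME', 'PNUMBER', 'SSN']
```

The three results agree with the closure sets listed above.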
Example: Let the given set of FDs be E: {B→A, D→A, AB→D}. Find the minimal
cover of E.
Answer:
• All the above dependencies are in canonical form, so we have completed step 1 of the algorithm and
can proceed to step 2. In step 2 we need to determine if AB→D has any redundant attributes on the
left-hand side; that is, can it be replaced by B→D or A→D?
• Since B→A, by augmenting with B on both sides (IR2), we get BB→AB, or B→AB (i). However,
AB→D is given (ii).
• Hence by the transitive rule (IR3), we get from (i) and (ii), B→D. Hence AB→D may be replaced by
B→D.
• Now we have a set E’ = {B→A, D→A, B→D}. No further reduction is possible in step 2 since all
FDs have a single attribute on the left-hand side.
• In step 3 we look for a redundant FD in E’. By using the transitive rule on B→D and D→A, we
derive B→A. Hence B→A is redundant in E’ and can be eliminated.
• Hence the minimal cover of E is {B→D, D→A}.
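The three steps used in this answer can be sketched in code. This is an illustrative sketch only: the function names and the representation of an FD as a pair (frozenset of left-hand-side attributes, single right-hand-side attribute) are assumptions made here. A set of FDs can have more than one minimal cover, and the one returned depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is (frozenset lhs, single rhs attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: canonical form, one attribute on every right-hand side.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            if (frozenset(lhs), a) not in g:
                g.append((frozenset(lhs), a))
    # Step 2: drop left-hand-side attributes that are not needed.
    for i in range(len(g)):
        for b in sorted(g[i][0]):
            reduced = g[i][0] - {b}
            if reduced and g[i][1] in closure(reduced, g):
                g[i] = (reduced, g[i][1])
    # Step 3: drop FDs that can be inferred from the remaining ones.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] in closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g

E = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
cover = minimal_cover(E)
print(sorted(("".join(sorted(lhs)), a) for lhs, a in cover))  # [('B', 'D'), ('D', 'A')]
```

Step 2 replaces AB→D by B→D because D is already in the closure of {B}, and step 3 removes B→A because it follows from B→D and D→A, which matches the answer worked out by hand.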
➢ Thus, the normalization procedure provides database designers with the following:
i) A formal framework for analyzing relation schemas based on their keys and on the functional
dependencies among their attributes.
ii) A series of normal form tests that can be carried out on individual relation schemas so that the
relational database can be normalized to any desired degree.
➢ The normal form of a relation refers to the highest normal form condition that it meets, and hence
indicates the degree to which it has been normalized.
➢ The process of normalization through decomposition must also confirm the existence of additional
properties that the relational schemas should possess. These would include two properties:
a) The lossless join or nonadditive join property: This guarantees that the spurious tuple generation
problem does not occur with respect to the relation schemas created after decomposition.
b) The dependency preservation property: This ensures that each functional dependency
is represented in some individual relation resulting after decomposition.
➢ The process of storing the join of higher normal form relations as a base relation, which is in a lower
normal form, is known as “denormalization”. This is sometimes done for performance reasons.
4.3.2 Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties stated previously.
• Database design as practiced in industry today pays particular attention to normalization only up to
3NF, BCNF, or at most 4NF.
4.3.3 Definitions of Keys and Attributes Participating in Keys:
a) Definition: A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the
property that no two distinct tuples ‘t1’ and ‘t2’ in any legal relation state ‘r’ of ‘R’ will have t1[S] = t2[S].
b) Definition: A key ‘K’ is a superkey with the additional property that removal of any attribute
from ‘K’ will cause ‘K’ not to be a superkey anymore.
The difference between a key and a superkey is that a key has to be minimal; that is, if we have a key
K = {A1, A2, ..., Ak} of ‘R’, then K – {Ai} is not a key of ‘R’ for any Ai where 1 ≤ i ≤ k.
In the following figure, {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN, ENAME}, {SSN,
ENAME, BDATE}, and any set of attributes that includes SSN are all superkeys.
c) Definition: If a relation schema has more than one key, each is called a candidate key. One of the
candidate keys is arbitrarily designated to be the primary key, and the others are called secondary
keys. Each relation schema must have a primary key.
{SSN} is the only candidate key for EMPLOYEE, so it is also the primary key.
d) Definition: An attribute of relation schema R is called a prime attribute of R if it is a member of
some candidate key of R. An attribute is called nonprime if it is not a prime attribute, that is, if it
is not a member of any candidate key.
In the following figure, both SSN and PNUMBER are prime attributes of WORKS_ON, whereas other
attributes of WORKS_ON are nonprime.
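These definitions can be checked mechanically with attribute closures. The sketch below is a brute-force illustration, assuming WORKS_ON has the attributes SSN, PNUMBER, and HOURS and the single dependency {SSN, PNUMBER} → HOURS; the function names are hypothetical. It enumerates attribute subsets from smallest to largest, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is a superkey but not minimal
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys

WORKS_ON = {"SSN", "PNUMBER", "HOURS"}
F = [({"SSN", "PNUMBER"}, {"HOURS"})]

keys = candidate_keys(WORKS_ON, F)
prime = set().union(*keys)
nonprime = WORKS_ON - prime

print([sorted(k) for k in keys])  # [['PNUMBER', 'SSN']]
print(sorted(prime))              # ['PNUMBER', 'SSN']
print(sorted(nonprime))           # ['HOURS']
```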
FIGURE 4.7: Normalization into 1NF. (a) A relation schema that is not in 1NF.
FIGURE 4.7: Normalization into 1NF. (b) Example state of relation DEPARTMENT without
normalization.
➢ As we can see, the state of relation DEPARTMENT in Figure 4.7 is not in 1NF because DLOCATIONS
is not an atomic attribute.
There are three main techniques to achieve first normal form for such a relation:
a) Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.
i) The primary key of this relation is the combination {DNUMBER, DLOCATION}, as shown in
following Figure.
ii) A distinct tuple in DEPT_LOCATIONS exists for each location of a department.
iii) This decomposes the non-1NF relation into two 1NF relations.
b) Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each
location of a DEPARTMENT, as shown in the following Figure 4.8.
i) In this case, the primary key becomes the combination {DNUMBER,DLOCATION}.
ii) This solution has the disadvantage of introducing redundancy in the relation.
c) If a maximum number of values is known for the attribute (for example, if it is known that at most three
locations can exist for a department), replace the DLOCATIONS attribute by three atomic attributes:
DLOCATION1, DLOCATION2, and DLOCATION3.
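Technique (a) can be sketched as a small transformation. The DEPARTMENT tuples below are hypothetical values used only for illustration, and the function name to_1nf is an assumption; the multivalued attribute DLOCATIONS is modeled as a Python list.

```python
# Hypothetical non-1NF DEPARTMENT state: DLOCATIONS holds a set of values.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": ["Stafford"]},
]

def to_1nf(departments):
    """Split off DLOCATIONS into DEPT_LOCATIONS(DNUMBER, DLOCATION)."""
    dept, dept_locations = [], []
    for d in departments:
        # DEPARTMENT keeps only its atomic attributes.
        dept.append({k: v for k, v in d.items() if k != "DLOCATIONS"})
        # One DEPT_LOCATIONS tuple per location, carrying the primary key DNUMBER.
        for loc in d["DLOCATIONS"]:
            dept_locations.append({"DNUMBER": d["DNUMBER"], "DLOCATION": loc})
    return dept, dept_locations

dept, dept_locations = to_1nf(DEPARTMENT)
print(len(dept), len(dept_locations))  # 2 4
```

Each resulting DEPT_LOCATIONS tuple is identified by the combination {DNUMBER, DLOCATION}, and no department information is repeated, which is why this technique avoids the redundancy of technique (b).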
FIGURE 4.9 Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with
a "nested relation" attribute PROJS. (b) Example extension of the EMP_PROJ relation
showing nested relations within each tuple.
To normalize this into 1NF, we remove the nested relation attributes into a new relation and
propagate the primary key into it; the primary key of the new relation will combine the partial key with
the primary key of the original relation. Decomposition and primary key propagation yield the schemas
EMP_PROJ1 and EMP_PROJ2.
➢ Definition: A relation schema ‘R’ is in 2NF if every nonprime attribute ‘A’ in ‘R’ is fully
functionally dependent on the primary key of ‘R’. Alternatively, a relation schema ‘R’ is in second normal
form (2NF) if every nonprime attribute ‘A’ in ‘R’ is not partially dependent on any key of ‘R’.
➢ The test for 2NF involves testing for functional dependencies whose left-hand side is a primary key
composed of multiple attributes. If the primary key contains a single attribute, the test need not be
applied at all.
➢ The EMP_PROJ relation in the above figure is in 1NF but is not in 2NF.
a) The nonprime attribute ENAME violates 2NF because of FD2. ENAME is partially
dependent on {SSN, PNUMBER} and not dependent on PNUMBER. (ENAME can be determined by
SSN alone, so the other attributes are not needed in that table.)
b) The nonprime attributes PNAME and PLOCATION violate 2NF because of FD3. PNAME and
PLOCATION are partially dependent on {SSN, PNUMBER} and not dependent on SSN.
➢ The functional dependencies FD1, FD2, and FD3 in Figure 4.10 hence lead to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown below, each of which is in 2NF.
Figure 4.11 Relation EMP_DEPT is in 1NF and 2NF but not in 3NF
➢ The relation schema EMP_DEPT in Figure 4.11 is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also
DNAME) on SSN via DNUMBER.
➢ We can normalize EMP_DEPT by decomposing it into the two 3NF relation
schemas ED1 and ED2 shown in the following Figure.
Table 6.1 informally summarizes the three normal forms based on primary keys, the tests used in each
case, and the corresponding "remedy" or normalization performed to achieve the normal form.
TABLE 6.1: SUMMARY OF NORMAL FORMS BASED ON PRIMARY KEYS AND
CORRESPONDING NORMALIZATION
4.4. EXAMPLE:
• Suppose that there are two candidate keys: 1) PROPERTY_ID# and 2) {COUNTY_NAME, LOT#};
that is, lot numbers are unique only within each county, but PROPERTY_ID numbers are unique
across counties for the entire state.
Candidate key
Figure 4.12: The LOTS relation with its functional dependencies FD1 and FD2
• Suppose that the following two additional functional dependencies hold in LOTS:
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2, shown
below:
➢ To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and LOTS1B as
shown in Figure 4.14.
• The relation schema LOTS1A still is in 3NF because COUNTY_NAME is a prime attribute.
➢ The only difference between the definitions of BCNF and 3NF is that condition (b) of 3NF, which
allows A to be prime, is absent from BCNF.
➢ In our example, FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A.
➢ Note that FD5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute (condition b),
but this condition does not exist in the definition of BCNF.
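The BCNF condition can be tested with attribute closures: a nontrivial dependency X → A violates BCNF when X+ does not contain every attribute of the relation. The sketch below is illustrative only. The attribute list of LOTS1A and the exact form of FD1, FD2, and FD5 are assumptions reconstructed from the two candidate keys stated earlier and from the remark that FD5 has AREA on its left-hand side; they are not taken from the figures.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs in fds whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]

# Assumed schema and dependencies for LOTS1A (see the note above).
LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
FD1 = ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"})
FD2 = ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"})
FD5 = ({"AREA"}, {"COUNTY_NAME"})

violations = bcnf_violations(LOTS1A, [FD1, FD2, FD5])
print(violations == [FD5])                         # True
print(sorted(closure({"AREA"}, [FD1, FD2, FD5])))  # ['AREA', 'COUNTY_NAME']
```

Under these assumptions only FD5 is reported, because the closure of {AREA} contains just AREA and COUNTY_NAME, so AREA is not a superkey.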
➢ We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY as shown below.
➢ The relation schema R shown in the following Figure illustrates the general case of a relation being in 3NF
but not in BCNF.
Whenever X →→ Y holds, we say that X multidetermines Y. Because of the symmetry in the definition,
whenever X →→ Y holds in R, so does X →→ Z, where Z = R – (X ∪ Y). Hence, X →→ Y implies
X →→ Z, and therefore it is sometimes written as X →→ Y | Z.
Definition of 4NF: A relation schema R is in 4NF with respect to a set of dependencies F (that includes
functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency
X →→ Y in F+, X is a superkey for R.
In the EMP relation of Figure 4.15(a), the values ‘X’ and ‘Y’ of Pname are repeated with each value of
Dname (or, by symmetry, the values ‘John’ and ‘Anna’ of Dname are repeated with each value of Pname).
In Figure 4.15(c), Sname does not multidetermine Part_name and does not multidetermine Proj_name,
so the relation has no nontrivial MVD. Therefore it is in 4NF.
Example 1: Figure 4.15
Example 2:
Definition of 5NF: A relation schema R is in fifth normal form (5NF) (or project-join normal form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join
dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
Figure 4.16(d) shows how the SUPPLY relation with the join dependency is decomposed into three
relations R1, R2, and R3 that are each in 5NF.
Notice that applying a natural join to any two of these relations produces spurious tuples, but applying a
natural join to all three together does not.
Figure 4.16
Algorithm 4.12(a). Finding a Key K for R Given a Set F of Functional Dependencies:
Input: A relation R and a set of functional dependencies F on the attributes of R.
1. Set K := R.
2. For each attribute A in K
{compute (K – A)+ with respect to F;
if (K – A)+ contains all the attributes in R, then set K := K – {A} };
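The two steps above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, FDs are stored as (left-hand side, right-hand side) pairs of sets, and attributes are visited in alphabetical order. The procedure returns only one key, and a different visiting order may return a different key when the relation has several.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_key(attributes, fds):
    key = set(attributes)                  # 1. Set K := R
    for a in sorted(attributes):           # 2. For each attribute A in K
        if closure(key - {a}, fds) == set(attributes):
            key.discard(a)                 #    K := K - {A}
    return key

# EMP_PROJ with the dependencies used in the closure example earlier.
R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(find_key(R, F)))  # ['PNUMBER', 'SSN']
```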
➢ The nonadditive join property, which ensures that no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition.
➢ Because this is a property of a decomposition of relation schemas, the condition of no spurious tuples
should hold on every legal relation state—that is, every relation state that satisfies the functional
dependencies in F.
➢ Hence, the lossless join property is always defined with respect to a specific set F of dependencies.
Definition: Formally, a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join
property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F,
the following holds, where * is the NATURAL JOIN of all the relations in D: *(πR1(r), ..., πRm(r)) = r.
➢ The decomposition of EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation) into
EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn, Pnumber, Hours, Pname, Plocation) does not
have the nonadditive join property.
➢ Example: decomposition of the TEACH (Instructor, Course, Student) relation into the two
relations {Instructor, Course} and {Instructor, Student}. This is a valid decomposition because it is
nonadditive per the above test.
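For a decomposition into exactly two relations, the standard binary test says that {R1, R2} is nonadditive if the common attributes R1 ∩ R2 functionally determine either R1 – R2 or R2 – R1. The sketch below applies that test; it is an illustration, not a reproduction of the test referred to in these notes. The two TEACH dependencies ({Student, Course} → Instructor and Instructor → Course) are assumed here, since they are not listed in this section, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_nonadditive_binary(r1, r2, fds):
    """Binary test: (R1 n R2) -> (R1 - R2) or (R1 n R2) -> (R2 - R1) must follow from fds."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# Assumed dependencies on TEACH (not stated in this section).
TEACH_FDS = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print(is_nonadditive_binary({"Instructor", "Course"}, {"Instructor", "Student"}, TEACH_FDS))  # True

# The EMP_PROJ decomposition into EMP_LOCS and EMP_PROJ1 discussed above.
EMP_FDS = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]
emp_locs = {"Ename", "Plocation"}
emp_proj1 = {"Ssn", "Pnumber", "Hours", "Pname", "Plocation"}
print(is_nonadditive_binary(emp_locs, emp_proj1, EMP_FDS))  # False
```

In the first case the common attribute Instructor determines Course, so the join is nonadditive. In the second case the only common attribute is Plocation, which determines nothing else, which is consistent with the spurious tuples seen for EMP_LOCS and EMP_PROJ1.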
By applying Algorithm 16.4 to the above Minimal cover G, we get a 3NF design consisting of two
relations with keys Emp_ssn and Pno as follows:
4.9.3 Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas
We know that it is not possible to have all three of the following: (1) guaranteed nonlossy design,
(2) guaranteed dependency preservation, and (3) all relations in BCNF. Now we give an alternative algorithm
where we achieve conditions 1 and 2 and only guarantee 3NF. A simple modification to Algorithm 16.4,
shown as Algorithm 16.6, yields a decomposition D of R that does the following:
• Preserves dependencies
• Has the nonadditive join property
• Is such that each resulting relation schema in the decomposition is in 3NF
Because the Algorithm 16.6 achieves both the desirable properties, rather than only functional dependency
preservation as guaranteed by Algorithm 16.4, it is preferred over Algorithm 16.4.
Algorithm 16.6. Relational Synthesis into 3NF with Dependency Preservation and Nonadditive Join Property
Input: A universal relation R and a set of functional dependencies F on the attributes of R
1. Find a minimal cover G for F (use Algorithm 16.2).
2. For each left-hand-side X of a functional dependency that appears in G, create a relation schema in D with
attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak}}, where X → A1, X → A2, ..., X → Ak are the only
dependencies in G with X as left-hand-side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create one more relation schema in D that
contains attributes that form a key of R. (Algorithm 16.2(a) may be used to find a key.)
4. Eliminate redundant relations from the resulting set of relations in the relational database schema. A
relation R is considered redundant if R is a projection of another relation S in the schema; alternately, R is
subsumed by S.
Step 3 of Algorithm 16.6 involves identifying a key K of R. Algorithm 16.2(a) can be used to identify
a key K of R based on the set of given functional dependencies F.
Example 1 of Algorithm 16.6. Let us revisit the example given earlier at the end of Algorithm 16.4. The
minimal cover G holds as before. The second step produces relations R1 and R2 as before. However, now in
step 3, we will generate a relation corresponding to the key {Emp_ssn, Pno}. Hence, the resulting design
contains:
R1 (Emp_ssn, Esal, Ephone, Dno)
R2 (Pno, Pname, Plocation)
R3 (Emp_ssn, Pno)
This design achieves both the desirable properties of dependency preservation and nonadditive join.
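Steps 2 to 4 of Algorithm 16.6 can be sketched in code. This is a simplified illustration: it assumes that a minimal cover G has already been computed (step 1 is not implemented), the function names are hypothetical, and the minimal cover used below is an assumption reconstructed from the relations R1 and R2 of this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(attributes, fds):
    """Shrink the full attribute set to one key by dropping removable attributes."""
    key = set(attributes)
    for a in sorted(attributes):
        if closure(key - {a}, fds) == set(attributes):
            key.discard(a)
    return key

def synthesize_3nf(attributes, g):
    # Step 2: one schema per distinct left-hand side X, holding X and all A with X -> A in G.
    schemas = {}
    for lhs, rhs in g:
        schemas.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    d = list(schemas.values())
    # Step 3: add a key schema if no existing schema is a superkey of R.
    if not any(closure(s, g) == set(attributes) for s in d):
        d.append(find_key(attributes, g))
    # Step 4: drop duplicate schemas and schemas subsumed by another schema.
    unique = []
    for s in d:
        if s not in unique:
            unique.append(s)
    return [s for s in unique if not any(s < t for t in unique)]

# Assumed minimal cover G, reconstructed from R1 and R2 in the example.
U = {"Emp_ssn", "Pno", "Esal", "Ephone", "Dno", "Pname", "Plocation"}
G = [({"Emp_ssn"}, {"Esal", "Ephone", "Dno"}), ({"Pno"}, {"Pname", "Plocation"})]

for schema in synthesize_3nf(U, G):
    print(sorted(schema))
# ['Dno', 'Emp_ssn', 'Ephone', 'Esal']
# ['Plocation', 'Pname', 'Pno']
# ['Emp_ssn', 'Pno']
```

The third schema is added in step 3 because neither of the first two contains a key of the universal relation, which reproduces R1, R2, and R3 from the example.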
➢ In general, whenever a relational database schema is designed in which two or more relations are
interrelated via foreign keys, particular care must be devoted to watching for potential NULL values
in foreign keys. This can cause unexpected loss of information in queries that involve joins on that
foreign key. Moreover, if NULLs occur in other attributes, such as Salary, their effect on built-in
functions such as SUM and AVERAGE must be carefully evaluated.
➢ A related problem is that of dangling tuples, which may occur if we carry a decomposition too far.
Suppose that we decompose the EMPLOYEE relation in Figure 16.2(a) further into EMPLOYEE_1 and
EMPLOYEE_2, shown in Figure 16.3(a) and 16.3(b). If we apply the NATURAL JOIN operation to
EMPLOYEE_1 and EMPLOYEE_2, we get the original EMPLOYEE relation. However, we may use
the alternative representation, shown in Figure 16.3(c), where we do not include a tuple in
EMPLOYEE_3 if the employee has not been assigned a department (instead of including a tuple with
NULL for Dnum as in EMPLOYEE_2).
If we use EMPLOYEE_3 instead of EMPLOYEE_2 and apply a NATURAL JOIN on EMPLOYEE_1
and EMPLOYEE_3, the tuples for Berger and Benitez will not appear in the result; these are called
dangling tuples in EMPLOYEE_1 because they are represented in only one of the two relations that
represent employees, and hence are lost if we apply an (INNER) JOIN operation.
Figure 16.2 (a)
4.10 OTHER DEPENDENCIES AND NORMAL FORMS
Other types of dependencies:
For example, we can specify the following inclusion dependencies on the relational schema, which
represent referential integrity constraints:
DEPARTMENT.Dmgr_ssn < EMPLOYEE.Ssn
WORKS_ON.Ssn < EMPLOYEE.Ssn
EMPLOYEE.Dnumber < DEPARTMENT.Dnumber
PROJECT.Dnum < DEPARTMENT.Dnumber
WORKS_ON.Pnumber < PROJECT.Pnumber
DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber
We can also use inclusion dependencies to represent class/subclass relationships. For example, we
can specify the following inclusion dependencies:
EMPLOYEE.Ssn < PERSON.Ssn
STUDENT.Ssn < PERSON.Ssn
As with other types of dependencies, there are inclusion dependency inference rules (IDIRs). The
following are three examples:
IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y, where X = {A1, A2, ..., An} and Y =
{B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.
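An inclusion dependency R.X < S.Y can be checked on relation states by comparing projections: every X-value that occurs in r must also occur as a Y-value in s. The following is a minimal sketch; the function name inclusion_holds and the sample tuples are hypothetical and only illustrate the check.

```python
def inclusion_holds(r, x, s, y):
    """Return True if the projection of r on x is contained in the projection of s on y."""
    s_values = {tuple(row[a] for a in y) for row in s}
    return all(tuple(row[a] for a in x) in s_values for row in r)

# Hypothetical states (illustrative values only).
EMPLOYEE = [
    {"Ssn": "123456789", "Dnumber": 5},
    {"Ssn": "333445555", "Dnumber": 5},
]
DEPARTMENT = [
    {"Dnumber": 5, "Dmgr_ssn": "333445555"},
]
BAD_DEPARTMENT = [
    {"Dnumber": 4, "Dmgr_ssn": "999999999"},  # manager is not an existing employee
]

print(inclusion_holds(DEPARTMENT, ["Dmgr_ssn"], EMPLOYEE, ["Ssn"]))      # True
print(inclusion_holds(EMPLOYEE, ["Dnumber"], DEPARTMENT, ["Dnumber"]))   # True
print(inclusion_holds(BAD_DEPARTMENT, ["Dmgr_ssn"], EMPLOYEE, ["Ssn"]))  # False
```

The failing case corresponds to a referential integrity violation: a foreign key value in one relation has no matching value in the referenced relation.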
Hence, there is a unique value for Extended_price for every pair (Quantity, Unit_price ),
and thus it conforms to the definition of functional dependency.
➢ Therefore, we can say
(Item#, Quantity, Unit_price ) → Discounted_price, or
(Item#, Quantity, Extended_price) → Discounted_price.