Chapter 5

Functional Dependency and Normalization


This chapter discusses some of the theory that has been developed to
evaluate relational schemas for design quality, that is, to measure
formally why one grouping of attributes into relation schemas is better
than another.
There are two levels at which we can discuss the "goodness" of
relation schemas.
1. logical (or conceptual) level
• how users interpret the relation schemas and the meaning of their
attributes
• Having good relation schemas at this level enables users to understand
clearly the meaning of the data in the relations, and hence to formulate
their queries correctly
2. implementation (or storage) level
• how the tuples in a base relation are stored and updated
Database design may be performed using two approaches:
A) Bottom-up
• A bottom-up design methodology (also called design by synthesis)
considers the basic relationships among individual attributes as the
starting point and uses those to construct relation schemas.
B) Top-down
• A top-down design methodology starts with a number of groupings of
attributes into relations that exist together naturally, for example, on
an invoice, a form, or a report.
• The relations are then analyzed individually and collectively, leading
to further decomposition until all desirable properties are met.
INFORMAL DESIGN GUIDELINES FOR RELATION SCHEMAS

 Four informal measures of quality for relation schema design


 Semantics of the attributes
 Reducing the redundant values in tuples
 Reducing the null values in tuples
 Disallowing the possibility of generating spurious tuples
 Whenever we group attributes to form a relation schema, we assume
that attributes belonging to one relation have certain real-world
meaning and a proper interpretation associated with them
 This meaning, or semantics, specifies how to interpret the attribute
values stored in a tuple of the relation-in other words, how the
attribute values in a tuple relate to one another.
 If the conceptual design is done carefully, followed by a systematic
mapping into relations, most of the semantics will have been
accounted for and the resulting design should have a clear meaning.
Example
• The meaning of the EMPLOYEE relation schema is quite simple: each
tuple represents an employee, with values for the employee's name
(ENAME), social security number (SSN), birth date (BDATE), and
address (ADDRESS), and the number of the department that the
employee works for (DNUMBER).
 The DNUMBER attribute is a foreign key that represents an implicit
relationship between EMPLOYEE and DEPARTMENT.
 The semantics of the DEPARTMENT and PROJECT schemas are also
straightforward: Each DEPARTMENT tuple represents a department
entity, and each PROJECT tuple represents a project entity.
 The attribute DMGRSSN of DEPARTMENT relates a department to
the employee who is its manager, while DNUM of PROJECT relates a
project to its controlling department; both are foreign key attributes.
• The ease with which the meaning of a relation's attributes can be
explained is an informal measure of how well the relation is designed.
GUIDELINE 1
• Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and
relationship types into a single relation.
• Each tuple in the EMP_DEPT relation of Figure 10.3a represents a
single employee but includes additional information, namely the name
(DNAME) of the department for which the employee works and the
social security number (DMGRSSN) of the department manager.
• For the EMP_PROJ relation of Figure 10.3b, each tuple relates an
employee to a project but also includes the employee name
(ENAME), project name (PNAME), and project location
(PLOCATION).
• Although there is nothing wrong logically with these two
relations, they are considered poor designs because they violate
Guideline 1 by mixing attributes from distinct real-world
entities: EMP_DEPT mixes attributes of employees and departments,
and EMP_PROJ mixes attributes of employees and projects.
Redundant Information in Tuples and Update Anomalies
 One goal of schema design is to minimize the storage space used by the base
relations
 Grouping attributes into relation schemas has a significant effect on storage
space.
• For example, compare the space used by the two base relations
EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base
relation.
• In EMP_DEPT, the attribute values pertaining to a particular department
(DNUMBER, DNAME, DMGRSSN) are repeated for every employee
who works for that department.
• In contrast, in the normalized design, each department's information
appears only once in the DEPARTMENT relation.
• Only the department number (DNUMBER) is repeated in the
EMPLOYEE relation for each employee who works in that department.
• Similar comments apply to the EMP_PROJ relation, which augments the
WORKS_ON relation with additional attributes from EMPLOYEE and
PROJECT.
• Another serious problem with using the relations in Figure 10.4 as base
relations is the problem of update anomalies.
 These can be classified into insertion anomalies, deletion anomalies, and
modification anomalies.
Insertion Anomalies.
 Insertion anomalies can be differentiated into two types, illustrated by
the following examples based on the EMP_DEPT relation
 To insert a new employee tuple into EMP_DEPT, we must include either
the attribute values for the department that the employee works for, or
nulls (if the employee does not work for a department as yet).
 For example, to insert a new tuple for an employee who works in
department number 5, we must enter the attribute values of department
5 correctly so that they are consistent with values for department 5 in
other tuples in EMP_DEPT.
• In the normalized design we do not have to worry about this consistency
problem because we enter only the department number in the employee
tuple; all other attribute values of department 5 are recorded only once in
the database, as a single tuple in the DEPARTMENT relation.
• It is difficult to insert a new department that has no
employees as yet in the EMP_DEPT relation.
• The only way to do this is to place null values in the
attributes for employee.
• This causes a problem because SSN is the primary key of
EMP_DEPT, and each tuple is supposed to represent an
employee entity-not a department entity.
• This problem does not occur in the design of Figure 10.2,
because a department is entered in the DEPARTMENT
relation whether or not any employees work for it, and
whenever an employee is assigned to that department, a
corresponding tuple is inserted in EMPLOYEE
Deletion Anomalies:
 The problem of deletion anomalies is related to the second insertion
anomaly situation discussed earlier.
 If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the
information concerning that department is lost from the database.
 This problem does not occur in the database of Figure 10.2 because
DEPARTMENT tuples are stored separately.
Modification Anomalies:
 In EMP_DEPT, if we change the value of one of the attributes of a
particular department-say, the manager of department 5-we must
update the tuples of all employees who work in that department;
otherwise, the database will become inconsistent.
• If we fail to update some tuples, the same department will be shown
to have two different values for manager in different employee
tuples, which would be wrong.
• Based on the preceding three anomalies, we can state the following guideline.
GUIDELINE 2
 Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any
anomalies are present, note them clearly and make sure that the
programs that update the database will operate correctly.
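The modification and deletion anomalies described above can be reproduced on a tiny in-memory copy of EMP_DEPT. The following is only a minimal sketch: the SSN, name, and manager values are made up for illustration, and `managers_per_department` is a hypothetical helper, not something defined in these slides.

```python
# Hypothetical EMP_DEPT state; all SSN, name and manager values are illustrative only.
emp_dept = [
    {"SSN": "111", "ENAME": "Alice", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "900"},
    {"SSN": "222", "ENAME": "Bob", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "900"},
    {"SSN": "333", "ENAME": "Carol", "DNUMBER": 4, "DNAME": "Admin", "DMGRSSN": "901"},
]


def managers_per_department(rows):
    """Map each DNUMBER to the set of DMGRSSN values recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["DNUMBER"], set()).add(row["DMGRSSN"])
    return result


# Modification anomaly: the manager of department 5 is changed in one tuple only,
# so department 5 now appears with two different managers.
emp_dept[0]["DMGRSSN"] = "777"
inconsistent = {
    dnumber: managers
    for dnumber, managers in managers_per_department(emp_dept).items()
    if len(managers) > 1
}

# Deletion anomaly: deleting the last employee of department 4 (SSN 333)
# also removes every trace of department 4 from the relation.
remaining = [row for row in emp_dept if row["SSN"] != "333"]
departments_left = {row["DNUMBER"] for row in remaining}
```

Here `inconsistent` lists department 5 with two manager values, and `departments_left` no longer contains department 4. With separate EMPLOYEE and DEPARTMENT relations, neither situation could arise, because each department is stored exactly once.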
Null Values in Tuples
 In some schema designs we may group many attributes together
into a "fat" relation.
 If many of the attributes do not apply to all tuples in the relation,
we end up with many nulls in those tuples.
 This can waste space at the storage level and may also lead to
problems with understanding the meaning of the attributes and
with specifying JOIN operations at the logical level.
 Another problem with nulls is how to account for them when
aggregate operations such as COUNT or SUM are applied.
 Nulls can have multiple interpretations, such as the following:
• The attribute does not apply to this tuple.
• The attribute value for this tuple is unknown.
• The value is known but absent; that is, it has not been recorded yet.
 Having the same representation for all nulls compromises the
different meanings they may have. Therefore, we may state
another guideline.
GUIDELINE 3
 As far as possible, avoid placing attributes in a base relation
whose values may frequently be null. If nulls are unavoidable,
make sure that they apply in exceptional cases only and do not
apply to a majority of tuples in the relation.
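To illustrate how nulls affect aggregate operations such as COUNT and SUM, the sketch below uses Python's built-in `sqlite3` module. The table `emp` and its column `office_number` are hypothetical and chosen only for this example; the behaviour shown is that of SQLite's aggregates.

```python
import sqlite3

# Hypothetical table: two of the three employees have no office number recorded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT PRIMARY KEY, office_number INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("111", 10), ("222", None), ("333", None)],
)

# COUNT(*) counts tuples; COUNT(column), SUM and AVG skip null values.
count_all, count_office, sum_office, avg_office = conn.execute(
    "SELECT COUNT(*), COUNT(office_number), SUM(office_number), AVG(office_number) FROM emp"
).fetchone()
conn.close()
```

The two counts differ (3 tuples, but only 1 non-null office number), and the average is computed over the single non-null value. A reader of the result cannot tell whether the missing values were inapplicable, unknown, or simply not recorded yet.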
FUNCTIONAL DEPENDENCIES

• A functional dependency is a constraint between two sets of
attributes from the database.
• Suppose that our relational database schema has n attributes
A1, A2, ..., An, collected in a single relation schema R = {A1, A2, ..., An}.
• Definition: A functional dependency, denoted by X → Y, between
two sets of attributes X and Y that are subsets of R specifies a
constraint on the possible tuples that can form a relation state r
of R.
 This means that the values of the Y component of a tuple in r
depend on, or are determined by, the values of the X component;
alternatively, the values of the X component of a tuple uniquely
(or functionally) determine the values of the Y component.
•We also say that there is a functional dependency from X to Y, or that
Y is functionally dependent on X.
•The abbreviation for functional dependency is FD or f.d. The set of
attributes X is called the left-hand side of the FD, and Y is called the
right-hand side.
•Thus, X functionally determines Y in a relation schema R if, and only
if, whenever two tuples of r(R) agree on their X-value, they must
necessarily agree on their Y-value.
Note the following:
• If a constraint on R states that there cannot be more than one tuple
with a given X value in any relation instance r(R), that is, X is a
candidate key of R, this implies that X → Y for any subset of attributes
Y of R (because the key constraint implies that no two tuples in any
legal state r(R) will have the same value of X).
• If X → Y in R, this does not say whether or not Y → X in R.
 A functional dependency is a property of the semantics or meaning
of the attributes
 Consider the relation schema EMP_PROJ.
 From the semantics of the attributes, we know that the following
functional dependencies should hold
a. SSN → ENAME
b. PNUMBER → {PNAME, PLOCATION}
c. {SSN, PNUMBER} → HOURS
 These functional dependencies specify that
 (a) the value of an employee's social security number (SSN)
uniquely determines the employee name (ENAME),
 (b) the value of a project's number (PNUMBER) uniquely
determines the project name (PNAME) and location
(PLOCATION), and
• (c) a combination of SSN and PNUMBER values uniquely
determines the number of hours the employee currently works on
the project (HOURS).
Example: a dependency such as TEACHER → COURSE may appear to hold in
one state of the TEACH relation, but we cannot confirm it unless we
know that it is true for all possible legal states of TEACH.
 It is, however, sufficient to demonstrate a single counterexample
to disprove a functional dependency.
 For example, because 'Smith' teaches both 'Data Structures' and
'Data Management', we can conclude that TEACHER does not
functionally determine COURSE.
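The counterexample argument can be checked mechanically. The sketch below tests whether a dependency holds in one given relation state. `fd_holds` is a hypothetical helper name, and the 'Hall' row is made-up filler; only the two 'Smith' rows come from the text above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular relation state."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures"},
    {"TEACHER": "Smith", "COURSE": "Data Management"},
    {"TEACHER": "Hall", "COURSE": "Compilers"},  # illustrative row, not from the text
]

teacher_determines_course = fd_holds(teach, ["TEACHER"], ["COURSE"])
course_determines_teacher = fd_holds(teach, ["COURSE"], ["TEACHER"])
```

`teacher_determines_course` is False because of the two 'Smith' tuples, which is enough to disprove the dependency. `course_determines_teacher` is True for this state, but that only shows the dependency is not violated here; it does not prove that it holds for all legal states.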
 Figure 10.3 introduces a diagrammatic notation for displaying FDs:
Each FD is displayed as a horizontal line.
 The left-hand-side attributes of the FD are connected by vertical
lines to the line representing the FD, while the right-hand-side
attributes are connected by arrows pointing toward the attributes,
as shown in Figures 10.3a and 10.3b.
Inference Rules for Functional Dependencies
 We denote by F the set of functional dependencies that are
specified on relation schema R.
 Typically, the schema designer specifies the functional
dependencies that are semantically obvious;
• However, numerous other functional dependencies hold in all legal relation
instances that satisfy the dependencies in F.
• Those other dependencies can be inferred or deduced from the
FDs in F.
• For example, if each department has one manager, so that
DEPT_NO uniquely determines MGR_SSN (DEPT_NO → MGR_SSN),
and a manager has a unique phone number called
MGR_PHONE (MGR_SSN → MGR_PHONE), then these two
dependencies together imply that DEPT_NO → MGR_PHONE.
• This is an inferred FD and need not be explicitly stated in addition
to the two given FDs.
Definition: Formally, the set of all dependencies that include F as well
as all dependencies that can be inferred from F is called the closure
of F; it is denoted by F+.
• We use the notation F ⊨ X → Y to denote that the functional
dependency X → Y is inferred from the set of functional
dependencies F.
•For example, suppose that we specify the following set F of obvious
functional dependencies on the relation schema of Figure 10.3a:
F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER → {DNAME, DMGRSSN}}
Some of the additional functional dependencies that we can infer
from F are the following:
SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME
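One common way to check whether a dependency X → Y can be inferred from F is to compute the set of all attributes determined by X (often called the attribute closure of X) and test whether Y is contained in it. The slides do not describe this procedure, so the following is only a sketch under that assumption; `closure` and `implies` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


# The set F given above for the relation schema of Figure 10.3a.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

ssn_to_dept = implies(F, {"SSN"}, {"DNAME", "DMGRSSN"})
ssn_to_ssn = implies(F, {"SSN"}, {"SSN"})
dnumber_to_dname = implies(F, {"DNUMBER"}, {"DNAME"})
dnumber_to_ssn = implies(F, {"DNUMBER"}, {"SSN"})
```

The first three checks succeed, matching the three inferred dependencies listed above, while DNUMBER → SSN is not implied by F.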
The following six rules IR1 through IR6 are well-known inference
rules for functional dependencies:
IR1 (reflexive rule): If X ⊇ Y, then X → Y.
IR2 (augmentation rule): {X → Y} ⊨ XZ → YZ.
IR3 (transitive rule): {X → Y, Y → Z} ⊨ X → Z.
IR4 (decomposition, or projective, rule): {X → YZ} ⊨ X → Y.
IR5 (union, or additive, rule): {X → Y, X → Z} ⊨ X → YZ.
IR6 (pseudotransitive rule): {X → Y, WY → Z} ⊨ WX → Z.

READING ASSIGNMENT: PROVE ALL OF THEM


Normalization of Relations
• The normalization process was first proposed by Codd (1972a).
• The normalization process, which proceeds in a top-down fashion by
evaluating each relation against the criteria for normal forms and
decomposing relations as necessary, is considered relational
design by analysis.
• Normalization of data can be looked upon as a process of analyzing
the given relation schemas based on their FDs and primary keys to
achieve the desirable properties of:
1. minimizing redundancy
2. minimizing the insertion, deletion, and update anomalies
– Unsatisfactory relation schemas that do not meet certain conditions (the
normal form tests) are decomposed into smaller relation schemas that meet
the tests and hence possess the desirable properties.
Definitions of Keys and Attributes Participating in Keys
• Definition: A superkey of a relation schema R = {A1, A2, ..., An} is a
set of attributes S ⊆ R with the property that no two distinct tuples t1 and
t2 in any legal relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey any more.
• The difference between a key and a superkey is that a key has to be
minimal; that is, if we have a key K = {A1, A2, ..., Ak} of R, then
K - {Ai} is not a key of R for any Ai, 1 ≤ i ≤ k.
• Ex: {SSN} is a key for EMPLOYEE, whereas {SSN, ENAME}, {SSN,
ENAME, BDATE}, and any set of attributes that includes SSN are all
superkeys.
• If a relation schema has more than one key, each is called a
candidate key.
• One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys
• Definition. An attribute of relation schema R is called a prime
attribute of R if it is a member of some candidate key of R.
• An attribute is called nonprime if it is not a prime attribute-that is,
if it is not a member of any candidate key.
• Ex: both SSN and PNUMBER are prime attributes of WORKS_ON,
whereas other attributes of WORKS_ON are nonprime.
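The definitions of superkey, key, and prime attribute can be turned into a small brute-force search. This is a sketch only: it assumes that the functional dependencies of each relation are known, it uses the attribute-closure technique (not covered in these slides) to test the superkey property, and the helper names `closure`, `is_superkey`, and `candidate_keys` are hypothetical. The WORKS_ON schema is assumed to be {SSN, PNUMBER, HOURS}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) >= set(relation)


def candidate_keys(relation, fds):
    """Enumerate minimal superkeys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            # Skip any superkey that contains an already found (smaller) key.
            if is_superkey(attrs, relation, fds) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


EMPLOYEE = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"}
employee_fds = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"})]

WORKS_ON = {"SSN", "PNUMBER", "HOURS"}  # assumed schema
works_on_fds = [({"SSN", "PNUMBER"}, {"HOURS"})]

employee_keys = candidate_keys(EMPLOYEE, employee_fds)
works_on_keys = candidate_keys(WORKS_ON, works_on_fds)
works_on_prime = set().union(*works_on_keys)
```

Under these assumptions the search finds {SSN} as the only key of EMPLOYEE and {SSN, PNUMBER} as the only key of WORKS_ON, so SSN and PNUMBER are prime and HOURS is nonprime. The enumeration is exponential in the number of attributes and is meant for small classroom examples only.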
First Normal Form
• First normal form (1NF) is now considered to be part of the formal
definition of a relation in the basic (flat) relational model.
• Historically, it was defined to disallow multivalued attributes,
composite attributes, and their combinations.
• It states that the domain of an attribute must include only atomic
(simple, indivisible) values and that the value of any attribute in a
tuple must be a single value from the domain of that attribute.
• Hence, 1NF disallows having a set of values, a tuple of values, or a
combination of both as an attribute value for a single tuple.
• In other words, 1NF disallows "relations within relations" or
"relations as attribute values within tuples." The only attribute
values permitted by 1NF are single atomic (or indivisible) values.
• Consider the DEPARTMENT relation schema shown in Figure 10.1,
whose primary key is DNUMBER, and suppose that we extend it by
including the DLOCATIONS attribute as shown in Figure 10.8a. We
assume that each department can have a number of locations
• As can be seen, this is not in 1NF because DLOCATIONS is not an
atomic attribute, as illustrated by the first tuple in Figure 10.8b.
There are two ways we can look at the DLOCATIONS attribute:
– The domain of DLOCATIONS contains atomic values, but some tuples can
have a set of these values. In this case, DLOCATIONS is not functionally
dependent on the primary key DNUMBER.
– The domain of DLOCATIONS contains sets of values and hence is nonatomic.
In this case, DNUMBER → DLOCATIONS, because each set is considered a
single member of the attribute domain.
• In either case, the DEPARTMENT relation of Figure 10.8 is not in
1NF.
Techniques to achieve first normal form for such a relation:
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a
separate relation DEPT_LOCATIONS along with the primary key DNUMBER
of DEPARTMENT. The primary key of this relation is the combination
{DNUMBER, DLOCATION}, as shown in Figure 10.2. A distinct tuple in
DEPT_LOCATIONS exists for each location of a department. This
decomposes the non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in
Figure 10.8c. In this case, the primary key becomes the combination
{DNUMBER, DLOCATION}.
This solution has the disadvantage of introducing redundancy in the
relation
3. If a maximum number of values is known for the attribute (for example, if it
is known that at most three locations can exist for a department), replace
the DLOCATIONS attribute by three atomic attributes: DLOCATION1,
DLOCATION2, and DLOCATION3. This solution has the disadvantage of
introducing null values if most departments have fewer than three
locations.
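The first technique can be sketched on a small in-memory version of the non-1NF DEPARTMENT relation. The department names, manager SSNs, and locations below are illustrative sample values only, not data taken from the figures.

```python
# Non-1NF DEPARTMENT state: DLOCATIONS holds a set of values (sample data only).
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "900",
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "901",
     "DLOCATIONS": {"Stafford"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRSSN": "902",
     "DLOCATIONS": {"Houston"}},
]

# DEPARTMENT without the multivalued attribute; DNUMBER remains the primary key.
department_1nf = [
    {attr: value for attr, value in row.items() if attr != "DLOCATIONS"}
    for row in department
]

# DEPT_LOCATIONS: one (DNUMBER, DLOCATION) tuple per location of a department.
dept_locations = sorted(
    (row["DNUMBER"], location)
    for row in department
    for location in row["DLOCATIONS"]
)
```

Each `(DNUMBER, DLOCATION)` pair appears exactly once in `dept_locations`, which is why the combination can serve as that relation's primary key, and the department attributes are not repeated per location as they would be with the second technique.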
Second Normal Form
• Second normal form (2NF) is based on the concept of full
functional dependency.
• A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency
does not hold any more; that is, for any attribute A ∈ X, (X - {A})
does not functionally determine Y.
• A functional dependency X → Y is a partial dependency if some
attribute A ∈ X can be removed from X and the dependency still
holds; that is, for some A ∈ X, (X - {A}) → Y.
• For example, in Figure 10.3b, {SSN, PNUMBER} → HOURS is a full
dependency (neither SSN → HOURS nor PNUMBER → HOURS
holds).
• However, the dependency {SSN, PNUMBER} → ENAME is partial
because SSN → ENAME holds.
• Definition. A relation schema R is in 2NF if every nonprime
attribute A in R is fully functionally dependent on the primary key
of R.
• The test for 2NF involves testing for functional dependencies
whose left-hand side attributes are part of the primary key.
• If the primary key contains a single attribute, the test need not be
applied at all
• The EMP_PROJ relation in Figure 10.3b is in 1NF but is not in 2NF.
The nonprime attribute ENAME violates 2NF because of FD2, as do
the nonprime attributes PNAME and PLOCATION because of FD3.
• The functional dependencies FD2 and FD3 make ENAME, PNAME,
and PLOCATION partially dependent on the primary key {SSN,
PNUMBER} of EMP_PROJ, thus violating the 2NF test.
• If a relation schema is not in 2NF, it can be "second normalized" or
"2NF normalized" into a number of 2NF relations in which nonprime
attributes are associated only with the part of the primary key on
which they are fully functionally dependent.
• The functional dependencies FD1, FD2, and FD3 in Figure 10.3b
hence lead to the decomposition of EMP_PROJ into the three
relation schemas EP1, EP2, and EP3 shown.
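The 2NF test can be sketched as a search for nonprime attributes that are determined by a proper subset of the primary key. The code below is a minimal illustration: it relies on the attribute-closure technique (not covered in these slides), `closure` and `partial_dependencies` are hypothetical helper names, and the attribute sets assumed for EP1, EP2, and EP3 follow from FD1, FD2, and FD3 respectively.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(primary_key, nonprime, fds):
    """Map each nonprime attribute to the proper key subsets that determine it."""
    violations = {}
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for attr in sorted(closure(subset, fds) & set(nonprime)):
                violations.setdefault(attr, []).append(set(subset))
    return violations


FD1 = ({"SSN", "PNUMBER"}, {"HOURS"})
FD2 = ({"SSN"}, {"ENAME"})
FD3 = ({"PNUMBER"}, {"PNAME", "PLOCATION"})

# EMP_PROJ with primary key {SSN, PNUMBER}.
emp_proj = partial_dependencies(
    {"SSN", "PNUMBER"}, {"HOURS", "ENAME", "PNAME", "PLOCATION"}, [FD1, FD2, FD3]
)

# The three relations of the decomposition, each checked on its own.
ep1 = partial_dependencies({"SSN", "PNUMBER"}, {"HOURS"}, [FD1])
ep2 = partial_dependencies({"SSN"}, {"ENAME"}, [FD2])
ep3 = partial_dependencies({"PNUMBER"}, {"PNAME", "PLOCATION"}, [FD3])
```

For EMP_PROJ the result reports ENAME as dependent on {SSN} alone and PNAME and PLOCATION as dependent on {PNUMBER} alone, while HOURS does not appear. The results for EP1, EP2, and EP3 are empty, so each passes the 2NF test. EP2 and EP3 have single-attribute keys, so the test is trivially satisfied there.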
Third Normal form
• Reading assignment
