Functional Dependencies and Normalization for Relational Databases


406.426 Design & Analysis of Database Systems

Jonghun Park
Dept. of Industrial Engineering
Seoul National University
outline
 informal design guidelines for relational databases
 functional dependencies (FDs)
 normal forms based on primary keys
 general normal form definitions (for multiple keys)
 BCNF (Boyce-Codd Normal Form)

informal measures of quality for relation schema
 semantics of the attributes
 reducing the redundant values in tuples
 reducing the null values in tuples
 disallowing the possibility of generating spurious tuples

semantics of the relation attributes
 guideline 1: Design a relation schema so that it is easy to explain its
meaning. Do not combine attributes from multiple entity types and
relationship types into a single relation. If a relation schema
corresponds to one entity type or one relationship type, it is
straightforward to explain its meaning.
 examples of poor design

redundant information in tuples & update anomalies
 one goal of schema design is to minimize the storage space
 example:

update anomalies
 insertion anomalies
 to insert a new employee tuple into EMP_DEPT, we must include either the
attribute values for the department that the employee works for, or nulls
 it is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation
 deletion anomalies
 if we delete from EMP_DEPT an employee tuple that happens to represent the
last employee working for a particular department, the information
concerning that department is lost
 modification anomalies
 in EMP_DEPT, if we change the value of one of the attributes of a
particular department, we must update the tuples of all employees who work
in that department
 guideline 2: design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations

null values in tuples
 grouping many attributes together into a fat relation -> if many of the
attributes do not apply to all tuples in the relation, we end up with
many nulls in those tuples
 example
 if only 10% of employees have individual offices, there is little
justification for including an attribute OFFICE_NUMBER in the
EMPLOYEE relation -> A relation EMP_OFFICES(ESSN,
OFFICE_NUMBER) can be created
 guideline 3: as far as possible, avoid placing attributes in a base
relation whose values may frequently be null

generation of spurious tuples
 example: consider EMP_LOCS and EMP_PROJ1 instead of
EMP_PROJ
 EMP_LOCS: the employee whose name is ENAME works on some
project whose location is PLOCATION

generation of spurious tuples (cont.)
 decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable
because, when we JOIN them back using NATURAL JOIN, we do not get the
correct original information
 PLOCATION is the attribute that relates EMP_LOCS and EMP_PROJ1, and
PLOCATION is neither a primary key nor a foreign key in either
EMP_LOCS or EMP_PROJ1
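
The effect can be reproduced on a tiny relation state. The sketch below is only an illustration: the two EMP_PROJ tuples are hypothetical sample data, and `project` and `natural_join` are helper names introduced here, not part of the slides.

```python
def project(rows, attrs):
    """Projection with duplicate elimination (a relation is a set of tuples)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attribute names."""
    result = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                result.append({**a, **b})
    return result


# Hypothetical EMP_PROJ state: two employees, two projects, one shared location.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 10.0, "ENAME": "Wong",
     "PNAME": "ProductY", "PLOCATION": "Bellaire"},
]

emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

# The only shared attribute is PLOCATION, so every employee at a location
# is paired with every project at that location.
joined = natural_join(emp_locs, emp_proj1)
spurious = [t for t in joined if t not in emp_proj]
print(len(joined), len(spurious))  # prints: 4 2
```

The join returns both original tuples plus two tuples that were never in EMP_PROJ, for example Smith paired with project 2.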

generation of spurious tuples (cont.)
 guideline 4: design relation schemas so that they can be joined with
equality conditions on attributes that are either primary keys or
foreign keys in a way that guarantees that no spurious tuples are
generated

definition
 a functional dependency (FD), denoted by X -> Y, between two sets of attributes
X and Y that are subsets of R specifies a constraint on the possible tuples that can
form a relation state r of R
 for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have
t1[Y] = t2[Y]
 the values of the Y component of a tuple in r depend on (or are determined by)
the values of the X component
 if X is a candidate key of R, X -> Y for any subset of attributes Y of R
 if X -> Y in R, this does not say whether or not Y -> X in R
 example
 FD1: {SSN, PNUMBER} -> HOURS
 FD2: SSN -> ENAME
 FD3: PNUMBER -> {PNAME, PLOCATION}
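
The definition translates directly into a check on one relation state. The sketch below is a minimal illustration with hypothetical sample tuples; `fd_holds` is a name introduced here. An FD is a constraint on every legal state, so a single state can refute an FD but can never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with this X value fixes the Y value for all later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical EMP_PROJ state (sample data, not taken from the slides).
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith",
     "PNAME": "ProductY", "PLOCATION": "Sugarland"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Wong",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
]

fd1 = fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"])
fd2 = fd_holds(emp_proj, ["SSN"], ["ENAME"])
fd3 = fd_holds(emp_proj, ["PNUMBER"], ["PNAME", "PLOCATION"])
not_an_fd = fd_holds(emp_proj, ["SSN"], ["HOURS"])  # SSN 111 has two HOURS values
```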

inference rules for FDs
 F: the set of functional dependencies that are specified on relation
schema R
 F+ (closure of F): the set containing every dependency in F together
with every dependency that can be inferred from F
 example
 F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER -> {DNAME, DMGRSSN}}
 FDs that can be inferred from F include
 SSN -> {DNAME, DMGRSSN}
 SSN -> SSN
 DNUMBER -> DNAME
 notations
 F ⊨ X -> Y: X -> Y is inferred from F
 {X,Y} -> Z is abbreviated to XY -> Z

well-known inference rules
 IR1 (reflexive rule): if X ⊇ Y, then X -> Y
 IR2 (augmentation rule): {X -> Y} ⊨ XZ -> YZ
 IR3 (transitive rule): {X -> Y, Y -> Z} ⊨ X -> Z
 IR4 (decomposition rule): {X -> YZ} ⊨ X -> Y
 IR5 (union rule): {X -> Y, X -> Z} ⊨ X -> YZ
 IR6 (pseudotransitive rule): {X -> Y, WY -> Z} ⊨ WX -> Z
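
IR4 through IR6 can be derived from IR1 through IR3. As one worked case, the union rule (IR5) can be obtained as follows; the step numbering is added here for illustration only.

```latex
\begin{align*}
1.\;& X \to Y   && \text{given} \\
2.\;& X \to Z   && \text{given} \\
3.\;& X \to XY  && \text{IR2 on 1, augmenting with } X \text{ (note } XX = X\text{)} \\
4.\;& XY \to YZ && \text{IR2 on 2, augmenting with } Y \\
5.\;& X \to YZ  && \text{IR3 on 3 and 4}
\end{align*}
```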

closure computation
 closure X+: the set of attributes that are functionally determined by X based on F
 algorithm
     X+ = X
     repeat
         oldX+ = X+
         for each FD Y -> Z in F do
             if X+ ⊇ Y, then X+ = X+ ∪ Z
     until (X+ = oldX+)
 example
 F = {SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN,
PNUMBER} -> HOURS}
 {SSN}+ = {SSN, ENAME}
 {PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
 {SSN, PNUMBER}+ ={SSN, ENAME, PNUMBER, PNAME, PLOCATION,
HOURS}
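
The algorithm is short enough to sketch directly. In the version below, FDs are represented as (left-hand side, right-hand side) pairs of Python sets; that representation and the name `closure` are choices made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:
            # If X+ contains Y and Z adds something new, absorb Z.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The FD set F from the example above.
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

ssn_closure = closure({"SSN"}, F)
pnumber_closure = closure({"PNUMBER"}, F)
both_closure = closure({"SSN", "PNUMBER"}, F)
```

The three results match the closures listed in the example.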

equivalence of sets of FDs
 F: a set of FDs
 F+: closure of F, the set of all FDs logically implied by F
 F is said to cover another set of FDs E if every FD in E is also in F+
 F covers E if, for every FD X -> Y in E, X+ (w.r.t. F) ⊇ Y
 reason: X+ ⊇ Y gives X+ -> Y (IR1), and X -> X+ holds by the definition
of X+, so X -> Y follows (IR3)
 two sets of FDs E and F are equivalent if E+ = F+, i.e., E covers F and
F covers E
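
The cover test never needs F+ itself, only attribute closures. A minimal sketch, assuming FDs are stored as (lhs, rhs) pairs of sets; the function names and the sets E and G are introduced here for illustration, while F is the FD set from the inference-rules example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E if, for every X -> Y in E, X+ (w.r.t. F) contains Y."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent if each covers the other."""
    return covers(e, f) and covers(f, e)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
# E adds an FD that is already implied by F, so E and F are equivalent.
E = F + [({"SSN"}, {"DNAME", "DMGRSSN"})]
# G keeps only one consequence of F: F covers G, but G does not cover F.
G = [({"SSN"}, {"DNUMBER"})]

e_equiv_f = equivalent(E, F)
f_covers_g = covers(F, G)
g_covers_f = covers(G, F)
```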

minimal sets of FDs
 minimal cover of a set of FDs E: a set of FDs F that satisfies the
property that
 every FD in E is in F+
 the above property is lost if any FD from F is removed
 formally, F is minimal if
 every FD in F has a single attribute for its rhs
 we cannot replace any FD X -> A in F with a FD Y -> A, where Y ⊂ X,
and still have a set of FDs that is equivalent to F
 we cannot remove any FD from F and still have a set of FDs that is
equivalent to F

algorithm for finding a minimal cover F for E
 set F = E
 replace each FD X -> {A1, ..., An} in F by the n functional
dependencies X -> A1, ..., X -> An
 for each FD X -> A in F
     for each attribute B ∈ X
         if {{F – {X -> A}} ∪ {(X – {B}) -> A}} is equivalent to F
             then replace X -> A with (X – {B}) -> A in F
 for each remaining FD X -> A in F
     if {F – {X -> A}} is equivalent to F
         then remove X -> A from F
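
A compact sketch of these steps follows. It replaces the two set-equivalence tests with closure tests that give the same answer: B is extraneous in X -> A when A is in (X – {B})+ under F, and X -> A is redundant when A is in X+ under F – {X -> A}. The input E below is a small illustrative set, not one from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(e):
    # Steps 1-2: copy E, giving every FD a single attribute on its rhs.
    f = [(set(lhs), {a}) for lhs, rhs in e for a in sorted(rhs)]
    # Step 3: remove extraneous attributes from each lhs.
    for i in range(len(f)):
        lhs, rhs = f[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if rhs <= closure(reduced, f):
                lhs = reduced
                f[i] = (lhs, rhs)
    # Step 4: remove FDs that the remaining ones already imply.
    i = 0
    while i < len(f):
        lhs, rhs = f[i]
        rest = f[:i] + f[i + 1:]
        if rhs <= closure(lhs, rest):
            f = rest
        else:
            i += 1
    return f


# Illustrative input: E = {B -> A, D -> A, AB -> D}.
E = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
cover = minimal_cover(E)  # leaves D -> A and B -> D
```

A set of FDs can have more than one minimal cover; which one this sketch returns depends on the order in which FDs and attributes are visited.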

normalization of relations
 first proposed by Codd
 takes a relation schema through a series of tests to certify whether it
satisfies a certain normal form
 a process of analyzing the given relation schemas based on their FDs
and primary keys to achieve the desirable properties of (1)
minimizing redundancy, and (2) minimizing the insertion,
deletion, and update anomalies
 the process of normalization through decomposition must confirm
the existence of additional properties that the relational schemas
should possess: e.g., nonadditive join property, dependency
preservation property
 1NF, 2NF, 3NF, and BCNF: based on the functional dependencies
among the attributes of a relation
 4NF, 5NF: based on the concepts of multivalued dependencies and
join dependencies, respectively

keys and attributes participating in keys
 superkey of a relation schema R = {A1, ..., An}
 a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in
any legal relation state r of R will have t1[S] = t2[S]
 a key K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey any more
 if a relation schema has more than one key, each is called a
candidate key
 one of the candidate keys is arbitrarily designated to be the primary
key
 an attribute of relation schema R is called a prime attribute of R if it
is a member of some candidate key of R
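
When the FDs of R are known, these notions can be computed from attribute closures: S is a superkey exactly when S+ contains every attribute of R. The sketch below assumes the listed FDs are the complete specification for the schema; the function names are introduced here, and the brute-force search is exponential in the number of attributes, so it is suitable only for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Enumerate attribute sets by increasing size and keep minimal superkeys."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


# EMP_PROJ with the FDs used in the closure example.
schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(schema, F)
prime_attributes = set().union(*keys)
```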

first normal form (1NF)
 to disallow multivalued attributes, composite attributes, and their
combinations
 the domain of an attribute must include only atomic values and the
value of any attribute in a tuple must be a single value from the
domain of that attribute
 example

3 main techniques to achieve 1NF
 remove the attribute DLOCATIONS that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT
-> generally considered best
 expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT -> introduces redundancy
 if a maximum number of values is known, replace DLOCATIONS with that many
atomic attributes: DLOCATION1, DLOCATION2, ... -> introduces null values

another example: nested relation
 EMP_PROJ(SSN, ENAME, {PROJS(PNUMBER, HOURS)})
 SSN is the primary key of EMP_PROJ, while PNUMBER is the partial key of
the nested relation PROJS
 for normalization into 1NF, we remove the nested relation attributes into a new
relation and propagate the primary key into it
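
A minimal sketch of this unnesting step, using hypothetical sample data. The function name `unnest` and the result names `employees` and `works_on` are introduced here for illustration.

```python
# Hypothetical nested EMP_PROJ state: PROJS holds a relation inside each tuple.
emp_proj_nested = [
    {"SSN": "111", "ENAME": "Smith",
     "PROJS": [{"PNUMBER": 1, "HOURS": 32.5}, {"PNUMBER": 2, "HOURS": 7.5}]},
    {"SSN": "222", "ENAME": "Wong",
     "PROJS": [{"PNUMBER": 1, "HOURS": 20.0}]},
]


def unnest(rows, key, nested):
    """Split off the nested relation and copy the primary key into it."""
    parent, child = [], []
    for row in rows:
        parent.append({a: v for a, v in row.items() if a != nested})
        for sub in row[nested]:
            child.append({key: row[key], **sub})
    return parent, child


# employees(SSN, ENAME) and works_on(SSN, PNUMBER, HOURS) are both in 1NF;
# the key of works_on is {SSN, PNUMBER}.
employees, works_on = unnest(emp_proj_nested, "SSN", "PROJS")
```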

second normal form (2NF)
 an FD X -> Y is a full functional dependency (FFD) if removal of any attribute A
from X means that the dependency does not hold any more
 an FD X -> Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds
 a relation schema R is in 2NF if every nonprime attribute NA in R is fully
functionally dependent on the primary key of R
 example: {SSN, PNUMBER} is a primary key for EMP_PROJ
 {SSN, PNUMBER} -> ENAME: FFD?
 {SSN, PNUMBER} -> PNAME: FFD?
 {SSN, PNUMBER} -> PLOCATION: FFD?

converting into 2NF
 if a relation schema is not in 2NF, it can be 2NF normalized into a
number of 2NF relations in which nonprime attributes are
associated only with the part of the primary key on which they
are fully functionally dependent
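
One way to sketch this grouping: for each nonprime attribute, find the smallest part of the primary key that determines it, then collect the attributes by that part. The function name `two_nf_groups` is introduced here, and the FD set is assumed to be the one listed for EMP_PROJ in the closure example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def two_nf_groups(key, nonprime, fds):
    """Map each smallest determining part of the key to its nonprime attributes."""
    groups = {}
    key = sorted(key)
    for a in sorted(nonprime):
        for size in range(1, len(key) + 1):
            part = next((c for c in combinations(key, size)
                         if a in closure(c, fds)), None)
            if part is not None:
                groups.setdefault(part, []).append(a)
                break
    return groups


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
groups = two_nf_groups({"SSN", "PNUMBER"},
                       {"ENAME", "PNAME", "PLOCATION", "HOURS"}, F)
```

Each entry of `groups` corresponds to one 2NF relation: (SSN, ENAME), (PNUMBER, PNAME, PLOCATION), and (SSN, PNUMBER, HOURS). Only HOURS is fully functionally dependent on the whole key.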

third normal form (3NF)
 an FD X -> Y in a relation schema R is a transitive dependency if
there is a set of attributes Z that is neither a candidate key nor a
subset of any key of R, and both X -> Z and Z -> Y hold
 a relation schema R is in 3NF if it satisfies 2NF and no nonprime
attribute of R is transitively dependent on the primary key
 example
 SSN -> DMGRSSN is a transitive dependency through DNUMBER because
SSN -> DNUMBER and DNUMBER -> DMGRSSN hold, and DNUMBER (a
nonprime attribute) is neither a key nor a subset of the
key of EMP_DEPT

general definitions of 2nd and 3rd normal forms
 the previous definition of 3NF disallows partial and transitive
dependencies on the primary key to avoid update anomalies
 now the partial and full functional dependencies and transitive
dependencies are considered w.r.t. all candidate keys of a relation

general definition of 2NF
 prime attribute: an attribute that is part of some candidate key
 a relation schema R is in 2NF if every nonprime attribute A in R is
not partially dependent on any key of R

 example: candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}
 {COUNTY_NAME, LOT#} -> TAX_RATE: FFD?

general definition of 3NF
 def) a relation schema R is in 3NF if it satisfies the following property
 whenever a nontrivial functional dependency X -> A holds in R,
either (a) X is a superkey of R, or (b) A is a prime attribute of R
 an FD X -> A that violates the property
 violating (b) => A is a nonprime attribute
 violating (a) => X is not a superset of any key of R
 => X is either nonprime or a proper subset of a key of R
 X is nonprime => transitive dependency (i.e., there is a key Y s.t. Y -> X -> A)
 X is a proper subset of a key => partial dependency (i.e., a partial
dependency Z -> A on a key Z ⊃ X, due to the existence of X -> A)
 therefore, a relation schema R is in 3NF if every nonprime
attribute A of R
 is non-transitively dependent on every key of R, and
 is fully functionally dependent on every key of R

example
 FD4: AREA -> PRICE
 AREA is not a superkey and PRICE is not a prime attribute
 that is, from FD1 and FD2, we know that PRICE is transitively dependent on
each of the candidate keys (PROPERTY_ID#, {COUNTY_NAME, LOT#})
via the nonprime attribute AREA
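
The general 3NF test can be sketched as a loop over the given FDs. Everything beyond FD4 and the two candidate keys is an assumption here, since the schema figure is not reproduced in the text: the attribute list, and FD1 and FD2 written as "each candidate key determines all other attributes". The function name `violates_3nf` is also introduced for illustration, and the check inspects only the listed FDs.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(schema, fds, candidate_keys):
    """Return the (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue                      # condition (a): X is a superkey
        for a in sorted(rhs - lhs):       # ignore trivial parts of the FD
            if a not in prime:            # condition (b) fails as well
                bad.append((sorted(lhs), a))
    return bad


# Assumed schema and FDs (see the note above).
schema = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE"}
fds = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE"}),   # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE"}),   # FD2
    ({"AREA"}, {"PRICE"}),                                          # FD4
]
keys = [{"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#"}]

violations = violates_3nf(schema, fds, keys)  # only FD4 is reported
```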

Boyce-Codd normal form (BCNF)
 a relation schema R is in BCNF if whenever a nontrivial functional dependency
X -> A holds in R, then X is a superkey of R
 stricter than 3NF: every relation in BCNF is also in 3NF, but a relation in 3NF is
not necessarily in BCNF
 example
 FD5: AREA -> COUNTY_NAME
 {COUNTY_NAME, LOT#} is a candidate key
 AREA is not a superkey => violates BCNF
 COUNTY_NAME is a prime attribute => satisfies 3NF
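
The difference between the two normal forms shows up when both tests are run on the same FDs. In the sketch below, the attribute list, the form of FD1 and FD2, and the reading of FD5 as AREA -> COUNTY_NAME are assumptions, because the schema figure is not reproduced in the text. The candidate keys are passed in as stated on the slide, and the function name is introduced here.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def normal_form_violations(schema, fds, candidate_keys):
    """Return (BCNF violations, 3NF violations) among the listed FDs."""
    prime = set().union(*candidate_keys)
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue                          # X is a superkey: fine for both
        for a in sorted(rhs - lhs):
            bcnf.append((sorted(lhs), a))     # BCNF allows no exception
            if a not in prime:
                third.append((sorted(lhs), a))  # 3NF tolerates a prime A
    return bcnf, third


# Assumed schema and FDs (see the note above).
schema = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
fds = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),   # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),   # FD2
    ({"AREA"}, {"COUNTY_NAME"}),                           # FD5
]
keys = [{"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#"}]

bcnf_violations, third_nf_violations = normal_form_violations(schema, fds, keys)
```

FD5 appears in the BCNF list but not in the 3NF list, because its right-hand side COUNTY_NAME is a prime attribute.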
