0% found this document useful (0 votes)
14 views

Unit IV Database Normalization

Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

Unit IV Database Normalization

Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 36

Unit IV: Unit IV Database Normalization

 Informal Design Guidelines for Relational Schemas


 Functional Dependencies
 Normal Forms Based on Primary Keys
 First, Second and Third Normal Forms
 Boyce-Codd Normal Form
 Multivalued Dependency and Fourth Normal Form
 Properties of Relational Decomposition
 Informal guidelines for relation schemas
 Semantics of the attributes
 Reducing the redundant values in tuples
 Reducing null values in tuples
 Disallowing the possibility of generating spurious
tuples
 The main goal of relational database is to create / find
good collection of relational schemas. Such that
database should allow us to store data / information
without unnecessary redundancy and should also allow
to retrieve information easily. That is the goal of
relational database design should concentrated
 to avoid redundant data from database.
 to ensure that relationships among attributes are represented.
 to facilitate the checking of updates for the violation of
database integrity constraints.
 A bad relational database design may lead to:
 Repetition of information.
 That is, it leads data redundancy in database, so obviously it requires
much space.
 Inability to represent certain information.
 Example 1: Consider the relational schema
 Branch_loan = ( beranch_name, branch_city, assets, customer_name,
loan_no, amount)
branch_name branch city assets customer name loan no amount
kathmandu baneshwor 25000 rohan L - 15 3000
Lalitpur patan 19000 mohan L - 17 5000
baneshwor 25000 raju L - 19 10000
Pokhara Prithibi nagar 17000 manoj L - 10 7000
Lalitpur patan 19000 swikar L - 30 9000

 Redundancy:
 Data for branch_name, branch_city, assets are repeated for each loan that provides by
bank.
 Storing information several times leads waste of storage space / time.
 Data redundancy leads problem in Insertion, deletion and update.
Anomalies
 Insertion problem
 We cannot store information about a branch without loan information, we
can use null values for loan information, but they are difficult to handle.
 Deletion problem
 In this example, if we delete the information of Manoj (i.e. DELETE FROM
branch_loan WHERE customer_name = ‘Manoj’;), we cannot obtain the
information of pokhara branch (i.e. we don’t know branch city of
pokhara, total asset of pokhara branch etc.
 Update problem
 Since data are duplicated so multiple copies of same fact need to update
while updating one. It increases the possibility of data inconsistency.
When we made update in one copy there is possibility that only some of
the multiple copies are update but not all, which lead data/database in
inconsistent state.
Normalization
 Is the process of minimizing redundancy from a relation or
set of relations
 Redundancy in relation may cause insertion, deletion and
update anomalies.
 Reducing redundancy is done for mainly two purposes
 Reducing the amount of storage needed to store data.
 Avoiding unnecessary data conflicts that may creep in because
of multiple copies of this data getting stored.
 Normal forms are used to eliminate or reduce redundancy
in a database tables.
 It is a database design technique which organizes tables in
manner that reduces redundancy and dependency of data.
 It divides larger tables to smaller tables and links them
using relationships.
Advantages of Normalization
 More efficient data structure
 Avoid redundant fields or columns
 More flexible data structure i.e. we should be able to
add new rows and data values easily
 Better understanding of data
 Ensures that distinct tables exit when necessary
 Minimizes data duplication
 Close modeling of real world entities, processes and
their relationships
 Better performance is ensured as databases become
lesser in size, the passes through the data becomes
faster and shorter thereby improving response time
and speed
Disadvantages of Normalization
 More tables to join as by spreading out data into more tables,
the need to joins tables increases and the task become more
tedious.
 Tables will contain codes rather than real data as the repeated
data will be stored as lines of codes rather than the true data.
 Data model becomes extremely difficult to query against as
the data model is optimized for application
 As the normal form type progresses , the performance
become slower
 It is very time consuming and difficult process in normalizing
relations to higher degree
 Careless decomposition may leads to bad design of data base
which may leads to serious problems
 Proper knowledge is required on the various normal forms to
execute the normalization process efficiently.
Functional Dependencies
 Functional dependencies are constraints on the set of legal
relations. It defines attributes of relation, how they are related to
each other.
 It determines unique value for a certain set of attributes to the
value for another set of attributes That is functional dependency is
a generalization of the notation of key.
 Functional dependencies are interrelationship among attributes of a
relation.
Definition:
 For a given relation R with attribute X and Y, Y is said to be
functionally dependent on X, if given value for each X uniquely
determines the value of the attribute in Y. X is called determinant of
the functional dependency (FD) and functional dependency denoted
by X→ Y.
 Example 1: consider a relation supplier
Supplier(supplier_id#,sname,status,city)
 Here, sname, status and city are functionally dependent on
 Supplier.supplier_id→supplier.sname
 Supplier.supplier_id→supplier.status
 Supplier.supplier_id→supplier.city

Or simply,
 supplier_id→ sname
 supplier_id→ status
 supplier_id→city
 Example 2: Consider a relation student-info
Student-info(name#,course#,phone_no,major,prof,grade)
That is, {name,course} is composite primary key

This relation has the following functional dependencies


{name→phone_no, name→major, name,course→grage,
course→prof}
 Functional dependency X→Y satisfied on the relation R/
hold on R
 FD X→Y is satisfied on relation R if the cardinality of
∏Y( x=x(r)) is at most one. That is if, two tuples ti and tj of
R have the same X value then the corresponding value
of Y must identical.
Types of Functional Dependency
 Trival and Non Trival dependencies
 Fully functional dependency
 Partial functional dependency
 Transitive dependency
 Multi valued dependency
Trival and Non Trival Functional
Dependency
 The dependency of an attribute on a set of
attributes is known as trival functional
dependency if the set of attributes includes
that attribute.
 AB is trival functional dependency if B is a
subset of A.
 AA and BB is also trival
 {Student_id , Student_name }  student_id
 If a functional dependency XY holds true
where Y is not a subset of X then this
dependency is called non trival functional
dependency.
 Emp_id  emp_name
Fully and Partial
 An attribute is fully functional dependent on a
second attribute if and only if it is functionally
dependent on the second attribute but not on
any subset of the second attribute.

 an attribute is partial functionally dependent


on a second attribute if and only if it is
functionally dependent on the second
attribute and also dependency occur on any
subset of the second attribute.
 {emp_id, project_no}  emp_name is partial
FD as emp_id  emp_name
Transitive Dependency
 A FD is said to be transitive if it is indirectly
formed by two functional dependencies. For
example X  Z is transitive dependency if the
following three FDs hold true.
 XY
 YZ
 X Z
Multivalued
 Multivalued dependency occurs when there
are more than one independent multivalued
attribues in a table
Example Database: Supplier/Parts
S# SNAME STATUS CITY S# P# QTY
 S SP
S1 Smith 20 London S1 P1 300
S2 Jones 10 Paris S1 P2 200
S3 Blake 30 Paris S1 P3 400
S4 Clark 20 London S1 P4 200
S5 Adams 30 Athens S1 P5 100
 S1 P6 100
P
P# PNAME COLOUR WEIGHT CITY S2 P1 300
P1 Nut Red 12 London S2 P2 400
P2 Bolt Green 17 Paris S3 P2 200
P3 Screw Blue 17 Rome S4 P2 200
P4 Screw Red 14 London S4 P4 300
P5 Cam Blue 12 Paris S4 P5 400
P6 Cog Red 19 London
18
First Normal Form (1NF)
 A relation is in 1 NF if all the underlying
domains contain only atomic values
 For any relational system this is always
true of the relations
 It is true for SQL, since only one value
allowed in any field
 Normalised sometimes means 1NF but
more often a higher form of normalisation
e.g. 3NF or higher

19
Normal Forms
 There are lots of Normal Forms and a
higher one implies all lower ones as well
 The process is reversible
i.e. going to a higher normal form never loses
information
 Main ones in increasing level are:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF

 A database should be in 3NF or higher

20
Second Normal Form (2NF)
 Suppose in our database there was only one
table called FIRST instead of S and SP
e.g. FIRST (S#, SNAME, STATUS, CITY, P#, QTY)
 Primary Key is now (S#,P#) and there are
FD’s other than from a primary key
 In addition, add the FD
CITY  STATUS
i.e. the status is determined by the location

21
Second Normal Form (cont)

S# STATUS
QTY
P#
CITY

 An illustration of the problems this causes is that


we cannot insert a CITY for a Supplier unless they
supply at least one part
 There are also problems with deletes and updates

22
Second Normal Form(cont)
Definition of 2NF
 A relation is in 2NF iff it is in 1NF and every
nonkey attribute is irreducibly dependent
on the primary key
 A nonkey attribute is a key that does not
participate in the primary key
 This assumes there is only one candidate
key, which we can then assume is the
primary key

23
Second Normal Form(cont)
 A relation in 1NF can always be transformed into a
collection of 2NF relations by means of projections
 In general, a relation R{A, B, C,D} with primary
key {A,B} having a FD A D
can always be decomposed into the projections
R1{A,D} with primary key A
and R2{A,B,C} with primary key {A,B} and
foreign key {A} REFERENCES
R1
 Thus FIRST(S#.STATUS,CITY,P#,QTY) can be
decomposed into:
SECOND(S#, STATUS, CITY) and SP(S#,P#,QTY)

24
Second Normal Form (cont)

S# S# STATUS
QTY
P#
CITY

 This shows the 2NF tables


diagrammatically
 The transformation from 1NF to 2NF is an
example of non-loss decomposition
25
Third Normal Form (3NF)
 However 2NF still causes problems, in that
although the dependency of STATUS on S# is
functional and irreducible, it is also transitive
S#  CITY  STATUS
which has the logical consequence that
S#  STATUS also holds
 In general, if A  B  C then A  C
 This causes problems with insert, delete and
update
 In fact, we want all the nonkey attributes in a
relation to be mutually independent, so that each
can be updated independently of the rest

26
Third Normal Form(cont)
 Definition of 3NF:
A relation is in 3NF iff it is in 2NF and every nonkey
attribute is nontransitively dependent on the
primary key
This implies that there are no mutual
dependencies between the nonkey attributes
 A relation that is in 2NF can always be
transformed by means of projections into a
collection of 3NF relations

27
Third Normal Form(cont)
 Thus given a relation R(A,B,C) with primary key
{A} and assuming the FD B C, we normalise
taking the projections
R1{B,C} with primary key {B}
and R2{A,B} with primary key {A} and foreign
key {B} references R1

e.g. SECOND{S#, STATUS,CITY} becomes


SC{S#,CITY} and CS{CITY,STATUS}

 N.B. The relation SP was already in 3NF

28
Boyce/Codd Normal Form(BCNF)
 The definition of 3NF assumed in a
relation that there was only one
candidate key
 BCNF is a slightly stronger normal form
that addresses this (named from its
authors)
 It deals with the case where a relation
has:
1. Two or more candidate keys such that
2. The candidate keys were composite, and
3. They overlapped (had at least one attribute in
common)
29
Boyce/Codd Normal Form(cont)

 Formal Definition: A relation is in BCNF iff every


nontrivial left-irreducible FD has a candidate key
as its determinant (i.e. LHS)
 Informal Definition: A relation is in BCNF iff the
only determinants are candidate keys
 This means that in a FD diagram, the only
arrows are out of candidate keys
 With no reference to 3NF, this is stronger and
implies 3NF
 It is conceptually simpler than 3NF
 Relations FIRST and SECOND are not in BCNF
 Relations SP , SC and CS are
30
Multi-valued Dependency
(MVD)
 Consider a relation representing courses, teachers
and texts
 There may be several teachers on one course and
several texts, but the set of texts are independent
of the set of teachers
 Could represent by a row for every combination of
teacher and text but this contains redundancy
which could lead to update anomalies, i.e.
CTX(COURSE,TEACHER,TEXT)
 This is an example of two multi-valued
dependencies (MVDs)

COURSE  TEACHER and COURSE  TEXT

31
Multi-valued Dependency
(cont)
 Definition: Let R be a relation, and let A,B, and C
be subsets of the attributes of R, then
B is multi-dependent on A
AB
iff in every possible legal value of R, the set of B
values matching a given (A value, C value) pair
depends only on the A value (i.e. is independent
of the C value)
 Every FD is a MVD (but not the other way round)
 MVDs go in pairs: if MVD AB holds then so
does AC often written as A B | C
 e.g. COURSETEACHER and COURSETEXT

32
Fourth Normal Form (4NF)
 Fagin’s Theorem
Let R{A,B,C} be a relation, where A,B, and C are
sets of attributes. Then R is equal to the join of
its projections on {A,B} and {A,C} iff R satisfies
the MVD A B | C
 Definition of 4NF:
A relation R is in 4NF iff whenever there exists
subsets A and B of the attributes of R such that
the nontrivial MVD AB is satisfied, then all
the attributes of R are also functionally
dependent on A
 An MVD AB is trivial if either A is a superset of
B or the union of A and B is the entire heading

33 Database [ITC218]:
Fourth Normal Form(cont)
 Thus instead of having the relation
CTX(COURSE,TEACHER,TEXT)
which is in BCNF, it is decomposed into two
projections
CT(COURSE,TEACHER)
and CX (COURSE,TEXT)
which are now in 4NF
 Notes:
1. 4NF implies BCNF
2. 4NF is always achievable but not always desirable, due
to other constraints

34 Database [ITC218]:
Decomposition
 Decompose the relation schema Lending-schema
into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name,
amount)
 All attributes of an original schema (R) must
appear in the decomposition (R1, R2):
R = R1  R2
 Lossless-join decomposition.
For all possible relations r on schema R
r = R1 (r) R2 (r)
Example of Non Lossless-Join Decomposition

 Decomposition of R = (A, B)
R1 = (A) R2 = (B)

A B A B

 1  1
 2  2
 1
A(r) B(r)
r
A B
A (r) B (r)
 1
 2
 1
 2

You might also like