Unit IV Database Normalization
Unit IV Database Normalization
Redundancy:
Data for branch_name, branch_city, assets are repeated for each loan that provides by
bank.
Storing information several times leads waste of storage space / time.
Data redundancy leads problem in Insertion, deletion and update.
Anomalies
Insertion problem
We cannot store information about a branch without loan information, we
can use null values for loan information, but they are difficult to handle.
Deletion problem
In this example, if we delete the information of Manoj (i.e. DELETE FROM
branch_loan WHERE customer_name = ‘Manoj’;), we cannot obtain the
information of pokhara branch (i.e. we don’t know branch city of
pokhara, total asset of pokhara branch etc.
Update problem
Since data are duplicated so multiple copies of same fact need to update
while updating one. It increases the possibility of data inconsistency.
When we made update in one copy there is possibility that only some of
the multiple copies are update but not all, which lead data/database in
inconsistent state.
Normalization
Is the process of minimizing redundancy from a relation or
set of relations
Redundancy in relation may cause insertion, deletion and
update anomalies.
Reducing redundancy is done for mainly two purposes
Reducing the amount of storage needed to store data.
Avoiding unnecessary data conflicts that may creep in because
of multiple copies of this data getting stored.
Normal forms are used to eliminate or reduce redundancy
in a database tables.
It is a database design technique which organizes tables in
manner that reduces redundancy and dependency of data.
It divides larger tables to smaller tables and links them
using relationships.
Advantages of Normalization
More efficient data structure
Avoid redundant fields or columns
More flexible data structure i.e. we should be able to
add new rows and data values easily
Better understanding of data
Ensures that distinct tables exit when necessary
Minimizes data duplication
Close modeling of real world entities, processes and
their relationships
Better performance is ensured as databases become
lesser in size, the passes through the data becomes
faster and shorter thereby improving response time
and speed
Disadvantages of Normalization
More tables to join as by spreading out data into more tables,
the need to joins tables increases and the task become more
tedious.
Tables will contain codes rather than real data as the repeated
data will be stored as lines of codes rather than the true data.
Data model becomes extremely difficult to query against as
the data model is optimized for application
As the normal form type progresses , the performance
become slower
It is very time consuming and difficult process in normalizing
relations to higher degree
Careless decomposition may leads to bad design of data base
which may leads to serious problems
Proper knowledge is required on the various normal forms to
execute the normalization process efficiently.
Functional Dependencies
Functional dependencies are constraints on the set of legal
relations. It defines attributes of relation, how they are related to
each other.
It determines unique value for a certain set of attributes to the
value for another set of attributes That is functional dependency is
a generalization of the notation of key.
Functional dependencies are interrelationship among attributes of a
relation.
Definition:
For a given relation R with attribute X and Y, Y is said to be
functionally dependent on X, if given value for each X uniquely
determines the value of the attribute in Y. X is called determinant of
the functional dependency (FD) and functional dependency denoted
by X→ Y.
Example 1: consider a relation supplier
Supplier(supplier_id#,sname,status,city)
Here, sname, status and city are functionally dependent on
Supplier.supplier_id→supplier.sname
Supplier.supplier_id→supplier.status
Supplier.supplier_id→supplier.city
Or simply,
supplier_id→ sname
supplier_id→ status
supplier_id→city
Example 2: Consider a relation student-info
Student-info(name#,course#,phone_no,major,prof,grade)
That is, {name,course} is composite primary key
19
Normal Forms
There are lots of Normal Forms and a
higher one implies all lower ones as well
The process is reversible
i.e. going to a higher normal form never loses
information
Main ones in increasing level are:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
20
Second Normal Form (2NF)
Suppose in our database there was only one
table called FIRST instead of S and SP
e.g. FIRST (S#, SNAME, STATUS, CITY, P#, QTY)
Primary Key is now (S#,P#) and there are
FD’s other than from a primary key
In addition, add the FD
CITY STATUS
i.e. the status is determined by the location
21
Second Normal Form (cont)
S# STATUS
QTY
P#
CITY
22
Second Normal Form(cont)
Definition of 2NF
A relation is in 2NF iff it is in 1NF and every
nonkey attribute is irreducibly dependent
on the primary key
A nonkey attribute is a key that does not
participate in the primary key
This assumes there is only one candidate
key, which we can then assume is the
primary key
23
Second Normal Form(cont)
A relation in 1NF can always be transformed into a
collection of 2NF relations by means of projections
In general, a relation R{A, B, C,D} with primary
key {A,B} having a FD A D
can always be decomposed into the projections
R1{A,D} with primary key A
and R2{A,B,C} with primary key {A,B} and
foreign key {A} REFERENCES
R1
Thus FIRST(S#.STATUS,CITY,P#,QTY) can be
decomposed into:
SECOND(S#, STATUS, CITY) and SP(S#,P#,QTY)
24
Second Normal Form (cont)
S# S# STATUS
QTY
P#
CITY
26
Third Normal Form(cont)
Definition of 3NF:
A relation is in 3NF iff it is in 2NF and every nonkey
attribute is nontransitively dependent on the
primary key
This implies that there are no mutual
dependencies between the nonkey attributes
A relation that is in 2NF can always be
transformed by means of projections into a
collection of 3NF relations
27
Third Normal Form(cont)
Thus given a relation R(A,B,C) with primary key
{A} and assuming the FD B C, we normalise
taking the projections
R1{B,C} with primary key {B}
and R2{A,B} with primary key {A} and foreign
key {B} references R1
28
Boyce/Codd Normal Form(BCNF)
The definition of 3NF assumed in a
relation that there was only one
candidate key
BCNF is a slightly stronger normal form
that addresses this (named from its
authors)
It deals with the case where a relation
has:
1. Two or more candidate keys such that
2. The candidate keys were composite, and
3. They overlapped (had at least one attribute in
common)
29
Boyce/Codd Normal Form(cont)
31
Multi-valued Dependency
(cont)
Definition: Let R be a relation, and let A,B, and C
be subsets of the attributes of R, then
B is multi-dependent on A
AB
iff in every possible legal value of R, the set of B
values matching a given (A value, C value) pair
depends only on the A value (i.e. is independent
of the C value)
Every FD is a MVD (but not the other way round)
MVDs go in pairs: if MVD AB holds then so
does AC often written as A B | C
e.g. COURSETEACHER and COURSETEXT
32
Fourth Normal Form (4NF)
Fagin’s Theorem
Let R{A,B,C} be a relation, where A,B, and C are
sets of attributes. Then R is equal to the join of
its projections on {A,B} and {A,C} iff R satisfies
the MVD A B | C
Definition of 4NF:
A relation R is in 4NF iff whenever there exists
subsets A and B of the attributes of R such that
the nontrivial MVD AB is satisfied, then all
the attributes of R are also functionally
dependent on A
An MVD AB is trivial if either A is a superset of
B or the union of A and B is the entire heading
33 Database [ITC218]:
Fourth Normal Form(cont)
Thus instead of having the relation
CTX(COURSE,TEACHER,TEXT)
which is in BCNF, it is decomposed into two
projections
CT(COURSE,TEACHER)
and CX (COURSE,TEXT)
which are now in 4NF
Notes:
1. 4NF implies BCNF
2. 4NF is always achievable but not always desirable, due
to other constraints
34 Database [ITC218]:
Decomposition
Decompose the relation schema Lending-schema
into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name,
amount)
All attributes of an original schema (R) must
appear in the decomposition (R1, R2):
R = R1 R2
Lossless-join decomposition.
For all possible relations r on schema R
r = R1 (r) R2 (r)
Example of Non Lossless-Join Decomposition
Decomposition of R = (A, B)
R1 = (A) R2 = (B)
A B A B
1 1
2 2
1
A(r) B(r)
r
A B
A (r) B (r)
1
2
1
2