
Unit 3 Updated FG

The document discusses database normalization. It defines normalization as the process of removing redundant data to improve storage efficiency, data integrity, and scalability. Normalization involves splitting tables into multiple tables to reduce update, insertion, and deletion anomalies. The document outlines various normal forms including 1NF, 2NF, 3NF, BCNF and provides examples to illustrate how to decompose relations to conform to each normal form. It also discusses functional dependencies and how they relate to normalization.

Uploaded by

Vasantha Kumari

Unit – III

Relational Database Design


Normalization
Database normalization is the process of removing redundant data from the database to improve
storage efficiency, data integrity, and scalability.
In the relational model, the degree to which a schema is free of redundancy is classified by a
series of normal forms (or NF), and there are algorithms for converting a given database from
one normal form to the next.
Normalization generally involves splitting existing tables into multiple ones, which must be
re-joined or linked each time a query needs the combined data.
Why Normalization?
Wastage of storage
Problems caused by update anomalies:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Changing the name of project number P1 from “Billing” to “Customer Accounting” requires this
update to be made in the row of every employee working on project P1. If any row is missed, the
data becomes inconsistent.
EXAMPLE OF AN INSERT ANOMALY
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Cannot insert a project unless an employee is assigned to it.
• Conversely, cannot insert an employee unless he/she is assigned to a project.
EXAMPLE OF A DELETE ANOMALY
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• When a project is deleted, the rows of all the employees who work on that project are deleted
with it, so an employee who works on no other project disappears from the database.
• Alternatively, if an employee is the sole employee on a project, deleting that employee would
result in deleting the corresponding project.
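The three anomalies can be made concrete with a small sketch. The rows, the employee and project names, and the three-table split (employee, project, works_on) are illustrative assumptions, not data from these notes.

```python
# Hypothetical rows for EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    ("E1", "P1", "Asha", "Billing", 10),
    ("E2", "P1", "Ravi", "Billing", 20),
    ("E2", "P2", "Ravi", "Payroll", 5),
]


def rename_project(rows, proj, new_name):
    """Rename a project in the wide table; return (new_rows, rows_changed)."""
    changed = 0
    out = []
    for emp, p, ename, pname, hours in rows:
        if p == proj:
            pname = new_name
            changed += 1
        out.append((emp, p, ename, pname, hours))
    return out, changed


# Update anomaly: one logical change rewrites one row per employee on P1.
emp_proj_renamed, rows_changed = rename_project(
    emp_proj, "P1", "Customer Accounting"
)

# The same facts split into three tables (an assumed decomposition).
employee = {"E1": "Asha", "E2": "Ravi"}
project = {"P1": "Billing", "P2": "Payroll"}
works_on = {("E1", "P1"): 10, ("E2", "P1"): 20, ("E2", "P2"): 5}

project["P1"] = "Customer Accounting"  # the rename is now a single write
project["P3"] = "Audit"                # a project can exist with no employees
del works_on[("E1", "P1")]             # E1 leaves P1; E1 and P1 both survive

print(rows_changed)      # wide-table rows rewritten by the rename
print(sorted(project))   # project ids after the insert
print("E1" in employee)  # the employee is still recorded
```

In the split design, each fact is stored once, so the rename, the insert and the delete each touch exactly one entry.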
Benefits of Normalization
Less storage space
Quicker updates
Less data inconsistency
Clearer data relationships
Easier to add data
Flexible Structure
Guidelines
Each tuple in a relation should represent one entity or relationship instance.
Design a schema that does not suffer from the insertion, deletion and update anomalies.
If there are any anomalies present, then note them so that applications can be made to take them
into account.
Relations should be designed such that their tuples will have as few NULL values as possible
Attributes that are NULL frequently could be placed in separate relations
Types of Normalization
First Normal Form
Second Normal Form
Third Normal Form
Boyce Codd Normal Form
Fourth Normal Form (Multi Valued Dependencies)
Fifth Normal Form (Join Dependencies)
Functional Dependencies
An attribute Y is said to have a functional dependency on a set of attributes X (written X →Y) if and only
if each X value is associated with precisely one Y value.
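As a rough illustration of this definition, the sketch below checks whether a dependency X → Y holds in one particular set of rows. The function name `fd_holds`, the attribute names and the sample rows are assumptions made for this example. Note that a table instance can only refute a dependency; whether it holds for the schema is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value in rows is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, two different Y values
    return True


# Hypothetical rows; attribute names are assumptions for illustration.
rows = [
    {"Ssn": "111", "Pnumber": 1, "Ename": "Asha", "Hours": 10},
    {"Ssn": "111", "Pnumber": 2, "Ename": "Asha", "Hours": 5},
    {"Ssn": "222", "Pnumber": 1, "Ename": "Ravi", "Hours": 20},
]

print(fd_holds(rows, ["Ssn"], ["Ename"]))             # True
print(fd_holds(rows, ["Ssn"], ["Hours"]))             # False
print(fd_holds(rows, ["Ssn", "Pnumber"], ["Hours"]))  # True
```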
Example

Consider the relation EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation).
The functional dependencies are:
{Ssn,Pnumber} -> {Hours}
{Ssn} -> {Ename }
{Pnumber} -> {Pname, Plocation}
It can be represented as:

Types of Functional Dependencies


1. Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself,
that is, X -> Y where Y is a subset of X.
{Ssn,Pnumber} -> {Ssn} Trivial
{Ssn} -> {Ename } Non trivial
2. Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is
– functionally dependent on X, and
– not functionally dependent on any proper subset of X.
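{Ssn,Pnumber} -> {Hours} is a full functional dependency, because Hours is determined by neither Ssn
alone nor Pnumber alone.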
3. Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y
and Y→Z.
4. Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain rows in a table
implies the presence of certain other rows.
5. Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each
having a subset of the attributes of T.
Inference Rules for FDs
(Reflexive) If Y subset-of X, then X -> Y
(Augmentation) If X -> Y, then XZ -> YZ
(Transitive) If X -> Y and Y -> Z, then X -> Z
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union: If X -> Y and X -> Z, then X -> YZ
Pseudo transitivity: If X -> Y and WY -> Z, then WX -> Z
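Applying these rules by hand is usually replaced by computing an attribute closure: the set of all attributes determined by a given set. The sketch below is one common way to do this; the function names `closure` and `implies` are chosen for this example, and the dependencies are the EMP_PROJ ones listed earlier.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of sets. Repeatedly adding rhs whenever
    lhs is already covered has the same effect as applying the reflexive,
    augmentation and transitive rules until nothing new can be derived.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

print(sorted(closure({"Ssn"}, fds)))                         # ['Ename', 'Ssn']
print(implies(fds, {"Ssn", "Pnumber"}, {"Ename", "Pname"}))  # True
print(implies(fds, {"Ssn"}, {"Hours"}))                      # False
```

A set of attributes is a superkey exactly when its closure contains every attribute of the relation.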

First Normal Form (1NF)


A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is
atomic.
It does not allow composite or multivalued attributes.
First Normal Form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column
(the primary key).
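The atomicity rule can be sketched as follows: a multivalued attribute is replaced by one row per value. The `Phones` attribute, the sample rows and the function name `to_1nf` are hypothetical and only serve the illustration.

```python
def to_1nf(rows, multi_attr):
    """Return one row per value of the multivalued attribute multi_attr.

    Rows whose list is empty produce no output row in this simple sketch.
    """
    flat_rows = []
    for row in rows:
        for value in row[multi_attr]:
            flat = dict(row)
            flat[multi_attr] = value  # a single atomic value per row
            flat_rows.append(flat)
    return flat_rows


# Hypothetical non-1NF rows: Phones holds a list of values.
employees = [
    {"Ssn": "111", "Ename": "Asha", "Phones": ["555-0101", "555-0102"]},
    {"Ssn": "222", "Ename": "Ravi", "Phones": ["555-0199"]},
]

flat = to_1nf(employees, "Phones")
for row in flat:
    print(row)
```

After flattening, Ssn alone no longer identifies a row; the key of the flattened table is the pair (Ssn, Phones).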
Example
Consider the relation, EMP_DEPT with the attributes Ename, Ssn, Bdate, Address, Dnumber, Dname and
Dmgr_ssn. The list of records of EMP_DEPT is shown below:

In this example, the functional dependencies are:
{Ssn -> Ename, Bdate, Address, Dnumber}
{Dnumber -> Dname, Dmgr_Ssn}
Here, the values of {Dnumber, Dname, Dmgr_ssn} are stored once per employee instead of once per
department. For the value of Dnumber 5, the Dname is Research and the Dmgr_ssn is 333445555, and
these values are repeated in every row that has Dnumber 5.
To eliminate this repetition, decompose the relation EMP_DEPT into two relations that contain the
attributes
{Ename, Ssn, Bdate, Address, Dnumber}
{Dnumber, Dname, Dmgr_Ssn}
Now the relation is decomposed into:

The department details are stored in the separate table. There is only one entry for each department,
since all the repetitive data are removed.

Second Normal Form (2NF)


A relation schema R is in second normal form (2NF) if it is in 1NF and every non-key
attribute A in R is fully functionally dependent on the primary key.
Example
Consider the relation EMP_PROJ,

The functional Dependencies are:


{Ssn,Pnumber} -> {Hours}
{Ssn} -> {Ename }
{Pnumber} -> {Pname, Plocation}
It can be represented as:

Second normal form requires that every non-key attribute is fully functionally dependent on the
primary key. Here Ename depends on Ssn alone, and Pname and Plocation depend on Pnumber alone, so
they are only partially dependent on the key {Ssn, Pnumber}. So, decompose the relation into a number
of sub-relations, one for each functional dependency.
Applying second normal form gives the sub-relations:
{Ssn, Pnumber, Hours}
{Ssn, Ename}
{Pnumber, Pname, Plocation}
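A minimal sketch of this step, assuming the EMP_PROJ dependencies given above: it looks for proper subsets of the key that determine non-key attributes (partial dependencies) and moves each such group into its own relation. The function names are chosen for this example, and the approach is only meant for small cases like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (key_part, dependents) for proper subsets of the key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) & non_key
            if dependents:
                found.append((set(subset), dependents))
    return found


def decompose_2nf(key, attributes, fds):
    """One relation per partial dependency, plus the remaining attributes."""
    partial = partial_dependencies(key, attributes, fds)
    relations = [part | deps for part, deps in partial]
    moved = set()
    for _, deps in partial:
        moved |= deps
    relations.append(set(attributes) - moved)
    return relations


attributes = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

relations = decompose_2nf({"Ssn", "Pnumber"}, attributes, fds)
for rel in relations:
    print(sorted(rel))
```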

Third Normal Form


A relation schema R is in third normal form (3NF) if it is in second normal form (2NF) and
no non-key attribute is transitively dependent on the primary key.
To reach 3NF:
Meet all the requirements of the 1NF
Meet all the requirements of the 2NF
Remove columns that are not directly dependent upon the primary key.
Example
Consider the relation, EMP_DEPT with the attributes Ename, Ssn, Bdate, Address, Dnumber, Dname and
Dmgr_ssn. The list of records of EMP_DEPT is shown below:

In this example, the functional dependencies are:


{Ssn -> Ename, Bdate, Address, Dnumber}
{Dnumber -> Dname, Dmgr_Ssn}

The transitive dependency in this table is:

{Ssn -> Dnumber -> Dname, Dmgr_ssn}

That is, Dname and Dmgr_ssn depend on Ssn only through Dnumber.

Now remove this transitive dependency by decomposing the given relation into:

ED1(Ename, Ssn, Bdate, Address, Dnumber)
ED2(Dnumber, Dname, Dmgr_ssn)
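The decomposition can be sketched with two small helpers for projection and natural join. The rows below are hypothetical sample data, and Bdate and Address are left out to keep the example short.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate rows."""
    result = []
    for row in rows:
        sub = {a: row[a] for a in attrs}
        if sub not in result:
            result.append(sub)
    return result


def natural_join(left, right):
    """Join rows that agree on all attributes they have in common."""
    result = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Order-independent form of a list of dict rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical EMP_DEPT rows (Bdate and Address omitted).
emp_dept = [
    {"Ename": "Smith", "Ssn": "123456789", "Dnumber": 5,
     "Dname": "Research", "Dmgr_ssn": "333445555"},
    {"Ename": "Wong", "Ssn": "333445555", "Dnumber": 5,
     "Dname": "Research", "Dmgr_ssn": "333445555"},
    {"Ename": "Borg", "Ssn": "888665555", "Dnumber": 1,
     "Dname": "Headquarters", "Dmgr_ssn": "888665555"},
]

ed1 = project(emp_dept, ["Ename", "Ssn", "Dnumber"])
ed2 = project(emp_dept, ["Dnumber", "Dname", "Dmgr_ssn"])

print(len(ed2))                                              # one row per department
print(as_set(natural_join(ed1, ed2)) == as_set(emp_dept))    # join restores the original
```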

Boyce Codd Normal Form
A relation schema R is in BCNF if for every nontrivial FD X-> Y in R, X is a superkey (a candidate
key or a superset of one).
A relation is in BCNF, if and only if, every determinant is a candidate key. A determinant is any
attribute (simple or composite) on which some other attribute is fully functionally dependent.
Each normal form is stronger than the previous one. Every 2NF relation is in 1NF. Every
3NF relation is in 2NF.
Every BCNF relation is in 3NF. There exist relations that are in 3NF but not in BCNF. The goal is
to have each relation in BCNF (or 3NF).
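The BCNF condition can be tested mechanically: for each given dependency, check whether its left-hand side is a superkey. The sketch below uses the EMP_PROJ dependencies from earlier in these notes; the function names are chosen for this example.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= set(attributes)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
emp_proj_fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

violations = bcnf_violations(emp_proj, emp_proj_fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))

# The sub-relation {Ssn, Ename} with its single dependency has no violation.
print(bcnf_violations({"Ssn", "Ename"}, [({"Ssn"}, {"Ename"})]))  # []
```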

Decomposition Properties
Lossless: Data should not be lost or created when splitting relations up
Dependency preservation: It is desirable that FDs are preserved when splitting relations up

3NF VS BCNF

3NF | BCNF
A relation schema R is in 3NF if for every nontrivial FD X-> Y in R, X is a superkey or Y is part of a candidate key | A relation schema R is in Boyce-Codd Normal Form (BCNF) if for every nontrivial FD X-> Y in R, X is a superkey
3NF has some redundancy | BCNF removes all redundancies caused by FDs
Performance is lesser than BCNF | Better performance than 3NF
Normalisation to 3NF is always lossless and dependency preserving | Normalisation to BCNF is lossless, but may not preserve all dependencies

BCNF Example
Consider the relation SCT(Student, Course, Teacher) with following records
Student Course Teacher
S1 C1 A
S1 C2 B
S2 C1 A
S2 C2 C

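In this example,

• Each student can take a course from one teacher only.
• Each teacher teaches one course only.

The functional dependencies in the relation SCT are:

{Student, Course} -> {Teacher}
{Teacher} -> {Course}

(Student, Course) is the primary key.

This relation is in 3NF, because Course is part of the key, but it is not in BCNF, because the
determinant Teacher is not a candidate key. Decompose the relation into ST(Student, Teacher) and
TC(Teacher, Course).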

Now the relation is decomposed into the following relations

ST
Student Teacher
S1 A
S1 B
S2 A
S2 C

TC
Teacher Course
A C1
B C2
C C2
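The sketch below replays this decomposition on the SCT records above. It checks that joining ST and TC on Teacher gives back SCT, and then shows the price of BCNF here: the rule that a student takes a course from one teacher only can no longer be enforced within either table alone. The extra row (S1, C) is a hypothetical insert used to show this.

```python
sct = {
    ("S1", "C1", "A"),
    ("S1", "C2", "B"),
    ("S2", "C1", "A"),
    ("S2", "C2", "C"),
}

# Project SCT(Student, Course, Teacher) onto ST and TC.
st = {(s, t) for s, c, t in sct}
tc = {(t, c) for s, c, t in sct}


def join_on_teacher(st_rows, tc_rows):
    """Natural join of ST and TC over the Teacher column."""
    return {(s, c, t) for s, t in st_rows for t2, c in tc_rows if t == t2}


def breaks_student_course_fd(rows):
    """True if some (Student, Course) pair maps to two teachers."""
    teacher_of = {}
    for s, c, t in rows:
        if teacher_of.setdefault((s, c), t) != t:
            return True
    return False


print(join_on_teacher(st, tc) == sct)  # lossless: True

# Hypothetical insert: S1 also registers with teacher C, who teaches C2.
st_after_insert = st | {("S1", "C")}
rejoined = join_on_teacher(st_after_insert, tc)
print(breaks_student_course_fd(rejoined))  # True: S1 now has two teachers for C2
```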

Multi Valued Dependencies or Fourth Normal form


• A multivalued dependency on R, X ->>Y, says that if two tuples of R agree on all the attributes of
X, then their components in Y may be swapped, and the result will be two tuples that are also in
the relation.
• Multivalued dependency is also referred to as a tuple-generating dependency. The multivalued
dependency also plays a role in 4NF normalization.
• As with functional dependencies, we shall use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and
multivalued dependencies
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only
those relations that satisfy a given set of functional and multivalued dependencies

Properties of MVD
• If α ->> β, then α ->> R − β − α
• If α ->> β and δ ⊇ γ, then αδ ->> βγ
• If α ->> β and β ->> γ, then α ->> γ − β
• A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join decomposition if and only if X ->>
Y holds in R.

Decomposition Theorem
The split of relations is guaranteed to be lossless if the intersection of the attributes of the new tables is
a key of at least one of them.
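This test is easy to automate when the functional dependencies are known: compute the closure of the shared attributes and see whether it covers one of the two new tables. The sketch below uses the EMP_DEPT attributes, assuming the dependencies Ssn -> {Ename, Bdate, Address, Dnumber} and Dnumber -> {Dname, Dmgr_ssn}. The second split is a deliberately poor, hypothetical one.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_split(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    shared = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared or set(r2) <= shared


fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

ed1 = {"Ename", "Ssn", "Bdate", "Address", "Dnumber"}
ed2 = {"Dnumber", "Dname", "Dmgr_ssn"}
print(is_lossless_split(ed1, ed2, fds))  # True: Dnumber is a key of ed2

# A hypothetical split that shares only Dname, which determines nothing.
bad1 = {"Ssn", "Ename", "Dname"}
bad2 = {"Dname", "Dmgr_ssn", "Dnumber", "Bdate", "Address"}
print(is_lossless_split(bad1, bad2, fds))  # False
```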

4NF
• A relation R is in 4NF if and only if, for every one of its non-trivial multivalued dependencies
X ->> Y, X is a super key, that is, X is either a candidate key or a superset thereof.
• Nontrivial MVD means that:
- Y is not a subset of X, and
- X and Y are not, together, all the attributes
• A table is said to be in 4NF if and only if it is in BCNF and every non-trivial multi-valued
dependency is also a functional dependency. 4NF removes unwanted data structures: multi-valued
dependencies.
Guidelines
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes involved are dependent on each other
• Either of these conditions must hold true for the relation to be in fourth normal form
• The relation must also be in BCNF
• Fourth normal form differs from BCNF only in that it uses multivalued dependencies

Decomposition into 4NF


If X ->> Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF.
1. XY (the attributes of X and Y together) is one of the decomposed relations.
2. All attributes of R except Y – X form the other.

4NF Example1
Consider a relation with the attributes (Deptname, Textbook, Professor).

The multivalued dependencies are:

• Deptname->>Textbook
• Deptname->>Professor
Decompose the relation into two sub relations as
• Relation1: Deptname, Textbook
• Relation2: Deptname, Professor
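The swap definition of a multivalued dependency, and the decomposition above, can be checked on a small instance. The department, textbook and professor values below are hypothetical, and `mvd_holds` is a name chosen for this sketch.

```python
from itertools import product


def mvd_holds(rows, x_idx, y_idx):
    """Check X ->> Y on a set of tuples, given column positions of X and Y.

    For every pair of tuples that agree on X, the tuple that takes its Y
    values from the first and all other values from the second must exist.
    """
    for t1, t2 in product(rows, repeat=2):
        if all(t1[i] == t2[i] for i in x_idx):
            swapped = tuple(
                t1[i] if i in y_idx else t2[i] for i in range(len(t1))
            )
            if swapped not in rows:
                return False
    return True


# Hypothetical rows of (Deptname, Textbook, Professor).
rows = {
    ("CS", "T1", "P1"),
    ("CS", "T1", "P2"),
    ("CS", "T2", "P1"),
    ("CS", "T2", "P2"),
    ("EE", "T3", "P3"),
}

print(mvd_holds(rows, [0], [1]))                         # True
print(mvd_holds(rows - {("CS", "T2", "P2")}, [0], [1]))  # False

# Decompose into Relation1 and Relation2, then join back on Deptname.
relation1 = {(d, t) for d, t, p in rows}
relation2 = {(d, p) for d, t, p in rows}
rejoined = {(d, t, p) for d, t in relation1 for d2, p in relation2 if d == d2}
print(rejoined == rows)                                  # True
```

The original table needs every textbook and professor combination per department, while the two sub relations store each textbook and each professor once per department.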
Example2
Consider the relation
Drinkers (name, addr, phones, beersLiked)

• FD: name -> addr
• Nontrivial MVDs:
o name ->> phones
o name ->> beersLiked
• Only key: {name, phones, beersLiked}
• All three dependencies violate 4NF
• Successive decomposition, which yields 4NF:
o D1(name, addr)
o D2(name, phones)
o D3(name, beersLiked)

Join Dependencies or Fifth Normal Form


• An entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for
the entity is a consequence of its candidate keys.
• A relation decomposed into two relations must have the lossless-join property, which ensures that
no spurious tuples are generated when the relations are reunited through a natural join operation.
• Fifth normal form deals with cases where information can be reconstructed from smaller pieces
of information that can be maintained with less redundancy. Second, third, and fourth normal
forms also serve this purpose, but fifth normal form generalizes to cases not covered by the
others.
• Tables are said to be in fifth normal form when:
- The table meets the criteria for fourth normal form.
- The table consists of a key attribute and a non-key attribute only.
Example
Consider the table
AGENT_COMPANY_PRODUCT (Agent, Company, Product_Name)
This table lists agents, the companies they work for and the products they sell for those companies. The
agents do not necessarily sell all the products supplied by the companies they do business with. An
example of this table might be:

The table is necessary in order to show all the information required. Suneet, for example, sells ABC's
Nuts and Screws, but not ABC's Bolts. Raj is not an agent for CDE and does not sell ABC's Nuts or Screws.
The table is in 4NF because it contains no multi-valued dependency. It does, however, contain an
element of redundancy in that it records the fact that Suneet is an agent for ABC twice. But there is no
way of eliminating this redundancy without losing information. Suppose that the table is decomposed
into its two projections, P1 and P2.

The redundancy has been eliminated, but the information about which companies make which products
and which of these products they supply to which agents has been lost. The natural join of these
projections over the 'agent' columns is:

The table resulting from this join is spurious, since the asterisked row of the table contains incorrect
information. Now suppose that the original table were to be decomposed into three tables, the two
projections, P1 and P2, which have already been shown, and the final possible projection, P3.

If a join is taken of all three projections, first of P1 and P2 with the (spurious) result shown above, and
then of this result with P3 over the 'Company' and 'Product name' columns, the following table is
obtained:

This still contains a spurious row. The order in which the joins are performed makes no difference to
the final result. It is simply not possible to decompose the 'AGENT_COMPANY_PRODUCT' table,
populated as shown, without losing information. Thus, it has to be accepted that it is not possible to
eliminate all redundancies using normalization techniques, because it cannot be assumed that all
decompositions will be non-loss.
But now consider the different case where, if an agent is an agent for a company and that company
makes a product, then he always sells that product for the company. Under these circumstances, the
'agent company product' table is as shown below:

The assumption being that ABC makes both Nuts and Bolts and that CDE makes Bolts only. This table
can be decomposed into its three projections without loss of information as demonstrated below:

All redundancy has been removed. If the natural join of P1 and P2 is taken, the result is:

The spurious row is asterisked. Now, if this result is joined with P3 over the columns 'company' and
'product_name', the following table is obtained:

This is a correct recomposition of the original table, and a non-loss decomposition into the three
projections was achieved. Again, the order in which the joins are performed does not affect the final
result.
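Both cases can be replayed in a few lines. The tables of this example are not reproduced in these notes, so the rows below are assumed sample data chosen to agree with the description (who sells what, and which company makes what). The projection names P1, P2 and P3 follow the text.

```python
def three_way_join(rows):
    """Project (Agent, Company, Product) rows onto P1, P2, P3 and rejoin.

    Returns (P1 join P2, (P1 join P2) join P3).
    """
    p1 = {(a, c) for a, c, p in rows}  # Agent, Company
    p2 = {(a, p) for a, c, p in rows}  # Agent, Product
    p3 = {(c, p) for a, c, p in rows}  # Company, Product
    p1_p2 = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
    all_three = {(a, c, p) for a, c, p in p1_p2 if (c, p) in p3}
    return p1_p2, all_three


# Case 1 (assumed rows): agents sell only some of a company's products.
case1 = {
    ("Suneet", "ABC", "Nut"),
    ("Suneet", "ABC", "Screw"),
    ("Suneet", "CDE", "Bolt"),
    ("Raj", "ABC", "Bolt"),
}
_, rejoined1 = three_way_join(case1)
print(sorted(rejoined1 - case1))  # spurious rows that survive all three joins

# Case 2 (assumed rows): an agent for a company sells everything it makes.
case2 = {
    ("Suneet", "ABC", "Nut"),
    ("Suneet", "ABC", "Bolt"),
    ("Suneet", "CDE", "Bolt"),
    ("Raj", "ABC", "Nut"),
    ("Raj", "ABC", "Bolt"),
}
two_way2, rejoined2 = three_way_join(case2)
print(sorted(two_way2 - case2))   # spurious after joining P1 and P2 only
print(rejoined2 == case2)         # True: the three projections are non-loss
```

In the first case the row (Suneet, ABC, Bolt) survives every join even though Suneet does not sell ABC's Bolts. In the second case the join with P3 removes the only spurious row.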
