RDBMS Unit3 Informaldesign Guidelines

Informal gd

Uploaded by

yoyo36685
Copyright © All Rights Reserved
NORMALIZATION: DATABASE DESIGN

THEORY

https://www.youtube.com/watch?v=NFk9sDJk50U

https://www.youtube.com/watch?v=gInecSg-36Y
Contents

• Informal design guidelines for relation schemas
• Functional Dependencies
• Normal Forms Based on Primary Keys

Informal design guidelines for relation schemas
Four informal guidelines that may be used as measures to determine the quality of
relation schema design:
1. Making sure that the semantics of the attributes is clear in the schema
2. Reducing the redundant information in tuples
3. Reducing the NULL values in tuples
4. Disallowing the possibility of generating spurious tuples
1. Semantics of the Relation Attributes
Semantics specifies how to interpret the attribute values stored in a tuple of the
relation; in other words, how the attribute values in a tuple relate to one another.

• Whenever we group attributes to form a relation, we assume a certain meaning associated with those attributes.
• This meaning is called the “SEMANTICS” of the relation.
• It tells us how to interpret the stored values.
• The easier the semantics are to explain, the better the relation schema.
• The DNUMBER attribute is a foreign key that represents an implicit
relationship between EMPLOYEE and DEPARTMENT.

• The ease with which the meaning of a relation's attributes can be explained
is an informal measure of how well the relation is designed.

• The schema DEPT_LOCATIONS represents a multi-valued attribute of DEPARTMENT, whereas WORKS_ON represents an M:N relationship between EMPLOYEE and PROJECT.

• Hence, all of these relation schemas are easy to explain and are therefore good from the standpoint of having clear semantics.
• We can thus formulate the following informal design guideline.
GUIDELINE 1
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types
into a single relation.
Although there is nothing logically wrong with these two relations, they are considered
poor designs because they violate Guideline 1 by mixing attributes from
distinct real-world entities.
2. Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space used by the base relations.

• Mixing attributes of multiple entities in one relation may cause problems.
• Information is stored redundantly, which wastes storage.
• Another serious problem with using such relations as base relations is the problem of update anomalies.
• These can be classified into insertion anomalies, deletion anomalies, and modification anomalies.

Insertion Anomalies:
An insertion anomaly occurs when certain attributes cannot be inserted into the
database without the presence of other attributes.

Insertion anomalies can be differentiated into two types, illustrated by the following examples based on the EMP_DEPT relation.

1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls.
2. It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation.

As another example, we cannot add a new course unless at least one student is enrolled in the course.
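
The two kinds of insertion anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The column names (ssn, ename, dnumber, dname, dmgr_ssn) and the sample values are assumptions made for illustration; the notes only name the EMP_DEPT relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emp_dept (
        ssn      TEXT PRIMARY KEY NOT NULL,  -- employee key (assumed column name)
        ename    TEXT NOT NULL,
        dnumber  INTEGER,                    -- department attributes mixed in
        dname    TEXT,
        dmgr_ssn TEXT
    )
    """
)

# Type 1: a new employee without a department must carry NULLs
# in every department attribute.
conn.execute(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
    ("111", "Smith", None, None, None),
)

# Type 2: a department with no employees has no ssn to use as the key,
# so the insert is rejected.
try:
    conn.execute(
        "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
        (None, None, 6, "Marketing", "999"),
    )
    department_inserted = True
except sqlite3.IntegrityError:
    department_inserted = False

print(department_inserted)
```

With separate EMPLOYEE and DEPARTMENT relations, the department row could be inserted on its own.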
Deletion Anomalies:
A deletion anomaly exists when certain attributes are lost because of the
deletion of other attributes.

• If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.

Consider what happens if student S30 is the last student to leave the course: all information about the course is lost.
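
A minimal sketch of the deletion anomaly, again with sqlite3. The rows and column names are made-up sample data, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_dept (ssn TEXT PRIMARY KEY, ename TEXT, dnumber INTEGER, dname TEXT)"
)
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [
        ("111", "Smith", 5, "Research"),
        ("222", "Wong", 5, "Research"),
        ("333", "Borg", 1, "Headquarters"),  # only employee of department 1
    ],
)


def department_names(connection):
    """Return the department names that can still be found in emp_dept."""
    rows = connection.execute("SELECT DISTINCT dname FROM emp_dept ORDER BY dname")
    return [name for (name,) in rows]


before = department_names(conn)

# Deleting the last employee of department 1 also deletes the department.
conn.execute("DELETE FROM emp_dept WHERE ssn = ?", ("333",))
after = department_names(conn)

print(before, after)
```

After the delete, nothing in the database records that a Headquarters department ever existed.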
Modification Anomalies:

A modification (update) anomaly exists when one or more, but not all, instances of duplicated data are updated.

• In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.

Consider Jones changing address: every instance of Jones's address must be updated.
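
The inconsistency can be made visible with a short sqlite3 sketch. The sample rows and the dmgr_ssn column name are assumptions for illustration; the query at the end is one possible way to detect departments whose manager is recorded inconsistently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_dept (ssn TEXT PRIMARY KEY, ename TEXT, dnumber INTEGER, dmgr_ssn TEXT)"
)
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [
        ("111", "Smith", 5, "333"),
        ("222", "Wong", 5, "333"),
        ("444", "Zelaya", 4, "987"),
    ],
)

# Change the manager of department 5, but only in one employee's tuple.
conn.execute("UPDATE emp_dept SET dmgr_ssn = ? WHERE ssn = ?", ("888", "111"))

# Departments that now list more than one manager are inconsistent.
inconsistent = [
    dnumber
    for (dnumber,) in conn.execute(
        "SELECT dnumber FROM emp_dept "
        "GROUP BY dnumber HAVING COUNT(DISTINCT dmgr_ssn) > 1"
    )
]

print(inconsistent)
```

If the manager were stored once in a DEPARTMENT relation, a single-row update would be enough and this check could never fail.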
Based on the preceding three anomalies, we can state the guideline that
follows:

GUIDELINE 2

• Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations.

• If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
3. Null Values in Tuples
• In some schema designs, we may group many attributes together into a "fat" relation (a single relation with a large number of attributes, not all of which are fully functionally dependent on the key).
• For example, a STUDENT relation may store multiple phone numbers per student as phno1, phno2 and phno3. Only a few students have more than two phone numbers, so the remaining students hold a blank or NULL in those attributes, which we should try to avoid.
• Another example: a department may have multiple locations, but not every department has more than one, so the extra location attributes are filled with NULL in the remaining tuples.
• If many of the attributes do not apply to all tuples in the relation, we end up with many nulls in those tuples.
• For example, if a relation has an apartment number attribute and you do not live in an apartment, that attribute will be NULL because it does not apply to you.
GUIDELINE 3

• As far as possible, avoid placing attributes in a base relation whose values may frequently be null.
• If nulls are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
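
As a rough illustration of the phone-number example, the sketch below counts the NULLs in a wide STUDENT relation and then moves the phone numbers into a separate relation with one tuple per number. The student data and the names students_wide and student_phones are hypothetical.

```python
# Wide design: one column per possible phone number (None stands for NULL).
students_wide = [
    {"sid": 1, "name": "Asha", "phno1": "555-0101", "phno2": None, "phno3": None},
    {"sid": 2, "name": "Ravi", "phno1": "555-0102", "phno2": "555-0103", "phno3": None},
    {"sid": 3, "name": "Mina", "phno1": None, "phno2": None, "phno3": None},
]
phone_columns = ("phno1", "phno2", "phno3")

# Count how many phone cells are NULL in the wide design.
null_count = sum(
    row[column] is None for row in students_wide for column in phone_columns
)

# Alternative design: STUDENT(sid, name) plus STUDENT_PHONE(sid, phone),
# where a tuple exists only for a phone number that actually exists.
students = [{"sid": row["sid"], "name": row["name"]} for row in students_wide]
student_phones = [
    (row["sid"], row[column])
    for row in students_wide
    for column in phone_columns
    if row[column] is not None
]

print(null_count, student_phones)
```

The second design stores no NULLs at all and also lifts the limit of three phone numbers per student.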
4. Generation of Spurious Tuples
• A spurious tuple is a record that appears in a join result when two tables are joined badly, even though it corresponds to no valid fact.

• Spurious tuples are generated without any warning, so the design should rule them out.

• A relation should be decomposed so that the resulting relations can be joined back on a primary key or foreign key.

• Splitting a relation on a non-key attribute results in spurious tuples, that is, incorrect information, when the parts are joined back.

For example, consider the two relation schemas
EMP_LOCS(ename, plocation)
EMP_PROJ1(eno, pnumber, hours, pname, plocation)
• If we attempt a natural join on these two relations, the result contains many more tuples than the original relation EMP_PROJ from which they were decomposed.
• The additional tuples that were not in EMP_PROJ are called spurious tuples because they represent wrong information that is not valid.
Using EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is therefore not a good schema design.

This happens because the plocation attribute, which is the only attribute the two relations share and is therefore used for the join, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
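
The effect can be checked on a tiny instance. The sketch below assumes an original relation EMP_PROJ(eno, pnumber, hours, ename, pname, plocation) with four made-up tuples, projects it onto the two schemas above, and joins the projections back on plocation.

```python
# Assumed original relation: (eno, pnumber, hours, ename, pname, plocation)
emp_proj = {
    ("E1", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("E1", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("E2", 2, 20.0, "English", "ProductY", "Sugarland"),
    ("E3", 3, 40.0, "Wong", "ProductZ", "Houston"),
}

# Projections corresponding to EMP_LOCS and EMP_PROJ1.
emp_locs = {(ename, plocation) for (_, _, _, ename, _, plocation) in emp_proj}
emp_proj1 = {
    (eno, pnumber, hours, pname, plocation)
    for (eno, pnumber, hours, _, pname, plocation) in emp_proj
}

# Natural join: plocation is the only attribute the two relations share.
joined = {
    (eno, pnumber, hours, ename, pname, plocation)
    for (eno, pnumber, hours, pname, plocation) in emp_proj1
    for (ename, location) in emp_locs
    if location == plocation
}

# Tuples in the join result that were never in the original relation.
spurious = joined - emp_proj

print(len(emp_proj), len(joined), len(spurious))
for row in sorted(spurious):
    print(row)
```

Two employees work at Sugarland, so each Sugarland project tuple pairs with both names, which yields tuples such as English working 7.5 hours as employee E1. Every original tuple is still present in the join; the problem is that the extra ones cannot be told apart from it.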
GUIDELINE 4

• Design relation schemas so that they can be joined with equality conditions
on attributes that are either primary keys or foreign keys in a way that
guarantees that no spurious tuples are generated.

• Avoid relations that contain matching attributes that are not (foreign key,
primary key) combinations, because joining on such attributes may
produce spurious tuples.
Functional Dependencies
Normal Forms Based on Primary Keys

Normalization of Relations

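The normalization process, as first proposed by Codd (1972), takes a relation schema through a series of tests to check whether it satisfies a certain normal form. Codd initially proposed three normal forms, which he called first, second, and third normal form. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd.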

All these normal forms are based on functional dependencies among the attributes
of a relation.
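
A functional dependency X → Y holds in a relation when any two tuples that agree on the attributes X also agree on the attributes Y. The helper below is a minimal sketch that tests this on one relation instance; the function name fd_holds and the sample EMP_DEPT rows are assumptions for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attribute] for attribute in lhs)
        value = tuple(row[attribute] for attribute in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later row against it.
        if seen.setdefault(key, value) != value:
            return False
    return True


emp_dept = [
    {"ssn": "111", "ename": "Smith", "dnumber": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Wong", "dnumber": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Borg", "dnumber": 1, "dname": "Headquarters"},
]

print(fd_holds(emp_dept, ["dnumber"], ["dname"]))  # dnumber -> dname
print(fd_holds(emp_dept, ["dnumber"], ["ename"]))  # dnumber -> ename
```

A check like this can only refute a dependency. Whether an FD holds for the schema is a statement about the meaning of the attributes, not about one particular set of rows.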

Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed,
based on the concepts of multivalued dependencies and join dependencies,
respectively.
• Normalization of data can be considered a process of analyzing the given relation schemas based on their functional dependencies (FDs) and primary keys to achieve the desirable properties of
(1) minimizing redundancy and
(2) minimizing the insertion, deletion, and update anomalies.

• It can be considered a “filtering” or “purification” process that gives the design successively better quality.
