
4th Module DBMS Notes

Normalization is a process for organizing data in tables to minimize redundancy and dependency. It involves decomposing tables to eliminate anomalies like insertion, update, and deletion anomalies. It is done by applying normal forms like 1NF, 2NF, 3NF and BCNF to tables. The goals of normalization are to eliminate redundant data, ensure logical data dependencies, and avoid data anomalies. It results in a cleaner and more flexible database structure.

Uploaded by

Arun Godavarthi
Copyright © All Rights Reserved

NORMALIZATION

Database tables and Normalization:


 The table is the basic building block in the database design process.
 Normalization is a process for evaluating and correcting table structures to minimize data
redundancies, thereby reducing the likelihood of data anomalies.
 A systematic approach of decomposing tables to eliminate data redundancy (repetition)
and undesirable characteristics like INSERTION, UPDATE and DELETION anomalies.
 A multi-step process that puts data into tabular form, removing duplicated data from the
relation tables.
 Normalization is used mainly for two purposes:
1. Eliminating redundant (useless) data.
2. Ensuring data dependencies make sense i.e., data is logically stored.
 Problems without Normalization:
1. If a table is not properly normalized and has data redundancy, it will not only eat up extra memory space but will also make it difficult to handle and update the database without risking data loss.
2. Insertion, update and deletion anomalies are very frequent.
 Normalization works through a series of stages called normal forms.
 The normal forms are based on FDs:
1. First normal form (1NF).
2. Second normal form (2NF).
3. Third normal form (3NF).
4. Boyce-Codd normal form (BCNF).
 A successful design must also consider end-user demand for fast performance.

 Denormalization produces a lower normal form; for example, a 3NF table may be converted to a 2NF table through denormalization.
 However, the price we pay for increased performance through denormalization is
greater data redundancy.

Need for normalization:

 Minimizes data redundancy (duplicate data).


 Minimizes null values.
 Results in a more compact database (due to less data redundancy/null values).
 Minimizes/avoids data modification issues.
 Simplifies queries.
 The database structure is cleaner and easier to understand. You can learn a lot about a
relational database just by looking at its schema.
 You can extend the database without necessarily impacting the existing data.
 Searching, sorting, and creating indexes can be faster, since tables are narrower, and
more rows fit on a data page.

Module-4 DBMS Notes 5th sem C section



Data should be consistent throughout the database, i.e., it should not suffer from the following anomalies.
UPDATE ANOMALIES:
If one copy of such repeated data is updated, an inconsistency is created unless all
copies are similarly updated.
INSERTION ANOMALIES:
It may not be possible to store certain information unless some other, unrelated,
information is stored as well.
DELETION ANOMALIES:
It may not be possible to delete certain information without losing some other,
unrelated, information as well.

Consider a relation obtained by translating a variant of the Hourly_Emps entity set from Chapter 2:

Hourly_Emps(ssn, name, lot, rating, hourly_wages, hoursworked)

The key for Hourly_Emps is ssn. In addition, suppose that the hourly_wages attribute is determined by the rating attribute. That is, for a given rating value, there is only one permissible hourly_wages value. This IC (integrity constraint) is an example of a functional dependency. It leads to possible redundancy in the relation Hourly_Emps, as illustrated in Figure 19.1.

• Redundant Storage: The rating value 8 corresponds to the hourly wage 10, and this association is repeated three times.
• Update Anomalies: The hourly_wages in the first tuple could be updated without making a similar change in the second tuple.
• Insertion Anomalies: We cannot insert a tuple for an employee unless we know the hourly wage for the employee's rating value.
• Deletion Anomalies: If we delete all tuples with a given rating value (e.g., we delete the tuples for Smethurst and Guldu), we lose the association between that rating value and its hourly_wages value.


The normalization process: We use normalization to produce a set of normalized tables to store the data that will be used to generate the required information.
The objective of normalization is to ensure that each table conforms to the concept of well-
formed relations, that is, tables that have the following characteristics:
 Each table represents a single subject. For example, a course table will contain only
data that directly pertains to courses. Similarly, a student table will contain only
student data.
 No data item will be unnecessarily stored in more than one table (in short, tables have
minimum controlled redundancy). The reason for this requirement is to ensure that the
data are updated in only one place.
 All nonprime attributes in a table are dependent on the primary key—the entire
primary key and nothing but the primary key. The reason for this requirement is to
ensure that the data are uniquely identifiable by a primary key value.

EID  ENAME       DEPARTMENT  CONTACT
1    Aravind     CSE         9854734597
2    Karthikeya  CSE         9873216540
3    Suresh      ECE         7893216501
4    Kiran       ECE         8794562103
5    Sai Charan  EEE         9146721307
 Each table is void of insertion, update, or deletion anomalies. This is to ensure the
integrity and consistency of the data.

To accomplish the objective, the normalization process takes you through the steps that lead to successively higher normal forms. The most common normal forms and their basic characteristics are listed in the table below.

NORMAL FORM                      CHARACTERISTICS
First normal form (1NF)          Every field contains only atomic values, i.e., no lists or sets
Second normal form (2NF)         1NF and no partial dependencies
Third normal form (3NF)          2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)    Every determinant is a candidate key
Fourth normal form (4NF)         3NF and no independent multivalued dependencies

Improving the design:


 Check that the primary key chosen for each table/relation is correct (otherwise key violations occur), and also check the functional dependencies.
 Adhere to the naming conventions. Be precise with the names given to the attributes.
 It is generally a good practice to have atomic attributes.


 The designer must take care to place the right attributes in the right tables by using
normalization principles.
 Both current and historical data have to be maintained properly and correctly.

THE NEED FOR NORMALIZATION

Normalization is the process of removing redundant data from your tables in order to
improve storage efficiency, data integrity and scalability. This improvement is balanced against
an increase in complexity and potential performance losses from the joining of the normalized
tables at query-time.
There are two goals of the normalization process:
 Eliminating redundant data (for example, storing the same data in more than one table)
 Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored.
Normalization is the aim of a well-designed Relational Database Management System (RDBMS). It is a step-by-step set of rules by which data is put into its simplest form.
We normalize the relational database because of the following reasons:
1. Minimize data redundancy, i.e., no unnecessary duplication of data.
2. Make the database structure flexible, i.e., it should be possible to add new data values and rows without reorganizing the database structure.
Data should be consistent throughout the database i.e. it should not suffer from following
anomalies.
 Insert Anomaly - Occurs when data cannot be inserted because some other data is missing; all the data needed for an insertion must be available so that null values in keys are avoided. This kind of anomaly can seriously damage a database.
 Update Anomaly - It is due to data redundancy, i.e., multiple occurrences of the same values in a column. Updating only some of the copies leads to inconsistency, and updating all of them is inefficient.
 Deletion Anomaly - It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.
Complex queries required by the user should be easy to handle.
When normalization decomposes a relation into smaller relations with fewer attributes, joining the resulting relations must give back exactly the original relation, without any extra rows, and the joins can be performed in any order. This is known as lossless join decomposition.
The resulting relations (tables) obtained on normalization should have properties such as: each row is identified by a unique key, there are no repeating groups, columns are homogeneous, and each column is assigned a unique name.
Functional Dependency


 Functional dependency is a relationship that exists when one attribute uniquely determines
another attribute.
 If R is a relation with attributes X and Y, a functional dependency between the attributes is
represented as X->Y, which specifies Y is functionally dependent on X. Here X is a determinant
set and Y is a dependent attribute. Each value of X is associated with precisely one Y value.
 Functional dependency in a database serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and contributes to normalization.
 For example:
If X->Y and t1.X = t2.X, then t1.Y = t2.Y, where t1 and t2 are tuples.
The table given below shows an instance where AB->C.
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1
 Full Functional Dependency: In a relation, Y is fully functionally dependent on X when X->Y holds and Y is not functionally dependent on any proper subset of X. In the instance above, AB->C is full, since neither A->C nor B->C holds.

 Partial Functional Dependency: In a relation, there exists a partial dependency when a non-prime attribute (an attribute that is not part of any candidate key) is functionally dependent on a proper subset of a candidate key.

 For example: Let there be a relation R(Course, Sid, Sname, fid, schedule, room, marks).
 Full Functional Dependencies: {Course, Sid} -> Sname, {Course, Sid} -> Marks, etc.
 Partial Functional Dependencies: Course -> Schedule, Course -> Room
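The tuple condition above (t1.X = t2.X implies t1.Y = t2.Y) can be checked mechanically. The Python sketch below is illustrative only: `fd_holds` and `is_full_fd` are hypothetical helper names, rows are plain dictionaries, and the data is the A, B, C, D instance shown earlier. Keep in mind that a single table instance can only show that an FD is violated; an FD holding in one instance does not prove it holds for the schema.

```python
from itertools import combinations

# The A, B, C, D instance from the notes; each row maps attribute -> value.
ROWS = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in this instance: rows that agree
    on every lhs attribute also agree on every rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True

print(fd_holds(ROWS, ["A", "B"], ["C"]))    # True
print(fd_holds(ROWS, ["A"], ["C"]))         # False
print(is_full_fd(ROWS, ["A", "B"], ["C"]))  # True
```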
Closure set of Attributes:
The set of all those attributes which can be functionally determined from an attribute set is
called as a closure of that attribute set.
Closure of attribute set {X} is denoted as {X}+.
Inference rules:
These are the rules to find additional functional dependencies:
1. Reflexive property: If X⊇Y (X is a superset of Y), then X->Y.

Ex: AB->A as A is a subset of AB.


2. Transitive property: If A->B and B->C, then A->C.

3. Augmentation property: If A->B, then AC->BC for any C.


4. Union: If A->B and A->C, then A->BC.

5. Decomposition: If A->BC, then A->B and A->C.

6. Composition: If A->B and C->D, then AC->BD.

To find Closure of attributes:


Example: Consider a relation R(A,B,C) with the functional dependencies:
AB->C
C->A
Here B+={B}
AB+={ABC}
CB+={CBA}
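The closure can be computed by repeatedly applying the FDs until nothing new is added. A minimal Python sketch, assuming single-letter attribute names and FDs written as (lhs, rhs) string pairs; `closure` is a hypothetical helper, not part of any library:

```python
def closure(attrs, fds):
    """Compute attrs+ under fds, a list of (lhs, rhs) pairs such as ("AB", "C")."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [("AB", "C"), ("C", "A")]
print(sorted(closure("B", FDS)))   # ['B']
print(sorted(closure("AB", FDS)))  # ['A', 'B', 'C']
print(sorted(closure("CB", FDS)))  # ['A', 'B', 'C']
```

The loop stops once a full pass over the FDs adds nothing, which matches the hand computation of B+, AB+ and CB+ above.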
Candidate Key:
Candidate key is an attribute, or a set of attributes that determine all other columns in the table.
Candidate key should not have any redundant attributes.
To find a candidate key
Example:
Consider a relation R={A,B,C,D,E,F,G} with functional dependencies:-
AB->CD
AF->D
DE->F
C->G
F->E
G->A
An attribute that does not appear on the right-hand side of any FD cannot be determined by any other attributes, so it must be part of every candidate key.
In the above example, B never appears on a right-hand side, so B is part of every candidate key.
We start by finding the closure of B.
B+= {B}
As B cannot determine all the attributes, we extend it with one more attribute at a time.
AB+= {ABCDG}
CB+= {CBGAD}


DB+= {DB}
EB+= {EB}
FB+= {FBE}
GB+= {GBACD}
As AB, CB, DB, EB, FB and GB cannot determine all the attributes, we extend each of them with one more attribute.
ACB+= {ACBDG}
CFB+= {CFBGEAD}
CGB+= {CGBAD}
CDB+= {CDBGA}
CEB+= {CEBGADF}
AFB+= {AFBCDEG}
AGB+= {AGBCD}
AEB+= {AEBCDFG}
ADB+= {ADBCG}
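Out of these, AFB, AEB, CFB and CEB are candidate keys: each determines all the attributes, and no proper subset of them does. Extending GB in the same way gives GEB+ = {GEBACDF} and GFB+ = {GFBAECD}, so GEB and GFB are candidate keys as well.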
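The search above can be automated. The sketch below is a brute-force illustration (hypothetical helpers `closure` and `candidate_keys`, single-letter attributes): it starts from the attributes that never appear on a right-hand side, adds the other attributes in sets of increasing size, and keeps a set when its closure covers R, skipping supersets of keys already found. It is exponential in the number of attributes, so it only suits small textbook examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    attributes = set(attributes)
    # Attributes never on a right-hand side must belong to every key.
    on_rhs = set().union(*(set(rhs) for _, rhs in fds))
    core = attributes - on_rhs
    others = sorted(attributes - core)
    keys = []
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

FDS = [("AB", "CD"), ("AF", "D"), ("DE", "F"), ("C", "G"), ("F", "E"), ("G", "A")]
for key in candidate_keys("ABCDEFG", FDS):
    print("".join(sorted(key)))  # ABE, ABF, BCE, BCF, BEG, BFG
```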
Refer to first Normal form before 2nd normal form
2NF
 To Avoid Partial Dependencies
 A relation schema R is in 2NF if every non-prime attribute is fully functionally dependent on every candidate key, i.e., no non-prime attribute depends on a proper subset of any candidate key.
 A functional dependency X->Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds.
 An attribute that is not part of any candidate key is known as non-prime
attribute.
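RULES FOR 2NF
 Table should be in 1NF.
 No partial dependency: no non-prime attribute depends on a proper subset of a candidate key (this is automatically true when every candidate key is a single column).
 All non-key attributes are fully functionally dependent on the primary key.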

Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table that looks like this:

Teacher_id  Subject    Teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate Keys: {Teacher_id, subject}


Non prime attribute: Teacher_age
 The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute Teacher_age depends on Teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF we can break it into two tables like this:
Teacher_details table:

Teacher_id Teacher_age
111 38
222 38
333 40

teacher_subject table:

Teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
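Partial dependencies can be found by taking the closure of every proper subset of a candidate key and looking for non-prime attributes in it. A Python sketch, assuming the single FD Teacher_id -> Teacher_age from the example; `closure` and `partial_dependencies` are hypothetical helpers, and attribute names are full strings here:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; FDs are (lhs, rhs) tuples of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, fds, non_prime):
    """Return (subset, attribute) pairs where a non-prime attribute
    is determined by a proper, non-empty subset of the given key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & set(non_prime)):
                found.append((subset, attr))
    return found

FDS = [(("Teacher_id",), ("Teacher_age",))]
KEY = ("Teacher_id", "Subject")
print(partial_dependencies(KEY, FDS, {"Teacher_age"}))
# [(('Teacher_id',), 'Teacher_age')]
```

Run the check once per candidate key. After the decomposition, Teacher_details has the single-column key Teacher_id, which has no proper non-empty subset, so no partial dependency can remain.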

3NF
 Table must be in 2NF form
 Transitive functional dependency of non-prime attribute on any super key should be
removed
 When an indirect relationship causes functional dependency it is called Transitive
Dependency.

 If P -> Q and Q -> R is true, then P-> R is a transitive property.

 A transitive dependency is a functional dependency that is formed indirectly by two functional dependencies.

OR

 A non-key attribute determining another non-key attribute is known as a transitive dependency.

 We have to eliminate these Transitive Dependencies in 3NF

RULES
 A table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X->Y, at least one of the following conditions holds:
 X is a super key of the table
 Y is a prime attribute of the table

Example: Suppose a company wants to store the complete address of each employee, they
create a table named employee_details that looks like this:

Emp_id  Emp_name  Emp_PIN  Emp_State  Emp_city   Emp_dist
1001    A         7985     UP         Noida      Noida
1002    B         4567     TN         Chennai    M-city
1003    C         1254     AP         Vizag      Vizag
1004    D         78914    Karnataka  Bengaluru  Bengaluru
1005    E         32174    Kerala     Punalur    Kollam

Candidate Keys: {emp_id}


Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any
candidate keys.
 Here, Emp_state, Emp_city and Emp_dist are dependent on Emp_PIN, and Emp_PIN is dependent on Emp_id. That makes the non-prime attributes (Emp_state, Emp_city and Emp_dist) transitively dependent on the super key (Emp_id). This violates the rule of 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the transitive dependency:
the transitive dependency:
employee table:

Emp_id  Emp_name  Emp_PIN
1001    A         7985
1002    B         4567
1003    C         1254
1004    D         78914
1005    E         32174

employee_PIN table:

Emp_PIN  Emp_state  Emp_city   Emp_dist
7985     UP         Noida      Noida
4567     TN         Chennai    M-city
1254     AP         Vizag      Vizag
78914    Karnataka  Bengaluru  Bengaluru
32174    Kerala     Punalur    Kollam
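The 3NF rule (X is a super key or Y is a prime attribute) can be tested FD by FD. The sketch below only checks the FDs it is given, not every FD implied by them, and assumes the FDs Emp_id -> {Emp_name, Emp_PIN} and Emp_PIN -> {Emp_state, Emp_city, Emp_dist} from the example; `closure` and `violates_3nf` are hypothetical helpers:

```python
def closure(attrs, fds):
    """Attribute closure; FDs are (lhs, rhs) tuples of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_3nf(fds, attributes, prime):
    """Return (lhs, attr) pairs where lhs is not a super key
    and attr is a non-prime attribute on the right-hand side."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in rhs:
            if attr in lhs:
                continue  # trivial part of the FD
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad

ATTRS = ["Emp_id", "Emp_name", "Emp_PIN", "Emp_state", "Emp_city", "Emp_dist"]
FDS = [
    (("Emp_id",), ("Emp_name", "Emp_PIN")),
    (("Emp_PIN",), ("Emp_state", "Emp_city", "Emp_dist")),
]
for lhs, attr in violates_3nf(FDS, ATTRS, prime={"Emp_id"}):
    print(lhs, "->", attr)  # ('Emp_PIN',) -> Emp_state, Emp_city, Emp_dist
```

After the split, the employee_PIN table has Emp_PIN as its key, so the same FD no longer violates 3NF there.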

BCNF
 A relation schema R is in BCNF if, whenever a non-trivial functional dependency X->A holds in R, X is a super key of R, i.e., the determinant of every non-trivial functional dependency must be a super key.
 A table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X->Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

Emp_id  Emp_nationality  Emp_dept    Dept_type  Dept_no.emp
1001    Indian           Production  D001       200
1001    Indian           Stores      D001       150
1002    American         Designing   D134       100
1002    American         Purchase    D134       50

Functional dependencies in the table above:


emp_id->emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys.

To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:

Emp_id  Emp_nationality
1001    Indian
1002    American
emp_dept table:

Emp_dept    Dept_type  Dept_no.emp
Production  D001       200
Stores      D001       150
Designing   D134       100
Purchase    D134       50

dept_mapping table:

Emp_id Emp_dept
1001 Production
1001 Stores
1002 Designing
1002 Purchase

Functional dependencies:
emp_id->emp_nationality
emp_dept -> {dept_type, dept_no.emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because, in both functional dependencies, the determinant (left-hand side) is a key.
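The BCNF condition is stricter than 3NF: the prime-attribute exception is dropped, so every non-trivial determinant must be a super key. A minimal sketch, assuming the FDs listed above, writing the projected FDs of each decomposed table by hand, and using `dept_no_of_emp` for the employee-count column; `closure` and `bcnf_violations` are hypothetical helpers:

```python
def closure(attrs, fds):
    """Attribute closure; FDs are (lhs, rhs) tuples of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the determinants of non-trivial FDs that are not super keys."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= set(attributes):
            bad.append(lhs)
    return bad

ORIGINAL = (
    ["emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"],
    [(("emp_id",), ("emp_nationality",)),
     (("emp_dept",), ("dept_type", "dept_no_of_emp"))],
)
DECOMPOSED = {
    "emp_nationality": (["emp_id", "emp_nationality"],
                        [(("emp_id",), ("emp_nationality",))]),
    "emp_dept": (["emp_dept", "dept_type", "dept_no_of_emp"],
                 [(("emp_dept",), ("dept_type", "dept_no_of_emp"))]),
    "dept_mapping": (["emp_id", "emp_dept"], []),
}
print(bcnf_violations(*ORIGINAL))  # [('emp_id',), ('emp_dept',)]
for name, (attrs, fds) in DECOMPOSED.items():
    print(name, bcnf_violations(attrs, fds))  # [] for every table
```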
4th Normal Form
Rules for 4th Normal Form
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.

2. And, the table should not have any Multi-valued Dependency.

What is Multi-valued Dependency?


A table is said to have multi-valued dependency, if the following conditions are true,
1. For a dependency A → B, if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.

3. And, for a relation R(A, B,C), if there is a multi-valued dependency between, A and B, A
and C then B and C should be independent of each other.

Example 2: Consider a table with Subject, the Lecturer who teaches each subject, and the recommended Books for each subject.
If we observe the data in the table above, it satisfies 3NF.


But LECTURER and BOOKS are two independent entities here.


The attributes books and lecturers have multiple values for each subject, and there is no relationship between Lecturer and Books.
-> In the above example, either Alex or Bosco can teach Mathematics. For the Mathematics subject, a student can refer to either 'Maths Book1' or 'Maths Book2', i.e.:

SUBJECT ->-> LECTURER
SUBJECT ->-> BOOKS
->After decomposition of table into 4th normal form:

The relation R (subject, lecturer, books) is decomposed into 2 relations:


1. R(subject, lecturer)
2. R(subject, books), and there is a common attribute (subject) between the two relations.
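The multi-valued dependency SUBJECT ->-> LECTURER can be checked on a table instance: whenever two rows share a subject, the row that combines the first row's lecturer with the second row's book must also exist. The original table is not reproduced in these notes, so the rows below are a hypothetical instance built from the Mathematics example (Alex, Bosco, 'Maths Book1', 'Maths Book2'); `mvd_holds` and `decompose_and_join` are hypothetical helpers.

```python
from itertools import product

# Hypothetical instance (subject, lecturer, book) built from the example.
ROWS = {
    ("Mathematics", "Alex", "Maths Book1"),
    ("Mathematics", "Alex", "Maths Book2"),
    ("Mathematics", "Bosco", "Maths Book1"),
    ("Mathematics", "Bosco", "Maths Book2"),
}

def mvd_holds(rows):
    """Check SUBJECT ->-> LECTURER: for rows t1, t2 with the same subject,
    the row (subject, t1's lecturer, t2's book) must also be present."""
    for (s1, l1, _b1), (s2, _l2, b2) in product(rows, repeat=2):
        if s1 == s2 and (s1, l1, b2) not in rows:
            return False
    return True

def decompose_and_join(rows):
    """Project onto (subject, lecturer) and (subject, book), then natural join."""
    subject_lecturer = {(s, l) for s, l, _ in rows}
    subject_book = {(s, b) for s, _, b in rows}
    return {(s, l, b) for s, l in subject_lecturer
            for s2, b in subject_book if s == s2}

print(mvd_holds(ROWS))                   # True
print(decompose_and_join(ROWS) == ROWS)  # True
```

Because the MVD holds, the 4NF decomposition into (subject, lecturer) and (subject, books) joins back to exactly the original rows.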

EQUIVALENCE
Two sets of functional dependencies F and G are said to be equivalent if F covers G and G covers F.
Example 1:
F:{A->C,AC->D,E->AD,E->H}
G:{A->CD,E->AH}
In F:
A+= {ACD}
AC+= {ACD}
E+= {EAHCD}
F covers G.
In G:
A+= {ACD}


E+= {EADHC}
G covers F.
Hence F and G are equivalent.
Example 2:
F:{A->B,B->C,C->D}
G:{A->BC,C->D}
In F:
A+= {ABCD}
C+= {CD}
F covers G.
In G:
A+= {ABCD}
B+= {B} (B->C cannot be derived from G)
G does not cover F.
Hence F and G are not equivalent.
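The covering test used above (compute X+ under one set for the left side X of each FD in the other set) is easy to script. A sketch assuming single-letter attributes and the hypothetical helpers `closure`, `covers` and `equivalent`, applied to both examples:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """F covers G if, for every X -> Y in G, Y is contained in X+ computed under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

F1 = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G1 = [("A", "CD"), ("E", "AH")]
F2 = [("A", "B"), ("B", "C"), ("C", "D")]
G2 = [("A", "BC"), ("C", "D")]

print(equivalent(F1, G1))              # True
print(covers(F2, G2), covers(G2, F2))  # True False
print(equivalent(F2, G2))              # False
```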

MINIMAL COVER:
We say that a set of functional dependencies F covers another set of functional dependencies G if every functional dependency in G can be inferred from F. More formally, F covers G if G+ ⊆ F+. F is a minimal cover of G if F is a smallest set of functional dependencies equivalent to G. We won’t prove it, but every set of functional dependencies has a minimal cover. Also, note that there may be more than one minimal cover.
STEPS TO FIND MINIMAL SET:
1. Put the FDs in a standard form: split the FDs using the decomposition rule so that each RHS contains a single attribute.
2. Find the redundant FDs and delete them from the set.
3. Find the redundant attributes on the LHS and delete them.
EXAMPLE1
Let us apply these properties to F={A->B, C->B, D->ABC, AC->D}
STEP1: Right hand side (R.H.S) there should be single attribute so decompose D->ABC as
follows;
D->A, D->B, D->C;


F={A->B, C->B, D->A, D->B, D->C, AC->D}


STEP 2: Eliminate redundant FD's
(A)+ ={A} (on removing A->B)
(C)+ ={C} (on removing C->B)
(D)+={BCD} (on removing D->A)
(D)+={ACDB} (on removing D->B)
The FD D->B is redundant, i.e., even if we remove the FD D->B, we can still derive D->B from the other functional dependencies.
(D)+={ABD} (on removing D->C)
(AC)+={ACB} (on removing AC->D)
Hence, we can delete D->B functional dependency.
The FD’s obtained from the 2nd step is:
F={A->B, C->B, D->A, D->C, AC->D}
STEP 3:
Delete the redundant attributes on the left-hand side (LHS).
For AC->D, we need to find the closures of A and C:
If A+ contains C, then remove C.
If C+ contains A, then remove A.
(A)+={AB}, (C)+={CB}. Here, the closure of A doesn’t contain C, so we cannot remove the attribute C from AC->D. And the closure of C doesn’t contain A, so we cannot remove the attribute A from AC->D.
The minimal functional dependencies obtained are:
F={A->B, C->B, D->A, D->C, AC->D}.
EXAMPLE2
FD={A->C, AC->D, E->AD, E->H} find minimal FD set.
Step1:
Right hand side there should be single attribute for FD’s so decompose E->AD as follows;
E->A, E->D
Now FD’s obtained from step 1 are: F={A->C, AC->D, E->A, E->D, E->H}
STEP 2:


Eliminate redundant FD’s;


(A)+={A} (on removing A->C)
(AC)+={AC} (on removing AC->D)
(E)+={EDH} (on removing E->A)
(E)+={EACDH} (on removing E->D)
The FD E->D is redundant i.e. (even if we remove the FD E->D we can get the FD E->D
from other functional dependency).
(E)+={EADC} (on removing E->H)
Hence, we can delete E->D functional dependency.
The FD’s obtained from 2nd step is:
F={A->C, AC->D, E->A, E->H}
STEP3:
Delete the redundant attributes on left-hand side (LHS).
For AC->D, we need to find the closures of A and C:
If A+ contains C, then remove C.
If C+ contains A, then remove A.
(A)+={ACD}, (C)+={C}. Here, the closure of A contains C, so we can remove the attribute C from AC->D (giving A->D). And the closure of C doesn’t contain A, so we cannot remove the attribute A from AC->D.
The minimal functional dependencies obtained are:
F={A->C, A->D, E->A, E->H}
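The three steps can be scripted as follows. This is a sketch, not the only possible algorithm: `minimal_cover` is a hypothetical helper for single-letter attributes, and it removes extraneous left-hand-side attributes before removing redundant FDs (the reverse of the order used in the notes, and the order many textbooks use, because shrinking a left-hand side can make another FD redundant). For the two examples above it gives the same result as the hand computation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split every FD so that its right-hand side is a single attribute.
    cover = []
    for lhs, rhs in fds:
        for attr in rhs:
            if (lhs, attr) not in cover:
                cover.append((lhs, attr))

    # Step 2: drop a left-hand-side attribute if the right side is still
    # derivable without it (closures use the split set, which stays equivalent).
    reduced = []
    for lhs, rhs in cover:
        attrs = list(lhs)
        for attr in list(attrs):
            trial = [a for a in attrs if a != attr]
            if trial and rhs in closure(trial, cover):
                attrs = trial
        fd = ("".join(attrs), rhs)
        if fd not in reduced:
            reduced.append(fd)

    # Step 3: drop an FD if the remaining FDs already imply it.
    i = 0
    while i < len(reduced):
        lhs, rhs = reduced[i]
        rest = reduced[:i] + reduced[i + 1:]
        if rhs in closure(lhs, rest):
            reduced = rest
        else:
            i += 1
    return reduced

print(minimal_cover([("A", "B"), ("C", "B"), ("D", "ABC"), ("AC", "D")]))
# [('A', 'B'), ('C', 'B'), ('D', 'A'), ('D', 'C'), ('AC', 'D')]
print(minimal_cover([("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]))
# [('A', 'C'), ('A', 'D'), ('E', 'A'), ('E', 'H')]
```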

5th Normal Form

● A relation is in 5NF if it is in 4NF and does not contain any join dependency that is not implied by its candidate keys, so that every decomposition it allows is lossless.
● 5NF is reached when tables are broken into as many tables as needed to avoid redundancy, while every decomposition remains lossless.
● 5NF is also known as Project-join normal form (PJ/NF)

● If we decompose a table further to eliminate redundancy and anomalies, then when we re-join the decomposed tables by means of candidate keys, we should not lose any of the original data and no new records should arise. In simple words, joining two or more decomposed tables should neither lose records nor create new records.
● A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R. A relation decomposed into two relations must have the lossless join property, which ensures that no spurious or extra tuples are generated when the relations are reunited through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies following conditions:
1. R should be already in 4NF.
2. It cannot be further non-loss decomposed (join dependency).
● Dividing a table into smaller tables (for example, tables with only two columns) removes redundancy only if joining those tables back forms exactly the original table; only then do we get a lossless decomposition.
Example:

A  B  C
1  2  1
2  2  2
3  3  2
Fig (a): original table

The above table is in 4NF. To reduce redundancy we go for 5NF by decomposing the above table into two tables as follows:

A  B
1  2
2  2
3  3
Fig (b)

B  C
2  1
2  2
3  2
Fig (c)
In fig(a), A determines B and C, so A is a candidate key. We decompose the original table into fig(b) and fig(c). In fig(b) A is the candidate key, while in fig(c) B is not a key (the value 2 appears twice), yet B is the only common attribute of the two tables. If we perform a natural join on both tables we should get the original table, but in this example we find invalid tuples (unnecessary data) that are not present in the original table, which leads to a lossy decomposition. (In a lossy decomposition, performing the natural join produces invalid tuples even though none of the actual tuples are lost.)
Natural join for fig(b) and fig(c):

A  B  C
1  2  1
1  2  2
2  2  1
2  2  2
3  3  2
fig(d): natural join on (AB) and (BC)


In fig(d), the 2nd and 3rd tuples are invalid tuples. To avoid this lossy decomposition and maintain a lossless decomposition, the table should be decomposed so that the common attribute of the two tables is a candidate key (of at least one of them). So fig(a) is decomposed as:

A  B
1  2
2  2
3  3
fig(e)

A  C
1  1
2  2
3  2
fig(f)

If we perform natural join on fig(e) & fig(f) then we get the original table fig(a).
Rules to be followed for decomposing:
Let R be the table and is decomposed into R1 and R2
1. R1 ∪ R2 = R: when R1 ∪ R2 is performed we should get R, i.e., all attributes should be covered.
2. R1 ∩ R2 ≠ ∅: when R1 ∩ R2 is done it should not be empty, i.e., there should exist a common attribute.
3. R1 ∩ R2 → R1 (or) R1 ∩ R2 → R2, i.e., the common attribute obtained from the intersection should be a key of R1 or of R2.
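The three rules, together with the natural joins in fig(b) to fig(f), can be sketched in Python. Assumptions: single-letter attribute names, the FD A -> BC (A is the candidate key of fig(a), as stated above), and hypothetical helpers `closure`, `is_lossless` and `natural_join`.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    if set(r1) | set(r2) != set(r):
        return False  # rule 1: all attributes must be covered
    common = set(r1) & set(r2)
    if not common:
        return False  # rule 2: there must be a common attribute
    plus = closure(common, fds)
    return set(r1) <= plus or set(r2) <= plus  # rule 3

def natural_join(rows1, cols1, rows2, cols2):
    """Join two sets of tuples on their shared column names."""
    shared = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    result = set()
    for row1 in rows1:
        d1 = dict(zip(cols1, row1))
        for row2 in rows2:
            d2 = dict(zip(cols2, row2))
            if all(d1[c] == d2[c] for c in shared):
                merged = {**d1, **d2}
                result.add(tuple(merged[c] for c in out_cols))
    return result, out_cols

FDS = [("A", "BC")]
ORIGINAL = {(1, 2, 1), (2, 2, 2), (3, 3, 2)}  # fig(a)
AB = {(a, b) for a, b, c in ORIGINAL}         # fig(b) / fig(e)
BC = {(b, c) for a, b, c in ORIGINAL}         # fig(c)
AC = {(a, c) for a, b, c in ORIGINAL}         # fig(f)

print(is_lossless("ABC", "AB", "BC", FDS))  # False
print(is_lossless("ABC", "AB", "AC", FDS))  # True
lossy, cols = natural_join(AB, ["A", "B"], BC, ["B", "C"])
print(sorted(lossy - ORIGINAL))             # [(1, 2, 2), (2, 2, 1)]
print(natural_join(AB, ["A", "B"], AC, ["A", "C"])[0] == ORIGINAL)  # True
```

The two spurious rows printed for the (AB, BC) split are exactly the 2nd and 3rd tuples of fig(d).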

INTRODUCTION TO SCHEMA REFINEMENT


Schema refinement is an approach based on decompositions. Redundant storage of information is the root cause of the problems it addresses. Although decomposition can eliminate redundancy, it can lead to problems of its own and should be used with caution.
Problems Caused by Redundancy
Storing the same information redundantly, that is, in more than one place within a database,
can lead to several problems:
 Redundant Storage: Some information is stored repeatedly.
1) Update Anomalies: If one copy of such repeated data is updated, an inconsistency is
created unless all copies are similarly updated.
2) Insertion Anomalies: It may not be possible to store certain information unless some
other, unrelated, information is stored as well.
3) Deletion Anomalies: It may not be possible to delete certain information without
losing some other, unrelated, information as well.
CONSTRAINTS ON ENTITY SET:
Consider the hourly_employes relation,
Hourly_employes(eid, ename, rating, hourlywages, hoursworked)
The key for hourly_employes is eid. In addition, suppose that the hourlywages attribute is determined by the rating attribute. That is, for a given rating value, there is only one possible hourlywages value. This IC is an example of a functional dependency. It leads to possible redundancy in the relation.

Eid  ename  Rating  hourlywages  Hours worked
1    A      8       1000         40
2    B      8       1000         30
3    C      5       500          30
4    D      5       500          32
5    E      8       1000         40

If the same value appears in the rating column of two tuples, the IC tells us that the same
value must appear in the hourlywages column as well. This redundancy has the same
negative consequences as before:
 Redundant Storage: The rating value 8 corresponds to the hourly wage 1000, and this association is repeated three times.
 Update anomalies: The hourlywages in the first tuple could be updated without making a similar change in the second tuple.
 Insertion anomalies: We cannot insert a tuple for an employee unless we know the hourlywages for the employee's rating value.
 Deletion anomalies: If we delete all tuples with a given rating value (e.g., we delete the tuples for C and D), we lose the association between that rating value and its hourlywages value.
CONSTRAINTS ON RELATIONSHIP SET:
Suppose that we have entity sets Parts, Suppliers and Departments, as well as a relationship set Contracts that involves all of them. We refer to the schema for Contracts as CQPSD: a contract with contract id C specifies that a supplier S will supply some quantity Q of a part P to a department D.
Consider a policy that a department purchases at most one part from any given supplier. Therefore, if there are several contracts between the same supplier and department, we know that the same part must be involved in all of them. This constraint is an FD: DS -> P.
Again we have redundancy and its associated problems. We can address this situation by decomposing Contracts into two relations with attributes CQSD and SDP. Intuitively, the relation SDP records the part supplied to a department by a supplier, and the relation CQSD records additional information about a contract.
IDENTIFYING ATTRIBUTES OF ENTITIES:

The ER diagram in Figure 19.11 shows a relationship set called Works_In with an additional key constraint indicating that an employee can work in at most one department. (Observe the arrow connecting Employees to Works_In.)
Using the key constraint, we can translate this ER diagram into two relations:
Workers(ssn, name, lot, did, since)
Departments(did, dname, budget)
The entity set Employees and the relationship set Works_In are mapped to a single relation, Workers.
Suppose employees are assigned parking lots based on their department, so that all employees in a department are assigned the same lot. This constraint is not expressible in the ER diagram. It is an example of the FD did -> lot.
The redundancy in this design can be eliminated by decomposing the Workers relation into two relations:
Workers(ssn, name, did, since)
Dept_lots(did, lot)
Now the two relations Departments and Dept_lots have the same key and describe the same entity, so they can be combined:
Workers(ssn, name, did, since)
Departments(did, dname, budget, lot)

IDENTIFYING ENTITY SET:


Consider the Sailors, Reserves and Boats relations with the schemas
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)
In addition, let us add an attribute C denoting the credit card to which the reservation is charged.
Every sailor uses a unique credit card for reservations; this constraint is expressed by the FD sid -> C. We then store the credit card number of a sailor as often as we have reservations for that sailor, which leads to redundancy and update anomalies. A solution is to decompose Reserves into two relations with the attributes (sid, bid, day) and (sid, C). One holds information about reservations and the other holds information about credit cards.

[Figure: Reserves attributes Sid, Bid, Day and Creditcard, linked through sid]

Denormalization:
● Denormalization is a strategy that database managers use to increase the performance of a database infrastructure. It involves adding redundant data to a normalized database to reduce certain types of problems with queries that combine data from various tables into a single table.
● In a traditional normalized database, we store data in separate logical tables and attempt to minimize redundant data; we may strive to have only one copy of each piece of data in the database.
For example, in a normalized database we might have a course table and a teacher table. Each entry in the course table would store the teacherID for a course but not the teacherName. When we need to retrieve a list of all courses with the teacher name, we would do a join between these two tables. In some ways this is great: if a teacher changes his/her name, we only have to update the name in one place. The drawback is that if the tables are large, we may spend an unnecessarily long time doing joins.

cid  Cname  teacherID
101  CSE    201
102  ECE    202
103  EEE    203
104  MECH   204
105  CIVIL  205

Fig: course table

teacherID  tName  Address      Contact no
201        Raju   banglore     89229375
202        Rama   hyderabad    78382899
203        Ravi   dodballapur  67368299
204        Balu   devanahalli  89929902
205        Ramya  banglore     67537687

Fig: Teacher table
Denormalization, then, strikes a different compromise. Under denormalization, we decide that we are okay with some redundancy and some extra effort to update the database in order to get the efficiency advantages of fewer joins.

cid  Cname  teacherId  teacherName
101  CSE    201        raju
102  ECE    202        rama
103  EEE    203        ravi
104  MECH   204        balu
105  CIVIL  205        ramya

fig(d): course table after adding teacherName attribute

Pros of denormalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs), since we need to look at fewer tables.
Cons of denormalization:
1. Updates and inserts are more expensive.
2. Denormalization can make update and insert code harder to write.
3. Data may become inconsistent, and identifying which is the “correct” value for a piece of data is difficult.
4. Data redundancy necessitates more storage.
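The course/teacher trade-off can be reproduced with Python's built-in `sqlite3` module and an in-memory database. This is only a sketch: just the columns needed for the comparison are created, and the rename of teacher 201 to 'Raju K' is a hypothetical update used to show how the redundant copy can go stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacherID INTEGER PRIMARY KEY, tName TEXT);
    CREATE TABLE course (cid INTEGER PRIMARY KEY, cname TEXT,
                         teacherID INTEGER REFERENCES teacher(teacherID));
""")
conn.executemany("INSERT INTO teacher VALUES (?, ?)",
                 [(201, "Raju"), (202, "Rama"), (203, "Ravi"),
                  (204, "Balu"), (205, "Ramya")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [(101, "CSE", 201), (102, "ECE", 202), (103, "EEE", 203),
                  (104, "MECH", 204), (105, "CIVIL", 205)])

# Normalized design: the teacher name is stored once and fetched with a join.
normalized = conn.execute("""
    SELECT c.cid, c.cname, t.tName
    FROM course c JOIN teacher t ON c.teacherID = t.teacherID
    ORDER BY c.cid
""").fetchall()

# Denormalized design: copy tName into course so reads need no join.
conn.execute("ALTER TABLE course ADD COLUMN teacherName TEXT")
conn.execute("""
    UPDATE course SET teacherName =
        (SELECT tName FROM teacher WHERE teacher.teacherID = course.teacherID)
""")
denormalized = conn.execute(
    "SELECT cid, cname, teacherName FROM course ORDER BY cid").fetchall()
print(normalized == denormalized)  # True

# The price: a rename must now touch both tables, or the copy goes stale.
conn.execute("UPDATE teacher SET tName = 'Raju K' WHERE teacherID = 201")
stale = conn.execute("SELECT teacherName FROM course WHERE cid = 101").fetchone()[0]
print(stale)  # Raju
```

The read path gets simpler (no join), while the last two statements show the "data may become inconsistent" drawback listed above.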
