DBMS Unit-Iv

The document discusses concepts related to schema refinement and normalization in database design. It covers: 1) The steps of database design, which include requirement analysis, conceptual design, logical design, and schema refinement to remove redundancies and anomalies. 2) Problems caused by data redundancy, such as insertion, deletion, and update anomalies, along with functional dependencies and their types. 3) The process of normalization to avoid anomalies and the normal forms 1NF, 2NF, 3NF, and BCNF, with example tables for each. It emphasizes removing transitive dependencies and having super keys on the left side of functional dependencies to satisfy the higher normal forms.

Uploaded by

Shitan

UNIT-IV

Schema Refinement and Normalization
Introduction to Schema Refinement

• Schema refinement is generally used for refining or polishing tables.
• It is the last step before physical design. The design steps, in order, are:
1) Requirement analysis : user needs
2) Conceptual design : high-level description, often using E/R diagrams
3) Logical design : from E/R diagrams to tables (relational schema)
4) Schema refinement : checking tables for redundancies and anomalies
▪ Data redundancy occurs when the same piece
of data is stored in two or more separate
places.
▪ This problem arises when a database is not
normalized.
▪ Problems caused by redundancy are the insertion anomaly, the deletion anomaly, and the update anomaly.
Example: Student
In this table, student id and course together form the primary key.
• Insertion Anomaly –
If the details of a student whose course has not been decided yet have to be inserted, the insertion is not possible until a course is decided for the student, because course is part of the primary key.

• Deletion Anomaly –
If the details of the students in this table are deleted, the details of the college are deleted as well.

• Update Anomaly –
If the rank of the college changes, the change has to be made in every row that stores it, which is time-consuming and computationally costly.
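The three anomalies can be reproduced on a small in-memory table. This is a minimal sketch using Python's `sqlite3` module; the Student table itself did not survive in these notes, so the column names `college` and `college_rank` and all row values are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical stand-in for the Student table; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id   INTEGER NOT NULL,
        course       TEXT    NOT NULL,
        college      TEXT,
        college_rank INTEGER,
        PRIMARY KEY (student_id, course)
    )
    """
)


def try_insert(row):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Insertion anomaly: a student without a course cannot be stored.
inserted_with_course = try_insert((1, "DBMS", "College A", 5))
inserted_without_course = try_insert((2, None, "College A", 5))

# Update anomaly: the rank is repeated in every row of the college,
# so one logical change touches several rows.
try_insert((3, "Networks", "College A", 5))
rows_touched = conn.execute(
    "UPDATE student SET college_rank = 4 WHERE college = 'College A'"
).rowcount

# Deletion anomaly: removing the students also removes the college data.
conn.execute("DELETE FROM student")
colleges_left = conn.execute(
    "SELECT COUNT(DISTINCT college) FROM student"
).fetchone()[0]
```

Moving the college data into its own table would let a college exist without students and keep its rank in a single row.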
Functional Dependency:
▪ A Functional Dependency (FD) is a constraint that describes how one attribute determines another attribute in a relation of a Database Management System (DBMS).
▪ Functional Dependency helps to maintain the quality of data in the database.
▪ Introduced by E. F. Codd, it helps in preventing data redundancy and in identifying bad designs.
▪ A functional dependency is denoted by an arrow "→".
▪ The functional dependency of attribute Y on attribute X is represented by X → Y: any two rows that agree on X must also agree on Y.
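Whether a given set of rows is consistent with X → Y can be checked mechanically. The sketch below is one way to do it; the helper name `fd_holds` is hypothetical, the department rows mirror the DeptId/DeptName example table of this unit, and the "Audit" row is an invented counterexample. A check on sample rows can only refute an FD; whether the FD holds in general comes from the requirements.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


departments = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]
dept_id_determines_name = fd_holds(departments, ["DeptId"], ["DeptName"])

# Hypothetical extra row that reuses DeptId 001 with a different name.
conflicting = departments + [{"DeptId": "001", "DeptName": "Audit"}]
still_holds = fd_holds(conflicting, ["DeptId"], ["DeptName"])
```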
Types of Functional Dependency
Functional Dependency has three forms:
• Trivial Functional Dependency
• Non-Trivial Functional Dependency
• Completely Non-Trivial Functional Dependency
Example table:
DeptId  DeptName
001     Finance
002     Marketing
003     HR
Trivial Functional Dependency
• A → B is trivial when B is a subset of A, for example {DeptId, DeptName} → DeptName.
Non-Trivial Functional Dependency
• A → B is non-trivial when B is not a subset of A, for example DeptId → DeptName.
Completely Non-Trivial Functional Dependency
• A → B is completely non-trivial when the intersection of A and B is empty, for example DeptId → DeptName.
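The three definitions reduce to two set tests, which the following sketch makes explicit. The function name `classify_fd` is hypothetical, and attribute sets are written as strings of single-letter attribute names purely for brevity.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if lhs & rhs:
        return "non-trivial"             # B is not a subset, but overlaps A
    return "completely non-trivial"      # A and B share no attribute


kinds = [
    classify_fd("AB", "B"),   # AB -> B
    classify_fd("AB", "BC"),  # AB -> BC
    classify_fd("A", "B"),    # A -> B
]
```

Note that a completely non-trivial FD also satisfies the non-trivial condition; the function reports the most specific class.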
Reasoning about FDs:
Armstrong's Axioms
• Armstrong's axioms were developed by William Armstrong in 1974 to reason about functional dependencies.
• They are inference rules: every FD derived with them holds whenever the given FDs hold.
• Transitivity
If A → B and B → C, then A → C.
• Reflexivity
A → B, if B is a subset of A.
• Augmentation
If A → B, then AC → BC.
• Union (a rule derived from the three axioms above)
If A → B and A → C, then A → BC.
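In practice these rules are usually applied through the attribute closure: the set of all attributes that a given set determines. The closure algorithm is a standard technique that these notes do not spell out, so the sketch below is an illustration; the names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already reachable, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


chain = [({"A"}, {"B"}), ({"B"}, {"C"})]
transitivity = implies(chain, {"A"}, {"C"})                 # A -> C

fan = [({"A"}, {"B"}), ({"A"}, {"C"})]
union = implies(fan, {"A"}, {"B", "C"})                     # A -> BC

augmentation = implies([({"A"}, {"B"})], {"A", "C"}, {"B", "C"})  # AC -> BC
reflexivity = implies([], {"A", "B"}, {"B"})                # AB -> B
```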
Normalization
• Normalization is the process of organizing the data in the database to avoid data redundancy and the insertion, update, and deletion anomalies.
• It was proposed by Edgar F. Codd as part of his relational model.
• If a relation or table has redundant data, it is necessary to normalize it by decomposing it into well-structured tables.
• Normalization reduces redundancy and improves data integrity.
• Types of Normal Forms
• There are six types of normal forms:
• 1NF: A relation is in 1NF if every attribute contains only atomic values.
• 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
• 3NF: A relation is in 3NF if it is in 2NF and no transitive dependency exists.
• BCNF: Boyce-Codd normal form. The relation should be in the Third Normal Form and, for any dependency A → B, A should be a super key.
• 4NF: A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
• 5NF: A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.
• First normal form (1NF)
• As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
Example:
• Example-1: Student
Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.
• Example-2:
ID  Name  Courses
------------------
1   A     c1, c2
2   E     c3
3   M     c2, c3
Converted to 1NF, this becomes:
ID  Name  Course
------------------
1   A     c1
1   A     c2
2   E     c3
3   M     c2
3   M     c3
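The conversion in Example-2 amounts to emitting one row per value of the multi-valued attribute. A minimal sketch, assuming the multi-valued column is stored as a comma-separated string; the helper name `to_1nf` is hypothetical.

```python
students = [
    {"ID": 1, "Name": "A", "Courses": "c1, c2"},
    {"ID": 2, "Name": "E", "Courses": "c3"},
    {"ID": 3, "Name": "M", "Courses": "c2, c3"},
]


def to_1nf(rows, multi_attr, single_attr):
    """Emit one row per value of multi_attr, stored under single_attr."""
    flat = []
    for row in rows:
        for value in row[multi_attr].split(","):
            # Copy every other column unchanged.
            new_row = {k: v for k, v in row.items() if k != multi_attr}
            new_row[single_attr] = value.strip()
            flat.append(new_row)
    return flat


students_1nf = to_1nf(students, "Courses", "Course")
```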
Second normal form (2NF)
A table is said to be in 2NF if both the following conditions hold:
• The table is in 1NF (first normal form).
• All non-key attributes are fully functionally dependent on the primary key, that is, no non-prime attribute depends on a proper subset of a candidate key.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table that looks like this:
teacher_id  subject    teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values.
However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key.
• To make the table comply with 2NF, we can decompose it into two tables like this:
• teacher_details table (teacher_id → teacher_age):
teacher_id  teacher_age
111         38
222         38
333         40
• teacher_subject table:
teacher_id  subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry
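The partial dependency in the teacher table can be found by testing every proper subset of the candidate key against the non-prime attribute. The sketch below does this on the sample rows; `fd_holds` and `partial_dependencies` are hypothetical helper names, and a check on sample data only suggests a dependency, it does not prove that it holds for all future rows.

```python
from itertools import combinations

teachers = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, candidate_key, non_prime):
    """List (subset, attribute) pairs where a proper key subset determines attribute."""
    found = []
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            for attr in non_prime:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


violations = partial_dependencies(
    teachers, ["teacher_id", "subject"], ["teacher_age"]
)
```

Only `teacher_id` is reported: `subject` alone does not determine `teacher_age`, because Physics appears with both 38 and 40.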
Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and has no transitive dependency for non-prime attributes.
• Equivalently, a relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
Example:
EMPLOYEE_DETAILS table:
EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
• Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
• Primary key: {EMP_ID}
• Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID:
• EMP_ZIP → EMP_STATE, EMP_ZIP → EMP_CITY
• EMP_ID → EMP_ZIP
• The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
• To remove the transitive dependency, the table is decomposed into two tables.
• EMPLOYEE table (EMP_ID → EMP_NAME, EMP_ID → EMP_ZIP):
EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

• EMPLOYEE_ZIP table (EMP_ZIP → EMP_STATE, EMP_ZIP → EMP_CITY):
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
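The decomposition can be tried out with Python's `sqlite3` module to confirm that joining the two tables on EMP_ZIP gives back the original rows. This is a sketch under two assumptions that are not in the notes: ZIP codes are stored as text so that leading zeros survive, and EMPLOYEE.emp_zip is declared as a reference to EMPLOYEE_ZIP.

```python
import sqlite3

employee_details = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_zip (
        emp_zip   TEXT PRIMARY KEY,
        emp_state TEXT,
        emp_city  TEXT
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        emp_zip  TEXT REFERENCES employee_zip(emp_zip)
    );
    """
)

# Split every original row across the two 3NF tables.
for emp_id, name, zip_code, state, city in employee_details:
    conn.execute(
        "INSERT OR IGNORE INTO employee_zip VALUES (?, ?, ?)",
        (zip_code, state, city),
    )
    conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, zip_code))

# Joining on the shared attribute rebuilds EMPLOYEE_DETAILS.
rebuilt = conn.execute(
    """
    SELECT e.emp_id, e.emp_name, e.emp_zip, z.emp_state, z.emp_city
    FROM employee AS e
    JOIN employee_zip AS z ON z.emp_zip = e.emp_zip
    ORDER BY e.emp_id
    """
).fetchall()
```

After the split, the state and city of a ZIP code live in exactly one row of `employee_zip`, however many employees share that ZIP.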
BCNF:
▪ Boyce-Codd Normal Form (BCNF) is an extension of the third normal form, and is also known as 3.5 Normal Form.
• For a table to be in Boyce-Codd Normal Form, it should satisfy the following two conditions:
• It should be in the Third Normal Form.
• And, for any non-trivial dependency A → B, A should be a super key.
• In short: for BCNF, the table should be in 3NF, and the LHS of every FD must be a super key.
• Example:
• Below we have a college enrolment table with columns student_id, subject and professor.
• Candidate key (primary key): {student_id, subject}
student_id  subject  professor
101         Java     P.Java
101         C++      P.Cpp
102         Java     P.Java2
103         C#       P.Chash
104         Java     P.Java

• This table satisfies the 1st Normal Form because all the values are atomic, column names are unique and all the values stored in a particular column are of the same domain.
• This table also satisfies the 2nd Normal Form as there is no partial dependency.
• And there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.
• But this table is not in Boyce-Codd Normal Form.
• In the table above, student_id and subject together form the primary key, which means the subject column is a prime attribute.
• (student_id, subject) → professor
• But there is one more dependency, professor → subject.
• Subject is a prime attribute, so this dependency is allowed by 3NF, but professor on its own is not a super key, which is not allowed by BCNF.
• To make this relation (table) satisfy BCNF, we decompose it into two tables, a student table and a professor table.
• Below we have the structure for both tables.
• Student Table
student_id  p_id
101         1
101         2
• Professor Table
p_id  professor  subject
1     P.Java     Java
2     P.Cpp      C++

Hence these relations satisfy Boyce-Codd Normal Form.

(p_id, professor) → subject
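The BCNF condition (every non-trivial FD has a super key on its left side) can be tested with an attribute closure. The sketch below applies it to the enrolment table and to the decomposed tables; the helper names are hypothetical, and the FDs listed for the professor table, in particular professor → p_id (each professor has exactly one p_id), are assumptions, since the notes only show sample rows.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= set(attributes):
            violations.append((lhs, rhs))
    return violations


enrolment = {"student_id", "subject", "professor"}
enrolment_fds = [
    (("student_id", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]
before = bcnf_violations(enrolment, enrolment_fds)

# Decomposed tables; the FDs below are assumed for illustration.
professor_table = {"p_id", "professor", "subject"}
professor_fds = [
    (("p_id",), ("professor", "subject")),
    (("professor",), ("p_id", "subject")),
]
student_table = {"student_id", "p_id"}
after = bcnf_violations(professor_table, professor_fds) + bcnf_violations(
    student_table, []
)
```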
Fourth Normal Form:
▪ Fourth Normal Form comes into the picture when a multi-valued dependency occurs in a relation.
▪ This section explains what a multi-valued dependency is and how to remove it so that a table satisfies the fourth normal form.
Rules for 4th Normal Form
• For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
• It should be in the Boyce-Codd Normal Form.
• And, the table should not have any multi-valued dependency.
What is Multi-valued Dependency?
A table is said to have a multi-valued dependency, written A →→ B, if the following conditions are true:
• For a single value of A, multiple values of B exist.
• The table has at least 3 columns.
• For a relation R(A, B, C), if there is a multi-valued dependency between A and B, then B and C are independent of each other (A →→ B and A →→ C).
• If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
• Example
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
• The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
• In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
• So, to bring the above table into 4NF, we can decompose it into two tables:
• STUDENT_COURSE (STU_ID →→ COURSE)
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
• STUDENT_HOBBY (STU_ID →→ HOBBY)
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
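Each of the two 4NF tables is a projection of STUDENT onto two columns with duplicate rows removed. A minimal sketch of that projection; the helper name `project` is hypothetical.

```python
student = [
    {"STU_ID": 21, "COURSE": "Computer", "HOBBY": "Dancing"},
    {"STU_ID": 21, "COURSE": "Math", "HOBBY": "Singing"},
    {"STU_ID": 34, "COURSE": "Chemistry", "HOBBY": "Dancing"},
    {"STU_ID": 74, "COURSE": "Biology", "HOBBY": "Cricket"},
    {"STU_ID": 59, "COURSE": "Physics", "HOBBY": "Hockey"},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))


student_course = project(student, ["STU_ID", "COURSE"])
student_hobby = project(student, ["STU_ID", "HOBBY"])

# Courses and hobbies of student 21 are now recorded independently.
courses_21 = [course for stu_id, course in student_course if stu_id == 21]
hobbies_21 = [hobby for stu_id, hobby in student_hobby if stu_id == 21]
```

With the two tables separate, adding a hobby for student 21 needs one new row in STUDENT_HOBBY and no change to the course data.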
What is Join Dependency?
• If a table can be recreated by joining multiple tables, each of which has a subset of the attributes of the table, then the table has a join dependency.
• It is a generalization of multi-valued dependency.
• Join dependency is related to 5NF: a relation is in 5NF only if it is already in 4NF and cannot be decomposed further without losing information.
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF, does not contain any join dependency, and every join of its decomposed tables is lossless.
• 5NF is satisfied when the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Example

SUBJECT    LECTURER  SEMESTER
Computer   Anil      Sem 1
Computer   Jai       Sem 1
Math       Jai       Sem 1
Math       Akash     Sem 2
Chemistry  Pranay    Sem 1

So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER  SUBJECT
Sem 1     Computer
Sem 1     Math
Sem 1     Chemistry
Sem 2     Math

P2
SUBJECT    LECTURER
Computer   Anil
Computer   Jai
Math       Jai
Math       Akash
Chemistry  Pranay

P3
SEMESTER  LECTURER
Sem 1     Anil
Sem 1     Jai
Sem 2     Akash
Sem 1     Pranay
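The role of the third relation can be seen by joining the projections back together. In the sketch below, which uses the rows of the example, joining only P1 and P2 produces spurious rows, while adding P3 restores exactly the original table. Set comprehensions stand in for the relational projection and natural join.

```python
timetable = {
    ("Computer", "Anil", "Sem 1"),
    ("Computer", "Jai", "Sem 1"),
    ("Math", "Jai", "Sem 1"),
    ("Math", "Akash", "Sem 2"),
    ("Chemistry", "Pranay", "Sem 1"),
}

# Projections onto pairs of attributes (sets drop duplicates automatically).
p1 = {(sem, subj) for subj, _, sem in timetable}    # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, _ in timetable}  # SUBJECT, LECTURER
p3 = {(sem, lect) for _, lect, sem in timetable}    # SEMESTER, LECTURER

# Natural join of P1 and P2 on SUBJECT.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}

# Joining with P3 keeps only rows whose (SEMESTER, LECTURER) pair exists.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

spurious = two_way - timetable
```

For these rows the two-way join contains two combinations that were never in the table, (Math, Akash, Sem 1) and (Math, Jai, Sem 2); the join with P3 filters them out.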
Properties of decomposition:
What is decomposition?
• Decomposition is the process of breaking something down into parts or elements.
• It replaces a relation or table with a collection of smaller relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, so that the information in the original relation can be accurately reconstructed from the decomposed relations.
• If a relation is not decomposed properly, it may lead to problems like loss of information.
Properties of Decomposition
Following are the properties of decomposition:
1. Attribute preservation:
If a relation R is decomposed into D(r1, r2, r3), where D is the decomposition, and every attribute of R appears in at least one of the decomposed tables, this is known as attribute preservation.
2. No redundancy:
Decomposition is mainly used to reduce redundancy and anomalies.
3. Lossless join:
When a relation is decomposed into smaller tables, the original table can be reconstructed by joining the smaller tables without any loss of information.
4. Non-additive join:
The reconstructed table should not have additional tuples or attributes.
5. Dependency preservation:
Every functional dependency of the original relation, such as A → B, should still hold on the decomposed tables.
Dependency Preservation:
• Dependency is an important constraint on the database.
• Every dependency must be satisfied by at least one decomposed table.
• If A → B holds, then B is functionally dependent on A. The dependency is easier to check if both attribute sets are in the same relation.
Example
▪ A relation (table) R has the functional dependency A → B.
▪ R is decomposed into smaller tables D(R1, R2, R3), where D is the decomposition.
▪ If the functional dependency A → B is found in at least one of the decomposed tables (for example R1 or R3), the decomposition is dependency preserving.
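A simple sufficient check follows directly from this description: an FD is preserved if all of its attributes sit together in one decomposed table. The sketch below implements only that direct check, under a hypothetical function name and with hypothetical attribute sets for R1, R2 and R3; an FD that fails it may still be preserved indirectly through other FDs, which the full closure-based test would detect.

```python
def directly_preserved(fd, schemas):
    """Return True if some decomposed schema contains every attribute of the FD."""
    lhs, rhs = fd
    needed = set(lhs) | set(rhs)
    return any(needed <= set(schema) for schema in schemas)


# Hypothetical decomposition of R(A, B, C, D) into R1, R2, R3.
r1, r2, r3 = {"A", "B"}, {"B", "C"}, {"A", "D"}
a_to_b = directly_preserved((("A",), ("B",)), [r1, r2, r3])
a_to_c = directly_preserved((("A",), ("C",)), [r1, r2, r3])

# The BCNF example of this unit: Student(student_id, p_id) and
# Professor(p_id, professor, subject).
bcnf_tables = [{"student_id", "p_id"}, {"p_id", "professor", "subject"}]
prof_to_subject = directly_preserved((("professor",), ("subject",)), bcnf_tables)
key_to_professor = directly_preserved(
    (("student_id", "subject"), ("professor",)), bcnf_tables
)
```

In the BCNF example, professor → subject can be checked inside the professor table, while (student_id, subject) → professor no longer has all of its attributes in one table.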
Lossless design:
▪ Lossless design is one of the properties of decomposition.
▪ If a relation R is divided into smaller tables D(r1, r2, r3) and joining them reconstructs the original table R without any loss of information, the decomposition is called a lossless design.
Example:
Relation R
Decomposed tables R1, R2, R3
If R1 join R2 join R3 = R, it is a lossless join.
If R1 join R2 join R3 is not equal to R, it is a lossy design.
Note that R1 ∪ R2 ∪ R3 = R on the attribute sets only shows attribute preservation, not a lossless join.
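For a decomposition into two tables there is a well-known test based on FDs: the split is lossless if the attributes common to both tables functionally determine all attributes of at least one of them. This test is not stated in these notes, so the sketch below is an illustration with hypothetical helper names, applied to the EMPLOYEE_DETAILS example and to a deliberately poor alternative split.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return reach >= set(r1) or reach >= set(r2)


fds = [
    (("EMP_ID",), ("EMP_NAME", "EMP_ZIP")),
    (("EMP_ZIP",), ("EMP_STATE", "EMP_CITY")),
]

# The 3NF decomposition from this unit: the tables share EMP_ZIP.
good = lossless_binary(
    {"EMP_ID", "EMP_NAME", "EMP_ZIP"},
    {"EMP_ZIP", "EMP_STATE", "EMP_CITY"},
    fds,
)

# Hypothetical alternative that shares only EMP_STATE.
bad = lossless_binary(
    {"EMP_ID", "EMP_NAME", "EMP_STATE"},
    {"EMP_STATE", "EMP_ZIP", "EMP_CITY"},
    fds,
)
```

Sharing EMP_ZIP works because EMP_ZIP is the key of EMPLOYEE_ZIP; sharing EMP_STATE does not, since a state alone identifies neither an employee nor a ZIP code.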
