DBMS Unit 3
UNIT-3
Decomposition is the process of dividing a single relation into two or more sub-relations.
It is natural to decompose a relation into several relations during database design in order to reduce redundancy.
Any decomposition we apply should be lossless.
Lossless Decomposition
If no information is lost from the relation that is decomposed, then the decomposition is
lossless.
A lossless decomposition guarantees that joining the decomposed relations produces exactly the
relation that was decomposed.
A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.
o Example:
Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3, . . . , En
with instances e1, e2, e3, . . . , en. If e1 ⋈ e2 ⋈ e3 ⋈ . . . ⋈ en = e,
then the decomposition is called a 'Lossless Join Decomposition'.
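The condition e1 ⋈ e2 ⋈ . . . ⋈ en = e can be checked directly on a small instance. The following is a minimal sketch, not a definitive implementation: the relation `e` with attributes A, B, C and the helper names `project`, `natural_join` and `is_lossless` are hypothetical and chosen only for illustration.

```python
from functools import reduce

def project(rows, attrs):
    """Project dict rows onto attrs; duplicates collapse because the result is a set."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of rows, each row a tuple of (attribute, value) pairs."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l, r = dict(l_row), dict(r_row)
            # Rows combine only if they agree on every shared attribute.
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return joined

def is_lossless(rows, schemas):
    """True if joining the projections onto each schema gives back exactly rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    parts = [project(rows, schema) for schema in schemas]
    return reduce(natural_join, parts) == original

# Hypothetical instance e of a relation E(A, B, C).
e = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
]

print(is_lossless(e, [("A", "B"), ("A", "C")]))  # True: joining on A restores e
print(is_lossless(e, [("A", "B"), ("B", "C")]))  # False: joining on B adds spurious rows
```

In the second split the shared attribute B has the same value in both rows, so the join pairs every (A, B) row with every (B, C) row and produces rows that were never in e.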
Dependency Preservation
Dependencies are important constraints on the database. Every dependency must be satisfied by at least one
decomposed table, or be derivable from the dependencies that are.
If A → B holds, then B is functionally dependent on A, and the dependency is easiest to
check when both attribute sets are in the same relation.
A decomposition has this property only if the functional dependencies are maintained in the
decomposed relations.
This property allows updates to be checked without computing the natural join of the
decomposed relations.
2. DEFINE NORMALIZATION AND TYPES OF NORMALIZATION?
Normalization is the process of organizing the data in the database so that it obeys the standard
normal forms. Normalization is used to minimize the redundancy in a relation or set of relations.
It is also used to eliminate undesirable characteristics such as insertion, update and deletion
anomalies.
o Normalization divides a larger table into smaller tables and links them using
relationships.
o The normal forms are used to reduce redundancy in the database tables.
The normal forms covered in this unit are 1NF, 2NF, 3NF, BCNF, 4NF and 5NF.
Functional Dependency (FD): describes how one attribute (or set of attributes) determines another in a
relation.
Functional dependencies help to maintain the quality of data in the database. A functional
dependency is denoted by an arrow →. If attribute X functionally determines Y, or in other
words attribute Y is functionally dependent on X, we write X → Y.
Functional dependencies play a vital role in telling a good database design from a bad
one.
A trivial dependency is one in which a set of attributes determines itself or a subset of itself. So
X → Y is a trivial functional dependency if Y is a subset of X.
Example:
Consider a table with two columns, Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency
because Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.
5. EXPLAIN ABOUT SECOND NORMAL FORM (2NF) and THIRD NORMAL FORM(3NF)?
Second Normal Form (2NF): A relation R is said to be in 2NF iff it is in 1NF and
every non-key attribute is fully functionally dependent on the primary key.
o In 2NF, the relation must be in 1NF, and
o all non-key attributes must be fully functionally dependent on the primary key.
Or
For a relation to be in second normal form, it must be in first normal form and must not
contain any partial dependency.
A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute which
does not participate in any candidate key) is dependent on a proper subset of any candidate key of
the table.
In other words, no non-prime attribute should depend on part of a candidate key. Equivalently,
if every non-key attribute is fully functionally dependent on every candidate key, then
the relation is in 2NF.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is
called a partial dependency.
Example 1 – Consider the following table.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
(Note that several courses have the same course fee.)
Here,
• COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
Hence,
• COURSE_FEE is a non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}.
• But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. The non-prime attribute
COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial
dependency, and so this relation is not in 2NF.
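The claims above can be tested against the sample rows. The sketch below is an illustration under assumptions: the helper `fd_holds` and the variable name `enrolment` are hypothetical, and a check on one instance can only refute a functional dependency, it cannot prove that the dependency holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs also agree on rhs in this instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True

# The sample table from the example above.
enrolment = [
    {"STUD_NO": 1, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C2", "COURSE_FEE": 1500},
    {"STUD_NO": 1, "COURSE_NO": "C4", "COURSE_FEE": 2000},
    {"STUD_NO": 4, "COURSE_NO": "C3", "COURSE_FEE": 1000},
    {"STUD_NO": 4, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C5", "COURSE_FEE": 2000},
]

# The partial dependency that breaks 2NF.
print(fd_holds(enrolment, ["COURSE_NO"], ["COURSE_FEE"]))             # True
# COURSE_FEE cannot determine the other attributes, alone or combined.
print(fd_holds(enrolment, ["COURSE_FEE"], ["COURSE_NO"]))             # False
print(fd_holds(enrolment, ["COURSE_FEE", "STUD_NO"], ["COURSE_NO"]))  # False
print(fd_holds(enrolment, ["COURSE_FEE", "COURSE_NO"], ["STUD_NO"]))  # False
```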
To convert the above relation to 2NF, we need to split the table into two tables:
Table1: STUD_NO, COURSE_NO
Table2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
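The split into Table 1 and Table 2 is a pair of projections of the original table. As a rough sketch (the variable names are hypothetical), the projections can be computed from the six original rows and joined back on COURSE_NO to confirm that nothing is lost:

```python
# Original rows as (STUD_NO, COURSE_NO, COURSE_FEE).
enrolment = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]

# Table 1 keeps who takes which course; Table 2 stores each course fee once.
table1 = sorted({(stud, course) for stud, course, _ in enrolment})
table2 = sorted({(course, fee) for _, course, fee in enrolment})

# COURSE_NO is the key of Table 2, so a dict lookup acts as the join.
fee_of = dict(table2)
rejoined = sorted((stud, course, fee_of[course]) for stud, course in table1)

print(len(table1), len(table2))        # 6 5
print(rejoined == sorted(enrolment))   # True
```

The fee of course C1 appears twice in the original table but only once in Table 2, which is the redundancy that the move to 2NF removes.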
Table: employee_details
Table: employee
emp_id emp_name emp_zip
Table: employee_zip
emp_zip emp_state emp_city emp_district
Third Normal Form (3NF): A relation is in third normal form if at least one of the following
conditions holds for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
DBMS-BPP Unit-3 CSE
UNIT-3: DATABASE DESIGN
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
So the non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID).
This violates the rule of third normal form.
That is why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
o When a relation in the relational model is not in an appropriate normal form, it may
suffer from a lot of data duplication. It is brought into the desired normal form by
decomposing it into two or more sub-relations that still satisfy the constraints of the given
relation.
o In a database, decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems such as loss of information.
Decomposition is used to eliminate some of the problems of bad design, such as data anomalies,
inconsistencies, and redundancy. A decomposition can be lossy or lossless in nature, as
described below.
Lossless Decomposition
o If no information is lost when a relation is decomposed, then the decomposition is called a lossless
decomposition.
o A lossless decomposition guarantees that joining the decomposed relations produces exactly the
relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed
table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either
be part of R1 or R2, or be derivable from the combination of the functional dependencies
of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set
{A → BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is part of relation R1(ABC).
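Dependency preservation can also be tested mechanically with attribute closures. The sketch below uses a standard closure-based test rather than anything specific to these notes; the function names and the second relation (A → B, B → C split into AB and AC) are hypothetical additions used only as a contrasting case.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, schemas):
    """True if every FD can be derived using only FDs that live inside one schema."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                # What this schema alone lets us derive from what we already have.
                gained = closure(reach & schema, fds) & schema
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False
    return True

# The example from the text: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
fds_r = [({"A"}, {"B", "C"})]
print(preserves(fds_r, [{"A", "B", "C"}, {"A", "D"}]))  # True

# Hypothetical contrast: A -> B, B -> C split into (A, B) and (A, C) loses B -> C.
fds_s = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves(fds_s, [{"A", "B"}, {"A", "C"}]))       # False
```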
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. For every non-trivial dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that for a dependency A
→ B, A cannot be a non-prime attribute if B is a prime attribute.
Example: Below we have a college enrolment table with columns student_id, subject and professor.
student_id subject professor
103 C# P.Chash
One student can enroll for multiple subjects. For example, the student with student_id 101 is enrolled in Java and C++.
And there can be multiple professors teaching one subject, as we have for Java.
This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
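But this table is not in Boyce-Codd Normal Form. Why?
The candidate key is (student_id, subject), so subject is a prime attribute. However, the
dependency professor → subject also holds, and professor is a non-prime attribute that is not a
super key, which BCNF does not allow.
To make this relation (table) satisfy BCNF, we decompose this table into two
tables, a Student table and a Professor table, as given below.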
Student table:
101 1
101 2
Professor table:
1 P.Java Java
2 P.Cpp C++
And now, this relation satisfies Boyce-Codd Normal Form.
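The BCNF condition (every non-trivial dependency has a super key on its left-hand side) is easy to test once the dependencies are written down. The sketch below is an illustration only: the two dependencies listed for the enrolment table are assumptions, chosen to be consistent with the decomposition into Student and Professor tables shown above, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

attributes = {"student_id", "subject", "professor"}
# Assumed dependencies for the enrolment table (not listed explicitly in the notes).
fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

for lhs, rhs in bcnf_violations(attributes, fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['professor'] -> ['subject']
```

Under these assumed dependencies, only professor → subject is reported, because the closure of {professor} does not include student_id.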
Given a relation R with its schema, how can we say whether it complies with 5NF, 4NF, ..., or 1NF?
To find the highest normal form that a relation satisfies, we can apply the following procedure.
Steps to follow to find the highest normal form of a relation
1. The first step is to find all candidate keys of the relation.
2. The second step is to divide the attributes of the relation into two categories:
a. Prime attributes: attributes that are part of some candidate key.
b. Non-prime attributes: attributes that are not part of any candidate key.
3. The third and last step is to test the relation for 1st normal form, then 2nd, and so on.
If the relation fails the condition for the nth normal form, then its highest normal
form is n-1.
Examples:
Problem 1) Find the highest normal form of a relation R(P, Q, R, S, T) with Functional dependency
set as (QR->S, PR->QT, Q->T).
Solution:
Step 1:
(PR)+ = (P, Q, R, S, T), and no proper subset of {P, R} determines all attributes
of the relation, so {P, R} is a candidate key. Neither P nor R can be derived from any other attribute of
the relation, so (PR) is the only candidate key.
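Step 1 can be reproduced with an attribute-closure routine and a brute-force search over attribute subsets. This is a minimal sketch with hypothetical function names; the exhaustive search is only practical for small relations such as this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

ATTRS = {"P", "Q", "R", "S", "T"}
FDS = [
    ({"Q", "R"}, {"S"}),
    ({"P", "R"}, {"Q", "T"}),
    ({"Q"}, {"T"}),
]

keys = candidate_keys(ATTRS, FDS)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['P', 'R']]
print(sorted(prime))              # ['P', 'R']
print(sorted(ATTRS - prime))      # ['Q', 'S', 'T']
```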
Step 2:
a. The attributes which are part of candidate key (P, R) are Prime attributes.
b. The others will be non-prime attributes (Q, S, T).
Step 3:
A relational database management system does not allow multi-valued or composite attributes. So,
the relation R(P, Q, R, S, T) is in 1st normal form.
Now let's test for 2NF: we check whether any partial dependency exists, i.e., whether a proper subset of the
candidate key determines a non-prime attribute. Let's check FD by FD:
QR->S satisfies 2nd normal form (QR is not a proper subset of candidate key PR),
PR->QT satisfies 2nd normal form (PR is the candidate key), and
Q->T satisfies 2nd normal form (Q is not a proper subset of candidate key PR).
So, the relation is in 2nd normal form.
Check for 3NF: there should be no transitive dependency of a non-prime attribute on the candidate key.
In other words, to satisfy 3rd normal form, either the LHS of each functional dependency should be a super key
or the RHS should be a prime attribute.
In QR->S, QR is not a super key and S is not a prime attribute; likewise, in Q->T, Q is not a super key
and T is not a prime attribute. Both dependencies therefore violate the 3rd normal form condition.
So the relation is not in 3rd normal form, and its highest normal form is 2NF.
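The 2NF and 3NF tests for Problem 1 can be sketched in code as follows. This is an illustration under assumptions: the candidate key {P, R} is taken from Step 1 rather than recomputed, and the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ATTRS = {"P", "Q", "R", "S", "T"}
FDS = [
    ({"Q", "R"}, {"S"}),
    ({"P", "R"}, {"Q", "T"}),
    ({"Q"}, {"T"}),
]
KEYS = [{"P", "R"}]   # candidate key found in Step 1
PRIME = {"P", "R"}    # attributes that belong to some candidate key

def partial_dependencies():
    """Proper subsets of a candidate key that determine a non-prime attribute (2NF test)."""
    found = []
    for key in KEYS:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                determined = closure(set(part), FDS) - PRIME
                if determined:
                    found.append((set(part), determined))
    return found

def third_nf_violations():
    """FDs whose LHS is not a super key and whose RHS has a non-prime attribute (3NF test)."""
    return [
        (lhs, rhs)
        for lhs, rhs in FDS
        if closure(lhs, FDS) != ATTRS and (rhs - lhs) - PRIME
    ]

print(partial_dependencies())      # []  -> the relation is in 2NF
for lhs, rhs in third_nf_violations():
    print(sorted(lhs), "->", sorted(rhs))
# ['Q', 'R'] -> ['S']
# ['Q'] -> ['T']
```

Since the 2NF test finds no partial dependency and the 3NF test reports QR->S and Q->T, the highest normal form of the relation is 2NF.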
Example: Suppose a bike manufacturer produces each model in two colors (white and black)
every year.
Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These
dependencies are represented as shown below:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
A relation R is in fourth normal form (4NF) iff it is already in BCNF and does not contain any non-trivial
multivalued dependency.
So, a 4NF relation should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any Multi-valued Dependency.
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing.
So there are multi-valued dependencies on STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY), which lead to unnecessary repetition of data.
So, to convert the above table into 4NF and thereby reduce the redundancy, we can decompose
it into the following two tables:
- STUDENT_COURSE(STU_ID, COURSE)
- STUDENT_HOBBY(STU_ID, HOBBY)
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
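A multivalued dependency STU_ID →→ COURSE means that every course of a student must appear together with every hobby of that student. The sketch below (hypothetical function names) checks this on the STUDENT rows and rebuilds the table from the two projections. It is worth noting that the five rows as printed store only two of the four course and hobby combinations for student 21, so the check passes only once the two remaining combinations are included.

```python
from itertools import product

def mvd_holds(rows):
    """Check STU_ID ->> COURSE on rows of the form (STU_ID, COURSE, HOBBY)."""
    rows = set(rows)
    for student in {s for s, _, _ in rows}:
        courses = {c for s, c, _ in rows if s == student}
        hobbies = {h for s, _, h in rows if s == student}
        # Every course must be paired with every hobby of the same student.
        if any((student, c, h) not in rows for c, h in product(courses, hobbies)):
            return False
    return True

def rejoin(rows):
    """Join the projections STUDENT_COURSE and STUDENT_HOBBY on STU_ID."""
    student_course = {(s, c) for s, c, _ in rows}
    student_hobby = {(s, h) for s, _, h in rows}
    return {(s, c, h) for s, c in student_course for s2, h in student_hobby if s == s2}

printed = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]
# The same table with all combinations for student 21 written out.
complete = printed + [(21, "Computer", "Singing"), (21, "Math", "Dancing")]

print(mvd_holds(printed), len(rejoin(printed)))             # False 7
print(mvd_holds(complete), rejoin(complete) == set(complete))  # True True
```

The decomposed tables STUDENT_COURSE and STUDENT_HOBBY hold each fact once, and joining them yields all seven combinations.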
o Here, *(R1, R2, R3) indicates that the relations R1, R2, R3 and so on form a join dependency (JD) of R.
The above table can be decomposed into the following three tables; therefore it is not in 5NF:
EmployeeSkills :
EmpName EmpSkills
Tom Networking
Harry Web Development
Katie Programming
EmployeeJob:
EmpName EmpJob
Tom EJ001
Harry EJ002
Katie EJ002
JobSkills:
EmpSkills EmpJob
Networking EJ001
Programming EJ002
The original <Employee> relation has a join dependency on these three relations, so it is not in 5NF. That is, the join
of the above three relations is equal to the original relation <Employee>.
Fifth Normal Form (5NF): A relation is in 5NF if it is in 4NF and does not contain any non-trivial join
dependency; any decomposition applied to reach it must be lossless.
o 5NF is satisfied when the tables have been broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example:
In this example, John takes both the Computer and Math classes in Semester 1, but he does not take the Math
class in Semester 2. In this case, the combination of all three fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
taking that subject, so we leave Lecturer and Subject as NULL. But all three columns together act as
the primary key, so we cannot leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
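A join dependency involves all three projections at once. The sketch below (hypothetical variable names) joins P1 and P2 first and then filters by P3, which shows that two of the tables alone produce spurious rows while the three-way join does not.

```python
# The three projections from the example above.
P1 = {  # (SEMESTER, SUBJECT)
    ("Semester 1", "Computer"),
    ("Semester 1", "Math"),
    ("Semester 1", "Chemistry"),
    ("Semester 2", "Math"),
}
P2 = {  # (SUBJECT, LECTURER)
    ("Computer", "Anshika"),
    ("Computer", "John"),
    ("Math", "John"),
    ("Math", "Akash"),
    ("Chemistry", "Praveen"),
}
P3 = {  # (SEMESTER, LECTURER)
    ("Semester 1", "Anshika"),
    ("Semester 1", "John"),
    ("Semester 2", "Akash"),
    ("Semester 1", "Praveen"),
}

# Join P1 and P2 on SUBJECT, giving (SEMESTER, SUBJECT, LECTURER) rows.
two_way = {(sem, sub, lec) for sem, sub in P1 for sub2, lec in P2 if sub == sub2}

# Keep only the rows whose (SEMESTER, LECTURER) pair also appears in P3.
three_way = {row for row in two_way if (row[0], row[2]) in P3}

print(len(two_way), len(three_way))  # 7 5
print(sorted(two_way - three_way))
# [('Semester 1', 'Math', 'Akash'), ('Semester 2', 'Math', 'John')]
```

The row ('Semester 2', 'Math', 'John') is removed by P3, which matches the statement that John does not take the Math class in Semester 2.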