r18 Dbms Unit-III Part-II
(UNIT-III PART-II)
Data Redundancy
Data redundancy means that multiple copies of the same data are stored in two or more
separate places. The same data then exists in multiple folders or databases, which can lead to
many problems. Repeated entry of a data record leads to redundant data.
Anomalies in DBMS are caused when there is too much redundancy in the database’s
information. Anomalies can often be caused when the tables that make up the database suffer
from poor construction.
Student Table:
There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repeated
each time a new CourseID is entered for them. Hence the StudRegistration, StudName and Address
attributes are repeated.
Insert Anomaly: An insert anomaly occurs when some attributes cannot be inserted into the
database without the presence of other attributes. For example, in the Student table, if we want to
insert a new CourseID, we have to wait until some student enrols in that course. This makes it
difficult to insert a new record in the table, and is called an insertion anomaly.
Update Anomalies: An update anomaly occurs when duplicated data is updated in one place but not
in all of its other instances, which leaves the table in an inconsistent state. For example, suppose
there is a student 'James' in the Student table. If we want to update the course in the Student table,
we need to make the same update in the Course table; otherwise, the data becomes inconsistent:
some rows reflect the updated value while others do not.
Delete Anomalies: A delete anomaly occurs when deleting some records causes other information
to be lost from the table. For example, if we remove Trent Bolt from the Student table, his address,
course and other details are removed from the Student table as well. In other words, deleting one
piece of information can unintentionally remove other information stored in the same row.
So, we need to avoid these types of anomalies in our tables and maintain the integrity and accuracy
of the database. For this purpose, we use the concept of normalization in the database
management system.
Types of Redundancy
There are two levels of redundancy, given below.
1. Row level redundancy:
When two rows are exactly the same, it is called row level redundancy. A relation in an RDBMS
should never contain duplicate rows.
Keep in mind: Row level redundancy can be removed by setting a primary key in the table.
1. Insertion Anomaly
This problem occurs when the new insertion of a data record is not possible without adding some
additional unrelated data to the record.
Syntax:
INSERT INTO table_name (column1, column2, …)
VALUES (value1, value2, …);
Example: Suppose a new student's details need to be inserted while the course and faculty have not
yet been decided. The student cannot be inserted until a course and faculty are decided for that
student.
2. Deletion Anomaly
This anomaly occurs when deleting a record results in losing other information that was
stored as part of the deleted record.
Syntax:
DELETE FROM table_name WHERE condition;
For example:
SQL Query: DELETE FROM student_detail WHERE Std_ID = 2;
Executing the above query also leads to the loss of the Course 2 information, because that
information was stored only in the deleted row. This is the deletion anomaly.
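The deletion anomaly can be reproduced with a small in-memory SQLite database. This is only a sketch: the original student_detail table is not shown in these notes, so the columns Std_Name and Course and all row values below are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical student_detail table: the column names Std_Name and Course
# and the three rows are assumptions, not the table from the notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_detail (Std_ID INTEGER, Std_Name TEXT, Course TEXT)")
conn.executemany(
    "INSERT INTO student_detail VALUES (?, ?, ?)",
    [(1, "Ali", "Course 1"), (2, "Sara", "Course 2"), (3, "Omar", "Course 1")],
)


def known_courses(connection):
    """Return every course the database still knows about."""
    rows = connection.execute(
        "SELECT DISTINCT Course FROM student_detail ORDER BY Course"
    )
    return [row[0] for row in rows]


before = known_courses(conn)
conn.execute("DELETE FROM student_detail WHERE Std_ID = 2")
after = known_courses(conn)

print("courses before delete:", before)
print("courses after delete: ", after)
```

Because student 2 was the only row that mentioned Course 2, removing the student also removes every trace of the course.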
3. Updation Anomaly
This anomaly occurs when changing one value requires changing it in many rows.
Syntax:
UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
For example:
SQL Query: UPDATE Student_detail SET faculty_Fee = '15K'
If we want to change the faculty_fee of Ali from 10K to 15K, the value has to be changed in every
row where it is stored, and the query above updates faculty_fee in many rows, which may not be
necessary.
1. Data inconsistency: Data inconsistency refers to the same data existing in different formats
in multiple databases. Redundant data leads to inconsistent duplicates of data and to
meaningless or unreliable information in a company's database.
2. Increased data corruption: Data corruption refers to damage to data due to errors in reading,
writing, storage or processing. The risk grows when the same data fields are repeated in a
database or file storage system, that is, when data is redundant. Corrupted files generate error
messages for customers when a task cannot be completed.
3. Database size increases: Redundant data increases the size and complexity of the database,
which makes maintenance a challenge. A larger database leads to longer load times, and more
time is spent on completing daily tasks.
4. Cost increase: Redundant data increases storage costs, which can affect the profits and goals of
a company. Implementing the database system becomes very expensive.
5. Additional space consumed: Redundant data takes up additional space, which adds up over
time to form bloated databases. This can make it harder for companies to meet the demands of
their customers.
Functional dependency
To understand the concept, consider a relation P with attributes A and B.
Functional dependency is represented by -> (arrow sign).
The functional dependency between the attributes is then written as −
A -> B
This means that B is functionally dependent on A: A is the determinant, and each value of A is
associated with exactly one value of B.
Example:01
The following is an example that would make it easier to understand functional dependency −
We have a <Department> table with two attributes − DeptId and DeptName.
The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute:
if you want to know the department name, you first need to have the DeptId.
DeptId DeptName
001 Finance
002 Marketing
003 HR
Therefore, the functional dependency between DeptId and DeptName can be stated
as DeptName is functionally dependent on DeptId −
DeptId -> DeptName
Example:02
From the above table we can conclude some valid functional dependencies:
roll_no → { name, dept_name, dept_building }: Here, roll_no can determine the values of the fields
name, dept_name and dept_building, hence it is a valid functional dependency.
roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name,
dept_building}, it can also determine its subset dept_name.
dept_name → dept_building: dept_name can identify the dept_building accurately, since
every dept_name in the table is associated with exactly one dept_building.
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name Students with the same name can have different dept_name, hence this is
not a valid functional dependency.
dept_building → dept_name There can be multiple departments in the same building, For
example, in the above table departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
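Whether a functional dependency is violated by a given table instance can be checked mechanically. The sketch below uses a hypothetical helper `fd_holds`; the student table it runs on is not the table from these notes (which is not reproduced here), so the rows are assumptions chosen to match the statements above, with EC and ME sharing building B2.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this table instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows (assumed for illustration only).
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]

checks = {
    "roll_no -> name": fd_holds(students, ["roll_no"], ["name"]),
    "dept_name -> dept_building": fd_holds(students, ["dept_name"], ["dept_building"]),
    "name -> dept_name": fd_holds(students, ["name"], ["dept_name"]),
    "dept_building -> dept_name": fd_holds(students, ["dept_building"], ["dept_name"]),
}
for fd, ok in checks.items():
    print(fd, "holds in this instance" if ok else "is violated")
```

Note that an instance can only disprove a dependency. A result of True means the rows do not contradict the FD, not that the FD is guaranteed for all future data.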
Types of Functional Dependencies
1. Trivial FD:
A → B is a trivial functional dependency if B is a subset of A.
In this case the dependent attribute is obtained directly from the determinant itself.
Dependencies such as A → A and B → B are also trivial.
Example:
{Student_Id, Student_Name} → Student_Id // it is a trivial functional dependency
as Student_Id is a subset of {Student_Id, Student_Name}.
Also, Student_Id → Student_Id and Student_address → Student_address are trivial
dependencies.
Keep In Mind
The intersection of the left hand side and the right hand side of a trivial FD is never empty:
L.H.S ∩ R.H.S ≠ Ø
Trivial FDs are valid in every case and never cause problems in transactions.
2. Non-Trivial FD
A → B is a non-trivial functional dependency if B is not a subset of A. In the following example,
Student_Address is derived from Student_ID, but not directly:
Student_ID → Student_Name
Student_Name → Student_Address
Keep In Mind
When the intersection of A and B is empty (A ∩ B = Ø), A → B is called completely non-trivial.
Non-trivial FDs are not valid in every case; they hold only when the data satisfies them.
There are mainly four types of Functional Dependency in DBMS. Following are the types of
Functional Dependencies in DBMS:
1. Multivalued Dependency
2. Trivial Functional Dependency
3. Non-Trivial Functional Dependency
4. Transitive Dependency
1. Multivalued Dependency in DBMS
A multivalued dependency occurs when there are multiple independent
multivalued attributes in a single table.
A multivalued dependency is a full constraint between two sets of attributes in a
relation. It requires that certain tuples be present in the relation.
Consider the following example.
Example:
In this example, maf_year and color are independent of each other but both depend on car_model.
These two columns are said to be multivalued dependent on car_model.
This dependence can be represented like this (the double arrow ->> denotes a multivalued dependency):
car_model ->> maf_year
car_model ->> color
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
A non-trivial functional dependency occurs when A->B holds true and B is not a subset of A.
In a relation, if attribute B is not a subset of attribute A, then A->B is
considered a non-trivial dependency.
Company CEO Age
Microsoft Satya Nadella 51
Google Sundar Pichai 46
Apple Tim Cook 57
Example:
{Company} -> {CEO} (if we know the Company, we know the CEO's name)
But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
4. Transitive Dependency in DBMS
A transitive dependency is a type of functional dependency that is formed indirectly
by two other functional dependencies. Let's understand it with the following example.
Example:
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense, because if we know the company name,
we can find the age of its CEO.
Note: A transitive dependency can only occur in a relation with three or
more attributes.
Example-02
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}
Similarly, roll_no → roll_no is also an example of trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency .
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a
subset of {roll_no, name}
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not
dependent on each other.
i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
Step-01: Add the attributes contained in the attribute set for which closure is being calculated
to the result set.
Step-02: Recursively add the attributes to the result set which can be functionally determined
from the attributes already contained in the result set.
Example-
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC
BC → DE
D→F
CF → G
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A-
A+ = { A }
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D , F } ( Using D → F )
We can not determine any other attribute using attributes D and F contained in the result set.
Thus,
D+ = { D , F }
Closure of attribute set {B, C}-
{ B , C }+= { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
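The two closure steps above can be written as a short fixed-point loop. This is a minimal sketch rather than a reference implementation: the function name `closure` and the encoding of each FD as a pair of strings (one character per attribute) are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)              # Step-01: start with the attributes themselves
    changed = True
    while changed:                   # Step-02: repeat until nothing new can be added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of R(A, B, C, D, E, F, G) from the example above.
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

for start in ("A", "D", "BC"):
    print(start, "+ =", "".join(sorted(closure(start, fds))))
```

The loop stops as soon as a full pass over the FDs adds no attribute, which matches the point in the worked examples where no further dependency can be applied.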
Finding the Keys Using Closure-
Super Key-
If the closure of an attribute set contains all the attributes of the relation, then that
attribute set is called a super key of that relation.
Candidate Key-
If the closure of an attribute set contains all the attributes of the relation, and no proper
subset of that attribute set has this property, then that attribute set is called a candidate key
of that relation.
Example-
In the above example,
No subset of attribute A contains all the attributes of the relation.
According to the reflexive rule, Attribute “A” can determine Attribute “A” itself.
According to given FD, Attribute “A” can directly determine Attribute “B”.
According to transitive property, Attribute “A” can determine “C” through “B”.
According to transitive property, As Attribute “A” already determine C So, Attribute “A” can
determine “D” through “C”.
So, Closure of A = A+ = ABCD
Closure of attribute “B”
According to the reflexive rule, Attribute “B” can determine Attribute “B” itself.
According to given FD, Attribute “B” can directly determine Attribute “C”.
According to transitive property, Attribute “B” can determine “D” through “C”.
Attribute “B” cannot determine attribute “A”
So, Closure of B = B+ = BCD
Closure of attribute “C”
According to the reflexive rule, Attribute “C” can determine Attribute “C” itself.
According to given FD, Attribute “C” can directly determine Attribute “D”.
Attribute “C” cannot determine attribute “A” and “B”
So, Closure of C = C+ = CD
Closure of attribute “D”
According to the reflexive rule, Attribute “D” can determine Attribute “D” itself.
Attribute “D” cannot determine attribute “A”, “B” and “C”
So, Closure of D = D+ = D
Conclusion: Only the closure of attribute “A” contains all the attributes of the relation, so
attribute “A” is the candidate key.
So, Candidate Key = {A}
Keep In Mind:
The attribute sets AB, AC, AD, ABC, ACD and ABCD can also determine all the attributes in the
relation, but they cannot be considered candidate keys.
A candidate key is a minimal set of attributes that determines all attributes in the relation. So “A”
is a candidate key, and any combination of A with other attributes (AB, AC, AD, ABC, ACD or
ABCD) is only a super key.
Example 02
Let suppose R= {A, B, C, D} and FD = {A→B, B→C, C→D, D→A}
As attributes A, B, C and D are present in the left side of FD so, we will find the closure of all these
attributes.
Closure of attribute “A”
As, Attribute “A” can determine itself.
According to FD, Attribute “A” can directly determine Attribute “B”.
According to transitive property, Attribute “A” can determine “C” through “B”.
According to transitive property, As Attribute “A” already determine C So, Attribute “A” can
determine “D” through “C”.
So, Closure of A = A+ = ABCD
Closure of attribute “B”
As, Attribute “B” can determine itself.
According to FD, Attribute “B” can directly determine Attribute “C”.
According to transitive property, Attribute “B” can determine “D” through “C”.
According to transitive property, As Attribute “B” already determine D So, Attribute “B” can
determine “A” through “D”.
So, Closure of B = B+ = BCDA
Closure of attribute “C”
As, Attribute “C” can determine itself.
According to FD, Attribute “C” can directly determine Attribute “D”.
According to transitive property, As Attribute “C” already determine D So, Attribute “C” can
determine “A” through “D”.
As Attribute “C” already determine A So, Attribute “C” can determine “B” through “A”.
So, Closure of C = C+ = CDAB
Closure of attribute “D”
As, Attribute “D” can determine itself.
According to FD, Attribute “D” can directly determine Attribute “A”.
According to transitive property, As Attribute “D” already determine A So, Attribute “D” can
determine “B” through “A”.
As Attribute “D” already determine B So, Attribute “D” can determine “C” through “B”.
So, Closure of D = D+ = DABC
Conclusion: The closure of each of the attributes “A”, “B”, “C” and “D” contains all the
attributes of the relation, so each attribute on its own can be used as a candidate key.
So, Candidate Keys = {A}, {B}, {C}, {D}
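The same reasoning can be automated by testing attribute sets from smallest to largest and keeping only the minimal ones. The sketch below is a brute-force illustration, not an efficient algorithm; `closure` and `candidate_keys` are names chosen for this example, and the FD set for the first example (A→B, B→C, C→D) is inferred from its worked steps rather than stated explicitly in the notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a super key
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


chain_fds = [("A", "B"), ("B", "C"), ("C", "D")]              # first example (inferred)
cycle_fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # Example 02

chain_keys = candidate_keys("ABCD", chain_fds)
cycle_keys = candidate_keys("ABCD", cycle_fds)
print(chain_keys)
print(cycle_keys)
```

Trying every subset is exponential in the number of attributes, so this approach is only practical for textbook-sized relations.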
Essential attributes are those attributes which are not present on RHS of any functional
dependency.
Example:
Case-01:
If all essential attributes together can determine all remaining non-essential attributes, then-
The set of essential attributes is the only candidate key.
Case-02:
If all essential attributes together cannot determine all remaining non-essential attributes, then-
The set of essential attributes and some non-essential attributes will be the candidate
key(s).
To find the candidate keys, we check different combinations of essential and non-essential
attributes.
We will further understand how to find candidate keys with the help of following problems.
The following practice problems are based on Case-01.
Problem-01:
Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies-
C→F
E→A
EC → D
A→B
Which of the following is a key for R?
1. CD
2. EC
3. AE
4. AC
Solution-
We will find candidate keys of the given relation in the following steps-
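Step-01:
Determine all essential attributes of the given relation.
Essential attributes of the relation are- C and E.
So, attributes C and E will definitely be a part of every candidate key.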
Step-02:
Now,
We will check if the essential attributes together can determine all remaining non-essential
attributes.
To check, we find the closure of CE.
So, we have-
{ CE }+
={C,E}
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
We conclude that CE can determine all the attributes of the given relation.
So, CE is the only possible candidate key of the relation.
Thus, Option 2 (EC) is correct.
Problem-02:
Let R = (A, B, C, D, E) be a relation scheme with the following dependencies-
AB → C
C→D
B→E
Solution-
We will find candidate keys of the given relation in the following steps-
Step-01:
Determine all essential attributes of the given relation.
Essential attributes of the relation are- A and B.
So, attributes A and B will definitely be a part of every candidate key.
Step-02:
Now,
We will check if the essential attributes together can determine all remaining non-essential
attributes.
To check, we find the closure of AB.
So, we have-
{ AB }+
={A,B}
= { A , B , C } ( Using AB → C )
= { A , B , C , D } ( Using C → D )
= { A , B , C , D , E } ( Using B → E )
We conclude that AB can determine all the attributes of the given relation.
Thus, AB is the only possible candidate key of the relation.
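The essential-attribute shortcut used in Problem-01 and Problem-02 can be sketched in a few lines. The function names `essential_attributes` and `closure` and the string encoding of attributes are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def essential_attributes(attributes, fds):
    """Attributes that never appear on the right-hand side of any FD."""
    on_rhs = set()
    for _, rhs in fds:
        on_rhs |= set(rhs)
    return set(attributes) - on_rhs


# Problem-01 and Problem-02 from the text above.
p1_attrs, p1_fds = "ABCDEF", [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
p2_attrs, p2_fds = "ABCDE", [("AB", "C"), ("C", "D"), ("B", "E")]

p1_essential = essential_attributes(p1_attrs, p1_fds)
p2_essential = essential_attributes(p2_attrs, p2_fds)

# Case-01 applies when the essential attributes alone determine everything.
p1_is_case_01 = closure(p1_essential, p1_fds) == set(p1_attrs)
p2_is_case_01 = closure(p2_essential, p2_fds) == set(p2_attrs)
print(sorted(p1_essential), p1_is_case_01)
print(sorted(p2_essential), p2_is_case_01)
```

When the Case-01 check fails, non-essential attributes have to be added to the essential set and the closure recomputed, as described under Case-02.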
Problem-03:
Consider the relation scheme R(E, F, G, H, I, J, K, L, M, N) and the set of functional dependencies-
{ E, F } → { G }
{F}→{I,J}
{ E, H } → { K, L }
{K}→{M}
{L}→{N}
Problem-04:
Consider the relation scheme R(A, B, C, D, E, H) and the set of functional dependencies-
A→B
BC → D
E→C
D→A
What are the candidate keys of R?
1. AE, BE
2. AE, BE, DE
3. AEH, BEH, BCH
4. AEH, BEH, DEH
Solution-
Step-01:
Determine all essential attributes of the given relation.
Essential attributes of the relation are- E and H.
So, attributes E and H will definitely be a part of every candidate key.
Attribute closure:
A -> ABCDE
B -> BD
C -> C
D -> D
E -> ABCDE
AB -> ABCDE
AC -> ABCDE
AD -> ABCDE
AE -> ABCDE
BC -> ABCDE [ Candidate key ]
BD -> BD [NOT a candidate key ]
BE -> ABCDE
CD -> ABCDE
CE -> ABCDE
DE -> ABCDE
ABC -> ABCDE
ABD -> ABCDE
ABE -> ABCDE
ACD -> ABCDE
ACE -> ABCDE
ADE -> ABCDE
BCD -> ABCDE
BDE -> ABCDE
CDE -> ABCDE
ABCD -> ABCDE
ABCE -> ABCDE
ABDE -> ABCDE
ACDE -> ABCDE
BCDE -> ABCDE
GATE QUESTIONS ON FD
1. Let R= (A, B, C, D, E, F) be a relation scheme with the following dependencies: C->F, E->A, EC->D,
A->B. Which of the following is a key for R?
(a) CD (b) EC (c) AE (d) AC
In option (a), it is given that Z->Y, which means that the value of Z uniquely determines the value
of Y. But here the value 2 of Z gives two different values of Y, i.e. 4 and 2. Therefore this FD is
not satisfied by the instance.
In option (c), it is given that X->Z, which means that the value of X uniquely determines the value
of Z. But here the value 1 of X gives two different values of Z, i.e. 2 and 3. Therefore this FD is
not satisfied by the instance.
In option (d), it is given that Y->X; here the value of Y uniquely determines the value of X, so
this FD is satisfied by the instance. Now take the FD XZ->Y: here (1,3) cannot uniquely determine
the value of Y, because (1,3) gives two values for Y, i.e. 5 and 6. Therefore this FD (XZ->Y) is not
satisfied by the instance.
3. From the following instance of a relational schema R(A, B, C), we can conclude that:
----------
A B C
----------
1 1 1
1 1 0
2 3 2
2 3 2
----------
(a)A functionally determines B and B functionally determines C
(b) A functionally determines B and B does not functionally determine C
(c) B does not functionally determine C
(d) A does not functionally determine B and B does not functionally determine C
Ans: option (c)
Explanation:
Looking into an instance we can't conclude that A functionally determines B. A->B could
hold true to the given instance but not on the entire database.
But we can be sure that B does not functionally determine C because for the value 1 of B, it
gives two different values of C i.e. 1 & 0.
Issue in option (d) - Again from the provided instance we cannot say that A does not
determine B.
A->B may or may not hold. Therefore only possible option is (c).
If any closure includes all attributes of a table then it becomes the candidate key.
Find closure (As explained in question 1) of AEH as below.
Closure of AEH = AEHB {A->B}
= AEHBC {E->C}
= AEHBCD {BC->D}
5. In a schema with attributes A, B, C, D and E, following set of functional dependencies are
given:
A->B
A->C
CD->E
B->D
E->A
Which of the following functional dependencies is NOT implied by the above set?
(a) CD->AC (b) BD->CD (c) BC->CD (d) AC->BC
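Ans: option (b)
Explanation:
As explained in question 1, find the closure of the left-hand side of each option.
CD+ = {A, B, C, D, E}, so CD->AC is implied.
BD+ = {B, D}, which does not contain C, so BD->CD is NOT implied.
BC+ = {A, B, C, D, E}, so BC->CD is implied.
AC+ = {A, B, C, D, E}, so AC->BC is implied.
AF+ = {AFDE}
Option (d) is also false. AB+ = {ABCDG}.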
Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values.
F={CH->G, A->BC, B->CFH, E->A, F->EG} is a set of functional dependencies (FDs) so that F + is
exactly the set of FDs that hold for R.
7. How many candidate keys does the relation R have?
(a) 3 (b) 4 (c) 5 (d) 6
Ans: option (b)
Explanation:
In a relational database, a key helps to uniquely identify each record within a table . A key is a
combination of one or more fields/attributes in a table. If a relational schema has multiple keys,
each key is a candidate key. One of the candidate keys is chosen as the primary key.
To find the candidate keys, we need to find the closure of each attribute. (If x is an attribute
(field), set of attributes determined by x under a set F of functional dependencies is the closure
of x under F, denoted x+ ).
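Thus,
A+ : ABCFHGE
B+ : BCFHEGA
C+ : C
D+ : D
E+ : EABCFHG
F+ : FEGABCH
G+ : G
H+ : H
A+, B+, E+ and F+ contain all attributes except D, and D does not appear on the right-hand side of
any FD, so D must be part of every key. Adding D to each of A, B, E and F (augmentation rule:
augment both sides with D) gives the 4 candidate keys DA, DB, DE and DF.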
8. The relation R is
(a) in 1NF, but not in 2NF.
(b) in 2NF, but not in 3NF.
(c) in 3NF, but not in BCNF.
(d) in BCNF.
Ans: option (a)
Explanation:
An attribute that does not occur in any candidate key is called a non-prime attribute.
Consider F->G; G is a non-prime attribute and F is a proper subset of a candidate key (refer the
above question). This is a case of partial dependency. Hence 2NF condition is violated. similarly
A->C and B->CFH also violates 2NF condition, hence R is not in 2NF.
10. Consider the relation scheme R=(E,F,G,H,I,J,K,L,M,N) and the set of functional
dependencies
{{E,F}→{G},{F}→{I,J},{E,H}→{K,L},{K}→{M},{L}→{N}}
on R. What is the key for R?
(a) {E,F}
(b) {E,F,H}
(c) {E,F,H,K,L}
(d) {E}
A large database defined as a single relation may result in data duplication. This repetition of data
may result in:
So to handle these problems, we should analyze and decompose the relations with redundant data
into smaller, simpler, and well-structured relations that are satisfying desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is the process of minimizing redundancy from a relation or set of relations.
Redundancy in relation may cause insertion, deletion, and update anomalies. So, it helps to
minimize the redundancy in relations. Normal forms are used to eliminate or reduce
redundancy in database tables .
Types of Normal Forms
There are the four types of normal forms
Table should not contain any multi valued attributes. It should only have single (atomic)
valued attributes/columns.
First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Column names of entire tables should be unique.
Note: Primary key will be composite key i.e. (“Std_ID” and “Std_Course”) in above example.
Example:02- Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
The decomposition of the EMPLOYEE table into 1NF has been shown below:
Partial dependency: When a part of a candidate key determines a non-prime attribute, it is called
a partial dependency. Suppose AB is the candidate key. If a part of the candidate key (i.e. A)
determines a non-prime attribute (i.e. X), as in A → X, then it is a partial dependency.
Partial Dependency
A partial dependency is a dependency where only some attributes of the candidate key determine
non-prime attribute(s).
In other words,
A → B is called a partial dependency if and only if-
1. A is a proper subset of some candidate key
2. B is a non-prime attribute.
If any one condition fails, then it will not be a partial dependency.
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key
{STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a
proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are
100 students taking C1 course, we don’t need to store its Fee as 1000 for all the 100 records,
instead, once we can store it in the second table as the course fee for C1 is 1000.
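The saving described in the note can be made concrete by rebuilding the single-table form from Table 1 and Table 2 and then projecting it back. This is a sketch under stated assumptions: the variable names are chosen for illustration, and the un-decomposed table is reconstructed here because the original is not reproduced in these notes.

```python
# Data taken from Table 1 and Table 2 above.
enrolments = [(1, "C1"), (2, "C2"), (1, "C4"), (4, "C3"), (4, "C1"), (2, "C5")]
course_fee = {"C1": 1000, "C2": 1500, "C3": 1000, "C4": 2000, "C5": 2000}

# Single-table form (STUD_NO, COURSE_NO, COURSE_FEE): the fee is repeated
# on every enrolment row of the same course.
combined = [(stud, course, course_fee[course]) for stud, course in enrolments]

# 2NF decomposition: project onto (STUD_NO, COURSE_NO) and (COURSE_NO, COURSE_FEE).
table1 = sorted({(stud, course) for stud, course, _ in combined})
table2 = sorted({(course, fee) for _, course, fee in combined})

c1_fee_copies_before = sum(1 for _, course, _ in combined if course == "C1")
c1_fee_copies_after = sum(1 for course, _ in table2 if course == "C1")

print("C1 fee stored", c1_fee_copies_before, "times before decomposition")
print("C1 fee stored", c1_fee_copies_after, "time after decomposition")
```

With the fee stored once per course, a fee change becomes a single-row update in Table 2 instead of one update per enrolled student.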
Example 2 – Suppose Customer table where attributes are Std_ID, Std_RegNo and Location.
Now note that, above both tables fulfil the conditions of 2NF.
Example:03- Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Second Normal Form (2NF)
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime
attribute on a key.
3NF is used to reduce data duplication. It is also used to achieve data integrity.
A relation in 2NF with no transitive dependency for non-prime attributes is in
third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key, or
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the
super key (EMP_ID), which violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table,
with EMP_ZIP as its primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
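BCNF (Boyce-Codd Normal Form) is an extension of 3NF and is stricter than 3NF. According to
Boyce-Codd normal form, a relation is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key.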
The above table holds the following Candidate keys and FD’s
Candidate keys = RollNo, ID_Card (each is a candidate key on its own)
FD = {RollNo → Name, RollNo → ID_Card, ID_Card → age, ID_Card → RollNo}
The L.H.S of every functional dependency above is a candidate key (and therefore a super key).
So, the above table is in BCNF.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now, this is in BCNF because left side part of both the functional dependencies is a key.
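The BCNF test used in both examples (every determinant must be a super key) can be expressed with the attribute closure. This is a minimal sketch: `is_bcnf` and `closure` are hypothetical helper names, and the attribute list for the RollNo table is an assumption, since that table itself is not reproduced in these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if the left side of every non-trivial FD is a super key."""
    all_attrs = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if closure(lhs, fds) != all_attrs:
            return False
    return True


# First example (attribute list assumed from the FDs).
roll_attrs = ["RollNo", "Name", "ID_Card", "age"]
roll_fds = [(["RollNo"], ["Name"]), (["RollNo"], ["ID_Card"]),
            (["ID_Card"], ["age"]), (["ID_Card"], ["RollNo"])]

# Second example: the EMPLOYEE table before decomposition.
emp_attrs = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
emp_fds = [(["EMP_ID"], ["EMP_COUNTRY"]),
           (["EMP_DEPT"], ["DEPT_TYPE", "EMP_DEPT_NO"])]

roll_ok = is_bcnf(roll_attrs, roll_fds)
emp_ok = is_bcnf(emp_attrs, emp_fds)
emp_country_ok = is_bcnf(["EMP_ID", "EMP_COUNTRY"], [(["EMP_ID"], ["EMP_COUNTRY"])])
print(roll_ok, emp_ok, emp_country_ok)
```

The EMPLOYEE relation fails because neither EMP_ID nor EMP_DEPT alone determines all five attributes, while the decomposed EMP_COUNTRY relation passes.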
Fourth Normal Form (4NF)
A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
For a dependency A → B, if multiple values of B exist for a single value of A, then the dependency
is a multi-valued dependency.
Multivalued Dependency
For a single value of A, more than one value (whether similar or not) of B exists.
A multivalued dependency needs at least 3 attributes (i.e. A, B, C), because in a
multivalued dependency two attributes (i.e. B, C) in the table are independent of each other,
but both attributes (B, C) depend on a third attribute (i.e. A).
The above table does not satisfy the conditions of 4NF because it contains a multivalued dependency.
Explanation
For Std_id = 1 there are two values of Std_course (i.e. CS and English), and likewise for Std_id = 2
there are two values of Std_course (i.e. Java and C#).
The columns “Std_course” and “Std_hobby” are independent of each other, but both depend on
“Std_id”.
Both points show that a multivalued dependency exists in the table, and a table with a
multivalued dependency is not in 4NF.
How to Satisfy 4th Normal Form?
To remove 4NF problem and satisfy the 4NF, we can decompose the “Student” table as given below
Table 01 “Student_Course”   Table 02 “Student_Hobbies”
Each of the two tables now stores only one multivalued fact about a student (courses in one,
hobbies in the other), so the independent attributes no longer appear together. Now the relation
satisfies 4NF.
Example: STUDENT
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth Normal Form (5NF):
The fifth normal form (5NF) is generally not implemented in real-life database design, but the
concept is still worth learning.
A relation is in 5NF if:
It is in 4NF.
It has no join dependency, and any joining is lossless.
5NF is also known as project-join normal form (PJ/NF).
Join Dependency:
We can understand 5NF either through join dependency or by breaking a table down into parts
and rejoining them.
As join dependency is a somewhat confusing topic, let us understand it by breaking a table down
into parts.
Suppose a table SPC has the composite primary key {Supplier, Product, Customer}.
In the above table, a supplier supplies products and a customer can use these products, but note
that the supplier does not directly supply to any customer.
In simple words, Supplier (“Ali”) produces (“ABC”) and Customer (“Nauman”) can use it, but Ali
and Nauman are not directly connected.
Decomposition
After decomposition, the table SPC is split into three parts, as given below.
Note that in table SC the supplier and customer are directly connected, so the information
changes when table SPC is decomposed into parts.
Example-02
In the above table, John takes both the Computer and Math classes in Semester 1, but he does not
take the Math class in Semester 2. In this case, the combination of all three fields is required
to identify a valid record.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will
be taking it, so we would leave Lecturer and Subject as NULL. However, all three columns
together act as the primary key, so we cannot leave the other two columns blank.
To bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
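As a rough illustration (not from the original notes), the Python sketch below rejoins P1, P2 and P3 using the rows listed above. The original three-column table is not reproduced in this section, so the sketch only shows what the three-way join produces and why joining just two of the parts is not enough. The variable names are assumptions chosen for this example.

```python
# Rows copied from P1, P2 and P3 above, stored as sets so each row appears once.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}   # (SEMESTER, SUBJECT)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}            # (SUBJECT, LECTURER)
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}     # (SEMESTER, LECTURER)

# Step 1: join P1 and P2 on SUBJECT -> (SEMESTER, SUBJECT, LECTURER).
p1_p2 = {(sem, subj, lect)
         for sem, subj in p1
         for subj2, lect in p2
         if subj == subj2}

# Step 2: join with P3, keeping rows whose (SEMESTER, LECTURER) pair is in P3.
joined = {row for row in p1_p2 if (row[0], row[2]) in p3}

# Rows produced by the two-way join that the third relation filters out.
spurious = p1_p2 - joined

print(len(p1_p2))   # 7
print(len(joined))  # 5
for row in sorted(spurious):
    print(row)
# ('Semester 1', 'Math', 'Akash')
# ('Semester 2', 'Math', 'John')
```

Joining only P1 and P2 yields two extra rows, for example John teaching Math in Semester 2, which contradicts the description above. Only the join of all three parts removes them, which is the behaviour a join dependency describes.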
Decomposition: Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a
database by dividing a table into multiple tables.
EMPLOYEE_DEPARTMENT table:
The above table is decomposed into two relations, EMPLOYEE and DEPARTMENT.
Now, the two tables are joined on the common column “EMP_ID”.
Employee ⋈ Department
As the joined table is the same as the original table before decomposition, the decomposition is a
lossless join decomposition.
Example −02
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
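E003 Tom 22 Texas Dpt3 Finance
<EmpInfo> is decomposed into its employee columns (Emp_ID, Emp_Name, Emp_Age, Emp_Location) and
the following relation, which keeps Emp_ID as a common column:
<DeptDetails>
Dept_ID Emp_ID Dept_Name
Dpt1 E001 Operations
Dpt2 E002 HR
Dpt3 E003 Finance
Now, a natural join is applied on the two decomposed tables, using the common column Emp_ID −
The result will be −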
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance
Therefore, the above relation has a lossless decomposition, i.e. no information is lost.
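The lossless check can be sketched in a few lines of Python. This is an illustration under an assumption: the employee-side table is taken to hold the columns Emp_ID, Emp_Name, Emp_Age and Emp_Location (called `emp_part` here, a hypothetical name), while `dept_details` matches the <DeptDetails> table shown above.

```python
# <EmpInfo> rows from the table above:
# (Emp_ID, Emp_Name, Emp_Age, Emp_Location, Dept_ID, Dept_Name)
emp_info = {
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
}

# Assumed employee-side projection: (Emp_ID, Emp_Name, Emp_Age, Emp_Location).
emp_part = {(e, n, a, loc) for e, n, a, loc, _, _ in emp_info}

# <DeptDetails> projection: (Dept_ID, Emp_ID, Dept_Name).
dept_details = {(d, e, dn) for e, _, _, _, d, dn in emp_info}

# Natural join on the common column Emp_ID.
rejoined = {(e, n, a, loc, d, dn)
            for e, n, a, loc in emp_part
            for d, e2, dn in dept_details
            if e == e2}

# The decomposition is lossless when the join gives back exactly <EmpInfo>.
print(rejoined == emp_info)  # True
```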
Lossy Decomposition:
As the name suggests, when a relation is decomposed into two or more relational schemas in a lossy
way, information is lost when we try to retrieve the original relation by joining them.
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
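Suppose <EmpInfo> is decomposed into its employee columns (Emp_ID, Emp_Name, Emp_Age,
Emp_Location) and the <DeptDetails> relation above. You won’t be able to join these tables back
into <EmpInfo>, since Emp_ID isn’t part of the DeptDetails relation and the two tables share no
common column.
Therefore, the above relation has a lossy decomposition.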
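A short Python sketch can make the loss visible. It assumes the employee-side table holds Emp_ID, Emp_Name, Emp_Age and Emp_Location (the hypothetical `emp_part` below), so it shares no column with <DeptDetails>. With no common column, a natural join degenerates into a Cartesian product.

```python
from itertools import product

# Original <EmpInfo> rows:
# (Emp_ID, Emp_Name, Emp_Age, Emp_Location, Dept_ID, Dept_Name)
emp_info = {
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
}

# Assumed employee-side projection (no Dept_ID column).
emp_part = {(e, n, a, loc) for e, n, a, loc, _, _ in emp_info}

# <DeptDetails> as shown above: (Dept_ID, Dept_Name), no Emp_ID column.
dept_details = {(d, dn) for _, _, _, _, d, dn in emp_info}

# No shared column, so every employee pairs with every department.
rejoined = {emp + dept for emp, dept in product(emp_part, dept_details)}

spurious = rejoined - emp_info
print(len(rejoined))  # 9
print(len(spurious))  # 6 rows that were never in <EmpInfo>
```

The original three rows are still inside the result, but they can no longer be told apart from the six spurious ones, so the link between each employee and their department is lost.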
Dependency Preserving Decomposition:
As we know, a table decomposition should be lossless, and ideally also dependency preserving, so
that neither data nor constraints are lost.
The decomposition of a relation R with FDs (F) into relations R1 and R2, with FDs (F1) and (F2)
respectively, is dependency preserving if:
“If a relation R is decomposed into relations R1 and R2, then every dependency of R must either be
a part of R1 or R2, or be derivable from the combined dependencies of R1 and R2.”
Note: The union of the attributes of R1 and R2 must be equal to the attributes of the original
relation R.
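The same condition is often written with closures. The notation below is an addition for illustration and is not used elsewhere in these notes: F⁺ denotes the closure of F (every FD derivable from F), and each F_i collects the FDs of F⁺ whose attributes all lie in R_i.

```latex
\[
  F_i = \{\, X \to Y \in F^{+} \mid X \cup Y \subseteq R_i \,\}, \qquad
  \bigl(F_1 \cup F_2\bigr)^{+} = F^{+}, \qquad
  R_1 \cup R_2 = R
\]
```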
Suppose a relation R has attributes A, B, C and D. R is decomposed into R1 with attributes A, B;
R2 with attributes B, C; and R3 with attributes B, D.
Solution
1. First, find the closure of each attribute that appears on the left-hand side of the given FDs of
relation R(ABCD), as given in the following diagram.
2. Second, find all non-trivial FDs of the decomposed relations (R1, R2 and R3), as given below.
3. Third, find those non-trivial FDs that cannot be determined from the given relation R(ABCD).
Let’s check the non-trivial FDs of the decomposed relations R1, R2 and R3 one by one.
a) Check (A→B):
A→B of relation R1(AB) is directly given in the FDs of relation R(ABCD). So, this
dependency can be determined from the original table R(ABCD), because the closure of A in the
original table contains “B”. So, this is a valid dependency.
b) Check (B→A):
B→A of relation R1(AB) is not directly given in the FDs of relation R(ABCD). This
dependency also cannot be derived from the FDs of the original table R(ABCD), because the closure
of B in the original table does not contain “A”. So, this is not a valid dependency.
c) Check (B→C):
B→C of relation R2(BC) is directly given in the FDs of relation R(ABCD). So, this
dependency can be determined from the original table R(ABCD), because the closure of B in the
original table contains “C”. So, this is a valid dependency.
d) Check (C→B):
C→B of relation R2(BC) is not directly given in the FDs of relation R(ABCD), but this
dependency can be determined from the original table R(ABCD), because the closure of C in the
original table contains “B”. So, this is a valid dependency.
e) Check (B→D):
B→D of relation R3(BD) is not directly given in the FDs of relation R(ABCD), but this
dependency can be determined from the original table R(ABCD), because the closure of B in the
original table contains “D”. So, this dependency is valid.
f) Check (D→B):
D→B of relation R3(BD) is directly given in the FDs of relation R(ABCD). So, this
dependency can be determined from the original table R(ABCD), because the closure of D in the
original table contains “B”. Thus, this is also a valid dependency.
So, all of the above non-trivial FDs are valid except the 2nd FD (B→A), as shown in the following
diagram.
4. Find the closure of all valid non-trivial FDs of the decomposed relations (R1, R2 and R3), as
given below.
5. If all dependencies of the given relation are preserved through the valid non-trivial
dependencies of its decomposed tables, then the decomposition preserves the dependencies of the
original table.
We check all valid non-trivial FDs of the decomposed tables (one by one) and see whether
these FDs can preserve all FDs of the original relation. If they do, the decomposition is
dependency preserving; otherwise it is not.
As all four FDs of the original relation are preserved through the valid non-trivial FDs No. 1, 3,
4 and 5, it is a dependency preserving decomposition.
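The five steps above can be sketched in Python. One important assumption: the FD list of R(ABCD) appears only in a diagram that is not reproduced here, so the sketch assumes F = {A→B, B→C, C→D, D→B}. This set is consistent with the six checks above but is not confirmed by the notes. The helpers `closure` and `projected_fds` are hypothetical names for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# ASSUMPTION: FD set inferred from the checks above, not given in the text.
F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]
decomposition = ["AB", "BC", "BD"]  # R1, R2, R3


def projected_fds(schema, fds):
    """Non-trivial FDs X -> Y inside `schema` with single attributes X and Y.
    Single-attribute left sides are enough for two-attribute schemas."""
    return [(x, y) for x in schema for y in schema
            if x != y and y in closure(x, fds)]


# Steps 2 and 3: keep only the FDs that follow from the original relation.
valid = [fd for schema in decomposition for fd in projected_fds(schema, F)]
print(valid)
# [('A', 'B'), ('B', 'C'), ('C', 'B'), ('B', 'D'), ('D', 'B')]

# Steps 4 and 5: each original FD must follow from the valid FDs alone.
preserved = all(set(rhs) <= closure(lhs, valid) for lhs, rhs in F)
print(preserved)  # True
```

B→A does not appear in `valid` because the closure of B under the assumed F is {B, C, D}, which matches check (b) above.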