r18 Dbms Unit-III Part-II

Uploaded by

Pannala Ravi
Copyright
© All Rights Reserved

SCHEMA REFINEMENT

(UNIT-III PART-II)

Data Redundancy

• Data redundancy means that the same data is stored in two or more separate places, for example
in several files, tables or databases. Entering the same data record repeatedly produces
redundant data, and redundant data can lead to a lot of problems.

Problems Caused by Redundancy:

• Anomalies in a DBMS arise when the database holds too much redundant information. They are
usually a sign that the tables making up the database are poorly constructed.

Student Table:

StudRegistration  CourseID  StudName      Address      Course
205               6204      James         Los Angeles  Economics
205               6247      James         Los Angeles  Economics
224               6247      Trent Bolt    New York     Mathematics
230               6204      Ritchie Rich  Egypt        Computer
230               6208      Ritchie Rich  Egypt        Accounts

Two students in the above table, 'James' and 'Ritchie Rich', appear in more than one row. Each
time one of them is entered with a new CourseID, the StudRegistration, StudName and Address
values are stored again.

Insert Anomaly: An insert anomaly occurs when some data cannot be inserted into the table unless
other, unrelated data is supplied with it. For example, in the Student table a new CourseID cannot
be recorded until at least one student has enrolled in that course, because every row also needs
student details. This makes it difficult to insert new records into the table.

Update Anomaly: An update anomaly occurs when duplicated data is updated in one place but not in
every place where it is stored, which leaves the table in an inconsistent state. For example,
James appears in two rows of the Student table. If his address changes, it must be changed in
both rows; if only one row is updated, the table holds two different addresses for the same
student.

Delete Anomaly: A delete anomaly occurs when deleting one piece of information unintentionally
removes other information stored in the same row. For example, if we remove Trent Bolt from the
Student table, we also lose the only row that mentions the Mathematics course. Deleting a student
has therefore also deleted information about a course.

We need to avoid these anomalies in order to maintain the integrity and accuracy of the database
tables. For this reason, database management systems rely on the concept of normalization.
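The update and delete anomalies described above can be reproduced on a small copy of the Student table. The following is a minimal sketch using Python's built-in sqlite3 module; the new address 'Chicago' is a made-up value used only for illustration.

```python
import sqlite3

# In-memory copy of the Student table from the text (values as printed there).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudRegistration INTEGER, CourseID INTEGER, "
    "StudName TEXT, Address TEXT, Course TEXT)"
)
rows = [
    (205, 6204, "James", "Los Angeles", "Economics"),
    (205, 6247, "James", "Los Angeles", "Economics"),
    (224, 6247, "Trent Bolt", "New York", "Mathematics"),
    (230, 6204, "Ritchie Rich", "Egypt", "Computer"),
    (230, 6208, "Ritchie Rich", "Egypt", "Accounts"),
]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", rows)

# Update anomaly: change James's address in only one of his two rows.
# 'Chicago' is a hypothetical new address.
conn.execute(
    "UPDATE Student SET Address = 'Chicago' "
    "WHERE StudRegistration = 205 AND CourseID = 6204"
)
james_addresses = {
    row[0]
    for row in conn.execute(
        "SELECT Address FROM Student WHERE StudRegistration = 205"
    )
}

# Delete anomaly: removing Trent Bolt also removes the only Mathematics row.
conn.execute("DELETE FROM Student WHERE StudRegistration = 224")
maths_rows = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Course = 'Mathematics'"
).fetchone()[0]

print(sorted(james_addresses))  # two different addresses for one student
print(maths_rows)               # no row mentions Mathematics any more
```

After the partial update the table stores two addresses for student 205, and after the delete no trace of the Mathematics course remains.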

Types of Redundancy
There are two levels of redundancy, given below.
1. Row Level Redundancy
When two rows of a table are exactly the same, it is called row level redundancy. A relation
with a primary key never accepts such duplicate rows.

Keep in mind: Row level redundancy can be removed by setting a primary key on the table.

2. Column Level Redundancy

When the same values are repeated in a column of a relation across many rows, it is called column
level redundancy. It is problematic in some cases but not in all cases.

Redundancy Problem Reasons

Redundancy causes problems through the following anomalies.
• Insertion Anomaly
• Deletion Anomaly
• Updation Anomaly
Anomaly: An anomaly is a problem that shows up only in certain cases, that is, only for certain
insert, delete or update operations.
Let us explain all the anomalies using the following table (Student_details).

1. Insertion Anomaly
This problem occurs when a new data record cannot be inserted without adding some additional,
unrelated data to the record.
Syntax:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example: Suppose a new student has to be inserted while the course and faculty are not yet
decided. The student cannot be inserted properly until the course and faculty are decided. In the
following SQL query the course and faculty columns can only be filled with empty values:

INSERT INTO Student_details (Std_ID, STD_Name, Course_ID, Course_Name, Faculty_ID,
Faculty_Name, Faculty_Fee)
VALUES ('5', 'Khalid', '', '', '', '', '');


2. Deletion Anomaly

This anomaly occurs when deleting a record also loses other information that was stored as part
of the deleted record.
Syntax:
DELETE FROM table_name WHERE condition;
For example:
SQL Query: DELETE FROM Student_details WHERE Std_ID = 2;
Executing the above query also loses the information about Course 2, so this deletion causes an
anomaly.

3. Updation Anomaly
This anomaly occurs when changing a single fact requires changing many rows, because the fact is
stored redundantly.
Syntax:
UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
For example:
SQL Query: UPDATE Student_details SET Faculty_Fee = '15K' WHERE Faculty_Name = 'Ali';
If we want to change the Faculty_Fee of Ali from 10K to 15K, the query has to update every row in
which Ali appears. If any of those rows is missed, the table shows two different fees for the same
faculty member.

Solution of Removing Anomalies


One of the best solutions to remove the above anomalies is to divide (decompose) the table into
smaller tables, so that each fact is stored only once.
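One possible decomposition can be sketched as follows. The split into Student, Faculty, Course and Enrollment tables is an assumption made here for illustration (the notes do not show the resulting tables), and the sample rows other than student 5 'Khalid' and faculty 'Ali' with fee 10K are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed decomposition of Student_details; table layout is illustrative only.
conn.executescript("""
CREATE TABLE Student (Std_ID INTEGER PRIMARY KEY, Std_Name TEXT NOT NULL);
CREATE TABLE Faculty (Faculty_ID INTEGER PRIMARY KEY,
                      Faculty_Name TEXT, Faculty_Fee TEXT);
CREATE TABLE Course (Course_ID INTEGER PRIMARY KEY, Course_Name TEXT,
                     Faculty_ID INTEGER REFERENCES Faculty(Faculty_ID));
CREATE TABLE Enrollment (Std_ID INTEGER REFERENCES Student(Std_ID),
                         Course_ID INTEGER REFERENCES Course(Course_ID),
                         PRIMARY KEY (Std_ID, Course_ID));
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Faculty VALUES (1, 'Ali', '10K')")
conn.execute("INSERT INTO Course VALUES (2, 'Course 2', 1)")
conn.execute("INSERT INTO Student VALUES (2, 'Student 2')")
conn.execute("INSERT INTO Enrollment VALUES (2, 2)")

# Insertion: a student can now be stored before any course is chosen.
conn.execute("INSERT INTO Student VALUES (5, 'Khalid')")
khalid = conn.execute(
    "SELECT Std_Name FROM Student WHERE Std_ID = 5"
).fetchone()[0]

# Updation: the fee is stored once, so exactly one row changes.
updated_rows = conn.execute(
    "UPDATE Faculty SET Faculty_Fee = '15K' WHERE Faculty_Name = 'Ali'"
).rowcount

# Deletion: removing student 2 no longer removes Course 2.
conn.execute("DELETE FROM Enrollment WHERE Std_ID = 2")
conn.execute("DELETE FROM Student WHERE Std_ID = 2")
courses_left = conn.execute(
    "SELECT COUNT(*) FROM Course WHERE Course_ID = 2"
).fetchone()[0]

print(khalid, updated_rows, courses_left)
```

Each of the three operations that caused an anomaly on the single wide table now touches only the table that owns the fact in question.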
Disadvantages of Data Redundancy

Drawbacks include the following points:

1. Data inconsistency: Data inconsistency refers to the same data existing in different,
conflicting versions in multiple places. Redundant data leads to inconsistent duplicates and
therefore to meaningless or unreliable information in a company's database.

2. Data corruption is increased: Data corruption refers to damage to data due to errors in
reading, writing, storage or processing. The risk grows when the same data fields are repeated
in a database or file storage system, because every extra copy is another place where damage
can occur. Corrupted files can generate error messages for customers and leave their tasks
incomplete.

3. Database size increases: Redundant data increases the size and complexity of the database,
which makes maintenance a challenge. A larger database leads to longer load times, and more
time is spent on completing daily tasks.

4. Cost increases: Redundant data increases storage costs, which can affect the profits and goals
of a company. The implementation of a database system becomes more expensive.

5. Additional space consumed: Redundant data takes up additional space, which adds up over
time to form bloated databases. This can make it harder for companies to meet the demands of
their customers.

1. What is redundant data or redundancy?
Redundant data is data that is stored in several tables at the same time or that occurs more
than once within a table. It increases the amount of data considerably, because the same
data has to be saved multiple times.
2. What happens if we neglect normalization in a database?
Normalization is a technique that protects us from redundancy. For example, if a table has a
"posted by" column and we store the name of the user in it instead of the user's ID,
maintenance becomes a lengthy process and the data can become inconsistent.
3. What happens if we use redundant or repeated data in a table?
Without normalization, storage is wasted. To protect ourselves from inconsistency, we
should avoid redundant data.
4. Does normalization help to solve the problem of inconsistent data?
Yes, by normalizing our database tables we can keep the data consistent.
5. What will happen if we have redundant data?
Update commands take longer, because the same value has to be changed in many rows, and
updates on very large tables can overload the database.
6. Are consistency and inconsistency interrelated?
Yes. A database must keep its data consistent to avoid the problems of redundancy and
inconsistency, which is why consistency and inconsistency are said to be interrelated.

Functional dependency

• Functional dependency in a DBMS, as the name suggests, is a relationship between attributes of
a table in which one attribute depends on another. Introduced by E. F. Codd, it helps in
preventing data redundancy and in recognizing bad designs.

• To understand the concept thoroughly, let us consider a relation P with attributes A and B.
A functional dependency is represented by -> (arrow sign).

Then the following represents the functional dependency between the attributes:

A -> B
The above suggests the following: A functionally determines B, or equivalently B is functionally
dependent on A. Whenever two tuples of P agree on A, they must also agree on B. A is called the
determinant and B the dependent.

Example:01
• The following example makes functional dependency easier to understand.
• We have a <Department> table with two attributes, DeptId and DeptName.
• DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute: if you
want to know the department name, you first need to have the DeptId.

DeptId  DeptName
001     Finance
002     Marketing
003     HR

Therefore, DeptName is functionally dependent on DeptId, which is written as

DeptId -> DeptName

Example:02

roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2

From the above table we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}: roll_no can determine the values of the fields
name, dept_name and dept_building, hence this is a valid functional dependency.
• roll_no → dept_name: since roll_no can determine the whole set {name, dept_name,
dept_building}, it can also determine its subset dept_name.
• dept_name → dept_building: dept_name identifies dept_building accurately, since every row
with the same dept_name has the same dept_building (CO is always in A4, IT is always in A3).
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name: students with the same name can have different dept_name (xyz is in both
CO and IT), hence this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same building. For
example, in the above table departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, dept_building → roll_no, etc.
{name, dept_name} → roll_no is also invalid in general, because two students of one department
may share a name, although these six rows happen not to contradict it.
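Whether a given set of rows contradicts a functional dependency can be checked mechanically. The sketch below uses a hypothetical helper, `fd_holds`, written for this example; it reports whether two rows agree on the left side but differ on the right side. A sample table can only refute a dependency: passing the check does not prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("roll_no", "name", "dept_name", "dept_building")
data = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
students = [dict(zip(columns, r)) for r in data]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(students, ["dept_name"], ["dept_building"]))
print(fd_holds(students, ["name"], ["dept_name"]))
print(fd_holds(students, ["dept_building"], ["dept_name"]))
```

The first two checks succeed and the last two fail, matching the valid and invalid dependencies listed above.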
Types of Functional Dependencies

There are two types of functional dependencies: trivial and non-trivial.

Let us explain with an example:

Suppose we have a Student table with attributes Student_Id, Student_Name and Student_address.
Here the Student_Id attribute can uniquely identify the Student_Name and Student_address
attributes of the Student table.

1. Trivial FD:
• A → B is a trivial functional dependency if B is a subset of A.
• It is the case where the dependent attributes are already contained in the determinant, so the
dependency holds automatically.
• The following dependencies are also trivial: A → A, B → B

Example:
• {Student_Id, Student_Name} → Student_Id is a trivial functional dependency,
as Student_Id is a subset of {Student_Id, Student_Name}.
• Also, Student_Id → Student_Id and Student_address → Student_address are trivial
dependencies.

Keep In Mind
• The intersection of the left hand side and the right hand side of a trivial FD is never empty:
• L.H.S ∩ R.H.S ≠ Ø
• Trivial FDs are valid in every case and never cause problems in transactions.

2. Non-Trivial FD

• A → B is a non-trivial functional dependency if B is not a subset of A.
• It is the case where the dependent attribute is not already contained in the determinant.
Example:

In the following example, Student_Address is derived from Student_ID, but not directly (it is
reached through Student_Name):
• Student_ID → Student_Name
• Student_Name → Student_Address

Keep In Mind
• When the intersection of A and B is empty (A ∩ B = Ø), A → B is called completely non-trivial.
• A non-trivial FD that is not completely non-trivial shares some attributes between A and B,
but B is still not a subset of A.
• Non-trivial FDs are not valid in every case; they hold only when the data actually satisfies
them.
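The distinction between trivial, non-trivial and completely non-trivial dependencies depends only on how the two attribute sets overlap, so it can be written as a few set comparisons. The function name `classify_fd` is a hypothetical helper introduced for this sketch.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # RHS is a subset of LHS
    if lhs & rhs:
        return "non-trivial"             # sets overlap, RHS not contained in LHS
    return "completely non-trivial"      # LHS and RHS share no attribute


print(classify_fd({"Student_Id", "Student_Name"}, {"Student_Id"}))
print(classify_fd({"Student_Id"}, {"Student_Name"}))
print(classify_fd({"Student_Id", "Student_Name"},
                  {"Student_Name", "Student_address"}))
```

The classification says nothing about whether the dependency actually holds in the data; only trivial dependencies are guaranteed to hold.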

There are mainly four types of Functional Dependency in DBMS. Following are the types of
Functional Dependencies in DBMS:

1. Multivalued Dependency
2. Trivial Functional Dependency
3. Non-Trivial Functional Dependency
4. Transitive Dependency
1. Multivalued Dependency in DBMS

• Multivalued dependency occurs in the situation where there are multiple independent
multivalued attributes in a single table.
• A multivalued dependency is a full constraint between two sets of attributes in a
relation. It requires that certain tuples be present in the relation.
• Consider the following example to understand multivalued dependency.

Example:

Car_model  Maf_year  Color
H001       2017      Metallic
H001       2017      Green
H005       2018      Metallic
H005       2018      Blue
H010       2015      Metallic
H033       2012      Gray

• In this example, maf_year and color are independent of each other but both depend on
car_model. These two columns are said to be multivalued dependent on car_model.
This dependence is written with a double arrow:
car_model ->> maf_year
car_model ->> color
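A multivalued dependency X ->> Y can be tested on a table instance: within each group of rows that share the same X value, every Y value must appear together with every value of the remaining columns. The sketch below assumes a hypothetical helper `mvd_holds`; the extra row used for the failing case (H001, 2018, Red) is invented for illustration.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->> Y on a table instance; z lists the remaining columns."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Each group must contain every combination of its Y values and Z values.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


columns = ("car_model", "maf_year", "color")
data = [
    ("H001", 2017, "Metallic"),
    ("H001", 2017, "Green"),
    ("H005", 2018, "Metallic"),
    ("H005", 2018, "Blue"),
    ("H010", 2015, "Metallic"),
    ("H033", 2012, "Gray"),
]
cars = [dict(zip(columns, r)) for r in data]
print(mvd_holds(cars, ["car_model"], ["maf_year"], ["color"]))

# Hypothetical extra row: H001 now has two years and three colors, but not
# all six combinations are present, so the dependency is violated.
broken = cars + [dict(zip(columns, ("H001", 2018, "Red")))]
print(mvd_holds(broken, ["car_model"], ["maf_year"], ["color"]))
```

This is the sense in which a multivalued dependency "requires that certain tuples be present": the missing combinations would have to be added for the constraint to hold.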

2. Trivial Functional Dependency in DBMS

• A functional dependency is called trivial if the set of attributes on its right side is
included in the set of attributes on its left side.
• So, X -> Y is a trivial functional dependency if Y is a subset of X. Let's understand this
with an example.

For example:

Emp_id  Emp_name
AS555   Harry
AS811   George
AS999   Kevin

Consider this table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency, as Emp_id is a subset of
{Emp_id, Emp_name}.

3. Non Trivial Functional Dependency in DBMS

A functional dependency A -> B is called non-trivial when it holds and B is not a subset of A.

Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Apple      Tim Cook       57

Example:

{Company} -> {CEO} (if we know the Company, we know the CEO's name)

CEO is not a subset of Company, and hence this is a non-trivial functional dependency.
4. Transitive Dependency in DBMS
A transitive dependency is a functional dependency that is formed indirectly by two other
functional dependencies. Let's understand this with the following example.

Example:

Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Alibaba    Jack Ma        54

• {Company} -> {CEO} (if we know the company, we know its CEO's name)
• {CEO} -> {Age} (if we know the CEO, we know the CEO's age)
• Therefore, according to the rule of transitive dependency:
• {Company} -> {Age} should hold. That makes sense, because if we know the company name,
we can find its CEO's age.

Note: A transitive dependency can only occur in a relation of three or more attributes.

Example-02

1. Trivial Functional Dependency

• In a trivial functional dependency, the dependent is always a subset of the determinant.
i.e. if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
For example,

roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
2. Non-trivial Functional Dependency
• In a non-trivial functional dependency, the dependent is not a subset of the determinant.
• i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional
dependency.
For example,

roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a
subset of {roll_no, name}.
3. Multivalued Functional Dependency
• In a multivalued functional dependency, the attributes of the dependent set do not depend on
each other.
• i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
For example,

roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
45       abc   19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name
and age do not depend on each other (i.e. neither name → age nor age → name holds).
Note that this is an informal use of the term. The multivalued dependency described earlier in
this unit, which requires certain tuples to be present in the relation, is a different constraint.
4. Transitive Functional Dependency
• In a transitive functional dependency, the dependent is indirectly dependent on the
determinant.
• i.e. if a → b and b → c, then according to the axiom of transitivity, a → c. This is a
transitive functional dependency.
For example,

enrol_no  name  dept  building_no
42        abc   CO    4
43        pqr   EC    2
44        xyz   IT    1
45        abc   EC    2

Here, enrol_no → dept and dept → building_no.
Hence, according to the axiom of transitivity, enrol_no → building_no is a valid functional
dependency. This is an indirect functional dependency, hence it is called a transitive functional
dependency.
Inference Rule (IR)
• Using the inference rules, we can derive additional FDs (functional dependencies) from an
initial set of FDs.
Types of Inference Rule
There are 6 inference rules for functional dependencies. In each rule, A, B, C and W stand for
attributes or sets of attributes.

1. Reflexive Rule (IR1)
• In the reflexive rule, if B is a subset of A, then A determines B.
• In particular, every attribute determines itself.
• If A ⊇ B then A → B
Example:
A → A, B → B, AB → B
2. Augmentation Rule (IR2)
• In the augmentation rule, adding the same attributes to both sides of a dependency gives
another valid dependency.
• If A → B then AC → BC
3. Transitive Rule (IR3)
• In the transitive rule, if A determines B and B determines C, then A must also determine C.
• If A → B and B → C, then A → C

4. Union Rule (IR4)
• The union rule says that if A determines B and A determines C, then A must also determine
B and C together.
• If A → B and A → C then A → BC
5. Decomposition Rule (IR5)
• The decomposition rule is also called the project rule. It is the reverse of the union rule.
• If A → BC then A → B and A → C
6. Pseudo Transitive Rule (IR6)
• According to the pseudo transitive rule, if A determines B and BC determines W, then AC
determines W.
• If A → B and BC → W then AC → W
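The union and pseudo transitive rules do not need to be assumed separately; they follow from IR1 to IR3. A short derivation, written here as a worked sketch in LaTeX, is:

```latex
\begin{align*}
\textbf{Union (IR4):}\quad
  & A \to B,\quad A \to C                && \text{(given)} \\
  & A \to AB                             && \text{(IR2: augment } A \to B \text{ with } A\text{)} \\
  & AB \to BC                            && \text{(IR2: augment } A \to C \text{ with } B\text{)} \\
  & A \to BC                             && \text{(IR3 on the two previous lines)} \\[6pt]
\textbf{Pseudo transitive (IR6):}\quad
  & A \to B,\quad BC \to W               && \text{(given)} \\
  & AC \to BC                            && \text{(IR2: augment } A \to B \text{ with } C\text{)} \\
  & AC \to W                             && \text{(IR3 with } BC \to W\text{)}
\end{align*}
```

Decomposition (IR5) follows in the same way: from A → BC and the reflexive dependency BC → B, transitivity gives A → B, and likewise A → C.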
Closure of an Attribute Set-
• The set of all those attributes which can be functionally determined from an attribute set is
called the closure of that attribute set.

• The closure of attribute set {X} is denoted as {X}+.

Steps to Find Closure of an Attribute Set-

The following steps are followed to find the closure of an attribute set-

• Step-01: Add the attributes contained in the attribute set for which the closure is being
calculated to the result set.
• Step-02: Repeatedly add to the result set the attributes which can be functionally determined
from the attributes already contained in the result set, until no more attributes can be added.
Example-
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC
BC → DE
D → F
CF → G
Now, let us find the closure of some attributes and attribute sets-

Closure of attribute A-
A+ = { A }
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D , F } ( Using D → F )
We can not determine any other attribute using attributes D and F contained in the result set.
Thus,
D+ = { D , F }
Closure of attribute set {B, C}-
{ B , C }+= { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
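The two steps above translate directly into a loop that keeps applying functional dependencies until the result set stops growing. The sketch below is one possible implementation; it assumes single-letter attribute names, so that a string such as "BC" can stand for the set {B, C}.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)                      # Step-01: start with the set itself
    changed = True
    while changed:                           # Step-02: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Functional dependencies of R(A, B, C, D, E, F, G) from the example above.
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(sorted(closure("A", fds)))
print(sorted(closure("D", fds)))
print(sorted(closure("BC", fds)))
```

The three calls reproduce the closures of A, D and {B, C} worked out by hand above.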
Finding the Keys Using Closure-

Super Key-
• If the closure of an attribute set contains all the attributes of the relation, then that
attribute set is called a super key of that relation.

• Thus, we can say-

"The closure of a super key is the entire relation schema."

Example-
In the above example,
• The closure of attribute A is the entire relation schema.

• Thus, attribute A is a super key for that relation.

Candidate Key-
• If an attribute set is a super key and no proper subset of it is a super key, then that
attribute set is called a candidate key of that relation. In other words, a candidate key is a
minimal super key.

Example-
In the above example,
• A is a super key, and its only proper subset is the empty set, whose closure does not contain
all the attributes of the relation.

• Thus, attribute A is also a candidate key for that relation.
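Both definitions can be tested with the closure. The sketch below uses hypothetical helper names (`is_super_key`, `is_candidate_key`) and again assumes single-letter attribute names. For minimality it is enough to try removing one attribute at a time, because any superset of a super key is itself a super key.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_super_key(attrs, relation, fds):
    """A super key's closure contains every attribute of the relation."""
    return closure(attrs, fds) >= set(relation)


def is_candidate_key(attrs, relation, fds):
    """A candidate key is a super key that stops being one if any attribute is dropped."""
    attrs = set(attrs)
    if not is_super_key(attrs, relation, fds):
        return False
    return all(not is_super_key(attrs - {a}, relation, fds) for a in attrs)


relation = "ABCDEFG"
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(is_super_key("A", relation, fds), is_candidate_key("A", relation, fds))
print(is_super_key("AB", relation, fds), is_candidate_key("AB", relation, fds))
print(is_super_key("BC", relation, fds))
```

Here A is both a super key and a candidate key, AB is a super key only, and BC is neither.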


Method to Find Closure of an Attribute Set
• Start from the attribute set and repeatedly add every attribute that the FDs let us determine
from the attributes collected so far.
• Use the closures to find the candidate key.
Whenever we need to find a closure in a relation, the following two things will be given:
• A relation with its attributes, e.g. R(ABCDEF)
• Functional dependencies (FD), e.g. {A→B, BC→D, E→F}
We need to find the closure of all those attributes which are on the left side of an FD.
Let us explain with multiple examples.
Example 01
Suppose a relation R contains four attributes A, B, C and D, with the functional dependencies
given below.
• R = {A, B, C, D}
• FD = {A→B, B→C, C→D}
Attributes A, B and C are present on the left side of the FDs and D is the only remaining
attribute, so we will find the closure of all four attributes.
The closures of all attributes are given below.
➢ Closure of attribute "A"
• According to the reflexive rule, attribute "A" can determine attribute "A" itself.
• According to the given FD, attribute "A" can directly determine attribute "B".
• According to the transitive property, attribute "A" can determine "C" through "B".
• According to the transitive property, as attribute "A" already determines "C", attribute "A"
can determine "D" through "C".
• So, Closure of A = A+ = ABCD
➢ Closure of attribute "B"
• According to the reflexive rule, attribute "B" can determine attribute "B" itself.
• According to the given FD, attribute "B" can directly determine attribute "C".
• According to the transitive property, attribute "B" can determine "D" through "C".
• Attribute "B" cannot determine attribute "A".
• So, Closure of B = B+ = BCD
➢ Closure of attribute "C"
• According to the reflexive rule, attribute "C" can determine attribute "C" itself.
• According to the given FD, attribute "C" can directly determine attribute "D".
• Attribute "C" cannot determine attributes "A" and "B".
• So, Closure of C = C+ = CD
➢ Closure of attribute "D"
• According to the reflexive rule, attribute "D" can determine attribute "D" itself.
• Attribute "D" cannot determine attributes "A", "B" and "C".
• So, Closure of D = D+ = D

Conclusion: Only the closure of attribute "A" contains all the attributes of the relation, so
attribute "A" can be used as the candidate key.
So, Candidate Key = {A}

Keep In Mind:
• The attribute sets AB, AC, AD, ABC, ACD and ABCD can also determine all the attributes in the
relation, but they cannot be considered candidate keys.
• This is because a candidate key is a minimal key that determines all attributes in the
relation. So "A" is a candidate key, and any combination of A with other attributes (AB, AC, AD,
ABC, ACD or ABCD) is considered a super key.

Example 02
Suppose R = {A, B, C, D} and FD = {A→B, B→C, C→D, D→A}.
As attributes A, B, C and D are all present on the left side of an FD, we will find the closure
of all these attributes.
➢ Closure of attribute "A"
• Attribute "A" can determine itself.
• According to the FD, attribute "A" can directly determine attribute "B".
• According to the transitive property, attribute "A" can determine "C" through "B".
• According to the transitive property, as attribute "A" already determines "C", attribute "A"
can determine "D" through "C".
• So, Closure of A = A+ = ABCD
➢ Closure of attribute "B"
• Attribute "B" can determine itself.
• According to the FD, attribute "B" can directly determine attribute "C".
• According to the transitive property, attribute "B" can determine "D" through "C".
• According to the transitive property, as attribute "B" already determines "D", attribute "B"
can determine "A" through "D".
• So, Closure of B = B+ = BCDA
➢ Closure of attribute "C"
• Attribute "C" can determine itself.
• According to the FD, attribute "C" can directly determine attribute "D".
• According to the transitive property, as attribute "C" already determines "D", attribute "C"
can determine "A" through "D".
• As attribute "C" already determines "A", attribute "C" can determine "B" through "A".
• So, Closure of C = C+ = CDAB
➢ Closure of attribute "D"
• Attribute "D" can determine itself.
• According to the FD, attribute "D" can directly determine attribute "A".
• According to the transitive property, as attribute "D" already determines "A", attribute "D"
can determine "B" through "A".
• As attribute "D" already determines "B", attribute "D" can determine "C" through "B".
• So, Closure of D = D+ = DABC

Conclusion: The closure of each of the attributes "A", "B", "C" and "D" contains all the
attributes of the relation, so each attribute on its own can be used as a candidate key.
So, Candidate Keys = {A}, {B}, {C}, {D}
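For small relations such as those in Example 01 and Example 02, all candidate keys can be found by brute force: try attribute sets in order of increasing size and keep every set whose closure is the whole relation and which does not contain a smaller key. This is a sketch for checking hand calculations, not an efficient algorithm; `candidate_keys` is a hypothetical helper name and attribute names are assumed to be single letters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Enumerate all minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(relation))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a super key
            if closure(candidate, fds) >= set(attrs):
                keys.append(candidate)
    return keys


example_01 = candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "D")])
example_02 = candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
print(example_01)
print(example_02)
```

Example 01 yields the single key {A}, while the extra dependency D→A in Example 02 makes every single attribute a key.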

Example 03: Find the closure and candidate key

Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-

• A → BC
• BC → DE
• D → F
• CF → G
Here we will find the closure of "A", "BC", "D" and "CF". Keep in mind that we first find the
closure of the single attributes (A and D) and then the closure of the attribute sets
("BC" and "CF").
Now, let us find the closure of these attributes and attribute sets-
➢ Closure of attribute A
• A+ = { A } According to the reflexive property, A can determine itself.
• Using the FD A → BC and the decomposition rule, "A" can determine "B" and "C".
A+ = { A , B , C }
• As B and C are already in A+, attributes D and E can also be determined using BC → DE.
• A+ = { A , B , C , D , E }
• As attribute "D" is in the closure of "A", attribute "F" can also be determined using D → F.
• A+ = { A , B , C , D , E , F }
• As attributes C and F are in the closure of "A", attribute "G" can also be determined using
CF → G.
• A+ = { A , B , C , D , E , F , G }
Thus,
A+ = { A , B , C , D , E , F , G }
➢ Closure of attribute D
• D+ = { D }
• D+ = { D , F } ( Using D → F )
• We cannot determine any other attribute using attributes D and F contained in the result set.
Thus,
• D+ = { D , F }
• "D" is not a candidate key because it cannot determine the attributes "A", "B", "C", "E"
and "G".
➢ Closure of attribute set {B, C}
• { BC }+ = { B , C }
• { BC }+ = { B , C , D , E } ( Using BC → DE )
• { BC }+ = { B , C , D , E , F } ( Using D → F )
• { BC }+ = { B , C , D , E , F , G } ( Using CF → G )
Thus,
• { BC }+ = { B , C , D , E , F , G }
• "BC" is not a candidate key because it cannot determine the attribute "A".
➢ Closure of attribute set {C, F}
• { CF }+ = { C , F }
• { CF }+ = { C , F , G } ( Using CF → G )
Thus,
• { CF }+ = { C , F , G }
• "CF" is not a candidate key because it cannot determine the attributes "A", "B", "D" and "E".
Candidate key
So the only candidate key is "A", because the closure of A contains all the attributes of the
relation.
Finding Candidate Keys
We can determine the candidate keys of a given relation using the following steps-
Step-01:

• Determine all essential attributes of the given relation.

• Essential attributes are those attributes which are not present on the RHS of any functional
dependency.

• Essential attributes are always a part of every candidate key.

• This is because they cannot be determined by other attributes.

Example:

Let R(A, B, C, D, E, F) be a relation scheme with the following functional dependencies-

A → B
C → D
D → E
Here, the attributes which are not present on the RHS of any functional dependency are A, C and F.
So, essential attributes are- A, C and F.
Step-02:
• The remaining attributes of the relation are non-essential attributes.
• This is because each of them appears on the RHS of some functional dependency and so may be
determined by other attributes.

Now, the following two cases are possible-

Case-01:

If all essential attributes together can determine all remaining non-essential attributes, then-

• The combination of essential attributes is the candidate key.

• It is the only possible candidate key.

Case-02:

If all essential attributes together cannot determine all remaining non-essential attributes, then-

• The set of essential attributes and some non-essential attributes will be the candidate
key(s).

• In this case, multiple candidate keys are possible.

• To find the candidate keys, we check different combinations of essential and non-essential
attributes.
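The two steps and two cases above can be sketched in code as follows. The helper names are hypothetical and attribute names are assumed to be single letters. The first call uses the example from Step-01; the second uses the relation R(A, B, C, D, E, H) from Problem-04 of this unit, where the essential attributes alone are not enough (Case-02).

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def essential_attributes(relation, fds):
    """Step-01: attributes that never appear on the RHS of any FD."""
    rhs_attrs = set()
    for _, rhs in fds:
        rhs_attrs |= set(rhs)
    return set(relation) - rhs_attrs


def candidate_keys(relation, fds):
    """Step-02: extend the essential attributes until the closure is complete."""
    relation = set(relation)
    essential = essential_attributes(relation, fds)
    if closure(essential, fds) >= relation:
        return [essential]                      # Case-01: the only candidate key
    others = sorted(relation - essential)       # Case-02: add non-essential attributes
    keys = []
    for size in range(1, len(others) + 1):
        for extra in combinations(others, size):
            candidate = essential | set(extra)
            if any(key <= candidate for key in keys):
                continue                        # not minimal
            if closure(candidate, fds) >= relation:
                keys.append(candidate)
    return keys


case_01 = candidate_keys("ABCDEF", [("A", "B"), ("C", "D"), ("D", "E")])
case_02 = candidate_keys("ABCDEH", [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")])
print(case_01)
print(case_02)
```

In the first relation the essential attributes A, C and F already form the only candidate key; in the second, E and H must be combined with A, B or D.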
We will further understand how to find candidate keys with the help of following problems.
The following practice problems are based on Case-01.
Problem-01:
Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies-
C → F
E → A
EC → D
A → B
Which of the following is a key for R?
(A) CD
(B) EC
(C) AE
(D) AC

Solution-

We will find the candidate keys of the given relation in the following steps-

Step-01:

• Determine all essential attributes of the given relation.

• Essential attributes of the relation are- C and E.

• So, attributes C and E will definitely be a part of every candidate key.

Step-02:

Now,
• We will check if the essential attributes together can determine all remaining non-essential
attributes.
• To check, we find the closure of CE.
So, we have-
{ CE }+
= { C , E }
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
We conclude that CE can determine all the attributes of the given relation.
So, CE is the only possible candidate key of the relation.
Thus, Option (B) is correct.
Problem-02:
Let R = (A, B, C, D, E) be a relation scheme with the following dependencies-
AB → C
C → D
B → E
Solution-
We will find the candidate keys of the given relation in the following steps-
Step-01:
• Determine all essential attributes of the given relation.
• Essential attributes of the relation are- A and B.
• So, attributes A and B will definitely be a part of every candidate key.
Step-02:
Now,
• We will check if the essential attributes together can determine all remaining non-essential
attributes.
• To check, we find the closure of AB.
So, we have-
{ AB }+
= { A , B }
= { A , B , C } ( Using AB → C )
= { A , B , C , D } ( Using C → D )
= { A , B , C , D , E } ( Using B → E )
We conclude that AB can determine all the attributes of the given relation.
Thus, AB is the only possible candidate key of the relation.

Problem-03:
Consider the relation scheme R(E, F, G, H, I, J, K, L, M, N) and the set of functional dependencies-
{ E, F } → { G }
{ F } → { I, J }
{ E, H } → { K, L }
{ K } → { M }
{ L } → { N }

What is the key for R?

(A) { E, F }
(B) { E, F, H }
(C) { E, F, H, K, L }
(D) { E }
Solution-
We will find the candidate keys of the given relation in the following steps-
Step-01:
• Determine all essential attributes of the given relation.
• Essential attributes of the relation are- E, F and H.
• So, attributes E, F and H will definitely be a part of every candidate key.
Step-02:
Now,
• We will check if the essential attributes together can determine all remaining non-essential
attributes.
• To check, we find the closure of EFH.
So, we have-
{ EFH }+
= { E , F , H }
= { E , F , G , H } ( Using EF → G )
= { E , F , G , H , I , J } ( Using F → IJ )
= { E , F , G , H , I , J , K , L } ( Using EH → KL )
= { E , F , G , H , I , J , K , L , M } ( Using K → M )
= { E , F , G , H , I , J , K , L , M , N } ( Using L → N )
We conclude that EFH can determine all the attributes of the given relation.
So, EFH is the only possible candidate key of the relation.
Thus, Option (B) is correct.

Problem-04:
Consider the relation scheme R(A, B, C, D, E, H) and the set of functional dependencies-
A→B
BC → D
E→C
D→A
What are the candidate keys of R?
1. AE, BE
2. AE, BE, DE
3. AEH, BEH, BCH
4. AEH, BEH, DEH
Solution-
Step-01:
 Determine all essential attributes of the given relation.
 Essential attributes of the relation are- E and H.
 So, attributes E and H will definitely be a part of every candidate key.

The only possible option is (D).


Thus, Option (D) is correct.
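
The essential-attribute step used in the problems above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names `closure` and `essential_attributes`, and the encoding of attributes as single letters in strings, are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def essential_attributes(attributes, fds):
    """Attributes that never appear on the right side of any FD."""
    derivable = set()
    for _, rhs in fds:
        derivable |= set(rhs)
    return set(attributes) - derivable


# Problem-04: R(A, B, C, D, E, H)
fds = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
print(sorted(essential_attributes("ABCDEH", fds)))  # ['E', 'H']

# Only option 4 contains E and H in every key; confirm each one is a key.
for key in ("AEH", "BEH", "DEH"):
    print(key, closure(key, fds) == set("ABCDEH"))  # True for all three
```

The closure of EH alone is only {C, E, H}, which is why one more attribute (A, B or D) is needed in each key.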
Problem-05:
Problem-06:
Consider a relation- R ( A , B , C , D , E ) with functional dependencies-
A → BC
CD → E
B→D
E→A
Answer
First, find the closure of each single attribute that appears on the left side of an FD (A, B and E), and then find the closure of the composite left side (CD).

 A+ = ABCDE, Hence A is a candidate key


 CD+ = ABCDE, Hence CD is a candidate key
 B+ = BD, Hence B is NOT a candidate key
 E+ = ABCDE, Hence E is a candidate key.

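To find the total number of candidate keys:

Find the closure of the remaining combinations of attributes. Any combination that contains one of the candidate keys found above (A, E or CD) is only a super key, so it can be neglected. Among the remaining combinations, BC+ = ABCDE while B+ = BD and C+ = C, so BC is also a candidate key. The candidate keys are therefore A, E, BC and CD.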

Attribute closure:

A -> ABCDE
B -> BD
C -> C
D -> D
E -> ABCDE
AB -> ABCDE
AC -> ABCDE
AD -> ABCDE
AE -> ABCDE
BC -> ABCDE [ Candidate key ]
BD -> BD [NOT a candidate key ]
BE -> ABCDE
CD -> ABCDE
CE -> ABCDE
DE -> ABCDE
ABC -> ABCDE
ABD -> ABCDE
ABE -> ABCDE
ACD -> ABCDE
ACE -> ABCDE
ADE -> ABCDE
BCD -> ABCDE
BDE -> ABCDE
CDE -> ABCDE
ABCD -> ABCDE
ABCE -> ABCDE
ABDE -> ABCDE
ACDE -> ABCDE
BCDE -> ABCDE
GATE QUESTIONS ON FD

1. Let R= (A, B, C, D, E, F) be a relation scheme with the following dependencies: C->F, E->A, EC->D,
A->B. Which of the following is a key for R?
(a) CD (b) EC (c) AE (d) AC

Ans: option (b)


Explanation:
Find the closure set of all the options given. If any closure covers all the attributes of the relation R
then that is the key.
Algorithm to find Closure Set
Step1: Equate an attribute or attributes to X for which closure needs to be identified.
Step2: Take each FD (functional dependency) one by one and check whether the left side of FD is
available in X, if yes then add the right side attributes to X if it is not available.
Step3: Repeat step 2 as many times as possible to cover all FD's.
Step4: After no more attributes can be added to X declare it as the closure set.
FDs: C->F, E->A, EC->D, A->B
Find closure set for CD.
X = CD
= CDF {C->F}
No more attributes can be added to X. Hence closure set of CD = CDF
Find closure set for EC.
X = EC
= ECF {C->F}
= ECFA {E->A}
= ECFAD {EC->D}
= ECFADB {A->B}
Closure set of EC covers all the attributes of the relation R.
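
The four steps of the closure algorithm can be sketched as a short Python function. This is an illustrative sketch only: the name `closure` and the representation of each FD as a pair of strings (left side, right side) with single-letter attributes are assumptions, not something given in the question.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs, e.g. ("EC", "D") for EC->D.
    """
    result = set(attrs)                  # Step 1: X starts as the given attributes
    changed = True
    while changed:                       # Step 3: repeat while something was added
        changed = False
        for lhs, rhs in fds:             # Step 2: scan every FD
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result                        # Step 4: nothing more can be added


fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
print("".join(sorted(closure("CD", fds))))  # CDF
print("".join(sorted(closure("EC", fds))))  # ABCDEF
```

Only EC reaches all six attributes, which matches option (b).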

2. Given the following relation instance.


-------
X Y Z
-------
1 4 2
1 5 3
1 6 3
3 2 2
-------
Which of the following functional dependencies are satisfied by the instance?
(a) XY -> Z and Z -> Y
(b) YZ -> X and Y -> Z
(c) YZ -> X and X -> Z
(d) XZ -> Y and Y -> X

Ans: option (b)


Explanation:
Association among attributes is known as Functional Dependencies (FD). A FD X->Y requires that
the value of X uniquely determines the value of Y where X and Y are set of attributes.
For example,
Roll_No -> Name: the value of Roll_No uniquely determines the Name.
Roll_No, Book_No -> Issue_Date : In the case of library, Roll_No and Book_No can determine the
Issue_Date of a book.

 In option (a), Z->Y is given, which means that the value of Z uniquely determines the value of Y. But here the value 2 of Z gives two different values of Y, i.e. 4 and 2. Therefore this FD is not satisfied by the instance.
 In option (b), every value of Y in the instance is distinct, so no two rows agree on Y (or on YZ). Both Y->Z and YZ->X are therefore satisfied by the instance.
 In option (c), X->Z is given, which means that the value of X uniquely determines the value of Z. But here the value 1 of X gives two different values of Z, i.e. 2 and 3. Therefore this FD is not satisfied by the instance.
 In option (d), Y->X is given; here the value of Y uniquely determines the value of X, so this FD is satisfied by the instance. Now take the FD XZ->Y: here (1,3) cannot uniquely determine the value of Y, because (1,3) gives two values for Y, i.e. 5 and 6. Therefore this FD (XZ->Y) is not satisfied by the instance.
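
Checking whether an instance satisfies an FD can also be done mechanically. The sketch below is only an illustration; the helper name `satisfies` and the list-of-dicts representation of the instance are assumptions made for this example.

```python
def satisfies(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


instance = [dict(zip("XYZ", r)) for r in [(1, 4, 2), (1, 5, 3), (1, 6, 3), (3, 2, 2)]]

options = {
    "a": [("XY", "Z"), ("Z", "Y")],
    "b": [("YZ", "X"), ("Y", "Z")],
    "c": [("YZ", "X"), ("X", "Z")],
    "d": [("XZ", "Y"), ("Y", "X")],
}
for name, fds in options.items():
    print(name, all(satisfies(instance, l, r) for l, r in fds))
# Only option b prints True.
```

Note that this tests one instance only. As question 3 points out, an instance can refute an FD but cannot prove that it holds for the schema in general.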

3. From the following instance of a relational schema R(A, B, C), we can conclude that:
----------
A B C
----------
1 1 1
1 1 0
2 3 2
2 3 2
----------
(a)A functionally determines B and B functionally determines C
(b) A functionally determines B and B does not functionally determine C
(c) B does not functionally determine C
(d) A does not functionally determine B and B does not functionally determine C
Ans: option (c)
Explanation:
 Looking into an instance we can't conclude that A functionally determines B. A->B could
hold true to the given instance but not on the entire database.
 But we can be sure that B does not functionally determine C because for the value 1 of B, it
gives two different values of C i.e. 1 & 0.
 Issue in option (d) - Again from the provided instance we cannot say that A does not
determine B.
 A->B may or may not hold. Therefore only possible option is (c).

4. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional


dependencies hold: {A–>B, BC–>D, E–>C, D–>A}. What are the candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Ans: option (d)
Explanation:
 If you focus on the right side of the functional dependencies you can see that E & H cannot be
derived using the left side of functional dependencies. Hence E & H will definitely be part of
the candidate key. Therefore only option (d) satisfies such condition.

If any closure includes all attributes of a table then it becomes the candidate key.
Find closure (As explained in question 1) of AEH as below.
Closure of AEH = AEHB {A->B}
= AEHBC {E->C}
= AEHBCD {BC->D}
5. In a schema with attributes A, B, C, D and E, following set of functional dependencies are
given:
A->B
A->C
CD->E
B->D
E->A
Which of the following functional dependencies is NOT implied by the above set?
(a) CD->AC (b) BD->CD (c) BC->CD (d) AC->BC

Ans: option (b)


Explanation:
For every option given, find the closure set of the left side of the FD. If the closure set of the left side contains the right side of the FD, then that FD is implied by the given set.
Option (a): Closure set of CD = CDEAB. Therefore CD->AC can be derived from the given set of FDs.
Option (c): Closure set of BC = BCDEA. Therefore BC->CD can be derived from the given set of FDs.
Option (d): Closure set of AC = ACBDE. Therefore AC->BC can be derived from the given set of FDs.
Option (b): Closure set of BD = BD. Therefore BD->CD cannot be derived from the given set of FDs.
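
The implication test used here (X->Y is implied exactly when Y is contained in the closure of X) can be sketched in Python. The names `closure` and `implies` and the string encoding of the FDs are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from `fds`."""
    return set(rhs) <= closure(lhs, fds)


fds = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]
for lhs, rhs in [("CD", "AC"), ("BD", "CD"), ("BC", "CD"), ("AC", "BC")]:
    print(f"{lhs}->{rhs}:", implies(fds, lhs, rhs))
# CD->AC: True, BD->CD: False, BC->CD: True, AC->BC: True
```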

6. The following functional dependencies are given:


AB->CD, AF->D, DE->F, C->G , F->E, G->A
Which one of the following options is false?
(a)CF+ = {ACDEFG} (b)BG+ = {ABCDG}
(c)AF+ = {ACDEFG} (d)AB+ = {ABCDFG}

Ans: option(c)
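Explanation:
As explained in question 1, find the closure set of each option.
AF+ = {ADEF} (using AF->D and F->E). It does not contain C or G, so option (c) is false.
Note that option (d) is also false as printed: AB+ = {ABCDG} (using AB->CD, C->G and G->A), which does not contain F.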

Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values.
F={CH->G, A->BC, B->CFH, E->A, F->EG} is a set of functional dependencies (FDs) so that F + is
exactly the set of FDs that hold for R.
7. How many candidate keys does the relation R have?
(a) 3 (b) 4 (c) 5 (d) 6
Ans: option (b)
Explanation:
 In a relational database, a key helps to uniquely identify each record within a table . A key is a
combination of one or more fields/attributes in a table. If a relational schema has multiple keys,
each key is a candidate key. One of the candidate keys is chosen as the primary key.
 To find the candidate keys, we need to find the closure of each attribute. (If x is an attribute
(field), set of attributes determined by x under a set F of functional dependencies is the closure
of x under F, denoted x+ ).
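Thus,
A+ : ABCFHGE
B+ : BCFHEGA
C+ : C
D+ : D
E+ : EABCFHG
F+ : FEGABCH
G+ : G
H+ : H
A+, B+, E+ and F+ contain all attributes except D, and D does not appear on the right side of any FD, so D must be part of every key. By the augmentation rule, augmenting each of A, B, E and F with D gives a set that determines every attribute. Thus there are 4 candidate keys: AD, BD, DE and DF.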
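
For a small schema like this one, the candidate keys can also be found by brute force: try attribute sets in order of increasing size and keep those whose closure is the whole relation and that do not contain a smaller key. The sketch below is an illustration under that approach; the function names and the string encoding of FDs are assumptions, and the method is exponential, so it only suits small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure covers `attributes`."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a super key
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


fds = [("CH", "G"), ("A", "BC"), ("B", "CFH"), ("E", "A"), ("F", "EG")]
keys = candidate_keys("ABCDEFGH", fds)
print(["".join(sorted(k)) for k in keys])  # ['AD', 'BD', 'DE', 'DF']
```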

8. The relation R is
(a) in 1NF, but not in 2NF.
(b) in 2NF, but not in 3NF.
(c) in 3NF, but not in BCNF.
(d) in BCNF.
Ans: option (a)
Explanation:
 An attribute that does not occur in any candidate key is called a non-prime attribute.
 Consider F->G: G is a non-prime attribute and F is a proper subset of a candidate key (DF, see the previous question). This is a case of partial dependency, so the 2NF condition is violated. Similarly, A->C and B->CFH also violate the 2NF condition. Hence R is not in 2NF.

Since the attributes of relation R have only atomic values, R is in 1NF.


9. Consider the relation X(P, Q, R, S, T, U) with the following set of functional dependencies
F={
{P, R} → {S, T}
{P, S, U} → {Q, R}
}
Which of the following is a trivial functional dependency in F+, where F+ is the closure of F?
(a) {P, R} → {S, T}
(b) {P, R} → {R, T}
(c) {P, S} → {S}
(d) {P, S, U} → {Q}

Ans: option (c)


Explanation:
A functional dependency X → Y is trivial, if Y is a subset of X.
In the above question , {S}is a subset of {P,S}. Hence option (c) is the answer.

10. Consider the relation scheme R=(E,F,G,H,I,J,K,L,M,N) and the set of functional
dependencies
{{E,F}→{G},{F}→{I,J},{E,H}→{K,L},{K}→{M},{L}→{N}}
on R. What is the key for R?
(a) {E,F}
(b) {E,F,H}
(c) {E,F,H,K,L}
(d) {E}

Ans: option (b)


Explanation:
 Find the closure set of each option. Notice that H cannot be derived from any of the above functional dependencies, which means that H must be present in the key. Therefore we only need to check the closure sets of option (b) and option (c), which contain H.
 Since EFH+ derives all the attributes in the relation R, it is the candidate key. Note that option (c) is only a super key, since adding zero or more attributes to a candidate key generates a super key.
Normalization

A large database defined as a single relation may result in data duplication. This repetition of data
may result in:

 Making relations very large.


 It isn't easy to maintain and update data as it would involve searching many records in
relation.
 Wastage and poor utilization of disk space and resources.
 The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with redundant data
into smaller, simpler, and well-structured relations that are satisfying desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.

What is Normalization?
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and update anomalies, so reducing redundancy helps to avoid these anomalies. Normal forms are the criteria used to eliminate or reduce redundancy in database tables.
Types of Normal Forms
The normal forms covered here are:

Normal Form   Description

1NF    A relation will be in 1NF if every attribute contains only atomic values.

2NF    A relation will be in 2NF if it satisfies the following:
       • It is in 1NF
       • All non-key attributes are fully functionally dependent on the primary key.

3NF    A relation will be in 3NF if it satisfies the following:
       • It is in 2NF
       • No transitive dependency exists.

BCNF   A relation will be in BCNF if it satisfies the following:
       • It is in 3NF
       • For every FD, the LHS is a candidate key or super key.

4NF    A relation will be in 4NF if it satisfies the following:
       • It is in Boyce Codd normal form
       • It has no multi-valued dependency.

5NF    A relation is in 5NF if it satisfies the following:
       • It is in 4NF
       • It does not contain any join dependency, and joining should be lossless.

First Normal Form (1NF)

According to first normal form (1NF),

 A table should not contain any multi-valued attributes. It should only have single (atomic) valued attributes/columns.
 First normal form disallows multi-valued attributes, composite attributes, and their combinations.
 Each column name within a table should be unique.

Note: Primary key will be composite key i.e. (“Std_ID” and “Std_Course”) in above example.
Example 02: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Candidate Key VS Super Key


A super-key is a set of attributes of a relation which are used to
 uniquely identify a tuple
 Determines all the attributes of given relation.
A candidate key is a minimal set of attributes necessary to
 Uniquely identify a tuple.
 Determines all the attributes of given relation.
We can say, candidate key is a minimal super-key.
Prime Vs. Non-Prime Attributes
Attributes that appear in at least one candidate key are called prime attributes.
Attributes that do not occur in ANY candidate key are called non-prime attributes.
Example: 01
Suppose a relation (R)= {A, B, C, D} and FD = {A→B, B→C, C→D}. Then
 As seen previously, only the closure of attribute “A” determines all the attributes of the given relation. So attribute “A” can be used as the candidate key.
 Keep in mind that the combination of a candidate key with other attributes of the relation also determines all the attributes of the relation, but it is considered a super key, not a candidate key.
So,
 Candidate Key of given relation = {A}
 Super Keys of given relation = {A, AB, AC, AD, ABC, ABD, ACD, ABCD}
 The only prime attribute of the given relation is A, because only “A” is part of a candidate key.
 The non-prime attributes of the given relation are B, C and D, because they are not part of any candidate key.
Example: 02
Suppose a Relation R= {A, B, C, D} and FD = {A→B, B→C, C→D, D→A} Then
 Candidate keys of given relation = A, B, C and D (each single attribute is a candidate key, because the FDs form a cycle)
 Super keys of given relation = every non-empty set of attributes, e.g. {A, B, AB, BC, CD, ABC, BCD, ...}
 Prime attributes of given relation = {A, B, C, D}, because every attribute of the relation is part of some candidate key.
 Non-prime attributes of given relation = none (the empty set), because no attribute of the relation lies outside every candidate key.

Second Normal Form (2NF)

A table will be in 2NF if it satisfies the following:


1. Table should be in the First Normal form (1NF).
2. There should be no Partial Dependency in the relation; that is, all the non-prime attributes should be fully functionally dependent on each candidate key (all non-key attributes are fully functionally dependent on the primary key).

Partial dependency: When a part of a candidate key determines a non-prime attribute, it is called a partial dependency. Suppose AB is the candidate key. If a part of the candidate key (i.e. A) determines a non-prime attribute (i.e. X), as in A → X, then it is a partial dependency.
Partial Dependency
A partial dependency is a dependency where only some attributes of a candidate key determine non-prime attribute(s).

OR

A partial dependency is a dependency where a portion of the candidate key (an incomplete candidate key) determines non-prime attribute(s).

In other words,
A → B is called a partial dependency if and only if-
1. A is a proper subset of some candidate key
2. B is a non-prime attribute.
If either condition fails, then it is not a partial dependency.

Example 1 – Consider table-3, shown below:

STUD_NO COURSE_NO COURSE_FEE


1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
{Note that, there are many courses having the same course fee. }

Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute that depends on a proper subset of the candidate key is a partial dependency, and so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1

STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5

Table 2

COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000

NOTE: 2NF tries to reduce the redundant data stored in memory. For instance, if there are 100 students taking course C1, we don’t need to store its fee as 1000 in all 100 records. Instead, we store it once in the second table as the course fee for C1.
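
The 2NF test from Example 1 can be sketched in Python: for each proper subset of a candidate key, compute its closure and report any non-prime attribute it determines. The function names and the use of Python sets of column names are assumptions made for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_keys, fds):
    """Pairs (part_of_key, non_prime_attributes_it_determines)."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):          # proper subsets only
            for part in combinations(sorted(key), size):
                non_prime = closure(part, fds) - prime
                if non_prime:
                    found.append((set(part), non_prime))
    return found


fds = [({"COURSE_NO"}, {"COURSE_FEE"})]
keys = [{"STUD_NO", "COURSE_NO"}]
print(partial_dependencies(keys, fds))
# one entry: COURSE_NO determines the non-prime attribute COURSE_FEE

# After the split, Table 2 has key {COURSE_NO}: no proper subset, no violation.
print(partial_dependencies([{"COURSE_NO"}], fds))  # []
```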

Example 2 – Suppose Customer table where attributes are Std_ID, Std_RegNo and Location.

In the above said table,


 Candidate key: {Std_ID, Std_RegNo}
 Prime attributes: Std_ID, Std_RegNo
 Non-prime attributes: Location
Note that Std_RegNo determines Location in the table, which is a partial dependency, because a part of the candidate key determines the attribute “Location”. So the above relation is not in 2NF.

Solution: Divide the above table in to two parts as given below,

Now note that, above both tables fulfil the conditions of 2NF.

Example:03- Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

 In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
 To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Third Normal Form (3NF)

 A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime attribute on a candidate key.
 3NF is used to reduce data duplication. It is also used to achieve data integrity.
 If a relation is in 2NF and there is no transitive dependency for non-prime attributes, then the relation is in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y.

1. X is a super key or

2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Let us explain with a relational table.

In the following table, Roll_No is the candidate key, which determines ExamType, and ExamType determines MaxMarks. So the table holds a transitive dependency (Roll_No → ExamType → MaxMarks). That’s why it is not in 3NF.

Removal of Transitive Property:


Transitive property can be removed by dividing the table into its parts.
Example:02

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime

 Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
 That's why we need to move EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
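
The 3NF condition stated earlier (for every non-trivial FD X → Y, either X is a super key or Y consists of prime attributes) can be checked mechanically. The sketch below is only an illustration: the function names are assumptions, and the FD EMP_ID → {EMP_NAME, EMP_ZIP} is assumed from EMP_ID being the candidate key.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(attributes, candidate_keys, fds):
    """FDs whose left side is not a super key and that reach a non-prime attribute."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) >= set(attributes)
        non_prime_targets = set(rhs) - set(lhs) - prime
        if not is_super_key and non_prime_targets:
            bad.append((lhs, rhs))
    return bad


attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),       # assumed: EMP_ID is the key
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
print(violates_3nf(attrs, [{"EMP_ID"}], fds))
# reports EMP_ZIP -> {EMP_STATE, EMP_CITY}

# After decomposition, each table has only its own FD and its own key.
print(violates_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [{"EMP_ID"}], fds[:1]))    # []
print(violates_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [{"EMP_ZIP"}], fds[1:]))  # []
```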
Boyce Codd Normal Form (BCNF)

BCNF is an extension of 3NF and is stricter than 3NF. According to Boyce Codd normal form (BCNF),

 A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
 For BCNF, the table should be in 3NF, and for every FD the LHS must be a super key (every candidate key is also a super key).

Example: Let’s assume the following student table

The above table holds the following Candidate keys and FD’s
 Candidate keys = RollNo and ID_Card (each one alone is a candidate key)
 FD = {RollNo → Name, RollNo → ID_Card, ID_Card → age, ID_Card → RollNo}
As the L.H.S of every functional dependency above is a candidate key (and hence a super key), the above table is in BCNF.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table Functional dependencies are as follows:

 EMP_ID → EMP_COUNTRY
 EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because the left sides of the functional dependencies, EMP_ID and EMP_DEPT, are not keys on their own.
To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:

 EMP_ID → EMP_COUNTRY
 EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

 For the first table: EMP_ID


 For the second table: EMP_DEPT
 For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.
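
The BCNF test (the left side of every non-trivial FD must be a super key) can be sketched in Python. This is an illustration only; the function names are assumptions, and each decomposed table is checked against just the FD that falls inside it.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key of `attributes`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]


fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
print(len(bcnf_violations(employee, [fd_country, fd_dept])))  # 2

# The three decomposed tables
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country]))               # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept]))   # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                            # []
```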
Fourth Normal Form (4NF)

 A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
 For a dependency A →→ B, if for a single value of A multiple values of B exist, independently of the other attributes, then the dependency is a multi-valued dependency.

Multivalued Dependency

 For a single value of A, more than one value of B exists.
 A multivalued dependency needs at least 3 attributes (i.e. A, B, C), because two attributes (B, C) of the table are independent of each other, but both depend on a third attribute (A).

Notation of Multi-Valued Dependency

Example: Consider the Student Table as given below.

The above table does not satisfy the conditions of 4NF because it holds a multivalued dependency.
Explanation
 For Std_id = 1 there are two values of Std_course (i.e. CS and English), and likewise for Std_id = 2 there are two values of Std_course (i.e. Java and C#).
 Columns “Std_course” and “Std_hobby” are independent of each other, but both depend on “Std_id”.
Both points show that a multivalued dependency exists in the table. A table that holds such a multivalued dependency is not in 4NF.
How to Satisfy 4th Normal Form?
To satisfy 4NF, we can decompose the “Student” table as given below:
Table 01 “Student_Course”    Table 02 “Student_Hobbies”

Each decomposed table now stores only one multi-valued fact about “Std_id”, so neither table combines two independent multi-valued attributes. These relations satisfy 4NF.
Example: STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

 The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
 In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
 So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth Normal Form (5NF):
 The fifth normal form (5NF) is generally not implemented in real-life database design, but the concept is still worth learning.

A relation will be in 5NF if

 It is in 4NF
 It has no join dependency that allows further decomposition, and any joining should be lossless.
 5NF is also known as Project-Join Normal Form (PJ/NF).

Join Dependency:

 If a table holds the join dependency, then decomposing the table into multiple tables and re-joining those tables always gives the same table as before decomposition. This is also called lossless decomposition.

 If a table does not hold the join dependency, then decomposing it and rejoining the parts leads to either loss of data or new (spurious) entries in the table. This is called lossy decomposition.

Concept of 5th Normal Form:

 We can understand 5NF either through join dependency or by breaking the tables down into parts and rejoining them.
 As join dependency is a somewhat confusing topic, let us understand it by breaking the tables down into parts.
Suppose a table SPC with composite primary key {Supplier, Product, Customer}

 In the above table, a supplier supplies products and a customer can use these products, but note that the supplier does not directly supply to any customer.
 In simple words, Supplier (“Ali”) produces (“ABC”) and Customer (“Nauman”) can use it. But Ali and Nauman are not directly connected.
Decomposition
The table (SPC) is decomposed into three parts as given below

Note that in table (SC) Supplier and Customer are directly connected, so the values change after decomposition of table (SPC) into parts.

Because decomposing SPC and rejoining the parts does not give back the original table, SPC cannot be decomposed any further without loss. So the above table is in 5NF.

Example-02

SUBJECT LECTURER SEMESTER


Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

 In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, the combination of all three fields is required to identify valid data.
 Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT
Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER
Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER
Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
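
Whether this three-way decomposition is lossless can be checked by projecting the original table onto P1, P2 and P3 and joining the projections back together. The sketch below is an illustration only; the helper names `project`, `natural_join` and `as_set` are assumptions, and relations are modelled as lists of Python dicts.

```python
def project(rows, cols):
    """Projection onto `cols` with duplicate rows removed."""
    seen = {tuple(row[c] for c in cols) for row in rows}
    return [dict(zip(cols, values)) for values in seen]


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared columns."""
    joined = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


cols = ["SUBJECT", "LECTURER", "SEMESTER"]
table = [dict(zip(cols, r)) for r in [
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
]]

p1 = project(table, ["SEMESTER", "SUBJECT"])
p2 = project(table, ["SUBJECT", "LECTURER"])
p3 = project(table, ["SEMESTER", "LECTURER"])

two_way = natural_join(p1, p2)
three_way = natural_join(two_way, p3)
print(len(as_set(two_way)))                  # 7: two spurious rows appear
print(as_set(three_way) == as_set(table))    # True: all three parts are needed
```

Joining only P1 and P2 produces rows such as (Math, Akash, Semester 1) that were never in the table; P3 filters them out again.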
Decomposition: In a DBMS, decomposition removes redundancy, anomalies and inconsistencies from a database by dividing a table into multiple tables.

The following are the types −

Lossless join Decomposition:


 If no information is lost from the relation that is decomposed, then the decomposition is lossless.
 For a lossless decomposition, the natural join of the decomposed relations gives exactly the same relation as before the decomposition.

EMPLOYEE_DEPARTMENT table:

The above table is decomposed into two relations, EMPLOYEE and DEPARTMENT:

Table 01: EMPLOYEE


Table 02: DEPARTMENT

Now, the above two tables are joined on the common column “EMP_ID”.
Employee ⋈ Department

As the joined table is the same as the original table before decomposition, the decomposition is a lossless join decomposition.

Example −02
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance

Decompose the above table into two tables:


<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas

<DeptDetails>
Dept_ID Emp_ID Dept_Name
Dpt1 E001 Operations
Dpt2 E002 HR
Dpt3 E003 Finance
Now, Natural Join is applied on the above two tables −
The result will be −
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance

Therefore, the above relation had lossless decomposition i.e. no loss of information.
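
The lossless check in Example 02 can be reproduced in Python by decomposing `EmpInfo` and joining the parts again. This is a sketch for illustration; the helper names `natural_join` and `as_set` are assumptions, and each relation is modelled as a list of dicts.

```python
def natural_join(left, right):
    """Natural join: combine rows that agree on all shared columns."""
    joined = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


cols = ["Emp_ID", "Emp_Name", "Emp_Age", "Emp_Location", "Dept_ID", "Dept_Name"]
emp_info = [dict(zip(cols, r)) for r in [
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
]]

emp_cols = ["Emp_ID", "Emp_Name", "Emp_Age", "Emp_Location"]
dept_cols = ["Dept_ID", "Emp_ID", "Dept_Name"]      # Emp_ID is the shared column
emp_details = [{c: row[c] for c in emp_cols} for row in emp_info]
dept_details = [{c: row[c] for c in dept_cols} for row in emp_info]

rejoined = natural_join(emp_details, dept_details)
print(as_set(rejoined) == as_set(emp_info))  # True: the decomposition is lossless
```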

Lossy Decomposition:

As the name suggests, a decomposition is lossy when a relation is decomposed into two or more relational schemas and the original relation cannot be retrieved exactly by joining them back: information is lost, or spurious rows appear.

Let us see an example −

<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance

Decompose the above table into two tables −


<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas

<DeptDetails>

Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance

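Now, you won’t be able to join the above tables to get the original relation back, since Emp_ID isn’t part of the DeptDetails relation and the two tables share no common column. A natural join would pair every employee with every department.
Therefore, the above relation has lossy decomposition.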
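
The effect of dropping the shared column can be seen by running the same join on this second decomposition. The sketch below is an illustration; the helper names are assumptions, and relations are modelled as lists of dicts.

```python
def natural_join(left, right):
    """Natural join; with no shared columns it becomes a Cartesian product."""
    joined = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


cols = ["Emp_ID", "Emp_Name", "Emp_Age", "Emp_Location", "Dept_ID", "Dept_Name"]
emp_info = [dict(zip(cols, r)) for r in [
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
]]

emp_details = [{c: row[c] for c in cols[:4]} for row in emp_info]
dept_details = [{c: row[c] for c in cols[4:]} for row in emp_info]  # no Emp_ID

rejoined = natural_join(emp_details, dept_details)
print(len(rejoined))                          # 9 rows instead of 3
print(as_set(rejoined) == as_set(emp_info))   # False: 6 rows are spurious
```

The original three rows are still present in the result, but they can no longer be told apart from the six spurious ones, so the information about who works in which department is lost.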
Dependency Preserving Decomposition:
A good decomposition should be lossless and, where possible, dependency preserving. The lossless property avoids loss of data; dependency preservation means every FD can still be enforced on the decomposed tables.

So according to dependency preserving,

The decomposition of relation R with FD’s (F) into relations R1 and R2 with their FD’s (F1) and (F2) respectively will be dependency preserving if

Closure of (F1 U F2), i.e. (F1 U F2)+ = Closure of F (F+)

The above equation can be explained as

“If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or R2 or be derivable from the dependencies of R1 and R2.”

Note: The union of the attributes of R1 and R2 must be equal to the attributes of the original relation R.

Example: Let us explain dependency preserving with an example

Suppose a relation R with attributes A, B, C and D. This relation R is decomposed into tables R1 with attributes A, B; R2 with attributes B, C; and R3 with attributes B, D.
Solution

1. First, find the closure of each attribute that appears on the left hand side of the given FD’s of relation R(ABCD), as given in the following diagram.

2. Second, find all non-trivial FD’s of the decomposed relations (R1, R2 and R3), as given below.

3. Third, find those non-trivial FD’s which cannot be determined from the given relation R(ABCD). Let’s check the non-trivial FD’s of the decomposed relations R1, R2 and R3 one by one.

a) Check (A→B):
A→B of relation R1(AB) is directly given in the FD’s of relation R(ABCD). So this dependency can be determined from the original table R(ABCD), because the closure of A in the original table determines “B”. So, this is a valid dependency.
b) Check (B→A):
B→A of relation R1(AB) is not directly given in the FD’s of relation R(ABCD), and it cannot be determined from the FD’s of the original table R(ABCD), because the closure of B in the original table does not determine “A”. So, this is not a valid dependency.
c) Check (B→C):
B→C of relation R2(BC) is directly given in the FD’s of relation R(ABCD). So this dependency can be determined from the original table R(ABCD), because the closure of B in the original table determines “C”. So, this is a valid dependency.
d) Check (C→B):
C→B of relation R2(BC) is not directly given in the FD’s of relation R(ABCD). But this dependency can be determined from the original table R(ABCD), because the closure of C in the original table determines “B”. So, this is a valid dependency.
e) Check (B→D):
B→D of relation R3(BD) is not directly given in the FD’s of relation R(ABCD). But this dependency can be determined from the original table R(ABCD), because the closure of B in the original table determines “D”. So, this is a valid dependency.
f) Check (D→B):
D→B of relation R3(BD) is directly given in the FD’s of relation R(ABCD). So this dependency can be determined from the original table R(ABCD), because the closure of D in the original table determines “B”. Thus, this is also a valid dependency.

So, all of the above non-trivial FD’s are valid except the 2nd FD (B→A), as in the following diagram

4. Find the closure of all valid non-trivial FD’s of the decomposed relations (R1, R2 and R3), as given below.
5. If all dependencies of the given relation are preserved through the valid non-trivial dependencies of its decomposed tables, then the decomposition is dependency preserving.

Explanation of above diagram

 We check all valid non-trivial FD’s of the decomposed tables (one by one) and see whether these FD’s can preserve all FD’s of the original relation. If they do, then the decomposition is dependency preserving, otherwise not.
 As all four FD’s of the original relation are preserved through the valid non-trivial FD’s No. 1, 3, 4 and 5, it is a dependency preserving decomposition.
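
A dependency-preservation check can be sketched in Python without listing the projected FD’s by hand. The sketch uses the standard closure-based test rather than the diagram-based procedure above. The FD set of R is not printed in this section, so F = {A→B, B→C, C→D, D→B} is an assumption inferred from the checks a) to f); the function names are also hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, fds, parts):
    """True if lhs -> rhs can be enforced using only the decomposed relations."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What the attributes we already have inside `part` determine there.
            gained = closure(result & set(part), fds) & set(part)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]   # assumed FD set of R
parts = ["AB", "BC", "BD"]                                  # R1, R2, R3

for lhs, rhs in fds:
    print(f"{lhs}->{rhs}:", is_preserved(lhs, rhs, fds, parts))   # all True

# A different split, R1(AB) and R2(ACD), would lose B->C.
print(is_preserved("B", "C", fds, ["AB", "ACD"]))  # False
```

Under the assumed F, C→D is not local to any single table, but it is still preserved because C→B holds in R2 and B→D holds in R3.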
