
DBDM Unit-3

Dbdm notes

Uploaded by rsaraswathiit
Copyright © All Rights Reserved

UNIT-3 RELATIONAL DATABASE DESIGN AND NORMALIZATION

1.ER AND EER-TO-RELATIONAL MAPPING


REFER PPT
2.UPDATE ANOMALIES
Anomalies in DBMS
What is Anomaly?
Anomaly means a deviation from the normal, expected state. In a Database Management
System (DBMS), an anomaly is an inconsistency that arises in a relational table as a result of
the operations performed on it.
There can be various reasons for anomalies. For example, if a database contains a lot of
redundant data, anomalies can occur, and a poorly designed table is also likely to suffer from
them. Anomalies damage the integrity of the database.
Another common cause is storing all the data in a single table. Anomalies are removed through
normalization, which splits a table into smaller tables that can be combined again with the
different types of join when needed.
We will look at the anomalies present in a table through the following examples:
Example 1:

Worker_id  Worker_name  Worker_dept  Worker_address
65         Ramesh       ECT001       Jaipur
65         Ramesh       ECT002       Jaipur
73         Amit         ECT002       Delhi
76         Vikas        ECT501       Pune
76         Vikas        ECT502       Pune
79         Rajesh       ECT669       Mumbai

The above table has four columns that describe each worker: id, name, department and
address. The table is not normalized, so anomalies can occur in it.
There are three types of anomaly in a database:
Updation / Update Anomaly
An update anomaly occurs when updating some rows of a table leaves the table inconsistent. In
the above table, to change Ramesh's address we must update every row in which Ramesh
appears. If we miss even one row, Ramesh will have two different addresses, and the database
becomes inconsistent and wrong.
Insertion Anomaly
An insertion anomaly occurs when a new row cannot be inserted without creating an
inconsistency. For example, in the above table, a new worker who has not yet been allocated to
any department cannot be inserted, because the row would have no Worker_dept value. This
is an insertion anomaly.
Deletion Anomaly
A deletion anomaly occurs when deleting some rows also removes other information that is
still required. For example, in the above table, if we delete department ECT669, Rajesh's
details are deleted as well, because they are stored only in the ECT669 row. So, the table has a
deletion anomaly.
To remove these anomalies, we normalize the table: we split it into smaller tables that can be
joined back when needed. A table can be brought into various normal forms, such as 1NF, 2NF,
3NF and BCNF; which normalization step we apply depends on the form the table is currently in.
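To make the anomalies of Example 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The columns follow Example 1; the new address 'Ajmer' and the two-table split (worker_info and worker_dept) are illustrative assumptions, not a design prescribed by these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

rows = [(65, "Ramesh", "ECT001", "Jaipur"), (65, "Ramesh", "ECT002", "Jaipur"),
        (73, "Amit", "ECT002", "Delhi"), (76, "Vikas", "ECT501", "Pune"),
        (76, "Vikas", "ECT502", "Pune"), (79, "Rajesh", "ECT669", "Mumbai")]

# Unnormalized design: the address is repeated once per department row.
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "worker_dept TEXT, worker_address TEXT)")
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?)", rows)

# Update anomaly: this UPDATE touches only one of Ramesh's two rows.
conn.execute("UPDATE worker SET worker_address = 'Ajmer' "
             "WHERE worker_id = 65 AND worker_dept = 'ECT001'")
flat_addresses = {a for (a,) in conn.execute(
    "SELECT worker_address FROM worker WHERE worker_id = 65")}
print(sorted(flat_addresses))  # ['Ajmer', 'Jaipur']: one worker, two addresses

# Decomposed design: worker facts and worker-department pairs in separate tables.
conn.execute("CREATE TABLE worker_info (worker_id INTEGER PRIMARY KEY, "
             "worker_name TEXT, worker_address TEXT)")
conn.execute("CREATE TABLE worker_dept (worker_id INTEGER, worker_dept TEXT, "
             "PRIMARY KEY (worker_id, worker_dept))")
# INSERT OR IGNORE skips the repeated worker rows, so each address is stored once.
conn.executemany("INSERT OR IGNORE INTO worker_info VALUES (?, ?, ?)",
                 [(wid, name, addr) for wid, name, _, addr in rows])
conn.executemany("INSERT INTO worker_dept VALUES (?, ?)",
                 [(wid, dept) for wid, _, dept, _ in rows])

# A single-row update now changes Ramesh's address everywhere.
conn.execute("UPDATE worker_info SET worker_address = 'Ajmer' WHERE worker_id = 65")
split_addresses = {a for (a,) in conn.execute(
    "SELECT worker_address FROM worker_info WHERE worker_id = 65")}

# Deleting department ECT669 no longer deletes Rajesh's details.
conn.execute("DELETE FROM worker_dept WHERE worker_dept = 'ECT669'")
rajesh = conn.execute(
    "SELECT worker_name FROM worker_info WHERE worker_id = 79").fetchone()
print(sorted(split_addresses), rajesh)  # ['Ajmer'] ('Rajesh',)
```

In the split design, a worker who has no department yet can also be inserted into worker_info on its own, which removes the insertion anomaly as well.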
Example 2:

Stu_id    Stu_name  Stu_branch        Stu_club
2018nk01  Shivani   Computer science  literature
2018nk01  Shivani   Computer science  dancing
2018nk02  Ayush     Electronics       Videography
2018nk03  Mansi     Electrical        dancing
2018nk03  Mansi     Electrical        singing
2018nk04  Gopal     Mechanical        Photography

In the above table, we have listed students with their name, id, branch and their respective
clubs.
Updation / Update Anomaly
In the above table, if Shivani changes her branch from Computer Science to Electronics, we
have to update every row for Shivani. If we miss any row, Shivani will have more than one
branch, which is an update anomaly.
Insertion Anomaly
If we want to add a new student, Ankit, who is not part of any club, we cannot insert the row,
because we cannot insert NULL in the Stu_club column. This is called an insertion
anomaly.
Deletion Anomaly
If the photography club is removed from the college, we have to delete its row from the table.
But that also deletes Gopal and all his details. This is a deletion anomaly, and it makes the
database inconsistent.
Anomalies in Relational Model
Last Updated : 16 Nov, 2023

Anomalies in the relational model are inconsistencies or errors that can arise when data is
inserted, deleted or modified in a relational database. They can occur in both referencing and
referenced relations, and they fall into three types:
• Insertion Anomalies
• Deletion Anomalies
• Update Anomalies
How Are Anomalies Caused in DBMS?
Database anomalies are faults caused by storing everything in a single flat table. They can be
removed through normalization, which splits the data into smaller tables and thereby reduces
the anomalies in the database.
STUDENT Table
STUD_NO  STUD_NAME  STUD_PHONE  STUD_STATE  STUD_COUNTRY  STUD_AGE
1        RAM        9716271721  Haryana     India         20
2        RAM        9898291281  Punjab      India         19
3        SUJIT      7898291981  Rajasthan   India         18
4        SURESH                 Punjab      India         21
Table 1
STUDENT_COURSE
STUD_NO  COURSE_NO  COURSE_NAME
1        C1         DBMS
2        C2         Computer Networks
1        C2         Computer Networks
Table 2
Insertion Anomaly: If a tuple is inserted into the referencing relation and its referencing
attribute value is not present in the referenced attribute, the insertion is not allowed.
Example: If we try to insert a record into STUDENT_COURSE with STUD_NO = 7, it is rejected,
because no student with STUD_NO = 7 exists in STUDENT.
Deletion and Updation Anomaly: If we try to delete or update a tuple in the referenced relation
while its referenced attribute value is still used by the referencing attribute in the referencing
relation, the deletion or update of that tuple is not allowed.
Example: If we want to update the record for STUD_NO = 1 in STUDENT_COURSE, we have to
update both rows that contain it. If we try to delete the record with STUD_NO = 1 from
STUDENT, it is not allowed.
To avoid this, the following referential actions can be specified on the foreign key:
• ON DELETE/UPDATE SET NULL: If a tuple in the referenced relation is deleted or updated
while its referenced attribute value is used in the referencing relation, the tuple is
deleted/updated in the referenced relation and the referencing attribute of the matching
tuples is set to NULL.
• ON DELETE/UPDATE CASCADE: If a tuple in the referenced relation is deleted or updated
while its referenced attribute value is used in the referencing relation, the deletion/update
is applied to the referenced relation and to the matching tuples in the referencing relation
as well.
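The referential behaviour described above can be observed with a small sketch, again assuming Python's built-in sqlite3 module. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The lower-case table names and the helper functions make_db and rejected are hypothetical; only ON DELETE is shown, and ON UPDATE SET NULL / CASCADE behave analogously for updates.

```python
import sqlite3

STUDENTS = [(1, "RAM"), (2, "RAM"), (3, "SUJIT"), (4, "SURESH")]
ENROLMENTS = [(1, "C1"), (2, "C2"), (1, "C2")]

def make_db(on_delete: str = "") -> sqlite3.Connection:
    """Build STUDENT / STUDENT_COURSE with an optional ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT)")
    conn.execute("CREATE TABLE student_course (stud_no INTEGER "
                 f"REFERENCES student(stud_no) {on_delete}, course_no TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", STUDENTS)
    conn.executemany("INSERT INTO student_course VALUES (?, ?)", ENROLMENTS)
    return conn

def rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if executing sql violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# No referential action: both anomalies are blocked.
plain = make_db()
insert_rejected = rejected(plain, "INSERT INTO student_course VALUES (7, 'C1')")
delete_rejected = rejected(plain, "DELETE FROM student WHERE stud_no = 1")

# CASCADE: deleting student 1 also deletes its enrolments.
cascade = make_db("ON DELETE CASCADE")
cascade.execute("DELETE FROM student WHERE stud_no = 1")
cascade_rows = cascade.execute("SELECT stud_no, course_no FROM student_course").fetchall()

# SET NULL: the enrolments stay, but their stud_no becomes NULL.
set_null = make_db("ON DELETE SET NULL")
set_null.execute("DELETE FROM student WHERE stud_no = 1")
null_rows = set_null.execute(
    "SELECT stud_no, course_no FROM student_course ORDER BY course_no, stud_no").fetchall()

print(insert_rejected, delete_rejected)  # True True
print(cascade_rows)                      # [(2, 'C2')]
print(null_rows)                         # [(None, 'C1'), (None, 'C2'), (2, 'C2')]
```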
How Do These Anomalies Occur?
• Insertion anomalies: These occur when data cannot be inserted into a database because
required fields are missing or the data is incomplete. For example, if every record must
have a primary key but no value is provided for a particular record, that record cannot be
inserted into the database.
• Deletion anomalies: These occur when deleting a record unintentionally causes a loss of
other data. For example, if a database stores information about customers and their
orders together, deleting a customer record may also delete all the orders associated with
that customer.
• Update anomalies: These occur when modifying data leads to inconsistencies or errors.
For example, if a database stores employees and their salaries in several places, updating
an employee's salary in one record but not in all related records leads to incorrect
calculations and reports.
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases that follow the
principles of normalization. Normalization organizes data into tables and applies rules so that
data is stored consistently and efficiently. By reducing data redundancy and ensuring data
integrity, normalization helps to eliminate anomalies and improves the overall quality of the
database.
According to E. F. Codd, the inventor of the relational model, the goals of normalization
include:
• Eliminating repeated (redundant) data from the database.
• Removing undesirable insertion, update and deletion anomalies.
• Establishing proper and meaningful relationships between tables.
Advantages of the Relational Model
• Data Integrity: Relational databases enforce data integrity through constraints such as
primary keys, foreign keys and referential integrity rules, which keep the data accurate
and consistent.
• Scalability: Relational databases can handle large amounts of data while maintaining
good performance.
• Flexibility: The relational model allows flexible querying of data, which makes it easier
to retrieve specific information and generate reports.
• Security: Relational databases provide robust security features to protect data from
unauthorized access.
Disadvantages of the Relational Model
• Redundancy: A poorly designed relational schema may store the same data in several
places. This leads to inefficiency and can even produce inconsistent data.
• Complexity: Designing and maintaining a relational database requires specific
knowledge and skills and can be difficult and time-consuming.
• Performance: As a database grows, queries that must join more tables to retrieve
information may become slower.
• Limited support for unstructured data: Text documents, videos and other
semi-structured or unstructured data are not well suited to the relational model.
Conclusion
Ensuring data integrity requires addressing anomalies such as insertion, update,
and deletion problems in the Relational Model. By effectively arranging data, normalization
techniques offer a solution that guarantees consistency and dependability in relational
databases.
FAQs on Anomalies in Relational Model
Q.1: What is Normalization?
Answer:
Normalization is the process of splitting the tables into smaller ones so as to remove
anomalies in the database. It helps in reducing redundancy in the database.
Q.2: What are Anomalies in the Relational Model?
Answer:
An anomaly is a fault in the database caused by poor design and storage of data, typically
keeping everything in a single flat table. Normalization is the process of removing anomalies
from the database.
Q.3: How can anomalies be removed?
Answer:
Anomalies can be removed with the process of Normalization. Normalization involves
organizing data into tables and applying rules to ensure data is stored in a consistent and
efficient manner.
3.FUNCTIONAL DEPENDENCIES
Types of Functional dependencies in DBMS
In relational database management, functional dependency is a concept that specifies the
relationship between two sets of attributes where one attribute determines the value of
another attribute. It is denoted as X → Y, where the attribute set on the left side of the
arrow, X is called Determinant, and Y is called the Dependent.
What is Functional Dependency?
A functional dependency occurs when one attribute uniquely determines another attribute
within a relation: any two tuples that have the same value of A must also have the same value
of B. It is a constraint that describes how attributes in a table relate to each other. If attribute A
functionally determines attribute B, we write A → B.
Functional dependencies are used to mathematically express relations among database
entities and are very important to understanding advanced concepts in Relational Database
Systems.
Example:

roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2

From the above table we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}: roll_no determines the values of name,
dept_name and dept_building, so this is a valid functional dependency.
• roll_no → dept_name: since roll_no determines the whole set {name, dept_name,
dept_building}, it also determines its subset dept_name.
• dept_name → dept_building: dept_name identifies dept_building, because all rows with
the same dept_name have the same dept_building (CO is always in A4 and IT in A3).
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name: students with the same name can be in different departments (xyz
is in both CO and IT), so this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same building. For
example, in the above table departments ME and EC are both in building B2, so
dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
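The definition of X → Y can be checked mechanically against a table instance. The sketch below assumes rows are Python dictionaries and uses a hypothetical helper, fd_holds, to test the dependencies discussed above on the roll_no table. Note that one instance can only show that an FD is violated; whether an FD holds in general is a property of the schema, which is why {name, dept_name} → roll_no is listed as invalid even though this small instance does not contradict it.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs column but differ on some rhs column."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first rhs value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True

COLUMNS = ("roll_no", "name", "dept_name", "dept_building")
DATA = [(42, "abc", "CO", "A4"), (43, "pqr", "IT", "A3"), (44, "xyz", "CO", "A4"),
        (45, "xyz", "IT", "A3"), (46, "mno", "EC", "B2"), (47, "jkl", "ME", "B2")]
rows = [dict(zip(COLUMNS, record)) for record in DATA]

checks = {
    "roll_no -> {name, dept_name, dept_building}":
        fd_holds(rows, ["roll_no"], ["name", "dept_name", "dept_building"]),
    "dept_name -> dept_building": fd_holds(rows, ["dept_name"], ["dept_building"]),
    "name -> dept_name": fd_holds(rows, ["name"], ["dept_name"]),
    "dept_building -> dept_name": fd_holds(rows, ["dept_building"], ["dept_name"]),
}
for fd, ok in checks.items():
    print(f"{fd}: {ok}")
# roll_no -> {name, dept_name, dept_building}: True
# dept_name -> dept_building: True
# name -> dept_name: False
# dept_building -> dept_name: False
```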
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name,
dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
5. Fully functional dependency
6. Partial functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e.
If X → Y and Y is the subset of X, then it is called trivial functional dependency
Example:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name
is a subset of the determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an
example of trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional
dependency.
Example:

roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is
not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial
functional dependency, since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not dependent
on each other, i.e. if a → {b, c} and there is no functional dependency between b and c,
it is called a multivalued functional dependency.
For example,
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
45       abc   19
Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents
name and age are not dependent on each other (neither name → age nor age → name
holds: abc has ages 17 and 19, and age 18 belongs to both pqr and xyz).
4. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e.
If a → b & b → c, then according to axiom of transitivity, a → c. This is a transitive
functional dependency.
For example,

enrol_no  name  dept  building_no
42        abc   CO    4
43        pqr   EC    2
44        xyz   IT    1
45        abc   EC    2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
5. Fully Functional Dependency
In a full functional dependency, an attribute or set of attributes Y depends on a set of
attributes X as a whole: X → Y holds, but Y is not determined by any proper subset of X. For
example, if a relation R has attributes X, Y, Z with composite key {X, Y}, and {X, Y} → Z holds
while neither X → Z nor Y → Z holds, then Z is fully functionally dependent on {X, Y}.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on only part of a composite
key rather than on the whole key. If a relation R has attributes X, Y, Z, where {X, Y} is the
composite key and Z is a non-key attribute, and X → Z holds, then Z is partially dependent on
the key: {X, Y} → Z is a partial functional dependency in RDBMS.
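Partial dependencies can be detected with attribute closures: a non-key attribute is partially dependent on a composite key if some proper subset of the key already determines it. The sketch below is an illustration under stated assumptions: it applies this test to the STUDENT_COURSE table (Table 2) shown earlier in these notes, with key {STUD_NO, COURSE_NO} and the FD COURSE_NO → COURSE_NAME assumed from that table's data. The helper names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: all attributes derivable from attrs using fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, nonkey_attrs, fds):
    """Map each non-key attribute to the proper subsets of key that already determine it."""
    found = {}
    for attr in nonkey_attrs:
        subsets = [set(sub)
                   for size in range(1, len(key))  # proper subsets only
                   for sub in combinations(sorted(key), size)
                   if attr in closure(sub, fds)]
        if subsets:
            found[attr] = subsets
    return found

# Assumed FDs for STUDENT_COURSE(STUD_NO, COURSE_NO, COURSE_NAME).
FDS = [({"STUD_NO", "COURSE_NO"}, {"COURSE_NAME"}),
       ({"COURSE_NO"}, {"COURSE_NAME"})]
KEY = {"STUD_NO", "COURSE_NO"}
partials = partial_dependencies(KEY, ["COURSE_NAME"], FDS)
print(partials)  # {'COURSE_NAME': [{'COURSE_NO'}]}
```

An attribute that the whole key determines but that no proper subset of the key determines would not appear in the result; it is fully functionally dependent on the key.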
Advantages of Functional Dependencies
Functional dependencies have numerous applications in database management systems. Some of
them are listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database to minimize redundancy
and increase data integrity. Functional dependencies play an important part in normalization:
with their help we can identify the primary key and candidate keys of a table, which in turn
guides the normalization process.
2. Query Optimization
Functional dependencies help us decide how tables are connected and which attributes need
to be projected to retrieve the required data. This helps in query optimization and improves
performance.
3. Consistency of Data
A design based on functional dependencies removes redundancy and the inconsistencies it
causes, so that a change made to one attribute does not leave another set of attributes
inconsistent. In this way, functional dependencies help maintain the consistency of the data in
the database.
4. Data Quality Improvement
Functional dependencies help keep the data in the database accurate, complete and up to date.
This improves the overall quality of the data and reduces the errors and inaccuracies that
might otherwise affect data analysis and decision making.
Conclusion
Functional dependency is a very important concept in database management systems for
ensuring data consistency and accuracy. In this article we have discussed the concept behind
functional dependencies and why they are important, valid and invalid functional
dependencies, and the most important types of functional dependencies in RDBMS. We have
also discussed the advantages of FDs.
For more details you can refer Database Normalization and Difference between Fully and
Partial Functional Dependency articles.
Types of Functional dependencies in DBMS – FAQs
What is the difference between trivial and non-trivial functional dependencies?
Trivial dependencies occur when the dependent attribute set is a subset of the determinant,
while in non-trivial dependencies the dependent is not a subset of the determinant.
What is a completely-functional dependency?
A completely-functional dependency occurs when an attribute depends on a set of the
attributes and not on any proper subset of this set.
What is a partial dependency, and why is it important?
A partial dependency occurs when a non-key attribute is dependent on part of the composite
key. It is important for the normalization to eliminate such dependencies to achieve the
Second Normal Form (2NF).
How does a transitive dependency affect database normalization?
Transitive dependencies can lead to redundancy and anomalies in a database.
Removing them helps achieve the Third Normal Form (3NF).

Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the
dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.
The functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency

1. Trivial functional dependency


o A → B has a trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset of A.
o When A ∩ B is empty, A → B is called a complete non-trivial dependency.
Example:
ID → Name,
Name → DOB

4.INFERENCE RULES
Inference Rule (IR):
o Armstrong's axioms are the basic inference rules.
o Armstrong's axioms are used to derive the functional dependencies that hold on a relational
database.
o An inference rule is a type of assertion. It can be applied to a set of FDs (functional
dependencies) to derive other FDs.
o Using the inference rules, we can derive additional functional dependencies from the
initial set.
There are 6 inference rules for functional dependencies:
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y ⊆ X, X → Y holds.
2. Augmentation Rule (IR2)
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
Similarly, X → Z follows from X → YZ and YZ → Z (IR1) using IR3.
6. Pseudo transitive Rule (IR6)
In the pseudo-transitive rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
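In practice, instead of chaining the rules by hand, one usually checks whether X → Y follows from a set of FDs by computing the attribute closure X+ (every attribute derivable from X). Because Armstrong's axioms are sound and complete, X → Y can be derived exactly when Y ⊆ X+. A minimal Python sketch, with hypothetical function names closure and implies and FDs written as pairs of single-letter attribute strings, applied to the union, decomposition and pseudo-transitive rules:

```python
def closure(attrs, fds):
    """Return attrs+, every attribute derivable from attrs by repeatedly applying fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # When the whole left side is already derived, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X -> Y is derivable from fds exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)

# Each FD is a (left, right) pair of strings of single-letter attributes.
union_ok = implies([("X", "Y"), ("X", "Z")], "X", "YZ")
decomposition_ok = implies([("X", "YZ")], "X", "Z")
pseudo_transitive_ok = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")
not_derivable = implies([("X", "Y"), ("YZ", "W")], "X", "W")
print(union_ok, decomposition_ok, pseudo_transitive_ok, not_derivable)  # True True True False
```

The last check shows why the pseudo-transitive rule needs Z on the left: from X alone, the closure is only {X, Y}, which never triggers YZ → W.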
Inference Rules in DBMS
Last Updated : 29 Jul, 2024

Inference rules in databases are also known as Armstrong's Axioms in Functional
Dependency. These rules govern the functional dependencies in a relational database: using
them, a new functional dependency can be derived from other FDs. These rules were
introduced by William W. Armstrong. In this article, we will learn about all of these rules, along
with the prerequisites needed to understand the topic.
Prerequisites
• Attributes: When we talk about databases, we think of them as organized collections
of information. Imagine that you have a table called “Student.” This table has
columns, which we also call “Attributes.” These columns define specific details about
the students. For example:
o Student_name: This column stores the names of the students.
o Roll_no: Here, we keep track of their roll numbers.
o Marks: And finally, we record their exam scores.
• Functional Dependencies (FDs) are like the building blocks of a database. A table has a
number of attributes (characteristics), and these attributes can be logically related to
each other. For example, Roll_no → Marks means that from Roll_no we can get the
Marks of the student, which shows that Roll_no is logically related to Marks.
Inference Rules
There are 6 inference rules, which are defined below:
• Reflexive Rule: According to this rule, if B is a subset of A, then A logically determines
B. Formally, if B ⊆ A then A → B.
o Example: Take the Address (A) of a house, which contains many parameters such
as House no., Street no., City, etc. These are all subsets of A. Thus,
Address (A) → House no. (B).
• Augmentation Rule: According to this rule, if A logically determines B, then adding the
same extra attribute to both sides preserves the functional dependency.
o Example: If A → B, then adding an extra attribute C gives AC → BC, which is
also valid.
• Transitive Rule: The transitive rule states that if A determines B and B determines C,
then A indirectly determines C.
o Example: If A → B and B → C then A → C.
• Union Rule: The union rule states that if A determines B and A determines C, then A
determines BC.
o Example: If A → B and A → C then A → BC.
• Decomposition Rule: This is the exact reverse of the union rule. According to this rule,
if A determines BC, it can be decomposed into A → B and A → C.
o Example: If A → BC then A → B and A → C.
• Pseudo Transitive Rule: According to this rule, if A determines B and BC determines D,
then AC determines D.
o Example: If A → B and BC → D then AC → D.
Conclusion
In this article, we learned about all the inference rules in DBMS and some basic terminology
related to them. We also learned what functional dependencies are and how attributes are
interrelated within a table in a Database Management System.
Frequently Asked Questions on Inference Rules – FAQs
How many inference rules are there? Name them.
There are 6 inference rules:
• Reflexive Rule
• Augmentation Rule
• Transitive Rule
• Union Rule
• Decomposition Rule
• Pseudo Transitive Rule
What are FDs?
FDs stands for Functional Dependencies. They are constraints between sets of attributes that
are logically related to each other.
Who proposed the inference rules?
These rules were introduced by William W. Armstrong.
What are inference rules also known as?
These rules are also known as Armstrong's Axioms in Functional Dependency.

5.MINIMAL COVER
Minimal Cover
A minimal cover is a simplified and reduced version of the given set of functional
dependencies.
Since it is a reduced version, it is also called an irreducible set.
It is also called a canonical cover.
Steps to Find Minimal Cover
1) Split the right-hand attributes of all FDs.
Example
A->XY => A->X, A->Y
2) Remove all redundant FDs.
Example
{ A->B, B->C, A->C }
Here A->C is redundant since it can already be achieved using the Transitivity Property.
3) Find the Extraneous attribute and remove it.
Example
AB->C, either A or B or none can be extraneous.
If A closure contains B then B is extraneous and it can be removed.
If B closure contains A then A is extraneous and it can be removed.
Example 1
Minimize {A->C, AC->D, E->H, E->AD}
Step 1: {A->C, AC->D, E->H, E->A, E->D}
Step 2: {A->C, AC->D, E->H, E->A}
Here the redundant FD is {E->D}, since E->A, A->C and AC->D already give E->D.
Step 3: {AC->D}
{A}+ = {A,C,D}
Since {A}+ contains C, C is extraneous and is removed.
{A->D}
Minimal Cover = {A->C, A->D, E->H, E->A}
Example 2
Minimize {AB->C, D->E, AB->E, E->C}
Step 1: {AB->C, D->E, AB->E, E->C}
Step 2: {D->E, AB->E, E->C}
Here the redundant FD is {AB->C}, since AB->E and E->C already give AB->C.
Step 3: {AB->E}
{A}+ = {A}
{B}+ = {B}
Neither closure contains E, so there is no extraneous attribute.
Therefore, Minimal cover = {D->E, AB->E, E->C}
How To Find Minimal Set?
Last Updated : 25 Jul, 2024

If we have a set of functional dependencies, we get the simplest and irreducible form of
functional dependencies after reducing these functional dependencies. This is called
the Minimal Cover or Irreducible Set (as we can’t reduce the set further). It is also called
a Canonical Cover.
Let us understand the procedure to find the minimal cover by this example:
The given functional dependencies are: A->B, B->C, D->ABC, AC->D
1. Steps to Find Minimal Cover
Step 1: First split all the right-hand attributes of all FDs such that RHS contains a single
attribute.
Example: D->ABC is split into D->A, D->B and D->C
A->B, B->C, D->A, D->B, D->C, AC->D
[Note: We can't split AC->D as A->D, C->D]
Step 2: Now remove all redundant FDs.
[An FD is redundant if it can be derived from the other FDs.]
Let's test the redundancy of A->B: without A->B, A+ = {A}, so B cannot be derived and
A->B is not redundant.
Similarly, B->C and D->A are not redundant.
But D->B and D->C are redundant: without them, D+ still contains A (from D->A), then B
(from A->B) and then C (from B->C). So we remove D->B and D->C from the FD set.
At last, we check AC->D. Without it, AC+ = {A, B, C}, which does not contain D, so AC->D is
not redundant.
So, the FDs after Step 2 are: A->B, B->C, D->A, AC->D
Step 3: Find the extraneous attributes and remove them.
In this case, we only need to check AC->D, because it is the only FD with more than one
attribute on the left-hand side.
In AC->D, either A or C, or neither, can be extraneous.
If D is in A+, then C is extraneous and it can be removed.
If D is in C+, then A is extraneous and it can be removed.
Here A+ = {A, B, C, D} (A->B, then B->C, then AC->D), so C is extraneous and AC->D
reduces to A->D. C+ = {C}, so A is not extraneous.
So, the final FDs are: A->B, B->C, D->A, A->D. This is the minimal cover.
2. Find the Minimal Cover
Given Functional Dependencies
• A -> B
• B -> C
• D -> ABC
• AC -> D
Step 1: Split the Functional Dependencies
• A -> B
• B -> C
• D -> A
• D -> B
• D -> C
• AC -> D
Step 2: Remove Redundant FDs
• A -> B (not redundant)
• B -> C (not redundant)
• D -> A (not redundant)
• D -> B (redundant, because D -> A and A -> B)
• D -> C (redundant, because D -> A, A -> B and B -> C)
• AC -> D (not redundant)
After removing the redundant FDs, the set becomes
• A -> B
• B -> C
• D -> A
• AC -> D
Step 3: Remove Extraneous Attributes
• In AC -> D, check whether A or C is extraneous.
• Compute closures
• A+ = {A, B, C, D} (A -> B, B -> C, then AC -> D)
• C+ = {C}
• Since C+ does not contain D, A is not extraneous.
• Since A+ contains D, C is extraneous, so AC -> D becomes A -> D.
So the minimal cover of (A -> B, B -> C, D -> ABC, AC -> D) is (A -> B, B -> C, D -> A,
A -> D).
By ensuring all functional dependencies are minimal and non-redundant, we obtain the
correct minimal cover.
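The three steps of this section can be sketched in Python. This is an illustration, not the notes' own procedure: the textual FD format, the function names, and the final extra redundancy pass (a safeguard added here in case removing an extraneous attribute leaves a duplicate or redundant FD) are assumptions. Because the result depends on the order in which FDs are examined, a minimal cover is not unique; a different order can give a different but equivalent set.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, each FD being (frozenset lhs, single rhs attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def split_fds(text):
    """Step 1: parse 'A->C, E->AD' and split every right-hand side into single attributes."""
    fds = []
    for item in text.split(","):
        lhs, rhs = (side.strip() for side in item.split("->"))
        for attr in rhs:
            if (frozenset(lhs), attr) not in fds:
                fds.append((frozenset(lhs), attr))
    return fds

def drop_redundant(fds):
    """Step 2: drop an FD whose right side is still derivable from the remaining FDs."""
    fds = list(fds)
    for fd in list(fds):
        rest = [g for g in fds if g != fd]
        if fd[1] in closure(fd[0], rest):
            fds = rest
    return fds

def drop_extraneous(fds):
    """Step 3: drop a left-side attribute if the rest of the left side still determines the right side."""
    result = []
    for lhs, rhs in fds:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs in closure(smaller, fds):
                lhs = smaller
        if (lhs, rhs) not in result:
            result.append((lhs, rhs))
    return result

def minimal_cover(text):
    # Steps 1-3 in the order used in these notes, then one extra redundancy pass.
    return drop_redundant(drop_extraneous(drop_redundant(split_fds(text))))

def show(fds):
    return ", ".join("".join(sorted(lhs)) + "->" + rhs for lhs, rhs in fds)

print(show(minimal_cover("A->C, AC->D, E->H, E->AD")))  # A->C, A->D, E->H, E->A
print(show(minimal_cover("AB->C, D->E, AB->E, E->C")))  # D->E, AB->E, E->C
print(show(minimal_cover("A->B, B->C, D->ABC, AC->D")))  # A->B, B->C, D->A, A->D
```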
Conclusion
To find a minimal set, first split the functional dependencies, then remove any redundant
functional dependencies, and finally remove any extraneous attributes. This process produces
an irreducible set that is equivalent to the original one: it implies exactly the same functional
dependencies, without any redundancy.
Frequently Asked Questions on Minimal Set – FAQs
Why is finding a minimal set important?
Finding a minimal set is crucial to eliminate redundancies, simplify analysis, and improve
efficiency in data handling and processing.
What are functional dependencies?
Functional dependencies describe relationships between attributes in a relation, where the
value of one set of attributes determines the value of another (written X->Y).
How do you split functional dependencies?
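Splitting functional dependencies means rewriting every dependency that has several attributes
on its right-hand side as separate dependencies with a single right-hand-side attribute; for
example, D->ABC becomes D->A, D->B and D->C.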

6.PROPERTIES OF RELATIONAL DECOMPOSITION


Properties of Relational Decomposition
Last Updated : 29 Aug, 2019

When a relation in the relational model is not in an appropriate normal form, it has to be
decomposed. In a database, breaking a table down into multiple tables is termed
decomposition. The properties of a relational decomposition are listed below:
1. Attribute Preservation:
Using functional dependencies, the algorithms decompose the universal relation
schema R into a set of relation schemas D = { R1, R2, …, Rn }, the relational database
schema, where 'D' is called the Decomposition of R.
The attributes in R will appear in at least one relation schema Ri in the decomposition, i.e.,
no attribute is lost. This is called the Attribute Preservation condition of decomposition.
2. Dependency Preservation:
Each functional dependency X->Y specified in F should either appear directly in one of the
relation schemas Ri in the decomposition D or be inferable from the
dependencies that appear in some Ri. This is the Dependency Preservation condition.
If a decomposition is not dependency preserving some dependency is lost in decomposition.
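This condition can be checked without joining any relations: take the dependencies that hold
on each Ri (the projection of F onto Ri), form their union, and verify that every dependency
in F can be inferred from it.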
For example:
R = (A, B, C)
F = {A ->B, B->C}
Key = {A}

R is not in BCNF.
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 are in BCNF, Lossless-join decomposition, Dependency preserving.
Each functional dependency specified in F appears directly in one of the relations in the
decomposition: A->B in R1 and B->C in R2.
It is not necessary that all dependencies from the relation R appear in some relation Ri.
It is sufficient that the union of the dependencies on all the relations Ri be equivalent to the
dependencies on R.
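The condition can be tested without joining any tables. The Python sketch below uses a standard closure-based test: each FD's left-hand side is grown using only what can be determined inside each Ri, and the FD is preserved if its right-hand side is reached. It assumes single-letter attribute names, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD in `fds` is implied by the FDs projected onto `schemas`.

    The projected FD sets are never listed explicitly and no join is performed:
    closures under F are simply restricted to each Ri.
    """
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # Attributes of Ri that the current result determines inside Ri.
                t = closure(result & ri, fds) & ri
                if not t <= result:
                    result |= t
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [(set("A"), set("B")), (set("B"), set("C"))]
print(preserves_dependencies(F, [set("AB"), set("BC")]))  # True
print(preserves_dependencies(F, [set("AB"), set("AC")]))  # False: B->C is lost
```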
3. Non Additive Join Property:
Another property that the decomposition D should possess is the Non-Additive Join
Property, which ensures that no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition.

4. No redundancy:
Decomposition is used to eliminate some of the problems of bad design, like anomalies,
inconsistencies, and redundancy. If a relation is not decomposed properly, it
may lead to problems like loss of information.

5. Lossless Join:
Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be reconstructed from
the corresponding instances in the smaller relations.
For example:
R : relation, F : set of functional dependencies on R,
X, Y : decomposition of R,
A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if
the natural join of R1, R2, …, Rn produces exactly the relation R.
A decomposition is lossless if we can recover:
R(A, B, C) -> Decompose -> R1(A, B) R2(A, C) -> Recover -> R’(A, B, C)
Thus, R’ = R
A decomposition of R into X and Y is lossless if at least one of the following holds:
X ∩ Y -> X, that is: all attributes common to both X and Y functionally determine
ALL the attributes in X.
X ∩ Y -> Y, that is: all attributes common to both X and Y functionally determine
ALL the attributes in Y.
In other words, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless
decomposition.
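For a decomposition into two relations, the superkey test above can be checked with attribute closures. A minimal Python sketch, assuming single-letter attribute names and reusing F = {A->B, B->C} from the dependency-preservation example; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 intersect R2)+ must cover R1 or R2."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


F = [(set("A"), set("B")), (set("B"), set("C"))]
print(is_lossless_binary(set("AB"), set("BC"), F))  # True: B+ = {B, C} covers R2
print(is_lossless_binary(set("AC"), set("BC"), F))  # False: C+ = {C}
```

The test only applies to a split into two relations; decompositions into more relations are usually checked with the tableau (chase) method instead.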

Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form, then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If a relation is not decomposed properly, then it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition

Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of the relations will result in the same
relation that was decomposed.
o The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
Example:
EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:
Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
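The decompose-then-join check can be replayed on the sample rows. The Python sketch below projects the combined table onto the EMPLOYEE and DEPARTMENT columns, joins the two projections on EMP_ID, and compares the result with the original rows; the helper names are hypothetical, and the full department names are taken from the DEPARTMENT table.

```python
# EMPLOYEE_DEPARTMENT rows: (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME)
employee_department = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]


def project(rows, columns):
    """Projection: keep the given column positions and drop duplicate rows."""
    return sorted({tuple(row[c] for c in columns) for row in rows})


# EMPLOYEE(EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
employee = project(employee_department, [0, 1, 2, 3])
# DEPARTMENT(DEPT_ID, EMP_ID, DEPT_NAME)
department = project(employee_department, [4, 0, 5])


def natural_join_on_emp_id(emps, depts):
    """Join EMPLOYEE and DEPARTMENT rows on their common column EMP_ID."""
    joined = []
    for emp_id, name, age, city in emps:
        for dept_id, d_emp_id, dept_name in depts:
            if emp_id == d_emp_id:
                joined.append((emp_id, name, age, city, dept_id, dept_name))
    return joined


rejoined = natural_join_on_emp_id(employee, department)
print(sorted(rejoined) == sorted(employee_department))  # True
```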


Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one
decomposed table, or be derivable from the dependencies that are.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because FD A->BC is a part of relation R1(ABC).

7.NORMALIZATION (UP TO BCNF)
Normal Forms in DBMS
Last Updated : 23 Jul, 2024

Normalization is the process of minimizing redundancy from a relation or set of relations.


Redundancy in a relation may cause insertion, deletion, and update anomalies, so normalization
helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce
redundancy in database tables.
Normalization of DBMS
In database management systems (DBMS), normal forms are a series of guidelines that help
to ensure that the design of a database is efficient, organized, and free from data anomalies.
There are several levels of normalization, each with its own set of guidelines, known as
normal forms.
Important Points Regarding Normal Forms in DBMS
• First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each
table cell should contain only a single (atomic) value, and each column should have a unique
name. The first normal form helps to eliminate repeating groups and simplify queries.
• Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each
non-key attribute be fully dependent on the whole primary key. This means that no non-key
column may depend on only a part of a composite key.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no non-key attribute
be transitively dependent on the key through another non-key attribute. This means that each
non-key column should depend directly on the primary key, and not on any other non-key
column in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that requires every
determinant in a table to be a candidate key. In other words, for every non-trivial
functional dependency X->Y, X must be a superkey.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that
a table does not contain any non-trivial multi-valued dependencies other than those on a key.
• Fifth Normal Form (5NF): 5NF is the highest of these levels and involves
decomposing a table into smaller tables, where this can be done without loss, to remove
redundancy caused by join dependencies and improve data integrity.
Normal forms help to reduce data redundancy, increase data consistency, and improve
database performance. However, higher levels of normalization can lead to more complex
database designs and queries. It is important to strike a balance between normalization and
practicality when designing a database.
Advantages of Normal Form
• Reduced data redundancy: Normalization helps to eliminate duplicate data in tables,
reducing the amount of storage space needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is stored in a consistent
and organized manner, reducing the risk of data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for organizing tables
and data relationships, making it easier to design and maintain a database.
• Improved query performance: Normalized tables are typically smaller and easier to search,
which can result in faster queries, although queries that combine several tables need joins.
• Easier database maintenance: Normalization reduces the complexity of a database by
breaking it down into smaller, more manageable tables, making it easier to add,
modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database
efficiency, and simplify database design and maintenance.
First Normal Form
If a relation contains a composite or multi-valued attribute, it violates first normal form;
a relation is in first normal form if it does not contain any composite or multi-valued
attribute. In other words, a relation is in first normal form if every attribute in that
relation is a single-valued attribute.
• Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.

• Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
• In the above table, Courses is a multi-valued attribute, so it is not in 1NF. The table
below is in 1NF, as there is no multi-valued attribute:
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
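Converting the multi-valued Courses column into 1NF amounts to emitting one row per value. A small Python sketch, assuming the multi-valued cell is stored as a comma-separated string (an assumption about the input format, not something the table specifies); the function name to_1nf is hypothetical.

```python
def to_1nf(rows):
    """Split each comma-separated Courses cell into one row per course."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat


# (ID, Name, Courses) rows of the unnormalized table above.
unnormalized = [(1, "A", "c1, c2"), (2, "E", "c3"), (3, "M", "c2, c3")]

for row in to_1nf(unnormalized):
    print(row)
```

Each printed tuple corresponds to one row of the 1NF table, with ID and Name repeated for every course.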
Second Normal Form
A relation is in 2NF if it is in 1NF and any non-prime attribute (attributes which are not part
of any candidate key) is not partially dependent on any proper subset of any candidate key of
the table. In other words, we can say that, every non-prime attribute must be fully dependent
on each candidate key.
A functional dependency X->Y (where X and Y are sets of attributes) is said to be a partial
dependency if Y can be determined by some proper subset of X.
However, in 2NF it is possible for a prime attribute to be partially dependent on any
candidate key, but every non-prime attribute must be fully dependent(or not partially
dependent) on each candidate key of the table.
• Example 1 – Consider Table 3 below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
• {Note that many courses have the same course fee.} Here, COURSE_FEE alone
cannot decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO
cannot decide the value of COURSE_NO; and COURSE_FEE together with COURSE_NO cannot
decide the value of STUD_NO. Hence, COURSE_FEE is a non-prime attribute, as it does not
belong to the only candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE,
i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the
candidate key. A non-prime attribute that depends on a proper subset of the candidate key
is a partial dependency, so this relation is not in 2NF. To convert the above relation to
2NF, we need to split the table into two tables, such as:
Table 1: STUD_NO, COURSE_NO
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2: COURSE_NO, COURSE_FEE
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
• NOTE: 2NF tries to reduce the redundant data getting stored in memory. For
instance, if there are 100 students taking the C1 course, we don't need to store its fee as
1000 in all 100 records; instead, we can store it once in the second table as the
course fee for C1, which is 1000.
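The partial-dependency test used in Example 1 can be automated with attribute closures. The Python sketch below assumes, as the example states, that {STUD_NO, COURSE_NO} is the only candidate key and that COURSE_NO -> COURSE_FEE holds; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """(subset, attribute) pairs where a non-prime attribute depends on a proper
    subset of `key`. Assumes `key` is the only candidate key, so every attribute
    outside it is non-prime."""
    non_prime = attributes - key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


ATTRIBUTES = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
KEY = {"STUD_NO", "COURSE_NO"}
FDS = [({"COURSE_NO"}, {"COURSE_FEE"})]  # COURSE_NO -> COURSE_FEE

print(partial_dependencies(KEY, FDS, ATTRIBUTES))  # [(('COURSE_NO',), 'COURSE_FEE')]
```

An empty result means no partial dependency on that key, which is the 2NF condition.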
• Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e.,
no proper subset of AB determines any non-prime attribute, so R is in 2NF.
Third normal form additionally requires that, for every non-trivial functional dependency
X -> Y, at least one of the following holds:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME,
STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO ->
STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal
form.
To convert it into third normal form, we will decompose the relation STUDENT (STUD_NO,
STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A. All possible
candidate keys in the above relation are {A, E, CD, BC}. All attributes on the right-hand
sides of all functional dependencies are prime, so the relation is in 3NF.
Example 2: Find the highest normal form of a relation R(A, B, C, D, E) with FD set
{BC->D, AC->BE, B->E}.
Step 1: (AC)+ = {A, C, B, E, D}, but no proper subset of AC determines all attributes of the
relation, so AC is a candidate key. A and C cannot be derived from any other attribute of the
relation, so every candidate key must contain both; hence {AC} is the only candidate key.
Step 2: Prime attributes are those that are part of a candidate key, {A, C} in this
example; the others, {B, D, E}, are non-prime.
Step 3: The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes. The relation is in 2nd normal form because BC->D is in
2nd normal form (BC is not a proper subset of candidate key AC), AC->BE is in 2nd normal
form (AC is the candidate key), and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D a
prime attribute, and in B->E neither is B a super key nor is E a prime attribute; to
satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a
prime attribute. So the highest normal form of the relation is 2nd normal form.
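The three steps of this example can be automated for small relations. The Python sketch below is an illustration only: it assumes single-letter attribute names and that the relation is already in 1NF, brute-forces the candidate keys (exponential in the number of attributes), and tests 3NF and BCNF against the given FDs, which is enough when the relation being tested is R itself. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for all minimal keys (fine for a handful of attributes)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' for relation R(attributes)."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = attributes - prime

    def is_superkey(x):
        return closure(x, fds) >= attributes

    # 2NF: no non-prime attribute may depend on a proper subset of a candidate key.
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if closure(subset, fds) & non_prime:
                    return "1NF"

    nontrivial = [(lhs, rhs) for lhs, rhs in fds if not rhs <= lhs]
    # 3NF: each non-trivial X -> Y has X a superkey or every attribute of Y - X prime.
    if any(not is_superkey(lhs) and not (rhs - lhs) <= prime for lhs, rhs in nontrivial):
        return "2NF"
    # BCNF: each non-trivial X -> Y has X a superkey.
    if any(not is_superkey(lhs) for lhs, _ in nontrivial):
        return "3NF"
    return "BCNF"


def fds_from(text):
    """Parse 'BC->D, AC->BE' into [(set('BC'), set('D')), ...]."""
    pairs = []
    for item in text.split(","):
        lhs, rhs = item.strip().split("->")
        pairs.append((set(lhs), set(rhs)))
    return pairs


print(highest_normal_form(set("ABCDE"), fds_from("BC->D, AC->BE, B->E")))       # 2NF
print(highest_normal_form(set("ABC"), fds_from("A->BC, B->A")))                  # BCNF
print(highest_normal_form(set("ABCDE"), fds_from("A->BC, CD->E, B->D, E->A")))  # 3NF
```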
For example, consider relation R(A, B, C) with A -> BC and B -> A. A and B are both super
keys, so the above relation is in BCNF.
Third Normal Form
A relation is said to be in third normal form if it has no transitive dependency for
non-prime attributes. The basic condition for the Third Normal Form is that the relation
must be in Second Normal Form.
In addition, at least one of the following conditions must hold for every non-trivial
functional dependency X -> Y:
• X is a Super Key.
• Y is a Prime Attribute (each element of Y is part of some Candidate Key).
For more, refer to Third Normal Form in DBMS.
BCNF
BCNF (Boyce-Codd Normal Form) is an advanced version of Third Normal Form, with
stricter rules than Third Normal Form. The basic condition for any relation
to be in BCNF is that it must be in Third Normal Form.
We have to focus on some basic rules that are for BCNF:
1. The table must be in Third Normal Form.
2. For every non-trivial functional dependency X->Y, X must be a superkey of the relation.
For more, refer to BCNF in DBMS.
Fourth Normal Form
A relation in Fourth Normal Form has no non-trivial multi-valued dependency other than those
whose left-hand side is a superkey.
The basic condition for Fourth Normal Form is that the relation must be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It does not have any non-trivial multi-valued dependency (unless its left-hand side is a
superkey).
For more, refer to Fourth Normal Form in DBMS.
Fifth Normal Form
Fifth Normal Form is also called Project-Join Normal Form (PJ/NF). The basic conditions of
Fifth Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
It must not be possible to decompose the relation further without loss (other than in ways
implied by its candidate keys).
For more, refer to Fifth Normal Form in DBMS.
Applications of Normal Forms in DBMS
• Data consistency: Normal forms ensure that data is consistent and does not contain
any redundant information. This helps to prevent inconsistencies and errors in the
database.
• Data redundancy: Normal forms minimize data redundancy by organizing data into
tables in which each fact is stored only once. This reduces the amount of storage space
required for the database and makes it easier to manage.
• Response time: Normal forms can improve update performance, because each change is
written in a single place and tables stay small; read queries that combine several tables,
however, need more joins.
• Database maintenance: Normal forms make it easier to maintain the database by
reducing the amount of redundant data that needs to be updated, deleted, or modified.
This helps to improve database management and reduce the risk of errors or
inconsistencies.
• Database design: Normal forms provide guidelines for designing databases that are
efficient, flexible, and scalable. This helps to ensure that the database can be easily
modified, updated, or expanded as needed.
Some Important Points about Normal Forms
• BCNF is free from redundancy caused by Functional Dependencies.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of a relation are prime attributes, then the relation is always in 3NF.
• A relation in a Relational Database is always at least in 1NF form.
• Every Binary Relation (a Relation with only 2 attributes) is always in BCNF.
• If a Relation has only singleton candidate keys (i.e. every candidate key consists of
only 1 attribute), then the Relation is always in 2NF (because no Partial functional
dependency is possible).
• Sometimes going for BCNF form may not preserve functional dependency. In that case
go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
• There are many more Normal forms that exist after BCNF, like 4NF and more. But in
real world database systems it's generally not required to go beyond BCNF.
Conclusion
In conclusion, relational databases can be arranged according to a set of rules called
normal forms in database administration (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), which
reduce data redundancy and preserve data integrity. By resolving various kinds of data
anomalies and dependencies, each subsequent normal form expands upon the one that came
before it. The particular requirements and properties of the data being stored determine
which normal form should be used; higher normal forms offer stricter data integrity but may
also result in more complicated database structures.
Normal Forms in DBMS – FAQs
Why is Normalization Important in DBMS?
Normalization helps to protect the database from anomalies, which ultimately ensures the
consistency of the database and makes it easier to maintain.
Is it possible to over-normalize the database?
Yes, excessive normalization can lead to complex queries and reduced performance. It is
important to strike a balance between normalization and practicality.
Is it necessary to normalize a database to the highest normal form (like BCNF or 4NF)?
No, there is no requirement to normalize every database to the highest normal form. Often a
lower normal form is sufficient, for reasons of performance and simplicity.
Is it possible for a relation in 2NF to have partial dependency?
Yes, it is possible for a prime attribute to be partially dependent (or not fully dependent) on a
candidate key. For example, consider a relation R(A, B, C, D, E, F) (in 2NF) with the
following functional dependencies:
ABC -> D
ABC -> E
AB -> D
DE -> ABC
DE -> F
E -> C
Here, ABC and DE are candidate keys (ABE is one as well, since AB -> D and then DE -> ABC, F),
and we can see that prime attribute D is partially dependent on ABC in the functional
dependency ABC -> D (because of AB -> D).
Also, prime attribute C is partially dependent on DE (because of E -> C).
The only non-prime attribute, F, is not determined by any proper subset of a candidate key,
so the relation R is in 2NF despite having partial dependencies.
Normalization
A large database defined as a single relation may result in data duplication. This repetition
of data may result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many records in
relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
o Normalization divides larger tables into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that helps
to guide you in creating a good database structure.
Data modification anomalies can be categorized into three types:
o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple
into a relation due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
o Update Anomaly: The update anomaly is when an update of a single data value
requires multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called Normal forms. The normal forms
apply to individual relations. A relation is said to be in a particular normal form if it
satisfies the constraints of that normal form.
Following are the various types of Normal forms:

Normal Form Description

1NF A relation is in 1NF if it contains an atomic value.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists.

BCNF A stronger definition of 3NF is known as Boyce Codd's normal form.

4NF A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued
dependency.

5NF A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining
should be lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious
problems.
First Normal Form (1NF)
o A relation will be in 1NF if it contains only atomic values.
o It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

Second Normal Form (2NF)


o In the 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.
Example: Let's assume, a school can store the data of teachers and the subjects they teach.
In a school, a teacher can teach more than one subject.
TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}, and the non-prime attribute
TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of this candidate key.
That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be
in third normal form.
A relation is in third normal form if at least one of the following conditions holds for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, ... and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the
super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
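Whether a functional dependency is consistent with sample data can be checked row by row. The Python sketch below uses the EMPLOYEE_DETAIL rows (ZIP codes kept as strings so leading zeros survive); fd_holds is a hypothetical helper. A sample instance can refute an FD but cannot prove it in general, since FDs come from the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs is satisfied by this table instance."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left value.
        if seen.setdefault(left, right) != right:
            return False
    return True


employee_detail = [
    {"EMP_ID": 222, "EMP_NAME": "Harry", "EMP_ZIP": "201010", "EMP_STATE": "UP", "EMP_CITY": "Noida"},
    {"EMP_ID": 333, "EMP_NAME": "Stephan", "EMP_ZIP": "02228", "EMP_STATE": "US", "EMP_CITY": "Boston"},
    {"EMP_ID": 444, "EMP_NAME": "Lan", "EMP_ZIP": "60007", "EMP_STATE": "US", "EMP_CITY": "Chicago"},
    {"EMP_ID": 555, "EMP_NAME": "Katharine", "EMP_ZIP": "06389", "EMP_STATE": "UK", "EMP_CITY": "Norwich"},
    {"EMP_ID": 666, "EMP_NAME": "John", "EMP_ZIP": "462007", "EMP_STATE": "MP", "EMP_CITY": "Bhopal"},
]

print(fd_holds(employee_detail, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"]))  # True
print(fd_holds(employee_detail, ["EMP_STATE"], ["EMP_CITY"]))             # False
```

The second check fails because state US appears with both Boston and Chicago, which is why EMP_ZIP, not EMP_STATE, is the key of the new EMPLOYEE_ZIP table.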

Boyce Codd normal form (BCNF)


o BCNF is an advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key
of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:


1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
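The decomposition above can be reproduced with the usual textbook BCNF decomposition algorithm: find an X whose closure inside a schema is neither X itself nor the whole schema, then split the schema into X+ and R - (X+ - X). The Python sketch below is a minimal illustration under that assumption; it enumerates subsets, so it is only practical for small schemas, it is not guaranteed to preserve dependencies, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return a left-hand side X inside `schema` that violates BCNF there, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # Violation: X determines more than itself but not the whole schema.
            if determined != x and determined != schema:
                return x
    return None


def bcnf_decompose(schema, fds):
    """Split on violating X into X+ and R - (X+ - X) until no schema violates BCNF."""
    result = []
    todo = [set(schema)]
    while todo:
        r = todo.pop()
        x = bcnf_violation(r, fds)
        if x is None:
            result.append(frozenset(r))
        else:
            x_plus = closure(x, fds) & r
            todo.append(x_plus)
            todo.append(r - (x_plus - x))
    return result


ATTRIBUTES = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
FDS = [({"EMP_ID"}, {"EMP_COUNTRY"}),
       ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})]

for table in bcnf_decompose(ATTRIBUTES, FDS):
    print(sorted(table))
```

For these two FDs the result is the same three tables as above (EMP_COUNTRY, EMP_DEPT and EMP_DEPT_MAPPING), whichever violation is found first.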

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A →→ B, if for a single value of A multiple values of B exist,
independently of the other attributes, then the dependency is a multi-valued dependency.
Example
STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

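The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between a student's COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY), which leads to
unnecessary repetition of data: for the dependency to hold, every course of student 21 must
appear with every hobby, so a complete table needs four rows for that student
(Computer/Dancing, Computer/Singing, Math/Dancing, Math/Singing).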
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE

STU_ID COURSE
21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey
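Joining the two 4NF tables back on STU_ID pairs every course of a student with every hobby of that student. That independence is what the multi-valued dependencies STU_ID →→ COURSE and STU_ID →→ HOBBY express. A minimal Python sketch of the join (join_on_stu_id is a hypothetical helper):

```python
# STUDENT_COURSE and STUDENT_HOBBY rows from the tables above.
student_course = [(21, "Computer"), (21, "Math"), (34, "Chemistry"),
                  (74, "Biology"), (59, "Physics")]
student_hobby = [(21, "Dancing"), (21, "Singing"), (34, "Dancing"),
                 (74, "Cricket"), (59, "Hockey")]


def join_on_stu_id(courses, hobbies):
    """Natural join of STUDENT_COURSE and STUDENT_HOBBY on STU_ID."""
    return [(stu_id, course, hobby)
            for stu_id, course in courses
            for hobby_stu_id, hobby in hobbies
            if stu_id == hobby_stu_id]


for row in join_on_stu_id(student_course, student_hobby):
    print(row)
```

Student 21 comes back with four rows (two courses times two hobbies), while every other student gets one row; the two smaller tables store each fact only once.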

Fifth normal form (5NF)


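o A relation is in 5NF if it is in 4NF and does not contain any join dependency (other than
those implied by its candidate keys), and joining should be lossless.
o 5NF is satisfied when the tables cannot be broken into smaller tables any further without
losing information, so that redundancy caused by join dependencies is avoided.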
o 5NF is also known as Project-join normal form (PJ/NF).
Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1


Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't
take a Math class for Semester 2. In this case, the combination of all these fields is
required to identify a valid row.
Suppose we add a new Semester as Semester 3 but do not know the subject or who
will be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns
together act as a primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen
P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
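The benefit of the three-way split can be checked on the sample rows: joining only P1 and P2 produces spurious rows, while joining all three projections gives back exactly the original table. A minimal Python sketch using sets of tuples as relations:

```python
# P1 (SEMESTER, SUBJECT), P2 (SUBJECT, LECTURER), P3 (SEMESTER, LECTURER) from above.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

# Original table as (SUBJECT, LECTURER, SEMESTER) rows.
original = {("Computer", "Anshika", "Semester 1"), ("Computer", "John", "Semester 1"),
            ("Math", "John", "Semester 1"), ("Math", "Akash", "Semester 2"),
            ("Chemistry", "Praveen", "Semester 1")}

# P1 join P2 on SUBJECT.
p1_p2 = {(subject, lecturer, semester)
         for semester, subject in p1
         for subject2, lecturer in p2
         if subject == subject2}

# Join that result with P3 on (SEMESTER, LECTURER).
p1_p2_p3 = {(subject, lecturer, semester)
            for subject, lecturer, semester in p1_p2
            if (semester, lecturer) in p3}

print(len(p1_p2))            # 7 rows, 2 of them spurious
print(p1_p2_p3 == original)  # True
```

The two spurious rows, (Math, Akash, Semester 1) and (Math, John, Semester 2), are removed only by the join with P3.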
