DBDM Unit-3
In the above table, we have four columns which describe the details about the workers like
their name, address, department and their id. The above table is not normalized, and there is
definitely a chance of anomalies present in the table.
Three types of anomaly can occur in a database:
Updation / Update Anomaly
An update (updation) anomaly occurs when updating some rows of a table leaves the table
inconsistent. In the above table, if we want to update the address of Ramesh, we have to
update every row in which Ramesh appears. If the update misses even a single row, there
will be two addresses for Ramesh, which makes the database inconsistent and wrong.
Insertion Anomaly
An insertion anomaly occurs when a new row cannot be inserted without making the table
inconsistent. For example, if we want to add a row for a new worker who has not yet been
allocated to any department, we cannot insert it into the above table. This is an
insertion anomaly.
Deletion Anomaly
If deleting some rows from the table also deletes other information that is still required,
this is called a deletion anomaly. For example, in the above table, if we delete the
department number ECT669, the details of Rajesh are deleted as well, since Rajesh's details
are stored only in the row of ECT669. So, the table suffers from deletion anomalies.
To remove these anomalies, we normalize the table, typically by splitting it into smaller
tables. A table can be in one of several normal forms, such as 1NF, 2NF, 3NF and BCNF. We
apply different normalization steps according to the current form of the table.
Example 2:
In the above table, we have listed students with their name, id, branch and their respective
clubs.
Updation / Update Anomaly
In the above table, if Shivani changes her branch from Computer Science to Electronics, then
we will have to update all of her rows. If we miss any row, then Shivani will have more than
one branch, which creates an update anomaly in the table.
Insertion Anomaly
If we add a new row for student Ankit who is not a part of any club, we cannot insert the row
into the table as we cannot insert null in the column of stu_club. This is called insertion
anomaly.
Deletion Anomaly
If we remove the photography club from the college, then we will have to delete its row from
the table. But this also deletes the row of Gopal and his details. So, this is called a
deletion anomaly, and it makes the database inconsistent.
Anomalies in Relational Model
Last Updated : 16 Nov, 2023
Anomalies in the relational model are inconsistencies or errors that can arise when
working with relational databases, specifically during data insertion, deletion, and
modification. They can occur in both referencing and referenced relations, and are
categorized into three types:
Insertion Anomalies
Deletion Anomalies
Update Anomalies.
How Are Anomalies Caused in DBMS?
Database anomalies are faults caused by poor design, typically by storing everything in a
single flat table. They can be removed through normalization, which generally splits the
table into smaller tables and thereby reduces the anomalies in the database.
STUDENT Table
STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE
Table 1
STUDENT_COURSE
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
Table 2
Insertion Anomaly: If a tuple is inserted into the referencing relation and its referencing
attribute value is not present in the referenced attribute, the insertion is not allowed.
Example: If we try to insert a record into STUDENT_COURSE with STUD_NO = 7, it will not be
allowed.
Deletion and Updation Anomaly: If a tuple is deleted or updated in the referenced relation and
the referenced attribute value is used by the referencing attribute in the referencing
relation, the deletion or update is not allowed.
Example: If we want to update the record with STUD_NO = 1 in STUDENT_COURSE, we have to
update it in both rows of the table. If we try to delete the record with STUD_NO = 1 from
STUDENT, it will not be allowed.
To avoid this, the following can be used in query:
ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from referenced
relation and the referenced attribute value is used by referencing attribute in
referencing relation, it will delete/update the tuple from referenced relation and set the
value of referencing attribute to NULL.
ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from referenced
relation and the referenced attribute value is used by referencing attribute in
referencing relation, it will delete/update the tuple from referenced relation and
referencing relation as well.
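The two options above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and reduces the STUDENT and STUDENT_COURSE tables to the columns needed for the demonstration. The helper name `demo` and the reduced schema are assumptions made for illustration, not part of the original example.

```python
import sqlite3


def demo(action):
    """Delete student 1 and return the STUDENT_COURSE rows that remain."""
    con = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE STUDENT_COURSE ("
        " STUD_NO INTEGER REFERENCES STUDENT(STUD_NO) ON DELETE " + action + ","
        " COURSE_NO TEXT)"
    )
    con.executemany("INSERT INTO STUDENT VALUES (?)", [(1,), (2,)])
    con.executemany(
        "INSERT INTO STUDENT_COURSE VALUES (?, ?)",
        [(1, "C1"), (2, "C2"), (1, "C2")],
    )
    # Deleting from the referenced relation triggers the chosen action.
    con.execute("DELETE FROM STUDENT WHERE STUD_NO = 1")
    rows = con.execute(
        "SELECT STUD_NO, COURSE_NO FROM STUDENT_COURSE ORDER BY rowid"
    ).fetchall()
    con.close()
    return rows


cascade_rows = demo("CASCADE")    # referencing rows are deleted too
set_null_rows = demo("SET NULL")  # referencing rows keep a NULL STUD_NO
print(cascade_rows)
print(set_null_rows)
```

With CASCADE only the row of student 2 survives. With SET NULL all three rows survive, but the two rows of student 1 now carry NULL in STUD_NO.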
How These Anomalies Occur?
Insertion Anomalies: These anomalies occur when it is not possible to insert data into a
database because the required fields are missing or because the data is incomplete. For
example, if a database requires that every record has a primary key, but no value is
provided for a particular record, it cannot be inserted into the database.
Deletion anomalies: These anomalies occur when deleting a record from a database
and can result in the unintentional loss of data. For example, if a database contains
information about customers and orders, deleting a customer record may also delete all
the orders associated with that customer.
Update anomalies: These anomalies occur when modifying data in a database and can
result in inconsistencies or errors. For example, if a database contains information
about employees and their salaries, updating an employee’s salary in one record but
not in all related records could lead to incorrect calculations and reporting.
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases that adhere to the
principles of normalization. Normalization involves organizing data into tables and applying
rules to ensure data is stored in a consistent and efficient manner. By reducing data
redundancy and ensuring data integrity, normalization helps to eliminate anomalies and
improve the overall quality of the database.
According to E. F. Codd, the inventor of the relational model, the goals of
normalization include:
It helps in removing all the repeated data from the database.
It helps in removing undesirable deletion, insertion, and update anomalies.
It helps in making a proper and useful relationship between tables.
Advantages of the Relational Model
Data Integrity: Relational databases enforce data integrity through various constraints
such as primary keys, foreign keys, and referential integrity rules, ensuring that the
data is accurate and consistent.
Scalability: Relational databases are highly scalable and can handle large amounts of
data without sacrificing performance.
Flexibility: The relational model allows for flexible querying of data, making it easier
to retrieve specific information and generate reports.
Security: Relational databases provide robust security features to protect data from
unauthorized access.
Disadvantages of the Relational Model
Redundancy: When the same data is stored in various locations, a relational
architecture may cause data redundancy. This can result in inefficiencies and even
inconsistent data.
Complexity: Establishing and keeping up a relational database calls for specific
knowledge and abilities and can be difficult and time-consuming.
Performance: Because more tables must be joined in order to access information,
performance may degrade as a database gets larger.
Incapacity to manage unstructured data: Text documents, videos, and other forms of
semi-structured or unstructured data are not well-suited for the relational paradigm.
Conclusion
Ensuring data integrity requires addressing anomalies such as insertion, update,
and deletion problems in the Relational Model. By effectively arranging data, normalization
techniques offer a solution that guarantees consistency and dependability in relational
databases.
FAQs on Anomalies in Relational Model
Q.1: What is Normalization?
Answer:
Normalization is the process of splitting the tables into smaller ones so as to remove
anomalies in the database. It helps in reducing redundancy in the database.
Q.2: What are Anomalies in the Relational Model?
Answer:
An anomaly is a fault that is present in the database which occurs because of the poor
maintenance and poor storing of the data in the flat database. Normalization is the process
of removing anomalies from the database.
Q.3: How Anomalies can be removed?
Answer:
Anomalies can be removed with the process of Normalization. Normalization involves
organizing data into tables and applying rules to ensure data is stored in a consistent and
efficient manner.
3.FUNCTIONAL DEPENDENCIES
Types of Functional dependencies in DBMS
In relational database management, functional dependency is a concept that specifies the
relationship between two sets of attributes where one attribute determines the value of
another attribute. It is denoted as X → Y, where the attribute set on the left side of the
arrow, X is called Determinant, and Y is called the Dependent.
What is Functional Dependency?
A functional dependency occurs when one attribute uniquely determines another attribute
within a relation. It is a constraint that describes how attributes in a table relate to each
other. If attribute A functionally determines attribute B, we write this as A→B.
Functional dependencies are used to mathematically express relations among database
entities and are very important to understanding advanced concepts in Relational Database
Systems.
Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building}: Here, roll_no determines the values of the
fields name, dept_name and dept_building, hence this is a valid functional dependency.
roll_no → dept_name: Since roll_no determines the whole set {name, dept_name,
dept_building}, it also determines its subset dept_name.
dept_name → dept_building: dept_name identifies the dept_building accurately, since
every row with the same dept_name has the same dept_building. (Different departments
may still share a building, as EC and ME do.)
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name Students with the same name can have different dept_name,
hence this is not a valid functional dependency.
dept_building → dept_name There can be multiple departments in the same
building. Example, in the above table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional dependency.
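More invalid functional dependencies: name → roll_no, dept_building → roll_no, and
{name, dept_name} → roll_no (no row in this table violates the last one, but two students
with the same name could join the same department), etc.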
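The valid and invalid dependencies above can be checked against the example rows with a short sketch. The function name `fd_holds` is hypothetical. Note that a check on one table instance can only refute a dependency: if no rows violate it, the dependency merely holds for this data, which is weaker than holding as a rule for every possible table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first dependent value seen for this determinant value.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ("roll_no", "name", "dept_name", "dept_building")
DATA = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

print(fd_holds(rows, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(rows, ["dept_name"], ["dept_building"]))
print(fd_holds(rows, ["name"], ["dept_name"]))           # refuted by the two xyz rows
print(fd_holds(rows, ["dept_building"], ["dept_name"]))  # refuted by building B2
```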
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name,
dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.
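In practice the axioms are rarely applied one at a time. A common shortcut is to compute the closure of an attribute set, that is, everything the set determines under a given list of dependencies. The sketch below is a minimal version of that computation. The names `closure` and `implies` are hypothetical, and each dependency is assumed to be written as a pair of attribute sets.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependent follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from `fds`."""
    return set(rhs) <= closure(lhs, fds)


FDS = [
    ({"roll_no"}, {"name"}),
    ({"roll_no"}, {"dept_name"}),
    ({"dept_name"}, {"dept_building"}),
]

roll_no_closure = closure({"roll_no"}, FDS)
print(sorted(roll_no_closure))
print(implies(FDS, {"roll_no"}, {"dept_building"}))    # transitivity
print(implies(FDS, {"dept_building"}, {"dept_name"}))  # not derivable
```

The closure of roll_no contains dept_building, which is the transitivity example above reached without naming the rule explicitly.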
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e.
If X → Y and Y is the subset of X, then it is called trivial functional dependency
Example:
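roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
2. Non-Trivial Functional Dependency
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
3. Multivalued Functional Dependency
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
4. Transitive Functional Dependency
Example:
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2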
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
5. Fully Functional Dependency
In a full functional dependency, an attribute or set of attributes depends on the whole of
a determinant and not on any proper subset of it. If a relation R has attributes X, Y, Z
where {X, Y} -> Z holds but neither X -> Z nor Y -> Z holds, then Z is fully functionally
dependent on {X, Y}.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a part of the composite
key rather than on the whole key. Suppose a relation R has attributes X, Y, Z where X and Y
form the composite key and Z is a non-key attribute. If X -> Z holds, then it is a partial
functional dependency in RDBMS.
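The difference between a full and a partial dependency can be tested mechanically: a dependency on a composite determinant is partial as soon as some proper subset of that determinant already determines the dependent. The sketch below assumes single-letter attribute names so that a string such as "XY" stands for the set {X, Y}. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_partial(lhs, rhs, fds):
    """True if a proper, non-empty subset of `lhs` already determines `rhs`."""
    attrs = sorted(set(lhs))
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if set(rhs) <= closure(subset, fds):
                return True
    return False


# R(X, Y, Z) with composite key {X, Y}.
with_partial = [("XY", "Z"), ("X", "Z")]  # Z already follows from X alone
only_full = [("XY", "Z")]                 # Z needs both X and Y

print(is_partial("XY", "Z", with_partial))
print(is_partial("XY", "Z", only_full))
```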
Advantages of Functional Dependencies
Functional dependencies have numerous applications in the field of database management
systems. Some of them are listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize
redundancy and increase data integrity. Functional dependencies play an important part in
data normalization: with their help we can identify the candidate keys and the primary key
of a table, which in turn helps in normalization.
2. Query Optimization
With the help of functional dependencies we can decide how tables are connected and which
attributes need to be projected to retrieve the required data from the tables. This helps
in query optimization and improves performance.
3. Consistency of Data
Functional dependencies help ensure the consistency of the data by exposing redundancies
or inconsistencies that may exist in it. Enforcing a functional dependency ensures that a
change made to one attribute does not leave another set of attributes inconsistent, and
thus maintains the consistency of the data in the database.
4. Data Quality Improvement
Functional dependencies help keep the data in the database accurate, complete and up to
date. This improves the overall quality of the data and reduces the errors and
inaccuracies that might otherwise occur during data analysis and decision making.
Conclusion
Functional dependency is a very important concept in database management systems for
ensuring data consistency and accuracy. In this article we have discussed the concept
behind functional dependencies and why they are important, valid and invalid functional
dependencies, and the most important types of functional dependencies in RDBMS.
We have also discussed the advantages of FDs.
For more details, you can refer to the Database Normalization and Difference between Fully
and Partial Functional Dependency articles.
Types of Functional dependencies in DBMS – FAQs
What is the difference between trivial and non-trivial functional dependencies?
Trivial dependencies occur when the dependent is a subset of the determinant, while in
non-trivial dependencies the dependent is not a subset of the determinant.
What is a completely-functional dependency?
A completely-functional dependency occurs when an attribute depends on a set of the
attributes and not on any proper subset of this set.
What is a partial dependency, and why is it important?
A partial dependency occurs when a non-key attribute is dependent on part of the composite
key. It is important for the normalization to eliminate such dependencies to achieve the
Second Normal Form (2NF).
How does a transitive dependency affect database normalization?
Transitive dependencies can lead to redundancy and anomalies in a database.
Removing them helps achieve the Third Normal Form (3NF).
Functional Dependency
A functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and a non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant, and the right side is known as the
dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table
because if we know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
4.INFERENCE RULES
Inference Rule (IR):
o The Armstrong's axioms are the basic inference rule.
o Armstrong's axioms are used to conclude functional dependencies on a relational
database.
o The inference rule is a type of assertion. It can apply to a set of FD(functional
dependency) to derive other FD.
o Using the inference rule, we can derive additional functional dependency from the
initial set.
The Functional dependency has 6 types of inference rule:
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y is a subset of X, X → Y holds.
2. Augmentation Rule (IR2)
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
The union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse of the union rule.
This rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1 Rule)
5. X → Z (using IR3 on 1 and 4)
6. Pseudo transitive Rule (IR6)
In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
Inference Rules in DBMS
Last Updated : 29 Jul, 2024
Reflexive Rule: According to this rule, if B is a subset of A then A logically determines
B. Formally, if B ⊆ A then A → B.
o Example: Let us take the Address (A) of a house, which contains several
parameters like House no, Street no, City etc. These are all subsets of A.
Thus, address (A) → House no. (B).
Augmentation Rule: According to this rule, if A logically determines B, then adding
the same extra attribute to both sides does not change the basic functional
dependency.
o Example: If A → B, then adding an extra attribute, say C, gives AC → BC,
which also holds.
Transitive rule: The transitive rule states that if A determines B and B determines C, then
it can be said that A indirectly determines C.
o Example: If A → B and B → C then A → C.
Union Rule: The union rule states that if A determines B and A determines C, then A
determines BC.
o Example: If A → B and A → C then A → BC.
Decomposition Rule: It is the exact reverse of the above union rule. According to this
rule, if A determines BC then it can be decomposed as A → B and A → C.
o Example: If A → BC then A → B and A → C.
Pseudo Transitive Rule: According to this rule, if A determines B and BC determines
D then AC determines D.
o Example: If A → B and BC → D then AC → D.
Conclusion
In this article, we covered the inference rules in DBMS and some basic terminology
related to them. Along with this, we also saw what functional dependencies are and how
they relate the attributes of a table inside a database management system.
Frequently Asked Questions on Inference Rules – FAQs
How many inference rules are there? Name them.
There are 6 inference rules, listed below:
Reflexive Rule
Augmentation Rule
Transitive Rule
Union Rule
Decomposition Rule
Pseudo Transitive Rule
What are FDs?
FD stands for functional dependency. An FD is a constraint stating that the values of one
set of attributes determine the values of another set of attributes.
Who proposed the inference rules?
These rules were introduced by William W. Armstrong.
What are inference rules also known as?
These rules are also known as Armstrong's Axioms in Functional Dependency.
5.MINIMAL COVER
Minimal Cover
A minimal cover is a simplified and reduced version of the given set of functional
dependencies.
Since it is a reduced version, it is also called an irreducible set.
It is also called a canonical cover.
Steps to Find Minimal Cover
1) Split the right-hand attributes of all FDs.
Example
A->XY => A->X, A->Y
2) Remove all redundant FDs.
Example
{ A->B, B->C, A->C }
Here A->C is redundant since it can already be achieved using the Transitivity Property.
3) Find the Extraneous attribute and remove it.
Example
AB->C, either A or B or none can be extraneous.
If A closure contains B then B is extraneous and it can be removed.
If B closure contains A then A is extraneous and it can be removed.
Example 1
Minimize {A->C, AC->D, E->H, E->AD}
Step 1: {A->C, AC->D, E->H, E->A, E->D}
Step 2: {A->C, AC->D, E->H, E->A}
Here Redundant FD : {E->D}
Step 3: {AC->D}
{A}+ = {A,C}
Therefore C is extraneous and is removed.
{A->D}
Minimal Cover = {A->C, A->D, E->H, E->A}
Example 2
Minimize {AB->C, D->E, AB->E, E->C}
Step 1: {AB->C, D->E, AB->E, E->C}
Step 2: {D->E, AB->E, E->C}
Here Redundant FD = {AB->C}
Step 3: {AB->E}
{A}+ = {A}
{B}+ = {B}
There is no extraneous attribute.
Therefore, Minimal cover = {D->E, AB->E, E->C}
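The three steps can be sketched in code as follows. This is a minimal illustration rather than a definitive algorithm: attribute names are assumed to be single letters, each FD is written as a pair of strings such as ("AC", "D"), and the helper names are hypothetical. One extra redundancy pass is added after step 3 as a precaution, because shrinking a left-hand side can make another FD redundant.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(fds):
    """Step 2: drop each FD that follows from the FDs still kept."""
    kept = list(fds)
    for fd in fds:
        rest = [f for f in kept if f != fd]
        if set(fd[1]) <= closure(fd[0], rest):
            kept = rest
    return kept


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    split = []
    for lhs, rhs in fds:
        for attr in rhs:
            if (lhs, attr) not in split:
                split.append((lhs, attr))
    # Step 2: remove redundant FDs.
    reduced = remove_redundant(split)
    # Step 3: remove extraneous left-hand attributes.
    result = []
    for lhs, rhs in reduced:
        shrunk = lhs
        for attr in lhs:
            candidate = shrunk.replace(attr, "")
            if candidate and rhs in closure(candidate, reduced):
                shrunk = candidate
        if (shrunk, rhs) not in result:
            result.append((shrunk, rhs))
    return remove_redundant(result)  # precautionary final pass


example_1 = minimal_cover([("A", "C"), ("AC", "D"), ("E", "H"), ("E", "AD")])
example_2 = minimal_cover([("AB", "C"), ("D", "E"), ("AB", "E"), ("E", "C")])
print(example_1)
print(example_2)
```

Both calls reproduce the minimal covers worked out by hand in Example 1 and Example 2.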
How To Find Minimal Set?
Last Updated : 25 Jul, 2024
If we have a set of functional dependencies, we get the simplest and irreducible form of
functional dependencies after reducing these functional dependencies. This is called
the Minimal Cover or Irreducible Set (as we can’t reduce the set further). It is also called
a Canonical Cover.
Let us understand the procedure to find the minimal cover by this example:
The Given Functional Dependencies are – A->B, B ->C, D->ABC, AC-> D
1. Steps to Find Minimal Cover
Step 1: First split all the right-hand attributes of all FDs such that RHS contains a single
attribute.
Example: D->ABC is split into D->A, D->B and D->C
A->B, B->C, D->A, D->B, D->C, AC->D
[Note: We can't split AC->D as A->D, C->D]
Step 2: Now remove all redundant FDs.
[An FD is redundant if it can be derived from the remaining FDs.]
To test an FD X->Y, compute X+ using all the other FDs. If X+ contains Y, the FD is redundant.
Let's test the redundancy of A->B.
Without A->B, A+ = {A}, which does not contain B. So, A->B is not redundant.
Similarly, B->C is not redundant (without it, B+ = {B}), and D->A is not redundant
(without it, D+ = {B, C, D}).
But D->B is redundant, because D->A and A->B together give D->B. So, we remove D->B
from the FD set.
D->C is also redundant, even after D->B has been removed, because D->A, A->B and B->C
together give D->C. So, we remove D->C as well.
At last, we check AC->D. Without it, AC+ = {A, B, C}, which does not contain D, so
AC->D is not redundant.
So, the FDs after this step are: A->B, B->C, D->A, AC->D
Step 3: Find the extraneous attributes and remove them.
In this case, we only need to check AC->D, because it is the only FD with more than
one attribute on its left-hand side.
In AC->D, either A or C, or neither, can be extraneous.
If A+ contains C then C is extraneous and it can be removed.
If C+ contains A then A is extraneous and it can be removed.
Here A+ = {A, B, C} contains C, so C is extraneous and AC->D reduces to A->D.
C+ = {C} does not contain A, so A is not extraneous.
So, the final FDs are: A->B, B->C, D->A, A->D
Hence, A->B, B->C, D->A, A->D is the minimal cover.
2. Find the Minimal Cover
Given Functional Dependencies
A -> B
B -> C
D -> ABC
AC -> D
Step 1: Split the Functional Dependencies
A -> B
B -> C
D -> A
D -> B
D -> C
AC -> D
Step 2: Remove Redundant FDs
A -> B (not redundant)
B -> C (not redundant)
D -> A (not redundant)
D -> B (redundant, because D -> A and A -> B)
D -> C (redundant, because D -> A, A -> B and B -> C)
AC -> D (not redundant)
After removing the redundant FDs, the set becomes
A -> B
B -> C
D -> A
AC -> D
Step 3: Remove Extraneous Attributes
In AC -> D, check if A or C is extraneous.
Compute closures
A+ = {A,B,C}
C+ = {C}
Since A+ contains C, C is extraneous, and AC -> D reduces to A -> D
Since C+ does not contain A, A is not extraneous
So the minimal cover of (A -> B, B -> C, D -> ABC, AC -> D) is (A -> B, B -> C, D -> A,
A -> D)
By ensuring all functional dependencies are minimal and non-redundant, we obtain the
correct minimal cover.
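A candidate cover can be sanity-checked by confirming that it is equivalent to the given set, meaning that each set implies every FD of the other. The sketch below assumes single-letter attribute names and hypothetical helper names. It checks equivalence only, so it cannot tell whether a cover is also minimal.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in `g` can be derived from `f`."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent when each one covers the other."""
    return covers(f, g) and covers(g, f)


GIVEN = [("A", "B"), ("B", "C"), ("D", "ABC"), ("AC", "D")]
CANDIDATE = [("A", "B"), ("B", "C"), ("D", "A"), ("A", "D")]
TOO_SMALL = [("A", "B"), ("B", "C"), ("D", "A")]  # AC -> D was dropped

print(equivalent(GIVEN, CANDIDATE))
print(equivalent(GIVEN, TOO_SMALL))
```

The second check fails because, once AC -> D is dropped, nothing left in the set lets A and C determine D.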
Conclusion
To find a minimal set, first split the functional dependencies, then remove any redundant
functional dependencies, and finally remove any extraneous attributes. This process ensures
that you have the smallest possible set that still includes all necessary information without
any redundancy.
Frequently Asked Questions on Minimal Set – FAQs
Why is finding a minimal set important?
Finding a minimal set is crucial to eliminate redundancies, simplify analysis, and improve
efficiency in data handling and processing.
What are functional dependencies?
Functional dependencies describe relationships between attributes in a dataset, where one
attribute’s value depends on another.
How do you split functional dependencies?
Splitting functional dependencies involves breaking down complex dependencies into
simpler, single-attribute dependencies.
When a relation in the relational model is not in an appropriate normal form, the
relation needs to be decomposed. In a database, breaking a table down into multiple
tables is termed decomposition. The properties of a relational decomposition are listed
below:
1. Attribute Preservation:
Using functional dependencies the algorithms decompose the universal relation
schema R in a set of relation schemas D = { R1, R2, ….. Rn } relational database
schema, where ‘D’ is called the Decomposition of R.
The attributes in R will appear in at least one relation schema Ri in the decomposition, i.e.,
no attribute is lost. This is called the Attribute Preservation condition of decomposition.
2. Dependency Preservation:
If each functional dependency X->Y specified in F either appears directly in one of the
relation schemas Ri in the decomposition D or can be inferred from the
dependencies that appear in some Ri, the decomposition is dependency preserving.
If a decomposition is not dependency preserving, some dependency is lost in the decomposition.
To check this condition, take the JOIN of 2 or more relations in the decomposition.
For example:
R = (A, B, C)
F = {A ->B, B->C}
Key = {A}
R is not in BCNF.
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 are in BCNF, Lossless-join decomposition, Dependency preserving.
Here, each functional dependency specified in F appears directly in one of the relations in
the decomposition: A->B in R1 and B->C in R2.
In general, it is not necessary that all dependencies from the relation R appear in some
relation Ri. It is sufficient that the union of the dependencies on all the relations Ri
be equivalent to the dependencies on R.
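Dependency preservation can also be tested without joining any relations. One standard textbook procedure grows the left-hand side of each FD using only what is visible inside the individual schemas Ri, then checks whether the right-hand side was reached. The sketch below assumes single-letter attribute names and hypothetical helper names.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD in `fds` is enforceable within the decomposed schemas."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                # Only attributes inside this schema can be used and gained.
                gained = closure(reached & set(schema), fds) & set(schema)
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not set(rhs) <= reached:
            return False
    return True


F = [("A", "B"), ("B", "C")]
print(preserves(F, ["AB", "BC"]))  # the decomposition from the example
print(preserves(F, ["AB", "AC"]))  # B -> C spans both schemas and is lost
```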
3. Non Additive Join Property:
Another property that a decomposition D should possess is the Non Additive Join
Property, which ensures that no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition.
4. No redundancy:
Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy. If the relation has no proper decomposition, then it
may lead to problems like loss of information.
5. Lossless Join:
Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be identified from
corresponding instances in the smaller relations.
For example:
R : relation, F : set of functional dependencies on R,
X, Y : decomposition of R,
A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if
the natural join of R1, R2, …, Rn produces exactly the relation R.
A decomposition is lossless if we can recover:
R(A, B, C) -> Decompose -> R1(A, B) R2(A, C) -> Recover -> R’(A, B, C)
Thus, R’ = R
Decomposition is lossless if:
X intersection Y -> X, that is: all attributes common to both X and Y functionally determine
ALL the attributes in X.
X intersection Y -> Y, that is: all attributes common to both X and Y functionally determine
ALL the attributes in Y
If X intersection Y forms a superkey of either X or Y, the decomposition of R is a lossless
decomposition.
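The superkey condition above translates directly into a closure test for a decomposition into two relations: compute the closure of the common attributes and see whether it covers all of X or all of Y. The sketch below assumes single-letter attribute names and a hypothetical function name `is_lossless`. It applies to binary decompositions only.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(x, y, fds):
    """Binary test: the common attributes must determine all of X or all of Y."""
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure


F = [("A", "B"), ("B", "C")]
print(is_lossless("AB", "BC", F))  # common attribute B determines BC
print(is_lossless("AB", "AC", F))  # common attribute A determines both
print(is_lossless("AC", "BC", F))  # common attribute C determines neither
```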
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the same
relation as it was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed
relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:
Employee ⋈ Department
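The join can be reproduced with a small sketch that uses the rows listed above. Only EMP_ID is named in the text, so the other column names (EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME) are assumptions made for illustration, as is the helper name `natural_join`.

```python
# Column names other than EMP_ID are assumed for illustration.
EMP_COLS = ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY")
DEPT_COLS = ("DEPT_ID", "EMP_ID", "DEPT_NAME")

EMPLOYEE = [dict(zip(EMP_COLS, r)) for r in [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]]
DEPARTMENT = [dict(zip(DEPT_COLS, r)) for r in [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]]


def natural_join(left, right):
    """Combine rows that agree on every column the two relations share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


joined = natural_join(EMPLOYEE, DEPARTMENT)
# Projecting the join back onto the EMPLOYEE columns recovers EMPLOYEE.
projected = [{c: row[c] for c in EMP_COLS} for row in joined]
print(len(joined))
print(projected == EMPLOYEE)
```

Every employee matches exactly one department row, so the join has five rows and no spurious tuples.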
7.NORMALIZATION(UPTO BCNF)
Normal Forms in DBMS
Last Updated : 23 Jul, 2024
First Normal Form
A relation is in 1NF if every attribute holds only atomic (single) values.
Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF. The
table below is in 1NF, as there is no multi-valued attribute.
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
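The conversion to 1NF amounts to emitting one row per course. A minimal sketch, assuming the multi-valued column is stored as a comma-separated string and using the hypothetical function name `to_1nf`:

```python
UNNORMALIZED = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]


def to_1nf(rows):
    """Emit one (ID, Name, Course) row per course in the multi-valued column."""
    return [
        (stud_id, name, course.strip())
        for stud_id, name, courses in rows
        for course in courses.split(",")
    ]


flat = to_1nf(UNNORMALIZED)
for row in flat:
    print(row)
```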
Second Normal Form
A relation is in 2NF if it is in 1NF and no non-prime attribute (an attribute which is not part
of any candidate key) is partially dependent on a proper subset of any candidate key of
the table. In other words, every non-prime attribute must be fully dependent
on each candidate key.
A functional dependency X -> Y (where X and Y are sets of attributes) is said to be a partial
dependency if Y can be determined by a proper subset of X.
Note that 2NF still allows a prime attribute to be partially dependent on a
candidate key; only non-prime attributes must be fully dependent (that is, not partially
dependent) on each candidate key of the table.
Example 1 – Consider table-3 below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
Note that several courses have the same course fee. Here:
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE,
i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the
candidate key. A non-prime attribute that depends on a proper subset of
the candidate key is a partial dependency, so this relation is not in 2NF. To
convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF reduces the redundant data that has to be stored. For
instance, if 100 students take course C1, we don’t need to store its fee of
1000 in all 100 records; instead, we store once in the second table that the
course fee for C1 is 1000.
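The reasoning in Example 1 can be replayed on the sample rows. The sketch below (helper names `fd_holds` and `project` are made up for illustration) tests whether a functional dependency holds in the table-3 rows and then projects the table onto the two 2NF schemas. A dependency that holds in one sample is only consistent with the data; it does not prove that the dependency holds for every valid state of the relation.

```python
ENROLLMENT = [
    {"STUD_NO": 1, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C2", "COURSE_FEE": 1500},
    {"STUD_NO": 1, "COURSE_NO": "C4", "COURSE_FEE": 2000},
    {"STUD_NO": 4, "COURSE_NO": "C3", "COURSE_FEE": 1000},
    {"STUD_NO": 4, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C5", "COURSE_FEE": 2000},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, columns):
    """Project onto columns and drop duplicates, keeping first-seen order."""
    result = []
    for row in rows:
        reduced = tuple(row[c] for c in columns)
        if reduced not in result:
            result.append(reduced)
    return result


# The partial dependency that breaks 2NF:
fee_by_course = fd_holds(ENROLLMENT, ["COURSE_NO"], ["COURSE_FEE"])
# The three dependencies the text rules out:
course_by_fee = fd_holds(ENROLLMENT, ["COURSE_FEE"], ["COURSE_NO"])
course_by_fee_and_student = fd_holds(ENROLLMENT, ["COURSE_FEE", "STUD_NO"], ["COURSE_NO"])
student_by_fee_and_course = fd_holds(ENROLLMENT, ["COURSE_FEE", "COURSE_NO"], ["STUD_NO"])

table1 = project(ENROLLMENT, ["STUD_NO", "COURSE_NO"])
table2 = project(ENROLLMENT, ["COURSE_NO", "COURSE_FEE"])
print(fee_by_course, course_by_fee, len(table1), len(table2))
```

After the split, each course fee is stored once in `table2`, however many students enroll in that course.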
Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e.,
no proper subset of AB determines any non-prime attribute.
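Both claims in Example 2 can be verified with the attribute closure. The sketch below is one way to do it in Python, under the assumption that attributes are single letters. It relies on the observation that an attribute which never appears on the right-hand side of a dependency cannot be derived, so it must belong to every candidate key.

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCD")
FDS = [("AB", "C"), ("BC", "D")]

# Attributes that never appear on a right-hand side must be in every key.
derivable = set().union(*(set(rhs) for _, rhs in FDS))
must_be_in_key = R - derivable

# If those attributes already determine everything, they form the only key.
ab_is_key = closure(must_be_in_key, FDS) == R
non_prime = R - must_be_in_key

# Partial dependency check: does A alone or B alone determine C or D?
partial = {
    part: closure(part, FDS) & non_prime
    for part in ("A", "B")
    if closure(part, FDS) & non_prime
}
print(sorted(must_be_in_key), ab_is_key, partial)
```

Here `partial` comes out empty, which matches the statement that no proper subset of AB determines a non-prime attribute.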
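A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X -> Y:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).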
Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME,
STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO ->
STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal
form.
To convert it to third normal form, we decompose the relation STUDENT (STUD_NO,
STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
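The decomposition above can be sketched in a few lines of Python. This is a simplified illustration, not a general 3NF synthesis algorithm: it assumes the relation has a single candidate key (here STUD_NO), so a left-hand side is a super key exactly when it contains that key. The function name `decompose_3nf` is made up for illustration.

```python
KEY = {"STUD_NO"}  # the only candidate key, as stated above
STUDENT = ["STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"]
FDS = [
    (["STUD_NO"], ["STUD_NAME"]),
    (["STUD_NO"], ["STUD_STATE"]),
    (["STUD_STATE"], ["STUD_COUNTRY"]),
    (["STUD_NO"], ["STUD_AGE"]),
]


def decompose_3nf(attributes, fds, key):
    """Split off each FD whose left side is not a super key and whose right side is non-prime."""
    remaining = list(attributes)
    split_off = []
    for lhs, rhs in fds:
        lhs_is_super_key = key <= set(lhs)  # valid only because key is the sole candidate key
        rhs_non_prime = [a for a in rhs if a not in key]
        if not lhs_is_super_key and rhs_non_prime:
            # The violating FD gets its own relation; its right side leaves the original.
            split_off.append(lhs + rhs_non_prime)
            remaining = [a for a in remaining if a not in rhs_non_prime]
    return remaining, split_off


student, others = decompose_3nf(STUDENT, FDS, KEY)
print(student)
print(others)
```

The only dependency that is split off is STUD_STATE -> STUD_COUNTRY, which corresponds to the STATE_COUNTRY relation.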
Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A. The candidate
keys of this relation are {A, E, CD, BC}. All attributes on the right-hand sides of the functional
dependencies are prime, so the relation is in 3NF.
Example 2: Find the highest normal form of a relation R(A, B, C, D, E) with FD set
{BC -> D, AC -> BE, B -> E}.
Step 1: (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all the
attributes of the relation, so AC is a candidate key. A and C cannot be derived from any other
attribute of the relation, so {AC} is the only candidate key.
Step 2: Prime attributes are those that are part of a candidate key, {A, C} in this
example; the others, {B, D, E}, are non-prime.
Step 3: The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes. The relation is in 2nd normal form because BC -> D
satisfies 2NF (BC is not a proper subset of the candidate key AC), AC -> BE satisfies 2NF
(AC is the candidate key), and B -> E satisfies 2NF (B is not a proper subset of the
candidate key AC).
The relation is not in 3rd normal form because in BC -> D neither is BC a super key nor is D
a prime attribute, and in B -> E neither is B a super key nor is E a prime attribute. To
satisfy 3rd normal form, either the LHS of each FD must be a super key or the RHS must be a prime
attribute. So the highest normal form of the relation is 2nd normal form.
For example, consider relation R(A, B, C) with A -> BC and B -> A. A and B are both super
keys, so the relation is in BCNF.
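The worked examples above follow a procedure that can be sketched in Python: compute candidate keys by closure, derive the prime attributes, then test the BCNF, 3NF and 2NF conditions in turn. The sketch assumes single-letter attribute names and that the relation is already in 1NF; the key search is exhaustive and therefore only suitable for small textbook relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' (1NF itself is assumed)."""
    everything = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    bcnf = third = second = True
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # non-trivial part of the FD
        if not extra or closure(lhs, fds) == everything:
            continue  # trivial FD, or the LHS is a super key
        bcnf = False
        if not extra <= prime:
            third = False  # a non-prime attribute on the RHS
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - set(part) - prime:
                    second = False  # part of a key determines a non-prime attribute
    if bcnf:
        return "BCNF"
    if third:
        return "3NF"
    return "2NF" if second else "1NF"


example_2nf = highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")])
example_3nf = highest_normal_form("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")])
example_bcnf = highest_normal_form("ABC", [("A", "BC"), ("B", "A")])
print(example_2nf, example_3nf, example_bcnf)
```

The three calls correspond to the relation with FD set {BC -> D, AC -> BE, B -> E}, the relation with candidate keys {A, E, CD, BC}, and the relation R(A, B, C) discussed above.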
Third Normal Form
A relation is said to be in third normal form if it has no transitive dependency for
non-prime attributes. The relation must also be in Second Normal Form.
At least one of the following conditions must hold for every non-trivial functional
dependency X -> Y:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
BCNF
BCNF (Boyce-Codd Normal Form) is a stricter version of Third Normal Form: it adds one
rule on top of those of 3NF. A relation in BCNF must therefore also be in Third
Normal Form.
The basic rules for BCNF are:
1. The table must be in Third Normal Form.
2. For every non-trivial functional dependency X -> Y, X must be a super key of the relation.
Fourth Normal Form
A relation in Fourth Normal Form has no non-trivial multivalued dependency other than those
whose left-hand side is a super key. The relation must also be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It must not have any non-trivial multi-valued dependency whose left-hand side is not a super key.
Fifth Normal Form
Fifth Normal Form is also called Project-Join Normal Form (PJNF). The basic conditions of Fifth
Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
It must not be possible to decompose the relation further without losing information.
Applications of Normal Forms in DBMS
Data consistency: Normal forms ensure that data is consistent and does not contain
any redundant information. This helps to prevent inconsistencies and errors in the
database.
Data redundancy: Normal forms minimize data redundancy by organizing data into
tables that contain only unique data. This reduces the amount of storage space
required for the database and makes it easier to manage.
Response time: Normal forms keep tables small and remove duplicated values, which
speeds up inserts, updates, and deletes. Read queries may need more joins to reassemble
the data, so the effect on overall performance depends on the workload.
Database maintenance: Normal forms make it easier to maintain the database by
reducing the amount of redundant data that needs to be updated, deleted, or modified.
This helps to improve database management and reduce the risk of errors or
inconsistencies.
Database design: Normal forms provide guidelines for designing databases that are
efficient, flexible, and scalable. This helps to ensure that the database can be easily
modified, updated, or expanded as needed.
Some Important Points about Normal Forms
BCNF is free from redundancy caused by Functional Dependencies.
If a relation is in BCNF, then 3NF is also satisfied.
If all attributes of relation are prime attribute, then the relation is always in 3NF.
A relation in a relational database is always at least in 1NF.
Every binary relation (a relation with only 2 attributes) is always in BCNF.
If a relation has only singleton candidate keys (i.e., every candidate key consists of
only 1 attribute), then the relation is always in 2NF (because no partial functional
dependency is possible).
Sometimes going for BCNF form may not preserve functional dependency. In that case
go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
There are many more Normal forms that exist after BCNF, like 4NF and more. But in
real world database systems it’s generally not required to go beyond BCNF.
Conclusion
In conclusion, relational databases can be organized according to a set of rules called
normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), which
reduce data redundancy and preserve data integrity. By resolving various kinds of data
anomalies and dependencies, each subsequent normal form expands upon the one that came
before it. The particular requirements and properties of the data being stored determine
which normal form should be used; higher normal forms offer stricter data integrity but may
also result in more complicated database structures.
Normal Forms in DBMS – FAQs
Why is Normalization Important in DBMS?
Normalization helps prevent anomalies in the database, which ultimately ensures the
consistency of the database and makes it easier to maintain.
Is it possible to over-normalize the database?
Yes, excessive normalization leads to complex queries and can reduce performance. A good
design strikes a balance between normalization and practicality.
Is it necessary to normalize a database to the highest normal form (such as BCNF or 4NF)?
No, there is no requirement to normalize a database to a particular normal form. Often a lower
normal form is sufficient and gives better performance and simplicity.
Is it possible for a relation in 2NF to have partial dependency?
Yes, it is possible for a prime attribute to be partially dependent (that is, not fully dependent)
on a candidate key. For example, consider a relation R(A, B, C, D, E, F) in 2NF with the
following functional dependencies:
ABC -> D
ABC -> E
AB -> D
DE -> ABC
DE -> F
E -> C
Here, ABC and DE are candidate keys (ABE is a third one, since E -> C), and prime attribute D is
partially dependent on ABC in the functional dependency ABC -> D (because of AB -> D).
Also, prime attribute C is partially dependent on DE (because of E -> C).
In conclusion, the relation R is in 2NF despite having partial dependencies, because they
involve only prime attributes.
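The FAQ example can be checked with a short Python sketch that enumerates the candidate keys and lists every attribute determined by a proper subset of a key. It assumes single-letter attribute names and uses an exhaustive key search, which is fine for a six-attribute relation but not for large schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


R = "ABCDEF"
FDS = [("ABC", "D"), ("ABC", "E"), ("AB", "D"), ("DE", "ABC"), ("DE", "F"), ("E", "C")]

keys = candidate_keys(R, FDS)
prime = set().union(*keys)

# Each entry: (candidate key, proper subset of it, attribute determined, is that attribute prime?)
partial = []
for key in keys:
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, FDS) - set(part)):
                partial.append(("".join(sorted(key)), "".join(part), attr, attr in prime))

key_names = sorted("".join(sorted(k)) for k in keys)
only_prime_affected = all(entry[3] for entry in partial)
print(key_names, sorted(set(R) - prime), only_prime_affected)
```

Every partial dependency found involves a prime attribute, and the only non-prime attribute, F, is determined by a whole key (DE), so the 2NF condition holds.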
Normalization
A large database defined as a single relation may result in data duplication. This repetition
of data may result in:
o Very large relations.
o Data that is hard to maintain and update, since doing so involves searching many records in
the relation.
o Wastage and poor utilization of disk space and resources.
o A greater likelihood of errors and inconsistencies.
So to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is to remove these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that help
you create a good database structure.
Data modification anomalies can be categorized into three types:
o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple
into a relation due to lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of
data results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms
apply to individual relations. A relation is said to be in a particular normal form if it
satisfies the constraints of that form.
Following are the various types of Normal forms:
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious
problems.
First Normal Form (1NF)
o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It must hold only a
single value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
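Because COURSE and HOBBY are independent, joining the two 4NF tables pairs every course of a student with every hobby of that student. The Python sketch below illustrates this on the rows above; the helper names are made up for illustration, and the check is applied to sample data only.

```python
STUDENT_COURSE = {
    (21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics"),
}
STUDENT_HOBBY = {
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey"),
}
# The five rows of the single STUDENT table shown earlier.
STUDENT = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}


def join_on_student(courses, hobbies):
    """Pair each course of a student with each hobby of the same student."""
    return {
        (stu_id, course, hobby)
        for stu_id, course in courses
        for other_id, hobby in hobbies
        if stu_id == other_id
    }


def satisfies_mvd(rows):
    """True if rows equal the join of their (STU_ID, COURSE) and (STU_ID, HOBBY) projections."""
    courses = {(stu_id, course) for stu_id, course, _ in rows}
    hobbies = {(stu_id, hobby) for stu_id, _, hobby in rows}
    return set(rows) == join_on_student(courses, hobbies)


combined = join_on_student(STUDENT_COURSE, STUDENT_HOBBY)
rows_for_21 = sorted(row for row in combined if row[0] == 21)
print(len(combined), len(rows_for_21))
print(satisfies_mvd(combined), satisfies_mvd(STUDENT))
```

The join lists all four course and hobby pairings for student 21, whereas the five-row STUDENT table lists only two of them. A single table that respects the multi-valued dependency would need all four rows, which is the repetition that the 4NF decomposition avoids.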
In the above table, John takes both the Computer and Math classes in Semester 1, but he doesn't
take the Math class in Semester 2. In this case, the combination of all these fields is required to
identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we would leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
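A 5NF decomposition is only useful if joining all of its projections reproduces valid rows and nothing else. The Python sketch below joins P1, P2 and P3 as listed above and compares the result with a join of just P1 and P2. The projections are modelled as sets, so a repeated row counts once; the function names are made up for illustration.

```python
P1 = {  # (SEMESTER, SUBJECT)
    ("Semester 1", "Computer"), ("Semester 1", "Math"),
    ("Semester 1", "Chemistry"), ("Semester 2", "Math"),
}
P2 = {  # (SUBJECT, LECTURER)
    ("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
    ("Math", "Akash"), ("Chemistry", "Praveen"),
}
P3 = {  # (SEMESTER, LECTURER)
    ("Semester 1", "Anshika"), ("Semester 1", "John"),
    ("Semester 2", "Akash"), ("Semester 1", "Praveen"),
}


def join_two(p1, p2):
    """Join P1 and P2 on SUBJECT only."""
    return {
        (semester, subject, lecturer)
        for semester, subject in p1
        for other_subject, lecturer in p2
        if subject == other_subject
    }


def join_three(p1, p2, p3):
    """Keep only the rows of the two-way join that P3 also allows."""
    return {
        (semester, subject, lecturer)
        for semester, subject, lecturer in join_two(p1, p2)
        if (semester, lecturer) in p3
    }


two_way = join_two(P1, P2)
three_way = join_three(P1, P2, P3)
spurious = sorted(two_way - three_way)
print(len(two_way), len(three_way))
print(spurious)
```

Joining only two of the projections produces rows such as John teaching Math in Semester 2, which the text says does not happen; the third projection removes them. This is why the relation has to be split into three projections rather than two.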