
Normalization Databases

Database normalization is a crucial process in database design aimed at reducing data redundancy and improving data integrity by organizing data into structured tables. It involves various normal forms, each with specific rules to eliminate anomalies such as insertion, deletion, and update issues. While normalization enhances database efficiency and consistency, it can also lead to increased complexity and performance overhead if over-applied.

Uploaded by

patel.keni1993

Database Design (integrity constraints, normal forms)

Introduction to Database Normalization


Normalization is an important process in database design that helps improve the
database’s efficiency, consistency, and accuracy. It makes it easier to manage and
maintain the data and ensures that the database is adaptable to changing
business needs.
 Database normalization is the process of organizing the attributes of
the database to reduce or eliminate data redundancy (having the same
data but at different places).
 Data redundancy unnecessarily increases the size of the database as
the same data is repeated in many places. Inconsistency problems also
arise during insert, delete, and update operations.
 In the relational model, normal forms are standard criteria for judging
how well structured a relation is, and there are algorithms to convert a
given database schema into these normal forms.
 Normalization generally involves splitting a table into multiple ones
which must be linked each time a query is made requiring data from
the split tables.
Why do we need Normalization?
The primary objective for normalizing the relations is to eliminate the below
anomalies. Failure to reduce anomalies results in data redundancy, which may
threaten data integrity and cause additional issues as the database increases.
Normalization consists of a set of procedures that assist you in developing an
effective database structure.
 Insertion anomalies: An insertion anomaly occurs when a fact cannot be
recorded because other, unrelated data is missing. For example, if
course details are stored only in an enrollment table keyed by student
ID and course ID, a new course cannot be inserted until at least one
student enrolls in it.
 Deletion anomalies: A deletion anomaly occurs when deleting a record
results in the unintentional loss of other data. For example, if
customer details are stored only in the rows of an orders table,
deleting a customer's last order also deletes the only record of that
customer.
 Update anomalies: An update anomaly occurs when modifying data
results in inconsistencies or errors. For example, if an employee's
salary is repeated in several rows, updating it in one row but not in
all the others leaves the database inconsistent and leads to incorrect
calculations and reporting.

Before Normalization: The table is prone to redundancy and anomalies
(insertion, update, and deletion).
After Normalization: The data is divided into logical tables to ensure
consistency, avoid redundancy and remove anomalies making the database
efficient and reliable.
Prerequisites for Understanding Database Normalization
In database normalization, we mainly put only tightly related information
together. To find the closeness, we need to find which attributes are dependent
on each other. To understand dependencies, we need to learn the below
concepts.
Keys are like unique identifiers in a table. For example, in a table of students,
the student ID is a key because it uniquely identifies each student. Without
keys, it would be hard to tell one record apart from another, especially if some
information (like names) is the same. Keys ensure that data is not duplicated
and that every record can be uniquely accessed.
Functional dependency helps define the relationships between data in a table.
For example, if you know a student’s ID, you can find their name, age, and class.
This relationship shows how one piece of data (like the student ID) determines
other pieces of data in the same table. Functional dependency helps us
understand these rules and connections, which are crucial for organizing data
properly.
Once we figure out dependencies, we split tables to make sure that only
closely related data is together in a table. When we split tables, we need to
ensure that we do not lose information. For this, we need to learn the below
concepts.
 Dependency Preserving Decomposition
 Lossless Decomposition in DBMS
Features of Database Normalization
 Elimination of Data Redundancy: One of the main features of
normalization is to eliminate the data redundancy that can occur in a
database. Data redundancy refers to the repetition of data in different
parts of the database. Normalization helps in reducing or eliminating
this redundancy, which can improve the efficiency and consistency of
the database.
 Ensuring Data Consistency: Normalization helps in ensuring that the
data in the database is consistent and accurate. By eliminating
redundancy, normalization helps in preventing inconsistencies and
contradictions that can arise due to different versions of the same data.
 Simplification of Data Management: Normalization simplifies the
process of managing data in a database. By breaking down a complex
data structure into simpler tables, normalization makes it easier to
manage the data, update it, and retrieve it.
 Improved Database Design: Normalization helps in improving the
overall design of the database. By organizing the data in a structured
and systematic way, normalization makes it easier to design and
maintain the database. It also makes the database more flexible and
adaptable to changing business needs.
 Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which can occur when the same fact is stored in several rows
and an update changes only some of them. Normalization ensures that
each table contains only one type of data and that the relationships
between the tables are clearly defined, which helps in avoiding such
anomalies.
 Standardization: Normalization helps in standardizing the data in the
database. By organizing the data into tables and defining relationships
between them, normalization helps in ensuring that the data is stored
in a consistent and uniform manner.
Normal Forms in DBMS
Normal Form: Description

First Normal Form (1NF): A relation is in first normal form if every attribute
in that relation is a single-valued attribute.

Second Normal Form (2NF): A relation is in second normal form if it is in first
normal form and every non-primary-key attribute is fully functionally dependent
on the primary key.

Third Normal Form (3NF): A relation is in third normal form if it is in second
normal form and has no transitive dependency for non-prime attributes.
Equivalently, a relation is in 3NF if at least one of the following conditions
holds for every non-trivial functional dependency X -> Y:
 X is a super key.
 Y is a prime attribute (each element of Y is part of some candidate key).

Boyce-Codd Normal Form (BCNF): For BCNF the relation should satisfy the below
conditions:
 The relation should be in the third normal form.
 X should be a super key for every non-trivial functional dependency (FD)
X -> Y in the relation.

Fourth Normal Form (4NF): A relation R is in 4NF if and only if the following
conditions are satisfied:
 It should be in the Boyce-Codd Normal Form (BCNF).
 The table should not have any multi-valued dependency.

Fifth Normal Form (5NF): A relation R is in 5NF if and only if it satisfies the
following conditions:
 R should already be in 4NF.
 It cannot be losslessly decomposed any further (no join dependency
remains).


Advantages of Normalization
 Normalization eliminates data redundancy and ensures that each piece
of data is stored in only one place, reducing the risk of data
inconsistency and making it easier to maintain data accuracy.
 By breaking down data into smaller, more specific tables,
normalization helps ensure that each table stores only relevant data,
which improves the overall data integrity of the database.
 Normalization simplifies the process of updating data, as it only needs
to be changed in one place rather than in multiple places throughout
the database.
 Normalization enables users to query the database using a variety of
different criteria, as the data is organized into smaller, more specific
tables that can be joined together as needed.
 Normalization can help ensure that data is consistent across different
applications that use the same database, making it easier to integrate
different applications and ensuring that all users have access to
accurate and consistent data.
Disadvantages of Normalization
 Normalization can result in increased performance overhead due to the
need for additional join operations and the potential for slower query
execution times.
 Normalization can result in the loss of data context, as data may be
split across multiple tables and require additional joins to retrieve.
 Proper implementation of normalization requires expert knowledge of
database design and the normalization process.
 Normalization can increase the complexity of a database design,
especially if the data model is not well understood or if the
normalization process is not carried out correctly.
Conclusion
Database normalization is a key concept in organizing data efficiently within a
database. By reducing redundancy, ensuring data consistency, and breaking
data into well-structured tables, normalization enhances the accuracy,
scalability, and maintainability of a database. It simplifies data updates,
improves integrity, and supports flexible querying, making it an essential
practice for designing reliable and efficient database systems.

Normal Forms in Database Normalization

Normal Forms in DBMS


In the world of database management, Normal Forms are important for ensuring
that data is structured logically, reducing redundancy, and maintaining data
integrity. When working with databases, especially relational databases, it is
critical to follow normalization techniques that help to eliminate unnecessary
duplication, improve performance, and minimize the risk of anomalies.
In this article, we will explain normalization in DBMS, explain all the normal
forms, and explore the benefits of using them in real-world applications.
Whether you are a beginner or an experienced database professional,
understanding normal forms is fundamental to building efficient, scalable, and
reliable databases.
What is Normalization in DBMS?
Normalization is a systematic approach to organize data within a database to
reduce redundancy and eliminate undesirable characteristics such
as insertion, update, and deletion anomalies. The process involves breaking
down large tables into smaller, well-structured ones and defining relationships
between them. This not only reduces the chances of storing duplicate data but
also improves the overall efficiency of the database.

Why is Normalization Important?


 Reduces Data Redundancy: Each fact is stored once instead of being
duplicated, saving disk space and reducing inconsistency.
 Improves Data Integrity: Ensures the accuracy and consistency of data
by organizing it in a structured manner.
 Simplifies Database Design: By following a clear structure, database
designs become easier to maintain and update.
 Optimizes Performance: Reduces the chance of anomalies and
increases the efficiency of database operations.
What are Normal Forms in DBMS?
Normalization is a technique used in database design to reduce redundancy and
improve data integrity by organizing data into tables and ensuring proper
relationships. Normal Forms are different stages of normalization, and each
stage imposes certain rules to improve the structure and performance of a
database. Let’s break down the various normal forms step-by-step to
understand the conditions that need to be satisfied at each level:
1. First Normal Form (1NF): Eliminating Duplicate Records
A table is in 1NF if it satisfies the following conditions:
 All columns contain atomic values (i.e., indivisible values).
 Each row is unique (i.e., no duplicate rows).
 Each column has a unique name.
 The order in which data is stored does not matter.
Example of 1NF Violation: If a table has a column “Phone Numbers” that stores
multiple phone numbers in a single cell, it violates 1NF. To bring it into 1NF, you
need to separate phone numbers into individual rows.
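The 1NF fix described above can be sketched in a few lines of Python. This is only an illustration: the rows, the column names (student_id, name, phones) and the comma separator are assumptions made for the example, not part of the original text.

```python
# Hypothetical rows: the "phones" cell packs several values together,
# which is the 1NF violation described above.
unnormalized = [
    {"student_id": 1, "name": "Asha", "phones": "555-0101, 555-0102"},
    {"student_id": 2, "name": "Ravi", "phones": "555-0199"},
]


def to_first_normal_form(rows):
    """Emit one row per (student, phone) so every cell holds a single value."""
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            result.append(
                {
                    "student_id": row["student_id"],
                    "name": row["name"],
                    "phone": phone.strip(),
                }
            )
    return result


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Note that the name is now repeated for every phone number of a student. That repetition is a partial dependency, which is what 2NF addresses next.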
2. Second Normal Form (2NF): Eliminating Partial Dependency
A relation is in 2NF if it satisfies the conditions of 1NF and, additionally, no
partial dependency exists, meaning every non-prime attribute (non-key
attribute) must depend on the entire primary key, not just a part of it.
Example: For a composite key (StudentID, CourseID), if
the StudentName depends only on StudentID and not on the entire key, it
violates 2NF. To normalize, move StudentName into a separate table where it
depends only on StudentID.
3. Third Normal Form (3NF): Eliminating Transitive Dependency
A relation is in 3NF if it satisfies 2NF and additionally, there are no transitive
dependencies. In simpler terms, non-prime attributes should not depend on
other non-prime attributes.
Example: Consider a table with (StudentID, CourseID, Instructor).
If Instructor depends on CourseID, and CourseID depends on StudentID,
then Instructor indirectly depends on StudentID, which violates 3NF. To resolve
this, place Instructor in a separate table linked by CourseID.
4. Boyce-Codd Normal Form (BCNF): The Strongest Form of 3NF
BCNF is a stricter version of 3NF where for every non-trivial functional
dependency (X → Y), X must be a superkey (a unique identifier for a record in the
table).
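Example: Consider a table (StudentID, CourseID, Instructor) with the key
(StudentID, CourseID) and the additional dependency Instructor → CourseID
(each instructor teaches only one course). Instructor is a determinant but not
a superkey, so the table violates BCNF. To bring it into BCNF, decompose the
table so that each determinant is a candidate key.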
5. Fourth Normal Form (4NF): Removing Multi-Valued Dependencies
A table is in 4NF if it is in BCNF and has no multi-valued dependencies.
A multi-valued dependency occurs when one attribute determines a set of values
of another attribute, and that set is independent of all other attributes in
the table.
Example: Consider a table where (StudentID, Language, Hobby) are attributes.
If a student can have multiple hobbies and languages, a multi-valued
dependency exists. To resolve this, split the table into separate tables
for Languages and Hobbies.
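The split described in this example can be checked with a small Python sketch. The sample rows are made up for illustration; the point is that projecting onto (StudentID, Language) and (StudentID, Hobby) and then joining on StudentID gives back exactly the original rows, so nothing is lost.

```python
# Hypothetical data: student 1 has two languages and two hobbies, so the
# single table has to store every language/hobby combination.
student_info = {
    (1, "English", "Chess"),
    (1, "English", "Cricket"),
    (1, "Hindi", "Chess"),
    (1, "Hindi", "Cricket"),
    (2, "Tamil", "Music"),
}

# Project onto (StudentID, Language) and (StudentID, Hobby).
languages = {(sid, lang) for sid, lang, _ in student_info}
hobbies = {(sid, hobby) for sid, _, hobby in student_info}

# Natural join of the two projections on StudentID.
rejoined = {
    (sid, lang, hobby)
    for sid, lang in languages
    for sid2, hobby in hobbies
    if sid == sid2
}

print(len(student_info), len(languages), len(hobbies))
print(rejoined == student_info)
```

The two smaller tables hold three rows each instead of five combined rows, and adding a new hobby for student 1 becomes a single insert rather than one row per language.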
6. Fifth Normal Form (5NF): Eliminating Join Dependency
5NF is achieved when a table is in 4NF and all join dependencies are removed.
This form ensures that every table is fully decomposed into smaller tables that
are logically connected without losing information.
Example: If a table contains (StudentID, Course, Instructor) and there is a
dependency where all combinations of these columns are needed for a specific
relationship, you would split them into smaller tables to remove redundancy.
Advantages of Normal Form
1. Reduced data redundancy: Normalization helps to eliminate duplicate data in
tables, reducing the amount of storage space needed and improving database
efficiency.
2. Improved data consistency: Normalization ensures that data is stored in a
consistent and organized manner, reducing the risk of data inconsistencies and
errors.
3. Simplified database design: Normalization provides guidelines for organizing
tables and data relationships, making it easier to design and maintain a database.
4. Improved query performance: Normalized tables are smaller and narrower, so
lookups and updates on a single table are often faster, although queries that
span several tables need joins.
5. Easier database maintenance: Normalization reduces the complexity of a
database by breaking it down into smaller, more manageable tables, making it
easier to add, modify, and delete data.
Common Challenges of Over-Normalization
While normalization is a powerful tool for optimizing databases, it’s important
not to over-normalize your data. Excessive normalization can lead to:
 Complex Queries: Too many tables may result in multiple joins,
making queries slow and difficult to manage.
 Performance Overhead: Additional processing required for joins in
overly normalized databases may hurt performance, especially in
large-scale systems.
In many cases, denormalization (combining tables to reduce the need for
complex joins) is used for performance optimization in specific applications, such
as reporting systems.
When to Use Normalization and Denormalization
 Normalization is best suited for transactional systems where data
integrity is paramount, such as banking systems and enterprise
applications.
 Denormalization is ideal for read-heavy applications like data
warehousing and reporting systems where performance and query
speed are more critical than data integrity.
Applications of Normal Forms in DBMS
 Ensures Data Consistency: Prevents data anomalies by ensuring each
piece of data is stored in one place, reducing inconsistencies.
 Reduces Data Redundancy: Minimizes repetitive data, saving storage
space and avoiding errors in data updates or deletions.
 Improves Query Performance: Simplifies queries by breaking large
tables into smaller, more manageable ones, leading to faster data
retrieval.
 Enhances Data Integrity: Ensures that data is accurate and reliable by
adhering to defined relationships and constraints between tables.
 Easier Database Maintenance: Simplifies updates, deletions, and
modifications by ensuring that changes only need to be made in one
place, reducing the risk of errors.
 Facilitates Scalability: Makes it easier to modify, expand, or scale the
database structure as business requirements grow.
 Supports Better Data Modeling: Helps in designing databases that
are logically structured, with clear relationships between tables,
making it easier to understand and manage.
 Reduces Update Anomalies: Prevents issues like insertion, deletion, or
modification anomalies that can arise from redundant data.
 Improves Data Integrity and Security: By reducing unnecessary data
duplication, normal forms help ensure sensitive information is securely
and correctly maintained.
 Optimizes Storage Efficiency: By organizing data into smaller tables,
storage is used more efficiently, reducing the overhead for large
databases.
Conclusion
In conclusion, relational databases can be arranged according to a set of rules
called normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), which reduce data
redundancy and preserve data integrity. By resolving
various kinds of data anomalies and dependencies, each subsequent normal
form expands upon the one that came before it. The particular requirements and
properties of the data being stored determine which normal form should be
used; higher normal forms offer stricter data integrity but may also result in
more complicated database structures.

Functional Dependency and Attribute Closure



Functional dependency and attribute closure are essential for maintaining data
integrity and building effective, organized, and normalized databases.
Functional Dependency
A functional dependency A->B holds in a relation if any two tuples having the
same value of attribute A must also have the same value for attribute B. For
example, in the relation STUDENT shown in Table 1, the functional dependencies
STUD_NO -> STUD_NAME and
STUD_NO -> STUD_PHONE hold.
Note: A STUD_NO uniquely identifies a STUD_NAME and STUD_PHONE,
but
STUD_NAME -> STUD_STATE does not hold.
Note: Two students can have the same name (like RAM in the table below)
without being from the same state.

Student Table

How to find Functional Dependencies for a Relation?


Functional Dependencies in a relation are dependent on the domain of the
relation. Consider the STUDENT relation given in Table 1.
 We know that STUD_NO is unique for each student. So
STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE,
STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and
STUD_NO->STUD_AGE all will be true.
 Similarly, STUD_STATE->STUD_COUNTRY will be true as if two
records have same STUD_STATE, they will have same
STUD_COUNTRY as well.
 For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will
be true as two records with same COURSE_NO will have same
COURSE_NAME.
Important Points About Functional Dependencies
 Ensure data consistency and integrity across the database.
 Simplify data operations like addition, editing, and deletion.
 Identifying dependencies can be complex for large databases.
 Overly restrictive dependencies may slow queries or cause
inconsistencies.
Functional Dependency Set
Functional Dependency set or FD set of a relation is the set of all FDs present in
the relation. For Example, FD set for relation STUDENT shown in table 1 is:

{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure
Attribute closure of an attribute set can be defined as set of attributes which
can be functionally determined from it.
How to find attribute closure of an attribute set?
To find attribute closure of an attribute set:
 Add elements of attribute set to the result set.
 Recursively add elements to the result set which can be functionally
determined from the elements of the result set.
Using FD set of table 1, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
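The two steps above can be written as a short loop that keeps adding attributes until nothing changes. The sketch below is one possible implementation, not a reference one: the function name closure and the representation of each FD as a (left side, right side) pair of sets are choices made for this example. The FD set is the one given for the STUDENT relation.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FD set of the STUDENT relation from the text.
STUDENT_FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_NO"}, STUDENT_FDS)))
print(sorted(closure({"STUD_STATE"}, STUDENT_FDS)))
```

The loop runs at most once per attribute that gets added, so it always terminates.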
Important Points About Attribute Closure
 Helps to identify all possible attributes that can be derived from a set
of given attributes.
 Helps in database design by showing how attributes and tables are
related, which can improve query performance.
 Can be computationally expensive, especially for large datasets.
 Becomes complex to manage as the number of attributes and tables
increases.
How to Find Candidate Keys and Super Keys Using Attribute Closure?
 If attribute closure of an attribute set contains all attributes of relation,
the attribute set will be super key of the relation.
 If no subset of this attribute set can functionally determine all
attributes of the relation, the set will be candidate key as well. For
Example, using FD set of table 1
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) will be a super key but not a candidate key because the
closure of its proper subset, (STUD_NO)+, already contains all attributes of
the relation. So, STUD_NO will be a candidate key.
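The super key and candidate key tests above translate directly into code. The following is a minimal sketch under the same assumptions as the text: the helper names (closure, is_superkey, is_candidate_key) are made up for the example, and the FD set is the one given for STUDENT. For minimality it is enough to try removing one attribute at a time, because any superset of a super key is itself a super key.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A super key's closure contains every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a super key from which no attribute can be dropped."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


ALL_ATTRS = {"STUD_NO", "STUD_NAME", "STUD_PHONE",
             "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
STUDENT_FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(is_superkey({"STUD_NO", "STUD_NAME"}, ALL_ATTRS, STUDENT_FDS))
print(is_candidate_key({"STUD_NO", "STUD_NAME"}, ALL_ATTRS, STUDENT_FDS))
print(is_candidate_key({"STUD_NO"}, ALL_ATTRS, STUDENT_FDS))
```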
Prime and Non-Prime Attributes
Attributes that are part of any candidate key of a relation are called prime
attributes; the others are non-prime attributes. For example, STUD_NO in the
STUDENT relation is a prime attribute, and the others are non-prime attributes.
Conclusion
Tools like functional dependency and attribute closure are helpful when
designing and optimizing databases. They are useful for:
 Determining the connections between the tables and the attributes.
 Boosting query efficiency.
 Ensuring data consistency.

GATE Questions
Q.1: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and
the set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} ->
{K, L}, {K} -> {M}, {L} -> {N}} on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Solution:
Finding attribute closure of all given options, we get:
{E,F}+ = {E,F,G,I,J}
{E,F,H}+ = {E,F,G,H,I,J,K,L,M,N}
{E,F,H,K,L}+ = {E,F,G,H,I,J,K,L,M,N}
{E}+ = {E}
Both {E,F,H}+ and {E,F,H,K,L}+ contain all attributes, but only {E,F,H} is
minimal, so it is the candidate key. The correct option is (B).

Q.2: How to check whether an FD can be derived from a given FD set?
Solution:
To check whether an FD A->B can be derived from an FD set F:
1. Find (A)+ using FD set F.
2. If B is a subset of (A)+, then A->B is true; otherwise it is not.

Q.3: In a schema with attributes A, B, C, D and E, the following set of
functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by
the above set? (GATE IT 2005)
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Solution:
Using FD set given in question,
(CD)+ = {A,B,C,D,E}, which means CD -> AC holds.
(BD)+ = {B,D}, which means BD -> CD cannot hold. So this FD is not implied by
the FD set, and (B) is the required option.
Others can be checked in the same way.
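All four options can be checked mechanically with the closure test from Q.2. The sketch below is one way to do it; the helper names closure and implies are made up for this example, and attribute sets are written as strings such as "CD" purely for brevity.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds exactly when rhs is inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# FD set from Q.3.
F = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]

options = {"A": ("CD", "AC"), "B": ("BD", "CD"),
           "C": ("BC", "CD"), "D": ("AC", "BC")}

results = {label: implies(F, lhs, rhs) for label, (lhs, rhs) in options.items()}
for label, ok in results.items():
    print(label, "implied" if ok else "NOT implied")
```

Only option B comes out as not implied, which agrees with the solution above.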

Q.4: Consider a relation scheme R = (A, B, C, D, E, H) on which the
following functional dependencies hold: {A -> B, BC -> D, E -> C, D -> A}.
What are the candidate keys of R? [GATE 2005]
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Solution:
(AE)+ = {A,B,C,D,E}, which is not the set of all attributes (H is missing). So
AE is not a candidate key. Hence options (a) and (b) are wrong.
(AEH)+ = {A,B,C,D,E,H}
(BEH)+ = {A,B,C,D,E,H}
(DEH)+ = {A,B,C,D,E,H}
(BCH)+ = {A,B,C,D,H}, which is not the set of all attributes (E is missing). So
BCH is not a candidate key. Hence option (c) is wrong.
So the correct answer is (d).
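For a small schema like this one, the candidate keys can also be found by brute force: try attribute subsets in order of increasing size and keep each one whose closure is the whole relation and which does not contain a smaller key. The sketch below does this for Q.4. The function names are assumptions made for the example, and the search is exponential in the number of attributes, so it is only suitable for exam-sized problems.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Enumerate subsets smallest first so that every key found is minimal."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            combo_set = set(combo)
            if any(key <= combo_set for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(combo_set, fds) == all_attrs:
                keys.append(combo_set)
    return keys


# Relation and FD set from Q.4.
R = "ABCDEH"
F = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]

keys = ["".join(sorted(k)) for k in candidate_keys(R, F)]
print(keys)
```

E and H never appear on the right side of any FD, so every key must contain both of them. That is why all the keys found have the form xEH.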

Types of Functional Dependency

Types of Functional dependencies in DBMS


In relational database management, a functional dependency is a constraint
between two sets of attributes in which the value of one set determines the
value of the other. It is denoted as X → Y, where the attribute set on the left
side of the arrow, X, is called the determinant, and Y is called the dependent.
What is Functional Dependency?
A functional dependency occurs when one attribute uniquely determines another
attribute within a relation. It is a constraint that describes how attributes in a
table relate to each other. If attribute A functionally determines attribute B,
we write this as A→B.
Functional dependencies are used to mathematically express relations among
database entities and are very important to understanding advanced concepts in
Relational Database Systems.
Example:
roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2

From the above table we can conclude some valid functional dependencies:
 roll_no → {name, dept_name, dept_building}: Here, roll_no can
determine the values of the fields name, dept_name and dept_building,
hence it is a valid functional dependency.
 roll_no → dept_name: Since roll_no can determine the whole set of
{name, dept_name, dept_building}, it can determine its subset
dept_name also.
 dept_name → dept_building: dept_name can identify the
dept_building accurately, since every row with the same dept_name
also has the same dept_building.
 More valid functional dependencies: roll_no → name, {roll_no, name}
→ {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
 name → dept_name Students with the same name can have different
dept_name, hence this is not a valid functional dependency.
 dept_building → dept_name There can be multiple departments in
the same building. Example, in the above table departments ME and
EC are in the same building B2, hence dept_building → dept_name is
an invalid functional dependency.
 More invalid functional dependencies: name → roll_no, {name,
dept_name} → roll_no, dept_building → roll_no, etc.
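The valid and invalid dependencies listed above can be checked against the sample table with a few lines of Python. This is a minimal sketch: the function name fd_holds is made up for the example. Keep in mind that sample rows can only refute a dependency. Whether it holds in general is a statement about the domain, not about six rows.

```python
# The six rows of the example table.
ROWS = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(ROWS, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(ROWS, ["dept_name"], ["dept_building"]))
print(fd_holds(ROWS, ["name"], ["dept_name"]))
print(fd_holds(ROWS, ["dept_building"], ["dept_name"]))
```

The last two checks fail for the reasons given in the text: two students named xyz are in different departments, and building B2 houses both EC and ME.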
Types of Functional Dependencies in DBMS
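1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Semi non-trivial functional dependency
4. Multivalued functional dependency
5. Transitive functional dependency
6. Fully functional dependency
7. Partial functional dependency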
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial functional
dependency.
Symbolically: A→B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A→A & B→B
Example 1 :
 ABC -> AB
 ABC -> A
 ABC -> ABC
Example 2:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18

Here, {roll_no, name} → name is a trivial functional dependency, since the
dependent name is a subset of determinant set {roll_no, name}. Similarly, roll_no
→ roll_no is also an example of trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of
the determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-
trivial functional dependency.
Example 1 :
 Id -> Name
 Name -> DOB
Example 2:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18

Here, roll_no → name is a non-trivial functional dependency, since the dependent
name is not a subset of determinant roll_no. Similarly, {roll_no, name} → age is
also a non-trivial functional dependency, since age is not a subset of {roll_no,
name}.
3. Semi Non Trivial Functional Dependencies
A semi non-trivial functional dependency occurs when part of the dependent
attribute set (right-hand side) is included in the determinant (left-hand side),
but not all of it. This is a middle ground between trivial and non-trivial
functional dependencies. X -> Y is called semi non-trivial when X ∩ Y is not
empty and Y is not a subset of X.
Example:
Consider the following table:
Student_ID  Course_ID  Course_Name
101         CSE101     Computer Science
102         CSE102     Data Structures
103         CSE101     Computer Science

Functional Dependency:
{Student_ID, Course_ID} → {Course_ID, Course_Name}
This is semi non-trivial because:
 Part of the dependent set (Course_ID) is already included in the
determinant ({Student_ID, Course_ID}).
 However, the dependency is not completely trivial because
Course_Name is not part of the determinant.
By contrast, {Student_ID, Course_ID} → Course_ID on its own would be trivial,
since its dependent is a subset of the determinant.
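The three categories can be told apart with two set operations, using the usual definitions: trivial when the dependent is a subset of the determinant, semi non-trivial when the two sides overlap without the dependent being a subset, and non-trivial when they share nothing. The sketch below assumes those definitions; the function name classify is made up for the example.

```python
def classify(lhs, rhs):
    """Classify the dependency lhs -> rhs by how the two sides overlap."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"           # dependent is a subset of the determinant
    if lhs & rhs:
        return "semi non-trivial"  # sides overlap, but rhs adds new attributes
    return "non-trivial"           # sides share no attribute


examples = [
    ({"roll_no", "name"}, {"name"}),
    ({"roll_no"}, {"name"}),
    ({"Student_ID", "Course_ID"}, {"Course_ID", "Course_Name"}),
]
for lhs, rhs in examples:
    print(sorted(lhs), "->", sorted(rhs), ":", classify(lhs, rhs))
```

The classification depends only on the attribute names on each side, not on the data in the table.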
4. Multivalued Functional Dependency
In multivalued functional dependency, the attributes of the dependent set are
not dependent on each other. i.e. if a determines a set of values of b and a set
of values of c, and there exists no functional dependency between b and c, then
b and c are each said to be multivalued dependent on a.
Example:
bike_model  manuf_year  color
tu1001      2007        Black
tu1001      2007        Red
tu2012      2008        Black
tu2012      2008        Red
tu2222      2009        Black
tu2222      2009        Red

In this table:
 X: bike_model
 Y: color
 Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years
(manuf_year).
2. The colors do not depend on the manufacturing year, and the
manufacturing year does not depend on the colors. They are
independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:
bike_model →→ color
bike_model →→ manuf_year
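Independence of the two columns can be tested on the sample data: for each bike_model, the (color, manuf_year) pairs that appear must be exactly every combination of that model's colors with that model's years. The sketch below checks this for a three-column table. The function name mvd_holds and the extra row used to break the dependency are assumptions made for the example.

```python
from itertools import product

# Rows of the bike table from the text.
BIKES = [
    {"bike_model": "tu1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "tu1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "tu2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "tu2012", "manuf_year": 2008, "color": "Red"},
    {"bike_model": "tu2222", "manuf_year": 2009, "color": "Black"},
    {"bike_model": "tu2222", "manuf_year": 2009, "color": "Red"},
]


def mvd_holds(rows, x, y, z):
    """Check x ->> y in a table whose only columns are x, y and z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    # Within each x group, the (y, z) pairs must be the full cross product
    # of the y values and the z values seen for that x.
    return all(
        pairs == set(product({p[0] for p in pairs}, {p[1] for p in pairs}))
        for pairs in groups.values()
    )


print(mvd_holds(BIKES, "bike_model", "color", "manuf_year"))

# Hypothetical extra row: tu1001 also made in 2008, but only in Black.
broken = BIKES + [{"bike_model": "tu1001", "manuf_year": 2008, "color": "Black"}]
print(mvd_holds(broken, "bike_model", "color", "manuf_year"))
```

With the extra row, the combination (Red, 2008) is missing for tu1001, so color and year are no longer independent for that model and the check fails.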
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c.
This is a transitive functional dependency.
Example:
enrol_no   name   dept   building_no
42         abc    CO     4
43         pqr    EC     2
44         xyz    IT     1
45         abc    EC     2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom
of transitivity, enrol_no → building_no is a valid functional dependency. This is
an indirect functional dependency, hence called Transitive functional
dependency.
6. Fully Functional Dependency
In a full functional dependency, an attribute or set of attributes depends on
the whole determinant and not on any proper subset of it. If a relation R has
attributes X, Y, Z with the dependencies X->Y and X->Z, both dependencies are
fully functional, because a single-attribute determinant has no smaller
non-empty subset. By contrast, {X, Y}->Z is fully functional only if neither
X->Z nor Y->Z holds.
7. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a part of the
composite key rather than on the whole key. Suppose a relation R has attributes
X, Y, Z, where {X, Y} is the composite key and Z is a non-key attribute. Then
X->Z is a partial functional dependency in an RDBMS.
Conclusion
Functional dependency is a very important concept in database management
systems for ensuring data consistency and accuracy. In this article we have
discussed the concept behind functional dependencies and why they are
important, valid and invalid functional dependencies, and the most important
types of functional dependencies in an RDBMS. We have also discussed the
advantages of FDs.

Finding Attribute Closure and Candidate Keys using Functional Dependencies

In this article, we will find the attribute closure and the candidate keys of a
relation using its functional dependencies. Before proceeding, we first recall
what a functional dependency is.
A functional dependency X->Y in a relation holds if two tuples having the same
value for X also have the same value for Y i.e. X uniquely determines Y. Consider
the table given below.
In the EMPLOYEE relation given in Table,
 Functional Dependency E-ID->E-NAME holds because, for each E-ID,
there is a unique value of E-NAME.
 Functional Dependency E-ID->E-CITY and E-CITY->E-STATE also
holds.
 Functional Dependency E-NAME->E-ID does not hold because E-
NAME ‘John’ is not uniquely determining E-ID. There are 2 E-IDs
corresponding to John (E001 and E003).
Table 1: EMPLOYEE
E-ID   E-NAME   E-CITY   E-STATE
E001   John     Delhi    Delhi
E002   Mary     Delhi    Delhi
E003   John     Noida    U.P.

The FD set for the EMPLOYEE relation given in Table 1 is:
{E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE}
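The check described above (two tuples that agree on X must also agree on Y) can be sketched in a few lines of Python. This is only an illustration: the helper name `fd_holds` and the list-of-dicts representation are assumptions, and the rows are the ones from Table 1. Passing the check on one table instance shows the FD is not violated by that data; it does not prove the FD is a rule of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    column names. (Hypothetical helper, for illustration only.)
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Rows of the EMPLOYEE relation from Table 1.
EMPLOYEE = [
    {"E-ID": "E001", "E-NAME": "John", "E-CITY": "Delhi", "E-STATE": "Delhi"},
    {"E-ID": "E002", "E-NAME": "Mary", "E-CITY": "Delhi", "E-STATE": "Delhi"},
    {"E-ID": "E003", "E-NAME": "John", "E-CITY": "Noida", "E-STATE": "U.P."},
]

print(fd_holds(EMPLOYEE, ["E-ID"], ["E-NAME"]))  # True
print(fd_holds(EMPLOYEE, ["E-NAME"], ["E-ID"]))  # False: John maps to E001 and E003
```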
Trivial and Non-Trivial Functional Dependency
Trivial Functional Dependency: A trivial functional dependency is one that
always holds in a relation, because its right-hand side is a subset of its
left-hand side. In the example given above, {E-ID, E-NAME}->E-ID is a trivial
functional dependency and will always hold because {E-ID, E-NAME} ⊇ {E-ID}.
You can also see from the table that for each value of {E-ID, E-NAME}, the
value of E-ID is unique, so {E-ID, E-NAME} functionally determines E-ID.
Non-Trivial Functional Dependency: If a functional dependency is not trivial, it
is called a Non-Trivial Functional Dependency. A non-trivial functional
dependency may or may not hold in a relation. e.g. E-ID->E-NAME is a
non-trivial functional dependency that holds in the above relation.
Properties of Functional Dependencies
Let X, Y, and Z be sets of attributes in a relation R. There are several properties
of functional dependencies which always hold in R also known as Armstrong
Axioms.
 Reflexivity: If Y is a subset of X, then X → Y. e.g.; Let X represents {E-
ID, E-NAME} and Y represents {E-ID}. {E-ID, E-NAME}->E-ID is true
for the relation.
 Augmentation: If X → Y, then XZ → YZ. e.g.; Let X represents {E-ID}, Y
represents {E-NAME} and Z represents {E-CITY}. As {E-ID}->E-NAME
is true for the relation, so { E-ID, E-CITY}->{E-NAME, E-CITY} will also
be true.
 Transitivity: If X → Y and Y → Z, then X → Z. e.g.; Let X represents {E-
ID}, Y represents {E-CITY} and Z represents {E-STATE}. As {E-ID} ->{E-
CITY} and {E-CITY}->{E-STATE} is true for the relation, so { E-ID }-
>{E-STATE} will also be true.
Attribute Closure: The set of all attributes that are functionally determined by
a set of attributes A is called the attribute closure of A, written A+. It is
not itself one of Armstrong's axioms; it is obtained by applying them
repeatedly.
Steps to Find the Attribute Closure
Q.1. Given the FD set of a relation R, find the attribute closure S of an
attribute set A.
 Add A to S.
 Repeatedly add attributes that can be functionally determined from the
attributes already in S, until no more attributes can be added.
From Table 1, the relation and its FDs are:
Given R (E-ID, E-NAME, E-CITY, E-STATE)
FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE }
The attribute closure of E-ID can be calculated as:
 Add E-ID to the set: {E-ID}.
 Add the attributes that can be derived from the attributes of the set. In
this case, E-NAME, E-CITY, and E-STATE can be derived from E-ID, so
these are also part of the closure.
 As there is no other attribute remaining in the relation to be derived
from E-ID, the result is:
(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE}
Similarly,
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY, E-STATE}
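The two closure steps above can be sketched as a small fixed-point loop in Python. The function name `closure` and the representation of each FD as a pair of sets are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    (Hypothetical helper, for illustration only.)
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every attribute on the left is already in the closure,
            # everything on the right can be added too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FD set of the EMPLOYEE relation.
EMPLOYEE_FDS = [
    ({"E-ID"}, {"E-NAME"}),
    ({"E-ID"}, {"E-CITY"}),
    ({"E-ID"}, {"E-STATE"}),
    ({"E-CITY"}, {"E-STATE"}),
]

print(sorted(closure({"E-ID"}, EMPLOYEE_FDS)))
# ['E-CITY', 'E-ID', 'E-NAME', 'E-STATE']
print(sorted(closure({"E-CITY"}, EMPLOYEE_FDS)))
# ['E-CITY', 'E-STATE']
```

The loop stops as soon as a full pass over the FDs adds nothing, which matches "until done" in the steps.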
Q.2 Find attribute closures for the relation R(ABCDE) with the FD set {AB->C,
B->D, C->E, D->A}. To find (B)+, we add attributes to the set one step at a
time, using the FDs shown in the table below.
Attributes Added in Closure   FD used
{B}                           Triviality
{B, D}                        B->D
{B, D, A}                     D->A
{B, D, A, C}                  AB->C
{B, D, A, C, E}               C->E

 We can find (C, D)+ by adding C and D to the set (triviality), then E
using (C->E), and then A using (D->A). The set becomes:
(C,D)+ = {C,D,E,A}
 Similarly, we can find (B, C)+ by adding B and C to the set (triviality),
then D using (B->D), then E using (C->E), and then A using (D->A).
The set becomes:
(B,C)+ = {B,C,D,E,A}

Candidate Key
A Candidate Key is a minimal set of attributes of a relation that can be used
to identify a tuple uniquely. For example, each tuple of the EMPLOYEE relation
given in Table 1 can be uniquely identified by E-ID, and it is minimal as well.
So it is a candidate key of the relation.
A candidate key may or may not be a primary key. A Super Key is a set of
attributes of a relation that can be used to identify a tuple uniquely. For
example, each tuple of the EMPLOYEE relation given in Table 1 can be uniquely
identified by E-ID or (E-ID, E-NAME) or (E-ID, E-CITY) or (E-ID, E-STATE) or
(E-ID, E-NAME, E-STATE), etc. So all of these are super keys of the EMPLOYEE
relation.
Note: A candidate key is always a super key but vice versa is not true.
Q.3 Finding Candidate Keys and Super Keys of a Relation using FD set.
The set of attributes whose attribute closure is a set of all attributes of the
relation is called the super key of the relation. For Example, the EMPLOYEE
relation shown in Table 1 has the following FD set. {E-ID->E-NAME, E-ID->E-
CITY, E-ID->E-STATE, E-CITY->E-STATE}. Let us calculate the attribute
closure of different sets of attributes:
(E-ID)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-NAME)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY,E-STATE}
As (E-ID)+, (E-ID, E-NAME)+, (E-ID, E-CITY)+, (E-ID, E-STATE)+, (E-ID, E-CITY,
E-STATE)+ give set of all attributes of relation EMPLOYEE. So all of these are
super keys of relation.
A minimal set of attributes whose attribute closure is the set of all attributes
of the relation is called a candidate key of the relation. As shown above,
(E-ID)+ is the set of all attributes of the relation and E-ID is minimal, so
E-ID is a candidate key. On the other hand, (E-ID, E-NAME)+ is also the set of
all attributes, but (E-ID, E-NAME) is not minimal, because the closure of its
proper subset E-ID is already the set of all attributes. So (E-ID, E-NAME) is
not a candidate key.
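As a rough sketch of this procedure, candidate keys can be found by brute force: try attribute sets in order of increasing size and keep those whose closure covers the relation and which contain no smaller key. The names `closure` and `candidate_keys` are assumptions for this illustration, and the search is exponential in the number of attributes, so it only suits small textbook relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole relation."""
    attributes = sorted(attributes)
    keys = []
    # Smaller sets are tried first, so any set that contains an already
    # found key is a non-minimal super key and can be skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


EMPLOYEE_ATTRS = {"E-ID", "E-NAME", "E-CITY", "E-STATE"}
EMPLOYEE_FDS = [
    ({"E-ID"}, {"E-NAME"}),
    ({"E-ID"}, {"E-CITY"}),
    ({"E-ID"}, {"E-STATE"}),
    ({"E-CITY"}, {"E-STATE"}),
]

print(candidate_keys(EMPLOYEE_ATTRS, EMPLOYEE_FDS))  # [{'E-ID'}]
```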

Number of possible Superkeys

Number of Possible Super Keys in DBMS


Any set of attributes of a table that can uniquely identify all the tuples of that
table is known as a Super key. It’s different from the primary and candidate
keys in the sense that only the minimal superkeys are the candidate/primary
keys.
This means that from a super key when we remove all the attributes that
are unnecessary for its uniqueness, only then it becomes a
primary/candidate key. So, in essence, all primary/candidate keys are super
keys but not all super keys are primary/candidate keys. By the formal definition
of a Relation(Table), we know that the tuples of a relation are all unique. So,
the set of all attributes itself is a super key.
Counting the possible number of super keys for a table is a common question
for GATE. The examples below will demonstrate all possible types of
questions on this topic.
Note: If R has a single candidate key, the number of super keys = $2^{n-k}$,
where “n” is the number of attributes in the relation R and “k” is the number of
attributes in the candidate key.
Example-1: Let a Relation R have attributes {a1,a2,a3} and a1 is the candidate
key. Then how many super keys are possible?
Here, any superset of a1 is the super key.
Super keys are = {a1, a1 a2, a1 a3, a1 a2 a3}
Thus we see that 4 Super keys are possible in this case.
In general, if we have ‘N’ attributes with one single-attribute candidate key,
then the number of possible superkeys is $2^{N-1}$.
Example-2: Let a Relation R have attributes {a1, a2, a3,…,an}. Find the maximum
possible number of super keys of R.
Maximum super keys = $2^n - 1$.
Proof: There are n attributes in R, so the total number of possible subsets of
attributes of R is $2^n$.
To be a super key, a set must contain at least one attribute, i.e. the empty
set can’t be a super key. The maximum is reached when every single attribute is
itself a candidate key.
So, the maximum possible number of super keys of R = $2^n - 1$.
Example-3: Let a Relation R have attributes {a1, a2, a3,…, an} and the
candidate key is “a1 a2 a3”. What is the possible number of super keys?
Following the previous formula, the candidate key has 3 attributes instead of
one. So, here the number of possible super keys is $2^{n-3}$.
Example-4: Let a Relation R have attributes {a1, a2, a3,…, an} and the
candidate keys are “a1”, “a2” then the possible number of super keys?
This problem is slightly different, since we now have two candidate keys
instead of only one. Let A1 be the set of super keys that contain a1 and A2 the
set of super keys that contain a2. By the inclusion-exclusion principle:
|A1 ∪ A2| = |A1| + |A2| - |A1 ∩ A2|
= (super keys possible with candidate key A1) + (super keys possible with
candidate key A2) – (common superkeys from both A1 and A2)
= $2^{n-1} + 2^{n-1} - 2^{n-2}$
Example-5: Let a Relation R have attributes {a1, a2, a3,…, an} and the
candidate keys are “a1”, “a2 a3” then the possible number of super keys?
Super keys of (a1) + Super keys of (a2 a3) – Super keys of (a1 a2 a3)
⇒ $2^{n-1} + 2^{n-2} - 2^{n-3}$
Example-6: Let a Relation R have attributes {a1, a2, a3,…, an} and the
candidate keys are “a1 a2”, “a3 a4” then the possible number of super keys?
Super keys of (a1 a2) + Super keys of (a3 a4) – Super keys of (a1 a2 a3 a4)
⇒ $2^{n-2} + 2^{n-2} - 2^{n-4}$
Example-7: Let a Relation R have attributes {a1, a2, a3,…, an} and the
candidate keys are “a1 a2”, “a1 a3” then the possible number of super keys?
Super keys of (a1 a2) + Super keys of (a1 a3) – Super keys of (a1 a2 a3)
⇒ $2^{n-2} + 2^{n-2} - 2^{n-3}$
Example-8 : Let a Relation R have attributes {a1, a2, a3,…,an} and the
candidate keys are “a1”, “a2”, “a3” then the possible number of super keys?
In this question, we have 3 different candidate keys. Let A1, A2, and A3 be the
sets of super keys that contain a1, a2, and a3 respectively. By the
inclusion-exclusion principle:
|A1 ∪ A2 ∪ A3| = |A1| + |A2| + |A3| – |A1 ∩ A2| – |A1 ∩ A3| – |A2 ∩ A3| +
|A1 ∩ A2 ∩ A3|
= (super keys possible with candidate key A1) + (super keys possible with
candidate key A2) + (super keys possible with candidate key A3) – (common
super keys from both A1 and A2) – (common super keys from both A1 and A3)
– (common super keys from both A2 and A3) + (common super keys from
A1, A2, and A3)
= $2^{n-1} + 2^{n-1} + 2^{n-1} - 2^{n-2} - 2^{n-2} - 2^{n-2} + 2^{n-3}$
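The inclusion-exclusion pattern used in Examples 4 to 8 can be written once as a short Python function. This is a sketch under the assumption that candidate keys are given as sets of attribute names; the function name `count_super_keys` and the choice n = 5 in the sample calls are illustrative only.

```python
from itertools import combinations


def count_super_keys(n, candidate_keys):
    """Count the super keys of an n-attribute relation by inclusion-exclusion.

    candidate_keys is a list of sets of attribute names.
    (Hypothetical helper, for illustration only.)
    """
    total = 0
    for r in range(1, len(candidate_keys) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(candidate_keys, r):
            # Super keys shared by every key in the group are exactly the
            # supersets of the union of those keys: 2^(n - |union|) of them.
            union = set().union(*group)
            total += sign * 2 ** (n - len(union))
    return total


n = 5  # assumed number of attributes for the sample calls
print(count_super_keys(n, [{"a1"}, {"a2"}]))              # 24 (Example-4)
print(count_super_keys(n, [{"a1"}, {"a2", "a3"}]))        # 20 (Example-5)
print(count_super_keys(n, [{"a1", "a2"}, {"a1", "a3"}]))  # 12 (Example-7)
print(count_super_keys(n, [{"a1"}, {"a2"}, {"a3"}]))      # 28 (Example-8)
```

For a single candidate key of k attributes the function reduces to 2^(n-k), which agrees with the note at the start of this section.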

Example-9: A relation R(A, B, C, D, E, F, G, H) has the following set of
functional dependencies:
CH → G,
A → BC,
B → CFH,
E → A,
F → EG
Then how many possible super keys are present?
Step 1: First, find the candidate keys. D does not appear in any functional
dependency, so no other attribute can determine it, and D must be part of
every candidate key.
A+ = E+ = B+ = F+ = all attributes of the relation except D
So, the candidate keys are AD, BD, ED, FD.
Step 2: Find the super keys due to a single candidate key. Each of the 6
remaining attributes is either selected or not, so there are 2 choices per
attribute:
A _ _ D _ _ _ _ = _ B _ D _ _ _ _ = _ _ _ D E _ _ _ = _ _ _ D _ F _ _ = $2^6$
Step 3: Find the super keys common to each pair of candidate keys:
n(AD ∩ BD) = n(AD ∩ ED) = n(AD ∩ FD) = n(BD ∩ ED) = n(BD ∩ FD) = n(ED ∩
FD) = $2^5$
Step 4: Find the super keys common to each combination of 3 candidate keys:
n(AD ∩ BD ∩ ED) = n(AD ∩ ED ∩ FD) = n(ED ∩ BD ∩ FD) = n(BD ∩ FD ∩ AD) =
$2^4$
Step 5: Find the super keys common to all 4 candidate keys:
n(AD ∩ BD ∩ ED ∩ FD) = A B _ D E F _ _ = $2^3$
So, according to the inclusion-exclusion principle:
|W ∪ X ∪ Y ∪ Z| = |W| + |X| + |Y| + |Z| – |W ∩ X| – |W ∩ Y| – |W ∩ Z| – |X ∩ Y|
– |X ∩ Z| – |Y ∩ Z| + |W ∩ X ∩ Y| + |W ∩ X ∩ Z| + |W ∩ Y ∩ Z| + |X ∩ Y ∩ Z|
– |W ∩ X ∩ Y ∩ Z|
Number of super keys = $4 \cdot 2^6 - 6 \cdot 2^5 + 4 \cdot 2^4 - 2^3 = 256 - 192 + 64 - 8 = 120$
So the number of super keys is 120.
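The count in Example-9 can be cross-checked by brute force: enumerate every non-empty subset of the 8 attributes and keep those whose closure is the whole relation. The sketch below assumes each FD is stored as a pair of sets; the helper name `closure` is illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ATTRS = "ABCDEFGH"
FDS = [
    (set("CH"), set("G")),
    (set("A"), set("BC")),
    (set("B"), set("CFH")),
    (set("E"), set("A")),
    (set("F"), set("EG")),
]

# A subset is a super key when its closure contains every attribute of R.
super_keys = [
    set(combo)
    for size in range(1, len(ATTRS) + 1)
    for combo in combinations(ATTRS, size)
    if closure(combo, FDS) == set(ATTRS)
]

print(len(super_keys))  # 120
```

Only 255 subsets are checked here, so the exhaustive approach is fine for a relation of this size.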
Example-10: Let a Relation R have attributes {a1, a2, a3,…, an} and
{a1 a2 a3 … ak} as the only candidate key, where k <= n. Then how many super
keys are possible?
The possible number of superkeys is $2^{n-k}$.
Example-11: Let a relation R have attributes {a1, a2, a3,…, an} such that any
k of the attributes at a time determine all other attributes. Find the value of
k such that the number of candidate keys in the relation will be maximum.
Any k attributes taken together constitute one candidate key, and these k
attributes can be any of the n attributes. So for a given k, the possible
number of candidate keys is nCk, i.e. $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$.
For the number of candidate keys to be maximum, k must be ⌊n/2⌋, the value at
which nCk is largest.
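The claim that nCk peaks at k = ⌊n/2⌋ is easy to confirm numerically with the standard-library function `math.comb`. The helper name `best_k` is an assumption for this sketch.

```python
from math import comb


def best_k(n):
    """Return the smallest k in 0..n that maximizes C(n, k)."""
    # max() keeps the first maximum it meets, so for odd n this returns
    # floor(n/2) rather than the equally good ceil(n/2).
    return max(range(n + 1), key=lambda k: comb(n, k))


for n in (4, 5, 8):
    k = best_k(n)
    print(n, k, comb(n, k))
# 4 2 6
# 5 2 10
# 8 4 70
```

For odd n the values ⌊n/2⌋ and ⌈n/2⌉ give the same count, so either maximizes the number of candidate keys.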
Lossy and Lossless Decomposition

Lossless Decomposition in DBMS


The original relation and the relation reconstructed by joining the decomposed
relations must contain exactly the same tuples. If the join produces extra
(spurious) tuples or loses tuples, it is a Lossy Join decomposition.
Lossless join decomposition ensures that spurious tuples are never generated
when the relations are joined: for every value of the join attributes, there is
a unique tuple in at least one of the relations.
What is Lossless Decomposition?
Lossless join decomposition is a decomposition of a relation R into relations R1,
and R2 such that if we perform a natural join of relation R1 and R2, it will return
the original relation R. This is effective in removing redundancy from databases
while preserving the original data.
In other words by lossless decomposition, it becomes feasible to reconstruct the
relation R from decomposed tables R1 and R2 by using Joins.
For 1NF, 2NF, 3NF, and BCNF, a lossless-join decomposition is always possible.
In Lossless Decomposition, we select the common attribute and the criteria for
selecting a common attribute is that the common attribute must be a candidate
key or super key in either relation R1, R2, or both.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if
at least one of the following functional dependencies is in F+ (closure of
functional dependencies):
R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
Example of Lossless Decomposition
– Employee (Employee_Id, Ename, Salary, Department_Id, Dname)
can be decomposed using lossless decomposition as
– Employee_desc (Employee_Id, Ename, Salary, Department_Id)
– Department_desc (Department_Id, Dname)
Here the common attribute Department_Id is a key of Department_desc, assuming
each department has exactly one name.
By contrast, the following decomposition is lossy: the two tables share no
common attribute, so they cannot be joined to get back the original data.
– Employee_desc (Employee_Id, Ename, Salary)
– Department_desc (Department_Id, Dname)
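To see the difference on actual tuples, the sketch below projects a small Employee table both ways and joins the pieces back. The three rows are hypothetical sample data invented for this illustration (the source gives only the schema), and `project`, `natural_join`, and `same_relation` are assumed helper names.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate tuples removed."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r1, r2):
    """Natural join of two non-empty relations given as lists of dicts."""
    common = set(r1[0]) & set(r2[0])
    # With no common attribute the condition is always true, so the
    # join degenerates into a cross product.
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    return {frozenset(r.items()) for r in a} == {frozenset(r.items()) for r in b}


# Hypothetical sample rows, for illustration only.
EMPLOYEE = [
    {"Employee_Id": 1, "Ename": "Asha", "Salary": 50000, "Department_Id": 10, "Dname": "Sales"},
    {"Employee_Id": 2, "Ename": "Ravi", "Salary": 60000, "Department_Id": 20, "Dname": "HR"},
    {"Employee_Id": 3, "Ename": "Meena", "Salary": 55000, "Department_Id": 10, "Dname": "Sales"},
]

dept = project(EMPLOYEE, ["Department_Id", "Dname"])

# Lossless: Department_Id is kept in both tables.
emp_ok = project(EMPLOYEE, ["Employee_Id", "Ename", "Salary", "Department_Id"])
print(same_relation(natural_join(emp_ok, dept), EMPLOYEE))  # True

# Lossy: no common attribute, so the join yields spurious tuples.
emp_bad = project(EMPLOYEE, ["Employee_Id", "Ename", "Salary"])
lossy = natural_join(emp_bad, dept)
print(len(lossy), same_relation(lossy, EMPLOYEE))  # 6 False
```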
In a database management system (DBMS), a lossless decomposition is a
process of decomposing a relation schema into multiple relations in such a way
that it preserves the information contained in the original relation. Specifically, a
lossless decomposition is one in which the original relation can be reconstructed
by joining the decomposed relations.
To test for lossless decomposition, the inference rules known as Armstrong’s
axioms can be used. They make it possible to derive the closure F+ and so to
check whether R1 ∩ R2 → R1 or R1 ∩ R2 → R2 holds. Two rules that are often
used in this check are reflexivity and the decomposition rule.
The reflexivity axiom states that if a set of attributes Y is a subset of
another set of attributes X, then X → Y, i.e. the larger set determines the
smaller one. The decomposition rule states that if X → YZ, then X → Y and
X → Z. Once the test succeeds, the original relation R can be reconstructed by
taking the natural join of R1 and R2.
There are several algorithms available for performing lossless decomposition in
DBMS, such as the BCNF (Boyce-Codd Normal Form) decomposition and
the 3NF (Third Normal Form) decomposition. These algorithms use a set of rules
to decompose a relation into multiple relations while ensuring that the original
relation can be reconstructed without any loss of information.
Advantages of Lossless Decomposition
1. Reduced Data Redundancy: Lossless decomposition helps in reducing
the data redundancy that exists in the original relation. This helps in
improving the efficiency of the database system by reducing storage
requirements and improving query performance.
2. Maintenance and Updates: Lossless decomposition makes it easier to
maintain and update the database since it allows for more granular
control over the data.
3. Improved Data Integrity: Decomposing a relation into smaller
relations can help to improve data integrity by ensuring that each
relation contains only data that is relevant to that relation. This can
help to reduce data inconsistencies and errors.
4. Improved Flexibility: Lossless decomposition can improve the
flexibility of the database system by allowing for easier modification of
the schema.
Disadvantages of Lossless Decomposition
 Increased Complexity: Lossless decomposition can increase the
complexity of the database system, making it harder to understand
and manage.
 Increased Processing Overhead: The process of decomposing a
relation into smaller relations can result in increased processing
overhead. This can lead to slower query performance and reduced
efficiency.
 Join Operations: Lossless decomposition may require additional join
operations to retrieve data from the decomposed relations. This can
also result in slower query performance.
 Costly: Decomposing relations can be costly, especially if the database
is large and complex. This can require additional resources, such as
hardware and personnel.
Conclusion
In Conclusion, a lossless decomposition is an important concept in DBMS that
ensures that the original relation can be reconstructed from the decomposed
relations without any loss of information. The use of Armstrong’s axioms and
decomposition algorithms such as BCNF and 3NF can help achieve lossless
decomposition in practice.
Question Asked in GATE
Q.1: Let R (A, B, C, D) be a relational schema with the following
functional dependencies:
A → B, B → C,
C → D and D → B.
The decomposition of R into
(A, B), (B, C), (B, D)
(A) gives a lossless join, and is dependency preserving
(B) gives a lossless join, but is not dependency preserving
(C) does not give a lossless join, but is dependency preserving
(D) does not give a lossless join and is not dependency preserving
Q.2: R(A,B,C,D) is a relation. Which of the following does not have a
lossless join, dependency preserving BCNF decomposition?
(A) A->B, B->CD
(B) A->B, B->C, C->D
(C) AB->C, C->AD
(D) A ->BCD

Dependency Preserving Decomposition

Dependency Preserving Decomposition – DBMS


In a Database Management System (DBMS), dependency-preserving
decomposition refers to the process of breaking down a complex database
schema into simpler, smaller tables, such that all the functional dependencies of
the original schema are still enforceable without needing to perform additional
joins.
This approach is important for database normalization, as it lets every
constraint be checked on a single table while redundancy and anomalies are
reduced. To achieve dependency-preserving decomposition, normalization
algorithms such as the 3NF decomposition are applied together with a
lossless-join check, ensuring that all original dependencies can be
represented directly in the decomposed tables.
Definition:
Suppose R is a relational schema and F is the set of functional dependencies on
R. Let R be decomposed into relations R1, R2, …, Rn, holding functional
dependencies F1, F2, …, Fn respectively, where each Fi contains the
dependencies of F+ that involve only attributes of Ri. Let
F` = F1 U F2 U … U Fn.
This decomposition is a dependency-preserving decomposition if and only if
every dependency in F is logically implied by F`, i.e. F`+ = F+. It always
holds that F1 ⊆ F+, F2 ⊆ F+, and so on, so F`+ ⊆ F+.
So, to verify that a decomposition is dependency preserving, it is enough to
verify that every dependency in F can be derived from F`. For two relations
this means (F1 U F2)+ = F+.
Let’s say:
 The original relation R has a set of functional dependencies (FDs)
called F.
 When we decompose R into R1 and R2, each gets its own FDs:
o f1: FDs in R1
o f2: FDs in R2
 The combined FDs from R1 and R2 are f1∪f2.
Now, there are three possible cases:
Case 1: f1∪f2=F (more precisely, (f1∪f2)+ = F+)
 This means the FDs from R1 and R2 together imply every FD in the
original set F.
 Result: The decomposition is dependency-preserving because we
haven’t lost any FDs.
Example:
Original R:
| StudentID | CourseID | Instructor |
Functional Dependencies F:
 CourseID→Instructor
 StudentID,CourseID→Instructor
After decomposition:
1. R1(StudentID,CourseID): f1={} (only trivial FDs, since Instructor is not in R1)
2. R2(CourseID,Instructor): f2={CourseID→Instructor}
Here, f1∪f2 implies all of F: StudentID,CourseID→Instructor follows from
CourseID→Instructor by augmentation.
The decomposition is dependency-preserving.
Case 2: f1∪f2⊂F
 This means some FDs from the original set F cannot be derived from
f1∪f2.
 Result: The decomposition is not dependency-preserving, as we’ve
lost some FDs.
Example:
Original R:
| StudentID | CourseID | Instructor |
Functional Dependencies F:
 StudentID,CourseID→Instructor
 CourseID→Instructor
After decomposition:
1. R1(StudentID,CourseID): f1={}
2. R2(StudentID,Instructor): f2={}
Here, f1∪f2⊂F.
No decomposed table contains both CourseID and Instructor, so the FD
CourseID→Instructor is missing.
The decomposition is not dependency-preserving.
Case 3: f1∪f2⊃F
 This means the decomposed tables R1 and R2 carry extra dependencies that
were not part of F.
 Result: This cannot happen when f1 and f2 are obtained by projecting F
onto R1 and R2, because every projected FD already follows from F. It
only arises when an extra constraint is declared on a decomposed table.
Example:
Original R:
| StudentID | CourseID | Instructor |
Functional Dependencies F:
 CourseID→Instructor
After decomposition:
1. R1(StudentID,CourseID): f1={}
2. R2(CourseID,Instructor): f2={CourseID→Instructor, Instructor→CourseID}
Here, f1∪f2⊃F, as the FD Instructor→CourseID was added and does not follow
from F.
Dependency preservation is not violated, but R2 now forbids an instructor from
teaching two courses, which the original relation allowed.
Key Concepts of Dependency Preserving Decomposition
in DBMS
The key concepts of dependency-preserving decomposition include:
 Functional Dependency Preservation: This means that after
decomposition, the functional dependencies in the original schema
must still hold true in the decomposed schema.
 Lossless Join Property: The decomposition must allow for the original
relation to be reconstructed from the decomposed relations without
any data loss, ensuring no information is discarded.
 Normalization: The decomposition often aims to normalize the schema
to higher normal forms (like 3NF or BCNF), which further eliminates
redundancy and dependency anomalies.
 Minimal Redundancy: By ensuring the decomposition
preserves functional dependencies, it minimizes data redundancy and
helps in avoiding data anomalies.
Problem: Let a relation R (A, B, C, D ) and functional dependency {AB
–> C, C –> D, D –> A}. Relation R is decomposed into R1( A, B, C) and
R2(C, D). Check whether decomposition is dependency preserving or
not.
Solution:
R1(A, B, C) and R2(C, D)
Let us find the dependencies F1 and F2 that hold in R1 and R2.
To find F1, consider the closure (under F) of every combination of A, B, and C,
i.e. A, B, C, AB, BC, and AC, and keep only the attributes that belong to R1.
Note: ABC is not considered, as its closure within R1 is always ABC.
closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, D, A}, but D can’t be kept as D is not present in R1.
= {C, A}
C –> A // Removing C from the right side as it is a trivial attribute
closure(AB) = {A, B, C, D}
= {A, B, C}
AB –> C // Removing AB from the right side as these are trivial attributes
closure(BC) = {B, C, D, A}
= {A, B, C}
BC –> A // Removing BC from the right side as these are trivial attributes
closure(AC) = {A, C, D}
= {A, C}
No new dependency // AC –> A and AC –> C are trivial
F1 = {C –> A, AB –> C, BC –> A}.
Similarly, F2 = {C –> D}.
The original relation has the dependencies {AB –> C, C –> D, D –> A}.
AB –> C is present in F1.
C –> D is present in F2.
D –> A is not preserved: the closure of D under F1 U F2 is just {D}.
(F1 U F2)+ is a proper subset of F+. So the given decomposition is not
dependency preserving.
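The same check can be automated without listing F1 and F2 explicitly. The sketch below uses the standard textbook test: start from the left-hand side of an FD and repeatedly add what each decomposed schema can contribute, using closures under the original F restricted to that schema. The names `closure` and `preserves` are assumptions for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, fds, decomposition):
    """Check whether fd = (lhs, rhs) is implied by the projections of fds."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # Attributes that result ∩ schema determines inside this schema.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [(set("AB"), set("C")), (set("C"), set("D")), (set("D"), set("A"))]
DECOMPOSITION = [set("ABC"), set("CD")]

for lhs, rhs in F:
    status = "preserved" if preserves((lhs, rhs), F, DECOMPOSITION) else "lost"
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), status)
# AB -> C preserved
# C -> D preserved
# D -> A lost
```

This avoids enumerating every subset of each schema, which is what makes the manual method tedious for larger relations.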

How Dependency Preserving Decomposition Enhances
Database Efficiency?
Dependency-preserving decomposition enhances database efficiency by:
 Eliminating Redundancy: It helps reduce unnecessary repetition of
data, leading to smaller storage requirements.
 Maintaining Integrity: By preserving functional dependencies, the
database ensures consistent data with fewer chances of anomalies like
update, insert, or delete anomalies.
 Improving Query Performance: With a well-decomposed schema, it’s
easier to optimize queries as the smaller tables are often faster to
process.
 Simplifying Updates: Since data is more normalized, updates become
simpler and more efficient, reducing the risk of inconsistencies.
Imp Note: A dependency-preserving (and lossless) decomposition into 1NF, 2NF,
or 3NF always exists. A BCNF decomposition is not always dependency preserving.
Step-by-Step Approach to Dependency Preserving
Decomposition in DBMS
 In this technique, the original relation is decomposed into smaller
relations in such a way that the resulting relations preserve the
functional dependencies of the original relation. This is important
because if the decomposition results in losing any of the original
functional dependencies, it can lead to data inconsistencies and
anomalies.
 To decompose a relation, there are various algorithms available, such
as the Boyce-Codd Normal Form (BCNF) decomposition and the Third
Normal Form (3NF) decomposition. These algorithms are based on the
concept of functional dependencies and are used to identify the
attributes that should be grouped together to form smaller relations.
The 3NF algorithm can always preserve dependencies; the BCNF
algorithm sometimes cannot.
 The BCNF decomposition algorithm is used to decompose a relation
into smaller relations in such a way that each resulting relation is in
BCNF. BCNF is a stricter normal form than 3NF: it requires the
determinant of every non-trivial functional dependency to be a super
key.
 The 3NF decomposition algorithm is used to decompose a relation into
smaller relations in such a way that each resulting relation is in 3NF.
3NF is a normal form that ensures that no non-prime attribute depends
transitively on a key.
 Overall, dependency preserving decomposition is an important
technique in DBMS for improving database efficiency while
maintaining data consistency and integrity. It is important to choose
the right decomposition algorithm based on the specific requirements
of the database to achieve the desired results.

Lossless Join and Dependency Preserving Decomposition

Decomposition of a relation is done when a relation in a relational model is not
in an appropriate normal form. Relation R is decomposed into two or more
relations, and the decomposition should be lossless join as well as dependency
preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using the FD set, the following
conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of
R. Each attribute of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into
R1(ABC) and R2(AD) which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) =
(ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. The third condition holds as Att(R1) ∩ Att(R2) = A is a key of R1(ABC)
because A->BC is given.
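The three conditions can be sketched as one Python function for a decomposition into two relations. The names `closure` and `is_lossless` and the representation of each FD as a pair of sets are assumptions for this illustration; the third condition is checked by testing whether the closure of the common attributes covers R1 or R2.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Test the three lossless-join conditions for R split into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    if r1 | r2 != r:  # condition 1: no attribute of R is dropped
        return False
    if not common:  # condition 2: there must be a common attribute
        return False
    # Condition 3: the common attributes determine all of R1 or all of R2.
    common_closure = closure(common, fds)
    return r1 <= common_closure or r2 <= common_closure


# R(A, B, C, D) with A -> BC, decomposed into R1(ABC) and R2(AD).
print(is_lossless("ABCD", "ABC", "AD", [({"A"}, {"B", "C"})]))  # True

# A decomposition whose parts share no attribute fails condition 2.
print(is_lossless("ABCD", "AB", "CD", [({"A"}, {"B"}), ({"C"}, {"D"})]))  # False
```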
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, All dependencies of R
either must be a part of R1 or R2 or must be derivable from a combination
of functional dependency of R1 and R2. For Example, A relation R (A, B, C, D)
with FD set{A->BC} is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of R1(ABC).
Advantages of Lossless Join and Dependency Preserving
Decomposition
 Improved Data Integrity: Lossless join and dependency preserving
decomposition help to maintain the data integrity of the original
relation by ensuring that all dependencies are preserved.
 Reduced Data Redundancy: These techniques help to reduce data
redundancy by breaking down a relation into smaller, more
manageable relations.
 Improved Query Performance: By breaking down a relation into
smaller, more focused relations, query performance can be improved.
 Easier Maintenance and Updates: The smaller, more focused relations
are easier to maintain and update than the original relation, making it
easier to modify the database schema and update the data.
 Better Flexibility: Lossless join and dependency preserving
decomposition can improve the flexibility of the database system by
allowing for easier modification of the schema.
Disadvantages of Lossless Join and Dependency
Preserving Decomposition
 Increased Complexity: Lossless join and dependency-preserving
decomposition can increase the complexity of the database system,
making it harder to understand and manage.
 Costly: Decomposing relations can be costly, especially if the database
is large and complex. This can require additional resources, such as
hardware and personnel.
 Reduced Performance: Although query performance can be improved
in some cases, in others, lossless join and dependency-preserving
decomposition can result in reduced query performance due to the
need for additional join operations.
 Limited Scalability: These techniques may not scale well in larger
databases, as the number of smaller, focused relations can become
unwieldy.
GATE Question
Consider a schema R(A, B, C, D) and functional dependencies A->B and C->D.
Then the decomposition of R into R1(AB) and R2(CD) is [GATE-CS-2001]
(A) dependency preserving and lossless join
(B) lossless join but not dependency preserving
(C) dependency preserving but not lossless join
(D) not dependency preserving and not lossless join
Answer:
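For a lossless join decomposition of R into R1 and R2, these three conditions must hold:
1. Att(R1) U Att(R2) = Att(R). Here Att(R1) U Att(R2) = ABCD = Att(R), so this holds.
2. Att(R1) ∩ Att(R2) ≠ Φ. Here Att(R1) ∩ Att(R2) = Φ, which violates the
condition of lossless join decomposition.
3. The common attributes must form a key of at least one of R1 and R2. With no
common attributes, this cannot hold either.
Hence the decomposition is not lossless.
For dependency preserving decomposition, A->B can be ensured in R1(AB) and
C->D can be ensured in R2(CD). Hence it is a dependency preserving
decomposition. So, the correct option is C.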
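The two checks used in this answer can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `closure`, `is_lossless_binary` and `preserves_dependencies` are hypothetical, attributes are assumed to be single letters, and each FD is written as a (left, right) pair of strings.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    if not common:
        return False
    common_closure = closure(common, fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


def preserves_dependencies(parts, fds):
    """Check every FD using only closures restricted to each part."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                p = set(part)
                gained = closure(result & p, fds) & p
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


fds = [("A", "B"), ("C", "D")]
lossless = is_lossless_binary("AB", "CD", fds)
preserving = preserves_dependencies(["AB", "CD"], fds)
```

For the GATE schema, `lossless` is False and `preserving` is True, which matches option (C).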


How to find the highest normal form of a relation


Normalization is the process of structuring data in a database by creating tables
and defining relationships between them. This ensures data consistency and
protection, and improves the database’s efficiency and flexibility. Typically,
every table in a relational database is assumed to be in the first normal form
(1NF), which requires that all attributes contain atomic (indivisible) values,
meaning no attribute may hold multiple values in a single row.
For a table to achieve the second normal form (2NF), it must eliminate any
partial dependencies. To satisfy the third normal form (3NF), the table must
also be free of transitive dependencies. Lastly, for a table to be in Boyce-Codd
Normal Form (BCNF), every determinant in the functional dependencies must
be a super-key.
To understand this topic, you should have a basic idea about Functional
Dependency , Candidate keys and Normal forms .
Steps to find the highest normal form of relation:
1. Find all possible candidate keys of the relation.
2. Divide all attributes into two categories: prime attributes and non-
prime attributes.
3. Check for 1st normal form then 2nd and so on. If it fails to satisfy the
nth normal form condition, the highest normal form will be n-1.
Check for 1NF (First Normal Form) :
 Verify that all columns contain only single values (no lists or arrays).
o Example of non-atomic: {Math, Science}.
Check for 2NF (Second Normal Form) :
 Identify the candidate keys (composite or single).
 Check for partial dependencies (where non-prime attributes depend on
part of a composite key instead of the whole key).
 If partial dependencies exist, the relation is not in 2NF.
Check for 3NF (Third Normal Form) :
 Identify all functional dependencies.
 Check for transitive dependencies (non-prime attributes depend on
other non-prime attributes). In practice, check that for every non-trivial
FD either the LHS is a super key or the RHS is a prime attribute.
 If transitive dependencies exist, the relation is not in 3NF.
Check for BCNF (Boyce-Codd Normal Form) :
 Identify all functional dependencies.
 Check if the left-hand side (determinant) of every functional
dependency is a super-key.
 If not, the relation is not in BCNF.
Example 1. Find the highest normal form of a relation R(A,B,C,D,E) with FD
set {A->D, B->A, BC->D, AC->BE}
Step 1. As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can
determine all attributes of relation, So AC will be the candidate key. A can be
derived from B, so we can replace A in AC with B. So BC will also be a
candidate key. So there will be two candidate keys {AC, BC}.
Step 2. Prime attributes are those that are part of some candidate key, {A,
B, C} in this example; the others are non-prime, {D, E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not allow
multi-valued or composite attributes.
The relation is not in the 2nd Normal form because A->D is partial dependency
(A which is a subset of candidate key AC is determining non-prime attribute D)
and the 2nd normal form does not allow partial dependency.
So the highest normal form will be the 1st Normal Form.
Example 2. Find the highest normal form of a relation R(A,B,C,D,E) with FD
set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subsets can
determine all attributes of relation, So AC will be the candidate key. A or C
can’t be derived from any other attribute of the relation, so there will be only 1
candidate key {AC}.
Step 2. Prime attributes are those that are part of some candidate key,
{A,C} in this example; the others are non-prime, {B,D,E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is
not a proper subset of candidate key AC) and AC->BE is in 2nd normal form (AC
is candidate key) and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super
key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a
prime attribute) the condition fails: to satisfy 3rd normal form, either the LHS
of an FD should be a super key or the RHS should be a prime attribute.
So the highest normal form of the relation will be the 2nd Normal Form.
Example 3. Find the highest normal form of a relation R(A,B,C,D,E) with FD
set {B->A, A->C, BC->D, AC->BE}
Step 1. As we can see, (B)+ ={B,A,C,D,E}, so B will be a candidate key. B can be
derived from AC using AC->B (decomposing AC->BE into AC->B and AC->E), so
AC will be a super key. However, (C)+ ={C} and (A)+ ={A,C,B,E,D}, so A (a subset
of AC) will be a candidate key. So there will be two candidate keys, A and B.
Step 2. Prime attributes are those that are part of some candidate key,
{A,B} in this example; the others are non-prime, {C,D,E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attributes.
The relation is in 2nd normal form because B->A is in 2nd normal form (B is a
super key) and A->C is in 2nd normal form (A is super key) and BC->D is in
2nd normal form (BC is a super key) and AC->BE is in 2nd normal form (AC is a
super key).
The relation is in 3rd normal form because the LHS of every FD is a super key.
The relation is in BCNF for the same reason. So the highest normal
form is BCNF.
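The three steps above (find candidate keys, split attributes into prime and non-prime, test each normal form in turn) can be sketched in Python. This is only an illustrative sketch under stated assumptions: attributes are single letters, FDs are (left, right) string pairs, the relation is assumed to be in 1NF already, and the function names are hypothetical. Candidate keys are found by brute force, which is fine for textbook-sized schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Smallest attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def highest_normal_form(schema, fds):
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    non_prime = set(schema) - prime
    # 2NF: no proper subset of a candidate key may determine a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"
    # 3NF and BCNF: inspect every non-trivial FD.
    bcnf = True
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra or closure(lhs, fds) == set(schema):
            continue  # trivial FD, or the LHS is a super key
        bcnf = False
        if not extra <= prime:
            return "2NF"
    return "BCNF" if bcnf else "3NF"


example1 = highest_normal_form("ABCDE", [("A", "D"), ("B", "A"), ("BC", "D"), ("AC", "BE")])
example2 = highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")])
example3 = highest_normal_form("ABCDE", [("B", "A"), ("A", "C"), ("BC", "D"), ("AC", "BE")])
```

Run on the three examples, the sketch gives 1NF, 2NF and BCNF respectively, in agreement with the worked answers.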

Minimum relations satisfying 1NF

Minimum Relations Satisfying First Normal Form (1NF)
A relation that does not contain any composite or multivalued attribute is in
First Normal Form; that is, every attribute of the relation is single-valued.
In this article, we discuss the minimum number of relations needed to satisfy
the First Normal Form. Before proceeding to First Normal Form, let’s discuss
how to design a database.
How to Design a Database?
 Talk to the stakeholder for which we are designing the database. Get
all the requirements, what attributes need to be stored, and establish
functional dependencies over the given set of attributes.
 Draw an Entity-Relationship Diagram on the basis of requirements
analysis.
 Convert the ER diagram into the relational model and finally create
these relations into our database with appropriate constraints.
Designing ER Diagrams is easier than finding minimum relations that satisfy the
First Normal Form. We establish certain simple rules which are formed after a
deep analysis of each case and hence, could be used directly by understanding
the logic behind them.
 If there is total participation on both sides; Merge the two entities
involved and the relationship into 1 table.
 Else if, one side is total participation and one side is partial
o M: N – Merge the relationship on the total participation
side.
o 1: N – Merge the relationship on the total participation
side.
o 1: 1 – Merge the two entities involved and the
relationship into 1 table.
 Else if, both sides are partial participation
o M: N – Separate table for each entity as well as
relationship. Hence, 3 tables.
o 1: N – Merge the relationship on the N-side using
foreign key referencing 1-side.
o 1: 1 – Merge the relationship and one entity into 1 table
using the foreign key and 1 table for the other entity.
Now, you would definitely have a question in your mind, how do we form such
rules? This is very easy and logical. Let’s understand the logic behind it for one
case and you can similarly establish the results for other cases too. We have
been given a scenario of a 1:N relationship with two entities E1(ABC) and
E2(DEF), where A and D are primary keys, respectively. E1 has partial
participation while E2 has total participation in the relationship R. Based on the
above scenario, we create certain tuples in E1:
A B C

a1 b1 c1

a2 b2 c2

a3 b3 c3
Similarly, create certain tuples for E2:
D E F

d1 e1 f1

d2 e2 f2

d3 e3 f3

Now, create a relationship R satisfying the above conditions, i.e. E1 is partial
participation and E2 is total participation, and E1 to E2 is a 1:N relationship.
A D

a1 d1

a1 d2

a2 d3

Ways of Merging Two Entities into a Single Table


 Way-1: Merge the two entities and relationships into a single table.
This is not correct as (AD) will become the primary key for this table,
but the primary key can never have a NULL value.
A B C D E F

a1 b1 c1 d1 e1 f1

a1 b1 c1 d2 e2 f2

a2 b2 c2 d3 e3 f3

a3 b3 c3 NULL NULL NULL

 Way-2: Merge relationship on 1-side. This is not correct as (AD) will
become the primary key for this table, but the primary key can never
have a NULL value.
A B C D

a1 b1 c1 d1

a1 b1 c1 d2

a2 b2 c2 d3

a3 b3 c3 NULL

 Way-3: Merge relationship on N-side. This is correct.


D E F A

d1 e1 f1 a1

d2 e2 f2 a1

d3 e3 f3 a2

On the same grounds, why do we allow merging the two entities as well as the
relationship into 1 table when it is a 1:1 relationship? Simply, we would
not have a composite primary key there, so we will definitely have a primary key
with no NULL values present in it. Going one step further, why do we allow merging
the entities and the relationship when both sides have total participation? The
reason is that even if we have a composite primary key for such a merged table,
we are sure that it will never have any NULL values in the primary key.
Note – You can follow the same procedure as stated above to establish all the
results.
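The argument for the 1:N case can be replayed on the sample tuples in a few lines of Python. This is a rough sketch: the dictionaries and function names below are illustrative choices, not part of the original text, and NULL is represented by `None`.

```python
# Sample tuples from the text: E1(A, B, C), E2(D, E, F) and relationship R(A, D).
E1 = {"a1": ("b1", "c1"), "a2": ("b2", "c2"), "a3": ("b3", "c3")}
E2 = {"d1": ("e1", "f1"), "d2": ("e2", "f2"), "d3": ("e3", "f3")}
R = [("a1", "d1"), ("a1", "d2"), ("a2", "d3")]


def merge_on_one_side():
    """Way-2: attach D to the rows of E1 (the 1-side)."""
    rows = []
    for a, (b, c) in E1.items():
        partners = [d for (ra, d) in R if ra == a] or [None]
        rows.extend((a, b, c, d) for d in partners)
    return rows


def merge_on_n_side():
    """Way-3: attach A as a foreign key to the rows of E2 (the N-side)."""
    owner = {d: a for (a, d) in R}
    return [(d, e, f, owner.get(d)) for d, (e, f) in E2.items()]


def key_has_null(rows, key_positions):
    """True if any key column holds None (NULL) in any row."""
    return any(row[i] is None for row in rows for i in key_positions)


way2 = merge_on_one_side()
way3 = merge_on_n_side()
way2_broken = key_has_null(way2, (0, 3))  # key (A, D) contains a NULL for a3
way3_broken = key_has_null(way3, (0,))    # key D is never NULL
```

Way-2 ends up with a NULL inside its composite key (A, D), while Way-3 keeps D as a clean single-attribute key, which is why only the N-side merge is accepted.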


Equivalence of Functional Dependencies


Pre-Requisite: Functional Dependency, Finding Attribute Closure, and Candidate
Keys using Functional Dependency
To understand the equivalence of Functional Dependency sets (FD sets), a basic
idea of Attribute Closure is needed. Given a relation with different FD sets
for that relation, we have to find out whether one FD set is a
subset of another or both are equal.
How To Find the Relationship Between Two Functional
Dependency Sets?
Let FD1 and FD2 be two FD sets for a relation R.
1. If all FDs of FD1 can be derived from FDs present in FD2, we can say
that FD2 ⊃ FD1.
2. If all FDs of FD2 can be derived from FDs present in FD1, we can say
that FD1 ⊃ FD2.
3. If 1 and 2 both are true, FD1=FD2.
All these three cases can be shown using the Venn diagram:

Equivalence of Functional Dependency

Why We Need to Compare Functional Dependencies?


Suppose in the designing process we convert the ER diagram to a relational
model and this task is given to two different engineers. Now those two engineers
give two different sets of functional dependencies. So, being an administrator we
need to ensure that we must have a good set of Functional Dependencies. To
ensure this we require to study the equivalence of Functional Dependencies.
Advantages
 It can help to identify redundant functional dependencies, which can
be eliminated to reduce data redundancy and improve database
performance.
 It can help to optimize database design by identifying equivalent sets
of functional dependencies that can be used interchangeably.
 It can ensure data consistency by identifying all possible combinations
of attributes that can exist in the database.
Disadvantages
 The process of determining the equivalence of functional
dependencies can be computationally expensive, especially for large
datasets.
 The process may require testing multiple candidate sets of functional
dependencies, which can be time-consuming and complex.
 The equivalence of functional dependencies may not always accurately
reflect the semantic meaning of data, and may not always reflect the
true relationships between data elements.
Sample Questions
Q.1 Let us take an example to show the relationship between two FD sets. A
relation R(A,B,C,D) having two FD sets FD1 = {A->B, B->C, AB->D} and FD2
= {A->B, B->C, A->C, A->D}
Step 1: Checking whether all FDs of FD1 are present in FD2
 A->B in set FD1 is present in set FD2.
 B->C in set FD1 is also present in set FD2.
 AB->D is present in set FD1 but not directly in FD2 but we will check
whether we can derive it or not. For set FD2, (AB)+ = {A, B, C, D}. It
means that AB can functionally determine A, B, C, and D. So AB->D
will also hold in set FD2.
As all FDs in set FD1 also hold in set FD2, FD2 ⊃ FD1 is true.
Step 2: Checking whether all FDs of FD2 are present in FD1
 A->B in set FD2 is present in set FD1.
 B->C in set FD2 is also present in set FD1.
 A->C is present in FD2 but not directly in FD1, so we check
whether we can derive it. For set FD1, (A)+ = {A, B, C, D}. It
means that A can functionally determine A, B, C, and D. So A->C will
also hold in set FD1.
 A->D is present in FD2 but not directly in FD1, so we check
whether we can derive it. For set FD1, (A)+ = {A, B, C, D}. It
means that A can functionally determine A, B, C, and D. So A->D will
also hold in set FD1.
As all FDs in set FD2 also hold in set FD1, FD1 ⊃ FD2 is true.
Step 3: As FD2 ⊃ FD1 and FD1 ⊃ FD2 are both true, FD2 = FD1 is true. These
two FD sets are semantically equivalent.
Q.2 Let us take another example to show the relationship between two FD
sets. A relation R2(A,B,C,D) having two FD sets FD1 = {A->B, B->C,A->C} and
FD2 = {A->B, B->C, A->D}
Step 1: Checking whether all FDs of FD1 are present in FD2
 A->B in set FD1 is present in set FD2.
 B->C in set FD1 is also present in set FD2.
 A->C is present in FD1 but not directly in FD2, so we check
whether we can derive it. For set FD2, (A)+ = {A, B, C, D}. It
means that A can functionally determine A, B, C, and D. So A->C will
also hold in set FD2.
As all FDs in set FD1 also hold in set FD2, FD2 ⊃ FD1 is true.
Step 2: Checking whether all FDs of FD2 are present in FD1
 A->B in set FD2 is present in set FD1.
 B->C in set FD2 is also present in set FD1.
 A->D is present in FD2 but not directly in FD1, so we check
whether we can derive it. For set FD1, (A)+ = {A,B,C}. It means
that A can’t functionally determine D.
 So A->D will not hold in FD1.
As not all FDs in set FD2 hold in set FD1, FD2 ⊄ FD1.
Step 3: In this case, FD2 ⊃ FD1 but FD2 ⊄ FD1, so these two FD sets are not
semantically equivalent.
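The closure-based comparison used in Q.1 and Q.2 can be sketched in Python. This is a minimal sketch, assuming single-letter attributes and FDs written as (left, right) string pairs; `covers` and `equivalent` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be derived from f (f covers g)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)


# Q.1
q1_fd1 = [("A", "B"), ("B", "C"), ("AB", "D")]
q1_fd2 = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
# Q.2
q2_fd1 = [("A", "B"), ("B", "C"), ("A", "C")]
q2_fd2 = [("A", "B"), ("B", "C"), ("A", "D")]

q1_equal = equivalent(q1_fd1, q1_fd2)
q2_fd2_covers_fd1 = covers(q2_fd2, q2_fd1)
q2_fd1_covers_fd2 = covers(q2_fd1, q2_fd2)
```

The sketch reports the Q.1 sets as equivalent, and for Q.2 it finds that FD2 covers FD1 while FD1 does not cover FD2.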

Canonical Cover

Canonical Cover of Functional Dependencies in DBMS
Managing a large set of functional dependencies can result in unnecessary
computational overhead. This is where the canonical cover becomes useful. The
canonical cover of a set of functional dependencies F is a simplified version of F
that retains the same closure as the original set, ensuring no redundancy.
An attribute in a functional dependency is considered extraneous if it can be
removed without altering the closure of the set of functional dependencies.

Canonical Cover
A canonical cover is a set of functional dependencies that is equivalent to a given
set of functional dependencies but is minimal in terms of the number of
dependencies. Canonical Cover of functional dependency is also called minimal
set of functional dependency or irreducible form of functional dependency. The
process of finding the canonical cover of a set of functional dependencies
involves the following steps:
Step 1: Combine Functional Dependencies with the Same Left-Hand
Side
 If two or more functional dependencies in F have the same left-hand
side, combine them into a single functional dependency by taking the
union of their right-hand sides.
 Example:
o A→B and A→C become A→BC.

Step 2: Eliminate Extraneous Attributes


An attribute is extraneous if removing it does not change the closure of the
functional dependency set. There are two scenarios:
Extraneous Attributes on the Left-Hand Side:
For X→Y, check if any attribute in X can be removed without affecting the
closure.
To check:
 Remove an attribute A from X to form X′.
 Compute the closure of F with X′→Y instead of X→Y.
 If the closure remains unchanged, A is extraneous.
Extraneous Attributes on the Right-Hand Side:
For X→Y, check if any attribute in Y can be removed without affecting the
closure.
To check:
 Remove an attribute B from Y.
 Compute the closure of F with X→Y′, where Y′ is Y without B.
 If the closure remains unchanged, B is extraneous.
Step 3: Decompose Functional Dependencies
If the right-hand side of a functional dependency has multiple attributes (e.g.,
X→AB), decompose it into multiple functional dependencies, each with a single
attribute on the right-hand side.
Example:
X→AB becomes X→A and X→B.
Step 4: Check for Redundant Dependencies
A functional dependency FD in F is redundant if it can be removed without
changing the closure of F.
To check:
 Temporarily remove FD from F.
 Compute the closure of the remaining set.
 If the closure is the same as the closure of the original set, FD is
redundant and can be removed.
Step 5: Verify the Final Canonical Cover
Ensure that each functional dependency is in its simplest form:
 The left-hand side has no extraneous attributes.
 The right-hand side contains only one attribute.
Check that the closure of the canonical cover is the same as the closure of the
original set F.
Illustrative Examples
Example 1:
Consider a set of functional dependencies: F={A→BC,B→C,AB→C}. Here are the
steps to find the canonical cover:
Step 1: Combine Functional Dependencies with the Same Left-Hand
Side
No two functional dependencies in F have the same left-hand side, so no
changes are needed at this stage.
Step 2: Eliminate Extraneous Attributes
Check A→BC :
 The left-hand side A has no extraneous attributes because it’s a single
attribute.
 Check the right-hand side for extraneous attributes:
o Split A→BC into A→B and A→C.
o Now, F={A→B,A→C,B→C,AB→C}.
Check B→C :
 The left-hand side B has no extraneous attributes (it’s a single
attribute).
 No changes are needed.
Check AB→C :
 First, check if A or B is extraneous.
 We can reach C without using AB→C with other functional
dependencies (B→C already gives C from B alone); therefore, we remove AB→C.
 Finally, we have {A→B, A→C, B→C}.
Step 3: Decompose Functional Dependencies
All functional dependencies in F={A→B,A→C,B→C} have single attributes on
the right-hand side, so no decomposition is needed.
Step 4: Check for Redundant Dependencies
Check A→C :
 Check each functional dependency to see if it can be reached without
using it. For example, A→C can be reached with A→B and B→C.
Therefore, A→C is redundant and can be removed.
 Now F={A→B,B→C}.
Step 5: Final Canonical Cover
The final canonical cover is:
Fc={A→B,B→C}.
This is the simplified set of functional dependencies that has the same closure
as the original set F.
Example 2:
Given F = { A → BC, B → C, A → B, AB → C }
 Step 1 Reduction: There are two functional dependencies with the
same attributes on the left: A → BC and A → B. Combining them gives
A → BC, so F becomes { A → BC, B → C, AB → C }.
 Step 2 Elimination: In AB → C, A is extraneous because B → C already
holds, so AB → C reduces to B → C, which is already present. In
A → BC, C is extraneous because A → C can be derived from A → B and
B → C. Thus, we reduce it to A → B.
 Step 3 Minimization: No redundant dependencies remain.
Hence, the canonical cover is Fc = { A → B, B → C }
Example 3:
Given F = { A → BC, CD → E, B → D, E → A }
 Step 1 Reduction: Each left-hand side of the functional dependencies
is unique and cannot be combined further.
 Step 2 Elimination: None of the attributes on the left or right sides of
any functional dependency are extraneous.
 Step 3 Minimization: No dependencies are redundant.
Hence, the canonical cover is Fc = { A → BC, CD → E, B → D, E → A }.
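The procedure can be sketched in Python and run against Examples 1 and 3. This is one possible ordering of the steps, not a definitive implementation: it splits right-hand sides first, removes extraneous left-hand attributes, drops redundant dependencies, and only then recombines dependencies that share a left-hand side (which is why Example 3 keeps A → BC). Attributes are assumed to be single letters, and `canonical_cover` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Split every FD so that each right-hand side is a single attribute.
    work = [(frozenset(lhs), attr) for lhs, rhs in fds for attr in rhs]
    work = list(dict.fromkeys(work))  # drop duplicates, keep order

    # Remove extraneous attributes from left-hand sides, one at a time.
    changed = True
    while changed:
        changed = False
        for i, (lhs, attr) in enumerate(work):
            for a in sorted(lhs):
                reduced = lhs - {a}
                if reduced and attr in closure(reduced, work):
                    work[i] = (reduced, attr)
                    changed = True
                    break
            if changed:
                break
        work = list(dict.fromkeys(work))

    # Drop every FD that still follows from the remaining ones.
    for fd in list(work):
        rest = [other for other in work if other != fd]
        if fd[1] in closure(fd[0], rest):
            work = rest

    # Recombine FDs that share a left-hand side.
    merged = {}
    for lhs, attr in work:
        merged.setdefault(lhs, set()).add(attr)
    return {("".join(sorted(lhs)), "".join(sorted(rhs))) for lhs, rhs in merged.items()}


cover1 = canonical_cover([("A", "BC"), ("B", "C"), ("AB", "C")])
cover3 = canonical_cover([("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")])
```

Here `cover1` comes out as {A → B, B → C} and `cover3` is the original set unchanged, matching the worked examples. A different processing order can produce a different but equivalent cover.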
How to Check Whether a Set of FDs F Canonically Covers
Another Set of FDs G?
To verify whether a set of functional dependencies (F) canonically covers
another set of functional dependencies (G), follow these steps:
Step 1: Compute the Closure of Each Set
Compute the closure of F:
 Use the attributes and dependencies in F to determine all the attribute
sets that can be functionally determined.
Compute the closure of G:
 Similarly, calculate the attribute closures using the dependencies in G.

Step 2: Compare the Closures


For F to canonically cover G, the following conditions must hold:
The closure of F must be equivalent to the closure of G. That is, for every
functional dependency in G, it must be derivable from F and vice versa.
Step 3: Derive Dependencies in G from F
For each functional dependency in G (e.g., X→Y):
Compute X+ (closure of X) under F.
Verify that Y⊆X+.
If this is true for all functional dependencies in G, F covers G.
Step 4: Derive Dependencies in F from G
To ensure F and G are equivalent:
For each dependency in F (e.g., X→Y):
 Compute X+ (closure of X) under G.
 Check that Y⊆X+.
If all dependencies in F can be derived from G, the two sets are equivalent.
Step 5: Verify Minimality (Optional)
If F is already minimal (e.g., no extraneous attributes or redundant
dependencies), and it satisfies the above steps, then F is a canonical cover of G.
Example:
Let F={A→B,B→C} and G={A→BC}.
1. Compute Closure of F:
 A+={A,B,C} (using A→B and B→C).
2. Compute Closure of G:
 A+={A,B,C} (using A→BC).
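3. Compare F with G:
 G can be derived from F: under F, A+={A,B,C}, so A→BC
holds.
 F cannot be fully derived from G: A→B is derivable from
A→BC, but B→C is not, because under G, B+={B}.
Since B→C does not follow from G, F and G are not equivalent. F covers G, but F
does not canonically cover G. If G were {A→BC, B→C}, both directions would hold
and F, being minimal, would canonically cover G.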
Features of the Canonical Cover
 Minimal: The canonical cover is the smallest set of dependencies that
can be derived from a given set of dependencies, i.e., it has the
minimum number of dependencies required to represent the same set
of constraints.
 Lossless: The canonical cover preserves all the functional
dependencies of the original set of dependencies, i.e., it does not lose
any information.
 Deterministic: The canonical cover is deterministic, i.e., it does not
contain any redundant or extraneous dependencies.
 Reduces Data Redundancy: The canonical cover helps to reduce data
redundancy by eliminating unnecessary dependencies that can be
inferred from other dependencies.
 Improves Query Performance: The canonical cover helps to improve
query performance by reducing the number of joins and redundant
data in the database.
 Facilitates Database Maintenance: The canonical cover makes it
easier to modify, update, and delete data in the database by reducing
the number of dependencies that need to be considered.
Conclusion
Using a canonical cover for a set of functional dependencies is essential for
optimizing database management systems. It simplifies and minimizes the
dependencies while preserving their properties, reducing computational
overhead, and improving efficiency. By reducing, eliminating, and minimizing
dependencies, the canonical cover creates a minimal and accurate
representation of the original set, although a set of dependencies may have more
than one canonical cover. It helps to reduce data redundancy, improves
query performance, and makes database maintenance easier.

Multivalued Dependency

Introduction of 4th and 5th Normal Form in DBMS


Two of the highest levels of database normalization are the fourth normal form
(4NF) and the fifth normal form (5NF). Multivalued dependencies are handled by
4NF, whereas join dependencies are handled by 5NF.
A multivalued dependency arises when two or more independent facts are kept in
a single relation: the presence of one or more rows in a table
implies the presence of one or more other rows in that same table. Put another
way, two attributes (or columns) in a table are independent of one another, but
both depend on a third attribute. A multivalued dependency always requires at
least three attributes because it consists of at least two attributes that are
dependent on a third.
For a dependency A -> B, if for a single value of A, multiple values of B exist,
then the table may have a multivalued dependency. For a multivalued dependency
A ->> B, the table should have at least 3 attributes (A, B, C), and B and C
should be independent of each other.
Example:
Person Mobile Food_Likes

Mahesh 9893/9424 Burger/Pizza

Ramesh 9191 Pizza

Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and “person multi determines
food_likes.”
Note that a functional dependency is a special case of multivalued dependency.
In a functional dependency X -> Y, every x determines exactly one y, never more
than one.
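A multivalued dependency X ->> Y can be tested on a table instance: within each group of rows sharing the same X value, the Y values and the values of the remaining attributes must appear in every combination. The Python sketch below assumes the Mahesh row has been flattened into one row per (Mobile, Food_Likes) combination, which is an assumption about how the example table would look in 1NF; `satisfies_mvd` is a hypothetical helper name.

```python
from itertools import product


def satisfies_mvd(rows, x, y):
    """Check X ->> Y on rows (a list of dicts with identical keys)."""
    rest = sorted(set(rows[0]) - set(x) - set(y))
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_vals, z_vals, pairs = groups.setdefault(key, (set(), set(), set()))
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in rest)
        y_vals.add(y_val)
        z_vals.add(z_val)
        pairs.add((y_val, z_val))
    # Every Y value must be paired with every value of the remaining attributes.
    return all(pairs == set(product(y_vals, z_vals))
               for y_vals, z_vals, pairs in groups.values())


# Assumed 1NF version of the Person / Mobile / Food_Likes example.
people = [
    {"Person": "Mahesh", "Mobile": "9893", "Food_Likes": "Burger"},
    {"Person": "Mahesh", "Mobile": "9893", "Food_Likes": "Pizza"},
    {"Person": "Mahesh", "Mobile": "9424", "Food_Likes": "Burger"},
    {"Person": "Mahesh", "Mobile": "9424", "Food_Likes": "Pizza"},
    {"Person": "Ramesh", "Mobile": "9191", "Food_Likes": "Pizza"},
]

holds = satisfies_mvd(people, ["Person"], ["Mobile"])
holds_after_delete = satisfies_mvd(people[1:], ["Person"], ["Mobile"])
```

With all four Mahesh rows present the dependency holds; deleting any one of them breaks it, which is exactly the redundancy that 4NF removes by splitting the table.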
Fourth Normal Form (4NF)
The Fourth Normal Form (4NF) is a level of database normalization in which the
determinant of every non-trivial multivalued dependency is a super key. It builds
on the first three normal forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal
Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a
relation must not contain any non-trivial multivalued dependency whose
determinant is not a super key.
Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency whose
determinant is not a super key.
A table with a multivalued dependency violates the normalization standard of
the Fourth Normal Form (4NF) because it creates unnecessary redundancies and
can contribute to inconsistent data. To bring this up to 4NF, it is necessary to
break this information into two tables.
Example: Consider the database table of a class that has two relations R1
contains student ID(SID) and student name (SNAME) and R2 contains course
id(CID) and course name (CNAME).
Table R1
SID SNAME

S1 A

S2 B

Table R2
CID CNAME

C1 C

C2 D

When their cross-product is taken, it results in multivalued dependencies.


Table R1 X R2
SID SNAME CID CNAME

S1 A C1 C

S1 A C2 D

S2 B C1 C

S2 B C2 D

Multivalued dependencies (MVD) are:
SID->->{CID, CNAME}; SNAME->->{CID, CNAME}
(CID and CNAME are taken together because each CID fixes its CNAME.)
Join Dependency
Join dependency is a further generalization of multivalued dependencies. If the
join of R1 and R2 over C is equal to relation R, then we can say that a
join dependency (JD) exists, where R1(A, B, C) and R2(C, D) are the decomposition
of a given relation R(A, B, C, D). Alternatively, R1 and R2 are a
lossless decomposition of R. A JD ⋈ {R1, R2, …, Rn} is said to hold over a relation
R if R1, R2, …, Rn is a lossless-join decomposition of R. Here, *((A, B, C), (C, D))
will be a JD of R if the join of the two projections over their common attribute C
is equal to the relation R. In general, *(R1, R2, R3, …) is used to indicate that
the relations R1, R2, R3 and so on are a JD of R. Let R be a relation schema and
R1, R2, R3, …, Rn be a decomposition of R. An instance r(R) is said to
satisfy the join dependency if and only if
π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rn(r) = r

Example:
Table R1
Company Product

C1 Pendrive

C1 mic

C2 speaker

C2 speaker

Company->->Product
Table R2
Agent Company

Aman C1

Aman C2

Mohan C1

Agent->->Company
Table R3
Agent Product

Aman Pendrive

Aman Mic

Aman speaker

Mohan speaker

Agent->->Product
Table R1⋈R2⋈R3
Company Product Agent

C1 Pendrive Aman

C1 mic Aman

C2 speaker Aman

C1 speaker Aman

Agent->->Product
Fifth Normal Form/Projected Normal Form (5NF)
A relation R is in Fifth Normal Form if and only if every join dependency in R
is implied by the candidate keys of R. A relation decomposed into two or more
relations must have the lossless join property, which ensures that no spurious or
extra tuples are generated when relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It cannot be decomposed further without loss (no join dependency beyond those implied by the candidate keys).
Example – Consider the above schema, with a case as “if a company makes a
product and an agent is an agent for that company, then he always sells that
product for the company”. Under these circumstances, the ACP table is shown
as:
Table ACP
Agent Company Product

A1 PQR Nut

A1 PQR Bolt

A1 XYZ Nut

A1 XYZ Bolt

A2 PQR Nut

The relation ACP is again decomposed into 3 relations. Now, the natural Join of
all three relations will be shown as:
Table R1
Agent Company

A1 PQR

A1 XYZ

A2 PQR

Table R2
Agent Product

A1 Nut

A1 Bolt

A2 Nut

Table R3
Company Product

PQR Nut

PQR Bolt

XYZ Nut

XYZ Bolt

The result of the Natural Join of R1 and R3 over ‘Company’ and then the Natural
Join of R13 and R2 over ‘Agent’and ‘Product’ will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Therefore, the relations
R1, R2 and R3 are in 5NF, as none of them can be decomposed further without loss.
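The lossless-join claim for the ACP example can be checked with a small natural-join routine. This is a sketch only: rows are represented as Python dicts, and the helper names `project`, `natural_join` and `as_set` are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in seen:
            seen.append(picked)
    return seen


def natural_join(left, right):
    """Join rows that agree on every attribute the two sides share."""
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Order-independent form for comparing two tables."""
    return {tuple(sorted(row.items())) for row in rows}


ACP = [
    {"Agent": "A1", "Company": "PQR", "Product": "Nut"},
    {"Agent": "A1", "Company": "PQR", "Product": "Bolt"},
    {"Agent": "A1", "Company": "XYZ", "Product": "Nut"},
    {"Agent": "A1", "Company": "XYZ", "Product": "Bolt"},
    {"Agent": "A2", "Company": "PQR", "Product": "Nut"},
]

R1 = project(ACP, ["Agent", "Company"])
R2 = project(ACP, ["Agent", "Product"])
R3 = project(ACP, ["Company", "Product"])

R13 = natural_join(R1, R3)          # joined over Company
rebuilt = natural_join(R13, R2)     # joined over Agent and Product
is_lossless = as_set(rebuilt) == as_set(ACP)
```

Joining only R1 and R3 produces six rows, including the spurious tuple (A2, PQR, Bolt); the final join with R2 removes it, so all three projections are needed to get ACP back.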
Conclusion
 Multivalued dependencies are removed by 4NF, and join dependencies
are removed by 5NF.
 The highest levels of database normalization, 4NF and 5NF, might
not be required for every application.
 Normalizing to 4NF and 5NF might result in more
complicated database structures and slower queries, but it can
also increase data accuracy and dependability.
Extra

Anomalies in Relational Model


Anomalies in the relational model refer to inconsistencies or errors that can arise
when working with relational databases, specifically in the context of data
insertion, deletion, and modification. They can occur in both referencing and
referenced relations and can be categorized into three types:
 Insertion Anomalies
 Deletion Anomalies
 Update Anomalies.
How Are Anomalies Caused in DBMS?
Anomalies in DBMS are caused by storing everything in a single flat table,
lack of normalization, data redundancy, and improper use of
primary or foreign keys. These issues result in inconsistencies during insert,
update, or delete operations, leading to data integrity problems. The three
primary types of anomalies are:
 Insertion Anomalies: These anomalies occur when it is not possible to
insert data into a database because the required fields are missing or
because the data is incomplete. For example, if a database requires
that every record has a primary key, but no value is provided for a
particular record, it cannot be inserted into the database.
 Deletion anomalies: These anomalies occur when deleting a record
from a database and can result in the unintentional loss of data. For
example, if a database contains information about customers and
orders, deleting a customer record may also delete all the orders
associated with that customer.
 Update anomalies: These anomalies occur when modifying data in a
database and can result in inconsistencies or errors. For example, if a
database contains information about employees and their salaries,
updating an employee’s salary in one record but not in all related
records could lead to incorrect calculations and reporting.
These anomalies can be removed with the process of Normalization, which
generally splits the database which results in reducing the anomalies in the
database.
STUDENT Table
STUD_NO  STUD_NAME  STUD_PHONE  STUD_STATE  STUD_COUNTRY  STUD_AGE

1        RAM        9716271721  Haryana     India         20

2        RAM        9898291281  Punjab      India         19

3        SUJIT      7898291981  Rajasthan   India         18

4        SURESH                 Punjab      India         21

Table 1
STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME

1 C1 DBMS

2 C2 Computer Networks

1 C2 Computer Networks

Table 2
Insertion Anomaly: If a tuple is inserted in the referencing relation and the
referencing attribute value is not present in the referenced attribute, the
insertion in the referencing relation will not be allowed.
OR
An insertion anomaly occurs when adding a new row to a table leads to
inconsistencies.
Example: If we try to insert a record into the STUDENT_COURSE table
with STUD_NO = 7, it will not be allowed because there is no
corresponding STUD_NO = 7 in the STUDENT table.
Deletion and Updation Anomaly: If a tuple is deleted or updated in the
referenced relation and the referenced attribute value is used by the referencing
attribute in the referencing relation, the deletion or update in the
referenced relation will not be allowed.
Example: If we want to update a record from STUDENT_COURSE with
STUD_NO = 1, we have to update it in both rows of the table. If we try to delete
a record from the STUDENT table with STUD_NO = 1, it will not be allowed
because there are corresponding records in the STUDENT_COURSE table
referencing STUD_NO = 1. Deleting the record would violate the foreign
key constraint, which ensures data consistency between the two tables.
To avoid this, the following can be used in query:
 ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from
referenced relation and the referenced attribute value is used by
referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and set the value of referencing
attribute to NULL.
 ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from
referenced relation and the referenced attribute value is used by
referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and referencing relation as well.
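The behaviour described above can be tried out with a small script. This is a minimal sketch, assuming SQLite accessed through Python's built-in sqlite3 module (the article itself does not name a DBMS); the lowercase table and column names are illustrative, and the rows are taken from Table 1 and Table 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT)")
conn.execute(
    "CREATE TABLE student_course ("
    " stud_no INTEGER REFERENCES student(stud_no) ON DELETE CASCADE,"
    " course_no TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "RAM"), (2, "RAM")])
conn.executemany(
    "INSERT INTO student_course VALUES (?, ?)",
    [(1, "C1"), (2, "C2"), (1, "C2")],
)


def try_insert_unknown_student():
    """Try to reference STUD_NO = 7, which does not exist in student."""
    try:
        conn.execute("INSERT INTO student_course VALUES (7, 'C1')")
        return True
    except sqlite3.IntegrityError:
        return False


insert_allowed = try_insert_unknown_student()

# With ON DELETE CASCADE, deleting student 1 also removes its course rows.
conn.execute("DELETE FROM student WHERE stud_no = 1")
remaining = conn.execute(
    "SELECT stud_no, course_no FROM student_course"
).fetchall()

print(insert_allowed)  # False: the insert is rejected
print(remaining)       # [(2, 'C2')]
```

Replacing ON DELETE CASCADE with ON DELETE SET NULL in the table definition would keep the two course rows and set their stud_no to NULL instead.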
Removal of Anomalies
Anomalies in DBMS can be removed by applying normalization. Normalization
involves organizing data into tables and applying rules to ensure data is stored
in a consistent and efficient manner. By reducing data redundancy and ensuring
data integrity, normalization helps to eliminate anomalies and improve the
overall quality of the database.
According to E. F. Codd, the inventor of the relational model, the goals
of normalization include:
 Removing repeated data from the database.
 Removing undesirable deletion, insertion, and update anomalies.
 Establishing proper and useful relationships between tables.
Key steps include:
1. First Normal Form (1NF): Ensures each column contains atomic values
and removes repeating groups.
2. Second Normal Form (2NF): Eliminates partial dependencies by
ensuring all non-key attributes are fully dependent on the primary key.
3. Third Normal Form (3NF): Removes transitive dependencies by
ensuring non-key attributes depend only on the primary key.
By implementing these normalization steps, the database becomes more
structured, reducing the likelihood of insertion, update, and deletion anomalies.
Conclusion
Ensuring data integrity requires addressing anomalies such as insertion, update,
and deletion problems in the Relational Model. By effectively arranging data,
normalization techniques offer a solution that guarantees consistency and
dependability in relational databases.

Types of Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)
Keys are one of the basic requirements of a relational database model. They are
widely used to identify the tuples (rows) uniquely in a table. We also use keys
to set up relationships amongst the various columns and tables of a relational
database.
Why do we require Keys in a DBMS?
We require keys in a DBMS to ensure that data is organized, accurate, and easily
accessible. Keys help to uniquely identify records in a table, which prevents
duplication and ensures data integrity.
Keys also establish relationships between different tables, allowing for efficient
querying and management of data. Without keys, it would be difficult to retrieve
or update specific records, and the database could become inconsistent or
unreliable.
Different Types of Database Keys
Super Key
A set of one or more attributes (columns) that can uniquely identify a tuple
(record) is known as a super key. For example, STUD_NO, (STUD_NO,
STUD_NAME), etc.
 A super key is a set of one or more attributes that together uniquely
identify rows in a table. Unlike a primary key, it may include
attributes that allow NULL values.
 A super key can contain extra attributes that aren’t necessary for
uniqueness. For example, if the “STUD_NO” column can uniquely
identify a student, adding “SNAME” to it will still form a valid super
key, though it’s unnecessary.
Example:
Table STUDENT
STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

Consider the table shown above.
STUD_NO+PHONE is a super key, even though STUD_NO alone already identifies
each row.
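The uniqueness test behind a super key can be sketched in a few lines. The helper name is_superkey is hypothetical, and the rows are copied from the STUDENT table above. Note that a key is a property of the schema: a check on sample rows can only show that the data does not contradict a proposed key.

```python
# Rows copied from the STUDENT table above.
STUDENT = [
    {"STUD_NO": 1, "SNAME": "Shyam", "ADDRESS": "Delhi", "PHONE": "123456789"},
    {"STUD_NO": 2, "SNAME": "Rakesh", "ADDRESS": "Kolkata", "PHONE": "223365796"},
    {"STUD_NO": 3, "SNAME": "Suraj", "ADDRESS": "Delhi", "PHONE": "175468965"},
]


def is_superkey(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


print(is_superkey(STUDENT, ["STUD_NO", "PHONE"]))  # True
print(is_superkey(STUDENT, ["STUD_NO"]))           # True: PHONE was not needed
print(is_superkey(STUDENT, ["ADDRESS"]))           # False: "Delhi" repeats
```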

Relation between Primary Key, Candidate Key, and Super Key

Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as
a candidate key. For example, STUD_NO in the STUDENT relation.
 A candidate key is a minimal super key: it uniquely identifies a
record, and removing any attribute from it breaks uniqueness.
 A candidate key must contain unique values, ensuring that no two
rows have the same value in the candidate key’s columns.
 Every relation has at least one candidate key.
 A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
Table STUDENT
STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965
 The candidate key can be simple (having only one attribute) or
composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO

1 001 C001

2 056 C005
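Minimality is what separates a candidate key from a super key. The sketch below searches for minimal unique attribute sets in the rows of Table 2 (STUDENT_COURSE with COURSE_NAME), since that table has repeated values in every single column. The function names are hypothetical, and the result only reflects the sample rows, not the intended schema.

```python
from itertools import combinations

# Rows copied from Table 2 (STUDENT_COURSE).
STUDENT_COURSE = [
    {"STUD_NO": 1, "COURSE_NO": "C1", "COURSE_NAME": "DBMS"},
    {"STUD_NO": 2, "COURSE_NO": "C2", "COURSE_NAME": "Computer Networks"},
    {"STUD_NO": 1, "COURSE_NO": "C2", "COURSE_NAME": "Computer Networks"},
]


def is_unique(rows, attrs):
    """True if the projection of rows onto attrs has no duplicates."""
    return len({tuple(r[a] for a in attrs) for r in rows}) == len(rows)


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique in the sample rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


keys = candidate_keys(STUDENT_COURSE, ["STUD_NO", "COURSE_NO", "COURSE_NAME"])
print(keys)
# [('STUD_NO', 'COURSE_NO'), ('STUD_NO', 'COURSE_NAME')]
```

No single column is unique, so only composite keys are found. The pair with COURSE_NAME shows up because each course number has exactly one name in these rows; whether it is a real candidate key depends on the rules of the domain, not on the sample.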

Primary Key
A relation can have more than one candidate key, and one of them is chosen as
the primary key. For example, STUD_NO and STUD_PHONE are both candidate keys
for the relation STUDENT, but only one of them, say STUD_NO, is chosen as
the primary key.
 A primary key is a unique key, meaning it can uniquely identify each
record (tuple) in a table.
 It must have unique values and cannot contain any duplicate values.
 A primary key cannot be NULL, as it needs to provide a valid, unique
identifier for every record.
 A primary key does not have to consist of a single column. In some
cases, a composite primary key (made of multiple columns) can be
used to uniquely identify records in a table.
 Many database engines index the primary key, and some store rows
physically ordered by it, which makes lookups by primary key fast.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE) , STUD_NO is a
primary key
Table STUDENT
STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

Alternate Key
An alternate key is any candidate key in a table that is not chosen as
the primary key. In other words, all the candidate keys that are not selected
as the primary key are considered alternate keys.
 An alternate key is also referred to as a secondary key because it can
uniquely identify records in a table, just like the primary key.
 An alternate key can consist of one or more columns (fields) that can
uniquely identify a record, but it is not the primary key.
 E.g., PHONE is an alternate key in STUDENT when STUD_NO is the primary
key. SNAME and ADDRESS are not, because names and addresses can repeat.
Example:
Consider the table shown above.
STUD_NO and PHONE are both candidate keys for the relation STUDENT.
If STUD_NO is chosen as the primary key, PHONE becomes an alternate key.
Primary Key, Candidate Key, and Alternate Key

Foreign Key
A foreign key is an attribute in one table that refers to the primary key in
another table. The table that contains the foreign key is called the referencing
table, and the table that is referenced is called the referenced table.
 A foreign key in one table points to the primary key in another table,
establishing a relationship between them.
 It helps connect two or more tables, enabling you to create
relationships between them. This is essential for maintaining data
integrity and preventing data redundancy.
 They act as a cross-reference between the tables.
 For example, DNO is a primary key in the DEPT table and a non-key in
EMP
Example:
Refer Table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO

1 005 C001

2 056 C005

It may be worth noting that, unlike the primary key of a relation, a foreign
key can be NULL and may contain duplicate values, i.e., it need not satisfy the
uniqueness constraint. For example, STUD_NO in the STUDENT_COURSE relation of
Table 2 is not unique: the value 1 appears in both the first and third tuples.
However, STUD_NO in the STUDENT relation is a primary key, so it must always be
unique and cannot be NULL.
Relation between Primary Key and Foreign Key

Composite Key
Sometimes, a table might not have a single column/attribute that uniquely
identifies all the records of a table. To uniquely identify rows of a table, a
combination of two or more columns/attributes can be used. A poorly chosen
combination can still produce duplicate values, so we need to find a set of
attributes that is guaranteed to identify rows uniquely.
 A composite key can act as the primary key when no single column is suitable.
 Two or more attributes are used together to make a composite key.
 Different combinations of attributes may differ in how reliably they
identify rows uniquely.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.

Different Types of Keys

Conclusion
In conclusion, the relational model makes use of a number of keys: Candidate
keys allow for distinct identification, the Primary key serves as the chosen
identifier, Alternate keys offer other choices, and Foreign keys create vital
linkages that guarantee data integrity between tables. The creation of strong and
effective relational databases requires the thoughtful application of these keys.

What is Functional Dependency in DBMS?


Functional dependency in DBMS is an important concept that describes the
relationship between attributes (columns) in a table. It shows that the value of
one attribute determines the other. In this article, we will learn about functional
dependencies and their types. Functional dependencies help maintain the quality
of data in the database.
Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age.
Here the Stu_Id attribute uniquely determines the Stu_Name attribute of the
student table, because if we know the student id we can tell the student name
associated with it. This is known as functional dependency and can be written
as Stu_Id→Stu_Name, or in words, Stu_Name is functionally dependent
on Stu_Id.
Formally: If column A of a table uniquely determines column B of the same
table, i.e., any two rows that agree on A also agree on B, then this is
represented as A->B (attribute B is functionally dependent on attribute A).

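Example:
A  B
1  3
2  3
4  0
1  3
4  0
Here A->B holds, because rows with the same value of A always have the same
value of B. B->A does not hold, because B = 3 appears with both A = 1 and
A = 2.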
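The definition can be checked mechanically on the five rows of the example: a dependency holds in the data if no left-hand value is ever paired with two different right-hand values. The function name fd_holds is hypothetical; this is a sketch of the check on sample data, not a proof that the dependency holds for the schema.

```python
# The five (A, B) rows from the example above.
ROWS = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 3},
    {"A": 4, "B": 0},
    {"A": 1, "B": 3},
    {"A": 4, "B": 0},
]


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left value.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(fd_holds(ROWS, ["A"], ["B"]))  # True
print(fd_holds(ROWS, ["B"], ["A"]))  # False: B = 3 occurs with A = 1 and A = 2
```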
How to represent functional dependency in DBMS?
A functional dependency is written with an arrow. For example, if we
have an employee record with fields "EmployeeID", "FirstName" and "LastName",
we can specify the dependency as follows:
EmployeeID -> FirstName, LastName
A functional dependency has two parts: the left-hand side (LHS), called the
determinant, and the right-hand side (RHS), called the dependent, separated by
the arrow (->).
For example, suppose we have a table with attributes "X", "Y" and "Z", and the
attribute "X" determines the values of the attributes "Y" and "Z":
X -> Y, Z
This notation indicates that the value of attribute "X" determines the values
of attributes "Y" and "Z". So if you know the value of "X", you can also
determine the values of "Y" and "Z".
Types of Functional Dependency in DBMS
The following are some important types of FDs in DBMS:
Trivial Functional Dependency
The dependency of an attribute on a set of attributes is known as trivial
functional dependency if the set of attributes includes that attribute.
Non-trivial Functional Dependency
If a functional dependency X→Y holds true where Y is not a subset of X then
this dependency is called non trivial Functional dependency.
Multivalued Dependency
A multivalued dependency happens when there are at least three attributes (let
us say X, Y and Z), and for a value of X there is a well defined set of values of Y
and a well defined set of values of Z. However, the set of values of Y is
independent of set Z and vice versa.
Semi Non Trivial Functional Dependencies
X -> Y is called semi non-trivial when X and Y share at least one attribute
(X intersect Y is not empty) but Y is not a subset of X.
Transitive Functional Dependency
A transitive functional dependency is an indirect relationship between
attributes (columns) of a table. It occurs when the value of one attribute
determines the value of another attribute through an intermediate (third)
attribute: if X -> Y and Y -> Z, then X -> Z holds transitively.
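The trivial, non-trivial, and semi non-trivial cases differ only in how the two sides overlap, which a short function can make explicit. The function name and the label "completely non-trivial" for the case where the two sides share nothing are choices made for this sketch.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets on both sides."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                # Y is a subset of X
    if x & y:
        return "semi non-trivial"       # some overlap, but Y is not inside X
    return "completely non-trivial"     # X and Y share no attribute


print(classify_fd({"A", "B"}, {"A"}))       # trivial
print(classify_fd({"A", "B"}, {"B", "C"}))  # semi non-trivial
print(classify_fd({"A"}, {"B"}))            # completely non-trivial
```

Both of the last two cases count as non-trivial under the definition above, since in each of them Y is not a subset of X.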

Armstrong’s Axioms in Functional Dependency
Reflexivity: If A is a set of attributes and B is a subset of A, then the
dependency A -> B holds.
Augmentation: If A -> B holds, then adding the same set of attributes C to
both sides preserves the dependency, i.e., AC -> BC also holds.
Transitivity: If the dependencies X → Y and Y → Z both hold, then X → Z also
holds.
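In practice the axioms are usually applied through the attribute closure X+, the set of all attributes that X determines under a given set of dependencies. The sketch below uses the standard fixed-point computation; the function name closure and the single-letter attribute names are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side an iterable of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [("X", "Y"), ("Y", "Z")]
print(sorted(closure("X", FDS)))  # ['X', 'Y', 'Z']  (X -> Z by transitivity)
print(sorted(closure("Y", FDS)))  # ['Y', 'Z']
```

A dependency X -> Y follows from the given set exactly when Y is contained in the closure of X.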
Benefits of Functional Dependency in DBMS
Functional dependency in a database management system offers several
advantages for businesses and organizations:
 Prevents Duplicate Data:
Functional dependency helps avoid storing the same data repeatedly
in the database, reducing redundancy and saving storage space.
 Improves Data Quality and Accuracy:
By organizing data efficiently and minimizing duplication, functional
dependency ensures the data is reliable, consistent, and of high
quality.
 Reduces Errors:
Keeping data organized and concise lowers the chances of errors in
records or datasets, making it easier to manage and update
information.
 Saves Time and Costs:
Properly organized data allows for quicker and easier access,
improving productivity and reducing the time and cost of managing
information.
 Defines Rules and Behaviors:
Functional dependency allows setting rules and constraints that
control how data is stored, accessed, and maintained, ensuring better
data management.
 Helps Identify Poor Database Design:
It highlights issues like scattered or missing data across tables, helping
identify and fix design flaws to maintain consistency and integrity.
Conclusion
Functional dependencies define how attributes in a relation are related to each
other. They help maintain the quality of the information in the database and
are written with an arrow. Analyzing these relationships guides schema design
and can reduce redundancy and complexity. They are the building blocks of
normalization and of reliable databases, which are important for modern data
management and data access.

First Normal Form (1NF)


Normalization in database management is the process of organizing data to
minimize redundancy and dependency, ensuring efficiency, consistency, and
integrity. This involves structuring data into smaller, logically related tables and
defining relationships between them to streamline data storage and retrieval.
Normal Forms are a set of guidelines in database normalization that define how
to structure data in tables to reduce redundancy and improve integrity. Each
normal form builds on the previous one, progressively organizing data more
efficiently.
Levels of Normalization
There are various levels of normalization. These are some of them:
 First Normal Form (1NF)
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)
 Fourth Normal Form (4NF)
 Fifth Normal Form (5NF)
In this article, we will discuss the First Normal Form (1NF).
First Normal Form
If a relation contains a composite or multi-valued attribute, it violates the first
normal form, or the relation is in the first normal form if it does not contain any
composite or multi-valued attribute. A relation is in first normal form if every
attribute in that relation is single-valued attribute.
A table is in 1 NF if:
 There are only Single Valued Attributes.
 Attribute Domain does not change.
 There is a unique name for every Attribute/Column.
 The order in which data is stored does not matter.
Rules for First Normal Form (1NF) in DBMS
To follow the First Normal Form (1NF) in a database, these simple rules must
be followed:
1. Every Column Should Have Single Values
Each column in a table must contain only one value in a cell. No cell should
hold multiple values. If a cell contains more than one value, the table does not
follow 1NF.
 Example: A table with columns like [Writer 1], [Writer 2], and [Writer
3] for the same book ID is not in 1NF because it repeats the same type
of information (writers). Instead, all writers should be listed in separate
rows.
2. All Values in a Column Should Be of the Same Type
Each column must store the same type of data. You cannot mix different types
of information in the same column.
 Example: If a column is meant for dates of birth (DOB), you cannot use
it to store names. Each type of information should have its own
column.
3. Every Column Must Have a Unique Name
Each column in the table must have a unique name. This avoids confusion when
retrieving, updating, or adding data.
 Example: If two columns have the same name, the database system
may not know which one to use.
4. The Order of Data Doesn’t Matter
In 1NF, the order in which data is stored in a table doesn’t affect how the table
works. You can organize the rows in any way without breaking the rules.
Example:
Consider the COURSES relation below:
 In the above table, Courses is a multi-valued attribute, so the relation
is not in 1NF. The table below is in 1NF, as it has no multi-valued attribute.
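The conversion to 1NF can be sketched as follows. The rows are hypothetical, because the original COURSES table is not reproduced here, and to_1nf is an illustrative helper: it turns each multi-valued cell into one row per value.

```python
# Hypothetical rows: the COURSES column holds several values in one cell.
UNNORMALIZED = [
    {"STUD_NO": 1, "NAME": "A", "COURSES": "C1, C2"},
    {"STUD_NO": 2, "NAME": "B", "COURSES": "C3"},
]


def to_1nf(rows, column, sep=","):
    """Emit one output row per value found in the multi-valued column."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)           # copy the single-valued attributes
            new_row[column] = value.strip()
            result.append(new_row)
    return result


for row in to_1nf(UNNORMALIZED, "COURSES"):
    print(row)
# {'STUD_NO': 1, 'NAME': 'A', 'COURSES': 'C1'}
# {'STUD_NO': 1, 'NAME': 'A', 'COURSES': 'C2'}
# {'STUD_NO': 2, 'NAME': 'B', 'COURSES': 'C3'}
```

After the split, STUD_NO alone no longer identifies a row; the key of the 1NF table becomes the pair (STUD_NO, COURSES).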

Conclusion
In Conclusion, First Normal Form (1NF) is a key idea in
relational database architecture. It guarantees that data is organized to facilitate
data processing, remove redundancy, and support data integrity. 1NF establishes
the foundation for more complex normalization strategies that further improve
the correctness and efficiency of database systems by imposing atomic values
and forbidding recurring groupings inside rows.
Second Normal Form (2NF)
Normalization is a structured method in which tables are broken down in a
controlled manner with the aim of reducing data redundancy. It refers to the
process of arranging the attributes and relations of a database in order to
minimize data anomalies such as update, insert, and delete anomalies.
Normalization is usually carried out as a sequence of steps called normal forms
(NF). These steps help improve data integrity, minimize redundancy, and keep
your databases efficient and manageable.
What is Second Normal Form (2NF)?
Second Normal Form (2NF) is based on the concept of full functional
dependency. It is a way to organize a database table so that it
reduces redundancy and ensures data consistency. For a table to be in 2NF, it
must first meet the requirements of First Normal Form (1NF), meaning all
columns should contain single, indivisible values without any repeating groups.
Additionally, the table should not have partial dependencies.
The primary goal of Second Normal Form is to eliminate partial dependencies.
A partial dependency happens when a non-prime attribute (an attribute not part
of a candidate key) depends on only a part of a composite primary key, rather
than on the entire key. Removing these partial dependencies helps in reducing
redundancy and preventing update anomalies.
Example of Second Normal Form (2NF)
Consider a table storing information about students, courses, and their fees:

 There are many courses having the same course fee. Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO.
 COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
 COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
 The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the
table.
 COURSE_FEE is a non-prime attribute because it is not part of the
candidate key {STUD_NO, COURSE_NO}.
 But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
 Therefore, Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this
relation is not in 2NF.
In 2NF, we eliminate such dependencies by breaking the table into two
separate tables:
1. A table that links students and courses.
2. A table that stores course fees.
Now, each table is in 2NF:
 The Course Table ensures that COURSE_FEE depends only on COURSE_NO.
 The Student-Course Table ensures there are no partial dependencies
because it only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is free
from partial dependencies. This makes the database more efficient and easier to
maintain.
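The decomposition can be sketched with plain Python data structures. The enrollment rows and fee values below are hypothetical, because the original table is not reproduced here; the point is that the two smaller tables can be joined back into the original without loss.

```python
# Hypothetical (STUD_NO, COURSE_NO, COURSE_FEE) rows.
ENROLLMENT = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]

# Table 1: which student takes which course (key: STUD_NO + COURSE_NO).
student_course = sorted({(stud, course) for stud, course, _ in ENROLLMENT})

# Table 2: the fee of each course, stored once (key: COURSE_NO).
course_fee = {}
for _, course, fee in ENROLLMENT:
    # COURSE_NO -> COURSE_FEE must hold, so a course never has two fees.
    assert course_fee.setdefault(course, fee) == fee

# Joining the two tables on COURSE_NO reproduces the original rows.
rejoined = sorted((stud, course, course_fee[course]) for stud, course in student_course)

print(len(ENROLLMENT), len(course_fee))  # 6 5
print(rejoined == sorted(ENROLLMENT))    # True
```

The fee of C1 appeared twice in the original rows and appears once in course_fee, so a fee change now touches a single entry.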
Why is 2NF Important?
By ensuring that a database table adheres to Second Normal Form, we achieve
several key benefits:
1. Reduces Redundancy: In our example, we no longer store the same course fee
multiple times. Instead, we store it once in the Course Fee table and reference it
in the Student-Course table.
2. Minimizes Update Anomalies: With data being centralized in the right tables,
you’re less likely to run into problems when you update or delete information.
For example, if a course fee changes, you only need to update it in one place.
3. Improves Data Integrity: By eliminating partial dependencies, 2NF ensures
that the database structure is logical, which in turn ensures that data
relationships are consistent.
4. Can Improve Query Efficiency: Tables are smaller and more focused on
specific data, which can make it faster to retrieve the necessary information,
although queries that need both tables now require a join.
What is Partial Dependency?
A functional dependency X→Y, where X and Y are attribute sets of a relation,
is a partial dependency if some attribute A∈X can be removed from X and
the dependency still holds. For example, if X is a composite candidate key
(made of multiple columns) and we can remove one column from X while X→Y
still holds, then it is a partial dependency.
In a composite key (a key made of multiple attributes), a partial dependency
happens when one of the non-prime attributes depends only on a part of the
composite key. Here’s how to identify partial dependencies in your database:
 Look for functional dependencies where one attribute depends on a
part of the primary key, not the entire key.
 If an attribute (like COURSE_FEE in our example) depends on just a part
of the key (COURSE_NO), it’s a partial dependency.
 To remove partial dependencies, break the table into smaller tables
that store only relevant data together.
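The identification steps above can be sketched as a search over proper subsets of the key. The function names are hypothetical, and the sketch assumes the relation has a single candidate key, so every attribute outside that key is non-prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """List (part_of_key, dependent_non_prime_attributes) pairs."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            dependent = closure(subset, fds) & non_prime
            if dependent:
                found.append((set(subset), dependent))
    return found


ATTRIBUTES = ("STUD_NO", "COURSE_NO", "COURSE_FEE")
KEY = ("STUD_NO", "COURSE_NO")
FDS = [(("COURSE_NO",), ("COURSE_FEE",))]

print(partial_dependencies(KEY, FDS, ATTRIBUTES))
# [({'COURSE_NO'}, {'COURSE_FEE'})]
```

An empty result would mean that no non-prime attribute depends on just part of the key, i.e., the relation satisfies the 2NF condition with respect to that key.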
Conclusion
In conclusion, Second Normal Form (2NF) helps make databases more
organized by removing partial dependencies. It reduces duplicate data, prevents
errors, and ensures data is stored accurately. Following 2NF makes it easier to
manage, update, and retrieve information from your database. Whether we’re
building a small application or a large enterprise system, following 2NF
principles will lead to better performance and data consistency.

Third Normal Form (3NF)


In database design, normalization is an important process to organize data,
reduce duplication, and improve accuracy. The Third Normal Form (3NF) builds
on the rules of the First (1NF) and Second (2NF) Normal Forms. Reaching 3NF
ensures that the database is well-structured, efficient, and free from data
issues or inconsistencies.
Even though tables in 2NF have less duplication than 1NF, they can still face
problems like update errors. For example, if only one row is updated while
another is not, the data becomes inconsistent. This happens due to transitive
dependencies. To solve this, we move the table to 3NF, which removes such
dependencies and makes the database more reliable.
Third Normal Form (3NF)
A relation is in third normal form if it is in second normal form and no
non-prime attribute is transitively dependent on a candidate key. Equivalently,
a relation is in 3NF if at least one of the following conditions holds for
every non-trivial functional dependency X -> Y:
 X is a super key.
 Y is a prime attribute (each element of Y is part of some candidate key).
In other words,
a relation that is in First and Second Normal Form, and in which no non-
primary-key attribute is transitively dependent on the primary key, is in
Third Normal Form (3NF).
Note:
If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) from the relation by placing the attribute(s) in a new
relation along with a copy of the determinant. Consider the examples given
below.
Example : Consider the below Relation,

In the relation CANDIDATE given above:
 Functional dependency set: {CAND_NO -> CAND_NAME, CAND_NO ->
CAND_STATE, CAND_STATE -> CAND_COUNTRY, CAND_NO -> CAND_AGE}
 So, the candidate key here is {CAND_NO}.
 In this relation, CAND_NO -> CAND_STATE and CAND_STATE ->
CAND_COUNTRY both hold. Thus, CAND_COUNTRY depends transitively
on CAND_NO, which violates 3NF. To convert the relation into third
normal form, we decompose CANDIDATE (CAND_NO, CAND_NAME,
CAND_STATE, CAND_COUNTRY, CAND_AGE) as:
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)
STATE_COUNTRY (STATE, COUNTRY).
Example 2: Consider Relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
The candidate keys of the above relation are A, E, CD, and BC. Every attribute
on the right-hand side of every functional dependency is therefore prime, so
the above relation is in 3NF.
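Both examples can be checked with a short script that derives the candidate keys by brute force and then applies the two 3NF conditions to each dependency. The function names are hypothetical, and the exhaustive key search is only practical for small relations like these.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attrs = set(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(key <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attrs:
                keys.append(set(combo))
    return keys


def violations_3nf(attributes, fds):
    """Return the FDs X -> Y where X is no super key and Y is not all prime."""
    attrs = set(attributes)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # the non-trivial part of the right side
        if extra and closure(lhs, fds) != attrs and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


# Example 1: the CANDIDATE relation.
CANDIDATE = ("CAND_NO", "CAND_NAME", "CAND_STATE", "CAND_COUNTRY", "CAND_AGE")
CANDIDATE_FDS = [
    (("CAND_NO",), ("CAND_NAME",)),
    (("CAND_NO",), ("CAND_STATE",)),
    (("CAND_STATE",), ("CAND_COUNTRY",)),
    (("CAND_NO",), ("CAND_AGE",)),
]
print(violations_3nf(CANDIDATE, CANDIDATE_FDS))
# [(('CAND_STATE',), ('CAND_COUNTRY',))]

# Example 2: R(A, B, C, D, E).
R_FDS = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", R_FDS))  # the keys A, E, BC and CD
print(violations_3nf("ABCDE", R_FDS))  # []
```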
Note:
Third Normal Form (3NF) is considered adequate for most relational database
design because most 3NF tables are free of insertion, update, and deletion
anomalies. Moreover, any relation can be decomposed into 3NF relations in a way
that is both lossless and dependency-preserving.
What is Transitive Dependency?
A transitive dependency occurs when a non-key attribute depends on
another non-key attribute rather than directly on the primary key. For instance,
consider a table with the attributes (A, B, C) where A is the primary key and B
and C are non-key attributes. If B determines C, then C is transitively dependent
on A through B. This can lead to data anomalies and redundancy, which 3NF
aims to eliminate by ensuring that all non-key attributes depend only on the
primary key.
Conclusion
In conclusion, Third Normal Form (3NF) is a crucial stage in database
normalization. It deals with transitive dependencies and improves data
integrity through effective information organization. 3NF ensures that non-key
attributes depend only on the primary key, removing redundancy and helping to
create a well-organized and normalized relational database model.

Boyce-Codd Normal Form (BCNF)

While Third Normal Form (3NF) is generally sufficient for organizing relational
databases, it may not completely eliminate redundancy. Redundancy can still
occur if there is a non-trivial dependency X→Y where X is not a super key. This
issue is addressed by a stronger normal form known as Boyce-Codd Normal Form
(BCNF).
Applying the rules of 2NF and 3NF removes redundancies caused by partial and
transitive dependencies of non-prime attributes. However, even with these
rules, dependencies whose right-hand side is a prime attribute may still lead
to redundancy in 3NF. To overcome this limitation, BCNF was introduced by
Raymond Boyce and Edgar Codd in 1974 as a stricter condition.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form
(3NF) that ensures a more simplified and efficient database design. It enforces
that every non-trivial functional dependency must have a superkey on its left-
hand side. This approach addresses potential issues with candidate keys and
ensures the database is free from redundancy.
BCNF eliminates redundancy more effectively than 3NF by strictly requiring
that all functional dependencies originate from super-keys.
BCNF is essential for good database schema design in higher-level systems
where consistency and efficiency are important, particularly when there are many
candidate keys (as one often finds with a delivery system).
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every non-trivial functional dependency
(FD) X->Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and
make sure that they are candidate keys.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If R is
found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF. The 1NF has the least restrictive constraint – it only
requires a relation R to have atomic values in each tuple. The 2NF has a slightly
more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is
less restrictive than the BCNF. In this manner, the restriction increases as we
traverse down the hierarchy.
The following examples illustrate the properties of BCNF.
Example 1
Consider a relation R with attributes (student, teacher, subject).
FD: { (student, teacher) -> subject, (student, subject) -> teacher,
(teacher) -> subject }
 Candidate keys are (student, teacher) and (student, subject).
 The above relation is in 3NF: every attribute is part of some candidate
key, so the right-hand side of each FD is a prime attribute.
 A relation R is in BCNF if for every non-trivial FD X->Y, X is a
super key.
 The above relation is not in BCNF, because in the FD (teacher ->
subject), teacher is not a key. This relation suffers from anomalies.
 For example, if we delete the student Tahira, we will also lose the
information that N.Gupta teaches C. This issue occurs because
teacher is a determinant but not a candidate key.

R is divided into two relations R1(Teacher, Subject) and R2(Student, Teacher).
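The BCNF test for this example reduces to computing the closure of each determinant. The sketch below lists the violating dependencies and then applies the usual test for a lossless two-way split (the shared attributes must determine all of one of the two parts). Function and variable names are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


R = ("student", "teacher", "subject")
FDS = [
    (("student", "teacher"), ("subject",)),
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]
violations = bcnf_violations(R, FDS)
print(violations)  # [(('teacher',), ('subject',))]

# Decomposition from the text: R1(teacher, subject) and R2(student, teacher).
R1 = {"teacher", "subject"}
R2 = {"student", "teacher"}
common = R1 & R2
# The split is lossless if the common attributes determine all of R1 or all of R2.
lossless = R1 <= closure(common, FDS) or R2 <= closure(common, FDS)
print(lossless)  # True: teacher determines subject
```

The split is lossless, but the dependency (student, subject) -> teacher can no longer be checked inside either table, which is the usual price of moving this relation to BCNF.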


How to Satisfy BCNF?
For satisfying this table in BCNF, we have to decompose it into further tables.
Here is the full procedure through which we transform this table into BCNF. Let
us first divide this main table into two tables Stu_Branch and Stu_Course Table.
Stu_Branch Table
Stu_ID  Stu_Branch
101     Computer Science & Engineering
102     Electronics & Communication Engineering
Candidate Key for this table: Stu_ID.
Stu_Course Table
Stu_Course            Branch_Number  Stu_Course_No
DBMS                  B_001          201
Computer Networks     B_001          202
VLSI Technology       B_003          401
Mobile Communication  B_003          402
Candidate Key for this table: Stu_Course.
Stu_Enroll Table
Stu_ID  Stu_Course_No
101     201
101     202
102     401
102     402
Candidate Key for this table: {Stu_ID, Stu_Course_No}.


After this decomposition, every table is in BCNF, because in each functional
dependency X->Y that holds in a table, X is a super key.
Example 3
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
 Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets
can determine all attributes of the relation, So AC will be the
candidate key. A or C can’t be derived from any other attribute of the
relation, so there will be only 1 candidate key {AC}.
 Step-2: Prime attributes are those attributes that are part of candidate
key {A, C} in this example and others will be non-prime {B, D, E} in this
example.
 Step-3: The relation R is in 1st normal form as a relational DBMS does
not allow multi-valued or composite attributes.
The relation is in 2nd normal form because no non-prime attribute depends on a
proper subset of the candidate key: in BC->D, BC is not a proper subset of the
candidate key AC; in AC->BE, AC is the candidate key itself; and in B->E, B is
not a proper subset of the candidate key AC.
The relation is not in 3rd normal form. To satisfy 3rd normal form, either the
LHS of every FD must be a super key or its RHS must be a prime attribute. In
BC->D, BC is not a super key and D is not a prime attribute; in B->E, B is not a
super key and E is not a prime attribute. So the highest normal form of the
relation is the 2nd normal form.
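The steps of this example can be reproduced mechanically. The following Python sketch is one possible way to do it, not a definitive implementation: the helper names closure, candidate_keys and highest_normal_form are chosen for this illustration, and the candidate key search is a brute-force one that is only practical for small relations.

```python
from itertools import combinations

ATTRS = frozenset("ABCDE")
FDS = [
    (frozenset("BC"), frozenset("D")),
    (frozenset("AC"), frozenset("BE")),
    (frozenset("B"), frozenset("E")),
]


def closure(attrs, fds):
    """Return the attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def highest_normal_form(attrs, fds):
    """Classify the relation as 1NF, 2NF, 3NF or BCNF (1NF is assumed)."""
    keys = candidate_keys(attrs, fds)
    prime = frozenset().union(*keys)
    # Split each FD into single-attribute, non-trivial dependencies.
    single = [(lhs, a) for lhs, rhs in fds for a in rhs - lhs]

    def is_superkey(lhs):
        return closure(lhs, fds) == attrs

    if all(is_superkey(lhs) for lhs, _ in single):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in single):
        return "3NF"
    # 2NF fails if a proper subset of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                if closure(combo, fds) - prime:
                    return "1NF"
    return "2NF"


print(sorted(closure("AC", FDS)))
print([sorted(key) for key in candidate_keys(ATTRS, FDS)])
print(highest_normal_form(ATTRS, FDS))
```

For the FD set above, the sketch finds {A, C} as the only candidate key and reports 2NF, matching the result worked out by hand.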
Note: A prime attribute cannot be transitively dependent on a key in a BCNF
relation.
Consider these functional dependencies of some relation R:
AB ->C
C ->B
AB ->B
From the above functional dependencies, we get that the candidate keys of R are
AB and AC. A careful observation is required to see that these dependencies
contain a transitive dependency: the prime attribute B depends transitively on
the key AB through C. The first and the third FDs satisfy BCNF, as both have a
candidate key (or simply KEY) on their left side. The second dependency, C->B,
does not satisfy BCNF because C is not a super key, but it does satisfy 3NF
because its right side, B, is a prime attribute. So the highest normal form of R
is 3NF, as all three FDs satisfy the conditions required for 3NF.
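The per-dependency reasoning above can be sketched in a few lines of Python. This is only an illustration: it takes the candidate keys AB and AC as given by the text rather than computing them, and the function name classify and its return labels are invented for this example.

```python
# Candidate keys as stated in the text; every attribute of R is prime here.
CANDIDATE_KEYS = [frozenset("AB"), frozenset("AC")]
PRIME = frozenset().union(*CANDIDATE_KEYS)
FDS = [("AB", "C"), ("C", "B"), ("AB", "B")]


def classify(lhs, rhs):
    """Label one FD: 'BCNF', '3NF only', or 'violates 3NF'."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    # BCNF condition: the FD is trivial or its left side contains a candidate key.
    if rhs <= lhs or any(key <= lhs for key in CANDIDATE_KEYS):
        return "BCNF"
    # 3NF condition: every attribute on the right side is prime.
    if (rhs - lhs) <= PRIME:
        return "3NF only"
    return "violates 3NF"


results = {f"{lhs}->{rhs}": classify(lhs, rhs) for lhs, rhs in FDS}
for fd, label in results.items():
    print(fd, label)
```

Only C->B is labelled "3NF only", which is why the relation as a whole stops at 3NF.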
Example 4
Consider a relation R(A, B, C) with the functional dependencies:
A -> BC,
B -> A
A and B are both super keys, so the above relation is in BCNF.
Note: A BCNF decomposition is not always dependency preserving, although a
BCNF decomposition that satisfies the lossless join condition always exists.
For example, take the relation R (V, W, X, Y, Z), with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
Decomposing this relation on W -> Y moves Y into a separate table (W, Y), after
which Y, Z -> X can no longer be enforced within any single table.
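To see how a dependency can get lost, the Python sketch below applies the standard dependency preservation test to one possible decomposition. The decomposition (W, Y), (V, W, X), (V, W, Z) is an assumption made for this illustration (it is what splitting on W -> Y and then on V, W -> X produces), and the helper names closure and is_preserved are likewise invented here.

```python
FDS = [
    (frozenset("VW"), frozenset("X")),
    (frozenset("YZ"), frozenset("X")),
    (frozenset("W"), frozenset("Y")),
]
# Assumed decomposition: split on W -> Y first, then on V, W -> X.
DECOMPOSITION = [frozenset("WY"), frozenset("VWX"), frozenset("VWZ")]


def closure(attrs, fds):
    """Return the attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs follows from the FDs projected onto the tables."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # Attributes derivable using only what is visible inside this table.
            gained = (closure(result & schema, fds) & schema) - result
            if gained:
                result |= gained
                changed = True
    return rhs <= result


lost = [(lhs, rhs) for lhs, rhs in FDS
        if not is_preserved(lhs, rhs, DECOMPOSITION, FDS)]
for lhs, rhs in lost:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), "is not preserved")
```

With this decomposition, V, W -> X and W -> Y can still be checked inside a single table, while Y, Z -> X cannot, because no table contains Y, Z and X together and the remaining dependencies do not imply it.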
Note: Redundancy can still be present in a BCNF relation, because BCNF only
removes the redundancy that is caused by functional dependencies.
There are also some higher-order normal forms, like the 4th Normal Form and
the 5th Normal Form.
For more, refer to the 4th and 5th Normal Forms.
Conclusion
In conclusion, Boyce-Codd Normal Form (BCNF) is an essential part of database
normalization, as it lets us normalize beyond the limits of 3NF. By making sure
that the left side of every non-trivial functional dependency is a super key,
BCNF helps us avoid redundancy and update anomalies. This makes BCNF a highly
desirable property and helps in achieving data integrity, which is a primary
concern for any database designer.
