Normalization Databases
Normal Forms: Description of Normal Forms
First Normal Form (1NF): A relation is in first normal form if every attribute in that
relation is a single-valued attribute.
Boyce-Codd Normal Form (BCNF): For BCNF, the relation should satisfy the following conditions:
The relation should be in the 3rd Normal Form.
X should be a super-key for every functional dependency (FD) X -> Y in a given relation.
Student Table
GATE Questions
Q.1: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and
the set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} ->
{K, L}, {K} -> {M}, {L} -> {N}} on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Solution:
Finding attribute closure of all given options, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {EFHGIJKLMN}
{E}+ = {E}
Both {EFH}+ and {EFHKL}+ result in the set of all attributes, but EFH is minimal. So it
is the candidate key, and the correct option is (B).
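The closures above can be reproduced mechanically. The following is a minimal sketch, assuming each FD is written as a pair of attribute strings; the names closure and is_candidate_key are illustrative helpers, not part of any library.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, fds, relation):
    """A candidate key determines every attribute and has no removable attribute."""
    attrs = set(attrs)
    if closure(attrs, fds) != relation:
        return False
    return all(closure(attrs - {a}, fds) != relation for a in attrs)


R = set("EFGHIJKLMN")
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

for option in ("EF", "EFH", "EFHKL", "E"):
    print(option, sorted(closure(option, FDS)), is_candidate_key(option, FDS, R))
```

Removing one attribute at a time is enough for the minimality test, because any smaller super key would also make one of those one-attribute-smaller sets a super key.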
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building}: Here, roll_no can
determine the values of the fields name, dept_name and dept_building, hence
this is a valid functional dependency.
roll_no → dept_name: Since roll_no can determine the whole set
{name, dept_name, dept_building}, it can also determine its subset
dept_name.
dept_name → dept_building: dept_name can identify the
dept_building accurately, since every row with the same dept_name
has the same dept_building.
More valid functional dependencies: roll_no → name, {roll_no, name}
→ {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name: Students with the same name can have different
dept_name values, hence this is not a valid functional dependency.
dept_building → dept_name: There can be multiple departments in
the same building. For example, in the above table, departments ME and
EC are in the same building B2, hence dept_building → dept_name is
an invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name,
dept_name} → roll_no, dept_building → roll_no, etc.
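A table instance can be tested against a proposed dependency with a few lines of code. This is a minimal sketch using the student rows from the table above; fd_holds is an illustrative name. Note that such a check can only refute a dependency: passing on one instance does not prove the dependency holds for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ("roll_no", "name", "dept_name", "dept_building")
STUDENTS = [dict(zip(COLUMNS, r)) for r in [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]]

print(fd_holds(STUDENTS, ["dept_name"], ["dept_building"]))  # holds in this instance
print(fd_holds(STUDENTS, ["name"], ["dept_name"]))           # refuted by the two xyz rows
print(fd_holds(STUDENTS, ["dept_building"], ["dept_name"]))  # refuted by building B2
```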
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial functional
dependency.
Symbolically: A→B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A→A & B→B
Example 1 :
ABC -> AB
ABC -> A
ABC -> ABC
Example 2:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
42 abc 17
43 pqr 18
44 xyz 18
Functional Dependency:
{Student_ID, Course_ID} → Course_ID
This is semi non-trivial because:
Part of the dependent attribute (Course_ID) is already included in the
determinant ({Student_ID, Course_ID}).
However, the dependency is not completely trivial because
{Student_ID} → Course_ID is not implied directly.
4. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not
dependent on each other, i.e., if a → {b, c} and there exists no functional
dependency between b and c, then it is called a multivalued functional
dependency.
Example:
bike_model manuf_year color
In this table:
X: bike_model
Y: color
Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years
(manuf_year).
2. The colors do not depend on the manufacturing year, and the
manufacturing year does not depend on the colors. They are
independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:
bike_model ->-> manuf_year
bike_model ->-> color
5. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the
determinant, i.e., if a → b and b → c, then according to the axiom of transitivity, a → c.
This is a transitive functional dependency.
Example:
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom
of transitivity, enrol_no → building_no is a valid functional dependency. This is
an indirect functional dependency, hence called Transitive functional
dependency.
6. Fully Functional Dependency
In a full functional dependency, an attribute or a set of attributes is determined by
the whole determinant and by no proper subset of it. If a relation R has attributes X,
Y, Z with the dependencies X->Y and X->Z, and no proper subset of X determines Y or Z,
then those dependencies are fully functional.
7. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a part of the
composite key rather than the whole key. If a relation R has attributes X, Y, Z,
where {X, Y} is the composite key and Z is a non-key attribute, then X->Z is a
partial functional dependency in RDBMS.
Conclusion
Functional dependency is a very important concept in database management
systems for ensuring data consistency and accuracy. In this article we have
discussed the concept behind functional dependencies and why they are
important, the valid and invalid functional dependencies, and the most
important types of functional dependencies in RDBMS. We have also discussed the
advantages of FDs.
{B} Triviality
{B, D} B->D
{B, D, A} D->A
{B, D, A, C} AB->C
{B, D, A, C, E} C->E
We can find (C, D)+ by adding C and D to the set (triviality), then
E using (C->E), and then A using (D->A). The set becomes:
(C,D)+ = {C,D,E,A}
Similarly, we can find (B, C)+ by adding B and C to the set (triviality),
then D using (B->D), then E using (C->E), and then A using (D->A).
The set becomes:
(B,C)+ = {B,C,D,E,A}
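The step-by-step derivation can be traced in code. This is a minimal sketch that assumes the FD set is {B->D, D->A, AB->C, C->E}, as read off the trace above; closure_with_trace is an illustrative name, and it records which FD added each new attribute.

```python
def closure_with_trace(attrs, fds):
    """Return (closure as an ordered list, list of (set so far, rule used))."""
    result = list(attrs)
    steps = [(list(result), "triviality")]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            new = [a for a in rhs if a not in result]
            # Fire the FD only if it adds something and its left side is covered.
            if new and all(a in result for a in lhs):
                result.extend(new)
                steps.append((list(result), f"{lhs}->{rhs}"))
                changed = True
    return result, steps


# Assumed FD set, taken from the rules named in the trace above.
FDS = [("B", "D"), ("D", "A"), ("AB", "C"), ("C", "E")]

for start in ("B", "CD", "BC"):
    closed, steps = closure_with_trace(start, FDS)
    print(f"({start})+ = {closed}")
    for attrs_so_far, rule in steps:
        print("   ", attrs_so_far, rule)
```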
Candidate Key
Candidate Key is a minimal set of attributes of a relation that can be used
to identify a tuple uniquely. For example, each tuple of the EMPLOYEE relation given
in Table 1 can be uniquely identified by E-ID, and it is minimal as well. So it is
a candidate key of the relation.
A candidate key may or may not be a primary key.
Super Key is a set of attributes of a relation that can be used to identify a tuple
uniquely. For example, each tuple of the EMPLOYEE relation given in Table 1 can be
uniquely identified by E-ID or (E-ID, E-NAME) or (E-ID, E-CITY) or (E-ID, E-STATE) or
(E-ID, E-NAME, E-STATE), etc. So all of these are super keys of the EMPLOYEE
relation.
Note: A candidate key is always a super key, but the converse is not true.
Q.3 Finding Candidate Keys and Super Keys of a Relation using FD set.
The set of attributes whose attribute closure is a set of all attributes of the
relation is called the super key of the relation. For Example, the EMPLOYEE
relation shown in Table 1 has the following FD set. {E-ID->E-NAME, E-ID->E-
CITY, E-ID->E-STATE, E-CITY->E-STATE}. Let us calculate the attribute
closure of different sets of attributes:
(E-ID)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-NAME)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY,E-STATE}
Since (E-ID)+, (E-ID, E-NAME)+, (E-ID, E-CITY)+, (E-ID, E-STATE)+, and (E-ID, E-CITY,
E-STATE)+ all give the set of all attributes of relation EMPLOYEE, all of these are
super keys of the relation.
The minimal set of attributes whose attribute closure is the set of all attributes of
the relation is called the candidate key of the relation. As shown above, (E-ID)+ is the
set of all attributes of the relation and it is minimal. So E-ID is the candidate
key. On the other hand, (E-ID, E-NAME)+ is also the set of all attributes, but it is not
minimal because the closure of its subset (E-ID) already equals the set of all
attributes. So (E-ID, E-NAME) is not a candidate key.
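The same procedure can be automated by testing every subset of attributes. This is a minimal brute-force sketch for the EMPLOYEE FD set; the helper names are illustrative, and the exhaustive search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def super_keys(attrs, fds):
    """Every non-empty attribute subset whose closure is the whole relation."""
    everything = set(attrs)
    return [
        frozenset(subset)
        for size in range(1, len(attrs) + 1)
        for subset in combinations(attrs, size)
        if closure(subset, fds) == everything
    ]


def candidate_keys(attrs, fds):
    """Super keys that contain no smaller super key."""
    keys = super_keys(attrs, fds)
    return [k for k in keys if not any(other < k for other in keys)]


ATTRS = ("E-ID", "E-NAME", "E-CITY", "E-STATE")
FDS = [
    ({"E-ID"}, {"E-NAME"}),
    ({"E-ID"}, {"E-CITY"}),
    ({"E-ID"}, {"E-STATE"}),
    ({"E-CITY"}, {"E-STATE"}),
]

print(len(super_keys(ATTRS, FDS)), "super keys")
print("candidate keys:", [sorted(k) for k in candidate_keys(ATTRS, FDS)])
```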
→ |A1 ∪ A2 ∪ A3| = |A1| + |A2| + |A3| - |A1 ∩ A2| - |A1 ∩ A3| - |A2 ∩ A3| +
|A1 ∩ A2 ∩ A3|
= (super keys possible with candidate key A1) + (super keys possible with
candidate key A2) + (super keys possible with candidate key A3) - (common
super keys from both A1 and A2) - (common super keys from both A1 and A3)
- (common super keys from both A2 and A3) + (common super keys from
A1, A2, and A3)
= 2^{n-1} + 2^{n-1} + 2^{n-1} - 2^{n-2} - 2^{n-2} - 2^{n-2} + 2^{n-3}
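The inclusion-exclusion count can be sanity-checked by brute force. This sketch assumes a relation with n attributes whose three candidate keys A1, A2, A3 are each a single attribute (which is what the 2^{n-1} terms imply); the attribute names a1..an and both function names are hypothetical.

```python
from itertools import combinations


def count_super_keys_formula(n):
    """Inclusion-exclusion count for three single-attribute candidate keys."""
    return 3 * 2 ** (n - 1) - 3 * 2 ** (n - 2) + 2 ** (n - 3)


def count_super_keys_brute_force(n):
    """Count attribute subsets that contain at least one candidate key."""
    attrs = [f"a{i}" for i in range(1, n + 1)]
    keys = set(attrs[:3])  # assume a1, a2, a3 are the candidate keys
    return sum(
        1
        for size in range(1, n + 1)
        for subset in combinations(attrs, size)
        if keys & set(subset)
    )


for n in range(3, 8):
    print(n, count_super_keys_formula(n), count_super_keys_brute_force(n))
```

Both columns agree because a subset is a super key exactly when it is not drawn only from the other n - 3 attributes, which gives 2^n - 2^{n-3}.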
a1 b1 c1
a2 b2 c2
a3 b3 c3
Similarly, create certain tuples for E2:
D E F
d1 e1 f1
d2 e2 f2
d3 e3 f3
a1 d1
a1 d2
a2 d3
a1 b1 c1 d1 e1 f1
a1 b1 c1 d2 e2 f2
a2 b2 c2 d3 e3 f3
a1 b1 c1 d1
a1 b1 c1 d2
a2 b2 c2 d3
a3 b3 c3 NULL
d1 e1 f1 a1
d2 e2 f2 a1
d3 e3 f3 a2
On the same grounds, can you see why we allow merging the two entities as
well as the relationship into one table when it is a 1:1 relationship? Simply put, we
would not have a composite primary key there, so we will definitely have a primary key
with no NULL values in it. Think further: why do we allow merging
the entities and the relationship when both sides have total participation? The reason is
that even if we have a composite primary key for such a merged table, we are sure
that it will never have any NULL values in the primary key.
Note: You can follow the same procedure as stated above to establish all the
results.
Canonical Cover
A canonical cover is a set of functional dependencies that is equivalent to a given
set of functional dependencies but is minimal in terms of the number of
dependencies. Canonical Cover of functional dependency is also called minimal
set of functional dependency or irreducible form of functional dependency. The
process of finding the canonical cover of a set of functional dependencies
involves the following steps:
Step 1: Combine Functional Dependencies with the Same Left-Hand
Side
If two or more functional dependencies in F have the same left-hand
side, combine them into a single functional dependency by taking the
union of their right-hand sides.
Example: A→B and A→C become A→BC.
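Step 1 can be sketched as a simple grouping operation. This minimal example covers only the merge of dependencies with the same left-hand side, not the remaining steps of computing a canonical cover; combine_same_lhs is an illustrative name.

```python
def combine_same_lhs(fds):
    """Merge FDs that share a left-hand side by taking the union of right sides."""
    merged = {}
    for lhs, rhs in fds:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    return merged


fds = [("A", "B"), ("A", "C"), ("BC", "D")]
for lhs, rhs in combine_same_lhs(fds).items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```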
Multivalued Dependency
Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and “person multi determines
food_likes.”
Note that a functional dependency is a special case of multivalued dependency.
In a functional dependency X -> Y, every x determines exactly one y, never more
than one.
Fourth Normal Form (4NF)
The Fourth Normal Form (4NF) is a level of database normalization in which every
non-trivial multivalued dependency has a superkey as its determinant. It builds
on the first three normal forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal
Form (BCNF). It states that, in addition to meeting the requirements
of BCNF, a relation must not contain any non-trivial multivalued dependency whose
determinant is not a superkey.
Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency whose
determinant is not a superkey.
A table with a multivalued dependency violates the normalization standard of
the Fourth Normal Form (4NF) because it creates unnecessary redundancies and
can contribute to inconsistent data. To bring this up to 4NF, it is necessary to
break this information into two tables.
Example: Consider a class database with two relations: R1
contains student ID (SID) and student name (SNAME), and R2 contains course
ID (CID) and course name (CNAME).
Table R1
SID SNAME
S1 A
S2 B
Table R2
CID CNAME
C1 C
C2 D
Table R1 × R2 (cross product)
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Join Dependency
Example:
Table R1
Company Product
C1 Pendrive
C1 mic
C2 speaker
C2 speaker
Company->->Product
Table R2
Agent Company
Aman C1
Aman C2
Mohan C1
Agent->->Company
Table R3
Agent Product
Aman Pendrive
Aman Mic
Aman speaker
Mohan speaker
Agent->->Product
Table R1⋈R2⋈R3
Company Product Agent
C1 Pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
Agent->->Product
Fifth Normal Form/Projected Normal Form (5NF)
A relation R is in Fifth Normal Form if and only if every join dependency in R
is implied by the candidate keys of R. A relation decomposed into smaller relations
must have the lossless join property, which ensures that no spurious or extra tuples
are generated when the relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It cannot be decomposed any further without loss (it has no join dependency
that is not implied by its candidate keys).
Example – Consider the above schema, with a case as “if a company makes a
product and an agent is an agent for that company, then he always sells that
product for the company”. Under these circumstances, the ACP table is shown
as:
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is again decomposed into 3 relations. Now, the natural Join of
all three relations will be shown as:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over 'Company', followed by the natural
join of that result (R13) and R2 over 'Agent' and 'Product', is Table ACP.
Hence, in this example, the decomposition of ACP is a lossless join decomposition:
joining R1, R2, and R3 reproduces ACP exactly, with no spurious tuples. The
decomposed relations R1, R2, and R3 are therefore in 5NF.
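The lossless join claim can be checked mechanically. This is a minimal sketch, assuming relations are represented as lists of dictionaries; natural_join and as_set are illustrative helpers, not part of any library.

```python
def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                merged = {**l_row, **r_row}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


ACP = [
    {"Agent": "A1", "Company": "PQR", "Product": "Nut"},
    {"Agent": "A1", "Company": "PQR", "Product": "Bolt"},
    {"Agent": "A1", "Company": "XYZ", "Product": "Nut"},
    {"Agent": "A1", "Company": "XYZ", "Product": "Bolt"},
    {"Agent": "A2", "Company": "PQR", "Product": "Nut"},
]
R1 = [{"Agent": a, "Company": c} for a, c in [("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR")]]
R2 = [{"Agent": a, "Product": p} for a, p in [("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut")]]
R3 = [{"Company": c, "Product": p}
      for c, p in [("PQR", "Nut"), ("PQR", "Bolt"), ("XYZ", "Nut"), ("XYZ", "Bolt")]]

r13 = natural_join(R1, R3)       # joined over Company
rejoined = natural_join(r13, R2)  # joined over Agent and Product
print(len(r13), len(rejoined), as_set(rejoined) == as_set(ACP))
```

The intermediate result R13 contains one extra tuple (A2, PQR, Bolt); the final join with R2 removes it, which is why all three projections are needed.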
Conclusion
Multivalued dependencies are removed by 4NF, and join dependencies
are removed by 5NF.
The highest levels of database normalization, 4NF and 5NF, might
not be required for every application.
Normalizing to 4NF and 5NF might result in more
complicated database structures and slower queries, but it can
also increase data accuracy and dependability.
Extra
Table 1
STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
Table 2
Insertion Anomaly: If a tuple is inserted in referencing relation and referencing
attribute value is not present in referenced attribute, it will not allow insertion in
referencing relation.
OR
An insertion anomaly occurs when adding a new row to a table leads to
inconsistencies.
Example: If we try to insert a record into the STUDENT_COURSE table
with STUD_NO = 7, it will not be allowed because there is no
corresponding STUD_NO = 7 in the STUDENT table.
Deletion and Update Anomaly: If a tuple is deleted or updated in the
referenced relation and the referenced attribute value is used by the referencing
attribute in the referencing relation, the deletion or update of the tuple in the
referenced relation will not be allowed.
Example: If we want to update a record in STUDENT_COURSE with
STUD_NO = 1, we have to update it in both rows of the table. If we try to delete
a record from the STUDENT table with STUD_NO = 1, it will not be allowed
because there are corresponding records in the STUDENT_COURSE table
referencing STUD_NO = 1. Deleting the record would violate the foreign
key constraint, which ensures data consistency between the two tables.
To avoid this, the following options can be used when defining the foreign key:
ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from
referenced relation and the referenced attribute value is used by
referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and set the value of referencing
attribute to NULL.
ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from
referenced relation and the referenced attribute value is used by
referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and referencing relation as well.
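The behaviour described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the STUDENT rows and the stud_name column are hypothetical (the STUDENT table is not shown here), and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3


def build_db(on_delete):
    """Create STUDENT and STUDENT_COURSE with the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT)")
    conn.execute(
        f"""CREATE TABLE student_course (
                stud_no INTEGER REFERENCES student(stud_no) ON DELETE {on_delete},
                course_no TEXT,
                course_name TEXT)"""
    )
    # Hypothetical STUDENT rows; STUDENT_COURSE rows follow Table 1.
    conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "s1"), (2, "s2")])
    conn.executemany(
        "INSERT INTO student_course VALUES (?, ?, ?)",
        [(1, "C1", "DBMS"), (2, "C2", "Computer Networks"), (1, "C2", "Computer Networks")],
    )
    return conn


def try_execute(conn, sql):
    """Run one statement and report whether the constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


conn = build_db("NO ACTION")
print(try_execute(conn, "INSERT INTO student_course VALUES (7, 'C1', 'DBMS')"))
print(try_execute(conn, "DELETE FROM student WHERE stud_no = 1"))

for action in ("CASCADE", "SET NULL"):
    conn = build_db(action)
    conn.execute("DELETE FROM student WHERE stud_no = 1")
    print(action, conn.execute("SELECT * FROM student_course").fetchall())
```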
Removal of Anomalies
Anomalies in DBMS can be removed by applying normalization. Normalization
involves organizing data into tables and applying rules to ensure data is stored
in a consistent and efficient manner. By reducing data redundancy and ensuring
data integrity, normalization helps to eliminate anomalies and improve the
overall quality of the database.
According to E. F. Codd, the inventor of the relational database, the goals
of normalization include:
It helps in removing all the repeated data from the database.
It helps in removing undesirable deletion, insertion, and update
anomalies.
It helps in making proper and useful relationships between tables.
Key steps include:
1. First Normal Form (1NF): Ensures each column contains atomic values
and removes repeating groups.
2. Second Normal Form (2NF): Eliminates partial dependencies by
ensuring all non-key attributes are fully dependent on the primary key.
3. Third Normal Form (3NF): Removes transitive dependencies by
ensuring non-key attributes depend only on the primary key.
By implementing these normalization steps, the database becomes more
structured, reducing the likelihood of insertion, update, and deletion anomalies.
Conclusion
Ensuring data integrity requires addressing anomalies such as insertion, update,
and deletion problems in the Relational Model. By effectively arranging data,
normalization techniques offer a solution that guarantees consistency and
dependability in relational databases.
1 001 C001
2 056 C005
Primary Key
There can be more than one candidate key in relation out of which one can be
chosen as the primary key. For Example, STUD_NO, as well as STUD_PHONE,
are candidate keys for relation STUDENT but STUD_NO can be chosen as
the primary key (only one out of many candidate keys).
A primary key is a unique key, meaning it can uniquely identify each
record (tuple) in a table.
It must have unique values and cannot contain any duplicate values.
A primary key cannot be NULL, as it needs to provide a valid, unique
identifier for every record.
A primary key does not have to consist of a single column. In some
cases, a composite primary key (made of multiple columns) can be
used to uniquely identify records in a table.
Many database systems build an index on the primary key (and some store
rows ordered by it) for fast access to records using the primary key.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE) , STUD_NO is a
primary key
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
Alternate Key
An alternate key is any candidate key in a table that is not chosen as
the primary key. In other words, all the keys that are not selected as the
primary key are considered alternate keys.
An alternate key is also referred to as a secondary key because it can
uniquely identify records in a table, just like the primary key.
An alternate key can consist of one or more columns (fields) that can
uniquely identify a record, but it is not the primary key
E.g., in the STUDENT table, PHONE is an alternate key when STUD_NO is chosen
as the primary key.
Example:
Consider the table shown above.
STUD_NO, as well as PHONE both,
are candidate keys for relation STUDENT but
PHONE will be an alternate key
(only one out of many candidate keys).
Primary Key, Candidate Key, and Alternate Key
Foreign Key
A foreign key is an attribute in one table that refers to the primary key in
another table. The table that contains the foreign key is called the referencing
table, and the table that is referenced is called the referenced table.
A foreign key in one table points to the primary key in another table,
establishing a relationship between them.
It helps connect two or more tables, enabling you to create
relationships between them. This is essential for maintaining data
integrity and preventing data redundancy.
They act as a cross-reference between the tables.
For example, DNO is a primary key in the DEPT table and a non-key in
EMP
Example:
Refer Table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO
1 005 C001
2 056 C005
It may be worth noting that, unlike the primary key of a relation, a foreign
key can be NULL and may contain duplicate values, i.e., it need not follow the
uniqueness constraint. For example, STUD_NO in the STUDENT_COURSE
relation is not unique. It has been repeated for the first and third tuples. However,
STUD_NO in the STUDENT relation is a primary key, so it must always be
unique and cannot be NULL.
Relation between Primary Key and Foreign Key
Composite Key
Sometimes, a table might not have a single column/attribute that uniquely
identifies all the records of a table. To uniquely identify rows of a table, a
combination of two or more columns/attributes can be used. It still can give
duplicate values in rare cases. So, we need to find the optimal set of attributes
that can uniquely identify rows in a table.
It acts as a primary key if there is no primary key in a table
Two or more attributes are used together to make a composite key .
Different combinations of attributes may give different accuracy in
terms of identifying the rows uniquely.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.
Conclusion
In conclusion, the relational model makes use of a number of keys: Candidate
keys allow for distinct identification, the Primary key serves as the chosen
identifier, Alternate keys offer other choices, and Foreign keys create vital
linkages that guarantee data integrity between tables. The creation of strong and
effective relational databases requires the thoughtful application of these keys.
Example:
A B
1 3
2 3
4 0
1 3
4 0
How to represent functional dependency in DBMS?
A functional dependency is written using arrow notation. For example, if we
have an employee record with fields "EmployeeID", "FirstName" and "LastName",
we can specify the dependency as follows:
EmployeeID -> FirstName, LastName
A functional dependency has two parts: the left-hand side (LHS)
and the right-hand side (RHS) of the arrow (->).
For example, suppose we have a table with attributes "X", "Y" and "Z", and the
attribute "X" can determine the values of the attributes "Y" and "Z":
X -> Y, Z
This notation indicates that the value of attribute "X" determines the values of
attributes "Y" and "Z". So if you know the value of "X", you can also determine the
values of "Y" and "Z".
Types of Functional Dependency in DBMS
The following are some important types of FDs in DBMS:
Trivial Functional Dependency
The dependency of an attribute on a set of attributes is known as trivial
functional dependency if the set of attributes includes that attribute.
Non-trivial Functional Dependency
If a functional dependency X→Y holds true where Y is not a subset of X then
this dependency is called non trivial Functional dependency.
Multivalued Dependency
A multivalued dependency happens when there are at least three attributes (let
us say X, Y and Z), and for a value of X there is a well defined set of values of Y
and a well defined set of values of Z. However, the set of values of Y is
independent of set Z and vice versa.
Semi Non Trivial Functional Dependencies
X -> Y is called semi non-trivial when the intersection of X and Y is not empty.
Transitive Functional Dependency
Transitive functional dependency in DBMS is a relationship between attributes
(columns) of a database table. It occurs when the value of one attribute
determines the value of another attribute through an intermediate (third) attribute.
Conclusion
In conclusion, First Normal Form (1NF) is a key idea in
relational database design. It ensures that data is organized to facilitate
data processing, reduce redundancy, and support data integrity. 1NF establishes
the foundation for the higher normal forms, which further improve
the correctness and efficiency of database systems, by requiring atomic values
and forbidding repeating groups within rows.
Second Normal Form (2NF)
Normalization is a structural method whereby tables are broken down in a
controlled manner with an aim of reducing data redundancy. It refers to the
process of arranging the attributes and relations of a database in order to
minimize data anomalies such as update, insert and delete anomalies.
Normalization is usually a sequence of steps which are also called normal forms
(NF). This step helps improve data integrity, minimize redundancy, and ensure
that your databases are both efficient and manageable.
What is Second Normal Form (2NF)?
Second Normal Form (2NF) is based on the concept of fully functional
dependency. It is a way to organize a database table so that it
reduces redundancy and ensures data consistency. For a table to be in 2NF, it
must first meet the requirements of First Normal Form (1NF), meaning all
columns should contain single, indivisible values without any repeating groups.
Additionally, the table should not have partial dependencies.
The primary goal of Second Normal Form is to eliminate partial dependencies.
A partial dependency happens when a non-prime attribute (an attribute not part
of a candidate key) depends on only a part of a composite primary key, rather
than on the entire key. Removing these partial dependencies helps in reducing
redundancy and preventing update anomalies.
Example of Second Normal Form (2NF)
Consider a table storing information about students, courses, and their fees:
There are many courses having the same course fee. Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO.
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the
table.
COURSE_FEE is a non-prime attribute because it is not part of the
candidate key {STUD_NO, COURSE_NO}.
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
Therefore, Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this
relation is not in 2NF.
In 2NF, we eliminate such dependencies by breaking the table into two
separate tables:
1. A table that links students and courses.
2. A table that stores course fees.
Now, each table is in 2NF:
The Course Table ensures that COURSE_FEE depends only on COURSE_NO.
The Student-Course Table ensures there are no partial dependencies
because it only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is free
from partial dependencies. This makes the database more efficient and easier to
maintain.
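The decomposition can be sketched as two projections. The rows below are illustrative sample data (the original table is not reproduced here), and project is a hypothetical helper; the final check confirms that rejoining the two tables on COURSE_NO gives back the original rows.

```python
def project(rows, columns):
    """Project rows onto the given columns, dropping duplicate tuples."""
    result = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in result:
            result.append(item)
    return result


# Illustrative data: (STUD_NO, COURSE_NO, COURSE_FEE)
COLUMNS = ("STUD_NO", "COURSE_NO", "COURSE_FEE")
ROWS = [dict(zip(COLUMNS, r)) for r in [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]]

student_course = project(ROWS, ("STUD_NO", "COURSE_NO"))  # links students and courses
course = project(ROWS, ("COURSE_NO", "COURSE_FEE"))       # stores each fee once

# Rejoin on COURSE_NO to confirm no information was lost.
rejoined = {
    (stud, course_no, fee)
    for stud, course_no in student_course
    for other_no, fee in course
    if course_no == other_no
}
original = {tuple(row[c] for c in COLUMNS) for row in ROWS}
print(len(course), "course rows;", "lossless:", rejoined == original)
```

In this sample, the fee for C1 appears twice in the original rows but only once in the course table, which is the redundancy that 2NF removes.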
Why is 2NF Important?
By ensuring that a database table adheres to Second Normal Form, we achieve
several key benefits:
1. Reduces Redundancy: In our example, we no longer store the same course fee
multiple times. Instead, we store it once in the Course Fee table and reference it
in the Student-Course table.
2. Minimizes Update Anomalies: With data being centralized in the right tables,
you’re less likely to run into problems when you update or delete information.
For example, if a course fee changes, you only need to update it in one place.
3. Improves Data Integrity: By eliminating partial dependencies, 2NF ensures
that the database structure is logical, which in turn ensures that data
relationships are consistent.
4. Enhances Query Efficiency: Queries will be more efficient, as tables are
smaller and more focused on specific data, making it faster to retrieve the
necessary information.
What is Partial Dependency?
A functional dependency denoted as X→Y where X and Y are an attribute set of
a relation, is a partial dependency , if some attribute A∈X can be removed and
the dependency still holds. For example, if you have a functional dependency
X→Y, where X is a composite candidate key (made of multiple columns), and we
can remove one column from X, but the dependency still works, then it’s a partial
dependency.
In a composite key (a key made of multiple attributes), a partial dependency
happens when one of the non-prime attributes depends only on a part of the
composite key. Here’s how to identify partial dependencies in your database:
Look for functional dependencies where one attribute depends on a
part of the primary key, not the entire key.
If an attribute (like COURSE_FEE in our example) depends on just a part
of the key (COURSE_NO), it’s a partial dependency.
To remove partial dependencies, break the table into smaller tables
that store only relevant data together.
Conclusion
In conclusion, Second Normal Form (2NF) helps make databases more
organized by removing partial dependencies. It reduces duplicate data, prevents
errors, and ensures data is stored accurately. Following 2NF makes it easier to
manage, update, and retrieve information from your database. Whether we’re
building a small application or a large enterprise system, following 2NF
principles will lead to better performance and data consistency.
While Third Normal Form (3NF) is generally sufficient for organizing relational
databases, it may not completely eliminate redundancy. Redundancy can still
occur if there is a dependency X→Y where X is not a superkey. This issue is
addressed by a stronger normal form known as Boyce-Codd Normal Form
(BCNF).
Applying the rules of 2NF and 3NF can help identify some redundancies caused
by dependencies whose determinants are not candidate keys. However, even with these
rules, certain dependencies may still lead to redundancy in 3NF. To overcome this
limitation, BCNF was introduced by Codd in 1974 as a more robust solution.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form
(3NF) that ensures a more simplified and efficient database design. It enforces
that every non-trivial functional dependency must have a superkey on its left-
hand side. This approach addresses potential issues with candidate keys and
ensures the database is free from redundancy.
BCNF eliminates redundancy more effectively than 3NF by strictly requiring
that all functional dependencies originate from super-keys.
BCNF is essential for good database schema design in higher-level systems
where consistency and efficiency are important, particularly when there are many
candidate keys (as one often finds with a delivery system).
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every functional dependency (FD) X−>Y in a
given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and
make sure that they are candidate keys.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If R is
found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF. The 1NF has the least restrictive constraint – it only
requires a relation R to have atomic values in each tuple. The 2NF has a slightly
more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is
less restrictive than the BCNF. In this manner, the restriction increases as we
traverse down the hierarchy.
The following examples illustrate the properties of BCNF.
Example 1
Consider a relation R with attributes (student, teacher, subject).
FD: { (student, Teacher) -> subject, (student, subject) -> Teacher,
(Teacher) -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF (since there is no transitive dependency).
A relation R is in BCNF if, for every non-trivial FD X->Y, X is a
superkey.
The above relation is not in BCNF because, in the FD (teacher->subject),
teacher is not a key. This relation suffers from anomalies.
For example, if we delete the student Tahira, we will also lose the
information that N.Gupta teaches C. This issue occurs because the
teacher is a determinant but not a candidate key.
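The BCNF test for this example can be sketched in code: an FD violates BCNF when it is non-trivial and the closure of its left side is not the whole relation. The helper names closure and bcnf_violations are illustrative, and the check only inspects the FDs that are listed explicitly.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the listed non-trivial FDs whose left side is not a superkey."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != relation
    ]


R = ("student", "teacher", "subject")
FDS = [
    (("student", "teacher"), ("subject",)),
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]

for lhs, rhs in bcnf_violations(R, FDS):
    print(", ".join(lhs), "->", ", ".join(rhs), "violates BCNF")
```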
101 201
101 202
102 401
102 402