Module 3 Relational Database Design
• Partial Dependency
• Transitive Dependency
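• Multi-valued Dependency (MVD): One attribute determines a set of values of another attribute, independently of the other attributes in the table. It's like saying, "When one thing happens, another thing can happen independently."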
Example: Imagine you have a table with information about employees and their projects:
Now, let's talk about a candidate key - a unique identifier for each row.
In this case, let's consider the combination of StudentID and CourseID
as a candidate key.
A candidate key dependency would mean that an attribute depends on
the entire candidate key, not just part of it.
Now, let's talk about a full functional dependency. Let's say we have a
full functional dependency: {EmployeeID, ProjectID} -> ProjectName.
This means that the ProjectName depends on both the EmployeeID and
the ProjectID together, not just on EmployeeID or ProjectID separately.
student_dob.
• Inference Rules
• Armstrong's axioms provide the basis for the inference rules.
• The functional dependencies that hold in a relational database
can be deduced using Armstrong's axioms.
• An inference rule can be regarded as a kind of assertion.
• It can be used to derive additional functional dependencies from an
initial set of FDs, beyond the ones already listed.
• There are six inference rules for functional dependencies:
IR1 - Reflexive rule: If b (a set of attributes) is a subset of a (another set of attributes), then a determines b.
If a ⊇ b, then a → b
A        | B
Dog      | Dog
Cat      | Cat
Elephant | *****
Lion     | *****
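Augmentation Rule (IR2)
If X determines Y, then XZ determines YZ for any set of attributes Z.
1. If X → Y then XZ → YZ
Example: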
Student ID Name Grade Set C: {Subject}
Transitive Rule (IR3)
If X determines Y and Y determines Z, then X also determines Z.
1. If X → Y and Y → Z then X → Z
Union Rule (IR4)
This rule is also known as the additive rule. If X determines Y
and X determines Z, then X also determines both Y and Z.
1. If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmenting with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
Decomposition Rule (IR5)
This rule is the reverse of the Union rule and is also known as
the projective rule.
1. If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1 Rule)
5. X → Z (using IR3 on 1 and 4)
Pseudo transitive Rule (IR6)
In the pseudo transitive rule, if X determines Y, and YZ
determines W, then XZ also determines W.
1. If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
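The proofs of the derived rules can be replayed mechanically. The following is a minimal sketch, not a standard library API: the helper names `fd`, `reflexive`, `augment` and `transitive` are made up for illustration, and attribute names are assumed to be single letters so that a string such as "YZ" stands for the set {Y, Z}.

```python
from typing import FrozenSet, Tuple

# A functional dependency is a pair (left-hand side, right-hand side).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def reflexive(a: str, b: str) -> FD:
    """IR1: if b is a subset of a, then a -> b."""
    if not set(b) <= set(a):
        raise ValueError("IR1 requires b to be a subset of a")
    return fd(a, b)


def augment(dep: FD, z: str) -> FD:
    """IR2: from X -> Y derive XZ -> YZ."""
    lhs, rhs = dep
    return lhs | frozenset(z), rhs | frozenset(z)


def transitive(first: FD, second: FD) -> FD:
    """IR3: from X -> Y and Y -> Z derive X -> Z."""
    if first[1] != second[0]:
        raise ValueError("IR3 requires the middle attribute sets to match")
    return first[0], second[1]


# Union rule (IR4): X -> Y and X -> Z give X -> YZ.
step3 = augment(fd("X", "Y"), "X")          # X -> XY
step4 = augment(fd("X", "Z"), "Y")          # XY -> YZ
union_result = transitive(step3, step4)     # X -> YZ

# Decomposition rule (IR5): X -> YZ gives X -> Y.
decomp_result = transitive(fd("X", "YZ"), reflexive("YZ", "Y"))

# Pseudo transitive rule (IR6): X -> Y and YZ -> W give XZ -> W.
pseudo_result = transitive(augment(fd("X", "Y"), "Z"), fd("YZ", "W"))

print(union_result, decomp_result, pseudo_result)
```

Each derived rule uses only IR1, IR2 and IR3, which is why those three are treated as the primary rules.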
Anomalies
• What are the Anomalies in DBMS?
• Normalization is required to organise data in a database. If it is not
done, the overall data integrity in the database will deteriorate over
time, mainly because of data anomalies. These DBMS
anomalies are common, and they result in data that doesn’t match the
real world that the database claims to reflect.
A database anomaly is a fault in a database that usually emerges as a result of poor planning and storing
everything in a single flat table. In most cases, it is removed through the normalization procedure, which involves
splitting tables and joining them again when needed. The purpose of the normalization process is to avoid
table designs that would produce anomalies in the DB.
Example
Consider a manufacturing firm that keeps worker information in a table called employee, which has four columns:
w_id for the employee’s id, w_name for the employee’s name, w_address for the employee’s address, and w_dept for
the employee’s department. The table will look like this at some point:
The table above has not been normalized. We’ll look at the issues that arise when the table isn’t normalized.
Type of Anomalies in DBMS
Various types of anomalies can occur in a DB. Anomalies caused by redundancy are a frequent topic in exams and in job interviews,
and they can be identified and fixed through normalization.
Anomalies in databases can be divided into three major categories:
1. Update
2. Insert
3. Delete
Update Anomaly
Employee David has two rows in the table given above since he works in two different departments. If we want to change David’s address, we must do so in two rows,
else the data would become inconsistent.
If the proper address is updated in one of the departments but not in another, David will have two different addresses in the database, which is incorrect and leads to
inconsistent data.
Insert Anomaly
If a new worker joins the firm and is currently unassigned to any department, we will be unable to put the data into the table because the w_dept field does not allow
nulls.
Delete Anomaly
If the corporation closes the department F890 at some point in the future, deleting the rows with w_dept as F890 will also erase the information of employee Mike,
who is solely assigned to this department.
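The three anomalies can be reproduced with Python's built-in sqlite3 module. This is only a sketch: the employee table is not shown in these notes, so the rows, addresses and the department codes other than F890 are made-up assumptions. Only the column names, David's two departments, Mike's department F890 and the no-nulls rule on w_dept come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           w_id      INTEGER NOT NULL,
           w_name    TEXT    NOT NULL,
           w_address TEXT    NOT NULL,
           w_dept    TEXT    NOT NULL  -- nulls are not allowed, as stated above
       )"""
)
# Hypothetical sample rows (assumed for illustration).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "David", "Delhi", "F001"),
        (101, "David", "Delhi", "F002"),
        (123, "Mike", "Agra", "F890"),
    ],
)

# Update anomaly: only one of David's two rows is changed.
conn.execute(
    "UPDATE employee SET w_address = 'Mumbai' WHERE w_id = 101 AND w_dept = 'F001'"
)
david_addresses = {
    row[0] for row in conn.execute("SELECT w_address FROM employee WHERE w_id = 101")
}

# Insert anomaly: a worker without a department cannot be stored.
try:
    conn.execute("INSERT INTO employee VALUES (200, 'Nina', 'Pune', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing department F890 also erases Mike.
conn.execute("DELETE FROM employee WHERE w_dept = 'F890'")
mike_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE w_name = 'Mike'"
).fetchone()[0]

print(david_addresses, insert_rejected, mike_rows)
```

After the partial update David has two different addresses, the department-less insert is rejected, and no row for Mike remains.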
What is Normalization?
• Minimizing Errors: Redundancy can cause errors. For example, if Alice's name is stored in many
rows and is misspelled, it must be corrected in every row. Normalization reduces the chance of these errors.
• Improving Efficiency: By organizing data smartly, normalization makes it easier to update, insert,
and delete information without introducing problems.
Goals of Normalization:
• Eliminate Redundancy: Store each piece of information in one place, avoiding unnecessary
repetition.
• Minimize Data Anomalies: Reduce the risk of errors, like conflicting information or incomplete
updates.
• Improve Data Integrity: Ensure that relationships between pieces of information are well-defined
and consistent.
Data Anomalies
• Data Anomalies: Data anomalies are problems or irregularities that can occur in a database when
it is not properly designed or organized.
• These anomalies can lead to inconsistencies, errors, and difficulties in managing and retrieving
information.
• There are three main types of data anomalies: insertion anomalies, update anomalies, and deletion
anomalies.
Insertion Anomalies:
• What: These occur when you try to add new information to the database, but you can't because of incomplete data.
• Example: In a table tracking students and their courses, if you can't add a new course without assigning it to a student,
you have an insertion anomaly.
StudentsCourses Table:
Now, let's say you want to add a new course to the database, but there's a requirement that every course must be
assigned to a student. This could lead to an insertion anomaly because you cannot add a new course without assigning it
to a student.
Update Anomalies:
• What: These happen when information is updated in one place but not everywhere it needs to be updated.
• Example: If a student changes their name, and you update it in one record but forget to update it in all the records, you have an update
anomaly.
StudentsCourses Table:
Suppose Alice decides to change her name from "Alice" to "Alicia." An update to one record may be straightforward:
However, if you forget to update all occurrences of Alice's name, you would have an update anomaly:
Deletion Anomalies:
• What: These occur when deleting information unintentionally removes other related information.
• Example: If removing a course deletes information about the instructor who teaches that course, you have a deletion anomaly.
StudentsCourses Table:
Suppose you want to remove a course from the table. However, if deleting a course also removes information about the students enrolled in that
course, you have a deletion anomaly.
Normalization is a database design technique that involves organizing tables and relationships in a relational database to reduce redundancy and
improve data integrity. There are several normal forms, each building on the previous one. The most commonly discussed normal forms are:
1. First Normal Form (1NF):
Ensures that the values in each column of a table are atomic (indivisible) and that there are no repeating groups or arrays. It deals with
basic structure issues.
2. Second Normal Form (2NF):
Builds on 1NF and eliminates partial dependencies. A table is in 2NF if it's in 1NF, and no non-prime attribute is dependent on only a part
of any candidate key.
3. Third Normal Form (3NF):
Extends 2NF by removing transitive dependencies. A table is in 3NF if it's in 2NF, and no non-prime attribute depends transitively on a
candidate key (non-prime attributes are not dependent on other non-prime attributes).
4. Boyce-Codd Normal Form (BCNF):
A stricter form that addresses certain anomalies not covered by 3NF. A table is in BCNF if, for every non-trivial functional
dependency, the determinant is a superkey.
5. Fourth Normal Form (4NF):
Focuses on multi-valued dependencies. A table is in 4NF if it's in BCNF, and every non-trivial multi-valued dependency has a superkey as its determinant.
6. Fifth Normal Form (5NF):
Addresses join dependencies. A table is in 5NF if it's in 4NF, and every non-trivial join dependency is implied by its candidate keys.
1st NORMAL FORM
• Definition:
• A relation (or table) is considered to be in 1NF if it satisfies the
following conditions:
• Every attribute (column) in the relation must be a single-valued
attribute.
• The attribute domain (the set of possible values for an attribute)
remains consistent.
• Each attribute has a unique name within the relation.
• The order in which data is stored does not impact the validity of the
relation.
• Why Is 1NF Important?
• Supports data integrity: By ensuring atomic values and removing
repeating groups, 1NF makes data easier to query and process.
• Is the first step toward preventing insertion, deletion, and update
anomalies, which occur when data is not properly normalized.
• Example:
• Let’s consider a relation called STUDENT:
• It contains attributes like ID, Name, and Courses.
• The Courses attribute is multi-valued, violating 1NF.
Decomposed Relation (in 1NF):
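A minimal sketch of the decomposition into 1NF follows. The STUDENT rows are not shown in these notes, so the sample data and the helper name `to_1nf` are assumptions made for illustration. Only the attribute names ID, Name and Courses come from the text.

```python
# Hypothetical STUDENT rows; Courses holds several values, which violates 1NF.
students = [
    {"ID": 1, "Name": "Alice", "Courses": ["DBMS", "OS"]},
    {"ID": 2, "Name": "Bob", "Courses": ["DBMS"]},
]


def to_1nf(rows):
    """Emit one row per (student, course) so every value is atomic."""
    return [
        {"ID": row["ID"], "Name": row["Name"], "Course": course}
        for row in rows
        for course in row["Courses"]
    ]


student_1nf = to_1nf(students)
for row in student_1nf:
    print(row)
```

Every row now holds a single course, at the cost of repeating ID and Name. That repetition is what the higher normal forms deal with.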
3. Partial dependencies can arise only when a candidate key is composite (made up of multiple attributes), so 2NF matters mainly for tables with composite keys.
Example:
Imagine a table with student information:
STUDENT_NO | COURSE_NO | COURSE_FEE
1 | C1 | 1000
2 | C2 | 1500
1 | C4 | 2000
4 | C3 | 1000
4 | C1 | 1000
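In this table the key is {STUDENT_NO, COURSE_NO}, but COURSE_FEE appears to depend on COURSE_NO alone, which is a partial dependency. The sketch below checks this against the sample rows and then splits the table. The helper `fd_holds` and the relation names `enrolment` and `course` are illustrative assumptions. A check on sample rows can only show that a dependency is not contradicted by the data, not that it holds in general.

```python
# Rows from the table above: (STUDENT_NO, COURSE_NO, COURSE_FEE).
rows = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
]


def fd_holds(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always imply
    equal values in the rhs columns (columns given by index)."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# COURSE_NO -> COURSE_FEE holds on the sample: a partial dependency.
fee_by_course = fd_holds(rows, lhs=[1], rhs=[2])
# STUDENT_NO -> COURSE_FEE does not hold (student 1 pays 1000 and 2000).
fee_by_student = fd_holds(rows, lhs=[0], rhs=[2])

# 2NF decomposition: move the partially dependent attribute to its own table.
enrolment = sorted({(student, course) for student, course, _ in rows})
course = sorted({(course_no, fee) for _, course_no, fee in rows})

print(fee_by_course, fee_by_student)
print(enrolment)
print(course)
```

The fee for C1 is now stored once in `course` instead of once per enrolled student.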
• A given relation is said to be in its third normal form when it is in 2NF and has no transitive
dependency. Meaning, when no transitive dependency exists for the attributes that are
non-prime, then the relation can be said to be in 3NF.
• In simpler words,
• In a relation that is in 2NF, when none of the non-prime attributes transitively
depend on the primary key, then we can say that the relation is in the third normal form, or
3NF.
• Rules Followed in 3rd Normal Form in DBMS
• We can say that a relation is in the third normal form when at least
one of these conditions holds for every non-trivial functional
dependency P -> Q:
• P acts as a super key.
• Q acts as a prime attribute. Meaning, every element of Q forms a
part of some candidate key.
• Example:
• Consider a student table:
STUDENT_NO | STUD_NAME | STUD_STATE | STUD_COUNTRY | STUD_AGE
1 | Alice | CA | USA | 20
2 | Bob | NY | USA | 22
3 | Carol | TX | USA | 21
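A sketch of a 3NF decomposition of this table follows. It assumes that STUD_STATE determines STUD_COUNTRY, so that STUD_COUNTRY depends on STUDENT_NO only transitively. The text does not state this dependency, and the relation names `student` and `state_country` are made up for illustration.

```python
# Rows from the table above:
# (STUDENT_NO, STUD_NAME, STUD_STATE, STUD_COUNTRY, STUD_AGE)
students = [
    (1, "Alice", "CA", "USA", 20),
    (2, "Bob", "NY", "USA", 22),
    (3, "Carol", "TX", "USA", 21),
]

# Assumed dependencies: STUDENT_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY.
# Remove the transitively dependent attribute from the student relation ...
student = [(no, name, state, age) for no, name, state, _, age in students]
# ... and keep it in a relation keyed by its real determinant.
state_country = sorted({(state, country) for _, _, state, country, _ in students})

# Joining the two relations on STUD_STATE gives back the original rows.
country_of = dict(state_country)
rejoined = [
    (no, name, state, country_of[state], age) for no, name, state, age in student
]

print(student)
print(state_country)
print(rejoined == students)
```

The country of each state is stored once, and the join shows that no information was lost by the split.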
• BCNF (Boyce Codd Normal Form) is an advanced version of the third normal
form (3NF), and it is often also known as the 3.5 Normal Form. 3NF doesn't
remove all redundancy in cases where, for a functional
dependency (say, A->B), A is not a super key of the table. To deal with
such situations, BCNF was introduced.
• BCNF is based on functional dependencies, and all the candidate keys of the
relation are taken into consideration. BCNF is stricter than 3NF and has some
additional constraints along with the general definition of 3NF.
• A table or relation is said to be in BCNF in DBMS if the table or the
relation is already in 3NF, and also, for every non-trivial functional
dependency (let's say, X->Y), X is a super key (a candidate key is a
minimal super key). In simple terms, the left-hand side of every such
dependency must identify a row uniquely.
• Rules for BCNF in DBMS
• A table or relation is said to be in BCNF (Boyce Codd Normal Form)
if it satisfies the following two conditions that we have already studied
in its definition:
• It should satisfy all the conditions of the Third Normal Form (3NF).
• For any non-trivial functional dependency (A->B), A should be a super
key. Unlike 3NF, no exception is made when B is a prime
attribute.
Example:
Let’s consider a student database:
Stu_ID | Stu_Branch | Stu_Course  | Branch_Number | Stu_Course_No
101    | CS&E       | DBMS        | B_001         | 201
101    | CS&E       | Comp Net    | B_001         | 202
102    | ECE        | VLSI Tech   | B_003         | 401
102    | ECE        | Mobile Comm | B_003         | 402
Functional Dependencies:
Stu_ID → Stu_Branch
Stu_Course → {Branch_Number, Stu_Course_No}
Candidate Keys: {Stu_ID, Stu_Course}
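Whether these dependencies violate BCNF can be checked by testing if each left-hand side is a super key, that is, whether its attribute closure covers the whole relation. The following is a minimal sketch under that definition. The helper name `closure` is an assumption for illustration.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attributes = {"Stu_ID", "Stu_Branch", "Stu_Course", "Branch_Number", "Stu_Course_No"}
fds = [
    ({"Stu_ID"}, {"Stu_Branch"}),
    ({"Stu_Course"}, {"Branch_Number", "Stu_Course_No"}),
]

# An FD violates BCNF when its left-hand side is not a super key.
violations = [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attributes]

# The stated candidate key does determine every attribute.
key_closure = closure({"Stu_ID", "Stu_Course"}, fds)

for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Both dependencies are flagged, because neither Stu_ID nor Stu_Course alone determines all attributes. This suggests splitting the table so that each determinant becomes the key of its own relation.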
4th Normal Form
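• A relation will be in 4NF if it is in Boyce Codd normal form and has no non-trivial multi-valued dependency
other than those whose determinant is a super key.
• For attributes A and B, if a single value of A is associated with multiple values of B, independently of the
other attributes, then the relation has a multi-valued dependency, written A →→ B.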
Example
STUDENT
STU_ID | COURSE    | HOBBY
21     | Computer  | Dancing
21     | Math      | Singing
34     | Chemistry | Dancing
74     | Biology   | Cricket
59     | Physics   | Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. STU_ID therefore multi-determines both COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
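A sketch of this decomposition, using the sample rows of the STUDENT table, follows. The relation names `student_course` and `student_hobby` are assumptions made for illustration.

```python
# Rows of the STUDENT table: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Store each independent fact about a student in its own relation.
student_course = sorted({(stu_id, course) for stu_id, course, _ in student})
student_hobby = sorted({(stu_id, hobby) for stu_id, _, hobby in student})

# Joining on STU_ID pairs every course of a student with every hobby.
rejoined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for hid, hobby in student_hobby
    if sid == hid
)

print(student_course)
print(student_hobby)
print(len(rejoined))
```

The join returns four rows for student 21 (every course paired with every hobby), which is what independence of courses and hobbies means. The five-row sample lists only two of those pairings, so a single table would need extra rows to represent the same facts.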
5NF
• A relation is in 5NF if it is in 4NF, contains no non-trivial join dependency
that is not implied by its candidate keys, and its decomposition joins back losslessly.
• 5NF is reached when the tables have been broken into as many smaller tables as
possible without losing information, in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
In the above table, John takes both the Computer and Math classes for Semester 1 but he doesn't take the Math class
for Semester 2. In this case, a combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking
that subject, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary key,
so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
CLOSURE SET OF ATTRIBUTES
• The closure of a set of attributes in a database is all the other attributes that you can
figure out based on the given set of attributes.
• You use rules called functional dependencies to determine these additional
attributes.
• This helps in understanding how different attributes are related to each other in a
database table.
• Finding the closure of a set of attributes is needed for problems related to
NORMALIZATION.
• For example, you need to know how to compute the closure of a set of attributes to
check if a set of attributes is a candidate key or a superkey.
• You also need this algorithm to decompose non-normal tables into NORMAL
FORMS.
• The closure of a set of attributes X is the set of those attributes that can be
functionally determined from X. The closure of X is denoted as X+.
• The closure of X is the set of all attributes such that two records that have
the same value of X also have the same value of X+.
• Steps to Find Closure of an Attribute Set
• The following steps are used to find the closure of an attribute set:
• Step-01: Add the attributes contained in the attribute set for which
closure is being calculated to the result set.
• Step-02: Repeatedly add to the result set the attributes which can be
functionally determined from the attributes already contained in the
result set, until no more attributes can be added.
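The two steps can be sketched as follows. The relation R(A, B, C, D, E) and its dependencies are hypothetical, attribute names are assumed to be single letters, and the helpers `closure_with_steps` and `candidate_keys` are illustrative names rather than a standard API. The key search simply tries every attribute subset, which is only practical for small relations.

```python
from itertools import combinations


def closure_with_steps(attrs, fds):
    """Compute the closure of attrs and record which FDs were applied."""
    result = set(attrs)                      # Step-01: start with the set itself
    steps = []
    changed = True
    while changed:                           # Step-02: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            new = set(rhs) - result
            if set(lhs) <= result and new:
                result |= new
                steps.append((lhs, rhs))
                changed = True
    return result, steps


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(key) <= set(combo) for key in keys):
                continue                     # a superset of a key is not minimal
            if closure_with_steps(combo, fds)[0] == set(attributes):
                keys.append(combo)
    return keys


# Hypothetical relation and dependencies: A -> B, BC -> D, D -> E.
R = "ABCDE"
fds = [("A", "B"), ("BC", "D"), ("D", "E")]

ac_closure, ac_steps = closure_with_steps("AC", fds)
keys = candidate_keys(R, fds)

print(sorted(ac_closure), ac_steps)
print(keys)
```

Here {A, C}+ contains every attribute of R, so {A, C} is a super key, and because no smaller set has that property it is also a candidate key.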