Module3 Dbms
NORMALIZATION
• Normalization is the process of removing redundant
data from your tables in order to improve storage
efficiency, data integrity and scalability.
• This improvement is balanced against an increase in
complexity and potential performance losses from the
joining of the normalized tables at query-time.
• There are two goals of the normalization process:
eliminating redundant data (for example, storing the
same data in more than one table) and ensuring data
dependencies make sense (only storing related data in
a table).
• Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure
that data is logically stored.
Why Do We Need Normalization?
• Normalization is a goal of a well-designed Relational Database
Management System (RDBMS). It is a step-by-step set of rules
by which data is put into its simplest form.
• We normalize a relational database
for the following reasons:
• Complex queries required by the user should be easy to
handle.
• When normalization decomposes a relation into smaller
relations with fewer attributes, the resulting
relations, whenever joined, must produce the original
relation without any extra rows. The join operations
can be performed in any order. This is known as
Lossless Join decomposition.
• For instance, for the row that contains the course Visual Basic, we
fill in the remaining “blank” entries by copying the values of the
Course_Code, Course_Name and Teacher_Name columns. This
row now has a single value in each of its entries. We repeat
a similar process for the students of the remaining two courses.
• The normalized representation of the
STUDENT table is:
• Second Approach: Decomposition of the table
• The second approach for normalizing a table
requires that the table be decomposed into two
new tables that will replace the original table.
• Decomposition of a relation involves separating
the attributes of the relation to create the
schemes of two new relations. However, before
decomposing the original table it is necessary to
identify an attribute or a set of its attributes that
can be used as table identifiers.
• Rule of decomposition
• One of the two tables contains the table
identifier of the original table and all the non-
repeating attributes.
• The other table contains a copy of the table
identifier and all the repeating attributes.
• To transform these tables into relations, it
may be necessary to identify a Primary Key for
each table. The tuples of the new relations are
the projections of the original relation onto
their respective schemes.
• To normalize the STUDENT table we need to
replace it by two new tables.
• The first table COURSE contains the table
identifier and the non-repeating attributes. These
attributes are Course_Code (the table
identifier), Course_Name, and Teacher_Name.
• The second table contains the table identifier
and all the repeating attributes. Therefore, the
attributes of the COURSE_STUDENT table are
Course_Code, Rollno, Name, System_Used,
Hourly_Rate and Total_Hrs.
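The rule of decomposition can be sketched in Python as follows. This is only an illustration: the function name `decompose`, the nested `Students` key used to hold the repeating group, and every data value are assumptions; only the column names come from the text (written with underscores).

```python
def decompose(table, identifier, repeating_group):
    """Split an unnormalized table into (parent_rows, child_rows)."""
    parent, child = [], []
    for record in table:
        # First table: the identifier plus all non-repeating attributes.
        parent.append({k: v for k, v in record.items() if k != repeating_group})
        # Second table: a copy of the identifier plus the repeating attributes.
        for item in record[repeating_group]:
            child.append({identifier: record[identifier], **item})
    return parent, child


# Hypothetical sample data; the values are invented for illustration.
STUDENT = [
    {"Course_Code": "C1", "Course_Name": "Visual Basic", "Teacher_Name": "T1",
     "Students": [
         {"Rollno": 1, "Name": "S1", "System_Used": "PC-1",
          "Hourly_Rate": 20, "Total_Hrs": 7},
         {"Rollno": 2, "Name": "S2", "System_Used": "PC-2",
          "Hourly_Rate": 25, "Total_Hrs": 3},
     ]},
    {"Course_Code": "C2", "Course_Name": "Course 2", "Teacher_Name": "T2",
     "Students": [
         {"Rollno": 1, "Name": "S1", "System_Used": "PC-1",
          "Hourly_Rate": 20, "Total_Hrs": 5},
     ]},
]

COURSE, COURSE_STUDENT = decompose(STUDENT, "Course_Code", "Students")
```

In this sketch COURSE has one row per course, and COURSE_STUDENT has one row per (course, student) pair, each carrying a copy of Course_Code.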
Functional Dependency
• The set of all attributes that can be
functionally determined from an attribute set
is called the closure of that attribute set.
• The closure of an attribute set {X} is denoted as {X}+.
Steps to Find Closure of an Attribute Set-
• Step-01:
• Add the attributes contained in the attribute set for
which closure is being calculated to the result set.
• Step-02:
• Recursively add the attributes to the result set which
can be functionally determined from the attributes
already contained in the result set.
Example
Consider a relation R ( A , B , C , D , E , F , G ) with
the functional dependencies-
• A → BC
• BC → DE
• D→F
• CF → G
Now, let us find the closure of some attributes
and attribute sets-
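As a minimal sketch, the two steps can be written in Python and applied to the relation R above. The function name `closure` and the representation of each functional dependency as a pair of attribute sets are illustrative assumptions, and attribute names are assumed to be single letters.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    # Step-01: start with the attributes of the set itself.
    result = set(attrs)
    # Step-02: keep adding attributes determined by what is already in the result.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Functional dependencies of R(A, B, C, D, E, F, G) from the example.
FDS = [
    (set("A"), set("BC")),   # A -> BC
    (set("BC"), set("DE")),  # BC -> DE
    (set("D"), set("F")),    # D -> F
    (set("CF"), set("G")),   # CF -> G
]

print(sorted(closure("A", FDS)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", FDS)))   # ['D', 'F']
print(sorted(closure("BC", FDS)))  # ['B', 'C', 'D', 'E', 'F', 'G']
```

Since {A}+ contains every attribute of R, A is a super key of R under these dependencies.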
• Total Number of Super Keys-
Total number of super keys possible = 2 x 2 x 2 x 2 = 16.
Steps To Find Canonical Cover-
Step-01:
Write the given set of functional dependencies in such a way that each
functional dependency contains exactly one attribute on its right side.
Step-02:
Consider each functional dependency one by one from the set obtained in
Step-01.
Determine whether it is essential or non-essential.
To determine whether a functional dependency is essential or not, compute
the closure of its left side-
• Once by considering that the particular functional dependency is present
in the set
• Once by considering that the particular functional dependency is not
present in the set
Case-01: Results Come Out to be Same-
• The presence or absence of that functional dependency does
not make any difference.
• Thus, it is non-essential.
• Eliminate that functional dependency from the set.
Case-02: Results Come Out to be Different-
• The presence or absence of that functional dependency makes a difference.
• Thus, it is essential.
• Do not eliminate that functional dependency from the set.
Step-03:
Consider the set of functional dependencies obtained in Step-02.
Check whether any functional dependency contains more than one attribute on its left side.
Case-01: No-
• There exists no functional dependency containing more than one attribute on its left side.
• In this case, the set obtained in Step-02 is the canonical cover.
Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.
Use the following steps to perform a check-
1. Consider a functional dependency.
2. Compute the closure of all the possible subsets of the left side of that functional dependency.
3. If any of the subsets produces the same closure result as produced by the entire left side, then
replace the left side with that subset.
4. After this step is complete, the set obtained is the canonical cover.
Considering WZ → X
(WZ)+ = {W,Z}
(WZ)+ = {W,Z,X,Y}
Ignoring WZ → X
(WZ)+ = {W,Z}
(WZ)+ = {W,Z,Y}
(WZ)+ = {W,Z,Y,X}
X → W
WZ → Y
Y → Z
Problem: Find the minimal cover of the following set of functional dependencies:
{A → BC, B → C, AB → D}
• Solution:
A → BC is decomposed into A → B and A → C
So the FDs are
1: A → B
2: A → C
3: B → C
4: AB → D
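Test each FD in turn to see whether it is essential.
1. A → B
Considering A → B, the set is
1: A → B
2: A → C
3: B → C
4: AB → D
{A}+ = {A}
= {A,B} -------- (A → B)
= {A,B,D} -------- (AB → D)
= {A,B,C,D} -------- (B → C)
Ignoring A → B
{A}+ = {A}
= {A,C} -------- (A → C)
The results differ, so A → B is essential.
2. A → C
Considering A → C
{A}+ = {A}
= {A,C} -------- (A → C)
= {A,B,C} -------- (A → B)
= {A,B,C,D} -------- (AB → D)
Ignoring A → C
{A}+ = {A}
= {A,B} -------- (A → B)
= {A,B,C} -------- (B → C)
= {A,B,C,D} -------- (AB → D)
The results are the same, so A → C is non-essential and is eliminated.
3. B → C
Considering B → C
{B}+ = {B}
= {B,C} -------- (B → C)
Ignoring B → C
{B}+ = {B}
The results differ, so B → C is essential.
4. AB → D
Considering AB → D
{AB}+ = {A,B}
= {A,B,D} -------- (AB → D)
= {A,B,D,C} -------- (B → C)
Ignoring AB → D
{AB}+ = {A,B}
= {A,B,C} -------- (B → C)
The results differ, so AB → D is essential.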
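Now check whether the left side of AB → D can be reduced.
The closure of AB is
{AB}+ = {A,B,C,D}
The closure of A is
A+ = {A,B,C,D}
The closure of B is
B+ = {B,C}
Here the closure of the subset A is equal to the
closure of AB.
Hence AB → D can be replaced by A → D
(B is extraneous on the left side and is dropped).
• Hence the Minimal cover of F is
1. A → B
2. B → C
3. A → D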
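The result can be checked with a short Python sketch of the canonical cover steps, applied to the problem above. The names `closure` and `canonical_cover` are illustrative, single-letter attribute names are assumed, and the steps are run in the order used in these notes (drop non-essential FDs first, then reduce left sides); some textbooks run them in the opposite order.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step-01: exactly one attribute on each right side.
    work = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    work = list(dict.fromkeys(work))  # drop duplicates, keep order

    # Step-02: an FD is non-essential if its right side is still
    # derivable from its left side when the FD itself is ignored.
    for fd in list(work):
        others = [g for g in work if g != fd]
        if fd[1] <= closure(fd[0], others):
            work = others

    # Step-03: replace a left side by a smaller subset with the same closure.
    reduced = []
    for lhs, rhs in work:
        target = closure(lhs, work)
        best = lhs
        for size in range(1, len(lhs)):
            match = next((frozenset(c) for c in combinations(sorted(lhs), size)
                          if closure(c, work) == target), None)
            if match is not None:
                best = match
                break
        reduced.append((best, rhs))
    return list(dict.fromkeys(reduced))


cover = canonical_cover([("A", "BC"), ("B", "C"), ("AB", "D")])
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For this input the sketch prints A -> B, B -> C and A -> D, which matches the minimal cover derived by hand.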
Normal Forms
1NF Example
NOTE-
By default, every relation is in 1NF.
This is because formal definition of a relation states that value of all the
attributes must be atomic.
2NF
Example 2:
Example 3
• Candidate Keys:
{student_id, programming_language}
• Non-prime attribute: student_age
• One more important point to note here is, one professor teaches only one subject,
but one subject may have two different professors.
• This table satisfies the 1st Normal form because all the values are atomic, column
names are unique and all the values stored in a particular column are of same
domain.
• This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
• And, there is no Transitive Dependency, hence the table also satisfies the 3rd
Normal Form.
• But this table is not in Boyce-Codd Normal Form.
BCNF
Decomposition
• Decomposition is a process of dividing a single
relation into two or more sub relations.
• Decomposition of a relation can be completed
in the following two ways-
Lossless join decomposition
• Lossless join decomposition is also known
as non-additive join decomposition.
• This is because the relation obtained by
joining the sub relations is the same as the
original relation that was decomposed.
• No extraneous tuples appear after joining of
the sub-relations.
Lossy Join Decomposition
• In a lossy join decomposition, extraneous tuples are
introduced by the natural join of the sub-relations.
• Extraneous tuples make the identification of
the original tuples difficult.
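The difference between the two kinds of decomposition can be seen on a small instance. The sketch below is an assumption-laden illustration: the helper names and the sample relation R(A, B, C) are invented, and the check only compares one particular instance with the join of its projections, so it cannot prove that a decomposition is lossless for every instance of the schema.

```python
def project(rows, attrs):
    """Project `rows` (a list of dicts) onto `attrs`, removing duplicate tuples."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(left, right):
    """Natural join of two lists of dicts on their common attributes."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent view of a list of dict rows."""
    return {tuple(sorted(row.items())) for row in rows}


def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rows`."""
    rejoined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(rejoined) == as_set(rows)


# Hypothetical instance of R(A, B, C).
R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
]

print(is_lossless_on(R, ["A", "B"], ["B", "C"]))  # True: no extraneous tuples
print(is_lossless_on(R, ["A", "C"], ["B", "C"]))  # False: the join adds 2 extra tuples
```

In the second call the common attribute C has the value 3 in two rows, so the join pairs every A with every B for that value and produces tuples that were never in R.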
Problem-01:
     A    B    C    D
R1   α    α    α    α
R2   -    -    α    α
Here all the row values of R1 are α.
R2 α α α
R3 α α α
A B C D E
R1 α α α α
R2 α α α
R3 α α α
Here none of the rows R1, R2 or R3 has α in
every column.
6. Closure(AC) = {A,C,D}
Since D is not in R1, remove D
Closure(AC) = {A,C}
2. Closure(D) = {D}
3. Closure(CD) = {C,D}
C → D
Hence F2 = {C → D}
• The original relation has the dependencies
{AB → C, C → D, D → A}
AB → C is present in F1
C → D is present in F2
D → A is not preserved.
F1 ∪ F2 does not imply every dependency in F.
Hence the given decomposition is not dependency
preserving.
Problem 2: Dependency Preserving
Consider R1(ABC)
1. Closure(A) = {A,B,D}
= {A,B,D,E}
= {A,B} (keeping only the attributes of R1)
Hence A → B
2. Closure(B) = {B}
3. Closure(C) = {C}
4. Closure(AB) = {A,B}
= {A,B,D,E}
= {A,B}
5. Closure(AC) = {A,C}
= {A,B,D,C}
= {A,B,C}
AC → B
6. Closure(BC) = {B,C}
= {B,C,E}
= {B,C}
Consider R2(A,D)
1. Closure(A) = {A,B,D}
= {A,D}
Hence A → D
2. Closure(D) = {D}
3. Closure(AD) = {A,B,D}
= {A,D}
Consider R3(B,D,E)
1. Closure(B) = {B,E}
Hence B → E
2. Closure(E) = {E}
3. Closure(BD) = {B,D}
= {B,D,E}
Hence BD → E
4. Closure(DE) = {D,E}
5. Closure(BE) = {B,E}
• F1 = {A → B, AC → B}
• F2 = {A → D}
• F3 = {B → E, BD → E}
Consider original relation FDs
1. A → BD
2. B → E
1. A → BD is in F1 (A → B) ∪ F2 (A → D)
2. B → E is in F3
Hence the given decomposition is dependency
preserving.
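The same check can be sketched in Python without writing out F1, F2 and F3 by hand. For each dependency X → Y, the sketch repeatedly takes the part of the current result that lies in a sub-relation, closes it under F, and keeps only the attributes of that sub-relation. The function names are illustrative, attribute names are assumed to be single letters, and the Problem 1 decomposition R1(A,B,C), R2(C,D) is inferred from the worked steps rather than stated in the notes.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_dependency_preserving(fds, parts):
    """True if every FD in `fds` follows from the FDs projected onto `parts`."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What this sub-relation alone lets us derive from `result`.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


# Problem 2: F = {A -> BD, B -> E}, decomposed into R1(ABC), R2(AD), R3(BDE).
F_P2 = [("A", "BD"), ("B", "E")]
PARTS_P2 = [set("ABC"), set("AD"), set("BDE")]
print(is_dependency_preserving(F_P2, PARTS_P2))  # True

# Problem 1 (decomposition inferred): F = {AB -> C, C -> D, D -> A}.
F_P1 = [("AB", "C"), ("C", "D"), ("D", "A")]
PARTS_P1 = [set("ABC"), set("CD")]
print(is_dependency_preserving(F_P1, PARTS_P1))  # False: D -> A is lost
```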
Multi Valued Dependency
• A table is said to have a multi-valued dependency if the
following conditions are true,
1. For a dependency A → B, if for a single value of A,
multiple values of B exist, then the table may have a multi-
valued dependency.
2. Also, a table should have at least 3 columns for it to
have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued
dependency between A and B, then B and C should be
independent of each other.
• If all these conditions are true for any relation (table), it
is said to have a multi-valued dependency.
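The independence condition can be tested on a concrete table. In the sketch below, which is only an illustration, B and C are treated as independent for a given A when every B value of that A appears together with every C value of that A. The function name `mvd_holds` and the sample columns `s_id`, `course` and `hobby` are hypothetical.

```python
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check on this instance that, for each value of `a`, `b` and `c` vary independently."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_values = {x for x, _ in pairs}
        c_values = {y for _, y in pairs}
        # Independence: all combinations of B values and C values must be present.
        if pairs != set(product(b_values, c_values)):
            return False
    return True


# Hypothetical table: a student's courses and hobbies are unrelated facts.
ENROLMENT = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Science", "hobby": "Hockey"},
    {"s_id": 1, "course": "Maths", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
    {"s_id": 2, "course": "C#", "hobby": "Cricket"},
]

print(mvd_holds(ENROLMENT, "s_id", "course", "hobby"))       # True
print(mvd_holds(ENROLMENT[:3], "s_id", "course", "hobby"))   # False: (Maths, Hockey) is missing
```

Note that the check also succeeds when A determines a single B value, so it should be read together with condition 1 above.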
4NF
Rules for 4th Normal Form
For a table to satisfy the Fourth Normal Form, it
should satisfy the following two conditions:
1.It should be in the Boyce-Codd Normal Form.
2.And, the table should not have any Multi-
valued Dependency.
Join Dependency
• Join dependency is a further generalization of multivalued
dependency.
• If the join of R1 and R2 over C is equal to relation R, then we
can say that a join dependency (JD) exists.
• Here R1 and R2 are the decompositions R1(A, B, C) and
R2(C, D) of a given relation R(A, B, C, D).
• Equivalently, R1 and R2 are a lossless decomposition of R.
• A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1,
R2,....., Rn is a lossless-join decomposition.
• *((A, B, C), (C, D)) is a JD of R if the join of the two
projections over their common attribute C is equal to the relation R.
• Here, *(R1, R2, R3) is used to indicate that relations R1, R2,
R3 and so on are a JD of R.
5NF
• Fifth normal form (5NF) is also known as project-join
normal form (PJ/NF). It is designed to
minimize redundancy in relational databases that
record multi-valued facts by isolating semantically
related multiple relationships.
• A relation R is in 5NF if and only if the
following conditions are satisfied,
• The relation R should already be in 4NF.
• The relation R cannot be further decomposed
without loss (it has no non-trivial join dependency).
• In the above table, John takes both Computer and
Math class for Semester 1 but he doesn't take Math
class for Semester 2. In this case, the combination of all
these fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do
not yet know the subject or who will be taking
it, so we would leave Lecturer and Subject as NULL.
But all three columns together act as the primary key,
so we can't leave the other two columns blank.
• So to bring the above table into 5NF, we can
decompose it into three relations P1, P2 & P3:
First Normal Form-1NF
2NF
• A relation is in 2NF if it is in 1NF and every non-
key attribute is fully dependent on each candidate
key of the relation.
• That means that in second normal form each table
describes only one entity, whose key uniquely identifies
the other attributes.
• Put another way, if the primary key consists of
more than one attribute, the table must be checked for
partial dependencies and, if any exist, converted
into second normal form.
• Example -
The "Office" table shown in First Normal
Form needs to be converted into Second Normal
Form.
• Functional Dependency in "Office" Table
(Department_id, Employee_id) →
(Department_name, Employee_name, Salary)
• Functional Dependency
book_id → (Book_name, Author_name, Bookshelf_number,
Book_category)
• Transitive Dependency
Bookshelf_number → Book_category
3NF
BCNF
• A table is in BCNF when every determinant in
the table is a candidate key.
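This definition can be turned into a simple check: compute the closure of each determinant and see whether it covers every attribute of the table. The sketch below tests for a super key, which is the usual form of the condition. The function names, the attribute names `student_id`, `subject` and `professor`, and the two dependencies are assumptions modelled on the professor and subject example described earlier in these notes.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant does not determine every attribute."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)          # ignore trivial dependencies
        and closure(lhs, fds) != all_attrs   # determinant is not a super key
    ]


# Hypothetical schema: one professor teaches only one subject.
ATTRS = {"student_id", "subject", "professor"}
FDS = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print(sorted(lhs), "->", sorted(rhs))  # ['professor'] -> ['subject']
```

The determinant professor is not a key of the table, so under these assumed dependencies the table is not in BCNF.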
4NF
• Fourth Normal Form is related to Multi-value Dependency. Under
fourth normal form, a record type should not contain two or more
independent multi-value facts about an entity. In addition the
record must satisfy third normal form.
A multi-value dependency exists when
• There are at least three attributes A, B and C in a relation.
• For each value of A there is a well-defined set of values for B, and
a well-defined set of values for C.
• The set of values of B is independent of set C.