Unit 4
Normalization is the process of minimizing redundancy in a relation (table) and avoiding the anomalies (errors) that otherwise arise when we perform operations such as insertion, update and deletion.
It divides large database tables into smaller tables and establishes relationships between them.
It removes redundant data and makes it easier to add, modify or delete table fields.
It is a process that evaluates each relation against defined criteria (the normal forms) and removes undesirable dependencies, such as partial, transitive, multi-valued and join dependencies, from a relation.
It ensures that updating, deleting or inserting data does not cause problems for the database tables, and it helps to improve the integrity and efficiency of the relational tables.
Objective of Normalization
1. It removes duplicate data and database anomalies from relational tables.
2. It reduces redundancy and complexity by examining the data stored in each table.
3. It divides a large database table into smaller tables and links them using relationships.
4. It avoids duplicate data and repeating groups in a table.
5. It reduces the chance of anomalies occurring in a database.
Anomalies: Anomalies are the problems that occur in poorly planned, un-normalized databases where all the data is stored in one table, which is sometimes called a flat-file database.
Types of Anomalies
The following anomalies leave a table inconsistent, damage its integrity, or introduce redundant data.
1. Data redundancy: occurs in a relational database when two or more rows or columns hold the same or repeated values, leading to unnecessary use of storage.
Student Table:
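2. Insertion Anomalies: The anomaly occurs when a new record cannot be inserted because other data it depends on is not yet available.
For example, in the Student table, a new CourseID cannot be inserted until at least one student has enrolled in that course. This makes it difficult to insert a new record into the table, and the problem is called an insertion anomaly.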
Student Table
3. Update Anomalies: The anomaly occurs when duplicated data is updated in one place but not in all instances, leaving the table in an inconsistent state.
For example, suppose a student 'James' appears in the Student table. To update the student's address, we must update every row in which that address appears; otherwise the data becomes inconsistent, with some rows showing the updated value and others still showing the old one.
Student Table
Functional Dependencies
In relational database management, a functional dependency is a constraint that specifies the relationship between two sets of attributes, where one set determines the value of the other. It is denoted X → Y, where the attribute set X on the left side of the arrow is called the determinant and Y is called the dependent.
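The definition can be tested against sample rows. The sketch below is a minimal illustration, not part of the original notes: the helper name `holds` is an assumption, and the rows are the roll_no/name/age values used in the examples of this unit. It reports whether every value of X maps to exactly one value of Y.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # The same X value must never map to two different Y values.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Rows from the roll_no / name / age example table of this unit.
students = [
    {"roll_no": 42, "name": "abc", "age": 17},
    {"roll_no": 43, "name": "pqr", "age": 18},
    {"roll_no": 44, "name": "xyz", "age": 18},
    {"roll_no": 45, "name": "abc", "age": 19},
]

print(holds(students, ["roll_no"], ["name"]))  # True
print(holds(students, ["name"], ["age"]))      # False: 'abc' maps to 17 and 19
```

A functional dependency is a rule about the schema, so sample data can only disprove it: a `True` result means the rows do not contradict the dependency, not that it is guaranteed for all future data.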
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of
determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of trivial functional dependency (self).
2. Non-trivial Functional Dependency
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
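The difference between trivial and non-trivial dependencies comes down to a subset test. A minimal sketch follows; the function name `is_trivial` is chosen here for illustration only.

```python
def is_trivial(determinant, dependent):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(dependent) <= set(determinant)

print(is_trivial({"roll_no", "name"}, {"name"}))  # True
print(is_trivial({"roll_no"}, {"roll_no"}))       # True
print(is_trivial({"roll_no"}, {"name"}))          # False
print(is_trivial({"roll_no", "name"}, {"age"}))   # False
```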
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set do not depend on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
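45 abc 19
Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age do not depend on each other (neither name → age nor age → name holds in this table).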
4. Transitive Functional Dependency
regd_no name dept block_no
1244 xyz IT 1
345 abc ME 2
Here, regd_no → dept and dept → block_no. Hence, by transitivity, regd_no → block_no is a valid functional dependency. Because it is an indirect functional dependency, it is called a transitive functional dependency.
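Transitivity can be applied mechanically by computing an attribute closure, a standard technique that these notes do not describe explicitly. The sketch below is one minimal way to do it; the function name `closure` and the representation of each dependency as a pair of sets are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"regd_no"}, {"dept"}), ({"dept"}, {"block_no"})]
print(sorted(closure({"regd_no"}, fds)))  # ['block_no', 'dept', 'regd_no']
```

Since block_no appears in the closure of regd_no, the dependency regd_no → block_no follows from the two given dependencies.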
5. Full Functional Dependency
In a full functional dependency, an attribute or set of attributes is determined by the whole determinant and by no proper subset of it. Suppose a relation R has attributes W, X, Y, Z with the dependencies WX → Y and WX → Z. These dependencies are fully functional if removing either W or X from the determinant means that Y and Z are no longer determined.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on only part of a composite key rather than on the whole key. Suppose a relation R has attributes W, X, Y, Z with the dependencies WX → Y and WX → Z. If Y and Z are still determined after W or X is removed from the determinant, the dependency is partial.
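The full versus partial distinction can be sketched as a check over the proper subsets of the determinant. The example below is illustrative only: the helper names `closure` and `is_full` and the dependency set (WX → Y together with W → Z) are assumptions chosen so that one dependency comes out full and the other partial.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full(lhs, rhs, fds):
    """True if no proper, non-empty subset of lhs already determines rhs."""
    lhs = set(lhs)
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if set(rhs) <= closure(subset, fds):
                return False
    return True

# Hypothetical dependencies over R(W, X, Y, Z): WX -> Y and W -> Z.
fds = [({"W", "X"}, {"Y"}), ({"W"}, {"Z"})]
print(is_full({"W", "X"}, {"Y"}, fds))  # True: neither W nor X alone gives Y
print(is_full({"W", "X"}, {"Z"}, fds))  # False: W alone already gives Z
```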
Normal Forms
Key: We must have a way to identify each tuple within a given relation. We can identify tuples individually using their attribute values; the attribute (or set of attributes) used for this is called a ‘Key’.
Key Attribute: An attribute that is part of a candidate key.
Non-Key Attribute: An attribute that is not part of any candidate key.
Partial Key: A part (some of the attributes) of a candidate key is called a partial key.
Total Key: All the attributes of a candidate key taken together are called the total key.
Functional Dependency: If one attribute depends on another attribute, or equivalently if one attribute determines another, this is called a functional dependency, written A → B. Here A is the attribute that determines attribute B.
Total Dependency: If a non-key attribute depends on the whole of the candidate key, it is called a total dependency.
Partial Dependency: If a non-key attribute depends on only part of the candidate key, it is called a partial dependency.
Decomposition: Dividing a big relation into smaller relations, each of which contains a subset of the attributes of the main relation.
After Decomposition:
Table which is in 1st Normal Form
Roll No  Name    Course Id  Course Name  Marks  Percentage  Grade
5402     Harsha  22100      DE           95     95          A
5403     Aditya  22111      SDS          96     96          A
5470     Rishi   22101      DMS          97     97          A
In this table, ‘Grade’ depends on ‘Marks’. Both Marks and Grade are non-key attributes, so this is a dependency between non-key attributes (a transitive dependency) and decomposition is required.
Marks Grade
95 A
96 A
97 A
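This decomposition can be sketched as two projections of the original rows. The code below is a minimal illustration: the helper `project` and the lower-case column names are assumptions, and the data is the three-row table above. It also shows that the grade can be recovered through marks, so nothing is lost.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate results."""
    unique = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in unique:
            unique.append(item)
    return unique

results = [
    {"roll_no": 5402, "name": "Harsha", "course_id": 22100,
     "course_name": "DE", "marks": 95, "percentage": 95, "grade": "A"},
    {"roll_no": 5403, "name": "Aditya", "course_id": 22111,
     "course_name": "SDS", "marks": 96, "percentage": 96, "grade": "A"},
    {"roll_no": 5470, "name": "Rishi", "course_id": 22101,
     "course_name": "DMS", "marks": 97, "percentage": 97, "grade": "A"},
]

# Marks -> Grade moves into its own table; the main table drops Grade.
grades = project(results, ["marks", "grade"])
scores = project(results, ["roll_no", "name", "course_id",
                           "course_name", "marks", "percentage"])

# Grade is looked up again through marks when the full row is needed.
grade_of = {g["marks"]: g["grade"] for g in grades}
rebuilt = [{**s, "grade": grade_of[s["marks"]]} for s in scores]
print(rebuilt == results)  # True
```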
BCNF (Boyce-Codd Normal Form) is a stricter form of 3NF in which every determinant in a table must be a candidate key. In other words, for every non-trivial functional dependency X → Y, X must be a super key.
BCNF adds rules beyond Third Normal Form; the basic condition for any relation to be in BCNF is that it must already be in Third Normal Form.
In the above relation, S (Student) and T (Tutor) together determine S, T and C (Course). So {S, T} can be considered a super key, and S and T are prime attributes.
Here T → C does not satisfy the BCNF rule because T is not a super key. So the relation is not in BCNF.
We therefore have to decompose the relation in order to convert it into BCNF.
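The BCNF test described here can be sketched with an attribute closure: a determinant is a super key when its closure covers every attribute of the relation. The function names `closure` and `bcnf_violations` are assumptions for illustration, and the dependency set encodes only what the text states (ST → C and T → C).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(attributes):
            bad.append((lhs, rhs))
    return bad

attributes = {"S", "T", "C"}
fds = [({"S", "T"}, {"C"}), ({"T"}, {"C"})]
print(bcnf_violations(attributes, fds))  # [({'T'}, {'C'})]
```

The only dependency reported is T → C, which matches the reasoning above: T alone does not determine S, so T is not a super key.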
Decomposed Relation:
Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not contain any multi-valued dependencies.
The basic rules for 4NF are:
1. It must be in BCNF.
2. It must not have any multi-valued dependency.
A → (B, C): B and C must be independent of each other.
If B does not depend on C and C does not depend on B, then we say that the table has a multi-valued dependency.
A multi-valued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
Because a multi-valued dependency consists of at least two attributes that depend on a third attribute, it always requires at least three attributes.
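One way to see a multi-valued dependency in data is the textbook "swap" test: whenever two rows agree on A, the row that combines B from the first with the remaining attributes from the second must also be present. The sketch below assumes this standard definition; the function name `mvd_holds` and the student/course/hobby rows are hypothetical and not taken from these notes.

```python
def mvd_holds(rows, x, y):
    """Check the multi-valued dependency X ->> Y on a list of dict rows."""
    attrs = list(rows[0])
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X and Y values come from t1, all other values from t2.
                mixed = tuple(t1[a] if (a in x or a in y) else t2[a]
                              for a in attrs)
                if mixed not in present:
                    return False
    return True

# Hypothetical data: a student's courses and hobbies are independent facts.
rows = [
    {"student": "Ravi", "course": "DBMS", "hobby": "Chess"},
    {"student": "Ravi", "course": "DBMS", "hobby": "Cricket"},
    {"student": "Ravi", "course": "OS", "hobby": "Chess"},
    {"student": "Ravi", "course": "OS", "hobby": "Cricket"},
]

print(mvd_holds(rows, ["student"], ["course"]))       # True
print(mvd_holds(rows[:3], ["student"], ["course"]))   # False: one combination missing
```

Every course has to be paired with every hobby, which is the redundancy that 4NF removes by splitting the table into (student, course) and (student, hobby).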
Relation with Multi-Valued Dependency:
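For a relation to be in the fifth normal form (5NF), it must satisfy the following conditions:
1. It must be in 4NF.
2. It must not have any join dependency, and joining its decomposed relations must be lossless.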
Example:
In the above table, John takes both the Computer and Math classes for Semester 1, but he does not take the Math class for Semester 2.
In this case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we leave Lecturer and Subject as NULL.
But all three columns together act as the primary key, so we cannot leave the other two columns blank.
So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
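The three-way decomposition can be checked by projecting the table onto each pair of columns and joining the projections back together. The sketch below is illustrative only: the rows are hypothetical (only John teaching Computer and Math in Semester 1 comes from the text), and the helpers `project`, `natural_join` and `as_set` are names chosen for this example.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Natural join of two non-empty lists of dicts on their shared columns."""
    common = [a for a in left[0] if a in right[0]]
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]

def as_set(rows, attrs):
    """Order-independent view of the rows for comparison."""
    return {tuple(row[a] for a in attrs) for row in rows}

attrs = ["subject", "lecturer", "semester"]
# Hypothetical rows; only John's Semester 1 classes are taken from the text.
table = [
    {"subject": "Computer", "lecturer": "Anshika", "semester": 1},
    {"subject": "Computer", "lecturer": "John", "semester": 1},
    {"subject": "Math", "lecturer": "John", "semester": 1},
    {"subject": "Math", "lecturer": "Akash", "semester": 2},
    {"subject": "Chemistry", "lecturer": "Praveen", "semester": 1},
]

p1 = project(table, ["semester", "subject"])
p2 = project(table, ["subject", "lecturer"])
p3 = project(table, ["semester", "lecturer"])

two_way = natural_join(p1, p2)
three_way = natural_join(two_way, p3)
print(len(as_set(two_way, attrs)))                         # 7: includes spurious rows
print(as_set(three_way, attrs) == as_set(table, attrs))    # True
```

Joining only two of the projections produces extra rows, while joining all three gives back exactly the original table, which is the behaviour a join dependency describes.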
Summary of Normalization
Transformation                                                         Output Relation
Eliminate composite attributes                                         1NF
Remove dependency of non-key attribute on part of a key attribute      2NF
  (a key attribute might consist of more than one attribute)
Remove dependency of non-key attribute on other non-key attributes     3NF
  (a non-key attribute should depend on the key attribute only,
  not on any other non-key attribute)
Every determinant in a table must be a candidate key                   BCNF
  (every non-key attribute must depend on the candidate key)
Remove multi-valued dependency from the relation by splitting it       4NF
  (a relation should not contain more than one independent
  multi-valued fact)
It should not have a join dependency                                   5NF
Lossless Join:
Lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that a natural join of R1 and R2 returns the original relation R. This is effective in removing redundancy from databases while preserving the original data.
In other words, a lossless decomposition makes it possible to reconstruct the relation R from the decomposed tables R1 and R2 by using joins.
The lossless join property guarantees that the join generates neither extra (spurious) tuples nor fewer tuples, so no information from the original relation is lost during the decomposition.
• The union of the attributes of the sub-relations R1 and R2 must contain all the attributes of the original relation R before decomposition.
• The intersection of R1 and R2 cannot be empty: the sub-relations must share a common attribute, and that common attribute must hold unique values in at least one of them.
Here,
R = (A, B, C)
R1 = (A, B)
R2 = (A, C)
The relation R has three attributes A, B and C. R is decomposed into two relations R1 and R2, each with two attributes. The common attribute is A.
The values in column A must be unique; if A contains duplicate values, the lossless-join decomposition is not guaranteed.
The common attribute must be a super key of at least one of the sub-relations R1 or R2.
A B C
1 2 1
2 2 2
3 3 2
After Decomposition:
R1:
A B
1 2
2 2
3 3
R2:
A C
1 1
2 2
3 2
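The lossless-join property of this example can be verified by joining the projections and comparing the result with R. The sketch below is a minimal check under the assumption that relations are held as lists of dicts; the helper names `project`, `natural_join` and `is_lossless` are illustrative. It also tries a second split on B, which is not given in the notes, to show what a lossy decomposition looks like.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Natural join of two non-empty lists of dicts on their shared columns."""
    common = [a for a in left[0] if a in right[0]]
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections returns exactly the original rows."""
    all_attrs = list(rows[0])
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    original = {tuple(r[a] for a in all_attrs) for r in rows}
    rebuilt = {tuple(r[a] for a in all_attrs) for r in joined}
    return original == rebuilt

R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 2, "C": 2},
    {"A": 3, "B": 3, "C": 2},
]

print(is_lossless(R, ["A", "B"], ["A", "C"]))  # True: A is unique
print(is_lossless(R, ["A", "B"], ["B", "C"]))  # False: B = 2 is duplicated
```

Splitting on B fails because the value 2 appears twice in column B, so the join pairs rows that never belonged together and produces spurious tuples.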
Dependency Preservation:
The goal is to improve the efficiency of the database by reducing redundancy and improving query performance.
In this technique, the original relation is decomposed into smaller relations in such a way that the resulting relations preserve the functional dependencies of the original relation.
This is important because if the decomposition loses any of the original functional dependencies, it can lead to data inconsistencies and anomalies.
A dependency is an important constraint on the database. Every dependency must be satisfied by at least one decomposed table (or be derivable from the dependencies that hold in the decomposed tables).
If A → B holds, then B is functionally dependent on A, and the dependency is easiest to check when both attribute sets are in the same relation.
A decomposition has this property only if the functional dependencies are maintained.
This property allows updates to be checked without computing the natural join of the decomposed relations.
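A simple first check for dependency preservation is whether all attributes of each dependency fit inside a single decomposed relation. The sketch below implements only that sufficient condition (a dependency that fails this test may still be derivable from the others); the function name `preserved_directly` and the relation R(A, B, C) with A → B and B → C are hypothetical.

```python
def preserved_directly(fds, decomposition):
    """Map each FD to whether all its attributes sit in one sub-relation."""
    report = {}
    for lhs, rhs in fds:
        needed = set(lhs) | set(rhs)
        label = f"{''.join(sorted(lhs))} -> {''.join(sorted(rhs))}"
        report[label] = any(needed <= set(rel) for rel in decomposition)
    return report

# Hypothetical relation R(A, B, C) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(preserved_directly(fds, [{"A", "B"}, {"B", "C"}]))
# {'A -> B': True, 'B -> C': True}
print(preserved_directly(fds, [{"A", "B"}, {"A", "C"}]))
# {'A -> B': True, 'B -> C': False}
```

In the second decomposition B and C never appear in the same table, so B → C can only be checked by joining the tables first, which is exactly the cost that dependency preservation is meant to avoid.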
Surrogate Key
• Surrogate key: A column whose values are not derived from the data in the database is known as a surrogate key. Instead, the DBMS generates a unique identifier. In database tables, surrogate keys are frequently used as primary keys.
• If a table has no natural primary key, we need to create a key artificially in order to uniquely identify each row in the table. This key is called the surrogate key or synthetic primary key of the table.
Consider an example:
Suppose we have two tables from two different schools with the same columns (registration_no, name, and percentage), each table having its own natural primary key, registration_no.
Chaitanya
R.No Name Marks Percentage Address
1 A. BHARGAV 945 94 Gwk
2 A. PRAVEEN 900 90 Akp
3 A. RAJESH 930 93 NAD
4 J. LIKITHA 850 85 Dvd
5 J.KRISHNA 800 80 Hyd
6 K. ASHOK 870 87 Rjy
7 M SRIPRIYA 960 96 Kkd
8 M GANESH 880 88 Vzm
9 A HARSHITHA 750 75 Sklm
10 D LAKSHMI 780 78 Tuni
11 K JAGADISH 810 81 Ylm
Narayana
R.No Name Marks Percentage Address
1 J. LIKITHA 945 94 Gwk
2 J.KRISHNA 900 90 Akp
3 K JAGADISH 930 93 NAD
4 D LAKSHMI 850 85 Dvd
5 M SRIPRIYA 800 80 Hyd
6 M GANESH 870 87 Rjy
7 A. PRAVEEN 960 96 Kkd
8 A. BHARGAV 880 88 Vzm
9 A. RAJESH 750 75 Sklm
10 K. ASHOK 780 78 Tuni
11 A HARSHITHA 810 81 Ylm
2019_X_Class_Results
Surr.No R.No Name Marks Percentage Address
S1 1 A. BHARGAV 945 94 Gwk
S2 2 A. PRAVEEN 900 90 Akp
S3 3 A. RAJESH 930 93 NAD
S4 4 J. LIKITHA 850 85 Dvd
S5 5 J.KRISHNA 800 80 Hyd
S6 6 K. ASHOK 870 87 Rjy
S7 7 M SRIPRIYA 960 96 Kkd
S8 8 M GANESH 880 88 Vzm
S9 9 A HARSHITHA 750 75 Sklm
S10 10 D LAKSHMI 780 78 Tuni
S11 11 K JAGADISH 810 81 Ylm
S12 1 J. LIKITHA 945 94 Gwk
S13 2 J.KRISHNA 900 90 Akp
S14 3 K JAGADISH 930 93 NAD
S15 4 D LAKSHMI 850 85 Dvd
S16 5 M SRIPRIYA 800 80 Hyd
S17 6 M GANESH 870 87 Rjy
S18 7 A. PRAVEEN 960 96 Kkd
S19 8 A. BHARGAV 880 88 Vzm
S20 9 A. RAJESH 750 75 Sklm
S21 10 K. ASHOK 780 78 Tuni
S22 11 A HARSHITHA 810 81 Ylm
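When the two school tables are merged, R.No is no longer unique, so a generated Surr.No identifies each row instead. The sketch below imitates that numbering in plain Python; in practice the DBMS generates the value. The function name `add_surrogate_keys` and the shortened rows (two per school, names only) are assumptions made to keep the example small.

```python
from itertools import count

def add_surrogate_keys(*tables, prefix="S"):
    """Merge tables and give every row a generated key S1, S2, ..."""
    counter = count(1)
    merged = []
    for table in tables:
        for row in table:
            merged.append({"surr_no": f"{prefix}{next(counter)}", **row})
    return merged

# First two rows of each school table, remaining columns omitted.
chaitanya = [{"r_no": 1, "name": "A. BHARGAV"}, {"r_no": 2, "name": "A. PRAVEEN"}]
narayana = [{"r_no": 1, "name": "J. LIKITHA"}, {"r_no": 2, "name": "J.KRISHNA"}]

merged = add_surrogate_keys(chaitanya, narayana)
for row in merged:
    print(row["surr_no"], row["r_no"], row["name"])
```

The r_no values 1 and 2 each occur twice in the merged result, while surr_no stays unique, which is why the surrogate column can serve as the primary key.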
De-normalization
Student
Roll Name Age Branch_id
12 Raju 21 54
19 Ravi 21 05
Branch
Branch_id Branch_Name Total_Students
54 AI_DS 133
05 CSE 245
De-normalization is a database optimization technique in which we add redundant data to one or more tables. This can help us avoid costly joins in a relational database.
Note that de-normalization does not mean ‘reversing normalization’ or ‘not normalizing’. It is an optimization technique that is applied after normalization.
The process of taking a normalized schema and making it non-normalized is called de-normalization, and designers use it to tune the performance of systems to support time-critical operations.
Pros of De-normalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs), since we need to look at fewer tables.
Cons of De-normalization:
1. Updates and inserts are more expensive.
2. De-normalization can make update and insert code harder to write.
3. Data may be inconsistent.
4. Data redundancy necessitates more storage.
Case (i)
When we need the student name together with the branch name, a join operation has to be performed.
Case (ii)
If we need to modify all the students' names, this is feasible because the table is small.
If the table is big, issues arise: both cases take too long.
In case (i), we de-normalize the database, accepting redundancy and extra effort in order to maximize the benefits.
So we can add the branch name data from the Branch table to the Student table.
De-normalized table:
Student
Roll Name Age Branch_id Branch_Name
12 Raju 21 54 AI_DS
19 Ravi 21 05 CSE
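The de-normalized table is the result of doing the join once and storing its output. A minimal sketch of that step follows, using the two rows of each table from this section; the function name `denormalize` and the lower-case column names are assumptions for the example.

```python
students = [
    {"roll": 12, "name": "Raju", "age": 21, "branch_id": "54"},
    {"roll": 19, "name": "Ravi", "age": 21, "branch_id": "05"},
]
branches = [
    {"branch_id": "54", "branch_name": "AI_DS", "total_students": 133},
    {"branch_id": "05", "branch_name": "CSE", "total_students": 245},
]

def denormalize(students, branches):
    """Copy branch_name into every student row so reads need no join."""
    names = {b["branch_id"]: b["branch_name"] for b in branches}
    return [{**s, "branch_name": names[s["branch_id"]]} for s in students]

for row in denormalize(students, branches):
    print(row["roll"], row["name"], row["age"], row["branch_id"], row["branch_name"])
```

The trade-off listed under the cons applies here: if a branch is renamed, every student row holding the old branch_name has to be updated as well.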