DBMS Unit-4
Normalization
Database normalization is a technique for organizing the data in a database.
It is a systematic approach of decomposing tables to eliminate data redundancy
(repetition) and undesirable characteristics such as insertion, update and
deletion anomalies. It is a multi-step process that puts data into tabular form and
removes duplicated data from the related tables.
Purpose of Normalization
Normalization is the process of structuring the data and the relationships between
data items so that redundancy in the relational tables is minimized and insertion, update
and deletion anomalies are avoided. It divides large database tables into smaller tables
and defines relationships between them. This removes redundant data and makes it
easier to add, modify or delete table fields.
Normalization defines rules that determine whether a relational table satisfies a
given normal form. A normal form is a set of criteria against which each relation is
evaluated; meeting the criteria removes undesirable functional, multivalued and join
dependencies from the relation. As a result, updating, deleting or inserting data does not
cause problems for the database tables, which improves the integrity and efficiency of
the relational tables.
Objective of Normalization
It removes duplicate data and database anomalies from the relational table.
It helps to divide a large database table into smaller tables and link them using
relationships.
Three types of anomalies can occur when a database is not normalized: insertion,
update and deletion anomalies. Let's take an example to understand this.
The above table is not normalized. We will see the problems that we face when a table is
not normalized.
Update anomaly: In the above table we have two rows for employee Venkat, as he
belongs to two departments of the company. If we want to update the address of Venkat,
we have to update it in both rows or the data will become inconsistent. If the
correct address gets updated in one department row but not in the other, then as per the
database Venkat would have two different addresses, which is incorrect.
Insert anomaly: Suppose a new employee joins the company who is under training
and not currently assigned to any department. We would not be able to insert that
employee's data into the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose the company closes department D890 at some point.
Deleting the rows that have emp_dept as D890 would also delete the
information about employee Bhanu, since he is assigned only to this department.
So, we need to avoid these anomalies and maintain the integrity and accuracy of the
database tables. This is why normalization is used in a database management
system.
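The update and delete anomalies can be reproduced on a small in-memory table. This is only an illustrative sketch: the employee ids, addresses and the department codes D001 and D002 are hypothetical, and only the names Venkat and Bhanu, the emp_dept column and department D890 come from the discussion above.

```python
# Hypothetical unnormalized rows: (emp_id, emp_name, emp_address, emp_dept).
# Ids, addresses and the codes D001/D002 are made up for illustration.
employees = [
    (101, "Venkat", "Delhi", "D001"),
    (101, "Venkat", "Delhi", "D002"),
    (166, "Bhanu", "Chennai", "D890"),
]


def update_first_match(rows, emp_id, new_address):
    """Change the address in only the first matching row (an incomplete update)."""
    result, done = [], False
    for emp, name, address, dept in rows:
        if emp == emp_id and not done:
            address, done = new_address, True
        result.append((emp, name, address, dept))
    return result


def delete_department(rows, dept):
    """Remove every row that belongs to the given department."""
    return [row for row in rows if row[3] != dept]


# Update anomaly: one row is changed, the other keeps the old address.
after_update = update_first_match(employees, 101, "Agra")
venkat_addresses = {row[2] for row in after_update if row[0] == 101}
print(sorted(venkat_addresses))

# Delete anomaly: closing D890 also removes everything known about Bhanu.
after_delete = delete_department(employees, "D890")
bhanu_known = any(row[1] == "Bhanu" for row in after_delete)
print(bhanu_known)
```

In a normalized design the address would be stored once per employee, so a single update would suffice and closing a department would not touch the employee record.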
Example:
In this example, if we know the value of Employee number, we can obtain Employee
Name, city, salary, etc. We can therefore say that city, Employee Name and salary are
functionally dependent on Employee number.
Transitivity rule: This rule is very similar to the transitive rule in algebra: if
x -> y holds and y -> z holds, then x -> z also holds. x -> y is read as "x functionally
determines y".
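Whether a functional dependency holds in a given set of rows can be checked mechanically: no two rows may agree on the left-hand side and differ on the right-hand side. The sketch below assumes a hypothetical employee table; the helper name fd_holds and all row values are illustrative and not taken from these notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data.
employees = [
    {"emp_no": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"emp_no": 2, "name": "Ravi", "city": "Pune", "salary": 42000},
    {"emp_no": 3, "name": "Meena", "city": "Delhi", "salary": 50000},
]

emp_no_determines_rest = fd_holds(employees, ["emp_no"], ["name", "city", "salary"])
city_determines_name = fd_holds(employees, ["city"], ["name"])
print(emp_no_determines_rest, city_determines_name)
```

Note that a check like this can only refute a dependency on the data at hand; whether a dependency is meant to hold for all possible data is a design decision.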
There are mainly four types of functional dependency in DBMS:
Multivalued Dependency
Trivial Functional Dependency
Non-Trivial Functional Dependency
Transitive Dependency
car_model ->> maf_year
car_model ->> colour
A functional dependency X -> Y is called trivial if the set of attributes Y is
included in (is a subset of) the set of attributes X.
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Example:
Company CEO Age
Alibaba Jack Ma 54
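{Company} -> {CEO} (If we know the company, we know its CEO's name)
{CEO} -> {Age} (If we know the CEO, we know the CEO's age)
By the rule of transitive dependency, {Company} -> {Age} should hold, which makes sense:
if we know the company name, we can determine its CEO's age.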
Note: You need to remember that transitive dependency can only occur in a relation of
three or more attributes.
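Transitive dependencies can be derived by computing an attribute closure, that is, the set of all attributes determined by a starting set. The sketch below applies it to the Company example and assumes the intermediate dependency {CEO} -> {Age}, which is what the transitive step relies on; the function name closure is illustrative.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed dependency set: {Company} -> {CEO} and {CEO} -> {Age}.
fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]

company_closure = closure({"Company"}, fds)
print(sorted(company_closure))
```

Because Age appears in the closure of {Company} only through CEO, the dependency {Company} -> {Age} is transitive.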
Normal Forms
Normal forms are used to eliminate or reduce redundancy in database tables. Here
are the most commonly used normal forms:
First normal form (1NF) states that an attribute of a table cannot hold multiple
values. Each attribute must hold only a single value.
First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.
14 John 7272826385, 9064738238 UP
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
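The conversion to 1NF amounts to emitting one row per phone number. A minimal sketch using the John and Sam rows from the EMPLOYEE table above; the function name to_1nf and the tuple layout (id, name, phones, state) are assumptions made for illustration.

```python
# Rows with a multi-valued phone attribute: (emp_id, emp_name, phones, state).
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so every attribute holds a single value."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```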
In second normal form (2NF), the relation must be in 1NF and all non-key attributes
must be fully functionally dependent on the primary key.
Example: Assume a school stores the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.
TEACHER table
To convert the given table into 2NF, we decompose it into two tables:
83 38
47 English
83 Math
83 Computer
A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively
dependent on a candidate key.
3NF is used to reduce data duplication. It is also used to achieve data
integrity.
A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X -> Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee
and creates a table named employee_details that looks like this:
To make this table comply with 3NF we have to break it into two tables to
remove the transitive dependency:
emp_id  emp_nationality  emp_dept                      dept_type  dept_no_of_emp
1001    Austrian         Production and planning       D001       200
1002    American         design and technical support  D134       100
1002    American         Purchasing department         D134       600
The table is not in BCNF because neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_id emp_dept
1001 stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because the left-hand side of both functional dependencies is a key.
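The BCNF condition can be checked with attribute closures: a relation violates BCNF if some non-trivial dependency X -> Y has a left-hand side X whose closure does not cover the whole relation. The sketch below uses the two functional dependencies listed above. The function names are illustrative, and for simplicity it only tests the listed dependencies that lie entirely inside a relation rather than every dependency implied by them.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the given FDs inside relation whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        applies = (lhs | rhs) <= relation
        trivial = rhs <= lhs
        if applies and not trivial and not relation <= closure(lhs, fds):
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]
original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
decomposed = [
    {"emp_id", "emp_nationality"},
    {"emp_dept", "dept_type", "dept_no_of_emp"},
    {"emp_id", "emp_dept"},
]

original_violations = bcnf_violations(original, fds)
decomposed_violations = [bcnf_violations(r, fds) for r in decomposed]
print(original_violations)
print(decomposed_violations)
```

Both dependencies violate BCNF in the single wide table, while none of the three decomposed tables reports a violation.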
Surrogate key
A surrogate key in DBMS is a unique identifier that identifies an object or
entity in a table. It is generated by the system rather than derived from the application
data, so it carries no business meaning of its own; it exists only to represent the
existence of a row, for example for data analysis. It represents an outside entity as a
database object but is typically not visible to the user or the application. A surrogate
key is also known by various other names: pseudo key, technical key, synthetic key,
arbitrary unique identifier, entity identifier and database sequence number.
Let's implement an example to understand the working and role of a Surrogate key in
DBMS:
Track_item: An attribute holding the name of the item that is being tracked.
From the above table, we can see that the Key attribute of the Tracking_System
table is the surrogate key, because the value of the Key column is different for each
combination of location and item id.
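One common way to obtain a surrogate key is to let the database generate it. The sketch below uses Python's built-in sqlite3 module with an auto-incrementing integer column. The column names surrogate_key, track_id and track_loc and all row values are hypothetical; only the table name and Track_item come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tracking_system (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated by SQLite
        track_id      TEXT NOT NULL,
        track_item    TEXT NOT NULL,
        track_loc     TEXT NOT NULL
    )
    """
)

# Hypothetical rows: the same item can appear at several locations.
rows = [
    ("Laptop_A", "Laptop", "Location_1"),
    ("Laptop_A", "Laptop", "Location_2"),
    ("Phone_B", "Phone", "Location_1"),
]
conn.executemany(
    "INSERT INTO tracking_system (track_id, track_item, track_loc) VALUES (?, ?, ?)",
    rows,
)

keys = [
    row[0]
    for row in conn.execute(
        "SELECT surrogate_key FROM tracking_system ORDER BY surrogate_key"
    )
]
print(keys)
```

The application never supplies the key value; each inserted row receives a distinct one even when the item id repeats.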
A table is said to have multi-valued dependency, if the following conditions are true,
If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.
Example
Below we have a college enrolment table with columns s_id, course and hobby.
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So, there is a multi-valued dependency on
STU_ID, which leads to unnecessary repetition of data.
So, to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
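The repetition caused by the multi-valued dependency becomes visible when the two 4NF tables are joined back on STU_ID: every course of a student is paired with every hobby of that student. A minimal sketch using the rows of STUDENT_COURSE and STUDENT_HOBBY shown above; the variable names are illustrative.

```python
student_course = [
    (21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics"),
]
student_hobby = [
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey"),
]

# Natural join on STU_ID: each course is combined with each hobby of the same student.
student = sorted(
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
)
rows_for_21 = [row for row in student if row[0] == 21]
print(rows_for_21)

# Projecting the joined table returns exactly the two 4NF tables.
courses_back = {(stu_id, course) for stu_id, course, _ in student}
hobbies_back = {(stu_id, hobby) for stu_id, _, hobby in student}
print(courses_back == set(student_course), hobbies_back == set(student_hobby))
```

Student 21 needs four rows in the combined table for two courses and two hobbies, whereas the decomposed tables store each fact only once.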
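A relation is in fifth normal form (5NF) if it is in 4NF and cannot be decomposed
into smaller tables any further without losing data. In other words, 5NF is satisfied
when the tables are broken into as many tables as possible in order to avoid redundancy,
while still joining back without loss.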
Example
In the above table, John takes both the Computer and Math classes for Semester 1 but he
doesn't take the Math class for Semester 2. In this case, the combination of all these
fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or
who will be taking it, so we leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.
So, to make the above table into 5NF, we can decompose it into three relations P1, P2 &
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
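Joining only two of the three projections produces combinations that are not valid, and the third projection is what filters them out. The sketch below takes the rows of P1, P2 and P3 as listed above and stores them as sets, so a repeated row counts once; the variable names are illustrative.

```python
p1 = {  # (SEMESTER, SUBJECT)
    ("Semester 1", "Computer"), ("Semester 1", "Math"),
    ("Semester 1", "Chemistry"), ("Semester 2", "Math"),
}
p2 = {  # (SUBJECT, LECTURER)
    ("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
    ("Math", "Akash"), ("Chemistry", "Praveen"),
}
p3 = {  # (SEMESTER, LECTURER)
    ("Semester 1", "Anshika"), ("Semester 1", "John"),
    ("Semester 2", "Akash"), ("Semester 1", "Praveen"),
}

# Step 1: join P1 and P2 on SUBJECT.
p1_p2 = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for other_subject, lecturer in p2
    if subject == other_subject
}

# Step 2: keep only the rows whose (SEMESTER, LECTURER) pair appears in P3.
joined = {row for row in p1_p2 if (row[2], row[1]) in p3}

print(len(p1_p2), len(joined))
print(sorted(p1_p2 - joined))
```

The two rows dropped in step 2, John teaching Math in Semester 2 and Akash teaching Math in Semester 1, are exactly the combinations that P1 and P2 alone cannot rule out.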
Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form, decomposition of the
relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems such as loss of information.
o Decomposition is used to eliminate some of the problems of bad design, such as anomalies, inconsistencies
and redundancy.
Types of Decomposition
Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations results in the same relation that
was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original
relation.
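The difference between a lossless and a lossy decomposition can be seen by projecting a relation onto two attribute sets and joining the projections back. The relation R(A, B, C) and its two rows below are hypothetical, chosen so that A is a key and B is not; the helper names project and natural_join are illustrative.

```python
def project(rows, schema, attrs):
    """Project rows (tuples ordered by schema) onto attrs, dropping duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, s1, r2, s2):
    """Natural join of r1 and r2; result columns are s1 followed by the new columns of s2."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[s2.index(a)] for a in extra))
    return joined


# Hypothetical relation R(A, B, C): A is unique, B repeats.
schema = ["A", "B", "C"]
r = {(1, "x", "p"), (2, "x", "q")}

# Split on the key A: the join gives back R exactly.
lossless = natural_join(
    project(r, schema, ["A", "B"]), ["A", "B"],
    project(r, schema, ["A", "C"]), ["A", "C"],
)

# Split on the non-key B: the join invents rows that were never in R.
lossy = natural_join(
    project(r, schema, ["A", "B"]), ["A", "B"],
    project(r, schema, ["B", "C"]), ["B", "C"],
)

print(lossless == r, len(lossy))
```

The second decomposition is called lossy even though the join returns more rows, because the original relation can no longer be told apart from the spurious rows.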
Example:
EMPLOYEE_DEPARTMENT table:
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look
like:
Employee ⋈ Department
Dependency Preserving
o Dependency preservation is an important constraint on a database decomposition.
o In a dependency-preserving decomposition, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of
R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because FD A->BC is
a part of relation R1(ABC).
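Dependency preservation can be tested without materializing the projected dependency sets: starting from the left-hand side of a dependency, repeatedly add whatever each decomposed relation can derive on its own, then check that the right-hand side was reached. The sketch below applies this to the R(A, B, C, D) example above. The function names are illustrative, and the second relation with dependencies A->B and B->C is a hypothetical counter-example added for contrast.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, fds, parts):
    """Check whether fd can be enforced using only the decomposed relations in parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # Attributes derivable from what we know, restricted to this relation.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


# Example from the text: R(A, B, C, D), A -> BC, decomposed into R1(ABC) and R2(AD).
fds = [({"A"}, {"B", "C"})]
parts = [{"A", "B", "C"}, {"A", "D"}]
text_example_preserved = all(is_preserved(fd, fds, parts) for fd in fds)

# Hypothetical counter-example: R(A, B, C) with A -> B and B -> C, split into AB and AC.
fds2 = [({"A"}, {"B"}), ({"B"}, {"C"})]
parts2 = [{"A", "B"}, {"A", "C"}]
counter_example = [is_preserved(fd, fds2, parts2) for fd in fds2]

print(text_example_preserved, counter_example)
```

In the counter-example B -> C is lost because B and C never appear together in one decomposed relation and no other dependency lets either table derive it.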