
Unit 4

Normalization is a process aimed at reducing redundancy and preventing anomalies in relational databases by organizing data into smaller, related tables. It addresses issues such as insert, update, and delete anomalies, and ensures data integrity through various normal forms, including 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Each normal form has specific criteria that must be met to achieve a structured and efficient database design.

Uploaded by

23l31a1263

Unit-IV

Normalization is the process of minimizing redundancy in a relation (table) and avoiding the anomalies (errors) that can arise when we perform operations such as insertion, update, and deletion.

It divides large database tables into smaller tables and links them through relationships.

It removes redundant data and makes it easier to add, modify, or delete table fields.

It is a process that evaluates each relation against defined criteria (the normal forms) and removes undesirable dependencies, such as partial, transitive, multi-valued, and join dependencies, from the relation.

It ensures that updating, deleting, or inserting data does not cause problems in the database tables, and it helps to improve the integrity and efficiency of the relational tables.

Objectives of Normalization
1. It removes duplicate data and database anomalies from relational tables.
2. It reduces redundancy and complexity by examining how each attribute is used in the table.
3. It divides a large database table into smaller tables and links them using relationships.
4. It avoids duplicate data and repeating groups in a table.
5. It reduces the chance of anomalies occurring in a database.

Anomalies: Anomalies are the problems that arise in poorly planned, un-normalized databases where all the data is stored in one table (sometimes called a flat-file database).

Types of Anomalies

The following problems make a table inconsistent, damage its integrity, and introduce redundant data.

1. Data redundancy occurs in a relational database when the same value is stored repeatedly in two or more rows or columns, leading to unnecessary use of storage.

Student Table

Sid   CourseID   StudName       Address       Course
205   6204       James          Los Angeles   Economics
205   6247       James          Los Angeles   Economics
224   6247       Trent Bolt     New York      Mathematics
230   6204       Ritchie Rich   Egypt         Computer
230   6208       Ritchie Rich   Egypt         Accounts


2. Insert Anomaly: An insert anomaly occurs in a relational database when some data items cannot be inserted into the relation because other, unrelated attribute values do not exist yet.

For example, in the Student table, if we want to record a new course (a new CourseID), we have to wait until at least one student enrolls in it, because a row cannot be stored without the student details. This makes it difficult to insert the new record, and the problem is called an insertion anomaly.

Student Table

Sid   CourseID   StudName       Address       Course
205   6204       James          Los Angeles   Economics
205   6247       James          Los Angeles   Economics
224   6247       Trent Bolt     New York      Mathematics
230   6204       Ritchie Rich   Egypt         Computer
230   6208       Ritchie Rich   Egypt         Accounts

3. Update Anomaly: An update anomaly occurs when duplicated data is updated in one place but not in all its instances, which leaves the table in an inconsistent state.

For example, the student 'James' appears in more than one row of the Student table. If we want to update his address, we must update it in every row where it appears; otherwise, some rows will show the new address while others still show the old one, and the data becomes inconsistent.

Student Table

Sid   CourseID   StudName       Address       Course
205   6204       James          Los Angeles   Economics
205   6247       James          Los Angeles   Economics
224   6247       Trent Bolt     New York      Mathematics
230   6204       Ritchie Rich   Egypt         Computer
230   6208       Ritchie Rich   Egypt         Accounts


4. Delete Anomaly: A delete anomaly occurs when deleting some data from a table unintentionally removes other data as well. For example, if we remove Trent Bolt from the Student table, we lose his address and course details, and with them the only row that mentions the Mathematics course. Therefore, deleting one fact (the student) can remove another fact (the course) from the table.

Student Table

Sid   CourseID   StudName       Address       Course
205   6204       James          Los Angeles   Economics
205   6247       James          Los Angeles   Economics
224   6247       Trent Bolt     New York      Mathematics
230   6204       Ritchie Rich   Egypt         Computer
230   6208       Ritchie Rich   Egypt         Accounts
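
The update and delete anomalies described above can be reproduced on the sample rows with a short Python sketch. The helper name `addresses_by_student` and the replacement address "Chicago" are made up for this illustration and are not part of the notes.

```python
# Rows of the flat Student table: (Sid, CourseID, StudName, Address, Course)
students = [
    (205, 6204, "James", "Los Angeles", "Economics"),
    (205, 6247, "James", "Los Angeles", "Economics"),
    (224, 6247, "Trent Bolt", "New York", "Mathematics"),
    (230, 6204, "Ritchie Rich", "Egypt", "Computer"),
    (230, 6208, "Ritchie Rich", "Egypt", "Accounts"),
]


def addresses_by_student(rows):
    """Map each Sid to the set of distinct addresses stored for it."""
    result = {}
    for sid, _course_id, _name, address, _course in rows:
        result.setdefault(sid, set()).add(address)
    return result


# Update anomaly: change James's address in only the first of his two rows.
# "Chicago" is a hypothetical value used only for this illustration.
partially_updated = list(students)
partially_updated[0] = (205, 6204, "James", "Chicago", "Economics")
inconsistent = {
    sid
    for sid, addresses in addresses_by_student(partially_updated).items()
    if len(addresses) > 1
}

# Delete anomaly: removing Trent Bolt's row also removes the last
# mention of the Mathematics course.
after_delete = [row for row in students if row[2] != "Trent Bolt"]
courses_left = {row[4] for row in after_delete}
```

After the partial update, student 205 has two different addresses on record, and after the delete, Mathematics no longer appears anywhere in the table.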

Functional Dependencies
In a relational database, a functional dependency is a constraint between two sets of attributes in which the value of one set determines the value of the other. It is denoted X → Y, where the attribute set X on the left side of the arrow is called the Determinant and Y is called the Dependent.
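
As a minimal sketch of what X → Y means on actual data, the Python function below checks whether a dependency holds in a given set of rows. The function name `fd_holds` is an assumption of this example; the rows are the Student table from the anomalies section. Checking sample rows can only disprove a dependency; it cannot prove that the dependency holds for all future data.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # The same X value must always map to the same Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True


columns = ("Sid", "CourseID", "StudName", "Address", "Course")
student_rows = [
    dict(zip(columns, values))
    for values in [
        (205, 6204, "James", "Los Angeles", "Economics"),
        (205, 6247, "James", "Los Angeles", "Economics"),
        (224, 6247, "Trent Bolt", "New York", "Mathematics"),
        (230, 6204, "Ritchie Rich", "Egypt", "Computer"),
        (230, 6208, "Ritchie Rich", "Egypt", "Accounts"),
    ]
]

sid_determines_name = fd_holds(student_rows, ["Sid"], ["StudName"])
course_id_determines_course = fd_holds(student_rows, ["CourseID"], ["Course"])
```

The first check succeeds because every Sid maps to one name. The second fails on these sample rows because CourseID 6204 is listed with two different course names.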
Types of Functional Dependencies in DBMS
1. Trivial functional dependency

2. Non-Trivial functional dependency

3. Multi-valued functional dependency

4. Transitive functional dependency

5. Full functional dependency

6. Partial functional dependency


1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e.
if XY → Y, then Y is a subset of the determinant XY, and the dependency is called a trivial functional dependency.
Example:
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of
determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of trivial functional dependency (self).

2. Non-trivial Functional Dependency


In a non-trivial functional dependency, the dependent is not a subset of the determinant, i.e.
if XY → Z and Z is not a subset of XY, then it is called a non-trivial functional dependency.
Example:
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
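In a multivalued functional dependency, the attributes of the dependent set do not depend on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
For example,
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18
45        abc    19

Here, roll_no → {name, age} is treated as a multivalued functional dependency, since the dependents name and age do not depend on each other (i.e. neither name → age nor age → name holds).
Note that this is an informal description. In the standard definition, a multivalued dependency is written X ↠ Y and means that each value of X is associated with a set of Y values that is independent of the remaining attributes. The STU_ID, COURSE, HOBBY relation in the 4NF section is a more typical example.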

4. Transitive Functional Dependency


In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if a → b and b → c, then by transitivity a → c. This is a transitive functional dependency.
For example,
regd_no   name   dept   block_no
542       abc    CSE    4
443       pqr    ECE    2
1244      xyz    IT     1
345       abc    ME     2

Here, regd_no → dept and dept → block_no. Hence, by transitivity, regd_no → block_no is a valid functional dependency. Because it is an indirect dependency, it is called a transitive functional dependency.
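
Transitivity can be applied mechanically by computing an attribute closure, that is, everything a set of attributes determines under the known dependencies. The sketch below is a standard closure loop in Python; the function name `closure` and the representation of each dependency as a pair of sets are choices made for this example.

```python
def closure(attributes, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    Each FD is a (determinant, dependent) pair of sets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already known, rhs follows as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two dependencies stated in the example above.
fds = [({"regd_no"}, {"dept"}), ({"dept"}, {"block_no"})]
regd_no_closure = closure({"regd_no"}, fds)
dept_closure = closure({"dept"}, fds)
```

Because block_no ends up in the closure of regd_no, the dependency regd_no → block_no follows from the two stated dependencies.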
5. Full Functional Dependency
In a full functional dependency, an attribute (or set of attributes) depends on the whole of a composite determinant and not on any proper subset of it. Suppose a relation R has attributes W, X, Y, Z with the dependencies WX → Y and WX → Z. These dependencies are fully functional if, once W or X is removed from the determinant, Y and Z are no longer determined.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on only a part of the composite key rather than on the whole key. Suppose a relation R has attributes W, X, Y, Z with the dependencies WX → Y and WX → Z. A dependency is partial if, even after W or X is removed from the determinant, Y or Z is still determined by the attribute that remains.
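
The difference between full and partial dependency can be tested with attribute closures: a dependency is full when no proper subset of the determinant already determines the dependent. The Python sketch below assumes one extra dependency, W → Z, which is not in the notes and is added only so that a partial dependency exists; the function names are also choices of this example.

```python
from itertools import combinations


def closure(attributes, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(determinant, dependent, fds):
    """True if determinant -> dependent holds but no proper subset suffices."""
    determinant, dependent = set(determinant), set(dependent)
    if not dependent <= closure(determinant, fds):
        return False  # not a dependency at all
    for size in range(1, len(determinant)):
        for subset in combinations(sorted(determinant), size):
            if dependent <= closure(subset, fds):
                return False  # a proper subset is enough: partial dependency
    return True


# WX -> Y is from the notes; W -> Z is an assumed FD used to create
# a partial dependency for this illustration.
fds = [({"W", "X"}, {"Y"}), ({"W"}, {"Z"})]
y_is_full = is_full_dependency({"W", "X"}, {"Y"}, fds)
z_is_full = is_full_dependency({"W", "X"}, {"Z"}, fds)
```

Under these assumed dependencies, Y depends fully on WX, while Z depends only partially on WX because W alone determines it.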

Normal Forms

Key: We need a way to identify each tuple within a relation individually. An attribute (or set of attributes) whose values identify tuples uniquely is called a 'Key'.
Key Attribute: An attribute that is part of a Candidate key.
Non-Key Attribute: An attribute that is not part of any Candidate key.
Partial Key: A part (some of the attributes) of a Candidate key.
Total Key: All the attributes of a Candidate key.
Functional Dependency: If one attribute determines another attribute, there is a functional dependency between them. In A → B, A is the attribute that determines attribute B.
Total Dependency: If a non-key attribute depends on the whole Candidate key, it is called a total dependency.
Partial Dependency: If a non-key attribute depends on only some part of the Candidate key, it is called a partial dependency.
Decomposition: Dividing a big relation into smaller relations, each containing a subset of the attributes of the main relation.

General Table (not in any normal form):

Roll No   Name      Course Id   Course Name   Marks   Percentage-Grade
5402      Harsha    22100       DE            95      95-A
5403      Aditya    22111       SDS           96      96-A
5499      Hemanth   22101       DMS           97      97-A

1st Normal Form
In the above table, 'Roll No' and 'Course Id' together are considered the 'Key'. 1st Normal Form requires every attribute to hold a single, atomic value. Here we cannot access the Percentage on its own, and we cannot access the Grade on its own: whenever 'Percentage' is accessed, the 'Grade' is displayed along with it. To avoid this problem, we refine the schema. The 'Percentage-Grade' attribute is a composite attribute that can be subdivided further, so the table needs to be decomposed. (A composite attribute is an attribute that can be subdivided further; for example, Address can be subdivided into Door_No, Street, Area, and Village.)

After Decomposition:
Table in 1st Normal Form
Roll No   Name     Course Id   Course Name   Marks   Percentage   Grade
5402      Harsha   22100       DE            95      95           A
5403      Aditya   22111       SDS           96      96           A
5470      Rishi    22101       DMS           97      97           A

2nd Normal Form


In the above table, 'Roll No' and 'Course Id' together form the 'Key' (the TOTAL KEY). In 2nd Normal Form, partial dependency is not allowed: every non-key attribute must depend on the whole key (the TOTAL KEY). Here, however, Course Name, a non-key attribute, depends only on Course Id, which is a partial key. This partial dependency is eliminated by splitting the table. (Name also depends only on Roll No; for simplicity, this example splits out only Course Name.)

After splitting, the resulting tables are treated as being in 2nd Normal Form:

Roll No   Name     Marks   Percentage   Grade   Course Id
5402      Harsha   95      95           A       22100
5403      Aditya   96      96           A       22111
5470      Rishi    97      97           A       22101

Course Id   Course Name
22100       DE
22111       SDS
22101       DMS

3rd Normal Form
The condition here is that every non-key attribute must depend on the key only and never on another non-key attribute. If a non-key attribute depends on another non-key attribute (a transitive dependency), the relation is not in 3rd Normal Form. To achieve 3NF, we decompose the table so that the resulting tables satisfy this condition. 3NF builds on 2NF by requiring that non-key attributes do not depend on one another: each column should be directly related to the key, and not to any other non-key attribute in the same table.

In the table below, 'Grade' depends on 'Marks', and both Marks and Grade are non-key attributes, so decomposition is required: the dependency Marks → Grade is moved into a separate Marks-Grade table.

Roll No   Name     Marks   Grade   Course Id
5402      Harsha   95      A       22100
5403      Aditya   96      A       22111
5470      Rishi    97      A       22101

Course Id   Course Name
22100       DE
22111       SDS
22101       DMS

Marks   Grade
95      A
96      A
97      A

Boyce-Codd Normal Form (BCNF):


(As discussed earlier, in X → Y the attribute set X on the left side of the arrow is called the Determinant, and Y is called the Dependent.)

BCNF (Boyce-Codd Normal Form) is a stricter version of Third Normal Form that requires every Determinant in a table to be a Candidate key (more precisely, a Super key). In other words, every non-trivial dependency in a BCNF relation must have a key on its left side. A relation must already be in Third Normal Form before it can be in BCNF.

Basic rules for a relation to be in BCNF:

1. The relation must be in Third Normal Form.
2. For every non-trivial dependency X → Y in the relation, X must be a Super key (or Candidate key) of the relation.

Student (S)   Course (C)   Tutor (T)
101           Java         Sundeep
101           Python       Sagar
102           Java         Sundeep
102           Python       Surya

In the above relation, S (Student) and T (Tutor) together determine all of S, T, and C (Course), so {S, T} can be considered a Super key, and S and T are prime attributes.

Here the dependency T → C does not satisfy the BCNF rule, because T alone is not a Super key. So the relation is not in BCNF, and we have to decompose it in order to convert it into BCNF.

Decomposed Relations:

Student (S)   Tutor (T)
101           Sundeep
101           Sagar
102           Sundeep
102           Surya

Course (C)   Tutor (T)
Java         Sundeep
Python       Sagar
Python       Surya
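
The BCNF rule (every determinant must be a Super key) can be checked with attribute closures. The following Python sketch uses only the two dependencies named in the example, {S, T} → C and T → C; the function names are choices of this illustration.

```python
def closure(attributes, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_super_key = closure(lhs, fds) >= relation
        if not trivial and not is_super_key:
            violations.append((lhs, rhs))
    return violations


# Original relation R(S, C, T) with the dependencies from the example.
relation = {"S", "C", "T"}
fds = [({"S", "T"}, {"C"}), ({"T"}, {"C"})]
violations = bcnf_violations(relation, fds)

# After decomposition, T -> C lives in (C, T), where T is a key.
violations_after = bcnf_violations({"C", "T"}, [({"T"}, {"C"})])
```

Only T → C is reported for the original relation, and nothing is reported for the decomposed Course-Tutor relation.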

Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not contain any multi-valued dependencies.
The basic rules for 4NF are:
1. The relation must be in BCNF.
2. The relation must not have any multi-valued dependency.
For a dependency A → (B, C), B and C must be independent of each other: if B does not depend on C and C does not depend on B, then the table has a multi-valued dependency.
A multi-valued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
Because it consists of at least two attributes that depend on a third attribute, a multi-valued dependency always requires at least three attributes.
Relation with Multi-Valued Dependency:

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

Decomposed Relations which are in 4th Normal Form:

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey
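
A multi-valued dependency can also be tested directly on rows. The Python sketch below implements the usual tuple-swapping test for X ↠ Y; the function name `mvd_holds` is an assumption of this example. Note that the sample rows as printed do not list every course and hobby combination for student 21, so the test fails on them; adding the two missing combinations (an assumption about what the example intends) makes the dependency hold.

```python
def mvd_holds(rows, x_idx, y_idx):
    """Check the multivalued dependency X ->> Y on a collection of tuples.

    x_idx and y_idx are lists of column positions; all other columns are Z.
    """
    rows = set(rows)
    width = len(next(iter(rows)))
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x_idx):
                # Take Y from t1 and every other column from t2.
                mixed = tuple(
                    t1[i] if i in y_idx else t2[i] for i in range(width)
                )
                if mixed not in rows:
                    return False
    return True


sample = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}
holds_as_printed = mvd_holds(sample, [0], [1])

# Assumed completion: every course of student 21 paired with every hobby.
completed = sample | {(21, "Computer", "Singing"), (21, "Math", "Dancing")}
holds_when_completed = mvd_holds(completed, [0], [1])
```

The completed relation is exactly what joining the two decomposed tables on STU_ID would produce.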

Fifth Normal Form (5NF)

For a relation to be in Fifth Normal Form, it must satisfy the following conditions:

1. It must be in Fourth Normal Form (4NF).
2. It must not have any join dependency that is not implied by its candidate keys, and any further decomposition of it must be lossless.
Join Dependency:
A Join Dependency (JD) holds when the relation R is equal to the join of its sub-relations R1, R2, ..., Rn.
In other words, a row exists in R exactly when the matching rows exist in each of the sub-relations.
The sub-relations are joined on their common attributes to re-create the single original table.
Types of Join Dependency
There are two types of Join Dependencies:
• Lossless Join Dependency: Whenever the tables are joined, no information is lost; the joined table has exactly the content of the original table.
• Lossy Join Dependency: In this type, the join does not reproduce the original table: tuples of the original table may be missing, or extra (spurious) tuples may appear.
Example:

SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1

In the above table, John takes both the Computer and the Math class in Semester 1, but he does not take the Math class in Semester 2.
In this case, the combination of all three fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we would have to leave Lecturer and Subject as NULL.
But all three columns together act as the primary key, so we cannot leave the other two columns blank.
So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3.
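
The following Python sketch illustrates the join dependency on the sample rows. It assumes that P1, P2 and P3 are the three pairwise projections (SEMESTER, SUBJECT), (SUBJECT, LECTURER) and (SEMESTER, LECTURER); the notes do not list their columns, so this choice is an assumption of the example.

```python
def project(rows, idx):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in idx) for row in rows}


# Original relation: (SUBJECT, LECTURER, SEMESTER)
r = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Assumed decomposition: one relation per pair of attributes.
p1 = project(r, [2, 0])  # (SEMESTER, SUBJECT)
p2 = project(r, [0, 1])  # (SUBJECT, LECTURER)
p3 = project(r, [2, 1])  # (SEMESTER, LECTURER)

# Joining only P1 and P2 on SUBJECT produces extra (spurious) rows.
two_way = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for subject2, lecturer in p2
    if subject == subject2
}

# Filtering by P3 on (SEMESTER, LECTURER) completes the three-way join.
three_way = {row for row in two_way if (row[2], row[1]) in p3}
```

On these rows, joining any two projections alone yields spurious rows such as (Math, Akash, Semester 1), while the join of all three reproduces the original five rows.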
Summary of Normalization
Output Relation   Transformation
1NF               Eliminate composite attributes
2NF               Remove the dependency of a non-key attribute on part of the key (the key might consist of more than one attribute)
3NF               Remove the dependency of a non-key attribute on other non-key attributes (a non-key attribute should depend on the key only)
BCNF              Every determinant in the table must be a Candidate key (every non-key attribute must depend on the Candidate key)
4NF               Remove multi-valued dependencies by splitting the relation (a relation should not hold more than one independent multi-valued fact)
5NF               The relation should not have any remaining join dependency
Lossless Join:
A lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that the natural join of R1 and R2 returns exactly the original relation R. This makes it possible to remove redundancy from a database while preserving the original data.
In other words, with a lossless decomposition the relation R can be reconstructed from the decomposed tables R1 and R2 by using joins.
The lossless join property guarantees that the join generates neither extra nor missing tuples, so no information from the original relation is lost in the decomposition.

• The union of the attributes of R1 and R2 must contain all the attributes of the original relation R.
• The intersection of R1 and R2 cannot be empty: the sub-relations must share a common attribute.
• The common attribute must be a super key of at least one of the sub-relations R1 or R2.
Here,
R = (A, B, C)
R1 = (A, B)
R2 = (A, C)
The relation R has three attributes A, B, and C. It is decomposed into two relations R1 and R2, each with two attributes. The common attribute is A.
In this example the values in column A are unique, so A is a key of both R1 and R2 and the decomposition is lossless. If A contained duplicate values and were not a key of either R1 or R2, the decomposition would not be guaranteed to be lossless.

A   B   C
1   2   1
2   2   2
3   3   2
After Decomposition:
A   B
1   2
2   2
3   3

A   C
1   1
2   2
3   2
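
The lossless join condition can be verified on the example rows by projecting and rejoining. The Python sketch below also shows a contrasting split on B, which is not in the notes and is included only to show what a lossy decomposition looks like; the helper names are choices of this example.

```python
def project(rows, idx):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in idx) for row in rows}


def join_on_first(r1, r2):
    """Join two binary relations on their first column: (k, x), (k, y) -> (k, x, y)."""
    return {(k, x, y) for k, x in r1 for k2, y in r2 if k == k2}


# Original relation R(A, B, C) from the example.
r = {(1, 2, 1), (2, 2, 2), (3, 3, 2)}

# Decomposition from the notes: R1(A, B) and R2(A, C), joined on A.
r1 = project(r, [0, 1])
r2 = project(r, [0, 2])
lossless_on_a = join_on_first(r1, r2) == r

# Contrasting split (an assumption of this sketch): S1(B, A) and S2(B, C).
# B = 2 repeats, so B is not a key of either part.
s1 = project(r, [1, 0])
s2 = project(r, [1, 2])
rejoined = {(a, b, c) for b, a, c in join_on_first(s1, s2)}
lossless_on_b = rejoined == r
spurious = rejoined - r
```

Joining on A returns the original three rows, while joining on B adds the spurious rows (1, 2, 2) and (2, 2, 1).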

Dependency preserving Decomposition:

Dependency Preserving Decomposition is a technique used in a Database Management System (DBMS) to decompose a relation into smaller relations while preserving the functional dependencies between the attributes.

The goal is to improve the efficiency of the database by reducing redundancy while keeping its constraints easy to enforce.

In this technique, the original relation is decomposed in such a way that the functional dependencies of the original relation can still be checked on the resulting relations.
This is important because if the decomposition loses any of the original functional dependencies, it can lead to data inconsistencies and anomalies.
A dependency is an important constraint on the database. Every dependency must be satisfied by at least one decomposed table (or be derivable from the dependencies that are).
If A → B holds, it is much easier to check the dependency when both A and B are in the same relation.
A decomposition has this property only if the functional dependencies are maintained.
With this property, updates can be checked against the dependencies without computing the natural join of the decomposed relations.
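
A simple first check for dependency preservation is to see whether each dependency fits entirely inside one decomposed relation. The Python sketch below applies this to the BCNF example R(S, C, T). The dependency {S, C} → T is an assumption of this example: it is consistent with the sample rows but is not stated in the notes. The containment test is only a sufficient condition, since a dependency that fails it may still be implied by those that pass.

```python
def preserved_directly(fds, decomposition):
    """Split FDs into those contained in one sub-relation and those that are not.

    Each FD is a (determinant, dependent) pair of sets; each sub-relation
    is a set of attribute names.
    """
    kept, lost = [], []
    for lhs, rhs in fds:
        if any((lhs | rhs) <= relation for relation in decomposition):
            kept.append((lhs, rhs))
        else:
            lost.append((lhs, rhs))
    return kept, lost


# T -> C is from the BCNF example; {S, C} -> T is an assumed dependency
# that fits its sample rows.
fds = [({"T"}, {"C"}), ({"S", "C"}, {"T"})]
decomposition = [{"S", "T"}, {"C", "T"}]
kept, lost = preserved_directly(fds, decomposition)
```

Here T → C can be checked inside the Course-Tutor relation alone, while {S, C} → T spans both relations and would require a join to verify.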
Surrogate Key
• Surrogate key: A surrogate key is a column whose values are not derived from the data in the database. Instead, the DBMS generates a unique identifier. Surrogate keys are frequently used as primary keys in database tables.

If a table has no natural primary key, we need to create a key artificially in order to identify each row uniquely. This key is called the surrogate key or synthetic primary key of the table.

Features of the Surrogate Key

• It is automatically generated by the system.
• It holds an anonymous integer.
• It contains a unique value for every record of the table.
• Its value is never modified by the user or application.
• The surrogate key is called a factless (fictitious) key because it is added only to make rows easy to identify uniquely and carries no fact (or information) that is meaningful for the table.

Consider an example:
Suppose we have two tables from two different schools with the same columns (R.No, Name, Marks, Percentage, Address), each table having its own natural primary key, R.No (the registration number).

Chaitanya
R.No   Name           Marks   Percentage   Address
1      A. BHARGAV     945     94           Gwk
2      A. PRAVEEN     900     90           Akp
3      A. RAJESH      930     93           NAD
4      J. LIKITHA     850     85           Dvd
5      J.KRISHNA      800     80           Hyd
6      K. ASHOK       870     87           Rjy
7      M SRIPRIYA     960     96           Kkd
8      M GANESH       880     88           Vzm
9      A HARSHITHA    750     75           Sklm
10     D LAKSHMI      780     78           Tuni
11     K JAGADISH     810     81           Ylm

Narayana
R.No   Name           Marks   Percentage   Address
1      J. LIKITHA     945     94           Gwk
2      J.KRISHNA      900     90           Akp
3      K JAGADISH     930     93           NAD
4      D LAKSHMI      850     85           Dvd
5      M SRIPRIYA     800     80           Hyd
6      M GANESH       870     87           Rjy
7      A. PRAVEEN     960     96           Kkd
8      A. BHARGAV     880     88           Vzm
9      A. RAJESH      750     75           Sklm
10     K. ASHOK       780     78           Tuni
11     A HARSHITHA    810     81           Ylm

In the table below, which combines the results of both schools, R.No is no longer unique, so Surr.No acts as the Surrogate Key.

2019_X_Class_Results
Surr.No   R.No   Name           Marks   Percentage   Address
S1        1      A. BHARGAV     945     94           Gwk
S2        2      A. PRAVEEN     900     90           Akp
S3        3      A. RAJESH      930     93           NAD
S4        4      J. LIKITHA     850     85           Dvd
S5        5      J.KRISHNA      800     80           Hyd
S6        6      K. ASHOK       870     87           Rjy
S7        7      M SRIPRIYA     960     96           Kkd
S8        8      M GANESH       880     88           Vzm
S9        9      A HARSHITHA    750     75           Sklm
S10       10     D LAKSHMI      780     78           Tuni
S11       11     K JAGADISH     810     81           Ylm
S12       1      J. LIKITHA     945     94           Gwk
S13       2      J.KRISHNA      900     90           Akp
S14       3      K JAGADISH     930     93           NAD
S15       4      D LAKSHMI      850     85           Dvd
S16       5      M SRIPRIYA     800     80           Hyd
S17       6      M GANESH       870     87           Rjy
S18       7      A. PRAVEEN     960     96           Kkd
S19       8      A. BHARGAV     880     88           Vzm
S20       9      A. RAJESH      750     75           Sklm
S21       10     K. ASHOK       780     78           Tuni
S22       11     A HARSHITHA    810     81           Ylm
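
The idea of a system-generated key can be sketched in a few lines of Python. This example uses only the first three rows of each school table, so the generated numbers differ from the full table above, and the function name `with_surrogate_keys` is made up for the illustration. A real DBMS would normally generate such values itself, for example through an auto-incrementing column.

```python
from itertools import count

# First three rows of each school table: (R.No, Name, Marks)
chaitanya = [(1, "A. BHARGAV", 945), (2, "A. PRAVEEN", 900), (3, "A. RAJESH", 930)]
narayana = [(1, "J. LIKITHA", 945), (2, "J.KRISHNA", 900), (3, "K JAGADISH", 930)]


def with_surrogate_keys(*tables):
    """Concatenate tables and prefix each row with a generated key S1, S2, ..."""
    counter = count(1)
    return [(f"S{next(counter)}",) + row for table in tables for row in table]


combined = with_surrogate_keys(chaitanya, narayana)
surr_nos = [row[0] for row in combined]
r_nos = [row[1] for row in combined]
```

In the combined rows, the R.No values repeat across the two schools, while every generated Surr.No value is unique.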
De-normalization
Student
Roll   Name   Age   Branch_id
12     Raju   21    54
19     Ravi   21    05

Branch
Branch_id   Branch_Name   Total_Students
54          AI_DS         133
05          CSE           245

De-normalization is a database optimization technique in which we add redundant data to one or more tables. This can help us avoid costly joins in a relational database.
Note that de-normalization does not mean 'reversing normalization' or 'not normalizing'. It is an optimization technique that is applied after normalization.
In short, de-normalization takes a normalized schema and deliberately makes parts of it non-normalized; designers use it to tune the performance of systems that must support time-critical operations.
Pros of De-normalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs), since we need to look at fewer tables.
Cons of De-normalization:
1. Updates and inserts are more expensive.
2. De-normalization can make update and insert code harder to write.
3. Data may become inconsistent.
4. Data redundancy requires more storage.

Case (i)
When we need a student's name together with the branch name, a join operation has to be performed.
Case (ii)
If we need to modify all the students' names, this is manageable while the table is small.
If the table is big, problems arise: both cases take too long.
For case (i), we can de-normalize the database, accepting some redundancy and extra maintenance effort in return for faster reads.
So we can add the branch name from the Branch table to the Student table.
De-normalized table:

Student
Roll   Name   Age   Branch_id   Branch_Name
12     Raju   21    54          AI_DS
19     Ravi   21    05          CSE
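
The trade-off in case (i) can be sketched in Python with the two sample tables. The function names and the in-memory dictionaries are assumptions of this example; they stand in for tables and a join in a real database.

```python
students = [
    {"Roll": 12, "Name": "Raju", "Age": 21, "Branch_id": "54"},
    {"Roll": 19, "Name": "Ravi", "Age": 21, "Branch_id": "05"},
]
branches = {
    "54": {"Branch_Name": "AI_DS", "Total_Students": 133},
    "05": {"Branch_Name": "CSE", "Total_Students": 245},
}


def names_with_branch_normalized(students, branches):
    """Normalized read: look up (join) the Branch table for every student."""
    return [(s["Name"], branches[s["Branch_id"]]["Branch_Name"]) for s in students]


def denormalize(students, branches):
    """Copy Branch_Name into each student row (redundant on purpose)."""
    return [
        {**s, "Branch_Name": branches[s["Branch_id"]]["Branch_Name"]}
        for s in students
    ]


def names_with_branch_denormalized(wide_students):
    """De-normalized read: no lookup in the Branch table is needed."""
    return [(s["Name"], s["Branch_Name"]) for s in wide_students]


wide = denormalize(students, branches)
```

Both reads return the same pairs, but the de-normalized version pays for its simpler read with a copied Branch_Name that must be kept in sync whenever a branch is renamed.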
