Database Normalization
Normalization
Normalization Types
Normalization Process
BCNF
Database Redundancies
and Anomalies
Data redundancy in database management systems (DBMS) refers to the unnecessary duplication of data
within a database. It occurs when the same piece of data is stored in multiple places or multiple times within a
database. Data redundancy can manifest in various forms, such as:
1. Duplicate Records: Two or more records in a database contain identical or very similar information, for
example multiple entries for the same customer with slight variations in spelling or formatting.
2. Repeated Attributes: The same attribute or set of attributes is stored in multiple tables. This can happen when
the same piece of information is needed in different contexts but is stored separately each time, for instance
storing a customer's address in both an "Orders" table and a "Customers" table.
3. Repetitive Values: Certain values are repeated unnecessarily within a single table, for example when a product
table stores the same description for multiple products rather than referencing a centralized list of
descriptions.
Implications of Data Redundancy:
1. Increased Storage Requirements: Redundant data consumes additional storage space within the database,
leading to inefficiency and increased storage costs.
2. Data Inconsistency: Redundant data introduces the risk of inconsistency, because an update made to one copy
of the data may not be applied to every other copy. This results in discrepancies and inaccuracies within the
database.
Redundant data also causes three kinds of anomalies:
1. Insertion Anomalies: These occur when it's not possible to add certain information to
the database without also supplying other, unrelated information. In other words, to insert a
new record, you might be required to add data that doesn't relate to the new
record. For instance, if you can't add a new customer without also adding a related order,
you're facing an insertion anomaly.
2. Deletion Anomalies: These occur when deleting certain information
results in the loss of other, unrelated information. For example, if deleting a record about
a specific product also removes details about a customer who bought that product,
you're encountering a deletion anomaly.
3. Update Anomalies: These occur when updating information leads to
inconsistencies because the same data is stored redundantly and only some copies are updated.
For instance, if the address of a customer is stored in multiple places and updated in one
location but not the others, you'll have inconsistent data, which is an update anomaly.
Addressing data redundancy through normalization and proper database design can help
mitigate these anomalies and ensure data integrity and consistency within the database.
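The update and deletion anomalies described above can be reproduced with a small sketch. The `orders` table, its column names, and the `addresses_of` helper are hypothetical and only serve as an illustration.

```python
# Hypothetical denormalized table: the customer's address is repeated on every order row.
orders = [
    {"order_id": 1, "customer": "Ali", "address": "Karachi", "product": "Pen"},
    {"order_id": 2, "customer": "Ali", "address": "Karachi", "product": "Ink"},
    {"order_id": 3, "customer": "Sara", "address": "Lahore", "product": "Pad"},
]


def addresses_of(rows, customer):
    """Return the set of distinct addresses stored for one customer."""
    return {row["address"] for row in rows if row["customer"] == customer}


# Update anomaly: only one of Ali's two rows is changed.
orders[0]["address"] = "Hyderabad"
inconsistent = addresses_of(orders, "Ali")  # now holds two different addresses

# Deletion anomaly: removing Sara's only order also removes her address.
remaining = [row for row in orders if row["order_id"] != 3]
sara_known = bool(addresses_of(remaining, "Sara"))

print(inconsistent, sara_known)
```

Storing each customer's address once, in its own table, would leave a single place to update and would keep the address when an order is deleted.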
Normalization
•Normalization is the process of organizing the data
in the database.
•Normalization is used to minimize redundancy
in a relation or set of relations. It is also used to
eliminate undesirable characteristics such as insertion,
update, and deletion anomalies.
•Normalization divides a larger table into smaller
tables and links them using relationships.
Normalization in DBMS provides these advantages:
1. Data Redundancy Elimination: Reduces redundancy by storing
each fact in one place.
2. Data Consistency: Minimizes anomalies, so data remains
consistent.
3. Data Integrity Improvement: Enforces referential integrity,
maintaining accurate data.
4. Simplified Maintenance: Facilitates schema modifications
without disrupting the system.
5. Query Performance Optimization: Can lead to faster retrieval
for some queries, although queries that join many tables may become slower.
6. Facilitates Database Design: Guides systematic organization of data
for better understanding.
7. Storage Space Reduction: Despite additional tables, it often
reduces storage usage.
In summary, normalization ensures efficient, scalable, and maintainable databases.
Normalization Types
Normalization rules are divided into the following normal
forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
First Normal Form
(1NF)
For a table to be in the First Normal Form, it should follow
the following rules:
1. It should only have single-valued (atomic)
attributes/columns.
2. Values stored in a column should be of the same
domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored should not matter.
In other words, First Normal Form disallows multi-valued attributes,
composite attributes, and their combinations.
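As an illustration of rule 1, the sketch below splits a multi-valued column into one row per value. The `students` rows, the `phones` column, and the `to_1nf` helper are assumptions made for this example, not part of the original slides.

```python
# Hypothetical rows that break 1NF: "phones" holds several values in one column.
students = [
    {"roll_no": 1, "name": "Ali", "phones": "0300-111, 0300-222"},
    {"roll_no": 2, "name": "Sara", "phones": "0300-333"},
]


def to_1nf(rows, column):
    """Emit one row per value of a comma-separated multi-valued column."""
    flat = []
    for row in rows:
        for value in row[column].split(","):
            new_row = dict(row)            # copy the other columns unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat


flat_students = to_1nf(students, "phones")
for row in flat_students:
    print(row)
```

After flattening, every column holds a single atomic value, at the cost of repeating `roll_no` and `name`. The later normal forms deal with that repetition.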
Second Normal Form
(2NF)
For a table to be in the Second Normal Form:
1. It should be in the First Normal Form.
2. It should not have any partial dependency. That is, every non-prime
attribute should be fully functionally dependent on every candidate key (CK).
A prime attribute is an attribute that belongs to some CK.
• A partial dependency exists when, for a composite primary key or
CK, a non-prime attribute depends on only a part of the
key and not on the complete key.
• In short: a part of a CK determines a non-prime attribute.
• For example, if AB is the CK but B alone determines C (with C non-prime),
then B -> C is a partial dependency.
• To remove a partial dependency, divide the table: remove
the attribute that causes the partial dependency and move it to
another table where it fits well.
Example: Suppose we have two tables, Students and Subjects, to store student
information and information related to subjects, and a third table, Score, to store the
marks scored by students in each subject.
If the Score table is keyed by the student and the subject together and also has a
teacher_name column, then teacher_name depends only on the subject and not on
the student. This is a partial dependency.
The column teacher_name should be moved to
the Subjects table. The entire system will then be
normalized as per the Second Normal Form.
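A minimal sketch of this decomposition follows. The column names `student_id`, `subject_id`, and `marks`, the sample rows, and the `project` helper are assumptions, since the original tables are not shown.

```python
# Hypothetical Score rows: teacher_name depends only on subject_id.
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher_name": "Mr. A"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher_name": "Ms. B"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher_name": "Mr. A"},
]


def project(rows, columns):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    seen = []
    for row in rows:
        item = {c: row[c] for c in columns}
        if item not in seen:
            seen.append(item)
    return seen


# teacher_name moves next to the attribute that actually determines it.
subject_teacher = project(score, ["subject_id", "teacher_name"])
score_2nf = project(score, ["student_id", "subject_id", "marks"])

print(subject_teacher)
print(score_2nf)
```

Each teacher name is now stored once per subject rather than once per score.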
How to check whether a relation is in 2nd normal form from the given FDs?
Check that there is no partial dependency.
Here partial dependency means:
Proper subset of CK -> Non-Prime Attribute
If any FD satisfies both conditions below, the table has a partial dependency
and is not in 2nd normal form:
1. The LHS of the FD is a proper subset of a CK.
2. The RHS of the FD is a non-prime attribute.
Example
R (ABCDEF)
FDs are C->F, E->A, EC->D, A->B
Start from a super key and drop attributes that can be derived from the rest:
(ABCDE)+ = ABCDEF
(ACDE)+ = ABCDEF
(ACE)+ = ABCDEF
(CE)+ = ABCDEF, and neither C+ = CF nor E+ = ABE covers R, so CE is the CK
Here prime attributes are C, E
Non-prime attributes are A, B, D, F
Now check the rule: is there an FD whose LHS is a proper subset of the CK and whose RHS is a
non-prime attribute?
C->F and E->A both match, so partial dependencies exist and this relation is not in 2nd normal form.
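The closure and candidate-key steps of this example can be sketched in code. This is a brute-force illustration under the assumption that FDs are written as (LHS, RHS) string pairs. The function names are hypothetical, and the partial-dependency check looks only at the FDs as given.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets whose closure is the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key < candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


def partial_dependencies(relation, fds):
    """Given FDs whose LHS is a proper subset of a CK and whose RHS has a non-prime attribute."""
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    found = []
    for lhs, rhs in fds:
        non_prime_rhs = set(rhs) - prime
        if non_prime_rhs and any(set(lhs) < key for key in keys):
            found.append((lhs, "".join(sorted(non_prime_rhs))))
    return found


R = "ABCDEF"
FDS = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
keys = candidate_keys(R, FDS)
violations = partial_dependencies(R, FDS)
print(keys, violations)
```

The search tries every attribute subset, so it only suits the small relations used in these exercises.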
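Two special cases where no partial dependency is possible:
R(A,B,C,D) with PA: {A, B, C, D}
Every attribute is prime, so there is no non-prime attribute.
R(A,B,C,D) with CK: A, PA: A
The CK is a single attribute, so it has no non-empty proper subset.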
05/21/2025
Third Normal Form
(3NF)
A relation R is said to be in 3NF (Third Normal Form) if and only if:
1. R is already in 2NF.
2. There is no transitive dependency for non-prime attributes.
Here transitive dependency means non-prime attribute -> non-prime
attribute.
A transitive dependency exists when a non-key attribute is determined by
another non-key attribute. In other words, if A determines B and B
determines C, then A determines C indirectly, through B.
Example:
Roll no   State    City
1         Sindh    Hyderabad
2         Punjab   Lahore
3         Sindh    Karachi
4         Punjab   Faisalabad
5         KPK      Peshawar
6         Sindh    Sukkur
CK is Roll no
Roll no -> City
City -> State
Here the prime attribute is Roll no
Non-prime attributes are State, City
A transitive dependency exists: Roll no -> City and City -> State, so Roll no -> State holds only
through City. (State -> City does not hold in this data, because Sindh appears with three different cities.)
So this relation is not in 3rd normal form.
Example: Before 3NF
Student(Roll no, State, City), with the six rows shown in the previous example.
After normalization
Student(Roll no, City ID)
City(City ID, City, State ID)
State(State ID, State)
Each city belongs to one state, so the state is now stored once per city instead of once per student.
How to check whether a relation is in 3rd normal form from the given FDs?
A table is in 3rd normal form if and only if, for each of its non-trivial FDs,
at least one of the following conditions holds:
1. The LHS of the FD is a SK (super key) or CK, or
2. The RHS of the FD is a prime attribute.
Example
R (ABCD)
FDs are AB->CD, D->A
(AB)+ = ABCD, so AB is a CK
A is part of this CK and appears on the RHS of D->A, so replace A with D to look for another CK:
(DB)+ = ABCD, so DB is also a CK
Here prime attributes are A, B, D
Non-prime attribute is only C
Now check the rule: for each FD, the LHS should be a SK or CK, or the RHS should be a prime attribute.
AB->CD has a CK on the LHS and D->A has a prime attribute on the RHS, so this relation is in 3rd normal form.
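The 3NF rule can be checked mechanically. The sketch below assumes FDs are (LHS, RHS) string pairs and uses hypothetical helper names. It returns the FDs that break the rule, so an empty list means the relation is in 3NF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets whose closure is the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key < candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


def violations_3nf(relation, fds):
    """FDs whose LHS is not a super key and whose RHS adds a non-prime attribute."""
    prime = set().union(*candidate_keys(relation, fds))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)                     # ignore the trivial part
        is_super_key = closure(lhs, fds) == set(relation)
        if extra and not is_super_key and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


print(violations_3nf("ABCD", [("AB", "CD"), ("D", "A")]))
print(violations_3nf("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))
```

The first call corresponds to the example above. The second uses a chain of FDs in which only A is a key.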
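Example
R (ABCD)
FDs are A->B, B->C, C->D
Is this relation in 3NF?
Answer: No. The CK is A, so in B->C and C->D the LHS is not a SK and the RHS is not a prime attribute.
Example
R (ABCDEF)
FDs are AB->CDEF, BD->F
Is this relation in 3NF?
Answer: No. The CK is AB, and in BD->F the LHS is not a SK and F is not a prime attribute.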
BCNF
It is a stricter version of the 3rd normal form. A relation
R is said to be in BCNF (Boyce-Codd Normal Form) if and only if:
•R is already in 3NF
•For every non-trivial dependency A -> B, A is a super key
or candidate key.
In simple words, 3NF still allows A -> B where A is not a super key,
as long as B is a prime attribute. BCNF does not: a non-prime attribute,
or only part of a key, cannot determine a prime attribute.
• BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address
certain types of anomalies not dealt with by 3NF as originally defined.
• A relation is in BCNF if and only if the LHS of each FD is a CK or SK.
• Example
Consider relation R(A, B, C) with
A -> BC,
B -> A
Here
A+ = {A,B,C}
B+ = {A,B,C}
so A and B are both CKs.
Now check the condition: the LHS of each FD should be a CK or SK.
The condition is satisfied, therefore the relation is in BCNF.
• Example
• R (Rollno, name, age, voterid)
FDs are
Rollno -> name
Rollno -> voterid
voterid -> age
voterid -> Rollno
Is the above relation in BCNF?
Yes. Rollno+ and voterid+ both contain every attribute, so the candidate keys are Rollno and voterid,
and the LHS of every FD is a CK.
•Example
R(A,B,C)
FDs are
A->B, B->C, C->A
Is this relation in BCNF?
Answer: Yes. A+ = B+ = C+ = ABC, so A, B, and C are all CKs and the LHS of every FD is a CK.
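The BCNF condition needs only attribute closures, so it is short to sketch. FDs are assumed to be (LHS, RHS) pairs of attribute collections, and `bcnf_violations` is a hypothetical helper name. An empty result means every non-trivial FD has a super key on its LHS.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs whose LHS is not a super key of the relation."""
    full = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != full
    ]


# Multi-character attribute names are written as tuples of names.
student = ("Rollno", "name", "age", "voterid")
student_fds = [
    (("Rollno",), ("name",)),
    (("Rollno",), ("voterid",)),
    (("voterid",), ("age",)),
    (("voterid",), ("Rollno",)),
]

print(bcnf_violations(student, student_fds))
print(bcnf_violations("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))
print(bcnf_violations("ABCD", [("AB", "CD"), ("D", "A")]))  # D is not a super key
```

The last call reuses the earlier 3NF example, which satisfies 3NF but not BCNF because D -> A has a non-key LHS.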
Finding possible NFs
•Example
R(A,B,C,D,E)
FDs are A->BCDE, BC->AEC, D->E
NF      A->BCDE   BC->AEC   D->E
BCNF    Yes       Yes       No
3NF     Yes       Yes       No
2NF     Yes       Yes       Yes
•Example
R(A,B,C,D,E)
FDs are AB->CDE, D->A
NF      AB->CDE   D->A
BCNF
3NF
2NF
• Example
R(A,B,C,D,E,F,G,H)
FDs are
ABC->DE, E->GH, H->G, ABCD->EF
NF      ABC->DE   E->GH   H->G   ABCD->EF
BCNF
3NF
2NF
1NF
Find the possible normal forms. Which is the highest
normal form here?
Ans: 2nd NF
• Example
R(A,B,C,D)
FDs are
AB->CD, AC->BD, BC->D
NF      AB->CD   AC->BD   BC->D
BCNF
3NF
2NF
1NF
Find the possible normal forms. Which is the highest
normal form here?
Ans: 2nd NF
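The checks used in these exercises can be combined into one sketch that reports the highest normal form. It assumes the relation is already in 1NF and that FDs are (LHS, RHS) string pairs. The function names are hypothetical, and the candidate-key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets whose closure is the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key < candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


def highest_normal_form(relation, fds):
    """Return "BCNF", "3NF", "2NF", or "1NF" for a relation assumed to be in 1NF."""
    full = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    bcnf = third = True
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)            # non-trivial part of the RHS
        if not extra or closure(lhs, fds) == full:
            continue                           # trivial FD or super key LHS
        bcnf = False
        if not extra <= prime:
            third = False
    # 2NF: no proper subset of a CK may determine a non-prime attribute.
    second = not any(
        closure(part, fds) - prime
        for key in keys
        for size in range(1, len(key))
        for part in combinations(sorted(key), size)
    )
    if bcnf:
        return "BCNF"
    if third:
        return "3NF"
    return "2NF" if second else "1NF"


print(highest_normal_form("ABCDEFGH",
                          [("ABC", "DE"), ("E", "GH"), ("H", "G"), ("ABCD", "EF")]))
print(highest_normal_form("ABCD", [("AB", "CD"), ("AC", "BD"), ("BC", "D")]))
```

The 2NF test uses closures of key subsets rather than only the listed FDs, so it also catches partial dependencies that are implied but not written out.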
Decomposition of Tables for
Normalization
Decomposition is a process of dividing a relation into
multiple relations to remove redundancy while maintaining
the original data.
Types of decomposition:
1. Dependency Preservation Decomposition
2. Lossless decomposition
3. Lossy decomposition
Dependency Preservation Decomposition
If we decompose a relation R into relations R1 and R2, every dependency of R
must either be a part of R1 or R2, or be derivable from the combination of the
functional dependencies of R1 and R2. For example, a relation R(A, B, C, D)
with FD set {A->BC} is decomposed into R1(ABC) and R2(AD). This is
dependency preserving because the FD A->BC is a part of R1(ABC).
In short, the dependencies that exist in the original relation still exist after
decomposition.
Dependency Example
R(A, B, C)
A B C
1 1 1
2 1 2
3 2 1
4 2 2
Find FDs?
FD: A->B, A->C
Using the union property, A->BC
B and C alone can't derive anything, but combined BC can derive A:
BC->A
FDs: F = {A->BC, BC->A}
Now we can divide the relation into two relations:
R1(A, B)    R2(B, C)
1 1         1 1
2 1         1 2
3 2         2 1
4 2         2 2
How to derive the FDs of the sub relations?
F1: {A->A, A->B}    F2: { }
Now combine: F1 U F2 = {A->B}
F1 U F2 is not equivalent to F: A->C and BC->A can't be preserved.
This means the decomposition is not dependency preserving.
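Second example:
R(A, B, C, D, E)
FD: {A->B, B->C, C->D, D->A}
R1(A, B, C)    R2(C, D, E)
How to find the FDs of the sub relations? Take the closure of each attribute set and keep the
attributes that belong to the sub relation.
For R1(A, B, C):
A+: ABCD, so A->BC
B+: ABCD, so B->CA
C+: ABCD, so C->AB
AB+: ABCD, so AB->C, but this is a duplicate because A alone can derive BC
F1: {A->BC, B->CA, C->AB}
Using the same method, the FDs of R2(C, D, E) are
F2: {C->D, D->C}
Now combine:
F1 U F2 = {A->BC, B->CA, C->AB, C->D, D->C}
Every FD of F can be derived from F1 U F2 (for example, D->A follows from D->C and C->A),
so the FDs are preserved.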
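Dependency preservation can also be tested without listing F1 and F2 by hand. The sketch below uses the standard closure-based test rather than the slides' manual method: for each FD, it grows the LHS using only what each sub relation can see. FDs are assumed to be (LHS, RHS) string pairs, and `preserves` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD can be derived from the FDs projected onto the sub relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for sub in decomposition:
                # Attributes of this sub relation derivable from what is known so far.
                gained = closure(result & set(sub), fds) & set(sub)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


print(preserves([("A", "BC")], ["ABC", "AD"]))
print(preserves([("A", "BC"), ("BC", "A")], ["AB", "BC"]))
print(preserves([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")], ["ABC", "CDE"]))
```

The three calls correspond to the R1(ABC)/R2(AD) example and the two dependency examples in this section.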
Lossless Decomposition
A lossless decomposition of a relation ensures that:
a) No information is lost during decomposition. This is why the term lossless is used.
b) If a relation R is divided into two relations R1 and R2 using lossless decomposition, then the
natural join of R1 and R2 returns the original relation R.
Rules of lossless decomposition: For these rules, we assume that a relation R is divided into
two relations R1 and R2.
1. The union of the attributes of R1 and R2 should give all the attributes of the original relation R.
R1 U R2 = R
2. The intersection of R1 and R2 should not be empty, that is, R1 and R2 must have at least one
common attribute.
R1 ∩ R2 ≠ ∅
3. The intersection of R1 and R2 is a super key of R1, of R2, or of both.
R1 ∩ R2 = super key of R1 or R2 or both
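Rules 2 and 3 can be checked from the FDs alone. The sketch below assumes rule 1 already holds (R1 and R2 together cover R) and that FDs are (LHS, RHS) string pairs. `is_lossless` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must be a super key of R1 or R2."""
    common = set(r1) & set(r2)
    if not common:
        return False                      # rule 2 fails: nothing to join on
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach   # rule 3


FDS = [("A", "B"), ("B", "C")]
print(is_lossless("ABC", "D", FDS))    # no common attribute
print(is_lossless("ABC", "AD", FDS))   # common attribute A determines all of R1
```

A set of attributes is a super key of a sub relation exactly when its closure contains all of that sub relation's attributes, which is what the last line of `is_lossless` tests.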
The primary key of the given StudentCourse relation is {Student_Id, Course_Id}.
This table has redundant data, as the same Course_Id and Course_Detail values are
repeated for several students. Let’s decompose this relation into two
relations.
Let’s check all the three rules of lossless decomposition to check whether this
decomposition is lossless or not.
The union results in the original relation StudentCourse so we can say that the
first rule holds true
Lossy Decomposition
As the name suggests, in lossy decomposition, information is lost
during decomposition. The three rules discussed above are not all
satisfied: in a lossy decomposition, one or more of them
fails.
Now if we divide this relation like this:
•This is a lossy decomposition because the Student and Course
relations have no common attribute. Their intersection is empty, so the
second and third rules of lossless decomposition fail here.
Lossless or Lossy
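Example
R(A B C)
1 1 1
2 1 2
3 2 1
4 3 2
Lossy or lossless?
R1(A, B) R2(B, C)
Ans: Lossy. B is not a super key of R1 or R2, and joining on B = 1 produces rows such as (1, 1, 2)
that are not in R.
R1(A, B) R2(A, C)
Ans: Lossless. A is a super key of both R1 and R2.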
Example
R(A B C)    R1(A B)    R2(B C)
1 1 3       1 1        1 3
2 1 3       2 1        2 6
3 2 6       3 2        3 7
4 3 7       4 3
(the duplicate row 1 3 appears only once in R2)
Lossy or lossless?
R1 natural join R2:
1 1 3
2 1 3
3 2 6
4 3 7
After joining R1 and R2 we get the original table, so this is a lossless decomposition.
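The join test in these two examples can be run on the data itself. The sketch below projects an instance onto two column sets, natural-joins the projections, and compares the result with the original rows. The helper names are hypothetical, and an instance-level check only speaks for the rows given, not for every possible instance of the relation.

```python
COLUMNS = ("A", "B", "C")


def make_rows(tuples):
    """Turn value tuples into dict rows keyed by column name."""
    return [dict(zip(COLUMNS, values)) for values in tuples]


def project(rows, cols):
    """Projection onto cols with duplicate elimination."""
    return {frozenset((c, row[c]) for c in cols) for row in rows}


def natural_join(left, right):
    """Join two projected relations on the columns they share."""
    joined = set()
    for l_row in left:
        for r_row in right:
            merged = dict(l_row)
            # Rows combine only if they agree on every shared column.
            if all(merged.get(col, val) == val for col, val in r_row):
                merged.update(r_row)
                joined.add(frozenset(merged.items()))
    return joined


def lossless_on_instance(rows, cols1, cols2):
    """True if joining the two projections gives back exactly the original rows."""
    original = project(rows, COLUMNS)
    return natural_join(project(rows, cols1), project(rows, cols2)) == original


first = make_rows([(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2)])
second = make_rows([(1, 1, 3), (2, 1, 3), (3, 2, 6), (4, 3, 7)])

print(lossless_on_instance(first, "AB", "BC"))
print(lossless_on_instance(first, "AB", "AC"))
print(lossless_on_instance(second, "AB", "BC"))
```

In the first instance the join on B creates extra rows, while in the second instance each B value maps to one C value, so the join adds nothing.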
How to decompose a relation into 2nd
NF
R(A,B,C,D)
FD: {A->B, B->C}
CK: AD
PA: A, D
In the FDs, a proper subset of the CK determines a non-PA: A->B violates 2nd NF.
Now we have to decompose the relation into two sub relations.
A->B is creating the problem, so find A+:
A+: ABC
Put these attributes in one relation:
R1(A, B, C) R2(D)
The decomposition must be dependency preserving and lossless.
Is there any common attribute between these two relations? No, so this is a lossy decomposition.
To make it lossless we add a common attribute. The common attribute should be a SK of at least one relation:
R1(A, B, C) R2(A, D)
Now find the FDs of each sub relation. To find the FDs, we take the closure of the attributes.
For R1(A, B, C):
A+: ABC means A->BC
B+: BC means B->C
C+: C
FD: {A->BC, B->C}
Now find the CK: CK: A
If the CK is a single attribute, there is no chance of a partial dependency (PD), so R1 is in 2nd NF.
For R2(A, D):
A+: ABC, which gives only A inside R2
D+: D
No FD found.
This relation has only two attributes, so it is already in BCNF.
Overall, the decomposed relations are in 2nd NF.
Is the decomposition dependency preserving? Check that F1 U F2 = F, that is, whether the original FDs
are available in F1 U F2. The answer is yes.
Is it lossless? The common attribute is A, and using A+ we can find all attributes of R1.
This decomposition is dependency preserving and lossless.
R(A,B,C,D)
F: {A->B, C->D}
CK: AC
Check the dependencies. Both are PDs, so R is not in 2nd NF.
A+: AB
R1{A, B}
C+: CD
R2{C, D}
R1 and R2 have no common attribute. For a lossless decomposition the common attribute should be a SK
of either relation, so add a relation that holds the CK:
R3{A, C}
We have only decomposed the relation by one level, and each resulting relation is already in BCNF.
Convert a relation from 2nd NF to 3rd NF
R(A, B, C, D)
F: {A->B, B->C, C->D}
There is no partial dependency, so this is already in 2nd NF.
There is a transitive dependency (TD), which violates 3NF.
If the LHS is a SK or the RHS is a PA, then there is no transitive dependency.
Which FD is violating the 3rd NF? First find the CK:
CK: A
PA: {A}
Non-PA: {B, C, D}
B->C and C->D are TDs and violate the 3rd NF.
Take out B->C. Find B+:
B+: BCD
Create a separate relation for BCD:
R1{B, C, D}
For C->D:
R2{C, D}
The remaining attribute B stays with the CK:
R3{A, B}
To make the decomposition dependency preserving and lossless, the common attribute should be a SK of
either relation.
Sub relations: R1{B, C, D}, R2{C, D}, R3{A, B}
Find the dependencies of all these relations:
F1: {B->CD, C->D}, F2: {C->D}, F3: {A->B}
R2 and R3 are in BCNF.
Now check R1:
CK: B, PA: {B}
C->D is a TD and is creating the problem. Take out C->D and find C+:
C+: CD
R11{C, D} and R12{B, C}
CK of R11: C
F11: {C->D}, F12: {B->C}
Now there are only two attributes in each relation, so each relation is in BCNF.
Is the decomposition dependency preserving? R11 is the same as R2, so the final relations are
R12(B, C), R2(C, D), R3(A, B)
{C->D, B->C, A->B}
Yes, this decomposition is dependency preserving.