12 Normalization

The document discusses database normalization, a technique for organizing data to eliminate redundancy and anomalies such as insertion, update, and deletion issues. It outlines the various normal forms (1NF, 2NF, 3NF, BCNF, and 4NF) and their requirements, emphasizing the importance of maintaining data integrity and reducing duplication. Examples are provided to illustrate the problems caused by lack of normalization and the benefits of applying these rules.

Uploaded by

Ahmad ullah

The University of Agriculture, Peshawar

Institute of Computer Science & Information Technology


ICS&IT Department

Yasir Ahmad

[email protected]
Normalization of Database

• Goals:
  • Database normalization is a technique for organizing the data in a database.
  • Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics such as insertion, update, and deletion anomalies.
  • It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.

Used

• 1. During conceptual data modeling, i.e. while creating the E-R diagram.
• 2. During logical database design, i.e. while mapping E-R diagrams into relations.
• 3. When reverse engineering older systems.

Problems Without Normalization

• If a table is not properly normalized and has data redundancy, it will not only take up extra storage space but will also make the database difficult to handle and update without risking data loss. Insertion, update, and deletion anomalies are very frequent if a database is not normalized. To understand these anomalies, let us take the example of a Student table.

STUDENT

• In the table above, we have data for 4 Computer Science students.
• As we can see, the data for the fields branch, hod (Head of Department), and office_tel is repeated for the students who are in the same branch of the college. This is data redundancy.
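
The redundancy described above can be made concrete with a small Python sketch. The rows are hypothetical stand-ins for the STUDENT table: the roll numbers, names, branch code, and telephone number are assumptions made for illustration, and only the column names branch, hod, and office_tel come from the slides.

```python
from collections import Counter

# Hypothetical rows standing in for the STUDENT table (all values are made up).
students = [
    {"rollno": 1, "name": "A", "branch": "CSE", "hod": "Mr. X", "office_tel": "0000"},
    {"rollno": 2, "name": "B", "branch": "CSE", "hod": "Mr. X", "office_tel": "0000"},
    {"rollno": 3, "name": "C", "branch": "CSE", "hod": "Mr. X", "office_tel": "0000"},
    {"rollno": 4, "name": "D", "branch": "CSE", "hod": "Mr. X", "office_tel": "0000"},
]

# Count how many times each (branch, hod, office_tel) combination is stored.
branch_copies = Counter(
    (s["branch"], s["hod"], s["office_tel"]) for s in students
)

# Every copy beyond the first one is redundant.
redundant_copies = sum(count - 1 for count in branch_copies.values())
print(redundant_copies)
```

With one branch shared by four students, the same branch facts are stored four times although one copy would be enough.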
Insertion Anomaly

• Suppose that, for a new admission, the student has not yet opted for a branch. The student's data cannot be inserted until a branch is chosen, or else we have to set the branch information to NULL.
• Also, if we have to insert data for 100 students of the same branch, the branch information will be repeated for all 100 students.
• These scenarios are insertion anomalies.
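
A minimal sketch of the first scenario, using Python's built-in sqlite3 module. The table layout and the NOT NULL constraints are assumptions chosen to mirror a design where branch data is stored inside every student row; they are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single-table design: branch data lives inside each student row.
conn.execute(
    """CREATE TABLE student (
           rollno INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           branch TEXT NOT NULL,
           hod TEXT NOT NULL,
           office_tel TEXT NOT NULL
       )"""
)

# A newly admitted student who has not chosen a branch yet cannot be stored.
try:
    conn.execute(
        "INSERT INTO student (rollno, name, branch, hod, office_tel) "
        "VALUES (?, ?, ?, ?, ?)",
        (1, "A", None, None, None),
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    # SQLite reports the violated NOT NULL constraint here.
    insert_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(insert_rejected, row_count)
```

Dropping the NOT NULL constraints would let the row in, but only at the price of NULL branch fields, which is the other half of the anomaly.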
Update Anomaly

• What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case all the student records will have to be updated, and if we miss any record by mistake, it will lead to data inconsistency. This is an update anomaly.
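
The inconsistency can be sketched in a few lines of Python. The rows and the replacement HOD "Mr. Y" are hypothetical; the sketch simply simulates an update that misses one record.

```python
# Hypothetical STUDENT rows; the HOD name is stored once per student.
students = [
    {"rollno": 1, "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 2, "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 3, "branch": "CSE", "hod": "Mr. X"},
]

# Mr. X is replaced by a hypothetical "Mr. Y", but the update misses rollno 3.
for row in students:
    if row["rollno"] in (1, 2):
        row["hod"] = "Mr. Y"

# The same branch now has two different HODs on record: an inconsistency.
hods_for_cse = {row["hod"] for row in students if row["branch"] == "CSE"}
print(sorted(hods_for_cse))
```

If the HOD were stored once in a separate branch table, a single update would be enough and no record could be missed.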
Deletion Anomaly

• In our Student table, two different kinds of information are kept together: student information and branch information. Hence, if student records are deleted at the end of the academic year, we will also lose the branch information. This is a deletion anomaly.
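
A minimal sqlite3 sketch of this effect, assuming a hypothetical single student table with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table that mixes student data and branch data.
conn.execute(
    "CREATE TABLE student ("
    "rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT, hod TEXT, office_tel TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [(1, "A", "CSE", "Mr. X", "0000"), (2, "B", "CSE", "Mr. X", "0000")],
)

# End of the academic year: all student records are removed.
conn.execute("DELETE FROM student")

# The branch details were stored only inside student rows, so they are gone too.
branches_left = conn.execute(
    "SELECT DISTINCT branch, hod, office_tel FROM student"
).fetchall()
print(branches_left)
```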
Normalization Rule

• Normalization rules are divided into the following normal forms:
  • First Normal Form
  • Second Normal Form
  • Third Normal Form
  • BCNF
  • Fourth Normal Form
  • Essential Tuple Normal Form
  • Fifth Normal Form
  • Domain-Key Normal Form
First Normal Form (1NF)

• For a table to be in the First Normal Form, it should follow these 4 rules:
  • It should only have single-valued (atomic) attributes/columns.
  • Values stored in a column should be of the same domain.
  • All the columns in a table should have unique names.
  • The order in which data is stored does not matter.
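
The first rule (atomic values) can be illustrated with a short Python sketch. The table, its column names, and its values are hypothetical; the function name to_first_normal_form is also just an illustrative choice.

```python
# Hypothetical rows where the "subject" cell holds several values at once.
unnormalized = [
    {"roll_no": 101, "name": "A", "subject": "OS, CN"},
    {"roll_no": 103, "name": "B", "subject": "Java"},
]


def to_first_normal_form(rows):
    """Return one row per (student, subject) so that every cell is atomic."""
    atomic_rows = []
    for row in rows:
        for subject in row["subject"].split(","):
            atomic_rows.append(
                {
                    "roll_no": row["roll_no"],
                    "name": row["name"],
                    "subject": subject.strip(),
                }
            )
    return atomic_rows


first_nf_rows = to_first_normal_form(unnormalized)
for row in first_nf_rows:
    print(row)
```

Each multi-valued cell becomes several rows, so the student's other values are repeated. That repetition is what the later normal forms address.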
Second Normal Form (2NF)

• For a table to be in the Second Normal Form:
  • It should be in the First Normal Form.
  • It should not have any partial dependency. A partial dependency can only arise when the primary key is composite (made up of more than one column), so a 1NF table with a single-column primary key has no partial dependency on that key.
• What is partial dependency? Do not worry about it yet. First, let's look at what a dependency is.

What is Dependency (Functional Dependency)?

• In this table, student_id is the primary key and is unique for every row, so we can use student_id to fetch any row of data from the table.
• Even where two students have the same name, if we know the student_id we can easily fetch the correct record.
• In other words, every other column in the table is functionally dependent on student_id.
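
As a rough sketch, a functional dependency can be tested against sample data in Python. The helper holds and the rows below are hypothetical. Note that passing such a check only shows that the data does not contradict the dependency; whether it really holds is a design decision.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical student rows: two students share the same name.
student_rows = [
    {"student_id": 10, "name": "A", "reg_no": "R-01"},
    {"student_id": 11, "name": "A", "reg_no": "R-02"},
    {"student_id": 12, "name": "B", "reg_no": "R-03"},
]

id_determines_rest = holds(student_rows, ["student_id"], ["name", "reg_no"])
name_determines_id = holds(student_rows, ["name"], ["student_id"])
print(id_determines_rest, name_determines_id)
```

Here student_id determines the other columns, while name does not determine student_id because two students share a name.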
What is Partial Dependency?

• To remove a partial dependency and the resulting violation of 2NF, decompose the table.
NOTE !!!

• Partial dependency exists when, for a composite primary key (a primary key made up of more than one column), any attribute in the table depends only on a part of the primary key and not on the complete primary key.
• To remove a partial dependency, we can divide the table: remove the attribute that is causing the partial dependency and move it to another table where it fits well.
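
The decomposition described in the note can be sketched in Python. The Score table, its columns, and its values are hypothetical: the composite key is assumed to be (student_id, subject_id), and teacher is assumed to depend on subject_id alone.

```python
# Hypothetical Score rows with the composite key (student_id, subject_id).
# "teacher" depends only on subject_id: a partial dependency.
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

# Move the partially dependent attribute to a table keyed by subject_id alone.
subject = {row["subject_id"]: row["teacher"] for row in score}

# What is left depends on the whole composite key.
score_2nf = [
    {key: row[key] for key in ("student_id", "subject_id", "marks")}
    for row in score
]

print(subject)
print(score_2nf)
```

After the split, each teacher is stored once per subject instead of once per score row.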
Third Normal Form (3NF)

• A table is said to be in the Third Normal Form when:
  • It is in the Second Normal Form.
  • It does not have any transitive dependency.
• If P → Q and Q → R both hold (and Q does not determine P), then P → R is a transitive dependency.
Transitive Dependency

• The advantages of removing a transitive dependency are:
  • The amount of data duplication is reduced.
  • Data integrity is achieved.
• The term data integrity refers to the accuracy and consistency of data. ... A good database will enforce data integrity whenever possible. For example, a user could accidentally try to enter a phone number into a date field. If the system enforces data integrity, it will prevent the user from making such mistakes.
Third Normal Form
Relation with transitive dependency

If P → Q and Q → R hold, then P → R is a transitive dependency.

CustID → Name
CustID → Salesperson
CustID → Region
All this is OK (2nd NF)

BUT
CustID → Salesperson → Region
Transitive dependency (not 3rd NF)

Database Systems

Removing a transitive dependency
Decomposing the SALES relation
Relations in 3NF

Salesperson → Region

CustID → Name
CustID → Salesperson

Now, there are no transitive dependencies…
Both relations are in 3rd NF.
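
The decomposition of SALES can be sketched in Python as follows. The attribute names CustID, Name, Salesperson, and Region come from the slides; the row values are hypothetical.

```python
# Hypothetical SALES rows: (CustID, Name, Salesperson, Region).
sales = [
    (1, "C1", "S1", "South"),
    (2, "C2", "S2", "West"),
    (3, "C3", "S1", "South"),
]

# CustID -> Name, Salesperson
customers = {
    cust_id: (name, salesperson) for cust_id, name, salesperson, _ in sales
}

# Salesperson -> Region, now stored once per salesperson.
salespeople = {salesperson: region for _, _, salesperson, region in sales}

# Joining the two relations on Salesperson recovers the original rows,
# so no information was lost by the decomposition.
rejoined = sorted(
    (cust_id, name, sp, salespeople[sp])
    for cust_id, (name, sp) in customers.items()
)
print(rejoined == sorted(sales))
```

The region of salesperson S1 is now stored once rather than once per customer, which removes the transitive dependency CustID → Salesperson → Region from the customer relation.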
Boyce and Codd Normal Form (BCNF)

• Boyce and Codd Normal Form is a stricter version of the Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF. For a table R to be in BCNF, the following conditions must be satisfied:
  • R must be in the Third Normal Form.
  • For each non-trivial functional dependency X → Y, X should be a super key.
• A candidate key is a minimal set of attributes that uniquely identifies tuples in a table, i.e. a super key with no redundant attributes. The primary key should be selected from the candidate keys. Every table must have at least one candidate key.
• Boyce-Codd Normal Form, or BCNF, is an extension of the Third Normal Form and is also known as 3.5 Normal Form.

For each functional dependency X → Y, X should be a super key.

ANOMALIES

• Deleting a student deletes the advisor information.
• Inserting a new advisor requires a student.
• Updates can cause inconsistencies.
• Note: no single attribute is a candidate key. The primary key can be (student_id, major) or (student_id, advisor).
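
The BCNF condition (every determinant must be a super key) can be checked mechanically with an attribute closure. The sketch below is one possible way to do it in Python. The two functional dependencies are assumptions that are consistent with the candidate keys named above, (student_id, major) → advisor and advisor → major; the slides' table images are not available to confirm them.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != relation
    ]


relation = frozenset({"student_id", "major", "advisor"})

# Assumed dependencies (not stated explicitly in the slides).
fds = [
    (frozenset({"student_id", "major"}), frozenset({"advisor"})),
    (frozenset({"advisor"}), frozenset({"major"})),
]

violations = bcnf_violations(relation, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under these assumptions, advisor → major is reported because advisor alone does not determine student_id, so advisor is not a super key.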
4th Normal Form

• For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
  • 1. It should be in the Boyce-Codd Normal Form.
  • 2. It should not have any multi-valued dependency.

• As you can see in the table above, the student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.

• The two records for the student with s_id 1 give rise to two more records, because one student has two hobbies, and so each hobby has to be specified along with each of the courses.

• In the table above, there is no relationship between the columns course and hobby. They are independent of each other.

• So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies as well.
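
The row growth described above, and the usual remedy of splitting the table in two, can be sketched in Python. The student, courses, and hobbies are the ones named in the slides; the variable names are illustrative assumptions.

```python
from itertools import product

# From the slides: student 1 takes Science and Maths, and plays Cricket and Hockey.
courses = {1: ["Science", "Maths"]}
hobbies = {1: ["Cricket", "Hockey"]}

# One combined table must pair every course with every hobby: 2 x 2 = 4 rows.
combined = [
    (s_id, course, hobby)
    for s_id in courses
    for course, hobby in product(courses[s_id], hobbies[s_id])
]

# Decomposition: two independent tables whose sizes add up
# instead of multiplying as more courses or hobbies are recorded.
course_opted = [(s_id, c) for s_id, cs in courses.items() for c in cs]
hobby_table = [(s_id, h) for s_id, hs in hobbies.items() for h in hs]

# Joining the two tables on s_id gives back the combined rows.
rejoined = {
    (s_id, c, h)
    for s_id, c in course_opted
    for other_id, h in hobby_table
    if s_id == other_id
}
print(len(combined), len(course_opted), len(hobby_table))
```

With three courses and three hobbies the combined table would need nine rows for one student, while the two separate tables would need six in total.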
