Lesson04 Normalization

Uploaded by

mwangibrian1293
Copyright
© © All Rights Reserved

Lesson 4: Normalization

Normalization is a database design technique that reduces data redundancy and eliminates undesirable
characteristics such as insert, update, and delete anomalies. Normalization rules divide larger tables into
smaller tables and link them using relationships. The purpose of normalization is to eliminate
redundant (repetitive) data and ensure data is stored logically.
Reading Assignment: Watch YouTube videos on “Database Normalization, Anomalies”

Update anomalies: If copies of the same data item are scattered across several rows or tables and are
not linked properly, an update may change some copies while others keep the old value. Such a partial
update leaves the database in an inconsistent state.

Deletion anomalies: If the same data is also saved somewhere else, deleting a record leaves stale copies
behind. Conversely, when one record stores several unrelated facts, deleting it also removes facts that
were still needed.

Insert anomalies: If a fact can only be stored as part of a record that does not exist yet, it cannot be
inserted. For example, a new club cannot be recorded until at least one student joins it.
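
The three anomalies can be reproduced on a small unnormalized table. The sketch below is only an illustration: the rows, the column names, the replacement program code "BCS", and the helper programs_of are assumptions made for the example, not data from the lesson.

```python
# Hypothetical unnormalized rows: the program is repeated for every club
# a student belongs to.
rows = [
    {"student": "Martha", "program": "BIT", "club": "PEC"},
    {"student": "Martha", "program": "BIT", "club": "CU"},
    {"student": "Albert", "program": "DIT", "club": "IT"},
]


def programs_of(student, table):
    """Return the set of distinct program values recorded for a student."""
    return {r["program"] for r in table if r["student"] == student}


# Update anomaly: only one of Martha's two rows is changed.
updated = [dict(r) for r in rows]
updated[0]["program"] = "BCS"
inconsistent = programs_of("Martha", updated)  # two conflicting values

# Deletion anomaly: removing Albert's only club membership also removes
# the only record of his program.
after_delete = [
    r for r in rows if not (r["student"] == "Albert" and r["club"] == "IT")
]
albert_programs = programs_of("Albert", after_delete)  # empty set

# Insertion anomaly: a student with no club can only be stored by leaving
# the club column empty.
new_row = {"student": "Chris", "program": "BPH", "club": None}
```

Splitting students, programs, and clubs into separate tables removes all three problems, because each fact is then stored exactly once.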

Normalization Concepts

A key is a column, or a combination of columns, whose value identifies a record in a table uniquely.
Columns in a table that are NOT used to identify a record uniquely are called non-key columns.

A primary key is the column, or combination of columns, chosen to identify each database record
uniquely. It has the following characteristics:

• A primary key cannot be NULL
• A primary key value must be unique
• The primary key values should rarely be changed
• The primary key must be given a value when a new record is inserted

A composite key is a primary key composed of multiple columns used to identify a record uniquely.

A foreign key references the primary key of another table/relation. It has the following
characteristics:

• A foreign key can have a different name from the primary key it references
• It ensures rows in one table have corresponding rows in another
• Unlike primary key values, foreign key values do not have to be unique: several rows may reference the same record (1:m), while a unique foreign key gives a 1:1 relationship
• Foreign keys can be NULL even though primary keys cannot
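
The key rules above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a prescribed schema: the table names, the column name prog, and the helper try_insert are assumptions chosen for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE program (program_no INTEGER PRIMARY KEY, program TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " student_name TEXT NOT NULL,"
    # The foreign key has a different name from the key it references
    # and is allowed to be NULL.
    " prog INTEGER REFERENCES program(program_no))"
)
conn.execute("INSERT INTO program VALUES (1, 'BIT')")
conn.execute("INSERT INTO student VALUES (1, 'Martha Wood', 1)")
conn.execute("INSERT INTO student VALUES (2, 'Brandon Griffin', 1)")  # repeated foreign key value
conn.execute("INSERT INTO student VALUES (3, 'Chris Hall', NULL)")  # NULL foreign key


def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A repeated primary key value is rejected.
duplicate_pk = try_insert("INSERT INTO student VALUES (1, 'Albert Powell', 1)")
# A foreign key that points at no existing program is rejected.
dangling_fk = try_insert("INSERT INTO student VALUES (4, 'Albert Powell', 99)")
```

Both rejected inserts leave the student table with its three valid rows.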

A multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute.

The process of Normalization - Normal Forms (NF)

1NF A relation is in 1NF if it contains only atomic values.

The first normal form requires that each cell of the table holds only a single value; that is,
each intersection of a row and a column must hold an atomic value. For example, if we have a
column named phone number, then each row must store only a single phone number in that
column.

2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.

That is, no non-key attribute may depend on only part of a composite key. In simple words, if
a table represents several different entities, it should be broken down into one table per
entity. For example, the table (Student ID, Student Name, ProgramID, Program name, ClubID,
Club name) holds information about each student, the program each student is enrolled in, and
the clubs each student subscribes to. Since it represents three different entities, it must be
decomposed to reach 2NF.
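
Whether a functional dependency holds in a given set of rows can be checked mechanically. The sketch below assumes a hypothetical helper determines and a few made-up rows with composite key (student_id, club_id). Sample data can only refute a dependency; it cannot prove that the dependency holds for all future data.

```python
def determines(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs and rhs are tuples of column names. The dependency holds when no
    two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows; the composite key is (student_id, club_id).
rows = [
    {"student_id": 1, "student_name": "Martha", "club_id": 10, "club_name": "PEC"},
    {"student_id": 1, "student_name": "Martha", "club_id": 20, "club_name": "CU"},
    {"student_id": 2, "student_name": "Albert", "club_id": 20, "club_name": "CU"},
]

# student_name depends on student_id alone: a partial dependency on the key,
# which is what 2NF forbids.
partial = determines(rows, ("student_id",), ("student_name",))
# club_id is not determined by student_id, so both columns are needed in the key.
needs_both = not determines(rows, ("student_id",), ("club_id",))
```

Here partial is true, so student_name belongs in a separate student table keyed by student_id alone.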

3NF A relation is in 3NF if it is in 2NF and no transitive dependency exists.

Every non-key column should depend on the primary key directly, not through another non-key
column. For example, in the table (Transaction ID, price, quantity, total_sale), the total
sale is the product of price and quantity (price*quantity). Hence total_sale depends on
Transaction ID, the primary key here, only transitively through price and quantity.
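
One common way to remove such a derived column is to compute it when the data is read instead of storing it. The sketch below uses Python's built-in sqlite3 module; the table name sale and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# total_sale is deliberately not stored as a column.
conn.execute(
    "CREATE TABLE sale ("
    " transaction_id INTEGER PRIMARY KEY, price REAL, quantity INTEGER)"
)
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", [(1, 2.5, 4), (2, 10.0, 3)])

# total_sale is computed on read, so it can never disagree with price and quantity.
totals = conn.execute(
    "SELECT transaction_id, price * quantity AS total_sale"
    " FROM sale ORDER BY transaction_id"
).fetchall()
```

Updating price or quantity now changes the reported total automatically, with no second column to keep in step.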

4NF A relation is in 4NF if it is in Boyce-Codd normal form (BCNF) and has no non-trivial
multivalued dependency.

5NF A relation is in 5NF if it is in 4NF and contains no join dependency other than those
implied by its candidate keys, so it cannot be split any further by a lossless join.

NB: In most practical applications, normalizing up to the 3rd Normal Form is sufficient.

Simplified process

1NF
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

2NF
• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.

3NF
• Eliminate fields that do not depend on the key.

Case study scenarios:

a) Normalizing a conceptual model
b) Normalizing a table of values
c) Normalizing a set of attributes

Example 1: Consider the following table, containing information on different clubs that students
are subscribed to:

student_name     email              program  clubs
Martha Wood      [email protected]  BIT      PEC, CU, IT
Albert Powell    [email protected]  DIT      IT
Chris Hall       [email protected]  BPH      RC, CU
Brandon Griffin  [email protected]  BIT      IT, WC

1NF
The relation should contain only atomic values, with each tuple identified by a primary key. Here
student_no alone repeats across rows, so the key is the combination (student_no, club).

student_no  student_name     email              program  club
1           Martha Wood      [email protected]  BIT      PEC
1           Martha Wood      [email protected]  BIT      CU
1           Martha Wood      [email protected]  BIT      IT
2           Albert Powell    [email protected]  DIT      IT
3           Chris Hall       [email protected]  BPH      RC
3           Chris Hall       [email protected]  BPH      CU
4           Brandon Griffin  [email protected]  BIT      IT
4           Brandon Griffin  [email protected]  BIT      WC
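
The conversion to 1NF can be sketched in a few lines of Python. The function name to_first_normal_form is an assumption, and the email column is left out for brevity; the student numbers are assigned in the order the rows appear, as in the table above.

```python
# Unnormalized rows from Example 1 (email column omitted for brevity).
unnormalized = [
    ("Martha Wood", "BIT", "PEC, CU, IT"),
    ("Albert Powell", "DIT", "IT"),
    ("Chris Hall", "BPH", "RC, CU"),
    ("Brandon Griffin", "BIT", "IT, WC"),
]


def to_first_normal_form(rows):
    """Number each student and emit one row per (student, club) pair."""
    result = []
    for student_no, (name, program, clubs) in enumerate(rows, start=1):
        for club in clubs.split(","):
            # strip() removes the space that follows each comma.
            result.append((student_no, name, program, club.strip()))
    return result


first_nf = to_first_normal_form(unnormalized)
```

Every cell of first_nf holds a single value, and each row is identified by the pair (student_no, club).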

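2NF

Create separate tables for sets of values that apply to multiple records and then relate the tables using
primary key - foreign key relationships.

student_no  student_name     email
1           Martha Wood      [email protected]
2           Albert Powell    [email protected]
3           Chris Hall       [email protected]
4           Brandon Griffin  [email protected]

club_no  club
1        PEC
2        CU
3        IT
4        RC
5        WC

program_no  program
1           BIT
2           DIT
3           BPH

The tables shown do not yet record which program or clubs each student belongs to. That requires a
program_no foreign key in the student table and a linking table of (student_no, club_no) pairs.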
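
The decomposition can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the program_no column in student and the linking table student_club are not shown in the lesson's tables and are added here so that the original rows can be reconstructed. The email column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE program (program_no INTEGER PRIMARY KEY, program TEXT);
CREATE TABLE club (club_no INTEGER PRIMARY KEY, club TEXT);
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    student_name TEXT,
    program_no INTEGER REFERENCES program(program_no)  -- assumed link column
);
-- Assumed linking table: one row per club membership.
CREATE TABLE student_club (
    student_no INTEGER REFERENCES student(student_no),
    club_no INTEGER REFERENCES club(club_no),
    PRIMARY KEY (student_no, club_no)
);
INSERT INTO program VALUES (1, 'BIT'), (2, 'DIT'), (3, 'BPH');
INSERT INTO club VALUES (1, 'PEC'), (2, 'CU'), (3, 'IT'), (4, 'RC'), (5, 'WC');
INSERT INTO student VALUES (1, 'Martha Wood', 1), (2, 'Albert Powell', 2),
                           (3, 'Chris Hall', 3), (4, 'Brandon Griffin', 1);
INSERT INTO student_club VALUES (1, 1), (1, 2), (1, 3), (2, 3),
                                (3, 4), (3, 2), (4, 3), (4, 5);
""")

# Joining the four tables rebuilds the 1NF rows although each name is stored once.
rows = conn.execute("""
    SELECT s.student_no, s.student_name, p.program, c.club
    FROM student s
    JOIN program p ON p.program_no = s.program_no
    JOIN student_club sc ON sc.student_no = s.student_no
    JOIN club c ON c.club_no = sc.club_no
    ORDER BY s.student_no, c.club_no
""").fetchall()
```

Renaming a club or a program now means changing a single row, which is how the decomposition avoids the update anomaly.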

3NF

Remove the attributes that do not depend directly on the key (i.e. remove transitive dependencies).

For example, consider a table with columns A, B, and C, where column A is the primary key. If B
is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is
transitively dependent on A via B (provided that A is not functionally dependent on B or C).

If a transitive dependency exists on the primary key, then the table is not in 3NF.

Example: suppose the extended student data is stored as follows:

student_no  student_name     email              town     postal_code
1           Martha Wood      [email protected]  Meru     60200
2           Albert Powell    [email protected]  Embu     60100
3           Chris Hall       [email protected]  Nanyuki  10400
4           Brandon Griffin  [email protected]  Meru     60200

We observe that town is determined by student_no and also by postal_code alone. However, postal_code
is not a superkey and town is not a prime attribute. The chain student_no → postal_code → town
therefore shows that a transitive dependency exists. To bring this relation into third normal form,
we break it into two relations as follows.

student_no  student_name     email              postal_code
1           Martha Wood      [email protected]  60200
2           Albert Powell    [email protected]  60100
3           Chris Hall       [email protected]  10400
4           Brandon Griffin  [email protected]  60200

postal_code  town
60200        Meru
60100        Embu
10400        Nanyuki
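
A quick way to convince yourself that this split loses nothing is to project the original rows into the two relations and join them back. The variable names below are assumptions, and the email column is omitted for brevity.

```python
# Rows of the extended student table: (student_no, student_name, town, postal_code).
students = [
    (1, "Martha Wood", "Meru", "60200"),
    (2, "Albert Powell", "Embu", "60100"),
    (3, "Chris Hall", "Nanyuki", "10400"),
    (4, "Brandon Griffin", "Meru", "60200"),
]

# Projection 1: drop town and keep postal_code as a foreign key.
student_table = [(no, name, code) for no, name, _town, code in students]
# Projection 2: one entry per postal code; the dict stores 60200 only once.
postal_table = {code: town for _no, _name, town, code in students}

# Joining on postal_code must give back exactly the original rows.
rejoined = [(no, name, postal_table[code], code) for no, name, code in student_table]
```

Because postal_code determines town, the join is lossless, and the town for 60200 is now stored once instead of twice.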

Normalizing a set of attributes

Example: consider the following data captured by a certain hospital.

patientname, patientno, pphone, dob, wardname, doctor, doctorno, treatmentgiven, dateoftreatment, amountpaid

Identify the separate entities in the set of attributes given using the concept of functional dependency.
Identifiers that are not in the original list (wardno, tno, paymentno) are introduced as primary keys.

Patients => patientno, patientname, pphone, dob

Wards => wardno, wardname

Doctors => doctorno, doctor

Treatments => tno, treatmentgiven, dateoftreatment, doctorno, patientno

Payments => paymentno, amountpaid, patientno
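
The entities above translate directly into table definitions. The sketch below uses Python's built-in sqlite3 module; the lowercase table names, the column types, and the REFERENCES clauses are assumptions, since the lesson lists only attribute names. The ward table is left unlinked because the attribute list does not say how wards relate to patients.

```python
import sqlite3

# One table per entity; the first column of each table is its primary key.
schema = """
CREATE TABLE patients (
    patientno INTEGER PRIMARY KEY, patientname TEXT, pphone TEXT, dob TEXT
);
CREATE TABLE wards (wardno INTEGER PRIMARY KEY, wardname TEXT);
CREATE TABLE doctors (doctorno INTEGER PRIMARY KEY, doctor TEXT);
CREATE TABLE treatments (
    tno INTEGER PRIMARY KEY,
    treatmentgiven TEXT,
    dateoftreatment TEXT,
    doctorno INTEGER REFERENCES doctors(doctorno),
    patientno INTEGER REFERENCES patients(patientno)
);
CREATE TABLE payments (
    paymentno INTEGER PRIMARY KEY,
    amountpaid REAL,
    patientno INTEGER REFERENCES patients(patientno)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
# List the created tables in alphabetical order.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```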

Lab 3: Extra Examples and Questions on Normalization


i. Find the attached material “normalization_notes_and_exercises.pdf” for further reading.
ii. Attempt to answer the questions
