0% found this document useful (0 votes)
4 views

Normalization - InClass Demo

Normalization is the process of organizing database relations to eliminate redundancy and anomalies, ensuring well-structured relations that allow for error-free data manipulation. It addresses issues such as insertion, deletion, and modification anomalies caused by redundant data. The document outlines the steps of normalization, including functional dependencies and the various normal forms, to achieve a more efficient database structure.

Uploaded by

aliu104
Copyright
© © All Rights Reserved
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Normalization - InClass Demo

Normalization is the process of organizing database relations to eliminate redundancy and anomalies, ensuring well-structured relations that allow for error-free data manipulation. It addresses issues such as insertion, deletion, and modification anomalies caused by redundant data. The document outlines the steps of normalization, including functional dependencies and the various normal forms, to achieve a more efficient database structure.

Uploaded by

aliu104
Copyright
© © All Rights Reserved
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
You are on page 1/ 16

Normalization

The process of decomposing relations with anomalies to produce smaller, well-structured relations

Well-structured Relation
A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors
or inconsistencies.

First, we will discuss: Problems we encounter if we don’t perform Normalization. In other words, problems we encounter when
there is redundancy.
Then, functional dependency.
Later: How to do Normzalization? How Normalization solves these problems?

What is redundancy?
Having multiple copies (i.e., repetition) of the same data in the database.
https://www.geeksforgeeks.org/the-problem-of-redundancy-in-database/

Do you see redundancy in the following table? Yes, the faculty id


FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222

What are the issues with redundancy?


*Increases the size of database
* anomaly: An error or inconsistency that may result when a user attemps to update a table that contains redundant data.
1. Insertion Anomaly
2. Deletion Anomaly
3. Modification Anomaly

1. Insertion Anomaly:
Let us try to insert a new faculty member

FacultyID FacultyName DeptID DeptName DeptPhone


1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
6 Mary 1 MIS 607-000-1111
7 John 1 MIS 607-000-1111
To insert redundant data for every new row is a data insertion problem or anomaly.
What information is needed for entering a new faculty? Faculty Info Dept Info
However, in a normal business situation, a new faculty should be allowed to be added without the need of a Dept.
What is the reason? redundancy, two different but related data in the same table

2. Deletion Anomaly:
Let us try to delete faculty data.

FacultyID FacultyName DeptID DeptName DeptPhone


1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222

What happenned? Dept info deleted along with student data

Loss of a related data when some other data is deleted is called deletion anomaly
3. Modification Anomaly

The MIS Dept phone number changes to 607-000-3333. Lets update!

FacultyID FacultyName DeptID DeptName DeptPhone


1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-8888 **If I change for one dept, I need to change for every in
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222

Modification should be performed for every row corresponding to that entity instance; otherwise, the data will
be inconsistent.

How will normalization solve this problem?


The process of decomposing relations with anomalies to produce smaller, well-structured relations
Well-structured Relation: A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table
without errors or inconsistencies.

A technique of organizing the data into multiple related tables, to minimize DATA REDUNDANCY.

DIVIDE and RULE

FacultyID FacultyName DeptID DeptName DeptPhone


1 Srikanth 1 MIS 607-000-1111 normalization means breakdown
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
FacultyID FacultyName DeptID
1 Srikanth 1
2 Jinglu 1 DeptID DeptName DeptPhone
3 Surinder 1 1 MIS 607-000-3333
4 Murali 2 2 FIN 607-000-2222
5 Joey 2
6 Mary 1

What about the Anomalies?


1. Insertion Anomaly? Let us try to insert a new faculty member mary
2. Deletion Anomaly? Let us try to delete faculty data. the strikethrough
3. Modification Anomaly? The MIS Dept phone number changes to 607-000-3333. Lets update! easily changed
pt, I need to change for every instance in the dept

rows in a table

lization means breakdown


Functional Dependency determinant vs candidate key
A constraint between two attributes in which the value of one attribute is determined by the value of another.
For a relation R, attribute B is functionally dependent on attribute A, if for every valid instance of A, that value A uniquely determines
the value of B.
· Representation: A → B
· Functional dependency is not a mathematical dependency: B cannot be computed from A.
· If you know the value of A, there can be only one value for B

Examples:
1. Bnumber -> Name, Age, Gender, Major, DateEnrolled B0001 --> Jim, 23, M, MIS this entire statement is con
2. DateCompleted EmpID, MIS333 -> a students course completion date of a course
The functional dependency in this statement implies that the date when a course is completed is determined by the identity of the
employee and the title of the course.

3. SSN --> Name, Address, BirthDate SSN is the determinant here


If someone looks up an SSN in the government database, can’t we find the corresponding person’s name, address, and birth date?
Yes. Hence, a person’s name, address, and birth date are functionally dependent on that person’s Social Security number.
There can be only one Name, one Address, and one Birthdate for each SSN.
4. Vin --> Make, Model, Color
The make, model, and the original color of a vehicle are functionally dependent on the vehicle identification number (VIN).
As above (3), there can be only one value of Make, Model, and Color associated with each VIN.
5. ISBN - --> FirstAuthor, Publisher, Title
The title of a book, the name of the first author, and the publisher are functionally dependent on the book’s international standard
book number (ISBN).
the attribute on the left hand side of the arrow = determinant
Determinant
The attribute on the left side of the arrow (→) in a functional dependency. BUNumber, SSN, VIN, and ISBN, are determinants in the
preceding three examples 1, 3, 4, 5, respectively.
EmpID and CourseTitle together become determinant for example 2.

Candidate Key
A candidate key is an attribute, or combination of attributes, that uniquely identifies a row in a relation. It satisfies the following properties.
1. Unique identification: For every row, the value of the key must uniquely identify that row. This property implies that each
nonkey attribute is functionally dependent on that key.
2. Nonredundancy: No attribute in the key can be deleted without destroying the property of unique identification.
All candidate keys are determinants. Not all determinants are candidate keys.

What are the candidate keys in the following tables?

EMPLOYEE1

The whole statement below is the functional dependency in EMPLOYEE1.


EmpID -> Name, DeptName, Salary the candidate key and the determinant are the same in this table
EmpID is the determinant in EMPLOYEE1
EmpID is the candidate key.

Pictorial Notation for Functional Dependencies in EMPLOYEE1

EMPLOYEE2
The two statements below represent the functional dependencies in EMPLOYEE2.
EmpID --> Name, DeptName, Salary
EmpID, CourseTItle --> Date Completed

The determinants are: (1) EmpID , and (2) EmpID,Course ID


Emp ID and Course Title is the candidate key.
candidate keys and determinants are not the same here

Pictorial Notation for Functional Dependencies in EMPLOYEE2


s entire statement is considered a functional dependency
Normalization Steps
Normalization can be accomplished and understood in stages, each of which corresponds to a normal form.
Normal form is a state of a relation that requires certain rules regarding relationships between attributes (or functio
1. First normal form
2. Second normal form
3. Third normal form
4. Boyce-Codd normal form
5. Fourth normal form
6. Fifth normal form

Let us start with a bill and think about a table to store the data.

If we were to store the bill data in a table, it will result in the following table.
What is the issue here? multivalued attribute or repeating groups
What is the consequence?

every bill is a row in the table


OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc
1006 10/24/2015 2 Value Furniture Plano, TX 7 Dining Table
5 Writer's Desk
Entertainment
4 Center
Furniture
1007 10/25/2015 6 Gallery Boulder, CO 11 4-Dr Dresser
Entertainment
4 Center
convert this table to first normal form --> the relation does not have multival
Step 1: Convert to First Normal Form
A relation that has a primary key and in which there are no repeating groups.
a. Remove repeating groups.
b. Define a primary key that uniquely identifies each row.
OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc
1006 10/24/2015 2 Value Furniture Plano, TX 7 Dining Table
1006 10/24/2015 2 Value Furniture Plano, TX 5 Writer's Desk
Entertainment
1006 10/24/2015 2 Value Furniture Plano, TX 4 Center
Furniture
1007 10/25/2015 6 Gallery Boulder, CO 11 4-Dr Dresser
Furniture Entertainment
1007 10/25/2015 6 Gallery Boulder, CO 4 Center
convert this table to first normal form --> the relation does not have multival

Relational Model
INVOICE
OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc

What is the issue here? redundancy

What is the consequence for database? All anomalies apply.


1. Insertion Anomaly: Let us say the company wants to introduce a new product (Product ID 8).
Customer wants to change his order (orderID 1007). Results in data replication and data en
2. Deletion Anomaly: Customer wants to delete “Dining Table” from orderID 1006. Entire row should be de
Crucial Item information is lost.
3. Update Anomaly: Price increase of ProductID 4 to $750. Change has to propagate in all rows.

Why? in the 1NF you have partial functional dependency


Partial Functional Dependency: one or more nonkey attributes are functionally dependent on part (but not all) of th
primary key.
Non Key attributes OrderDate CustID CusName CustAddrProdDesc
Key attributes OrderID ProdID even tho both are key attributes, there are so
OrderID → OrderDate, CustID, CusName, and CustAddress ex. I only need order id for order date, I only n
ProdID → ProdDesc, ProdFinish, ProdStdPrice I need orderid and prodid for ordered quantit
Only OrderedQty depends on OrderID, ProdID i.e, OrderedQty → OrderID, ProdID

Step 2: Convert to Second Normal Form


A relation in first normal form in which every nonkey attribute is fully functionally dependent on the primary key.
Simply: the relation should be in first normal form and there should be no partial functional dependency.
a. Create a new relation for each primary key attribute that is a determinant in the partial dependency.
That attribute is the primary key in the new relation.
b. Move those nonkey attributes that are only dependent on this primary key attribute from the old relatio
to the new relation.

ORDER
OrderID OrderDate CustID CusName CustAddress

Order Line this database is in 2nd normal form


OrderID ProdID OrderedQty - it should be in 1nf
- should have no partial dependency
PRODUCT
ProdID ProdDesc ProdFinish ProdStdPrice

What is the problem here?

OrderID OrderDate CustID CusName CustAddress


1006 10/24/2015 2 Value Furniture Plano, TX
1006 10/24/2015 2 Value Furniture Plano, TX transitive dependency
1006 10/24/2015 2 Value Furniture Plano, TX non key attributes depend on other n
Furniture
1007 10/25/2015 6 Gallery Boulder, CO OrderID
Furniture
1007 10/25/2015 6 Gallery Boulder, CO key attribute

Step 3: Convert to Third Normal Form


A relation that is in second normal form and has no transitive dependencies.
a. Create a new relation for each determinant (CustomerID).
b. Move the attributes (customername, customer address) dependent on this determinant.
c. Keep customerID as the foreign key in the old relation (ORDER).
d. Associate two relations.

Final Relational Diagram After Normalization


CUSTOMER DB is in 3NF
CustID CusName CustAddress -- 2NF
-- no transitive
ORDER
OrderID OrderDate CustID

Order Line
OrderID ProdID OrderedQty

PRODUCT
ProdID ProdDesc ProdFinish ProdStdPrice
nds to a normal form.
between attributes (or functional dependencies are satisfied).

ProdFinish ProdStdPrice OrderedQty


Natural Ash 800.00 2
Cherry 325.00 2

Natural Maple 650.00 1

Oak 500.00

Natural Maple 650.00 3


elation does not have multivalued attributes (repeating groups), primary key needs to be present

ProdFinish ProdStdPrice OrderedQty


Natural Ash 800.00 2
Cherry 325.00 2

Natural Maple 650.00 1

Oak 500.00

Natural Maple 650.00 3


elation does not have multivalued attributes (repeating groups), primary key needs to be present

ProdFinish ProdStdPrice OrderedQty

uct (Product ID 8).


n data replication and data entry errors.
1006. Entire row should be deleted.

pagate in all rows.

dent on part (but not all) of the

ProdFinish ProdStdPrice OrderedQty


re key attributes, there are some attributes you can get with partial key attributes
rder id for order date, I only need prodid for proddesc
nd prodid for ordered quantity
endent on the primary key.
tional dependency.
in the partial dependency.

attribute from the old relation

tabase is in 2nd normal form


uld be in 1nf
d have no partial dependency

ve dependency
y attributes depend on other non-key attributes

OrderDate CustID CusName CustAddress

non key attribut non key attribut non key attri non key attribute
CustID --> CUsName CustAddress
key attribute non key attri non key attribute

determinant.

You might also like