DB Lecture 9&10

Uploaded by

njumbacharles

NORMALIZATION

Introduction to Database Normalization
• Normalization is an important process in database design that helps improve the database’s efficiency, consistency, and accuracy. It makes the data easier to manage and maintain and helps keep the database adaptable to changing business needs.
What is Database Normalization?
• Database normalization is the process of organizing the attributes of a database to reduce or eliminate data redundancy (the same data stored in more than one place). Redundancy unnecessarily increases the size of the database because the same data is repeated in many places, and it causes inconsistency problems during insert, delete, and update operations.
Why do we need Normalization?
• The primary objective of normalizing relations is to eliminate the anomalies listed below. These anomalies stem from data redundancy, which can threaten data integrity and cause further problems as the database grows. Normalization consists of a set of procedures that help you develop an effective database structure.
• Insertion anomalies: An insertion anomaly occurs when a fact cannot be inserted into the database because other data that the table requires is missing or incomplete.
• Deletion anomalies: A deletion anomaly occurs when deleting a record also removes, unintentionally, other data that was stored only in that record.
• Update anomalies: An update anomaly occurs when data that is stored in several places is modified in only some of them, leaving inconsistencies or errors.
• For example, if a database contains information about employees and their salaries, updating an employee’s salary in one record but not in all related records could lead to incorrect calculations and reporting.
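
A minimal Python sketch of this update anomaly. The employee names, projects, and salary figures are made-up illustrations, not data from the lecture; the point is only that a salary repeated on several rows can end up with two different values after a partial update.

```python
# Hypothetical denormalized table: the salary is repeated on every project row.
rows = [
    {"employee": "Ann", "project": "Alpha", "salary": 50000},
    {"employee": "Ann", "project": "Beta", "salary": 50000},
    {"employee": "Raj", "project": "Alpha", "salary": 62000},
]


def salaries_on_record(rows, employee):
    """Return the set of distinct salaries stored for one employee."""
    return {r["salary"] for r in rows if r["employee"] == employee}


# A careless update that touches only the first of Ann's two rows.
rows[0]["salary"] = 55000

# Ann now has two different salaries on record: the update anomaly.
inconsistent = salaries_on_record(rows, "Ann")
print(inconsistent)
```

Storing each salary once, in a table keyed by employee, would leave only one row to update.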
Features of Database Normalization
• Elimination of data redundancy: One of the main features of normalization is that it eliminates the data redundancy that can occur in a database, that is, the repetition of the same data in different parts of the database.
• Ensuring data consistency: Normalization helps ensure that the data in the database is consistent and accurate.
• Simplification of data management: By breaking a complex data structure down into simpler tables, normalization makes the data easier to manage, update, and retrieve.
• Improved database design: By organizing the data in a structured and systematic way, normalization makes the database easier to design and maintain.
• Avoiding update anomalies: Update anomalies occur when the same fact is stored in several records and an update changes only some of them. Normalization ensures that each table describes only one kind of data and that the relationships between the tables are clearly defined, which helps avoid such anomalies.
• Standardization: By organizing the data into tables and defining the relationships between them, normalization helps ensure that the data is stored in a consistent and uniform manner.
Normal Forms (NF)
• All attributes depend on the key, the whole key, and nothing but the key.
• 1NF: Keys and no repeating groups
• 2NF: No partial dependencies
• 3NF: No transitive dependencies
• BCNF: All determinants are candidate keys
• 4NF: No non-trivial multivalued dependencies (except on a superkey)
First Normal Form (1NF)
• A table is in first normal form if
  - the domain of each attribute contains only atomic values, and
  - the value of each attribute is a single value from that domain.
• In layman's terms, every column of your table should contain only single values.
Example
• For a library

Patron ID   Borrowed books
C45         B33, B44, B55
C12         B56

1NF Solution

Patron ID   Borrowed book
C45         B33
C45         B44
C45         B55
C12         B56
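
The library conversion can be sketched in Python as follows. The function name `to_first_normal_form` is a hypothetical helper, and the sketch assumes the borrowed books arrive as one comma-separated string per patron, as in the unnormalized table.

```python
def to_first_normal_form(rows):
    """Split each comma-separated book list into one (patron, book) row per value."""
    flat = []
    for patron_id, borrowed in rows:
        for book in borrowed.split(","):
            flat.append((patron_id, book.strip()))
    return flat


# The unnormalized library table from the example.
unnormalized = [("C45", "B33, B44, B55"), ("C12", "B56")]

flat_rows = to_first_normal_form(unnormalized)
for patron_id, book in flat_rows:
    print(patron_id, book)
```

Each output row holds exactly one book number, so every column value is atomic.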
Example
• For an airline

Flight   Weekdays
UA59     Mo We Fr
UA73     Mo Tu We Th Fr

1NF Solution

Flight   Weekday
UA59     Mo
UA59     We
UA59     Fr
UA73     Mo
UA73     We
…        …
Implication for the ER model
• Watch for entities that can have multiple values for the same attribute
  - Phone numbers, …
• What about course schedules?
  - MW 5:30-7:00pm
  - Can treat them as atomic time slots
Functional dependency
Let X and Y be sets of attributes in a table T
• Y is functionally dependent on X in T (written X → Y) iff for each value x ∈ T.X there is precisely one corresponding value y ∈ T.Y
• Y is fully functionally dependent on X in T if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X
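
Both definitions can be tested against sample rows with a short Python sketch. The helper names `holds` and `fully_holds` are hypothetical, and the three sample rows are illustrative book requests. Note that sample data can only refute a dependency; whether a dependency really holds is a statement about the meaning of the data.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def fully_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # proper subsets, including the empty set
        for subset in combinations(lhs, size):
            if holds(rows, list(subset), rhs):
                return False
    return True


# Illustrative rows: which patron requested which book, with the patron's phone.
rows = [
    {"BookNo": "B3", "Patron": "J. Fisher", "PhoneNo": "555-1234"},
    {"BookNo": "B2", "Patron": "J. Fisher", "PhoneNo": "555-1234"},
    {"BookNo": "B2", "Patron": "M. Amer", "PhoneNo": "555-4321"},
]

print(holds(rows, ["Patron"], ["PhoneNo"]))                  # dependency on Patron
print(fully_holds(rows, ["BookNo", "Patron"], ["PhoneNo"]))  # only a partial dependency
```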
Example
• Book table

BookNo   Title       Author        Year
B1       Moby Dick   H. Melville   1851
B2       Lincoln     G. Vidal      1984

• Author attribute is:
  - functionally dependent on the pair {BookNo, Title}
  - fully functionally dependent on BookNo
Why it matters
• Table BorrowedBooks

BookNo   Patron      Address             Due
B1       J. Fisher   101 Main Street     3/2/15
B2       L. Perez    202 Market Street   2/28/15

• Address attribute is
  - functionally dependent on the pair {BookNo, Patron}
  - fully functionally dependent on Patron
Problems
• Cannot insert new patrons in the system until they have borrowed books
  - Insertion anomaly
• Must update all rows involving a given patron if he or she moves
  - Update anomaly
• Will lose information about patrons that have returned all the books they have borrowed
  - Deletion anomaly
Second Normal Form (2NF)
• A table is in 2NF if
  - it is in 1NF and
  - no non-prime attribute is dependent on any proper subset of any candidate key of the table
• A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table
• A candidate key is a minimal super key
Example
• Library allows patrons to request books that are currently out

BookNo   Patron      PhoneNo
B3       J. Fisher   555-1234
B2       J. Fisher   555-1234
B2       M. Amer     555-4321
Example (continued)
• Candidate key is {BookNo, Patron}
• We have
  - Patron → PhoneNo
• Table is not in 2NF: PhoneNo depends on only part of the candidate key
• Potential for
  - Insertion anomalies
  - Update anomalies
  - Deletion anomalies
2NF Solution
• Put telephone number in separate Patron table

BookNo   Patron
B3       J. Fisher
B2       J. Fisher
B2       M. Amer

Patron      PhoneNo
J. Fisher   555-1234
M. Amer     555-4321
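
The split into two tables is a pair of projections. A minimal Python sketch, assuming the request table is held as a list of dictionaries; `project` is a hypothetical helper, not a library function.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples but keeping first-seen order."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:
            result.append(t)
    return result


# The book-request table from the example.
requests = [
    {"BookNo": "B3", "Patron": "J. Fisher", "PhoneNo": "555-1234"},
    {"BookNo": "B2", "Patron": "J. Fisher", "PhoneNo": "555-1234"},
    {"BookNo": "B2", "Patron": "M. Amer", "PhoneNo": "555-4321"},
]

request_table = project(requests, ["BookNo", "Patron"])  # key is the whole row
patron_table = project(requests, ["Patron", "PhoneNo"])  # key is Patron

print(request_table)
print(patron_table)
```

J. Fisher's phone number now appears once, so changing it touches a single row.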
Third Normal Form (3NF)
• A table is in 3NF if
  - it is in 2NF and
  - all its non-prime attributes are determined only by its candidate keys and not by any non-prime attributes
Example
• Table BorrowedBooks

BookNo   Patron      Address             Due
B1       J. Fisher   101 Main Street     3/2/15
B2       L. Perez    202 Market Street   2/28/15

• Candidate key is BookNo
• Patron → Address, so Address depends on BookNo only transitively (BookNo → Patron → Address)
3NF Solution
• Put address in separate Patron table

BookNo   Patron      Due
B1       J. Fisher   3/2/15
B2       L. Perez    2/28/15

Patron      Address
J. Fisher   101 Main Street
L. Perez    202 Market Street
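
A quick way to check that this split loses no information is to join the two tables back on Patron and compare the result with the original. A minimal Python sketch, with the variable names chosen for illustration:

```python
# Original BorrowedBooks rows: (BookNo, Patron, Address, Due).
borrowed = [
    ("B1", "J. Fisher", "101 Main Street", "3/2/15"),
    ("B2", "L. Perez", "202 Market Street", "2/28/15"),
]

# The two projections of the 3NF solution.
loans = {(book, patron, due) for book, patron, _addr, due in borrowed}
patrons = {(patron, addr) for _book, patron, addr, _due in borrowed}

# Natural join of the projections on the shared Patron column.
rejoined = {
    (book, patron, addr, due)
    for book, patron, due in loans
    for other_patron, addr in patrons
    if other_patron == patron
}

lossless = rejoined == set(borrowed)
print(lossless)
```

The join is lossless here because Patron, the shared column, is a key of the Patron table.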
Another example
• Tournament winners

Tournament             Year   Winner           DOB
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 Sept. 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975

• Candidate key is {Tournament, Year}
• Winner → DOB
Boyce-Codd Normal Form (BCNF)
• Stricter form of 3NF
• A table T is in BCNF if
  - for every one of its non-trivial dependencies X → Y, X is a super key for T
• Most tables that are in 3NF also are in BCNF


Example

Manager   Project   Branch
Alice     Alpha     Austin
Alice     Delta     Austin
Carol     Alpha     Houston
Dean      Delta     Houston

• We can assume
  - Manager → Branch
  - {Project, Branch} → Manager
Example (continued)

Manager   Project   Branch
Alice     Alpha     Austin
Bob       Delta     Houston
Carol     Alpha     Houston
Alice     Delta     Austin

• Not in BCNF because Manager → Branch and Manager is not a superkey
• Will decomposition work?
A decomposition (I)

Manager   Project
Alice     Alpha
Bob       Delta
Carol     Alpha
Alice     Delta

Manager   Branch
Alice     Austin
Bob       Houston
Carol     Houston

• Two-table solution does not preserve the dependency {Project, Branch} → Manager
A decomposition (II)

Manager   Project
Alice     Alpha
Bob       Delta
Carol     Alpha
Alice     Delta
Dean      Delta

Manager   Branch
Alice     Austin
Bob       Houston
Carol     Houston
Dean      Houston

• Cannot have two or more managers managing the same project at the same branch, yet here Bob and Dean both manage Delta in Houston and neither table alone reveals the violation
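
The BCNF test can be automated with the standard attribute-closure computation: a set of attributes is a superkey exactly when its closure under the dependencies is the whole schema. The sketch below is one way to write it in Python; `closure` and `bcnf_violations` are hypothetical helper names, and the dependencies are the two assumed for the Manager/Project/Branch table.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(schema) <= closure(lhs, fds)
    ]


schema = {"Manager", "Project", "Branch"}
fds = [
    (frozenset({"Manager"}), frozenset({"Branch"})),
    (frozenset({"Project", "Branch"}), frozenset({"Manager"})),
]

violations = bcnf_violations(schema, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Only Manager → Branch is reported: {Project, Branch} determines every attribute, while Manager alone does not determine Project.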
Multivalued dependencies
• Assume the column headings in a table are divided into three disjoint groupings X, Y, and Z
• For a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively
Multivalued dependencies (continued)
• A multivalued dependency X => Y occurs if
  - for any x_c actually occurring in the table and the list of all the x_c y z combinations that occur in the table, we will find that x_c is associated with the same y entries regardless of z.
• A trivial multivalued dependency X => Y is one where either
  - Y is a subset of X, or
  - Z is empty (X ∪ Y has all column headings)


Fourth Normal Form (4NF)
• A table is in 4NF iff
  - for every one of its non-trivial multivalued dependencies X => Y, X is either:
    - a candidate key or
    - a superset of a candidate key


Example from Wikipedia

Restaurant      Pizza         DeliveryArea
Pizza Milano    Thin crust    SW Houston
Pizza Milano    Thick crust   SW Houston
Pizza Firenze   Thin crust    NW Houston
Pizza Firenze   Thick crust   NW Houston
Pizza Milano    Thin crust    NW Houston
Pizza Milano    Thick crust   NW Houston
Discussion
• The table has no non-key attributes
  - Key is {Restaurant, Pizza, DeliveryArea}
• Two non-trivial multivalued dependencies
  - Restaurant => Pizza
  - Restaurant => DeliveryArea
  since each restaurant delivers the same pizzas to all its delivery areas
4NF Solution
• Two separate tables

Restaurant      DeliveryArea
Pizza Milano    SW Houston
Pizza Firenze   NW Houston
Pizza Milano    NW Houston

Restaurant      Pizza
Pizza Milano    Thin crust
Pizza Milano    Thick crust
Pizza Firenze   Thin crust
Pizza Firenze   Thick crust
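
Whether a multivalued dependency holds in a given table can be checked directly: for each X value, the (y, z) pairs that occur must be every combination of the y values and z values seen with that X. A minimal Python sketch using the pizza table; `mvd_holds` is a hypothetical helper that takes column positions rather than names.

```python
def mvd_holds(rows, x_idx, y_idx, z_idx):
    """Check X => Y: in each X group the rows are all (Y values) x (Z values) pairs."""
    groups = {}
    for row in rows:
        x = tuple(row[i] for i in x_idx)
        y = tuple(row[i] for i in y_idx)
        z = tuple(row[i] for i in z_idx)
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    # pairs is always a subset of ys x zs, so equal sizes mean equal sets.
    return all(len(pairs) == len(ys) * len(zs) for ys, zs, pairs in groups.values())


# Columns: Restaurant (0), Pizza (1), DeliveryArea (2).
table = [
    ("Pizza Milano", "Thin crust", "SW Houston"),
    ("Pizza Milano", "Thick crust", "SW Houston"),
    ("Pizza Firenze", "Thin crust", "NW Houston"),
    ("Pizza Firenze", "Thick crust", "NW Houston"),
    ("Pizza Milano", "Thin crust", "NW Houston"),
    ("Pizza Milano", "Thick crust", "NW Houston"),
]

full_table_ok = mvd_holds(table, [0], [1], [2])
missing_row_ok = mvd_holds(table[:-1], [0], [1], [2])  # drop the last Milano row
print(full_table_ok, missing_row_ok)
```

Dropping a single row breaks the dependency, which is the kind of accidental inconsistency the two-table design rules out.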
An Example
Normalisation Example
• We have a table representing orders in an online store
• Each row represents an item on a particular order
• Primary key is {Order, Product}
• Columns
  - Order
  - Product
  - Quantity
  - UnitPrice
  - Customer
  - Address
Functional Dependencies
• Each order is for a single customer:
  - Order → Customer
• Each customer has a single address
  - Customer → Address
• Each product has a single price
  - Product → UnitPrice
• As Order → Customer and Customer → Address
  - Order → Address
2NF Solution (I)
• First decomposition
  - First table

    Order   Product   Quantity   UnitPrice

  - Second table

    Order   Customer   Address
2NF Solution (II)
• Second decomposition
  - First table

    Order   Product   Quantity

  - Second table

    Order   Customer   Address

  - Third table

    Product   UnitPrice
3NF
• In second table

    Order   Customer   Address

  - Customer → Address
• Split second table into

    Order   Customer

    Customer   Address
Normalisation to 2NF
• Second normal form means no partial dependencies on candidate keys
  - {Order} → {Customer, Address}
  - {Product} → {UnitPrice}
• To remove the first FD we project over
  {Order, Customer, Address} (R1)
  and
  {Order, Product, Quantity, UnitPrice} (R2)
Normalisation to 2NF (continued)
• R1 is now in 2NF, but there is still a partial FD in R2
  {Product} → {UnitPrice}
• To remove this we project over
  {Product, UnitPrice} (R3)
  and
  {Order, Product, Quantity} (R4)
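
Finding the partial dependencies can be sketched mechanically in Python: a dependency is partial when its left side is a proper subset of the key. The helper names below are hypothetical, and the third dependency, {Order, Product} → {Quantity}, is an assumption implied by the primary key rather than one the slides list.

```python
KEY = frozenset({"Order", "Product"})

FDS = [
    (frozenset({"Order"}), frozenset({"Customer", "Address"})),
    (frozenset({"Product"}), frozenset({"UnitPrice"})),
    # Assumed: Quantity depends on the whole key (one quantity per order line).
    (frozenset({"Order", "Product"}), frozenset({"Quantity"})),
]


def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key and that determine a non-key attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and rhs - key]


def split_off(partial_fds):
    """One relation per partial FD: its left side plus the attributes it determines."""
    return [sorted(lhs | rhs) for lhs, rhs in partial_fds]


partial = partial_dependencies(KEY, FDS)
relations = split_off(partial)
print(relations)
```

The two relations produced correspond to R1 and R3; what remains of the original table, {Order, Product, Quantity}, is R4.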
Normalisation to 3NF
• R has now been split into 3 relations - R1, R3, and R4
• R3 and R4 are in 3NF
• R1 has a transitive FD on its key
• To remove
  {Order} → {Customer} → {Address}
  we project R1 over
  - {Order, Customer}
  - {Customer, Address}
Normalisation
• 1NF:
  - {Order, Product, Customer, Address, Quantity, UnitPrice}
• 2NF:
  - {Order, Customer, Address}, {Product, UnitPrice}, and {Order, Product, Quantity}
• 3NF:
  - {Product, UnitPrice}, {Order, Product, Quantity}, {Order, Customer}, and {Customer, Address}
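
As a rough illustration, the four 3NF relations could be declared with Python's built-in sqlite3 module and joined back into the original wide rows. The table and column names (`order_no` is used because ORDER is a reserved word in SQL) and all sample values are assumptions made for this sketch, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer TEXT PRIMARY KEY, address TEXT NOT NULL);
CREATE TABLE products (product TEXT PRIMARY KEY, unit_price REAL NOT NULL);
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    customer TEXT NOT NULL REFERENCES customers(customer)
);
CREATE TABLE order_items (
    order_no INTEGER NOT NULL REFERENCES orders(order_no),
    product  TEXT NOT NULL REFERENCES products(product),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_no, product)
);
-- Hypothetical sample data.
INSERT INTO customers VALUES ('Dana', '1 Example Road');
INSERT INTO products VALUES ('Pen', 1.5), ('Pad', 3.0);
INSERT INTO orders VALUES (1, 'Dana');
INSERT INTO order_items VALUES (1, 'Pen', 4), (1, 'Pad', 2);
""")

# Joining the four tables reproduces the columns of the original 1NF table.
wide_rows = conn.execute("""
    SELECT i.order_no, i.product, c.customer, c.address, i.quantity, p.unit_price
    FROM order_items AS i
    JOIN orders AS o ON o.order_no = i.order_no
    JOIN customers AS c ON c.customer = o.customer
    JOIN products AS p ON p.product = i.product
    ORDER BY i.product
""").fetchall()

for row in wide_rows:
    print(row)
```

Each fact (an address, a unit price) is stored once, and the join recovers the wide view whenever it is needed.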
