Lesson5-NORMALIZATION(midtrem)

Normalization is the process of organizing data in a database to minimize redundancy and ensure that data dependencies make sense. Edgar F. Codd introduced the concept in the 1970s. Its goals include eliminating redundant data, ensuring logical data storage, and avoiding modification anomalies such as update, insertion, and deletion anomalies. The process involves several normal forms, with the Third Normal Form (3NF) being a common standard for a normalized database.

Uploaded by

razel gicale
Copyright
© All Rights Reserved

NORMALIZATION

Definition
Normalization:
• is the process of efficiently organizing data in a database
• is the process of organizing the fields and tables of a relational
  database to minimize redundancy and ensure that data dependencies
  make sense.
History
Edgar F. Codd, the inventor of the relational model, introduced the
concept of normalization and the normal forms we now know as:
• First Normal Form (1NF) in 1970
• Second Normal Form (2NF) in 1971
• Third Normal Form (3NF) in 1971
• Boyce-Codd Normal Form (BCNF) in 1974 (Codd and Raymond F. Boyce)
A later addition:
• Sixth Normal Form (6NF) in 2002 (introduced by Chris Date, Hugh
  Darwen, and Nikos Lorentzos)
Goals of Normalization
There are two goals of the normalization
process:
eliminating redundant data (for example,
storing the same data in more than one
table)
ensuring data dependencies make sense
(only storing related data in a table).
Both of these are worthy goals as they
reduce the amount of space a database
consumes and ensure that data is logically
stored.
Normalization
A relational database table (the computerized representation of a
relation) is often described as "normalized" if it is in the Third
Normal Form.
Most 3NF tables are free of insertion, update, and deletion anomalies;
that is, in most cases 3NF tables also adhere to BCNF, 4NF, and 5NF
(but typically not 6NF).
Objectives of normalization
• Free the database of modification anomalies
• Minimize redesign when extending the database structure
• Make the data model more informative to users
• Avoid bias towards any particular pattern of querying (neutrality)
Objectives of normalization
Free the database of modification
anomalies
When an attempt is made to modify (update,
insert into, or delete from) a table, undesired
side-effects may follow.
Not all tables can suffer from these side-effects;
rather, the side-effects can only arise in tables
that have not been sufficiently normalized.
An insufficiently normalized table might have one
or more of the following characteristics:
• Update anomaly
• Insertion anomaly
• Deletion anomaly
Update anomaly
• The same information can be expressed on multiple rows; therefore
  updates to the table may result in logical inconsistencies.
• For example:
   • Each record in an "Employees' Skills" table might contain an
     Employee ID, Employee Address, and Skill.
   • Thus a change of address for a particular employee may need to be
     applied to multiple records (one for each of the employee's skills).
   • If the update is not carried through successfully (that is, if the
     employee's address is updated on some records but not others), then
     the table is left in an inconsistent state.
   • Specifically, the table provides conflicting answers to the
     question of what this particular employee's address is.
• This phenomenon is known as an update anomaly.
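The update anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows here are illustrative assumptions, not data from the lesson.

```python
import sqlite3

# Hypothetical "Employees' Skills" table: the address repeats once per skill.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_skills "
    "(employee_id INTEGER, employee_address TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [
        (426, "87 Sycamore Grove", "Typing"),
        (426, "87 Sycamore Grove", "Shorthand"),
        (519, "94 Chestnut Street", "Public Speaking"),
    ],
)

# A partial update: only one of employee 426's rows gets the new address.
conn.execute(
    "UPDATE employee_skills SET employee_address = ? "
    "WHERE employee_id = ? AND skill = ?",
    ("23 Hawthorne Avenue", 426, "Typing"),
)

# The table now gives two different answers for the same employee.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT employee_address FROM employee_skills "
        "WHERE employee_id = ?",
        (426,),
    )
)
print(addresses)
```

If the address were stored once, in a separate table keyed by employee ID, a single-row update could not leave the data in this state.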
Insertion Anomaly
There are circumstances in which certain facts cannot be recorded at
all.
For example:
• Each record in a "Faculty and Their Courses" table might contain a
  Faculty ID, Faculty Name, Faculty Hire Date, and Course Code.
• Thus we can record the details of any faculty member who teaches at
  least one course, but we cannot record the details of a newly hired
  faculty member who has not yet been assigned to teach any courses,
  except by setting the Course Code to null.
• This phenomenon is known as an insertion anomaly.
Deletion anomaly
Under certain circumstances, deletion of data
representing certain facts necessitates deletion
of data representing completely different facts.
The "Faculty and Their Courses" table described in
the previous example suffers from this type of
anomaly, for if a faculty member temporarily
ceases to be assigned to any courses, we must
delete the last of the records on which that faculty
member appears, effectively also deleting the
faculty member.
This phenomenon is known as a deletion
anomaly.
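Both the insertion and the deletion anomaly can be shown on one small table. This is a minimal sketch with sqlite3; the column names and sample rows are made up for illustration, and it assumes Course Code is declared NOT NULL as part of the key, so the null workaround is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "Faculty and Their Courses" table; course_code is part of the key.
conn.execute(
    """CREATE TABLE faculty_courses (
           faculty_id   INTEGER NOT NULL,
           faculty_name TEXT    NOT NULL,
           hire_date    TEXT    NOT NULL,
           course_code  TEXT    NOT NULL,
           PRIMARY KEY (faculty_id, course_code))"""
)
conn.execute(
    "INSERT INTO faculty_courses VALUES (389, 'Dr. Giddens', '2002-02-10', 'ENG-206')"
)

# Insertion anomaly: a new hire with no course yet cannot be recorded.
try:
    conn.execute(
        "INSERT INTO faculty_courses VALUES (424, 'Dr. Newsome', '2007-03-29', NULL)"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing the last course assignment removes the person too.
conn.execute(
    "DELETE FROM faculty_courses WHERE faculty_id = 389 AND course_code = 'ENG-206'"
)
remaining = conn.execute(
    "SELECT COUNT(*) FROM faculty_courses WHERE faculty_id = 389"
).fetchone()[0]

print(insert_rejected, remaining)
```

Splitting the data into a faculty table and a separate faculty-to-course table removes both problems, because a faculty row no longer needs a course to exist.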
Objectives of normalization
Minimize redesign when extending the
database structure
When a fully normalized database structure
is extended to allow it to accommodate new
types of data, the pre-existing aspects of the
database structure can remain largely or
entirely unchanged. As a result, applications
interacting with the database are minimally
affected.
Objectives of normalization
Make the data model more informative
to users
Normalized tables, and the relationship
between one normalized table and another,
mirror real-world concepts and their
interrelationships.
Objectives of normalization
Avoid bias towards any particular
pattern of querying
Normalized tables are suitable for general-
purpose querying.
This means any queries against these tables, including future queries
whose details cannot be anticipated, are supported. In contrast, tables
that are not normalized lend themselves to some types of queries, but
not others.
• For example:
   • Consider an online bookseller whose customers maintain wishlists
     of books they'd like to have.
   • For the obvious, anticipated query (what books does this customer
     want?) it's enough to store the customer's wishlist in the table
     as, say, a homogeneous string of authors and titles.
   • With this design, though, the database can answer only that one
     single query. It cannot by itself answer interesting but
     unanticipated queries:
      • What is the most-wished-for book?
      • Which customers are interested in WWII espionage?
      • How does Lord Byron stack up against his contemporary poets?
   • Answers to these questions must come from special adaptive tools
     completely separate from the database. One tool might be software
     written especially to handle such queries.
   • This special adaptive software has just one single purpose: in
     effect to normalize the non-normalized field.
   • Unforeseen queries can be answered trivially, and entirely within
     the database framework, with a normalized table.
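As a rough sketch of the last point, here is one possible normalized wishlist design in sqlite3. The table names, column names, and sample rows are assumptions made for this example. With one row per wished-for book, "What is the most-wished-for book?" becomes an ordinary query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    CREATE TABLE wishlist_items (
        customer_id INTEGER,
        book_id     INTEGER REFERENCES books(book_id),
        PRIMARY KEY (customer_id, book_id)
    );
    -- Illustrative sample data only.
    INSERT INTO books VALUES (1, 'Lord Byron', 'Don Juan'),
                             (2, 'John Keats', 'Lamia');
    INSERT INTO wishlist_items VALUES (10, 1), (11, 1), (11, 2);
    """
)

# An "unanticipated" query answered entirely inside the database.
most_wished = conn.execute(
    """
    SELECT b.title, COUNT(*) AS wishes
    FROM wishlist_items AS w
    JOIN books AS b ON b.book_id = w.book_id
    GROUP BY b.book_id, b.title
    ORDER BY wishes DESC
    LIMIT 1
    """
).fetchone()
print(most_wished)
```

With the wishlist stored as a single string per customer, the same question would require parsing every string outside the database.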
Normal forms
The database community has developed a series of guidelines for
ensuring that databases are normalized. These are referred to as
normal forms.
They are numbered from one (the lowest form of normalization, referred
to as first normal form or 1NF) through five (fifth normal form or
5NF).
In practical applications, you'll often see 1NF, 2NF, and 3NF along
with the occasional 4NF.
First Normal Form (1NF)
First normal form (1NF) sets the very basic
rules for an organized database:
Eliminate duplicative columns from the same
table.
Create separate tables for each group of
related data and identify each row with a
unique column or set of columns (the
primary key).
The primary key for the repeating group is
usually a composite key.
Example (Unnormalized)
Table: SalesOrders (Un-normalized)
• SalesOrderNo
• Date
• CustomerNo
• CustomerName
• CustomerAddress
• ClerkNo
• ClerkName
• Item1Description
• Item1Quantity
• Item1UnitPrice
• Item2Description
• Item2Quantity
• Item2UnitPrice
• Item3Description
• Item3Quantity
• Item3UnitPrice
• Total
Example (1NF)
Table: SalesOrders
• SalesOrderNo
• Date
• CustomerNo
• CustomerName
• CustomerAddress
• ClerkNo
• ClerkName
• Total
Table: OrderItems
• SalesOrderNo
• ItemNo
• ItemDescription
• ItemQuantity
• ItemUnitPrice
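The move from the unnormalized table to 1NF can be sketched in plain Python. The function name, the `item_numbers` lookup (description to ItemNo), and the sample order are hypothetical; the unnormalized table has no ItemNo column, so this sketch assumes the numbers come from such a lookup.

```python
def split_repeating_items(order, item_numbers, max_items=3):
    """Split one unnormalized SalesOrders row into a header row and item rows."""
    # Everything that is not part of the Item1../Item2../Item3.. repeating group.
    header = {k: v for k, v in order.items() if not k.startswith("Item")}
    items = []
    for n in range(1, max_items + 1):
        description = order.get(f"Item{n}Description")
        if description is None:  # unused slot in the repeating group
            continue
        items.append({
            "SalesOrderNo": order["SalesOrderNo"],
            "ItemNo": item_numbers[description],  # assumed lookup, see above
            "ItemDescription": description,
            "ItemQuantity": order[f"Item{n}Quantity"],
            "ItemUnitPrice": order[f"Item{n}UnitPrice"],
        })
    return header, items


# Illustrative sample row only.
order = {
    "SalesOrderNo": 1001, "Date": "2024-01-15",
    "CustomerNo": 7, "CustomerName": "A. Cruz", "CustomerAddress": "12 Main St",
    "ClerkNo": 3, "ClerkName": "B. Reyes",
    "Item1Description": "Hammer", "Item1Quantity": 2, "Item1UnitPrice": 9.5,
    "Item2Description": "Nails", "Item2Quantity": 100, "Item2UnitPrice": 0.05,
    "Item3Description": None, "Item3Quantity": None, "Item3UnitPrice": None,
    "Total": 24.0,
}
header, items = split_repeating_items(order, {"Hammer": 10, "Nails": 11})
print(header)
print(items)
```

The header row matches the SalesOrders table above, and each item row is identified by the composite key (SalesOrderNo, ItemNo). An order is no longer limited to three items.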
Second normal form (2NF)
• Second normal form (2NF) further addresses the concept of removing
  duplicative data:
   • Meet all the requirements of the first normal form.
   • Remove subsets of data that apply to multiple rows of a table and
     place them in separate tables. (Remove partial dependencies.)
      • Functional dependency: the value of one attribute is determined
        entirely by the value of another.
      • Partial dependency: a non-key attribute depends on only part of
        the primary key. (This can happen only when the primary key is
        a composite key.)
      • Transitive dependency: a non-key attribute depends on another
        non-key attribute rather than directly on the primary key.
   • Create relationships between these new tables and their
     predecessors through the use of foreign keys.
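A functional dependency can be tested against sample rows with a few lines of Python. This is a sketch: the helper name `holds` and the rows are made up, and sample data can only refute a dependency, never prove it for all future data.

```python
def holds(rows, determinant, dependent):
    """Return True if `determinant -> dependent` is not contradicted by `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative 1NF OrderItems rows; the primary key is (SalesOrderNo, ItemNo).
rows = [
    {"SalesOrderNo": 1, "ItemNo": 10, "ItemDescription": "Hammer", "ItemQuantity": 2},
    {"SalesOrderNo": 2, "ItemNo": 10, "ItemDescription": "Hammer", "ItemQuantity": 5},
    {"SalesOrderNo": 2, "ItemNo": 11, "ItemDescription": "Nails", "ItemQuantity": 100},
]

# ItemDescription depends on ItemNo alone: a partial dependency on the key.
partial = holds(rows, ["ItemNo"], ["ItemDescription"])
# ItemQuantity needs the whole key, so ItemNo alone does not determine it.
quantity_by_item = holds(rows, ["ItemNo"], ["ItemQuantity"])
print(partial, quantity_by_item)
```

An attribute like ItemDescription, determined by only part of the composite key, is what 2NF moves into its own table.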
Example (2NF)
Table: OrderItems
• SalesOrderNo
• ItemNo
• ItemQuantity
• ItemUnitPrice
Table: InventoryItems
• ItemNo
• ItemDescription
Third normal form (3NF)
Third normal form (3NF) goes one large
step further:
Meet all the requirements of the second
normal form.
Remove columns that are not dependent
upon the primary key. (Remove transitive
dependencies)
Start a new table for the transitively
dependent attribute and the attribute it
depends on.
Keep a copy of the key attribute in the
original table.
Example (3NF)
Table: SalesOrders
• SalesOrderNo
• Date
• CustomerNo
• ClerkNo
• Total
Table: Customers
• CustomerNo
• CustomerName
• CustomerAddress
Table: Clerks
• ClerkNo
• ClerkName
Example (Final Tables)
Table: SalesOrders
• SalesOrderNo
• Date
• CustomerNo
• ClerkNo
• Total
Table: OrderItems
• SalesOrderNo
• ItemNo
• ItemQuantity
• ItemUnitPrice
Table: InventoryItems
• ItemNo
• ItemDescription
Table: Customers
• CustomerNo
• CustomerName
• CustomerAddress
Table: Clerks
• ClerkNo
• ClerkName
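The final tables can be written as a schema and queried with joins. The sketch below uses sqlite3; the column types, the choice of keys, and the sample rows are assumptions, since the slides list only table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY,
                            CustomerName TEXT, CustomerAddress TEXT);
    CREATE TABLE Clerks (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
    CREATE TABLE InventoryItems (ItemNo INTEGER PRIMARY KEY, ItemDescription TEXT);
    CREATE TABLE SalesOrders (
        SalesOrderNo INTEGER PRIMARY KEY,
        Date         TEXT,
        CustomerNo   INTEGER REFERENCES Customers(CustomerNo),
        ClerkNo      INTEGER REFERENCES Clerks(ClerkNo),
        Total        REAL
    );
    CREATE TABLE OrderItems (
        SalesOrderNo  INTEGER REFERENCES SalesOrders(SalesOrderNo),
        ItemNo        INTEGER REFERENCES InventoryItems(ItemNo),
        ItemQuantity  INTEGER,
        ItemUnitPrice REAL,
        PRIMARY KEY (SalesOrderNo, ItemNo)
    );
    -- Illustrative sample data only.
    INSERT INTO Customers VALUES (7, 'A. Cruz', '12 Main St');
    INSERT INTO Clerks VALUES (3, 'B. Reyes');
    INSERT INTO InventoryItems VALUES (10, 'Hammer'), (11, 'Nails');
    INSERT INTO SalesOrders VALUES (1001, '2024-01-15', 7, 3, 24.0);
    INSERT INTO OrderItems VALUES (1001, 10, 2, 9.5), (1001, 11, 100, 0.05);
    """
)

# Joins reassemble the original "wide" view of an order when it is needed.
rows = conn.execute(
    """
    SELECT c.CustomerName, i.ItemDescription, oi.ItemQuantity
    FROM OrderItems AS oi
    JOIN SalesOrders AS so   ON so.SalesOrderNo = oi.SalesOrderNo
    JOIN Customers AS c      ON c.CustomerNo = so.CustomerNo
    JOIN InventoryItems AS i ON i.ItemNo = oi.ItemNo
    ORDER BY i.ItemNo
    """
).fetchall()

# The foreign keys reject an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO SalesOrders VALUES (1002, '2024-01-16', 99, 3, 0.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Each customer name, clerk name, and item description is stored exactly once, so a change to any of them is a single-row update.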
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and a half
(3.5) normal form", adds one more requirement:
• Meet all the requirements of the third normal form.
• Every determinant must be a candidate key.
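The determinant rule can be checked mechanically by computing attribute closures. This is a sketch under stated assumptions: the function names and the Student/Course/Instructor relation are illustrative, and the check uses the common textbook formulation that the determinant of every non-trivial dependency must be a superkey (a candidate key or a superset of one).

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependent in dependencies:
            if set(determinant) <= result and not set(dependent) <= result:
                result |= set(dependent)
                changed = True
    return result


def bcnf_violations(relation, dependencies):
    """List the dependencies whose determinant is not a superkey of `relation`."""
    relation = set(relation)
    return [
        (determinant, dependent)
        for determinant, dependent in dependencies
        if not set(dependent) <= set(determinant)  # ignore trivial dependencies
        and not closure(determinant, dependencies) >= relation
    ]


# Hypothetical relation: each instructor teaches exactly one course.
relation = ("Student", "Course", "Instructor")
dependencies = [
    (("Student", "Course"), ("Instructor",)),
    (("Instructor",), ("Course",)),
]
violations = bcnf_violations(relation, dependencies)
print(violations)
```

This relation is in 3NF, because Course is part of a candidate key, but it is not in BCNF: Instructor is a determinant without being a key.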
Fourth normal form
Finally, fourth normal form (4NF) has one additional requirement:
• Meet all the requirements of the Boyce-Codd normal form (and
  therefore of the third normal form).
• A relation is in 4NF if it has no non-trivial multi-valued
  dependencies other than those whose determinant is a key.
Remember, these normalization guidelines are cumulative. For example,
for a database to be in 2NF, it must first fulfill all the criteria of
a 1NF database.
Other examples: Unnormalized table, 1NF, 2NF, 3NF
