NORMALIZATION
Introduction to Normalization:
Why Normalize?
1. Reducing Redundancy: Minimizing duplicate data reduces disk space usage and ensures
that data modifications (updates, deletes, inserts) do not leave the data inconsistent.
2. Eliminating Inappropriate Dependencies: Organize data so that each attribute is stored
in the table whose key it logically depends on. This helps maintain the consistency and
integrity of the data.
3. Improving Data Integrity: Because each fact is stored only once, a normalized structure
lets primary-key and foreign-key constraints guard against contradictory or orphaned data.
4. Simplification of Data Modification: In a normalized database, the structure is such
that modifications to the data (inserts, updates, deletes) can be made in one place
without unexpected side effects rippling to other parts of the database.
Break down the key concepts using an everyday example for clarity:
1. Reduction of Redundancy:
"Imagine if every time a customer at a bookstore buys a book, the cashier writes down
the customer's entire information along with the book details on every receipt. If the
customer buys ten books, their information is recorded ten times. That's redundancy.
Normalization helps us store the customer's information just once and refer to it when
needed."
2. Logical Data Dependencies:
"In our bookstore, the book's price should solely depend on the book itself, not on the
customer buying it. If data dependencies are illogical, we might end up recording
incorrect prices under different customers. Normalization ensures that all
dependencies within our data are logical and make sense."
3. Data Integrity and Accuracy:
"Data accuracy is crucial. For example, if a publisher changes its name, we should be
able to update it in one place and have that change reflected throughout the database.
Normalization arranges data to maintain this high level of integrity and accuracy."
4. Simplified Database Design:
"Normalized databases are easier to manage and update. Since each data element is
stored in one place, operations like updates, inserts, and deletes become more
straightforward and less error-prone."
Demonstration:
Problem Statement:
In a restaurant, we have a database that initially records for each order: the Order ID,
Customer Name, Customer Address, Item Ordered, Quantity, and Price. This table is not
normalized, and as a result, data redundancy and update anomalies occur, making data
management inefficient and error-prone.
First Normal Form (1NF)
Definition: A table is in First Normal Form if every column is atomic, meaning each cell
holds a single, indivisible value (no lists or repeating groups), and each row of the
table is unique.
Problem Solved: Eliminates repeating groups and multi-valued cells, so the data fits a
tabular format in which every cell holds exactly one value.
Real-World Example:
Initial Table (Not in 1NF):
Order ID: 001, Customer Name: John Doe, Customer Address: 123 Elm St, Items Ordered: [Burger, Fries], Quantity: [2, 1], Price: [10, 5]
Table in 1NF (one row per item):
Order ID: 001, Customer Name: John Doe, Customer Address: 123 Elm St, Item Ordered: Burger, Quantity: 2, Price: 10
Order ID: 001, Customer Name: John Doe, Customer Address: 123 Elm St, Item Ordered: Fries, Quantity: 1, Price: 5
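The 0NF-to-1NF step above can be sketched in Python. This is a minimal illustration, not a prescribed method: the dictionary keys (order_id, items, quantities, prices, and so on) and the function name to_1nf are hypothetical, and the repeating groups are assumed to be parallel lists of equal length.

```python
def to_1nf(order):
    """Split an order whose item, quantity and price fields are parallel lists
    into one row per item (First Normal Form)."""
    items, quantities, prices = order["items"], order["quantities"], order["prices"]
    if not len(items) == len(quantities) == len(prices):
        raise ValueError("repeating groups must have the same length")
    # Single-valued fields are copied onto every output row.
    base = {k: v for k, v in order.items() if k not in ("items", "quantities", "prices")}
    # zip pairs the i-th item with the i-th quantity and price.
    return [
        {**base, "item": item, "quantity": qty, "price": price}
        for item, qty, price in zip(items, quantities, prices)
    ]


unnormalized = {
    "order_id": "001",
    "customer_name": "John Doe",
    "customer_address": "123 Elm St",
    "items": ["Burger", "Fries"],
    "quantities": [2, 1],
    "prices": [10, 5],
}

for row in to_1nf(unnormalized):
    print(row)
```

The positional link between an item and its quantity and price was only implicit in the multi-valued cells; the 1NF rows make it explicit.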
Benefits:
Each item is listed separately with its associated quantity and price, eliminating groupings
within fields, so each piece of data can be queried and updated on its own. The customer
details are still repeated on every row, however; the next normal forms remove that
redundancy.
Second Normal Form (2NF)
Definition: A table is in Second Normal Form if it is in First Normal Form and every non-key
attribute is fully dependent on the whole primary key, not just on part of a composite key.
Problem Solved: Reduces data redundancy and removes partial dependency; all non-key
attributes are now fully dependent on the primary key.
Real-World Example:
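In the 1NF table the key is (Order ID, Item Ordered), yet Customer Name and Customer Address
depend on Order ID alone, so they are repeated for every item in the same order. Splitting
the table into Orders (Order ID, Customer Name, Customer Address) and Order Details
(Order ID, Item Ordered, Quantity, Price) removes this partial dependency.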
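The 2NF split can be sketched as two relational projections of the 1NF rows. In this minimal Python illustration the helper project and the snake_case column names are hypothetical; duplicate rows are dropped because a relation holds each row only once.

```python
def project(rows, columns):
    """Relational projection: keep only `columns` and drop duplicate rows,
    preserving first-seen order."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


rows_1nf = [
    {"order_id": "001", "customer_name": "John Doe", "customer_address": "123 Elm St",
     "item": "Burger", "quantity": 2, "price": 10},
    {"order_id": "001", "customer_name": "John Doe", "customer_address": "123 Elm St",
     "item": "Fries", "quantity": 1, "price": 5},
]

# Customer attributes depend only on order_id, so they move to an Orders table ...
orders = project(rows_1nf, ["order_id", "customer_name", "customer_address"])
# ... while item-level attributes stay keyed by (order_id, item).
order_details = project(rows_1nf, ["order_id", "item", "quantity", "price"])

print(orders)         # one row: the customer is stored once for this order
print(order_details)  # two rows: one per item
```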
Benefits:
Customer information is separated from the order details, significantly reducing data
duplication. A customer's address is now stored once per order instead of once per item,
and the order details can be maintained independently of the customer data.
Third Normal Form (3NF)
Definition: A table is in Third Normal Form if it is in Second Normal Form and no non-key
column is transitively dependent on the primary key; that is, no non-key column depends
on another non-key column.
Problem Solved: Eliminates transitive dependency to improve data integrity and ensure that
non-key attributes are directly dependent only on the primary key.
Real-World Example:
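If the 'Price' of an item depends on 'Item Ordered' rather than on the order itself, price is a
fact about the item and is repeated on every order line for that item. Strictly, with the key
(Order ID, Item Ordered) this is a dependency on part of the key; it becomes a transitive
dependency (order line → Item Ordered → Price) when each order line has its own identifier.
Either way, Price belongs in an Item Details Table keyed by Item Ordered.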
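A minimal sketch using Python's built-in sqlite3 module, assuming hypothetical table names order_details, item_details and order_lines and a hypothetical second order '002' for a Burger. It checks in the sample data that the item determines the price, then moves the price into its own table. A check like this can only reveal a violation in the current rows; whether Item → Price really holds is a business rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (
        order_id TEXT, item TEXT, quantity INTEGER, price INTEGER,
        PRIMARY KEY (order_id, item)
    );
    INSERT INTO order_details VALUES
        ('001', 'Burger', 2, 10),
        ('001', 'Fries',  1, 5),
        ('002', 'Burger', 1, 10);  -- hypothetical second order
""")

# item -> price holds in the sample if no item appears with two distinct prices.
violations = conn.execute("""
    SELECT item FROM order_details
    GROUP BY item HAVING COUNT(DISTINCT price) > 1
""").fetchall()
print(violations)  # [] : every item has a single price here

# Move the price into a table keyed by item, and drop it from the order lines.
conn.executescript("""
    CREATE TABLE item_details (item TEXT PRIMARY KEY, price INTEGER NOT NULL);
    INSERT INTO item_details SELECT DISTINCT item, price FROM order_details;
    CREATE TABLE order_lines (
        order_id TEXT, item TEXT REFERENCES item_details(item), quantity INTEGER,
        PRIMARY KEY (order_id, item)
    );
    INSERT INTO order_lines SELECT order_id, item, quantity FROM order_details;
""")

# A price change is now a single-row update.
conn.execute("UPDATE item_details SET price = 11 WHERE item = 'Burger'")
print(conn.execute("SELECT * FROM item_details ORDER BY item").fetchall())
```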
Benefits:
Prices are linked directly to items, not orders. If a price change occurs, it only needs to be
updated in the Item Details Table. This organization prevents errors and ensures robust data
consistency across the database.
By applying these normal forms, the restaurant's database prevents inconsistencies, reduces
redundancy, makes customer addresses easy to update, and ensures data integrity when prices
change. Each step in normalization keeps the database efficient, consistent, and easy to
maintain.
Interactive Session:
Ask participants to identify issues in a sample non-normalized table and to suggest how to
normalize it. This encourages engagement and helps solidify understanding.
This structured approach conveys both why normalization matters and how it is carried out,
which suits a demo interview class.
Table: Orders
Problems in 0NF:
The table contains multiple values in a single cell (repeating groups), for instance,
'Burger (2, $10), Fries (1, $5)' under the OrderDetails column.
For each item in an order, multiple pieces of data (item name, quantity, price) are
stored within a single cell.
Data is not atomic, so querying this information or generating reports is complex and
inefficient: SQL operations assume one value per cell, and each cell would have to be
parsed as a string first.
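To see why multi-valued cells are awkward, here is a minimal Python sketch that parses one OrderDetails cell back into atomic values. The pattern assumes every entry looks like 'Item (quantity, $price)'; the regular expression and the function name split_order_details are illustrative assumptions.

```python
import re
from decimal import Decimal

# Assumed cell format: "Item (quantity, $price)" entries separated by commas.
ENTRY = re.compile(r"\s*([^,(]+?)\s*\(\s*(\d+)\s*,\s*\$(\d+(?:\.\d+)?)\s*\)")


def split_order_details(cell):
    """Parse a 0NF OrderDetails cell into (item, quantity, price) tuples."""
    return [(name, int(qty), Decimal(price)) for name, qty, price in ENTRY.findall(cell)]


for item, quantity, price in split_order_details("Burger (2, $10), Fries (1, $5)"):
    print(item, quantity, price)
```

An item name containing a comma or a parenthesis would break this parser, which is the kind of fragility 1NF removes by storing item, quantity, and price in separate columns.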
Normalized Table:
In moving from 0NF to 1NF, the primary task is breaking grouped information down into
individual records. This lays the groundwork for the subsequent normal forms (2NF and 3NF),
where relationships within the data are further refined.
The table provided is in First Normal Form (1NF) because each record is unique and all
entries are atomic. However, it still suffers from some redundancy and the potential for
update anomalies due to partial dependencies.
Redundancy: CustomerName is repeated with each item in an order. This not only
increases storage but also creates the potential for update anomalies. For instance,
if 'John Doe' changes his name, every row for his order must be updated.
Partial Dependency: CustomerName depends only on OrderID, not on Item. Because the
primary key is the composite (OrderID, Item), this is a classic partial dependency.
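The partial dependency can also be detected mechanically. This minimal Python sketch tests, for each proper subset of a composite key, whether that subset alone determines a non-key attribute. The helper names fd_holds and partial_dependencies and the sample row for order '002' are hypothetical, and sample data can only refute a dependency, never prove it.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on every lhs column also agree on every rhs column."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """Return (key subset, attribute) pairs where a proper subset of a composite
    key already determines a non-key attribute, i.e. candidate 2NF violations."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


rows_1nf = [
    {"OrderID": "001", "CustomerName": "John Doe", "Item": "Burger", "Quantity": 2},
    {"OrderID": "001", "CustomerName": "John Doe", "Item": "Fries", "Quantity": 1},
    {"OrderID": "002", "CustomerName": "Sarah Lee", "Item": "Burger", "Quantity": 3},  # hypothetical row
]

print(partial_dependencies(rows_1nf, ["OrderID", "Item"], ["CustomerName", "Quantity"]))
```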
Definition: A table is in 2NF if it is in 1NF and no non-key attribute depends on a proper
subset of any candidate key.
Solution:
1. Orders Table: This table contains the unique OrderID and the associated customer
information.
o Columns: OrderID (PK), CustomerName
OrderID | CustomerName
001     | John Doe
002     | Sarah Lee
2. Order Details Table: This table lists each item in the orders along with its quantity
and price.
o Columns: OrderID (FK), Item, Quantity, UnitPrice
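By decomposing the original table into these two tables, CustomerName now depends on the
whole primary key of its table (OrderID), which removes the partial dependency and improves
the maintainability of the database. One caveat: if UnitPrice is fixed per item rather than
per order line, it depends only on Item, part of the (OrderID, Item) key, and belongs in a
separate item table to fully satisfy 2NF.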
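A minimal SQLite sketch of the two tables above, using Python's built-in sqlite3 module. The snake_case names and the TEXT/INTEGER column types are assumptions, and the rename to 'John A. Doe' is a hypothetical update. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE orders (
        order_id      TEXT PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE order_details (
        order_id   TEXT NOT NULL REFERENCES orders(order_id),
        item       TEXT NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price INTEGER NOT NULL,
        PRIMARY KEY (order_id, item)
    );
    INSERT INTO orders VALUES ('001', 'John Doe'), ('002', 'Sarah Lee');
    INSERT INTO order_details VALUES ('001', 'Burger', 2, 10), ('001', 'Fries', 1, 5);
""")

# The customer's name lives in exactly one row ...
conn.execute("UPDATE orders SET customer_name = 'John A. Doe' WHERE order_id = '001'")

# ... and a join still shows it next to every item in the order.
rows = conn.execute("""
    SELECT d.order_id, o.customer_name, d.item, d.quantity, d.unit_price
    FROM order_details AS d JOIN orders AS o ON o.order_id = d.order_id
    ORDER BY d.item
""").fetchall()
print(rows)

# The foreign key rejects order lines for an order that does not exist.
try:
    conn.execute("INSERT INTO order_details VALUES ('999', 'Soda', 1, 2)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```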
Once a database table has achieved First Normal Form (1NF), where the table has no
repeating groups and only atomic values, it may still face issues with data redundancy and
potential anomalies due to partial dependencies. Partial dependency occurs when attribute
values in a table depend only on part of a composite primary key.
Redundancy: Notice that 'John Doe' is repeated for multiple courses, along with the
associated CourseName and Instructor.
Partial Dependency: Attributes 'CourseName' and 'Instructor' depend only on the
CourseID, not on the StudentID. Hence, this partial dependency needs addressing.
Update Anomaly: If the instructor for 'Math' changes from 'Mr. Smith', every row for
that course must be updated, which is error-prone.
Insertion Anomaly: You cannot add a new course without assigning a student, because
StudentID is part of the primary key (StudentID, CourseID).
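The anomalies above can be reproduced in a small SQLite sketch. The flat table enrollment_flat is a hypothetical reconstruction (the rows described by the Students, Courses and Enrollment tables below, combined into one table), and 'Ms. Patel', 'C04', 'Art' and 'Mr. Lee' are invented values for illustration. The key columns are declared NOT NULL because SQLite otherwise allows NULLs in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat table: student and course facts in one table,
    -- keyed by the composite (student_id, course_id).
    CREATE TABLE enrollment_flat (
        student_id   TEXT NOT NULL,
        student_name TEXT,
        course_id    TEXT NOT NULL,
        course_name  TEXT,
        instructor   TEXT,
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO enrollment_flat VALUES
        ('001', 'John Doe',  'C01', 'Math',    'Mr. Smith'),
        ('001', 'John Doe',  'C02', 'History', 'Ms. Jane'),
        ('002', 'Sarah Lee', 'C01', 'Math',    'Mr. Smith'),
        ('003', 'Mike Chen', 'C03', 'Science', 'Dr. Brown');
""")

# Update anomaly: changing Math's instructor in only one of its rows
# leaves the table contradicting itself.
conn.execute(
    "UPDATE enrollment_flat SET instructor = 'Ms. Patel' "
    "WHERE student_id = '001' AND course_id = 'C01'"
)
conflicts = conn.execute("""
    SELECT course_id, COUNT(DISTINCT instructor) FROM enrollment_flat
    GROUP BY course_id HAVING COUNT(DISTINCT instructor) > 1
""").fetchall()
print(conflicts)  # [('C01', 2)]

# Insertion anomaly: a course with no students cannot be stored,
# because student_id is part of the primary key and may not be NULL.
try:
    conn.execute(
        "INSERT INTO enrollment_flat (course_id, course_name, instructor) "
        "VALUES ('C04', 'Art', 'Mr. Lee')"
    )
except sqlite3.IntegrityError as exc:
    print("insert rejected:", exc)
```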
1. Students Table
o Contains student-specific information.
o Columns: StudentID (PK), StudentName
StudentID | StudentName
001       | John Doe
002       | Sarah Lee
003       | Mike Chen
2. Courses Table
o Contains course-specific information.
o Columns: CourseID (PK), CourseName, Instructor
CourseID | CourseName | Instructor
C01      | Math       | Mr. Smith
C02      | History    | Ms. Jane
C03      | Science    | Dr. Brown
3. Enrollment Table
o Links students and courses, recording which students are enrolled in which courses.
o Columns: StudentID (FK), CourseID (FK); together they form the primary key.
StudentID | CourseID
001       | C01
001       | C02
002       | C01
003       | C03
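A minimal SQLite sketch of the three tables, again via Python's sqlite3 module with hypothetical snake_case names. It shows that a course can be added before anyone enrolls, that an instructor change touches one row, and that joining the tables recovers the combined student and course view. The new course 'C04'/'Art'/'Mr. Lee' and the instructor 'Ms. Patel' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE students (student_id TEXT PRIMARY KEY, student_name TEXT NOT NULL);
    CREATE TABLE courses  (course_id TEXT PRIMARY KEY, course_name TEXT NOT NULL,
                           instructor TEXT NOT NULL);
    CREATE TABLE enrollment (
        student_id TEXT NOT NULL REFERENCES students(student_id),
        course_id  TEXT NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES ('001', 'John Doe'), ('002', 'Sarah Lee'), ('003', 'Mike Chen');
    INSERT INTO courses VALUES ('C01', 'Math', 'Mr. Smith'), ('C02', 'History', 'Ms. Jane'),
                               ('C03', 'Science', 'Dr. Brown');
    INSERT INTO enrollment VALUES ('001', 'C01'), ('001', 'C02'), ('002', 'C01'), ('003', 'C03');
""")

# A course can now exist before anyone enrolls (no insertion anomaly) ...
conn.execute("INSERT INTO courses VALUES ('C04', 'Art', 'Mr. Lee')")
# ... and changing an instructor touches exactly one row (no update anomaly).
conn.execute("UPDATE courses SET instructor = 'Ms. Patel' WHERE course_id = 'C01'")

# Joining the three tables recovers the combined student/course view.
view = conn.execute("""
    SELECT s.student_name, c.course_name, c.instructor
    FROM enrollment AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY e.student_id, e.course_id
""").fetchall()
for row in view:
    print(row)
```

Art does not appear in the joined view because nobody is enrolled in it yet, but it is stored in the courses table.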
By restructuring the data into multiple tables in which every non-key attribute depends on
the whole key of its table, 2NF clarifies the relationships within the data, strengthens data
integrity, and removes the redundancy found in the 1NF table.