Normalization is a systematic approach to organizing data in a database to minimize redundancy and improve data integrity.
First Normal Form (1NF)
First Normal Form (1NF) is the most basic level of database normalization. A table is said to be in 1NF if it
satisfies the following conditions:
1. Atomic Values: Each column (attribute) in the table must contain only atomic (indivisible) values. This
means that each cell should hold a single value, not a list, set, or any other composite structure.
2. Unique Column Names: Each column in the table must have a unique name to avoid ambiguity.
3. Orderless Rows: The order of rows in the table does not matter. Each row must be uniquely identifiable,
typically through a primary key.
4. No Repeating Groups: There should be no repeating groups of columns (for example, Phone1, Phone2, Phone3).
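As a concrete illustration, here is a minimal SQL sketch of a table that satisfies all four conditions; the table and column names are assumptions for the bookstore example, not definitions taken from this section.

-- hypothetical 1NF-compliant table: every cell holds one atomic value,
-- column names are unique, and each row is identified by a primary key.
create table orders_1nf (
    order_id      int primary key,   -- uniquely identifies each row
    customer_name varchar(100),      -- a single name, never a list of names
    book_title    varchar(200),      -- one title per row; several books mean several rows
    quantity      int,
    order_date    date
);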
Second Normal Form (2NF)
A table is in 2NF if it satisfies the following conditions:
1. It Must Be in 1NF: The table must already satisfy the rules of 1NF (atomic values, unique column names, no repeating groups, etc.).
2. No Partial Dependencies: Every non-prime attribute (an attribute not part of the primary key) must be
fully functionally dependent on the entire primary key, not just a part of it. This means that if the primary
key is composite (made up of multiple columns), no non-prime attribute should depend on only one part
of the key.
1. Orders Table
Order_ID  Customer_Name  Book_Title  Quantity  Date
2. Books Table
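The 2NF split that these tables represent can be sketched in SQL. The following is a hedged reconstruction, assuming an order line is identified by the composite key (order_id, book_id); the table names and types are illustrative.

-- 2NF decomposition sketch: attributes that depend on only part of a
-- composite key are moved into their own tables.
create table orders_2nf (
    order_id      int primary key,
    customer_name varchar(100),
    order_date    date               -- depends on order_id alone
);

create table books_2nf (
    book_id    int primary key,
    book_title varchar(200)          -- depends on book_id alone
);

create table order_lines (
    order_id int references orders_2nf(order_id),
    book_id  int references books_2nf(book_id),
    quantity int,                    -- depends on the whole key (order_id, book_id)
    primary key (order_id, book_id)
);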
Third Normal Form (3NF)
A table is in 3NF if it satisfies the following conditions:
1. It Must Be in 2NF: The table must already satisfy the rules of 2NF (no partial dependencies).
2. No Transitive Dependencies: Every non-prime attribute (an attribute not part of the primary key) must
be directly dependent on the primary key, and not indirectly dependent through another non-prime
attribute. In other words, there should be no situation where a non-prime attribute depends on another
non-prime attribute (for example, a book's publisher name depending on the book's Publisher_ID rather than directly on the book's key).
1. Customers Table
Customer_ID  First_Name  Last_Name
1            John        Doe
2            Jane        Smith
2. Authors Table
Author_ID  First_Name  Last_Name  Other_Name
1          Alice       Bakers
2          Bob         Marks
3. Publishers Table
Publisher_ID  Publisher_Name
1             ABC Pub
2             XYZ Pub
4. Books Table
Book_ID  Book_Title        Author_ID  Publisher_ID  Price
2        Data Science      2          2             30
3        Machine Learning  1          1             70
5. Orders Table
Order_ID  Customer_ID  Book_ID  Quantity  Order_Date
001       1            1        2         2024-05-13
002       2            2        1         2024-05-14
003       1            3        1         2024-05-13
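The 3NF design above can also be written as SQL DDL. This is a minimal sketch; the data types and constraint style are assumptions, but the tables and keys mirror the example, and every non-key column depends only on the key of its own table. The sample queries below run against this schema.

create table customers (
    customer_id int primary key,
    first_name  varchar(50),
    last_name   varchar(50)
);

create table authors (
    author_id  int primary key,
    first_name varchar(50),
    last_name  varchar(50),
    other_name varchar(50)
);

create table publishers (
    publisher_id   int primary key,
    publisher_name varchar(100)
);

create table books (
    book_id      int primary key,
    book_title   varchar(200),
    author_id    int references authors(author_id),       -- no transitive dependency:
    publisher_id int references publishers(publisher_id), -- author and publisher details live elsewhere
    price        decimal(8,2)
);

create table orders (
    order_id    int primary key,
    customer_id int references customers(customer_id),
    book_id     int references books(book_id),
    quantity    int,
    order_date  date
);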
a. -- note: only the selected column and the where clause are original here;
   -- the from/join clauses are an assumed reconstruction.
   select a.author_id
   from orders o
   join books b on o.book_id = b.book_id
   join authors a on b.author_id = a.author_id
   where o.customer_id = (select customer_id from customers where first_name = 'John');
c. update books set price = price * 1.10
   where publisher_id = (select publisher_id from publishers where publisher_name = 'XYZ Pub');
   select * from books
   where publisher_id = (select publisher_id from publishers where publisher_name = 'XYZ Pub');
Importance of Normalization in Database Design
1. Functional Dependencies:
Functional dependencies describe the relationship between attributes in a database. For example, if
attribute A determines attribute B, knowing A allows us to uniquely identify B. This principle guides the
decomposition of tables, ensuring data is stored in one place, reducing redundancy. For instance, in a
bookstore database, a book's ISBN can serve as a primary key, linking all related data (title, author,
price) together.
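As a small sketch of this idea, the ISBN dependency can be enforced by making ISBN the primary key; the table below is assumed for illustration and is not part of the example schema.

-- isbn -> (title, author, price): knowing the ISBN uniquely determines the
-- rest, so each book's facts are stored exactly once.
create table book_catalog (
    isbn   char(13) primary key,
    title  varchar(200),
    author varchar(100),
    price  decimal(8,2)
);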
2. Balancing Normalization and Performance:
While normalization improves data integrity and reduces redundancy, it can increase query complexity and impact performance. Retrieving data from multiple tables often requires complex joins, which can slow down response times, especially in large databases. Database designers must balance normalization with performance considerations, ensuring the database remains efficient for its intended use.
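For instance, answering even a simple question against the normalized bookstore schema requires joining three tables; the query below is an illustrative sketch of that cost.

-- one logical question, three physical tables: each join adds planning and
-- execution work that a single denormalized table would avoid.
select c.first_name, b.book_title, o.quantity
from orders o
join customers c on o.customer_id = c.customer_id
join books b on o.book_id = b.book_id
where o.order_date = '2024-05-13';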
3. Boyce-Codd Normal Form (BCNF):
BCNF is a stricter version of the Third Normal Form (3NF) and addresses certain types of redundancy that can still exist in 3NF. It requires that every determinant in a relation be a candidate key. This eliminates update, insertion, and deletion anomalies. For example, if a book's price changes, storing the price in only one place prevents inconsistencies across the database.
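A standard textbook illustration of BCNF, separate from the bookstore example, is a relation whose determinant is not a candidate key; the sketch below assumes each instructor teaches exactly one course.

-- in teaching(student, course, instructor), instructor -> course holds,
-- but instructor is not a candidate key, so the relation is 3NF yet not BCNF.
-- the bcnf decomposition gives the determinant its own table:
create table instructor_course (
    instructor varchar(100) primary key,   -- the determinant becomes a key
    course     varchar(100)
);

create table enrollment (
    student    varchar(100),
    instructor varchar(100) references instructor_course(instructor),
    primary key (student, instructor)
);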
Concurrency Control in Databases
Concurrency control ensures data consistency and integrity when multiple transactions occur simultaneously. As databases are often accessed by multiple users at the same time, implementing mechanisms to prevent conflicts and maintain data accuracy is crucial.
1. Two-Phase Locking (2PL):
2PL ensures serializability in transactions by requiring that a transaction acquire all the locks it needs before it releases any of them. It operates in two phases:
- Growing Phase: The transaction acquires locks but may not release any.
- Shrinking Phase: The transaction releases locks and may not acquire new ones.
This structure prevents issues like lost updates and dirty reads, ensuring consistent transaction
execution.
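A rough SQL illustration of (strict) 2PL follows; it assumes PostgreSQL-style select ... for update row locks, with commit standing in for the shrinking phase.

-- growing phase: acquire a lock on every row the transaction will touch
begin;
select * from books where book_id = 2 for update;
select * from books where book_id = 3 for update;

-- work happens while all locks are held
update books set price = price + 5 where book_id = 2;
update books set price = price - 5 where book_id = 3;

-- shrinking phase: commit releases all locks at once (strict 2PL)
commit;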
2. Deadlock Prevention:
Deadlocks occur when two or more transactions wait indefinitely for resources held by each other. Timestamp-based techniques like wait-die and wound-wait mitigate this risk:
- Wait-Die: Older transactions wait for younger ones to release locks, while younger transactions are
aborted if they request locks held by older transactions.
- Wound-Wait: Older transactions preempt (wound) younger ones, forcing them to abort and release their locks, while younger transactions that request locks held by older ones simply wait.
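The classic deadlock these schemes prevent can be sketched as two sessions locking the same rows in opposite order; the session labels and values below are illustrative.

-- session 1:
begin;
update books set price = 40 where book_id = 2;  -- locks the row for book 2

-- session 2:
begin;
update books set price = 80 where book_id = 3;  -- locks the row for book 3

-- session 1, next statement:
update books set price = 75 where book_id = 3;  -- blocks, waiting for session 2

-- session 2, next statement:
update books set price = 45 where book_id = 2;  -- blocks on session 1: deadlock.
-- wait-die or wound-wait would abort one transaction by timestamp
-- instead of letting both wait forever.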
3. Snapshot Isolation:
Snapshot isolation allows transactions to read a consistent snapshot of the database at a specific point
in time. This enables multiple transactions to operate simultaneously without interfering with each other,
reducing the need for locks and improving performance while maintaining data integrity.
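In PostgreSQL, for example, the repeatable read level is implemented as snapshot isolation; the sketch below assumes that behavior.

begin transaction isolation level repeatable read;
select price from books where book_id = 2;  -- reads from the transaction's snapshot
-- another session may update and commit this row in the meantime ...
select price from books where book_id = 2;  -- still sees the snapshot value
commit;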
Conclusion
Normalization and concurrency control are fundamental to effective database design and management.
Normalization ensures data integrity and reduces redundancy, while concurrency control mechanisms
like 2PL, deadlock prevention, and snapshot isolation maintain consistency and performance in multi-
user environments. By carefully implementing these principles, database designers can create robust,
efficient, and reliable systems, leading to better data management and user satisfaction.
Emerging Trends in Database Management
1. Automated Normalization Tools:
Modern database management systems (DBMS) now include automated tools that recommend and apply normalization techniques, reducing the manual effort required by database designers.
2. Advanced Concurrency Control:
Techniques like multi-version concurrency control (MVCC) and optimistic concurrency control (OCC) are increasingly used to enhance performance in high-traffic databases.
3. AI-Assisted Query Optimization:
AI and machine learning are being integrated into DBMS to optimize query performance, especially in highly normalized databases with complex joins.
4. Distributed Databases:
With the rise of distributed databases, normalization and concurrency control techniques are being
adapted to handle data spread across multiple nodes, ensuring consistency and performance in
decentralized environments (Cockroach Labs, 2023).
5. Blockchain Integration:
Blockchain technology is being explored for concurrency control in distributed databases, providing
immutable transaction logs and enhancing data integrity (Nakamoto, 2008).
By staying updated with these advancements, database professionals can continue to design systems that
are both efficient and reliable, meeting the demands of modern data-driven applications.
References
- Berenson, H., Bernstein, P. A., Gray, J., Melton, J., O'Neil, E. J., & O'Neil, P. E. (1995). A Critique of ANSI SQL Isolation Levels. ACM SIGMOD Record, 24(2), 1-10.
- Bernstein, P. A., Hadzilacos, V., & Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison-Wesley.
- Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), 377-387.
- Elmasri, R., & Navathe, S. B. (2016). Fundamentals of Database Systems (7th ed.). Pearson.
- Silberschatz, A., Korth, H. F., & Sudarshan, S. (2011). Database System Concepts (6th ed.). McGraw-Hill.