Unit 8 (Normalization)
Logical Schema
A logical schema of a database represents the structure of the data at a logical level. It defines how the data is
organized and how relationships between different data entities are established. Unlike the physical schema,
which deals with the storage of data, the logical schema focuses on what data is stored, its types, and the
relationships among the data.
Example:
In a university database, a logical schema might include:
• Student Table: (StudentID, FirstName, LastName, DateOfBirth)
• Course Table: (CourseID, CourseName, Credits)
• Enrollment Table: (StudentID, CourseID, EnrollmentDate)
This schema defines how data about students, courses, and enrollments is organized and related logically.
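As a rough sketch, the three tables above could be declared with Python's built-in sqlite3 module. The column types and the choice of keys are assumptions made for illustration; the notes only give the column names.

```python
import sqlite3

# In-memory database; column types and key choices are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    StudentID   INTEGER PRIMARY KEY,
    FirstName   TEXT,
    LastName    TEXT,
    DateOfBirth TEXT
);
CREATE TABLE Course (
    CourseID   INTEGER PRIMARY KEY,
    CourseName TEXT,
    Credits    INTEGER
);
CREATE TABLE Enrollment (
    StudentID      INTEGER REFERENCES Student(StudentID),
    CourseID       INTEGER REFERENCES Course(CourseID),
    EnrollmentDate TEXT,
    PRIMARY KEY (StudentID, CourseID)
);
""")

# List the tables that now make up the schema.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Note that nothing here says how the rows are stored on disk; that is the concern of the physical schema.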
Relational Schema
A relational schema is a blueprint of a database that outlines the structure of tables (relations) and how they
are related. It is part of the relational database model and defines the database in terms of tables, columns,
and the relationships between them.
A relational instance is the actual data (records) contained in a relation (table) at a specific point in time.
It is a collection of tuples (rows) that adhere to the schema defined for that relation. Each tuple corresponds
to a single record; for example, in a Student table, each row holds one student's details. The instance is a
snapshot: it changes over time as records are added, modified, or deleted, and each instance reflects the
state of the data at a given moment.
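The difference between a schema and an instance can be seen in a small sketch. The table layout and the sample names (Alice, Bob) are made up for illustration: the schema is declared once, while the instance is whatever rows the table holds at the moment we look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: declared once.
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, FirstName TEXT)")

def instance(connection):
    """Return the current relational instance: the tuples now in Student."""
    return connection.execute(
        "SELECT StudentID, FirstName FROM Student ORDER BY StudentID").fetchall()

before = instance(conn)                      # empty instance
conn.execute("INSERT INTO Student VALUES (1, 'Alice')")
conn.execute("INSERT INTO Student VALUES (2, 'Bob')")
after_insert = instance(conn)                # instance with two tuples
conn.execute("DELETE FROM Student WHERE StudentID = 1")
after_delete = instance(conn)                # instance with one tuple

print(before, after_insert, after_delete)
```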
Keys in a Schema
Candidate Key:
• A set of attributes that can uniquely identify a row in a table. A table can have multiple candidate keys,
and each one is a potential choice to become the primary key.
Primary Key:
• A candidate key that is chosen to uniquely identify each record in the table. It must contain unique
values and cannot contain NULL values.
Alternate Key:
• Any candidate key that is not chosen as the primary key is called an alternate key. It also uniquely
identifies records but is not used as the primary key.
Foreign Key:
• An attribute (or set of attributes) in one table that refers to the primary key of another table. It is used
to establish relationships between tables.
Composite Key:
• A key that consists of two or more attributes that together uniquely identify a row. For example, in an
Enrollment table, StudentID and CourseID together might form a composite key.
Compound Key:
• Often used interchangeably with composite key. Some texts reserve the term for a composite key in
which each attribute is itself a foreign key.
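The key types above can be tried out in a minimal sqlite3 sketch. The Email column is a hypothetical addition used only to show an alternate key, and the sample values are made up. Each insert that breaks a key rule is expected to be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,      -- primary key
    Email     TEXT NOT NULL UNIQUE      -- alternate key (hypothetical column)
);
CREATE TABLE Course (
    CourseID INTEGER PRIMARY KEY
);
CREATE TABLE Enrollment (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),  -- foreign key
    CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),    -- foreign key
    PRIMARY KEY (StudentID, CourseID)                          -- composite key
);
""")
conn.execute("INSERT INTO Student VALUES (1, 'alice@example.edu')")
conn.execute("INSERT INTO Course VALUES (101)")
conn.execute("INSERT INTO Enrollment VALUES (1, 101)")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO Student VALUES (1, 'bob@example.edu')")
dup_alternate = rejected("INSERT INTO Student VALUES (2, 'alice@example.edu')")
dup_composite = rejected("INSERT INTO Enrollment VALUES (1, 101)")
bad_foreign = rejected("INSERT INTO Enrollment VALUES (1, 999)")
print(dup_primary, dup_alternate, dup_composite, bad_foreign)
```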
Mapping an ER Diagram to a Logical Schema
1. Entities to Tables:
• Each entity in the ER diagram becomes a table in the logical schema.
• The attributes of the entity become the columns of the table.
2. Attributes to Columns:
• Simple attributes (e.g., FirstName, LastName) are directly mapped to columns in the corresponding
table.
• Composite attributes (e.g., FullAddress) are split into their constituent attributes (e.g., Street, City,
ZipCode).
• Multivalued attributes (e.g., PhoneNumbers) are transformed into separate tables to maintain
normalization. A new table is created with a foreign key referencing the original entity.
3. Primary Keys:
• Identify the primary key for each table. This is typically the attribute that uniquely identifies each
record in the entity.
• If an entity has a composite key, it should be reflected in the logical schema as a primary key consisting
of multiple columns.
Note –
For weak entities, create a separate schema including…
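The mapping rules for simple, composite, and multivalued attributes can be sketched as a small Python function. The function name map_entity, its parameters, and the naming of the extra table are all hypothetical choices made for this illustration.

```python
def map_entity(name, key, simple, composite=None, multivalued=None):
    """Map one ER entity to a dict of {table_name: [column names]}."""
    composite = composite or {}
    multivalued = multivalued or []

    # Key attributes and simple attributes map directly to columns.
    columns = list(key) + list(simple)

    # A composite attribute is replaced by its constituent attributes.
    for parts in composite.values():
        columns.extend(parts)

    tables = {name: columns}

    # Each multivalued attribute gets its own table, with the entity's key
    # carried over as a foreign key.
    for attribute in multivalued:
        tables[f"{name}_{attribute}"] = list(key) + [attribute]
    return tables

schema = map_entity(
    name="Student",
    key=["StudentID"],
    simple=["FirstName", "LastName"],
    composite={"FullAddress": ["Street", "City", "ZipCode"]},
    multivalued=["PhoneNumber"],
)
for table, cols in schema.items():
    print(table, cols)
```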
An anomaly, in the context of databases, is an inconsistency or unexpected behavior that occurs when
inserting, updating, or deleting data in a poorly designed database table. Anomalies can lead to data
redundancy, inconsistency, and integrity issues. They typically arise when a database is not properly
normalized, meaning that facts about different things are stored together in the same table.
1. Insertion Anomaly:
An insertion anomaly occurs when certain attributes cannot be inserted into the database without the
presence of other attributes. This often happens when the table design does not allow for the
independent addition of data.
Example:
Consider a table that stores student information along with their course details:
If a new course (e.g., History) needs to be added but there are no students enrolled yet, you cannot insert
the course information without also adding a student. This leads to a situation where courses can't be
recorded unless they are linked to students, resulting in a loss of information.
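A minimal sketch of the insertion anomaly, assuming the combined table has the columns StudentID, StudentName, and CourseName (the original table layout is not shown here, so these columns are an assumption). Because the student columns are required, the course cannot be stored on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table mixing student facts and course facts (assumed layout).
conn.execute("""
CREATE TABLE StudentCourse (
    StudentID   INTEGER NOT NULL,
    StudentName TEXT    NOT NULL,
    CourseName  TEXT    NOT NULL,
    PRIMARY KEY (StudentID, CourseName)
)""")

try:
    # Try to record the History course before any student has enrolled.
    conn.execute(
        "INSERT INTO StudentCourse (StudentID, StudentName, CourseName) "
        "VALUES (NULL, NULL, 'History')")
    inserted = True
except sqlite3.IntegrityError:
    # The student columns are part of the key and cannot be left empty.
    inserted = False

print(inserted)
```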
2. Update Anomaly
An update anomaly occurs when changing a single piece of information requires multiple rows to be
updated, leading to potential inconsistencies if one of the updates fails or is overlooked.
Example:
Using the previous student-course table, if a course name changes (e.g., changing "Math" to
"Mathematics"), every row containing "Math" must be updated. If one row is missed or not updated
correctly, the table becomes inconsistent, with both names in use for the same course.
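The update anomaly can be reproduced with a few rows held in a Python list. The rows are made-up sample data, and the "careless" update deliberately stops after the first match to imitate a missed row.

```python
# Made-up rows of the combined student-course table.
rows = [
    {"StudentID": 1, "StudentName": "Alice", "CourseName": "Math"},
    {"StudentID": 2, "StudentName": "Bob", "CourseName": "Math"},
    {"StudentID": 2, "StudentName": "Bob", "CourseName": "Science"},
]

# A careless rename that updates only the first matching row.
for row in rows:
    if row["CourseName"] == "Math":
        row["CourseName"] = "Mathematics"
        break

# The same course now appears under two different names.
course_names = sorted({row["CourseName"] for row in rows})
print(course_names)
```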
3. Deletion Anomaly
A deletion anomaly occurs when deleting data inadvertently removes additional valuable information
that should be retained. This often happens when related data is stored together in a single table.
Example:
Using the same table, suppose a student (e.g., Bob) withdraws from all courses and is deleted from the
table. If Bob was the only student enrolled in "Science", the information about that course is deleted along
with him, and the course can no longer be managed or tracked independently.
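A short sketch of the deletion anomaly, using made-up rows in which Bob is the only student taking Science. Removing Bob's rows also removes the only record that the course exists.

```python
# Made-up rows of the combined table: (StudentName, CourseName).
rows = [
    ("Alice", "Math"),
    ("Bob", "Science"),
]

courses_before = {course for _, course in rows}

# Bob withdraws, so every row that mentions him is deleted.
rows = [(student, course) for student, course in rows if student != "Bob"]

courses_after = {course for _, course in rows}
lost_courses = courses_before - courses_after
print(lost_courses)
```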
These anomalies highlight the importance of proper database design, particularly normalization. By organizing
data into separate tables and establishing clear relationships, one can minimize insertion, update, and
deletion anomalies, thus ensuring data integrity and reducing redundancy. Properly normalized tables allow
for independent data management, making it easier to insert, update, and delete records without unintended
consequences.
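For comparison, here is a sketch of a normalized layout in which the same three operations cause no trouble. The table names, IDs, and sample values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, StudentName TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, CourseName  TEXT NOT NULL);
CREATE TABLE Enrollment (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Bob');
INSERT INTO Course VALUES (10, 'Math');
INSERT INTO Course VALUES (11, 'Science');
INSERT INTO Enrollment VALUES (1, 11);
""")

# Insertion: a course with no students can be recorded on its own.
conn.execute("INSERT INTO Course VALUES (12, 'History')")

# Update: renaming a course changes exactly one row.
renamed = conn.execute(
    "UPDATE Course SET CourseName = 'Mathematics' WHERE CourseID = 10").rowcount

# Deletion: removing Bob and his enrollments leaves the courses intact.
conn.execute("DELETE FROM Enrollment WHERE StudentID = 1")
conn.execute("DELETE FROM Student WHERE StudentID = 1")

remaining = [row[0] for row in conn.execute(
    "SELECT CourseName FROM Course ORDER BY CourseID")]
print(renamed, remaining)
```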
Functional dependencies are relationships between attributes in a relational database that describe how one
set of attributes determines the value of another set of attributes. Understanding these dependencies is
crucial for database design and normalization, ensuring data consistency and reducing redundancy.
1. Transitive Dependency:
• This happens when one attribute determines another indirectly through a third attribute.
• Example: If you know the StudentID (PK), you can find the student's AdvisorID, and with the AdvisorID,
you can find the AdvisorName. AdvisorName therefore depends on StudentID transitively, through AdvisorID.
2. Partial Dependency:
• In a table with a composite key (where two or more columns together form the key), a partial
dependency happens when only part of the key is enough to determine a non-key attribute.
• Example: In a table where (OrderID, ProductID) is the key, the ProductName might depend only on
ProductID, not on the full key. This is a partial dependency.
3. Full Dependency:
• Full dependency happens when the entire primary key is needed to determine a non-key attribute.
• Example: In the same table with the key (OrderID, ProductID), if you need both OrderID and ProductID
to know the Quantity ordered, then Quantity is fully dependent on the key.
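One way to get a feel for these dependencies is to test them against sample rows. The helper below (a hypothetical function named holds) checks whether a dependency X -> Y is respected by a given set of rows; the rows themselves are made up. Keep in mind that sample data can only disprove a dependency. Whether it holds in general is a design decision about the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is respected by every row."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The same left-hand values must always map to the same right-hand values.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Made-up rows of an order table keyed by (OrderID, ProductID).
order_items = [
    {"OrderID": 1, "ProductID": "P1", "ProductName": "Pen", "Quantity": 2},
    {"OrderID": 1, "ProductID": "P2", "ProductName": "Pencil", "Quantity": 5},
    {"OrderID": 2, "ProductID": "P1", "ProductName": "Pen", "Quantity": 7},
]

# Partial dependency: part of the key is enough to determine ProductName.
partial = holds(order_items, ["ProductID"], ["ProductName"])

# Full dependency: Quantity needs the whole key ...
full = holds(order_items, ["OrderID", "ProductID"], ["Quantity"])
# ... because neither part of the key determines it alone.
by_product = holds(order_items, ["ProductID"], ["Quantity"])
by_order = holds(order_items, ["OrderID"], ["Quantity"])

print(partial, full, by_product, by_order)
```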
Normalization
Normalization is a process used to organize data in a database to reduce redundancy (duplicate data) and
ensure data integrity. It involves breaking down tables into smaller, related tables, so the database is more
efficient and consistent.
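Unnormalized tables typically show two problems:
1. Multiple Values in One Cell:
• Instead of storing individual data points in their own rows, you might see multiple values stuffed into
one cell.
• Example: a single "Products" cell that lists several items for one order.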
2. Repeating Columns:
• Similar types of data might be stored in separate columns. Instead of organizing data efficiently, each
related piece of data gets its own column.
• Example: columns such as Product1, Product2, and Product3 in an orders table.
First Normal Form (1NF) is the first level of normalization applied to a database to organize data. It ensures
that the data is structured correctly by following a few key rules.
1. Atomic Values:
• In 1NF, each cell in a table should contain only a single, indivisible value. This is called having atomic
(or simple) values. You should not have multiple values or lists in any cell.
In this example, the "Products" column contains multiple items in the same cell, which violates 1NF.
2. No Repeating Groups:
• In 1NF, repeating groups of columns should be avoided. You shouldn’t have multiple columns for
similar data.
3. Unique Column Names:
• Each column in a table must have a unique name to avoid confusion. This helps when referencing
columns in queries. In 1NF, every column is identified uniquely.
4. Unique Rows:
• In 1NF, each row (or record) should be unique. There should be no duplicate rows in the table. You can
achieve this by ensuring each table has a primary key — a column (or set of columns) that uniquely
identifies each row.
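A minimal sketch of bringing a table into 1NF, assuming an orders table whose "Products" cell holds a comma-separated list (the rows are made up). Each value in the list becomes its own row, so every cell ends up atomic and (OrderID, Product) can serve as the key.

```python
# Made-up unnormalized rows: the Products cell holds several values.
orders = [
    {"OrderID": 1, "Products": "Pen, Pencil"},
    {"OrderID": 2, "Products": "Notebook"},
]

def to_1nf(rows):
    """Split each multi-valued Products cell into one row per product."""
    flat = []
    for row in rows:
        for product in row["Products"].split(","):
            flat.append((row["OrderID"], product.strip()))
    return flat

atomic_rows = to_1nf(orders)
for order_id, product in atomic_rows:
    print(order_id, product)
```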
Second Normal Form (2NF) is the next step in database normalization after the First Normal Form (1NF). In
2NF, we focus on eliminating partial dependencies, which means every non-key attribute should be fully
dependent on the whole primary key, not just part of it.
• Conditions: the table must already be in 1NF, and it must contain no partial dependencies.
Example:
Let’s say you have a table with the following columns: OrderID, CustomerID, CustomerName, and OrderDate.
The primary key is a combination of OrderID and CustomerID.
Here, CustomerName depends only on CustomerID, not on OrderID. This is a partial dependency because
CustomerName only relies on part of the composite key (CustomerID), not the whole key (OrderID,
CustomerID).
1. Identify the Partial Dependencies: Find the attributes that depend only on part of the composite key.
2. Split the Table: Create separate tables to ensure that each attribute is fully dependent on the entire
primary key.
Partial Dependency:
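We need to split the table into two to eliminate the partial dependency of CustomerName on just CustomerID.
1. Customers Table (CustomerID is the primary key): (CustomerID, CustomerName)
2. Orders Table (OrderID and CustomerID together form the primary key): (OrderID, CustomerID, OrderDate)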
Now CustomerName is stored once per customer and depends only on CustomerID, while OrderDate depends on
the full key (OrderID, CustomerID). Both tables therefore follow the 2NF rules.
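Third Normal Form (3NF) is the next step after 2NF. In 3NF, we focus on eliminating transitive dependencies,
which means no non-key attribute should depend on another non-key attribute.
Example: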
Suppose you have the following table with the columns StudentID, StudentName, DepartmentID, and
DepartmentName.
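In this table, StudentID is the primary key. DepartmentName depends on DepartmentID, which in turn depends
on StudentID, so DepartmentName depends on the key only transitively.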
1. Identify the Transitive Dependencies: Look for non-key attributes that depend on other non-key
attributes.
2. Split the Table: Create separate tables to store the transitive dependencies.
Transitive Dependency:
To eliminate the transitive dependency, we can separate the DepartmentName into its own table.
1. Students Table: (StudentID, StudentName, DepartmentID)
2. Departments Table: (DepartmentID, DepartmentName)
Now each DepartmentName is stored only once in the Departments table, and department names are no longer
repeated in the Students table. This structure is now in 3NF.
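A short sketch of this decomposition, using made-up rows. It also joins the two new tables back together to check that no information was lost by the split.

```python
# Made-up rows of the original table, keyed by StudentID.
students = [
    {"StudentID": 1, "StudentName": "Alice", "DepartmentID": "D1", "DepartmentName": "Physics"},
    {"StudentID": 2, "StudentName": "Bob", "DepartmentID": "D1", "DepartmentName": "Physics"},
    {"StudentID": 3, "StudentName": "Carol", "DepartmentID": "D2", "DepartmentName": "History"},
]

# Students table: everything that depends directly on StudentID.
student_rows = sorted(
    {(r["StudentID"], r["StudentName"], r["DepartmentID"]) for r in students})

# Departments table: DepartmentName depends on DepartmentID, stored once each.
department_rows = sorted(
    {(r["DepartmentID"], r["DepartmentName"]) for r in students})

# Join the two tables back together on DepartmentID.
name_of = dict(department_rows)
rejoined = [
    {"StudentID": sid, "StudentName": name,
     "DepartmentID": dept, "DepartmentName": name_of[dept]}
    for sid, name, dept in student_rows
]

print(department_rows)
print(rejoined == students)
```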
QUESTION –