UNIT4
A) Normalization is a process in database design that helps eliminate data redundancy and improve
data integrity by organizing data into multiple related tables. There are several normal forms (NF)
that define specific rules for structuring database tables. Here's an explanation of each normal form:
1. First Normal Form (1NF):
- 1NF requires that each column in a table contains only atomic values, meaning no repeating
groups or arrays within a column.
- Example: Consider a table named "Students" with columns "StudentID," "Name," and "Subjects"
where the "Subjects" column stores multiple subject names as a comma-separated list. To convert it
to 1NF, you would move the subjects into a separate table with one row per student and subject,
and a foreign key referencing the student.
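The 1NF conversion described above can be sketched in a few lines of Python. The sample rows, the student names, and the helper name `to_first_normal_form` are assumptions made for illustration only.

```python
# Hypothetical un-normalized rows: "Subjects" holds a comma-separated list.
students_raw = [
    {"StudentID": 1, "Name": "Asha", "Subjects": "Math, Physics"},
    {"StudentID": 2, "Name": "Ravi", "Subjects": "Chemistry"},
]


def to_first_normal_form(rows):
    """Split the multi-valued Subjects column into one row per subject."""
    students = [{"StudentID": r["StudentID"], "Name": r["Name"]} for r in rows]
    student_subjects = [
        {"StudentID": r["StudentID"], "Subject": s.strip()}
        for r in rows
        for s in r["Subjects"].split(",")
        if s.strip()
    ]
    return students, student_subjects


students, student_subjects = to_first_normal_form(students_raw)
for row in student_subjects:
    print(row)
```

Each row of `student_subjects` now holds a single atomic subject, and "StudentID" acts as the foreign key back to `students`.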
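2. Second Normal Form (2NF):
- 2NF builds on 1NF and requires that each non-key column in a table is functionally dependent on
the entire primary key, not on just part of a composite key.
- Example: Suppose you have a "Sales" table with columns "OrderID," "ProductID,"
"ProductName," and "ProductCategory," whose primary key is the combination of "OrderID" and
"ProductID." Because "ProductName" and "ProductCategory" depend only on "ProductID," which is just
part of the key, the table violates 2NF. To comply with 2NF, you would move "ProductName" and
"ProductCategory" to a separate "Products" table keyed by "ProductID," and keep "ProductID" in
"Sales" as a foreign key.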
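A minimal sketch of this 2NF split follows. It assumes the key of "Sales" is ("OrderID", "ProductID") and that both "ProductName" and "ProductCategory" depend on "ProductID" alone. The sample rows and the helper name `split_partial_dependency` are hypothetical.

```python
# Hypothetical rows: product details are repeated for every order line.
sales = [
    {"OrderID": 100, "ProductID": "P1", "ProductName": "Pen", "ProductCategory": "Stationery"},
    {"OrderID": 100, "ProductID": "P2", "ProductName": "Mug", "ProductCategory": "Kitchen"},
    {"OrderID": 101, "ProductID": "P1", "ProductName": "Pen", "ProductCategory": "Stationery"},
]


def split_partial_dependency(rows):
    """Move columns that depend only on ProductID into their own table."""
    order_items = [{"OrderID": r["OrderID"], "ProductID": r["ProductID"]} for r in rows]
    products = {}
    for r in rows:
        # One entry per ProductID; repeated product details collapse into it.
        products[r["ProductID"]] = {
            "ProductName": r["ProductName"],
            "ProductCategory": r["ProductCategory"],
        }
    return order_items, products


order_items, products = split_partial_dependency(sales)
print(len(sales), "sales rows ->", len(products), "product rows")
```

After the split, the category of product "P1" is stored once, no matter how many orders contain it.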
3. Third Normal Form (3NF):
- 3NF builds on 2NF and requires that no non-key column is transitively dependent on the primary
key, that is, dependent on the key only through another non-key column.
- Example: Consider a "Students" table with primary key "StudentID" and columns "CourseID,"
"CourseName," and "CourseTeacher." "CourseName" and "CourseTeacher" depend on "CourseID," which in
turn depends on "StudentID," so they depend on the primary key only transitively and the table
violates 3NF. To adhere to 3NF, you would create a separate "Courses" table with "CourseID,"
"CourseName," and "CourseTeacher" columns, and keep only "CourseID" in "Students" as a foreign key.
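The update anomaly that 3NF removes can be illustrated with a short sketch. It assumes "StudentID" is the key and that "CourseID" determines "CourseName" and "CourseTeacher". The sample data and the helper name `split_transitive_dependency` are hypothetical.

```python
# Hypothetical rows: course details are repeated for every enrolled student.
enrollments = [
    {"StudentID": 1, "CourseID": "C1", "CourseName": "Databases", "CourseTeacher": "Dr. Rao"},
    {"StudentID": 2, "CourseID": "C1", "CourseName": "Databases", "CourseTeacher": "Dr. Rao"},
    {"StudentID": 3, "CourseID": "C2", "CourseName": "Networks", "CourseTeacher": "Dr. Iyer"},
]


def split_transitive_dependency(rows):
    """Keep StudentID -> CourseID in one table and CourseID -> details in another."""
    students = [{"StudentID": r["StudentID"], "CourseID": r["CourseID"]} for r in rows]
    courses = {
        r["CourseID"]: {"CourseName": r["CourseName"], "CourseTeacher": r["CourseTeacher"]}
        for r in rows
    }
    return students, courses


students, courses = split_transitive_dependency(enrollments)

# Changing the teacher of C1 is now a single update instead of one per student.
courses["C1"]["CourseTeacher"] = "Dr. Sen"
teachers = {s["StudentID"]: courses[s["CourseID"]]["CourseTeacher"] for s in students}
print(teachers)
```

In the original single table, the same change would have to be applied to every row for course "C1", and missing one row would leave the data inconsistent.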
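4. Fourth Normal Form (4NF):
- 4NF builds on BCNF and deals with multivalued dependencies. It requires that for every
non-trivial multivalued dependency X ->> Y, X is a superkey, so that a table does not store two or
more independent multivalued facts about the same entity.
- Example: Suppose you have a "Books" table with columns "BookID," "AuthorID," and "Genre," where
a book can have several authors and several genres, and the two are independent of each other.
Every author must then be repeated for every genre of the book, which violates 4NF. To eliminate
the multivalued dependencies, you would split the table into "BookAuthors" with "BookID" and
"AuthorID" columns and "BookGenres" with "BookID" and "Genre" columns.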
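As a rough illustration of a 4NF split, the sketch below assumes a table of (BookID, AuthorID, Genre) rows in which authors and genres are independent facts about a book. The sample values are hypothetical.

```python
# Hypothetical rows: book 1 has two authors and two genres,
# so the single table must store every author/genre combination.
book_facts = {
    (1, "A1", "Fantasy"),
    (1, "A1", "Adventure"),
    (1, "A2", "Fantasy"),
    (1, "A2", "Adventure"),
}

# 4NF decomposition: one table per independent multivalued fact.
book_authors = {(book, author) for book, author, _ in book_facts}
book_genres = {(book, genre) for book, _, genre in book_facts}

# Joining the two tables on BookID rebuilds the original rows.
rejoined = {
    (book, author, genre)
    for book, author in book_authors
    for other_book, genre in book_genres
    if book == other_book
}

print(len(book_facts), "rows ->", len(book_authors), "+", len(book_genres), "rows")
print(rejoined == book_facts)
```

The two smaller tables hold the same information in fewer rows, and adding a third genre would need one new row instead of one per author.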
5. Fifth Normal Form (5NF):
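- 5NF deals with join dependencies and requires that every non-trivial join dependency in a table
is implied by its candidate keys. In other words, a table in 5NF cannot be split into smaller
tables and rejoined without loss unless each of those smaller tables contains a candidate key.
- Example: Consider an "Assignments" table with columns "StudentID," "CourseID," and "TeacherID,"
where all three columns together form the key. Suppose the rule is that whenever a student takes a
course, a teacher teaches that course, and the student is taught by that teacher, then the student
takes that course with that teacher. The table is then equal to the join of its three pairwise
projections, a join dependency that is not implied by the key, so it violates 5NF. To achieve 5NF,
you would replace it with three tables: "StudentCourses" ("StudentID," "CourseID"),
"CourseTeachers" ("CourseID," "TeacherID"), and "StudentTeachers" ("StudentID," "TeacherID").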
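6. Boyce-Codd Normal Form (BCNF):
- BCNF is a stricter version of 3NF. It requires that every determinant (the column or set of
columns on which another column is functionally dependent) is a candidate key, or more precisely
a superkey.
- BCNF does not eliminate functional dependencies themselves. It removes the redundancy caused by
non-trivial functional dependencies whose determinant is not a candidate key.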
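The BCNF rule can be checked mechanically by computing attribute closures. The sketch below is one possible way to do it. The relation, its dependencies, and the function names `closure` and `bcnf_violations` are assumptions for illustration, and the check tests whether each determinant is a superkey.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the non-trivial FDs whose determinant is not a superkey of relation."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) >= set(relation):
            continue  # determinant is a superkey
        violations.append((lhs, rhs))
    return violations


# Hypothetical relation: a student has one teacher per course,
# and each teacher teaches exactly one course.
relation = {"Student", "Course", "Teacher"}
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]
print(bcnf_violations(relation, fds))
```

Here "Teacher" determines "Course" but is not a key of the relation, so that dependency is reported as a BCNF violation even though the relation satisfies 3NF.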
Each normal form helps in improving database design by reducing redundancy, ensuring data
integrity, and avoiding anomalies. The normalization process involves applying these normal forms
progressively to ensure data is well-structured and efficiently organized.
2. Explain the different types of functional dependencies?
- A functional dependency X -> Y is considered trivial when Y is a subset of X, so that it holds
in every table.
- Example: In a table with attributes "EmployeeID" and "Name," the functional dependency
{EmployeeID, Name} -> {EmployeeID} is trivial because the dependent attribute "EmployeeID" is
already part of the determinant.
- A functional dependency is considered full when removing any attribute from the determinant
attribute(s) breaks the dependency.
- Example: In a table with attributes "EmployeeID," "DepartmentID," and "Salary," where a salary
is set per employee per department, the functional dependency {EmployeeID, DepartmentID} ->
{Salary} is full because removing either "EmployeeID" or "DepartmentID" would break the dependency.
- A functional dependency is considered partial if removing one or more attributes from the
determinant attribute(s) does not break the dependency.
- Example: In a table with attributes "EmployeeID," "DepartmentID," and "EmployeeName," the
functional dependency {EmployeeID, DepartmentID} -> {EmployeeName} is partial because "EmployeeID"
alone determines "EmployeeName," so removing "DepartmentID" does not break the dependency.
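The three types can be told apart with a small classifier built on attribute closure. This is a sketch under stated assumptions: the set of given dependencies and the function names `closure` and `classify` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def classify(lhs, rhs, fds):
    """Classify lhs -> rhs as trivial, full, partial, or not implied by fds."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"
    if not rhs <= closure(lhs, fds):
        return "does not hold"
    for attr in lhs:
        # If the dependency survives dropping one attribute, it is partial.
        if rhs <= closure(lhs - {attr}, fds):
            return "partial"
    return "full"


# Assumed dependencies: an employee has one name,
# and a salary is set per employee per department.
fds = [
    (("EmployeeID",), ("Name",)),
    (("EmployeeID", "DepartmentID"), ("Salary",)),
]

print(classify({"EmployeeID", "Name"}, {"EmployeeID"}, fds))
print(classify({"EmployeeID", "DepartmentID"}, {"Salary"}, fds))
print(classify({"EmployeeID", "DepartmentID"}, {"Name"}, fds))
```

Dropping one attribute at a time is enough here: if any smaller determinant works, so does at least one determinant that is missing only a single attribute.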
Understanding these types of functional dependencies is crucial for database designers to identify and
eliminate redundancy, perform normalization, and ensure data integrity in a relational database
schema.
3. Explain lossless join and dependency preserving decomposition?
A) Lossless Join:
Lossless join refers to a property of decomposition in database normalization. It ensures that when a
relation (table) is decomposed into multiple smaller relations, it is still possible to recombine these
smaller relations through a join operation and obtain the original relation without losing any
information. In other words, the natural join of the smaller relations yields exactly the tuples of the
original relation, with no missing tuples and no spurious ones.
For a decomposition of a relation R into two relations R1 and R2, the join is lossless when the following
conditions hold:
1. Attribute Coverage: The union of the attributes of R1 and R2 equals the attributes of R.
2. Common Attributes: R1 and R2 share at least one attribute.
3. Key Condition: The common attributes functionally determine all attributes of R1 or all attributes
of R2, that is, they form a superkey of at least one of the two relations.
Lossless join is essential to maintain the integrity and correctness of the data during the decomposition
process. It ensures that even though a relation is divided into smaller parts, we can still combine those
parts and obtain the original relation without any loss of information.
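A commonly used test for a two-way decomposition is that the attributes shared by the two parts must functionally determine all attributes of at least one part. The sketch below applies that test and then shows, on sample data, how a lossy split creates spurious tuples. The relation, its dependency, the sample rows, and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Two-way test: the shared attributes must determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    if not common:
        return False  # no shared attributes: the join degenerates to a cross product
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


# Hypothetical relation R(StudentID, CourseID, CourseName) with CourseID -> CourseName.
fds = [(("CourseID",), ("CourseName",))]
good = is_lossless_binary({"StudentID", "CourseID"}, {"CourseID", "CourseName"}, fds)
bad = is_lossless_binary({"StudentID", "CourseName"}, {"CourseID", "CourseName"}, fds)

# The lossy split on sample data: two courses happen to share a name.
original = {(1, "C1", "Databases"), (2, "C2", "Databases")}
student_names = {(s, n) for s, _, n in original}
course_names = {(c, n) for _, c, n in original}
rejoined = {(s, c, n) for s, n in student_names for c, m in course_names if n == m}

print(good, bad)
print(sorted(rejoined - original))  # spurious tuples created by the lossy join
```

Splitting on "CourseID" passes the test, while splitting on "CourseName" fails it, and the rejoined data then pairs each student with courses they never took.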
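Dependency Preserving Decomposition:
A decomposition is dependency preserving when every functional dependency of the original relation can
be checked within the smaller relations, without joining them back together.
To achieve a dependency preserving decomposition, the decomposition must satisfy the following
condition:
1. Preservation of Functional Dependencies: Every functional dependency that holds in the original
relation must either hold within a single smaller relation or be implied by the dependencies that hold
within the smaller relations.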
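Dependency preservation can be tested without computing the projected dependencies explicitly, by repeatedly growing the left-hand side of each dependency using only what each smaller relation can see. The following is a sketch of that standard textbook procedure. The example relations and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(parts, fds):
    """Check that every FD is enforceable using the decomposed relations alone."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                # Attributes derivable while staying inside this one relation.
                gained = closure(reached & part, fds) & part
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not set(rhs) <= reached:
            return False
    return True


# Hypothetical relation R(Student, Course, Teacher).
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]
split = [{"Teacher", "Course"}, {"Student", "Teacher"}]
print(preserves_dependencies(split, fds))

# A second hypothetical relation where the split keeps its only dependency.
course_fds = [(("CourseID",), ("CourseName",))]
course_split = [{"StudentID", "CourseID"}, {"CourseID", "CourseName"}]
print(preserves_dependencies(course_split, course_fds))
```

The first split is lossless, because "Teacher" determines "Course", yet it loses the dependency {Student, Course} -> {Teacher}. This shows that the two properties are independent of each other.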
Both lossless join and dependency preserving decomposition are desirable properties when
decomposing a relation to eliminate redundancy and achieve higher levels of database normalization.
4. Explain the concept of a surrogate key?
The concept of a surrogate key is used in database design to provide a unique identifier for each row in a
table. A surrogate key is an artificially created identifier that has no meaning or relevance to the data
itself. It is typically an auto-incrementing integer or a globally unique identifier (GUID) generated by the
database management system (DBMS).
The main characteristics and benefits of surrogate keys are:
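1. Uniqueness: Surrogate keys give each row in a table a unique identifier. Since they are
generated by the DBMS, key values are never duplicated and the primary key does not have to rely on
natural keys. A surrogate key does not stop the same real-world entity from being entered twice, so a
unique constraint on the natural key is usually still needed.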
2. Independence from Data: Surrogate keys are independent of the data attributes in the table. They do
not have any inherent meaning or significance related to the real-world entity represented by the table.
This independence allows for flexibility in data management and avoids potential issues when natural
keys change.
3. Simplicity and Efficiency: Surrogate keys are often simple integer values or unique identifiers
generated by the DBMS. They are efficient for indexing, searching, and joining tables. The use of a single
surrogate key simplifies the design and implementation of relationships between tables.
4. Stability: Surrogate keys remain stable over time. Unlike natural keys, which might change due to
updates or modifications to the data, surrogate keys remain constant, providing a consistent identifier
for each row.
5. Consistency: Surrogate keys enable consistency when merging or integrating data from different
sources. Since they are independent of the source data, they can be used to uniquely identify and
reconcile records from multiple systems.
6. Support for Relationships: Surrogate keys are commonly used as foreign keys to establish
relationships between tables. They simplify the design and improve the performance of joins between
related tables.
Overall, the use of surrogate keys offers several advantages in database design, such as ensuring
uniqueness, simplifying relationships, maintaining data integrity, and providing efficient
querying capabilities.
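The points above can be illustrated with a small sketch using Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email       TEXT NOT NULL UNIQUE,               -- natural key, still constrained
        name        TEXT NOT NULL
    )
    """
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    )
    """
)

cur = conn.execute(
    "INSERT INTO customers (email, name) VALUES (?, ?)",
    ("asha@example.com", "Asha"),
)
customer_id = cur.lastrowid  # value generated by the DBMS
conn.execute(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    (customer_id, "Notebook"),
)

# The natural key changes; the surrogate key and the order that references it do not.
conn.execute(
    "UPDATE customers SET email = ? WHERE customer_id = ?",
    ("asha@new.example.com", customer_id),
)

row = conn.execute(
    "SELECT c.email, o.item FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchone()
print(customer_id, row)
```

Because "orders" references the generated "customer_id" and not the email address, changing the email touches one row and leaves every relationship intact.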