DBMS Unit-3
Several strategies can be considered when designing a schema. Most of them
follow an incremental approach: they start with some schema constructs
derived from the requirements and then incrementally modify, refine, or
build on them.
Let's discuss some of these strategies:
1. Top-down strategy –
Normalization:
Denormalization:
Vertical partitioning:
Horizontal partitioning:
Normalization:
In database management systems (DBMS), normal forms help to ensure
that the design of a database is efficient, organized, and free from data
anomalies. There are several levels of normalization, each with its own set of
guidelines, known as normal forms.
• First Normal Form (1NF): This is the most basic level of
normalization. In 1NF, each table cell should contain only a single
value, and each column should have a unique name. The first normal
form helps to eliminate duplicate data and simplify queries.
• Second Normal Form (2NF): 2NF eliminates redundant data by
requiring that each non-key attribute be fully dependent on the
whole primary key. This means that no non-key column may depend
on only a part of a composite key.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no
non-key attribute depend on another non-key attribute. This means
that each column should be directly related to the primary key, and
not transitively through other columns in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF
that ensures that each determinant in a table is a candidate key. In
other words, the left-hand side of every non-trivial functional
dependency must be a super key.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF
that ensures that a table does not contain any non-trivial
multi-valued dependencies other than those on a super key.
• Fifth Normal Form (5NF): 5NF is the highest of the commonly used
normal forms. It involves decomposing a table into smaller tables
until every remaining join dependency is implied by the candidate
keys, which removes further redundancy and improves data integrity.
Normal forms help to reduce data redundancy and increase data consistency,
and they often make inserts, updates, and deletes cheaper. However, higher
levels of normalization can lead to more complex database designs and to
queries that need more joins. It is important to strike a balance between
normalization and practicality when designing a database.
Advantages of Normal Form
• Reduced data redundancy: Normalization helps to eliminate
duplicate data in tables, reducing the amount of storage space
needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is
stored in a consistent and organized manner, reducing the risk of
data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for
organizing tables and data relationships, making it easier to design
and maintain a database.
• Improved query performance: Normalized tables are smaller and
narrower, which often speeds up updates and simple lookups,
although queries that span several tables need joins.
• Easier database maintenance: Normalization reduces the
complexity of a database by breaking it down into smaller, more
manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase
database efficiency, and simplify database design and maintenance.
Note that many courses have the same course fee. Here:
• COURSE_FEE alone cannot determine COURSE_NO or STUD_NO.
• COURSE_FEE together with STUD_NO cannot determine COURSE_NO.
• COURSE_FEE together with COURSE_NO cannot determine STUD_NO.
Hence COURSE_FEE is a non-prime attribute, as it does not belong to the
only candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE,
i.e., COURSE_FEE depends on COURSE_NO, which is a proper subset of the
candidate key. A non-prime attribute that depends on a proper subset of
the candidate key is a partial dependency, so this relation is not in 2NF.
To convert the relation to 2NF, we split it into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
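The partial-dependency argument above can be checked mechanically. The following is a minimal sketch, assuming the relation has a single candidate key and that functional dependencies are given as (left side, right side) pairs of sets; the helper names `closure` and `partial_dependencies` are illustrative and not part of these notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(candidate_key, fds, attributes):
    """List (key part, attribute) pairs where a non-prime attribute
    depends on a proper subset of the (single, assumed) candidate key."""
    key = set(candidate_key)
    non_prime = set(attributes) - key
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append((part, attr))
    return found

ATTRS = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
FDS = [({"COURSE_NO"}, {"COURSE_FEE"})]
KEY = {"STUD_NO", "COURSE_NO"}

violations = partial_dependencies(KEY, FDS, ATTRS)
print(violations)  # [(('COURSE_NO',), 'COURSE_FEE')]
```

An empty result would mean that no non-prime attribute depends on part of the key, which is the 2NF condition for a relation with one candidate key.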
A relation is in third normal form (3NF) if it is in second normal form and
has no transitive dependency for non-prime attributes.
o A relation is in 3NF if it is in 2NF and no non-prime attribute depends
transitively on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
A relation is in third normal form if at least one of the following conditions holds
for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
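The two conditions can be turned into a small check. This is a sketch under stated assumptions: all candidate keys are supplied by the caller, the function name `third_nf_violations` is hypothetical, and the attribute names EMP_NAME, EMP_ZIP, EMP_STATE and EMP_CITY are assumed for illustration (only EMP_ID is named in these notes).

```python
def third_nf_violations(fds, candidate_keys):
    """Return the FDs X -> Y that break 3NF.

    An FD passes if it is trivial, if X contains a candidate key
    (X is a super key), or if every attribute of Y outside X is prime.
    """
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        is_super_key = any(set(key) <= lhs for key in candidate_keys)
        rhs_is_prime = (rhs - lhs) <= prime
        if not (is_super_key or rhs_is_prime):
            bad.append((tuple(sorted(lhs)), tuple(sorted(rhs))))
    return bad

# Assumed dependencies: the zip code determines state and city.
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
result = third_nf_violations(fds, [{"EMP_ID"}])
print(result)  # [(('EMP_ZIP',), ('EMP_CITY', 'EMP_STATE'))]
```

The reported dependency is the transitive one, which is why the zip-related attributes are usually moved into a separate table.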
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.
EMPLOYEE table:
EMPLOYEE_ZIP table:
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
A similar idea of nested classes appears in the Theory of Computation as the
Chomsky hierarchy. The normal forms are nested in the same way: every
relation in BCNF is also in 3NF. The converse does not hold: a relation in
3NF need not be in BCNF. Ponder over this statement for a while.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If
R is found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF. 1NF has the least restrictive constraint: it only
requires a relation R to have atomic values in each tuple. 2NF has a slightly
more restrictive constraint. 3NF is more restrictive than the first two normal
forms but less restrictive than BCNF. In this manner, the restriction
increases from 1NF to BCNF.
Example 1
Let us consider the student database, in which data of the student are
mentioned.
Stu_ID  Stu_Branch                               Stu_Course            (branch code)  Stu_Course_No
101     Computer Science & Engineering           DBMS                  B_001          201
101     Computer Science & Engineering           Computer Networks     B_001          202
102     Electronics & Communication Engineering  VLSI Technology       B_003          401
102     Electronics & Communication Engineering  Mobile Communication  B_003          402
The table above is not in BCNF because neither Stu_ID nor Stu_Course is a
super key. For a table to be in BCNF, every non-trivial functional dependency
X -> Y must have a super key X on its left-hand side. That property fails
here, which is why this table is not in BCNF.
To bring this table into BCNF, we have to decompose it into smaller tables.
Here is the full procedure through which we transform this table into BCNF.
Let us first divide this main table into two
tables, the Stu_Branch table and the Stu_Course table.
Stu_Branch Table
Stu_ID  Stu_Branch
101     Computer Science & Engineering
102     Electronics & Communication Engineering
Stu_ID  Stu_Course_No
101     201
101     202
102     401
102     402
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
• Step-1: As we can see, (AC)+ = {A, C, B, E, D}, but none of its
proper subsets can determine all attributes of the relation, so AC
is a candidate key. Neither A nor C can be derived from any other
attribute of the relation, so every key must contain both, and {AC}
is the only candidate key.
• Step-2: Prime attributes are those that are part of a candidate
key, {A, C} in this example; the others, {B, D, E}, are non-prime.
• Step-3: The relation R is in 1st normal form, as a relational DBMS
does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because no non-prime attribute depends on
a proper subset of the candidate key: in BC->D, BC is not a proper subset of
AC; in AC->BE, AC is the candidate key itself; and in B->E, B is not a proper
subset of AC.
The relation is not in 3rd normal form because in BC->D neither is BC a
super key nor is D a prime attribute, and in B->E neither is B a super key nor
is E a prime attribute. To satisfy 3rd normal form, either the LHS of an FD
must be a super key or the RHS must be a prime attribute. So the highest
normal form of the relation is 2nd normal form.
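The steps of Example 2 can be reproduced with a short script. This is a sketch rather than a reference implementation: it assumes single-character attribute names (so a string such as "BC" stands for the set {B, C}), takes 1NF for granted, and finds candidate keys by brute force, which is only practical for small relations. The function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for all minimal sets that determine every attribute."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' (1NF is assumed to hold)."""
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)

    def is_super_key(lhs):
        return closure(lhs, fds) == attributes

    # One (lhs, attribute) pair per non-trivial right-hand-side attribute.
    pairs = [(set(lhs), a) for lhs, rhs in fds for a in set(rhs) - set(lhs)]

    if all(is_super_key(lhs) for lhs, _ in pairs):
        return "BCNF"
    if all(is_super_key(lhs) or a in prime for lhs, a in pairs):
        return "3NF"
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - prime:
                    return "1NF"  # a non-prime attribute depends on part of a key
    return "2NF"

R = "ABCDE"
F = [("BC", "D"), ("AC", "BE"), ("B", "E")]
print([sorted(k) for k in candidate_keys(R, F)])  # [['A', 'C']]
print(highest_normal_form(R, F))                  # 2NF
```

The script checks BCNF first and then falls back to weaker forms, mirroring the top-down procedure described earlier for finding the highest normal form.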
Note: A prime attribute cannot be transitively dependent on a key in BCNF
relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
Suppose it is known that the only candidate key of R is AB. Careful
observation shows that the dependencies above contain a transitive
dependency: the prime attribute B depends on the key AB transitively
through C. The first and the third FD satisfy BCNF, as both have the
candidate key (or simply KEY) on their left sides. The second dependency,
however, violates BCNF, but it satisfies 3NF because its right side is a
prime attribute. So, the highest normal form of R is 3NF, as all three FDs
satisfy the necessary conditions for 3NF.
Example 3
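A relation is in fourth normal form (4NF) if it satisfies both of the
following conditions:
1. It must be in BCNF.
2. It does not have any non-trivial multi-valued dependency.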
Types of Decomposition
Lossless Decomposition
o If no information is lost from the relation that is decomposed, then
the decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed
relations yields exactly the relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed
relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID",
then the resultant relation will look like:
Employee ⋈ Department
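The lossless property can be tested directly on data: project the original relation onto the two schemas, join the projections, and compare with the original. The sketch below uses only the employees that also have a DEPARTMENT row in the tables above; the helper names `project`, `natural_join` and `same_rows` are illustrative.

```python
def project(rows, columns):
    """Project dict rows onto `columns`, dropping duplicate rows."""
    out = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in out:
            out.append(projected)
    return out

def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined

def same_rows(a, b):
    """Compare two relations as sets of rows, ignoring row and column order."""
    as_set = lambda rows: {tuple(sorted(r.items())) for r in rows}
    return as_set(a) == as_set(b)

COLUMNS = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
DATA = [
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
original = [dict(zip(COLUMNS, values)) for values in DATA]

employee = project(original, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(original, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
print(same_rows(natural_join(employee, department), original))  # True

# Without the shared EMP_ID column the join degenerates into a cross product.
dept_only = project(original, ["DEPT_ID", "DEPT_NAME"])
print(len(natural_join(employee, dept_only)))  # 16 rows instead of 4
```

The second join shows what a lossy decomposition looks like in practice: the result contains spurious rows that pair every employee with every department.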
Dependency Preserving
o Dependency preservation is an important property of a decomposition.
o In a dependency-preserving decomposition, every dependency must be
satisfied by at least one decomposed table, or be derivable from the
dependencies that hold on the decomposed tables.
o If a relation R is decomposed into relations R1 and R2, then each
dependency of R must either be a part of R1 or R2 or be derivable
from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional
dependency set {A->BC}. The relation R is decomposed into R1(ABC)
and R2(AD), which is dependency preserving because FD A->BC is a
part of relation R1(ABC).
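Dependency preservation can also be tested without computing the full closure F+. The sketch below follows the usual textbook procedure: for each FD X -> Y, repeatedly grow X using only what each decomposed schema can derive, and check whether Y is reached. It assumes single-character attribute names, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """Check whether one FD can be enforced using the decomposed schemas only."""
    lhs, rhs = set(fd[0]), set(fd[1])
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            schema = set(schema)
            # Attributes derivable from what we already have, seen through this schema.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def dependency_preserving(decomposition, fds):
    """True if every FD in `fds` is preserved by the decomposition."""
    return all(is_preserved(fd, decomposition, fds) for fd in fds)

# Example from the text: R(A, B, C, D) with A -> BC, split into R1(ABC) and R2(AD).
print(dependency_preserving(["ABC", "AD"], [("A", "BC")]))  # True

# Contrast: R(A, B, C) with AB -> C and C -> B, split into (BC) and (AC).
print(dependency_preserving(["BC", "AC"], [("AB", "C"), ("C", "B")]))  # False
```

In the second case AB -> C cannot be checked inside either table, so enforcing it would require joining the two tables.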
Decomposition Algorithms:
o Decomposition to BCNF
o Decomposition to 3NF
Decomposition to BCNF:
The following cases need to be tested to check whether the given relation
schema R satisfies the BCNF rule:
Case 1: To check whether a non-trivial dependency α -> β violates the BCNF
rule, compute α+, i.e., the attribute closure of α, and verify that α+
includes all the attributes of the given relation R. If it does, α is a super
key of R and the dependency satisfies BCNF; otherwise it violates BCNF.
Case 2: To check whether the given relation R is in BCNF, it is not required
to test all the dependencies in F+. It is enough to check the dependencies in
the provided dependency set F, because if no dependency in F causes a
violation of BCNF, then none of the dependencies in F+ will cause a violation
of BCNF either.
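The closure test from Case 1 is the core of the usual BCNF decomposition loop: find a dependency α -> β whose left side is not a super key of the current schema, split the schema into (α ∪ β) and (schema minus β), and repeat. The sketch below is one possible rendering, assuming single-character attribute names. For decomposed schemas it tests every subset α instead of only the FDs in F, which is exponential and suitable only for small examples. The function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (alpha, beta) for some BCNF violation on `schema`, or None."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for alpha in combinations(sorted(schema), size):
            alpha = set(alpha)
            determined = closure(alpha, fds) & schema
            beta = determined - alpha
            # alpha determines something new but not the whole schema:
            # it is a determinant that is not a super key.
            if beta and determined != schema:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Split `schema` until no sub-schema has a BCNF violation."""
    pending = [set(schema)]
    done = []
    while pending:
        current = pending.pop()
        violation = find_violation(current, fds)
        if violation is None:
            done.append(current)
        else:
            alpha, beta = violation
            pending.append(alpha | beta)     # alpha is a key of this part
            pending.append(current - beta)   # alpha stays as the join column
    return done

# R(A, B, C) with AB -> C and C -> B is in 3NF but not in BCNF.
parts = bcnf_decompose("ABC", [("AB", "C"), ("C", "B")])
print(sorted(sorted(p) for p in parts))  # [['A', 'C'], ['B', 'C']]
```

Each split is lossless because the shared attributes α form a key of the (α ∪ β) part, but as this example shows, the result is not always dependency preserving: AB -> C no longer fits inside a single table.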
Normalization is a systematic approach of decomposing tables to eliminate
data redundancy (repetition) and undesirable characteristics like Insertion,
Update, and Deletion Anomalies. It is a multi-step process that puts data into
tabular form, removing duplicated data from the relation tables.
Example:
CourseDetails table:
CourseID  (course name)     ProfID  ProfName       ProfEmail         DeptName
CSE101    Database Systems  P101    John Doe       [email protected]  CSE
MAT101    Linear Algebra    P103    Alice Johnson  [email protected]  MATH
PHY101    Quantum Physics   P104    Bob Brown      [email protected]  PHYSICS
A table is in the First Normal Form (1NF) if it contains only atomic values and
each row can be identified uniquely.
The CourseDetails table is already in 1NF because every cell holds a single
atomic value and CourseID identifies each row uniquely.
A table is in the Second Normal Form (2NF) if it is in 1NF and all non-key
attributes are fully functionally dependent on the primary key.
Here ProfName, ProfEmail, and DeptName depend on ProfID rather than
directly on the primary key (CourseID). Because CourseID is a single
attribute, this is a dependency through ProfID rather than a partial
dependency in the strict sense. We decompose the table into two tables to
separate the professor attributes from the course attributes:
Courses table:
Professors table:
A table is in the Third Normal Form (3NF) if it is in 2NF and all its attributes
are not only fully functionally dependent on the primary key but also non-
transitively dependent (i.e., no transitive dependency).
In the Courses table, DeptName is transitively dependent on CourseID
through ProfID. To remove this transitive dependency and achieve 3NF, we
can further decompose the Courses table into two tables:
Departments table:
DeptName ProfID
CSE P101
CSE P102
MATH P103
PHYSICS P104
The primary difference between 3NF and BCNF is how they handle functional
dependencies whose right-hand side is a prime attribute. BCNF requires that
for every non-trivial functional dependency, the left-hand side must be a
superkey. 3NF is less strict: it also accepts a dependency whose right-hand
side consists only of prime attributes.
Example:
CourseAssignments table:
ProfID → ProfName
The composite key for this table is (CourseID, Semester) because it uniquely
identifies each record. However, we notice a dependency that ProfID
determines ProfName, which violates the BCNF rule since ProfID is not a
superkey for this table.
Step 1: Decomposition into BCNF
To decompose this table into BCNF, we need to ensure that for all functional
dependencies, the left-hand side is a superkey. Given the violation noted, we
can decompose CourseAssignments into two tables to satisfy BCNF:
CourseOfferings table:
Professors table:
ProfID ProfName
In this decomposition:
The Professors table has ProfID as its primary key, which uniquely identifies
ProfName, also satisfying BCNF.