DBMS Unit-3

The document discusses various strategies for schema design in database management systems (DBMS), including top-down, bottom-up, inside-out, and mixed strategies. It emphasizes the importance of normalization to reduce data redundancy and improve data integrity, detailing different normal forms from 1NF to 5NF. Additionally, it outlines the advantages of normalization and the impact of schema design on database performance and maintenance.

Uploaded by

bharat soni
Copyright
© © All Rights Reserved

Schema design :

There are various strategies for designing a schema. Most of them follow an
incremental approach: they start with some schema constructs derived from
the requirements and then incrementally modify, refine, or build on them.
Let’s discuss some of these strategies:
1. Top-down strategy –

In this strategy, we start with a schema that contains high-level
abstractions and then apply successive top-down refinements. For
example, we may specify only a few high-level entity types and then,
as we specify their attributes, split them into lower-level entity
types and relationships. The process of specialization, which refines
an entity type into subclasses, is also an example of this strategy.
2. Bottom-up strategy –

In this strategy, we start with a schema containing basic abstractions
and then combine or add to these abstractions. For example, we may
start with attributes and group these into entity types and
relationships. We can also add new relationships among entity types
as the design progresses. A basic example is the process of
generalizing entity types into a higher-level generalized superclass.
3. Inside-Out Strategy –

This is a special case of the bottom-up strategy in which attention is
focused on a central set of concepts that are most evident. Modelling
then spreads outward by considering new concepts in the vicinity of
existing ones. We could specify a few clearly evident entity types in
the schema and continue by adding other entity types and
relationships that are related to each of them.
4. Mixed Strategy –

Instead of following any single strategy throughout the design, the
requirements are partitioned according to a top-down strategy, and
part of the schema is designed for each partition according to a
bottom-up strategy. The various schema parts are then combined.
Schema design is the process of creating a logical and organized structure
for a database, which involves defining tables, columns, relationships,
constraints, and other elements that will govern how data is stored and
accessed. Effective schema design is essential for creating a robust, scalable,
and efficient database system.
Here are some strategies for schema design in DBMS:
Identify the purpose and scope of the database: Before designing a
database schema, it is important to define the purpose and scope of the
database. This will help you determine what kind of data the database needs
to store, how it will be used, and what types of queries will be performed on
the data.
Normalize the database: Normalization is the process of organizing data into
tables and applying rules to ensure data is stored in a consistent and efficient
manner. By reducing data redundancy and ensuring data integrity,
normalization helps to eliminate anomalies and improve the overall quality
of the database.
Use data types appropriately: Choosing the right data type for each column
is important for efficient data storage and retrieval. For example, using
numeric data types for numeric data can improve calculation performance,
while using date/time data types can help with date/time calculations and
sorting.
Establish relationships between tables: Establishing relationships between
tables can help to eliminate data redundancy and improve data consistency.
For example, a foreign key can be used to link a record in one table to a record
in another table, ensuring that data is consistent across both tables.
Use constraints to ensure data integrity: Constraints can be used to enforce
rules on the data in a database, ensuring that data is accurate and consistent.
For example, a primary key constraint can ensure that each record in a table
has a unique identifier, while a check constraint can ensure that data meets
certain conditions before it is inserted into a table.
Optimize for performance: Schema design can have a significant impact on
database performance. Optimizing indexes, partitioning data, and using
appropriate data types can all improve query performance and reduce
database overhead.
Effective schema design requires a thorough understanding of the data being
stored and how it will be used, as well as an understanding of best practices
for database design and optimization. By following these strategies, you can
create a robust and efficient database schema that meets your needs and
supports your business goals.
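The points above about data types, foreign keys, and constraints can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the table and column names (department, employee, emp_age, and so on) are illustrative assumptions and are not part of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical two-table schema: typed columns, a primary key per table,
# a CHECK constraint, and a foreign key linking employee to department.
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    emp_age  INTEGER CHECK (emp_age >= 18),
    dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department VALUES (827, 'Sales')")
conn.execute("INSERT INTO employee VALUES (22, 'Denim', 28, 827)")


def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert("INSERT INTO employee VALUES (22, 'Alina', 25, 827)"),  # duplicate primary key
    try_insert("INSERT INTO employee VALUES (33, 'Alina', 12, 827)"),  # fails the CHECK constraint
    try_insert("INSERT INTO employee VALUES (33, 'Alina', 25, 999)"),  # no such department
    try_insert("INSERT INTO employee VALUES (33, 'Alina', 25, 827)"),  # valid row
]
for outcome in results:
    print(outcome)

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the last insert is accepted; the first three are stopped by the primary key, the check constraint, and the foreign key respectively, so the table ends up with two rows.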

Features of different strategies for schema design in DBMS:

Normalization:

➢ Divides large tables into smaller, related tables to minimize data
redundancy and ensure data consistency
➢ Reduces the need for multiple updates to maintain consistency
➢ Eliminates data anomalies, such as update, insertion, and deletion
anomalies
➢ Results in a more complex schema with more tables and
relationships
➢ May negatively impact query performance due to the increased
number of joins required

Denormalization:

➢ Adds redundant data to improve query performance by reducing the
number of joins required
➢ Simplifies data access by storing all data in one place
➢ Can result in data inconsistency if not properly managed
➢ Increases storage requirements due to the duplicated data
➢ Simplifies queries by reducing the number of joins required, which can
result in faster query execution

Vertical partitioning:

➢ Splits a table into smaller tables based on columns to improve query
performance
➢ Reduces I/O operations by only reading relevant columns from disk
➢ Simplifies data access by storing data in tables with fewer columns
➢ Can result in a more complex schema with more tables and
relationships
➢ Can negatively impact query performance if a query requires columns
from multiple tables

Horizontal partitioning:

➢ Splits a table into smaller tables based on rows to improve query
performance and scalability
➢ Simplifies data management by breaking down large tables into
smaller, more manageable pieces
➢ Increases query performance by reducing the amount of data that
needs to be scanned
➢ Can result in a more complex schema with more tables and
relationships
➢ Can negatively impact query performance if a query requires data
from multiple partitions

Normal Forms in DBMS:


Normalization is the process of minimizing redundancy in a relation or set
of relations. Redundancy in a relation may cause insertion, deletion, and
update anomalies, so minimizing it helps to avoid them. Normal forms are
used to eliminate or reduce redundancy in database tables.

Normalization:
In database management systems (DBMS), normal forms help to ensure that the
design of a database is efficient, organized, and free from data anomalies.
There are several levels of normalization, each with its own set of
guidelines, known as normal forms.
• First Normal Form (1NF): This is the most basic level of
normalization. In 1NF, each table cell should contain only a single
value, and each column should have a unique name. The first normal
form helps to eliminate repeating groups and simplify queries.
• Second Normal Form (2NF): 2NF eliminates redundant data by
requiring that each non-key attribute be fully dependent on the
whole primary key, and not on just a part of it. It matters only
when the key consists of more than one column.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no
non-key attribute depend transitively on the primary key. This
means that each non-key column should depend directly on the key,
and not on other non-key columns in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF
that requires the determinant of every non-trivial functional
dependency to be a super key. In other words, attributes may
depend only on candidate keys.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF
that ensures that a table does not contain any non-trivial
multi-valued dependencies other than those on a candidate key.
• Fifth Normal Form (5NF): 5NF is the highest of the normal forms
covered here. It deals with join dependencies: a table is
decomposed into smaller tables until it cannot be decomposed any
further without losing information.
Normal forms help to reduce data redundancy and increase data consistency.
However, higher levels of normalization can lead to more complex database
designs and to queries that need more joins. It is important to strike a
balance between normalization and practicality when designing a database.
Advantages of Normal Form
• Reduced data redundancy: Normalization helps to eliminate
duplicate data in tables, reducing the amount of storage space
needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is
stored in a consistent and organized manner, reducing the risk of
data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for
organizing tables and data relationships, making it easier to design
and maintain a database.
• Improved query performance: Normalized tables are typically
smaller and narrower, which can make searches and updates on a
single table faster, although queries that span several tables
need extra joins.
• Easier database maintenance: Normalization reduces the
complexity of a database by breaking it down into smaller, more
manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase
database efficiency, and simplify database design and maintenance.

First Normal Form:


If a relation contains a composite or multi-valued attribute, it violates
first normal form. Equivalently, a relation is in first normal form if every
attribute in that relation is a single-valued (atomic) attribute.
• Example 1 – Relation STUDENT in table 1 is not in 1NF because of
multi-valued attribute STUD_PHONE. Its decomposition into 1NF
has been shown in table 2.
• Example 2 –
ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
In the above table, Courses is a multi-valued attribute, so the table is not
in 1NF. The table below is in 1NF, as there is no multi-valued attribute:
ID   Name   Course
------------------
1    A      c1
1    A      c2
2    E      c3
3    M      c2
3    M      c3
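The conversion in Example 2 can be sketched in a few lines of Python. This is only an illustration: it assumes the multi-valued Courses column arrives as a comma-separated string, and the function name to_1nf is made up for this sketch.

```python
# Rows of the non-1NF table: (ID, Name, Courses), where Courses holds
# several values in one cell (assumed here to be comma-separated text).
students = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]


def to_1nf(rows):
    """Emit one row per (ID, Name, Course) so that every cell is atomic."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat


students_1nf = to_1nf(students)
for row in students_1nf:
    print(row)
```

The output has five rows, one per student and course pair, matching the 1NF table above.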

Second Normal Form:


To be in second normal form, a relation must be in first normal form and
must not contain any partial dependency. That is, no non-prime attribute
(an attribute that is not part of any candidate key) may depend on a proper
subset of any candidate key of the table. Partial Dependency – If a proper
subset of a candidate key determines a non-prime attribute, it is called a
partial dependency.
• Example 1 – Consider Table 3 below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000

• Note that many courses have the same course fee. Here,
COURSE_FEE alone cannot determine the value of COURSE_NO or
STUD_NO; COURSE_FEE together with STUD_NO cannot determine
COURSE_NO; and COURSE_FEE together with COURSE_NO cannot
determine STUD_NO. Hence COURSE_FEE is a non-prime attribute, as
it does not belong to the only candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE depends on
COURSE_NO, which is a proper subset of the candidate key. A
non-prime attribute that depends on a proper subset of the
candidate key is a partial dependency, so this relation is not in
2NF. To convert the relation to 2NF, we split it into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1                  Table 2
STUD_NO  COURSE_NO       COURSE_NO  COURSE_FEE
1        C1              C1         1000
2        C2              C2         1500
1        C4              C3         1000
4        C3              C4         2000
4        C1              C5         2000
2        C5
• NOTE: 2NF tries to reduce the amount of redundant data that is
stored. For instance, if there are 100 students taking course C1,
we don’t need to store its fee of 1000 in all 100 records;
instead, we store it once in the second table, which records that
the course fee for C1 is 1000.
• Example 2 – Consider the following functional dependencies in
relation R (A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial
dependency, i.e., no proper subset of AB determines any non-prime
attribute. The relation is therefore in 2NF.
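Both 2NF examples can be checked mechanically with attribute closures. The sketch below is an illustration, not a method taken from these notes: the helper names closure and partial_dependencies are made up here, and the second function assumes the relation has exactly one candidate key, which is true for both examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_key, all_attrs, fds):
    """List (subset, attributes) pairs where a proper subset of the key
    determines non-prime attributes. Assumes a single candidate key."""
    key = set(candidate_key)
    non_prime = set(all_attrs) - key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) - set(subset)) & non_prime
            if determined:
                found.append((set(subset), determined))
    return found


# Example 1: COURSE_NO -> COURSE_FEE with key {STUD_NO, COURSE_NO}.
example1 = partial_dependencies(
    ("STUD_NO", "COURSE_NO"),
    ("STUD_NO", "COURSE_NO", "COURSE_FEE"),
    [(("COURSE_NO",), ("COURSE_FEE",))],
)

# Example 2: R(A, B, C, D) with AB -> C and BC -> D, key AB.
fds2 = [("AB", "C"), ("BC", "D")]
example2 = partial_dependencies("AB", "ABCD", fds2)

print(example1)  # one partial dependency, so not in 2NF
print(example2)  # empty list, so no partial dependency
print(closure("AB", fds2))
```

For Example 1 the check reports that COURSE_NO alone determines COURSE_FEE; for Example 2 it finds nothing, and the closure of AB covers all four attributes, which confirms that AB is a key.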

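A relation is in third normal form (3NF) if, for every non-trivial
functional dependency X -> Y, at least one of the following holds:

X is a super key.

Y is a prime attribute (each element of Y is part of some candidate
key).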
Example 1:
In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME,
STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO
-> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY both hold.
So STUD_COUNTRY is transitively dependent on STUD_NO, which violates the
third normal form.
To convert the relation to third normal form, we decompose STUDENT
(STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE)
as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A.
The candidate keys of this relation are {A, E, CD, BC}. Every attribute on
the right-hand side of a functional dependency is therefore prime, so the
relation is in 3NF.
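The candidate keys of the relation above can be found by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole relation and that contain no smaller key. This is a sketch for small examples only (it is exponential in the number of attributes), and the helper names closure and candidate_keys are assumptions made for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
prime = set().union(*keys)

print(sorted("".join(sorted(key)) for key in keys))
print(sorted(prime))
```

The search returns the keys A, E, BC, and CD, and the set of prime attributes is all of A, B, C, D, and E, which is why every dependency satisfies the 3NF condition.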
Third Normal Form:

A relation is said to be in third normal form if it has no transitive
dependency for non-prime attributes. The basic condition for the Third
Normal Form is that the relation must be in Second Normal Form.
o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a candidate key.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.

A relation is in third normal form if it satisfies at least one of the following
conditions for every non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID), which violates the rule of
third normal form. That's why we need to move EMP_CITY and EMP_STATE to a
new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal

Boyce-Codd Normal Form (BCNF)


The general definitions of 2NF and 3NF may identify additional redundancy
caused by dependencies that violate one or more candidate keys. However,
despite these additional constraints, dependencies can still exist that
cause redundancy to be present in 3NF relations. This weakness in 3NF
led to a stronger normal form called the Boyce-Codd Normal Form
(Codd, 1974).
Although 3NF is an adequate normal form for many relational databases, it
may not remove all redundancy, because 3NF still allows a functional
dependency X−>Y in which X is not a super key of the relation (as long as Y
is a prime attribute). This can be solved by Boyce-Codd Normal Form (BCNF).
Boyce–Codd Normal Form (BCNF) is based on functional dependencies that take
into account all candidate keys in a relation; however, BCNF also has
additional constraints compared with the general definition of 3NF.
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super key for every non-trivial functional dependency
(FD) X−>Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants
and make sure that each of them is a super key, that is, contains a
candidate key.
The normal forms are nested in a hierarchy, each one contained in the one
before it: 1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF. (A comparable nesting is the Chomsky
hierarchy in the Theory of Computation.) It follows that every relation in
BCNF is also in 3NF. The converse does not hold: a relation in 3NF need not
be in BCNF.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If
R is found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF, as the hierarchy shows. 1NF has the least
restrictive constraint: it only requires a relation R to have atomic values
in each tuple. 2NF has a slightly more restrictive constraint.
3NF has a more restrictive constraint than the first two normal forms but
is less restrictive than BCNF. In this manner, the restriction increases as
we move down the hierarchy.

Example 1

Let us consider a student database in which the following student data is
stored.

Stu_ID  Stu_Branch                               Stu_Course            Branch_Number  Stu_Course_No
101     Computer Science & Engineering           DBMS                  B_001          201
101     Computer Science & Engineering           Computer Networks     B_001          202
102     Electronics & Communication Engineering  VLSI Technology       B_003          401
102     Electronics & Communication Engineering  Mobile Communication  B_003          402

The functional dependencies of the above table are:

Stu_ID −> Stu_Branch
Stu_Course −> {Branch_Number, Stu_Course_No}
Candidate key of the above table: {Stu_ID, Stu_Course}

The table above is not in BCNF, because neither Stu_ID nor Stu_Course alone
is a super key. As the rules above state, for a table to be in BCNF, X must
be a super key in every functional dependency X−>Y. That property fails
here for both dependencies, so the table is not in BCNF.
To bring this table into BCNF, we have to decompose it into smaller tables.
We divide the main table into three tables: Stu_Branch, Stu_Course, and a
table that maps Stu_ID to Stu_Course_No.
Stu_Branch Table
Stu_ID  Stu_Branch
101     Computer Science & Engineering
102     Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.

Stu_Course Table
Stu_Course            Branch_Number  Stu_Course_No
DBMS                  B_001          201
Computer Networks     B_001          202
VLSI Technology       B_003          401
Mobile Communication  B_003          402

Candidate Key for this table: Stu_Course.

Stu_ID to Stu_Course_No Table
Stu_ID  Stu_Course_No
101     201
101     202
102     401
102     402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.


After this decomposition, every table is in BCNF, because in each functional
dependency X−>Y that holds on a table, X is a super key of that table.

Example 2

Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
• Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its
subsets can determine all attributes of the relation, So AC will be
the candidate key. A or C can’t be derived from any other attribute
of the relation, so there will be only 1 candidate key {AC}.
• Step-2: Prime attributes are those attributes that are part of
candidate key {A, C} in this example and others will be non-prime
{B, D, E} in this example.
• Step-3: The relation R is in 1st normal form as a relational DBMS
does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because no proper subset of the candidate
key AC determines a non-prime attribute: in BC->D, BC is not a proper subset
of AC; in AC->BE, AC is the candidate key itself; and in B->E, B is not a
proper subset of AC.
The relation is not in 3rd normal form because in BC->D neither is BC a
super key nor is D a prime attribute, and in B->E neither is B a super key
nor is E a prime attribute. To satisfy 3rd normal form, either the LHS of an
FD should be a super key or the RHS should be a prime attribute. So the
highest normal form of the relation is the 2nd normal form.
Note: A prime attribute cannot be transitively dependent on a key in BCNF
relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
Suppose AB is a candidate key of R. (If R consists only of A, B, and C, then
AC is a candidate key as well, because C -> B; either way, B is a prime
attribute.) Careful observation shows a transitive dependency: the prime
attribute B depends transitively on the key AB through C. The first and the
third FD satisfy BCNF, as they both have the candidate key on their left
sides. The second dependency, however, violates BCNF, because C is not a
super key, but it satisfies 3NF because of the prime attribute on its right
side. So the highest normal form of R is 3NF, as all three FDs satisfy the
conditions for 3NF.

Example 3

Consider relation R(A, B, C) with:
A -> BC,
B -> A
A and B both are super keys, so the above relation is in BCNF.
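The reasoning used in Example 2, in the note on AB -> C and C -> B, and in Example 3 can be collected into one small checker. The sketch below is an illustration under stated assumptions: the function names are made up, the relation is assumed to be in 1NF already, the note's relation is assumed to consist of exactly A, B, and C, and the brute-force key search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def highest_normal_form(all_attrs, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' for a relation assumed to be in 1NF."""
    all_attrs = set(all_attrs)
    keys = candidate_keys(all_attrs, fds)
    prime = set().union(*keys)
    # One (X, a) pair per non-trivial dependency X -> a with a single attribute a.
    single = [(set(lhs), a) for lhs, rhs in fds for a in set(rhs) - set(lhs)]

    def is_superkey(x):
        return closure(x, fds) == all_attrs

    if all(is_superkey(x) for x, _ in single):
        return "BCNF"
    if all(is_superkey(x) or a in prime for x, a in single):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if (closure(subset, fds) - set(subset)) - prime:
                    return "1NF"
    return "2NF"


example2 = highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")])
note_case = highest_normal_form("ABC", [("AB", "C"), ("C", "B"), ("AB", "B")])
example3 = highest_normal_form("ABC", [("A", "BC"), ("B", "A")])
print(example2, note_case, example3)
```

Under these assumptions the three relations come out as 2NF, 3NF, and BCNF respectively, in line with the worked answers.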
Note: A BCNF decomposition is not always dependency preserving; however, it
can always be made to satisfy the lossless join condition. For example, take
relation R (V, W, X, Y, Z) with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
Applying the usual BCNF decomposition algorithm to R loses at least one of
V, W -> X and Y, Z -> X, whichever violating dependency is split first.

Note: Redundancies are sometimes still present in a BCNF relation as it is not
always possible to eliminate them completely.
There are also some higher-order normal forms, like the 4th Normal Form
and the 5th Normal Form.
To summarize, the basic rules for BCNF are:
1. The table must be in Third Normal Form.
2. In every non-trivial dependency X->Y, X must be a super key of the relation.
Fourth Normal Form:
A relation in Fourth Normal Form contains no non-trivial multi-valued
dependency other than those whose determinant is a candidate key. The basic
condition for Fourth Normal Form is that the relation must be in BCNF.
The basic rules are mentioned below.

1. It must be in BCNF.
2. It must not have any non-trivial multi-valued dependency.

Fifth Normal Form:

Fifth Normal Form is also called Project-Join Normal Form (PJNF). The basic
conditions of Fifth Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
The relation cannot be decomposed any further without loss of information.

Applications of Normal Forms in DBMS


• Data consistency: Normal forms ensure that data is consistent and
does not contain any redundant information. This helps to prevent
inconsistencies and errors in the database.
• Data redundancy: Normal forms minimize data redundancy by
organizing data into tables that contain only unique data. This
reduces the amount of storage space required for the database and
makes it easier to manage.
• Response time: Normal forms can improve update performance by
reducing the amount of duplicated data that has to be written and
kept in sync. Read queries may need more joins, so the effect on
overall response time depends on the workload.
• Database maintenance: Normal forms make it easier to maintain
the database by reducing the amount of redundant data that needs
to be updated, deleted, or modified. This helps to improve database
management and reduce the risk of errors or inconsistencies.
• Database design: Normal forms provide guidelines for designing
databases that are efficient, flexible, and scalable. This helps to
ensure that the database can be easily modified, updated, or
expanded as needed.
Some Important Points about Normal Forms:
• BCNF is free from redundancy caused by Functional Dependencies.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of relation are prime attribute, then the relation is
always in 3NF.
• A relation in a Relational Database is always and at least in 1NF
form.
• Every Binary Relation (a Relation with only 2 attributes) is always
in BCNF.
• If a Relation has only singleton candidate keys (i.e. every candidate
key consists of only 1 attribute), then the Relation is always in 2NF
(because no Partial functional dependency possible).
• Sometimes going for BCNF form may not preserve functional
dependency. In that case go for BCNF only if the lost FD(s) is not
required, else normalize till 3NF only.
• There are many more Normal forms that exist after BCNF, like 4NF
and more. But in real world database systems it’s generally not
required to go beyond BCNF.
Relational Decomposition
o When a relation in the relational model is not in appropriate normal
form then the decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems
like loss of information.
o Decomposition is used to eliminate some of the problems of bad design
like anomalies, inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, then
the decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed
relations yields the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the
decomposed relations gives back the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY   DEPT_ID  DEPT_NAME
22      Denim      28       Mumbai     827      Sales
33      Alina      25       Delhi      438      Marketing
46      Stephan    30       Bangalore  869      Finance
52      Katherine  36       Mumbai     575      Production
60      Jack       40       Noida      678      Testing

The above relation is decomposed into two relations, EMPLOYEE and
DEPARTMENT.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY
22      Denim      28       Mumbai
33      Alina      25       Delhi
46      Stephan    30       Bangalore
52      Katherine  36       Mumbai
60      Jack       40       Noida

DEPARTMENT table:

DEPT_ID  EMP_ID  DEPT_NAME
827      22      Sales
438      33      Marketing
869      46      Finance
575      52      Production
678      60      Testing

Now, when these two relations are joined on the common column "EMP_ID",
the resultant relation looks like this:

Employee ⋈ Department

EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY   DEPT_ID  DEPT_NAME
22      Denim      28       Mumbai     827      Sales
33      Alina      25       Delhi      438      Marketing
46      Stephan    30       Bangalore  869      Finance
52      Katherine  36       Mumbai     575      Production
60      Jack       40       Noida      678      Testing

Hence, the decomposition is Lossless join decomposition.
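The lossless join property of this example can be verified directly by projecting the original rows and joining the projections again. The sketch below uses plain Python dictionaries; the helper names project, natural_join, and same_relation are assumptions made for this illustration. The second decomposition, which splits on EMP_CITY instead of EMP_ID, is a hypothetical counter-example added here to show what a lossy split looks like.

```python
employee_department = [
    {"EMP_ID": 22, "EMP_NAME": "Denim", "EMP_AGE": 28, "EMP_CITY": "Mumbai", "DEPT_ID": 827, "DEPT_NAME": "Sales"},
    {"EMP_ID": 33, "EMP_NAME": "Alina", "EMP_AGE": 25, "EMP_CITY": "Delhi", "DEPT_ID": 438, "DEPT_NAME": "Marketing"},
    {"EMP_ID": 46, "EMP_NAME": "Stephan", "EMP_AGE": 30, "EMP_CITY": "Bangalore", "DEPT_ID": 869, "DEPT_NAME": "Finance"},
    {"EMP_ID": 52, "EMP_NAME": "Katherine", "EMP_AGE": 36, "EMP_CITY": "Mumbai", "DEPT_ID": 575, "DEPT_NAME": "Production"},
    {"EMP_ID": 60, "EMP_NAME": "Jack", "EMP_AGE": 40, "EMP_CITY": "Noida", "DEPT_ID": 678, "DEPT_NAME": "Testing"},
]


def project(rows, columns):
    """Projection: keep only `columns` and drop duplicate rows."""
    projected = []
    for row in rows:
        picked = {column: row[column] for column in columns}
        if picked not in projected:
            projected.append(picked)
    return projected


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared column names."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


def same_relation(a, b):
    """Compare two row lists as sets, ignoring row and column order."""
    as_set = lambda rows: {tuple(sorted(row.items())) for row in rows}
    return as_set(a) == as_set(b)


# Decomposition from the example: the tables share EMP_ID.
employee = project(employee_department, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(employee_department, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
lossless = same_relation(natural_join(employee, department), employee_department)

# Hypothetical lossy split: the tables share only EMP_CITY, which is not a key.
by_city = project(employee_department, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"])
lossy_join = natural_join(employee, by_city)

print(lossless)         # the EMP_ID decomposition rebuilds the original
print(len(lossy_join))  # more rows than the original five
```

Joining on EMP_ID rebuilds exactly the five original rows. Joining on EMP_CITY produces seven rows, because the two Mumbai employees are each paired with both Mumbai departments; those spurious rows are what a lossy decomposition looks like.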

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at
least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each
dependency of R must either be a part of R1 or R2 or be
derivable from the combination of the functional dependencies of R1
and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relation R is decomposed into R1(ABC)
and R2(AD), which is dependency preserving because FD A->BC is a
part of relation R1(ABC).
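Dependency preservation can also be tested without computing the projected dependency sets explicitly. The sketch below follows the usual textbook test: starting from the left side of a dependency, it repeatedly adds whatever can be derived inside each decomposed relation, and the dependency is preserved if its right side is eventually reached. The function names are assumptions, and the second relation R(A, B, C) with A -> B and B -> C is a hypothetical counter-example that does not appear in these notes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if `fd` follows from the dependencies that hold inside the
    individual relations of `decomposition`."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            relation = set(relation)
            # Attributes derivable from what we have, staying inside this relation.
            gained = closure(result & relation, fds) & relation
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


# Example from the text: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
fds_text = [("A", "BC")]
text_example = all(is_preserved(fd, ["ABC", "AD"], fds_text) for fd in fds_text)

# Hypothetical counter-example: R(A, B, C), A -> B, B -> C, split into (AB) and (AC).
fds_other = [("A", "B"), ("B", "C")]
other_example = [is_preserved(fd, ["AB", "AC"], fds_other) for fd in fds_other]

print(text_example)
print(other_example)
```

The decomposition from the text preserves A -> BC. In the counter-example, A -> B survives in the first table, but B -> C cannot be checked in either table, so the split is not dependency preserving.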

Decomposition Algorithms:

Here, we look at the decomposition algorithms that use functional
dependencies for two different normal forms:

o Decomposition to BCNF
o Decomposition to 3NF

Decomposition using functional dependencies aims at dependency
preservation and lossless decomposition.

Decomposition to BCNF:

Before applying the BCNF decomposition algorithm to the given relation, it is
necessary to test whether the relation is in Boyce-Codd Normal Form. If the
test shows that the given relation is not in BCNF, we can decompose it
further to create relations in BCNF.

The following cases are used to test whether the given relation schema R
satisfies the BCNF rule:

Case 1: To check whether a non-trivial dependency α -> β violates the BCNF
rule, compute α+, i.e., the attribute closure of α, and verify that α+
includes all the attributes of the given relation R. If it does, α is a
super key of relation R and the dependency does not violate BCNF.

Case 2: To test whether the given relation R is in BCNF, it is not necessary
to test all the dependencies in F+. It is enough to check only the
dependencies in the provided dependency set F, because if no dependency in F
causes a violation of BCNF, then none of the dependencies in F+ will cause a
violation either. This shortcut applies to R itself; for a relation
produced by decomposing R, the dependencies implied by F on its attributes
must also be considered.
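The two cases above, together with the usual split step, can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are made up, the decomposition routine finds violations inside sub-relations by taking closures of attribute subsets (so it is exponential and only suitable for small examples), and it makes no attempt to preserve dependencies. The input is the relation R(A, B, C, D, E) with BC -> D, AC -> BE, B -> E used in Example 2 of the BCNF section.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Cases 1 and 2: the non-trivial dependencies in F whose left side
    is not a super key of `relation`."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not relation <= closure(lhs, fds)
    ]


def bcnf_decompose(relation, fds):
    """Split `relation` until no attribute subset violates BCNF."""
    relation = frozenset(relation)
    for size in range(1, len(relation)):
        for subset in combinations(sorted(relation), size):
            alpha = set(subset)
            determined = closure(alpha, fds) & relation
            # alpha -> (determined - alpha) is non-trivial and alpha is not a super key.
            if determined != alpha and determined != relation:
                left = determined
                right = alpha | (relation - determined)
                return bcnf_decompose(left, fds) + bcnf_decompose(right, fds)
    return [relation]


fds = [("BC", "D"), ("AC", "BE"), ("B", "E")]
violations = bcnf_violations("ABCDE", fds)
parts = bcnf_decompose("ABCDE", fds)

print(violations)
print(sorted("".join(sorted(part)) for part in parts))
```

For this input the test flags BC -> D and B -> E, and the split yields the relations (B, E), (B, C, D), and (A, B, C). Each split keeps the violating left side in both halves, which is what makes each step lossless.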
Normalization is a systematic approach of decomposing tables to eliminate
data redundancy (repetition) and undesirable characteristics like Insertion,
Update, and Deletion Anomalies. It is a multi-step process that puts data into
tabular form, removing duplicated data from the relation tables.

Normalization is achieved through a series of stages called normal forms.
Each normal form (NF) plays a part in eliminating redundancy and
ensuring data integrity and consistency. The most commonly achieved forms
are the First Normal Form (1NF), Second Normal Form (2NF), and Third
Normal Form (3NF).

Let's go through an example to illustrate the decomposition of a database
table all the way to the Third Normal Form (3NF).

Example:

Consider a university database where we have a table that stores information
about courses and the professors teaching them, along with the department
details. Let's name this table CourseDetails.

CourseDetails table:

CourseID CourseName ProfID ProfName ProfEmail DeptName

CSE101 Database Systems P101 John Doe [email protected] CSE

CSE102 Algorithms P102 Jane Smith [email protected] CSE

MAT101 Linear Algebra P103 Alice Johnson [email protected] MATH

PHY101 Quantum Physics P104 Bob Brown [email protected] PHYSICS

Step 1: Convert to 1NF

A table is in the First Normal Form (1NF) if it contains only atomic values and
each row can be identified uniquely.

The CourseDetails table is already in 1NF because:

All columns contain atomic (indivisible) values.

Each row is unique.

Step 2: Convert to 2NF

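A table is in the Second Normal Form (2NF) if it is in 1NF and all non-key
attributes are fully functionally dependent on the primary key, i.e., no
non-key attribute depends on only part of a composite key.

Because the primary key here (CourseID) is a single attribute, no partial
dependency is possible, so CourseDetails already satisfies 2NF in the strict
sense. We can notice, however, that ProfName, ProfEmail, and DeptName are
determined by ProfID rather than directly by CourseID. As a first step, we
move the professor's name and email into a separate table: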

Courses table:

CourseID CourseName ProfID DeptName

CSE101 Database Systems P101 CSE

CSE102 Algorithms P102 CSE

MAT101 Linear Algebra P103 MATH

PHY101 Quantum Physics P104 PHYSICS

Professors table:

ProfID ProfName ProfEmail

P101 John Doe [email protected]

P102 Jane Smith [email protected]

P103 Alice Johnson [email protected]

P104 Bob Brown [email protected]
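
The split above can be reproduced by projecting the original rows onto each new set of columns, and a natural join on ProfID should then give back exactly the original table. The following is a minimal Python sketch of that check; the helper names (`project`, `natural_join`, `as_set`) are hypothetical, and the email values are kept as the placeholder text shown in the tables.

```python
COLUMNS = ("CourseID", "CourseName", "ProfID", "ProfName", "ProfEmail", "DeptName")
EMAIL = "[email protected]"  # placeholder exactly as it appears in the tables

course_details = [dict(zip(COLUMNS, values)) for values in [
    ("CSE101", "Database Systems", "P101", "John Doe", EMAIL, "CSE"),
    ("CSE102", "Algorithms", "P102", "Jane Smith", EMAIL, "CSE"),
    ("MAT101", "Linear Algebra", "P103", "Alice Johnson", EMAIL, "MATH"),
    ("PHY101", "Quantum Physics", "P104", "Bob Brown", EMAIL, "PHYSICS"),
]]


def project(rows, columns):
    """Keep only the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Combine rows that agree on every column name the two tables share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-independent view of a table, used only for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


courses = project(course_details, ["CourseID", "CourseName", "ProfID", "DeptName"])
professors = project(course_details, ["ProfID", "ProfName", "ProfEmail"])
rejoined = natural_join(courses, professors)
print(as_set(rejoined) == as_set(course_details))  # True
```

Because ProfID identifies exactly one row in the Professors table, the join neither loses rows nor invents new ones.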

Step 3: Convert to 3NF

A table is in the Third Normal Form (3NF) if it is in 2NF and no non-key
attribute is transitively dependent on the primary key, i.e., every non-key
attribute depends directly on the key.

In the Courses table, DeptName is transitively dependent on CourseID
through ProfID (CourseID → ProfID and ProfID → DeptName). To remove this
transitive dependency and achieve 3NF, we can further decompose the Courses
table into two tables:

Courses table (revised):

CourseID CourseName ProfID

CSE101 Database Systems P101

CSE102 Algorithms P102

MAT101 Linear Algebra P103

PHY101 Quantum Physics P104

Departments table:

DeptName ProfID

CSE P101

CSE P102

MATH P103

PHYSICS P104

After decomposition, we now have three tables in 3NF:

Courses with a primary key (CourseID) and no transitive dependencies.

Professors with a primary key (ProfID) and no transitive dependencies.

Departments with a primary key (ProfID), assuming a professor can belong to
only one department, which is a simplification. This table could be
normalized further by introducing a unique department identifier.

This decomposition eliminates data redundancy and update anomalies, and it
ensures data integrity within our university database.
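
The 3NF condition can also be checked mechanically. The sketch below assumes the dependencies implied by the example (CourseID → CourseName, ProfID and ProfID → ProfName, ProfEmail, DeptName) and uses hypothetical helper names. It is a simplification: it tests only the listed dependencies, restricted to the attributes of each table, which is enough for this small example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Find all minimal attribute sets whose closure covers the relation."""
    relation = set(relation)
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if relation <= closure(candidate, fds):
                keys.append(candidate)
    return keys


def is_3nf(relation, fds):
    """Each X -> A must have X as a superkey or A as a prime attribute."""
    relation = set(relation)
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        lhs = set(lhs)
        if not lhs <= relation:
            continue  # dependency does not apply to this table
        lhs_is_superkey = relation <= closure(lhs, fds)
        for attr in (set(rhs) & relation) - lhs:
            if not lhs_is_superkey and attr not in prime:
                return False
    return True


# Dependencies assumed from the example.
F = [({"CourseID"}, {"CourseName", "ProfID"}),
     ({"ProfID"}, {"ProfName", "ProfEmail", "DeptName"})]

original = {"CourseID", "CourseName", "ProfID", "ProfName", "ProfEmail", "DeptName"}
print(is_3nf(original, F))                                # False
print(is_3nf({"CourseID", "CourseName", "ProfID"}, F))    # True
print(is_3nf({"ProfID", "ProfName", "ProfEmail"}, F))     # True
print(is_3nf({"DeptName", "ProfID"}, F))                  # True
```

The original table fails because ProfID is not a superkey and ProfName is not part of any key, while each of the three decomposed tables passes.
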
Boyce-Codd Normal Form (BCNF) is a stricter version of the Third Normal
Form (3NF) that resolves more subtle redundancy and dependency issues that
3NF does not address. A table is in BCNF if, for every one of its
non-trivial functional dependencies (X → Y), X is a superkey, that is, a set
of attributes that uniquely identifies a row in the table.

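The primary difference between 3NF and BCNF is how they handle functional
dependencies involving candidate keys. BCNF requires that for every
non-trivial functional dependency, the left-hand side must be a superkey.
3NF is less strict: it also allows a dependency whose left-hand side is not
a superkey, as long as the right-hand side consists only of prime attributes
(attributes that are part of some candidate key).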

Example:

Let’s consider a university database example to understand the
decomposition to BCNF. Suppose we have the following table,
CourseAssignments, which records which professor is teaching which course
during which semester, and we assume a professor can teach only one course
in a semester.

CourseAssignments table:

CourseID Semester ProfID ProfName

CSE101 Fall 2023 P101 John Doe

CSE102 Spring 2024 P102 Jane Smith

MAT101 Fall 2023 P103 Alice Johnson

PHY101 Spring 2024 P104 Bob Brown

In this design, let’s identify the functional dependencies:

CourseID, Semester → ProfID, ProfName

ProfID → ProfName
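
The dependencies listed above can be tested against the sample rows. The sketch below uses a hypothetical helper, `fd_holds`, which reports whether any two rows agree on the left-hand side but differ on the right-hand side. Note that sample data can only disprove a dependency; whether a dependency really holds is decided by the rules of the application, not by four rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


course_assignments = [
    {"CourseID": "CSE101", "Semester": "Fall 2023", "ProfID": "P101", "ProfName": "John Doe"},
    {"CourseID": "CSE102", "Semester": "Spring 2024", "ProfID": "P102", "ProfName": "Jane Smith"},
    {"CourseID": "MAT101", "Semester": "Fall 2023", "ProfID": "P103", "ProfName": "Alice Johnson"},
    {"CourseID": "PHY101", "Semester": "Spring 2024", "ProfID": "P104", "ProfName": "Bob Brown"},
]

print(fd_holds(course_assignments, ["CourseID", "Semester"], ["ProfID", "ProfName"]))  # True
print(fd_holds(course_assignments, ["ProfID"], ["ProfName"]))                          # True
print(fd_holds(course_assignments, ["Semester"], ["ProfID"]))                          # False
```

The last call fails because two different professors teach in Fall 2023, so Semester alone does not determine ProfID.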

The composite key for this table is (CourseID, Semester) because it uniquely
identifies each record. However, we notice a dependency that ProfID
determines ProfName, which violates the BCNF rule since ProfID is not a
superkey for this table.
Step 1: Decomposition into BCNF

To decompose this table into BCNF, we need to ensure that for all functional
dependencies, the left-hand side is a superkey. Given the violation noted, we
can decompose CourseAssignments into two tables to satisfy BCNF:

CourseOfferings table:

CourseID Semester ProfID

CSE101 Fall 2023 P101

CSE102 Spring 2024 P102

MAT101 Fall 2023 P103

PHY101 Spring 2024 P104

Professors table:

ProfID ProfName

P101 John Doe

P102 Jane Smith

P103 Alice Johnson

P104 Bob Brown

In this decomposition:

The CourseOfferings table has a composite key (CourseID, Semester), which
is a superkey, thus satisfying the BCNF condition.

The Professors table has ProfID as its primary key, which uniquely identifies
ProfName, also satisfying BCNF.
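
The split used here follows the usual BCNF decomposition step: for a violating dependency X → Y, one relation receives X together with Y, and the other keeps everything except the attributes of Y that are not in X. The following is a minimal Python sketch of that single step; the function name `decompose_on` is hypothetical.

```python
def decompose_on(relation, lhs, rhs):
    """Split a relation on a violating dependency lhs -> rhs."""
    relation, lhs, rhs = set(relation), set(lhs), set(rhs)
    r1 = lhs | rhs                 # the dependency gets its own table
    r2 = relation - (rhs - lhs)    # the rest keeps lhs as a link to r1
    return r1, r2


course_assignments = {"CourseID", "Semester", "ProfID", "ProfName"}
professors, course_offerings = decompose_on(
    course_assignments, {"ProfID"}, {"ProfName"}
)
print(sorted(professors))        # ['ProfID', 'ProfName']
print(sorted(course_offerings))  # ['CourseID', 'ProfID', 'Semester']
```

ProfID appears in both results, which is what allows the two tables to be joined back together without loss.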

Step 2: Verification of BCNF

After decomposition, we should verify that each table adheres to BCNF:

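CourseOfferings: For each non-trivial functional dependency, the left-hand
side is a superkey in the context of this table. (CourseID, Semester) is the
key, and under the assumption that a professor teaches only one course in a
semester, (ProfID, Semester) is also a candidate key. ProfID alone is not a
superkey here, but no dependency in this table has ProfID alone on its
left-hand side.

Professors: The only non-trivial functional dependency is ProfID →
ProfName, where ProfID is a superkey.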

This decomposition resolves the anomaly and redundancy issue while
maintaining the integrity and consistency of the database. Each table in BCNF
ensures minimal redundancy and eliminates update anomalies, making the
database design more robust and efficient for operations like insertions,
deletions, and updates.
