
DBMS - Module 3

The document discusses relational database design, focusing on normalization, functional dependencies, and the identification of anomalies such as insertion, deletion, and update anomalies. It outlines the importance of creating good relation schemas, the process of decomposition, and the various normal forms (1NF, 2NF, 3NF) to minimize redundancy and maintain data integrity. Additionally, it highlights the significance of candidate keys and the implications of having multiple candidate keys in a relational database.

Uploaded by

vspranav88
Relational Database

Design

Module III - DBMS


Knowledge is not free…
You have to pay
attention!
● Relational database design
● Anomalies in a Database
● Normalization Theory
● Functional Dependencies
● First, Second and Third Normal Forms
● Relations with more than one Candidate Key
● Good and Bad Decompositions
● Boyce Codd Normal Form
● Multivalued Dependencies and Fourth Normal Form
● Join Dependencies and Fifth Normal Form.
Relational Database Design
● What is relational database design?
○ The grouping of attributes to form "good" relation
schemas
● Two levels of relation schemas
○ The logical "user view" level
○ The storage "base relation" level
● Design is concerned mainly with base relations
● What are the criteria for "good" base relations?
○ Semantics of relational attributes
○ Redundant Information in Tuples and Anomalies
○ Null Values in Tuples
○ Spurious Tuples
Semantics of the Relation Attributes
● Informally, each tuple in a relation should represent one
entity or relationship instance. (Applies to individual
relations and their attributes).
○ Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in the
same relation.
○ Only foreign keys should be used to refer to other
entities.
○ Entity and relationship attributes should be kept apart as
much as possible.
● Design a schema that can be explained easily relation by
relation. The semantics of attributes should be easy to
interpret.
Semantics of the Relation Attributes

A simplified COMPANY
relational database schema
Redundant Information in Tuples and Anomalies
● Mixing attributes of multiple entities may cause problems.
● Information is stored redundantly, wasting storage.
● Problems with anomalies
○ Insertion anomalies - adding a new row causes an inconsistency, or a
fact cannot be stored at all until some unrelated data is available.
○ Deletion anomalies - deleting some rows from a table also removes
other information that is still needed.
○ Update anomalies - a change to one fact has to be applied to every
row that repeats it. If any row is missed, the table holds two
different values for the same fact and becomes inconsistent.
Redundant Information in Tuples and Anomalies
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)
● Update Anomaly: Changing the name of project number P1 from
"Billing" to "Customer-Accounting" requires the update to be made in
the rows of all 100 employees working on project P1.
● Insert Anomaly: Cannot insert a project unless an employee is
assigned to it.
Conversely, cannot insert an employee unless he/she is assigned to a
project.
● Delete Anomaly: When a project is deleted, all the employees who
work on that project are deleted as well.
Conversely, if an employee is the sole employee on a project,
deleting that employee also deletes the corresponding project.
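The update and delete anomalies above can be reproduced with a few in-memory rows. This is only a minimal sketch: the sample rows (E1, Asha, Payroll and so on) and the helper names are made up for illustration and do not come from the slides.

```python
# EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) as a list of tuples.
# All sample values are hypothetical.
emp_proj = [
    ("E1", "P1", "Asha", "Billing", 10),
    ("E2", "P1", "Ravi", "Billing", 20),
    ("E3", "P2", "Mina", "Payroll", 15),
]

def rename_project_partially(rows, proj, new_name, limit):
    """Rename a project in only the first `limit` matching rows,
    imitating an update that misses some rows."""
    out, changed = [], 0
    for emp, p, ename, pname, hours in rows:
        if p == proj and changed < limit:
            pname = new_name
            changed += 1
        out.append((emp, p, ename, pname, hours))
    return out

def project_names(rows):
    """Map each project number to the set of names stored for it."""
    names = {}
    for _, p, _, pname, _ in rows:
        names.setdefault(p, set()).add(pname)
    return names

# Update anomaly: one of the two P1 rows is missed.
broken = rename_project_partially(emp_proj, "P1", "Customer-Accounting", limit=1)
inconsistent = {p for p, n in project_names(broken).items() if len(n) > 1}
print(inconsistent)  # {'P1'}

# Delete anomaly: removing the only employee on P2 also removes P2 itself.
after_delete = [r for r in emp_proj if r[0] != "E3"]
print("P2" in project_names(after_delete))  # False
```

Storing project names in a separate relation keyed by Proj# would leave exactly one row to update and would keep P2 even when nobody works on it.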
Anomalies Example
● Assume a manufacturing company stores employee details in a
table called Employee having four attributes:
Anomalies Example
● Insert anomaly - If inserting a new row creates an inconsistency in the
table, or the row cannot be inserted at all because some unrelated data is
missing, it is called an insertion anomaly.
Example: Assume that a new employee is joining the company under
training and is not yet assigned to any department. Then we cannot
insert the employee's data into the table if the emp_dept field doesn't
allow nulls.
● Delete anomaly - If deleting some rows from the table also deletes
other information that is still required, it is called a deletion
anomaly.
Example: Assume that the company closes the department D890.
Deleting the rows that have emp_dept as D890 would also delete the
information of employee Maggie, since she is assigned only to this
department.
Anomalies Example
● Update anomaly - If updating some rows in the table leaves the table
inconsistent, it is called an update anomaly.
Example: In the given table, we have two rows for an employee
named Rick, because he belongs to two different departments of the
company. If we need to update Rick's address, we must update the
same address in both rows. If the address is corrected in the row for
one department but not in the other, then according to the database
Rick has two different addresses, which is incorrect and leaves the
data inconsistent.
● Design a schema that does not suffer from insertion, deletion
and update anomalies. If any are present, note them so that
applications can be made to take them into account.
Null Values in Tuples
● Relations should be designed such that their tuples will have
as few NULL values as possible.
● Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
● Reasons for nulls:
○ attribute not applicable or invalid
○ attribute value unknown (may exist)
○ value known to exist, but unavailable
Spurious Tuples
● Bad designs for a relational database may result in
erroneous results for certain JOIN operations.
● The "lossless join" property is used to guarantee meaningful
results for join operations.
● The relations should be designed to satisfy the lossless join
condition.
● No spurious tuples should be generated by doing a
natural-join of any relations.
DECOMPOSITION
● Decomposition is the process of dividing a single relation into
multiple sub-relations.
● Its main purpose is to split a relation that mixes several kinds of
facts into smaller relations, each describing one entity or relationship.
● It reduces the anomalies and redundancy in the database by breaking
a relation up into several tables.
● There are two important properties of decompositions:
○ non-additivity (losslessness) of the corresponding join
○ preservation of the functional dependencies.
LOSSY DECOMPOSITION
● In a lossy decomposition, the relation is decomposed into two or more
relation schemas in such a way that joining them does not give back
the original relation. The join produces spurious tuples, so the
information about which tuples were originally present is lost.
LOSSLESS DECOMPOSITION
● A decomposition is lossless if relation R can be reconstructed from
the decomposed tables using joins. This is the preferred choice: no
information is lost when the relation is decomposed, and the join
returns exactly the original relation.
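The difference between a lossless and a lossy decomposition can be checked by projecting a relation and joining the projections back. The sketch below assumes a hypothetical relation with attributes emp_id, dept and location in which dept determines location; the helper names are illustrative, not part of any library.

```python
def relation(schema, tuples):
    """Build a relation as a set of rows; each row is a frozenset
    of (attribute, value) pairs."""
    return {frozenset(zip(schema, t)) for t in tuples}

def project(rel, attrs):
    """Projection onto `attrs`; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(r1, r2):
    """Natural join: combine rows that agree on every shared attribute."""
    joined = set()
    for x in r1:
        for y in r2:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                joined.add(frozenset({**dx, **dy}.items()))
    return joined

# Hypothetical relation in which dept -> location holds.
r = relation(("emp_id", "dept", "location"),
             [(1, "HR", "Kochi"), (2, "IT", "Kochi")])

# Split on dept (a determinant): the join rebuilds r exactly.
lossless = natural_join(project(r, {"emp_id", "dept"}),
                        project(r, {"dept", "location"}))

# Split on location (determines nothing): the join adds spurious tuples.
lossy = natural_join(project(r, {"emp_id", "location"}),
                     project(r, {"dept", "location"}))

print(lossless == r)   # True
print(len(lossy - r))  # 2
```

In the lossy case the join pairs every employee with every department in the same location, so the original assignment of employees to departments can no longer be recovered.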
DECOMPOSITION
Advantages of decomposition in DBMS
● Easier reuse - Small, single-purpose relations are easier for application programs
to query and reuse. This saves time and makes things more convenient for users.
● Finding mistakes - Errors are easier to locate when each relation describes only one
entity or relationship.
● Problem-solving approach - Decomposition breaks a complex design problem into
smaller parts that can be handled separately and then combined again through
joins.
● Eliminating errors - The biggest advantage of decomposition in DBMS is that it
largely eliminates inconsistencies and duplication. Each fact is stored once and is
therefore easy to identify.
DECOMPOSITION
Properties of decomposition in DBMS
● Attribute Preservation - Every attribute of the universal relation must appear in at least
one relation of the decomposition, so that no attribute is lost.
● Dependency Preservation - Every functional dependency of the original relation should
either appear directly in one of the decomposed relation schemas or be derivable from the
dependencies that do. Otherwise the dependency is lost and can be checked only by
joining the relations.
● No Redundancy - The decomposition should remove the problems of improper design,
such as redundancy, anomalies, and inconsistencies.
Issues of Decomposition
● Redundant Storage - The same information is stored in many places. This wastes space
in the system and can confuse programmers.
● Insertion Anomalies - It may not be possible to store some information unless some
other, unrelated information is stored as well.
● Deletion Anomalies - It may not be possible to delete some information without also
losing other, unrelated information.
Functional Dependencies
NORMALIZATION
● Normalization is a technique for producing a set of relations with desirable
properties, given the data requirements of an enterprise.
● The process of normalization is a formal method that identifies relations based
on their primary or candidate keys and the functional dependencies among
their attributes.
● Without normalization, it becomes difficult to handle and update the database
without facing data loss. Insertion, update and deletion anomalies are
frequent if the database is not normalized.
● Normalization divides a larger table into smaller tables and links them using
relationships.
● The normal forms are used to reduce redundancy in the database tables.
NORMALIZATION
Advantages of Normalization
● Normalization helps to minimize data redundancy.
● Greater overall database organization.
● Data consistency within the database.
● Much more flexible database design.
● Enforces the concept of relational integrity.
Disadvantages of Normalization
● We cannot start building the database before knowing what the user needs.
● The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
● It is very time-consuming and difficult to normalize relations of a higher degree.
● Careless decomposition may lead to a bad database design and serious
problems.
NORMALIZATION
NORMAL FORMS
1NF Definition:
It states that the domain of an attribute must
include only atomic (simple) values and that
the value of any attribute in a tuple must be a
single value from the domain of that attribute.

2NF Definition:
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is
fully functionally dependent on the primary
key of R.

3NF Definition:
A relation schema R is in third normal form
(3NF) if it satisfies 2NF and no non-prime
attribute of R is transitively dependent on the
primary key.
FIRST NORMAL FORM
● First Normal Form is a relation in which the intersection of each row and
column contains one and only one value.
● As per First Normal Form, a row must not contain a repeating group of
information: each column holds a single value, not a list or a set of values.
Each table should be organized into rows, and each row should have a primary
key that distinguishes it as unique.
● The primary key is usually a single column,
but sometimes more than one column can
be combined to create a single primary key.
For example, consider a table which is not in
First Normal Form - Student table:
FIRST NORMAL FORM
● In First Normal Form, no row may have a column in which more than
one value is stored, for example as a comma-separated list. Instead, such
data must be separated into multiple rows. The Student table following 1NF will be:

Using First Normal Form, data redundancy increases, as the same data is repeated
in many columns across multiple rows, but each row as a whole is unique.
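The conversion to 1NF described above can be sketched in a few lines. The Student rows below are hypothetical stand-ins for the table shown on the slide (only the student name Adam is taken from the text), and `to_1nf` is an illustrative helper name.

```python
# Hypothetical un-normalized Student rows: the Subject cell holds
# several comma-separated values, which violates 1NF.
students = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
]

def to_1nf(rows, column, sep=","):
    """Emit one row per value found in the multi-valued `column`."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            flat.append({**row, column: value.strip()})
    return flat

students_1nf = to_1nf(students, "Subject")
for row in students_1nf:
    print(row)
```

Adam now appears in two rows, one per subject, and his age is repeated in both. That repetition is the redundancy the paragraph above refers to.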
Full functional dependency
● Full functional dependency indicates that if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally dependent
on A, but not on any proper subset of A.
● A functional dependency A→B is a partial dependency if some attribute
can be removed from A and the dependency still holds.
Second Normal Form (2NF)
● A relation is in second normal form (2NF) if it is in first normal form and every
non-primary-key attribute is fully functionally dependent on the primary key.
● The normalization of 1NF relations to 2NF involves the removal of partial
dependencies. If a partial dependency exists, we remove the partially
dependent attributes from the relation by placing them in a new relation along
with a copy of their determinant.
● In the First Normal Form example there are two rows for Adam, to include the
multiple subjects that he has opted for. While this is searchable and follows
First Normal Form, it is an inefficient use of space. Also, in that table the
candidate key is {Student, Subject}, but the Age of a student depends only on
the Student column. This partial dependency violates Second Normal Form.
● To achieve second normal form, split the subjects out into an independent
table and match the two tables using the student names as foreign keys.
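Partial dependencies can be found mechanically by computing the closure of every proper subset of the key. The following is a minimal sketch that assumes the relation has a single candidate key, so every attribute outside the key is non-prime; the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Return (subset, attribute) pairs where a proper subset of `key`
    already determines a non-key attribute.
    Assumes `key` is the only candidate key of the relation."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) - set(key)):
                found.append((subset, attr))
    return found

# The Student example: key {Student, Subject}, and Student -> Age.
key = {"Student", "Subject"}
fds = [({"Student"}, {"Age"})]

violations = partial_dependencies(key, fds)
print(violations)  # [(('Student',), 'Age')]
```

An empty result would mean that the relation is in 2NF with respect to the given key and dependencies.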
Second Normal Form (2NF)

In the Subject table the candidate key will be {Student, Subject}. Now both of
the above tables qualify for Second Normal Form and no longer suffer from the
update anomalies caused by partial dependencies. There are a few more complex
cases in which a table in Second Normal Form still suffers from update
anomalies; Third Normal Form handles those cases.
Third Normal Form (3NF)
Transitive dependency
A condition where A, B, and C are attributes of a relation such that if A -> B and B -> C,
then C is transitively dependent on A via B (provided that A is not functionally
dependent on B or C).
Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key
attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies by placing the attribute(s) in a new relation along with a copy of the
determinant.
Third Normal Form requires that every non-prime attribute of a table depend directly
on the primary key; in other words, no non-prime attribute may be determined by
another non-prime attribute. Such transitive functional dependencies should be
removed from the table, and the table must also be in Second Normal Form.
The advantages of removing transitive dependencies are that the amount of data
duplication is reduced and data integrity is achieved.
Third Normal Form (3NF)
Student_Detail Table :
In this table Student_id is the primary key, but street, city and state depend upon Zip.
Because Student_id determines Zip and Zip determines the other fields, street, city and
state are transitively dependent on Student_id. Hence, to apply 3NF, we need to move
street, city and state to a new table, with Zip as the primary key.
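The move to 3NF described above can be sketched as splitting the table on its determinant. The rows, the extra name column and the helper `split_on_determinant` are hypothetical; the original Student_Detail table is not reproduced here.

```python
# Hypothetical Student_Detail rows; two students share one zip code.
student_detail = [
    {"student_id": 1, "name": "Adam", "zip": "682001",
     "street": "MG Road", "city": "Kochi", "state": "Kerala"},
    {"student_id": 2, "name": "Alex", "zip": "682001",
     "street": "MG Road", "city": "Kochi", "state": "Kerala"},
    {"student_id": 3, "name": "Stuart", "zip": "560001",
     "street": "Church Street", "city": "Bengaluru", "state": "Karnataka"},
]

def split_on_determinant(rows, determinant, dependents):
    """Move `dependents` into a lookup table keyed by `determinant`.
    Raises ValueError if the determinant does not determine them."""
    lookup, remaining = {}, []
    for row in rows:
        key = row[determinant]
        values = {c: row[c] for c in dependents}
        if lookup.setdefault(key, values) != values:
            raise ValueError(f"{determinant}={key!r} maps to two different rows")
        remaining.append({c: v for c, v in row.items() if c not in dependents})
    return remaining, lookup

students, zips = split_on_determinant(
    student_detail, "zip", ["street", "city", "state"])

print(len(zips))    # 2
print(students[0])  # {'student_id': 1, 'name': 'Adam', 'zip': '682001'}
```

The address for zip 682001 is now stored once instead of once per student, and zip acts as a foreign key from the student table into the new table.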
Relations with more than one Candidate Key
In relational database design, a candidate key is a set of attributes (columns) that can
uniquely identify a tuple (row) in a relation (table). When a relation has more than one
candidate key, it means there are multiple ways to uniquely identify a tuple, and each
candidate key could be chosen as the primary key.

Example: Employee Table


emp_id email ssn name department
101 [email protected] 123-45-6789 Alice HR
102 [email protected] 987-65-4321 Bob IT
103 [email protected] 456-78-9123 Charlie Finance

Candidate Keys:
emp_id (Each employee has a unique ID)
email (Each employee has a unique email)
ssn (Each employee has a unique Social Security Number)
Superkey vs. Candidate Key
A superkey is any set of attributes that can uniquely identify a tuple (row) in a relation.
It may contain extra attributes that are not necessary for uniqueness.
A candidate key is a minimal superkey, meaning it contains only the necessary
attributes to uniquely identify a row (no extra attributes).
Example: Employee Table
emp_id email ssn name department
101 [email protected] 123-45-6789 Alice HR
102 [email protected] 987-65-4321 Bob IT
103 [email protected] 456-78-9123 Charlie Finance
Superkeys (Candidate Keys + Extra Attributes)
A superkey is any superset of a candidate key. Examples of superkeys include:
{ emp_id } (Candidate key, also a superkey)
{ email } (Candidate key, also a superkey)
{ ssn } (Candidate key, also a superkey)
{ emp_id, name } (Superkey, but not a candidate key because name is unnecessary)
{ email, department } (Superkey, but department is unnecessary)
{ ssn, email, emp_id } (Superkey, but contains extra attributes)
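Superkeys and candidate keys can be computed from functional dependencies using attribute closure. The sketch below assumes a set of FDs for the Employee table (emp_id determines every other column, and email and ssn each determine emp_id); these FDs are an assumption consistent with the example, not something stated on the slide.

```python
from itertools import combinations

ATTRS = {"emp_id", "email", "ssn", "name", "department"}

# Assumed FDs: each of emp_id, email and ssn identifies one employee.
FDS = [
    ({"emp_id"}, {"email", "ssn", "name", "department"}),
    ({"email"}, {"emp_id"}),
    ({"ssn"}, {"emp_id"}),
]

def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) >= all_attrs

def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by trying subsets in order of size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            if is_superkey(attrs, all_attrs, fds) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

keys = candidate_keys(ATTRS, FDS)
print(keys)                                           # [{'email'}, {'emp_id'}, {'ssn'}]
print(is_superkey({"emp_id", "name"}, ATTRS, FDS))    # True
print(is_superkey({"name", "department"}, ATTRS, FDS))  # False
```

The brute-force search over subsets is exponential in the number of attributes, which is acceptable for classroom-sized schemas.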
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial
FD X -> Y holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one:
● Every 2NF relation is in 1NF
● Every 3NF relation is in 2NF
● Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF.
Equivalently, a table is in BCNF if it is in 3NF and, for every nontrivial functional
dependency X -> Y, the left-hand side X is a superkey of the table.
Example: Let's assume there is a company where employees work in more than
one department.
EMPLOYEE table:
In the above table the FDs are as follows:
1. EMP_ID -> EMP_COUNTRY
2. EMP_DEPT -> {DEPT_TYPE, EMP_DEPT_NO}
BCNF (Boyce-Codd Normal Form)

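The candidate key of the EMPLOYEE table is {EMP_ID, EMP_DEPT}. Neither EMP_ID
nor EMP_DEPT alone is a superkey, so both FDs violate BCNF and the table is
decomposed into three tables.
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both functional dependencies
is a key of its table.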
BCNF (Boyce-Codd Normal Form)
a. 3 NF relation converted into BCNF.

b. Relation in 3NF, but not in BCNF


BCNF (Boyce-Codd Normal Form)
A relation TEACH that is in 3NF but not in BCNF

Two FDs exist in the relation TEACH:


fd1: { student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. In fd2 the left-hand side, instructor, is not a
superkey, so the relation is not in BCNF. It is still in 3NF, because the right-hand side, course, is a
prime attribute.
A relation that is not in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of all functional dependencies in the decomposed relations.
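The claim that TEACH is in 3NF but not in BCNF can be checked with attribute closure. This is a sketch under the assumption that fd1 and fd2 are the only dependencies; the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by trying subsets in order of size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            if closure(attrs, fds) >= all_attrs and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def bcnf_violations(all_attrs, fds):
    """Nontrivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= all_attrs]

def third_nf_violations(all_attrs, fds):
    """BCNF violations whose right-hand side has a non-prime attribute."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(all_attrs, fds)
            if not (rhs - lhs) <= prime]

TEACH = {"student", "course", "instructor"}
FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

print(bcnf_violations(TEACH, FDS))      # [({'instructor'}, {'course'})]
print(third_nf_violations(TEACH, FDS))  # []
```

The search also finds a second candidate key, {instructor, student}, which is why all three attributes of TEACH are prime.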
BCNF (Boyce-Codd Normal Form)
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH
1. {student, instructor} and {student, course}
2. {course, instructor } and {course, student}
3. {instructor, course } and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing
functional dependency preservation, but we cannot sacrifice the non-additivity
property after decomposition.
Out of the above three, only the 3rd decomposition does not generate spurious
tuples after the join (and hence has the non-additivity property).
This can be verified with the test for binary decompositions (decompositions into
two relations): a decomposition of R into R1 and R2 is nonadditive (lossless) if
(R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) holds. In the third decomposition
the common attribute is instructor, and instructor -> course holds by fd2, so the
property is met.
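The binary test can be applied to the three decompositions of TEACH with a short script. This sketch checks whether the common attributes functionally determine one side of the split, using attribute closure; `is_lossless_binary` is an illustrative name.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

results = [is_lossless_binary(r1, r2, FDS) for r1, r2 in decompositions]
print(results)  # [False, False, True]
```

Only the third decomposition passes, because instructor is the only shared attribute that determines anything else.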
Multivalued Dependencies and Fourth Normal Form
● A relation in 4NF has no nontrivial multivalued dependencies other than those
whose left-hand side is a superkey. It also satisfies the preceding normal forms.
● For a relation to be in 4NF, the key conditions to be satisfied include:
○ The relation is in Boyce-Codd Normal Form (BCNF)
○ There is no nontrivial multivalued dependency whose left-hand side is not a
superkey

(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

(b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and
EMP_DEPENDENTS.
Multivalued Dependencies and Fourth Normal Form
A relation schema R is in 4NF with respect to a set of dependencies F (that includes
functional dependencies and multivalued dependencies) if, for every nontrivial
multivalued dependency X ->> Y in F+, X is a superkey for R.
Note: The set of all dependencies that include F as well as all dependencies that can be
inferred from F is called the closure of F; it is denoted by F+.
Decomposing a relation state of EMP that is not in 4NF:
(a) EMP relation with additional tuples.
(b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Lossless (Non-additive) Join
Decomposition into 4NF Relations:
PROPERTY LJ1'
The relation schemas R1 and R2 form a lossless (nonadditive) join decomposition
of R with respect to a set F of functional and multivalued dependencies if and only
if (R1 ∩ R2) ->> (R1 - R2) or, by symmetry, if and only if (R1 ∩ R2) ->> (R2 - R1).
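A multivalued dependency and the matching lossless decomposition can be checked on a small relation state. The EMP tuples below are assumed sample values (one employee with two projects and two dependents), since the figure is not reproduced here; `holds_mvd` and `rejoin` are illustrative names.

```python
SCHEMA = ("ENAME", "PNAME", "DNAME")

# Assumed sample state: every project of Smith is paired with every dependent.
emp = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]

def holds_mvd(rows, schema, x, y):
    """Check X ->> Y on a relation state: for any two rows that agree on X,
    the row taking Y from the first and the rest from the second must exist."""
    z = [a for a in schema if a not in x and a not in y]

    def pick(row, attrs):
        return tuple(row[schema.index(a)] for a in attrs)

    present = {(pick(t, x), pick(t, y), pick(t, z)) for t in rows}
    return all((pick(t1, x), pick(t1, y), pick(t2, z)) in present
               for t1 in rows for t2 in rows
               if pick(t1, x) == pick(t2, x))

def rejoin(rows):
    """Project onto EMP_PROJECTS and EMP_DEPENDENTS, then join on ENAME."""
    emp_projects = {(e, p) for e, p, _ in rows}
    emp_dependents = {(e, d) for e, _, d in rows}
    return {(e, p, d) for e, p in emp_projects
            for e2, d in emp_dependents if e == e2}

print(holds_mvd(emp, SCHEMA, ["ENAME"], ["PNAME"]))  # True
print(rejoin(emp) == set(emp))                       # True

# Dropping one combination breaks the MVD, and the join is no longer exact.
broken = emp[:3]
print(holds_mvd(broken, SCHEMA, ["ENAME"], ["PNAME"]))  # False
print(rejoin(broken) == set(broken))                    # False
```

This matches Property LJ1': the decomposition on ENAME is lossless exactly when ENAME ->> PNAME holds in the relation state.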
Multivalued Dependencies and Fourth Normal Form
JOIN Dependencies and FIFTH Normal Form
● A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema
R, specifies a constraint on the states r of R.
● The constraint states that every legal state r of R should have a non-additive join
decomposition into R1, R2, ..., Rn; that is, for every such r we have * (πR1(r), πR2(r), ...,
πRn(r)) = r
Note: an MVD is a special case of a JD where n = 2.
● A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if
one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
● A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies if,
for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F),
every Ri is a superkey of R.
● For a relation to be in 5NF, it must be in 4NF and must have no nontrivial join
dependency in which some Ri is not a superkey. When a relation is decomposed
according to a join dependency that holds on it, the decomposition is lossless, so
5NF eliminates the redundancy caused by join dependencies while the original table
can still be rebuilt from the smaller relations.
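A join dependency can be tested on a relation state by projecting onto each component and joining the projections back. The sketch below uses a hypothetical SUPPLY state with assumed attribute names SNAME, PART and PROJ and made-up values; it is not the table from the slide.

```python
from functools import reduce

def relation(schema, tuples):
    """A relation as a set of rows; each row is a frozenset of
    (attribute, value) pairs."""
    return {frozenset(zip(schema, t)) for t in tuples}

def project(rel, attrs):
    """Projection onto `attrs`, with duplicates removed."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(r1, r2):
    """Combine rows that agree on every shared attribute."""
    joined = set()
    for x in r1:
        for y in r2:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                joined.add(frozenset({**dx, **dy}.items()))
    return joined

def satisfies_jd(rel, components):
    """True if joining the projections onto `components` gives back `rel`."""
    return reduce(natural_join, (project(rel, c) for c in components)) == rel

# Hypothetical SUPPLY state (supplier, part, project).
supply = relation(("SNAME", "PART", "PROJ"), [
    ("s1", "p1", "j2"),
    ("s1", "p2", "j1"),
    ("s2", "p1", "j1"),
    ("s1", "p1", "j1"),
])
components = [{"SNAME", "PART"}, {"SNAME", "PROJ"}, {"PART", "PROJ"}]

print(satisfies_jd(supply, components))      # True
print(satisfies_jd(supply, components[:2]))  # False
```

In this state the three-way decomposition is lossless even though the two-way split on SNAME is not, which is the situation 5NF is meant to cover.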
JOIN Dependencies and FIFTH Normal Form
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
REVISION QUESTIONS
1. Identify and discuss each of the indicated
dependencies in the dependency diagram.

2. To keep track of students and courses, a new college uses the table structure.
Draw the dependency diagram for this table.

3. Using the dependency diagram you just drew, show the tables (in their third normal
form) you would create to fix the problems you encountered. Draw the dependency
diagram for the fixed table.
REVISION QUESTIONS
4. An agency called Instant Cover supplies part-time/temporary staff to hotels in
Scotland. Figure lists the time spent by agency staff working at various hotels. The
national insurance number (NIN) is unique for every member of staff. Use Figure to
answer questions (a) and (b).

a. This table is susceptible to update anomalies. Provide examples of insertion,
deletion and update anomalies.
b. Normalize this table to third normal form. State any assumptions.
REVISION QUESTIONS
2 Mark Questions:
1. Explain about Multivalued dependency.
2. Write the need of normalization.
3. What are good and bad decompositions?
4. What is meant by functional dependencies?
5. Explain the guidelines to design a good database.
6. Given relation R with attributes A, B, C, D, E, F and set of FDs as
A-> BC, E-> CF, B ->E and CD->EF.
Find out closure {A, B}+ of the set of attributes.
7. List out the desirable properties of decomposition.
8. What is 4NF and 5NF?
REVISION QUESTIONS
5 Mark Questions:
1. Define 2NF and 5NF.
2. What is decomposition? Explain its purpose.
3. Explain relational database anomalies.
4. Write a note on Join dependency and 5NF.
5. Define 1NF and 4NF.
6. Define 3NF and BCNF.
7. Define decomposition and explain how it addresses the redundancy problem.
8. What are the problems caused by decomposition?
9. Should all relational databases be normalised? Justify your answer.
10 Mark Questions:
1. What is normalization? Explain 1NF, 2NF and 3NF with examples.
Thank You
Happy Learning!
