DBMS - Module 3
Design
A simplified COMPANY relational database schema
Redundant Information in Tuples and Anomalies
● Mixing attributes of multiple entities may cause problems.
● Information is stored redundantly wasting storage
● Problems with anomalies
○ Insertion anomalies - when a new row is added to a
table and it causes an inconsistency
○ Deletion anomalies - when we delete some rows from a
table and any necessary additional information or data
is also lost from the database.
○ Update anomalies - when the same fact is stored in several
rows, a change must be applied to every one of them.
If any row is missed, the table holds two conflicting
values for that fact.
Redundant Information in Tuples and Anomalies
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)
● Update Anomaly: Changing the name of project number P1 from
"Billing" to "Customer-Accounting" requires this update to
be made for all 100 employees working on project P1.
● Insert Anomaly: Cannot insert a project unless an employee is
assigned to it.
Conversely - Cannot insert an employee unless he/she is
assigned to a project.
● Delete Anomaly: When a project is deleted, it will result in
deleting all the employees who work on that project.
Alternately, if an employee is the sole employee on a project,
deleting that employee would result in deleting the
corresponding project.
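The update and delete anomalies above can be reproduced with a small sketch. The rows, the employee names and the helper functions `rename_project` and `project_names` are hypothetical, invented for illustration only; the relation layout follows EMP_PROJ.

```python
# Hypothetical sample rows for EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    {"emp": "E1", "proj": "P1", "ename": "Asha", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "ename": "Ravi", "pname": "Billing", "hours": 20},
    {"emp": "E3", "proj": "P2", "ename": "Mina", "pname": "Payroll", "hours": 15},
]

def rename_project(rows, proj, new_name, limit=None):
    """Rename a project in place; `limit` simulates an update that stops early."""
    changed = 0
    for row in rows:
        if row["proj"] == proj and (limit is None or changed < limit):
            row["pname"] = new_name
            changed += 1
    return changed

def project_names(rows, proj):
    """All distinct names stored for one project number."""
    return {row["pname"] for row in rows if row["proj"] == proj}

# Update anomaly: a partial update leaves P1 with two different names.
rename_project(emp_proj, "P1", "Customer-Accounting", limit=1)
names_after_partial_update = project_names(emp_proj, "P1")

# Delete anomaly: removing project P2 also removes the only record of E3.
remaining = [row for row in emp_proj if row["proj"] != "P2"]
employees_left = {row["emp"] for row in remaining}

print(names_after_partial_update, employees_left)
```

Because the project name is repeated in every row of the project, nothing in the table itself prevents the two names from coexisting.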
Anomalies Example
● Assume a manufacturing company stores employee details in a
table called Employee having four attributes:
Anomalies Example
● Insert anomaly - If there is a new row inserted in the table and it
creates the inconsistency in the table then it is called the insertion
anomaly.
Example: Assume that a new employee joins the company as a trainee
and is not yet assigned to any department. We cannot insert this
employee's data into the table if the emp_dept field doesn't allow
nulls.
● Delete anomaly - If we delete some rows from the table and if any
other information or data which is required is also deleted from
the database, this is called the deletion anomaly in the database.
Example: Assume that if the company closes the department D890,
then deleting the rows that have emp_dept as D890 would also
delete the information of employee Maggie since she is assigned
only to this department.
Anomalies Example
● Update anomaly - If updating some rows in the table leaves the
table inconsistent, this is called an update (or updation) anomaly.
Example: In the given table, we have two rows for an employee
named Rick, and he belongs to two different departments of the
company. If we need to update Rick's address, we must update the
same address in two rows. Otherwise, the data will become
inconsistent.
If the address is updated in the row for one department but not in
the row for the other, the database shows two different addresses
for Rick, and there is no way to tell which one is correct.
● Design a schema that does not suffer from the insertion, deletion
and update anomalies. If there are any present, then note them so
that applications can be made to take them into account.
Null Values in Tuples
● Relations should be designed such that their tuples will have
as few NULL values as possible.
● Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
● Reasons for nulls:
○ attribute not applicable or invalid
○ attribute value unknown (may exist)
○ value known to exist, but unavailable
Spurious Tuples
● Bad designs for a relational database may result in
erroneous results for certain JOIN operations.
● The "lossless join" property is used to guarantee meaningful
results for join operations.
● The relations should be designed to satisfy the lossless join
condition.
● No spurious tuples should be generated by doing a
natural-join of any relations.
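A minimal sketch of how spurious tuples arise, assuming a hypothetical relation `works_on(emp, proj, city)` that is split on a non-key attribute. The data and the helpers `project` and `natural_join` are illustrative and not part of the COMPANY schema.

```python
def project(rows, attrs):
    """Projection onto `attrs` with duplicate elimination (rows are dicts)."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result

def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                merged = {**l_row, **r_row}
                if merged not in result:
                    result.append(merged)
    return result

# Hypothetical data: who works on which project, and where.
works_on = [
    {"emp": "E1", "proj": "P1", "city": "Pune"},
    {"emp": "E2", "proj": "P2", "city": "Pune"},
]

# Bad design: the two parts share only `city`, which is a key of neither part.
emp_city = project(works_on, ["emp", "city"])
proj_city = project(works_on, ["proj", "city"])

joined = natural_join(emp_city, proj_city)
spurious = [row for row in joined if row not in works_on]
print(len(joined), "rows after join,", len(spurious), "spurious")
```

The join returns more rows than the original relation held; the extra rows pair employees with projects they never worked on.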
DECOMPOSITION
● Decomposition can be defined as a database management
system process for dividing a single relation into multiple
subrelations.
● Its main purpose is to split a relation that mixes several kinds
of information into smaller relations that each describe one thing.
● It eliminates the anomalies and redundancy from the database
by breaking it up into many different tables.
● There are two important properties of decompositions:
○ non-additive (lossless) property of the corresponding join
○ preservation of the functional dependencies.
LOSSY DECOMPOSITION
● In a Lossy Decomposition, the relation is decomposed into two or
more relational schemas in such a way that the original relation
cannot be retrieved by joining them. The join produces spurious
tuples, so the information about which tuples were really present
is lost.
LOSSLESS DECOMPOSITION
● Decomposition is lossless if it is feasible to reconstruct relation
R from the decomposed tables using joins. This is the preferred
choice. No information is lost from the relation when it is
decomposed: the join of the decomposed tables gives back exactly
the original relation.
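For a decomposition into exactly two relations R1 and R2, losslessness can be checked from the functional dependencies: the common attributes must determine all of R1 or all of R2. The sketch below applies this test to EMP_PROJ; the three FDs are assumptions read off the meaning of the attributes, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary test: the shared attributes must determine R1 or R2."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common

# Assumed FDs for EMP_PROJ(Emp, Proj, Ename, Pname, Hours).
fds = [
    ({"Emp"}, {"Ename"}),
    ({"Proj"}, {"Pname"}),
    ({"Emp", "Proj"}, {"Hours"}),
]

# Split off the employee name: the shared attribute Emp is a key of R1.
good = is_lossless_binary({"Emp", "Ename"},
                          {"Emp", "Proj", "Pname", "Hours"}, fds)

# Split on Pname: the shared attribute determines neither side.
bad = is_lossless_binary({"Emp", "Ename", "Pname"},
                         {"Proj", "Pname", "Hours"}, fds)
print(good, bad)
```

This test only covers two-way decompositions with functional dependencies; splitting into more relations needs a more general check.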
DECOMPOSITION
Advantages of decomposition in DBMS
● Easy use of Codes - Decomposition makes it easier for programs to copy and reuse
important code for other work in the DBMS. It not only saves a lot of time but
also makes things convenient for the users.
● Finding Mistakes - Another reason programmers opt for decomposition is that it
lets them complete complex programs conveniently. Mistakes are easier to find
with this sort of programming.
● Problem-Solving Approach - It is considered a good problem-solving strategy
with which complex computer programs can be written easily. The users can
join large amounts of code together precisely to get adequate results.
● Eliminating Errors - The biggest advantage of decomposition in a DBMS is that it
largely eliminates inconsistencies and duplication. Data is also easier to
identify once a relation has been decomposed.
DECOMPOSITION
Properties of decomposition in DBMS
● Attribute Preservation - Every attribute of the universal relation must appear in at least
one of the decomposed relations, so that no attribute is lost.
● Dependency Preservation - Every functional dependency should either appear directly in
one of the decomposed relation schemas or be inferable from the dependencies that do.
A dependency is lost if the decomposition does not preserve it.
● No Redundancy - Decomposition is used for removing the issues related to improper design,
such as redundancy, anomalies, and inconsistencies.
Issues of Decomposition
● Redundant Storage - The same information is stored in several places. This can confuse
the programmers and takes a lot of space in the system.
● Insertion Anomalies - It isn't possible to store certain details unless some other,
unrelated information is stored as well.
● Deletion Anomalies - It isn't possible to delete some details without also losing other
information.
Functional Dependencies
NORMALIZATION
● Normalization is a technique for producing a set of relations with desirable
properties, given the data requirements of an enterprise.
● The process of normalization is a formal method that identifies relations based
on their primary or candidate keys and the functional dependencies among
their attributes.
● Without normalization, it becomes difficult to handle and update the database
without facing data loss. Insertion, update and deletion anomalies are very
frequent if the database is not normalized.
● Normalization divides a larger table into smaller tables and links them using
relationships.
● Normal forms are used to reduce redundancy in the database tables.
NORMALIZATION
Advantages of Normalization
● Normalization helps to minimize data redundancy.
● Greater overall database organization.
● Data consistency within the database.
● Much more flexible database design.
● Enforces the concept of relational integrity.
Disadvantages of Normalization
● We cannot start building the database before knowing what the user needs.
● Query performance can degrade when relations are normalized to higher normal
forms, e.g., 4NF, 5NF, because more joins are needed to reassemble the data.
● It is very time-consuming and difficult to normalize relations of a higher degree.
● Careless decomposition may lead to a bad database design and serious
problems.
NORMALIZATION
NORMAL FORMS
1NF Definition:
It states that the domain of an attribute must
include only atomic (simple) values and that
the value of any attribute in a tuple must be a
single value from the domain of that attribute.
2NF Definition:
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is
fully functionally dependent on the primary
key of R.
3NF Definition:
A relation schema R is in third normal form
(3NF) if it satisfies 2NF and no non-prime
attribute of R is transitively dependent on the
primary key.
FIRST NORMAL FORM
● First Normal Form is a relation in which the intersection of each row and
column contains one and only one value.
● As per First Normal Form, no row may contain a repeating group of
information, i.e. each column must hold exactly one value in each row.
Each table should be organized into rows, and each row should have a
primary key that distinguishes it as unique.
● The Primary key is usually a single column,
but sometimes more than one column can
be combined to create a single primary key.
For example consider a table which is not in
First normal form. - Student table:
FIRST NORMAL FORM
● In First Normal Form, any row must not have a column in which more than
one value is saved, like separated with commas. Rather than that, we must
separate such data into multiple rows. Student Table following 1NF will be :
With First Normal Form, data redundancy increases, because the same data is
repeated in some columns across multiple rows, but each row as a whole is unique.
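The conversion to 1NF described above can be sketched as follows. The rows are hypothetical (the original Student table is not reproduced here), and `to_1nf` is an illustrative helper name.

```python
# Hypothetical unnormalized rows: `subject` holds a comma-separated list.
unnormalized = [
    {"student": "Adam", "age": 15, "subject": "Biology, Maths"},
    {"student": "Alex", "age": 14, "subject": "Maths"},
]

def to_1nf(rows, column):
    """Emit one row per value found in the multi-valued `column`."""
    flat = []
    for row in rows:
        for value in row[column].split(","):
            new_row = dict(row)          # copy the other columns unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat

student_1nf = to_1nf(unnormalized, "subject")
for row in student_1nf:
    print(row)
```

Adam now occupies two rows, and his age is stored twice, which is the redundancy that the later normal forms remove.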
Full functional dependency
● Full functional dependency indicates that if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally dependent
on A, but not on any proper subset of A.
● A functional dependency A→B is a partial dependency if some attribute
can be removed from A and the dependency still holds.
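A short sketch of the full versus partial distinction, using attribute closure over an assumed set of FDs for EMP_PROJ. The FDs and the function names are assumptions made for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs gives rhs."""
    if not rhs <= closure(lhs, fds):
        return False                      # the dependency does not hold at all
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(set(subset), fds):
                return False              # a smaller left side is enough: partial
    return True

# Assumed FDs for EMP_PROJ(Emp, Proj, Ename, Pname, Hours).
fds = [
    ({"Emp"}, {"Ename"}),
    ({"Proj"}, {"Pname"}),
    ({"Emp", "Proj"}, {"Hours"}),
]

hours_is_full = is_full_dependency({"Emp", "Proj"}, {"Hours"}, fds)
ename_is_full = is_full_dependency({"Emp", "Proj"}, {"Ename"}, fds)
print(hours_is_full, ename_is_full)
```

Hours needs both Emp and Proj, so it is fully dependent on the pair; Ename is determined by Emp alone, so its dependency on the pair is only partial.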
Second Normal Form (2NF)
● Second normal form (2NF) is a relation that is in first normal form and every
non-primary-key attribute is fully functionally dependent on the primary key.
● The normalization of 1NF relations to 2NF involves the removal of partial
dependencies. If a partial dependency exists, we remove the functionally
dependent attributes from the relation by placing them in a new relation along
with a copy of their determinant.
● In the example for First Normal Form there are two rows for Adam, to include
the multiple subjects that he has opted for. While this is searchable, and follows
First Normal Form, it is an inefficient use of space. Also, in that table the
candidate key is {Student, Subject}, but the Age of a student depends only on
the Student column. This is a partial dependency, which violates Second
Normal Form.
● To achieve second normal form, it would be helpful to split out the subjects
into an independent table, and match them up using the student names as
foreign keys.
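The split into two tables can be sketched with a simple projection. The rows are hypothetical stand-ins for the Student table in 1NF, and `project` is an illustrative helper.

```python
# Hypothetical Student table in 1NF; candidate key is (student, subject).
student_1nf = [
    {"student": "Adam", "age": 15, "subject": "Biology"},
    {"student": "Adam", "age": 15, "subject": "Maths"},
    {"student": "Alex", "age": 14, "subject": "Maths"},
]

def project(rows, attrs):
    """Projection onto `attrs` as tuples, with duplicates removed."""
    result = []
    for row in rows:
        picked = tuple(row[a] for a in attrs)
        if picked not in result:
            result.append(picked)
    return result

# Age depends only on student, so it moves to its own table.
students = project(student_1nf, ["student", "age"])
# The subject table keeps the student name as a foreign key.
subjects = project(student_1nf, ["student", "subject"])

print(students)
print(subjects)
```

Each student's age is now stored once, and the subject table refers back to the student by name.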
Second Normal Form (2NF)
Candidate Keys:
emp_id (Each employee has a unique ID)
email (Each employee has a unique email)
ssn (Each employee has a unique Social Security Number)
Superkey vs. Candidate Key
A superkey is any set of attributes that can uniquely identify a tuple (row) in a relation.
It may contain extra attributes that are not necessary for uniqueness.
A candidate key is a minimal superkey, meaning it contains only the necessary
attributes to uniquely identify a row (no extra attributes).
Example: Employee Table
emp_id   email               ssn           name      department
101      [email protected]   123-45-6789   Alice     HR
102      [email protected]   987-65-4321   Bob       IT
103      [email protected]   456-78-9123   Charlie   Finance
Superkeys (Candidate Keys + Extra Attributes)
A superkey is any superset of a candidate key. Examples of superkeys include:
{ emp_id } (Candidate key, also a superkey)
{ email } (Candidate key, also a superkey)
{ ssn } (Candidate key, also a superkey)
{ emp_id, name } (Superkey, but not a candidate key because name is unnecessary)
{ email, department } (Superkey, but department is unnecessary)
{ ssn, email, emp_id } (Superkey, but contains extra attributes)
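The difference between a superkey and a candidate key can be checked mechanically with attribute closure. The sketch below assumes that emp_id, email and ssn each determine every attribute of the Employee table (which is what "unique" implies); the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) >= all_attrs

def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by trying attribute sets smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            contains_smaller_key = any(key <= attrs for key in keys)
            if not contains_smaller_key and is_superkey(attrs, all_attrs, fds):
                keys.append(attrs)
    return keys

employee = {"emp_id", "email", "ssn", "name", "department"}
# Assumption: each unique column determines the whole row.
fds = [({col}, set(employee)) for col in ("emp_id", "email", "ssn")]

keys = candidate_keys(employee, fds)
print(sorted(sorted(k) for k in keys))
print(is_superkey({"emp_id", "name"}, employee, fds))
```

{emp_id, name} passes the superkey test but is not reported as a candidate key, because it contains the smaller key {emp_id}. The brute-force search is only practical for small relations.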
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> Y
holds in R, then X is a superkey of R
Each normal form is strictly stronger than the previous one
● Every 2NF relation is in 1NF
● Every 3NF relation is in 2NF
● Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
A table is in BCNF if, for every non-trivial functional dependency X -> Y, X is a superkey of the table.
Equivalently, the table must be in 3NF and the left-hand side of every such FD must be a superkey.
Example: Let's assume there is a company where employees work in more than
one department.
EMPLOYEE table:
In the above table FDs are as follows:
1. EMP_ID -> EMP_COUNTRY
2. EMP_DEPT -> {DEPT_TYPE, EMP_DEPT_NO}
BCNF (Boyce-Codd Normal Form)
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both
functional dependencies is a key of the table in which it holds.
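The BCNF condition for this example can be checked with attribute closure. The attribute set of the original EMPLOYEE table is an assumption inferred from the two FDs and the candidate keys listed above, and `bcnf_violations` is an illustrative helper that only inspects the listed FDs (a simplification of the full projected-dependency check).

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Listed FDs inside `relation` whose left side is not a superkey of it."""
    bad = []
    for lhs, rhs in fds:
        inside = lhs <= relation and rhs <= relation
        trivial = rhs <= lhs
        if inside and not trivial and not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad

fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

# Assumed attributes of the original EMPLOYEE table.
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

# The three tables of the decomposition.
first = {"EMP_ID", "EMP_COUNTRY"}
second = {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
third = {"EMP_ID", "EMP_DEPT"}

print(len(bcnf_violations(employee, fds)))                         # original table
print([bcnf_violations(t, fds) for t in (first, second, third)])  # decomposed tables
```

In the original table neither EMP_ID nor EMP_DEPT alone is a superkey, so both FDs violate BCNF; in the three smaller tables no violation remains.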
BCNF (Boyce-Codd Normal Form)
a. 3 NF relation converted into BCNF.
(a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>>
DNAME.
2. To keep track of students and courses, a new college uses the table structure.
Draw the dependency diagram for this table.
3. Using the dependency diagram you just drew, show the tables (in their third normal
form) you would create to fix the problems you encountered. Draw the dependency
diagram for the fixed table.
REVISION QUESTIONS
4. An agency called Instant Cover supplies part-time/temporary staff to hotels in
Scotland. Figure lists the time spent by agency staff working at various hotels. The
national insurance number (NIN) is unique for every member of staff. Use Figure to
answer questions (a) and (b).