CO3-Notes-Database Design and Normalization
Database Design
Design is the process of creating and planning the construction of a system or environment. It involves identifying the needs and constraints of the project, developing a concept or idea, and refining it through iteration until a final design is achieved.
In the context of databases, design involves creating a structured and organized approach to storing and managing data. The goal of database design is to create a database that is efficient, effective, and easy to use. Database systems are designed to manage large amounts of information, typically related to the operations of an organization or enterprise. The information stored in a database is often used to support the activities of the organization, whether for internal operations or to provide services to customers or clients.
Good database design helps organizations avoid problems such as redundant and inconsistent data. It also delivers benefits such as efficient data retrieval and manipulation, accurate and secure data, and easy maintenance and updates. Overall, good database design is essential for organizations that want to manage their data effectively and avoid the consequences of a bad design.
The following six steps are followed during the design process:
1. Requirements Analysis: The main goal of requirements analysis is to understand the needs and expectations of the stakeholders for the database system to be developed. It involves the following steps:
• Identify stakeholders: The first step is to identify the stakeholders who will be using the
database system.
• Gather requirements: Once the stakeholders are identified, the next step is to conduct
interviews or surveys to gather information about their requirements and expectations for the
system.
• Define requirements: The information gathered from the stakeholders can then be used to
create a list of requirements for the database system.
After the requirements are gathered, they are organized and represented using appropriate tools and given as input to the conceptual database design phase.
2. Conceptual Database Design: The requirement specifications are converted into an ER model or another similarly high-level conceptual data model. Conceptual design is the first of the three modelling stages (conceptual, logical, physical). The ER model provides a simple description of the data. It is a high-level view of the entire database that describes what the database should contain and how the data items relate to each other. This design is usually presented as an Entity-Relationship (ER) diagram, and the main focus is on the overall structure and the relationships between entities. Once the requirement specifications have been converted into an ER model, the model is given as input to the logical database design phase.
3. Logical Database Design Schema: The logical database design is an important stage of database
design. It focuses on converting the conceptual design into a detailed logical model that can be
implemented in a database management system (DBMS). The main focus is on defining the data
elements, their relationships, and the data constraints. This design is usually presented in the form of
tables, columns, and relationships.
In practice, this means that the ER diagrams are converted into actual relational database schemas, which are then given as input to the schema refinement phase.
4. Schema Refinement: A database designed from the E-R model may still suffer from some amount of:
• Inconsistency
• Uncertainty
• Redundancy
Schema refinement addresses these problems by applying four informal design guidelines:
• Guideline 1: Making sure that the semantics of the attributes are clear in the schema
• Guideline 2: Reducing the redundant information in tuples
• Guideline 3: Reducing the NULL values in tuples
• Guideline 4: Disallowing the possibility of generating spurious tuples
Guideline 1: Making sure that the semantics of the attributes are clear in the relations
The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in
a tuple. Design a relation schema so that it is easy to explain its meaning. Do not combine attributes
from multiple entity types and relationship types into a single relation. Attributes of different entities
(EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation. Only
foreign keys should be used to refer to other entities.
Guideline 2: Redundant Information in Tuples and Update Anomalies
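• Wastes storage: The way attributes are grouped into relation schemas has a significant effect on storage space. For example, storing the same department data in every employee tuple of EMP_DEPT repeats it many times.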
• Problems with update anomalies: Storing natural joins of base relations leads to an additional
problem referred to as update anomalies. These can be classified into insertion anomalies,
deletion anomalies, and modification anomalies.
• Insertion anomalies: To insert a new employee tuple into EMP_DEPT, we must include either
the attribute values for the department that the employee works for, or NULLs (if the
employee does not work for a department as yet). For example, to insert a new tuple for an
employee who works in department number 5, we must enter all the attribute values of
department 5 correctly so that they are consistent with the corresponding values for
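department 5 in other tuples in EMP_DEPT. It is also difficult to insert into EMP_DEPT a new department that has no employees yet. The only way to do so is to place NULLs in the employee attributes, including the primary key EmpId, which is not allowed.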
• Deletion anomalies: If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information concerning
that department is lost inadvertently from the database. This problem does not occur in a design where DEPARTMENT tuples are stored in a separate relation.
• Modification anomalies: In EMP_DEPT, if we change the value of one of the attributes of a
particular department—say, the manager of department 5—we must update the tuples of all
employees who work in that department; otherwise, the database will become inconsistent.
Design the base relation schemas so that no insertion, deletion, or modification anomalies are present
in the relations. If any anomalies are present, note them clearly and make sure that the programs that
update the database will operate correctly.
Guideline 3: NULL Values in Tuples
As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and not to a majority of tuples in the relation. NULLs are also problematic because they can have several interpretations, for example:
• The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S.
students.
• The attribute value for this tuple is unknown. For example, the Date_of_birth may be
unknown for an employee.
Guideline 4: Generation of Spurious Tuples
• Bad designs for a relational database may produce erroneous results (spurious tuples) for certain JOIN operations.
• The "lossless join" property is used to guarantee meaningful results for join operations.
Design relation schemas so that they can be joined with equality conditions on attributes that are
appropriately related (primary key, foreign key) pairs in a way that guarantees that no spurious tuples
are generated. Avoid relations that contain matching attributes that are not (foreign key, primary key)
combinations because joining on such attributes may produce spurious tuples.
5. Physical Database Design: Physical database design is the last of the three modelling stages (conceptual, logical, physical). It focuses on implementing the logical design in a specific database management system by defining the physical database schema. This includes defining the storage structures, access methods, indexes, and other physical parameters. The main focus is on how the database will be physically implemented on a specific platform.
6. Application and Security Design: When designing the applications and the security of a database management system (DBMS), several key considerations must be taken into account.
Functional Dependencies
A functional dependency (FD) is a relationship between or among attributes of a relation. Suppose that knowing a customer's account number always lets us find that customer's balance. We then say that the customer balance is functionally dependent on the customer account number:
AccountNo → Balance
As another example:
ISBN → Title
Let X and Y be two attributes (or sets of attributes) of a relation. If each value of X corresponds to only one value of Y, then Y is said to be functionally dependent on X. This is indicated by the notation:
X → Y
It means:
➢ Y is functionally dependent on X.
➢ X determines Y.
➢ X is called the determinant: the attributes on the left side of the arrow are called determinants.
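Consider the following relation instance:
A   B   C   D
a1  b1  c1  d1
a1  b2  c1  d2
a2  b2  c2  d2
a2  b3  c2  d3
a3  b4  c2  d4
In this instance, A → C, B → D, and D → B hold. A → B does not hold, because a1 appears with both b1 and b2. C → A does not hold either, because c2 appears with both a2 and a3.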
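Whether an FD is satisfied by a particular instance can be checked mechanically: any two rows that agree on the left side must also agree on the right side. The sketch below checks a few FDs against the table above. The function name `fd_holds` is chosen for illustration. An instance can only refute an FD. Whether an FD holds for the schema is a statement about all legal instances.

```python
# The relation instance above: each row maps attribute name -> value.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b4", "C": "c2", "D": "d4"},
]

def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}  # lhs value tuple -> first rhs value tuple seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # same lhs value, different rhs value
    return True

print(fd_holds(rows, ["A"], ["C"]))  # True
print(fd_holds(rows, ["A"], ["B"]))  # False: a1 pairs with b1 and b2
print(fd_holds(rows, ["B"], ["D"]))  # True
print(fd_holds(rows, ["C"], ["A"]))  # False: c2 pairs with a2 and a3
```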
The relation EMP_PROJ(EmpId, Pnumber, Hours, Ename, Pname, Plocation) has the following FDs:
1) EmpId → Ename
2) Pnumber → {Pname, Plocation}
3) {EmpId, Pnumber} → Hours
A determinant may also consist of several attributes:
{X, Z} → Y
This means that there is only one value of Y for each given combination of values of X and Z.
Armstrong's inference rules for FDs include:
1) Reflexivity or Reflexive Rule: If Y ⊆ X, then X → Y. This axiom says that a set of attributes always determines any of its own subsets.
2) Augmentation Rule: If X → Y, then XZ → YZ. We can augment the left side of an FD, or both sides, with one or more attributes, but the axiom does not allow augmenting the right side alone.
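In practice, inference rules like these are applied through the standard attribute-closure algorithm. The closure X+ is the set of all attributes that X determines, and X → Y can be derived from a set of FDs exactly when Y ⊆ X+. The minimal sketch below represents FDs as (lhs, rhs) pairs of Python sets. The function name `closure` is our own. The FDs used are those of the R(A, B, C, D, E) transitive-dependency example in these notes.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute-name sets.
    """
    result = set(attrs)  # reflexivity: X determines every subset of itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already inside X+, then X -> lhs, so X -> rhs
            # follows (derivable from Armstrong's rules).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"C"}, {"E"})]
print(sorted(closure({"A", "B"}, F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure({"B"}, F)))       # ['B', 'D']
```

Because the closure of AB contains every attribute of R, AB is a superkey of R.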
Normalization
The basic objective of normalization is to reduce redundancy, which means that information is stored only once. Storing information several times leads to insertion, update, and deletion anomalies. It also wastes storage space and increases the total size of the stored data.
Normalization can be considered a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
(1) minimizing redundancy, and
(2) minimizing the insertion, deletion, and update anomalies.
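For example, consider the Student_Course relation:
Student_Course(StudentNo, StudentName, Address, CourseNo, CourseName, Instructor)
Primary Key: (StudentNo, CourseNo)
Student details are repeated for every course a student takes, and course details are repeated for every student enrolled in that course. Normalization decomposes the relation into:
Student(StudentNo, StudentName, Address)
Course(CourseNo, CourseName, Instructor)
StudentCourse(StudentNo, CourseNo)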
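The redundancy that this decomposition removes is visible on sample data. The sketch below uses three made-up Student_Course rows (the names and values are hypothetical). It projects them onto the three smaller relations and then joins them back to confirm that no information is lost.

```python
# Hypothetical Student_Course rows, in this column order:
COLS = ["StudentNo", "StudentName", "Address", "CourseNo", "CourseName", "Instructor"]
student_course = [
    (1, "Asha", "Pune", "C1", "DBMS", "Rao"),
    (1, "Asha", "Pune", "C2", "OS", "Iyer"),
    (2, "Ravi", "Delhi", "C1", "DBMS", "Rao"),
]

def project(rows, keep):
    """Relational projection onto the columns in keep; the set drops duplicates."""
    idx = [COLS.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}

student = project(student_course, ["StudentNo", "StudentName", "Address"])
course = project(student_course, ["CourseNo", "CourseName", "Instructor"])
enrolled = project(student_course, ["StudentNo", "CourseNo"])
print(len(student), len(course), len(enrolled))  # 2 2 3

# Rebuild the original rows by joining on StudentNo and CourseNo (the keys).
by_student = {row[0]: row for row in student}
by_course = {row[0]: row for row in course}
rebuilt = {by_student[sno] + by_course[cno] for sno, cno in enrolled}
print(rebuilt == set(student_course))  # True
```

After the split, "Asha, Pune" and "DBMS, Rao" are each stored once instead of twice.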
Normal Forms
A number of normal forms have been defined for classifying relations. Each normal form has associated with it a set of constraints on the kinds of FDs that may hold on the relation.
Normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced into the database. In other words, a relation is said to be in a given normal form if it satisfies a certain prescribed set of conditions.
The normalization process has several stages: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and so on.
First Normal Form (1NF)
1NF was defined to disallow multivalued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic (simple, indivisible) values. The value of any attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
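One common way to bring a relation into 1NF is to flatten a multivalued attribute into one tuple per value. The sketch below assumes a hypothetical DEPT(Dnumber, Dname, Phones) relation in which Phones holds a list of numbers. After flattening, each tuple holds a single atomic phone number, and the key becomes (Dnumber, Phone). The other usual option is to move the phone numbers into a separate relation.

```python
# Hypothetical non-1NF rows: the third attribute is a list, not an atomic value.
dept_phones = [
    ("D5", "Research", ["555-0101", "555-0102"]),
    ("D4", "Admin", ["555-0200"]),
]

def to_1nf(rows):
    """Emit one (Dnumber, Dname, Phone) tuple per phone, so every value is atomic."""
    return [(dno, dname, phone)
            for dno, dname, phones in rows
            for phone in phones]

for row in to_1nf(dept_phones):
    print(row)
```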
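Prime and Non-prime Attributes
An attribute of a relation R is prime if it is a member of some candidate key of R; otherwise it is non-prime. For example, if AH is the only candidate key of R(ABCDEFH), then A and H are prime attributes and B, C, D, E, F are non-prime attributes.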
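Prime attributes can be found once the candidate keys are known. The brute-force sketch below finds candidate keys as the minimal attribute sets whose closure is the whole relation. It uses the EMP_PROJ FDs: EmpId → Ename, Pnumber → {Pname, Plocation}, and {EmpId, Pnumber} → Hours. The function names are our own. The search is exponential in the number of attributes, so it only suits small textbook examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys

R = {"EmpId", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
F = [({"EmpId"}, {"Ename"}),
     ({"Pnumber"}, {"Pname", "Plocation"}),
     ({"EmpId", "Pnumber"}, {"Hours"})]

keys = candidate_keys(R, F)
prime = set().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['EmpId', 'Pnumber']]
print(sorted(R - prime))                # ['Ename', 'Hours', 'Plocation', 'Pname']
```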
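Full Functional Dependency
An FD X → Y is a full functional dependency if removing any attribute from X means the dependency no longer holds; otherwise it is a partial dependency. In EMP_PROJ:
{EmpId, Pnumber} → Hours is a full dependency (neither EmpId → Hours nor Pnumber → Hours holds).
However, the dependency {EmpId, Pnumber} → Ename is partial because EmpId → Ename holds.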
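The full/partial distinction can be tested with attribute closure: X → A is full only if no proper subset of X already determines A. The sketch below applies this test to the non-prime attributes of EMP_PROJ, assuming the key {EmpId, Pnumber} and the three EMP_PROJ FDs. It assumes the FD from the whole key already holds and only checks the proper subsets. The function name `is_full` is our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full(lhs, attr, fds):
    """True if no proper subset of lhs determines attr (lhs -> attr assumed to hold)."""
    for size in range(len(lhs)):  # subset sizes 0 .. len(lhs) - 1
        for subset in combinations(sorted(lhs), size):
            if attr in closure(set(subset), fds):
                return False      # a smaller determinant exists: partial
    return True

F = [({"EmpId"}, {"Ename"}),
     ({"Pnumber"}, {"Pname", "Plocation"}),
     ({"EmpId", "Pnumber"}, {"Hours"})]
key = {"EmpId", "Pnumber"}
for attr in ["Hours", "Ename", "Pname", "Plocation"]:
    print(attr, "full" if is_full(key, attr, F) else "partial")
# Hours full
# Ename partial
# Pname partial
# Plocation partial
```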
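Transitive Dependency
An FD X → Z is a transitive dependency if there is a set of attributes Y, neither a candidate key nor a subset of any key, such that X → Y and Y → Z hold. For example, let R(A, B, C, D, E) have the set of FDs F = {AB → C, B → D, C → E}, with AB as the candidate key. Since AB → C and C → E hold, and C is not part of any key, E is transitively dependent on AB via C.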
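A simplified check for transitive dependencies works relative to a single candidate key. It looks for a non-prime attribute Y that does not determine the key but does determine some other non-prime attribute. The sketch below uses the R(A, B, C, D, E) example above. To stay short, it considers only single-attribute intermediates Y and assumes AB is the only candidate key. Both are simplifying assumptions, not the general definition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_deps(attrs, fds, key):
    """(Y, A) pairs with key -> Y -> A, Y a single non-prime attribute."""
    non_prime = attrs - key  # assumes key is the only candidate key
    found = []
    for y in sorted(non_prime):
        y_plus = closure({y}, fds)  # key -> y holds because key is a key
        if key <= y_plus:
            continue  # y determines the key, so it is not a valid intermediate
        for a in sorted(non_prime - {y}):
            if a in y_plus:
                found.append((y, a))
    return found

R = {"A", "B", "C", "D", "E"}
F = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"C"}, {"E"})]
print(transitive_deps(R, F, {"A", "B"}))  # [('C', 'E')]
```

D is not reported, because B → D is a partial dependency on the key rather than a transitive one.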
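Second Normal Form (2NF)
A relation schema is in 2NF if every non-prime attribute is fully functionally dependent on the primary key. EMP_PROJ is not in 2NF, because Ename depends only on EmpId and {Pname, Plocation} depend only on Pnumber, each of which is just part of the key {EmpId, Pnumber}.
The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. If the primary key contains a single attribute, the test need not be applied at all.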
If a relation schema is not in 2NF, it can be second normalized (2NF normalized) into a number of 2NF relations. In these relations, each non-prime attribute is associated only with the part of the primary key on which it is fully functionally dependent. Therefore, we decompose EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure, each of which is in 2NF:
EP1(EmpId, Pnumber, Hours)
EP2(EmpId, Ename)
EP3(Pnumber, Pname, Plocation)
Third Normal Form (3NF)
A relation schema in third normal form does not allow partial or transitive dependencies of non-prime attributes on the key. Consider the relation schema EMP_DEPT shown in Figure:
EMP_DEPT(EmpId, Ename, Bdate, Address, Dnumber, Dname, Dmgr_no)
with the FDs:
1) EmpId → {Ename, Bdate, Address, Dnumber}
2) Dnumber → {Dname, Dmgr_no}
EMP_DEPT is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_no (and also
Dname) on EmpId via Dnumber.
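The 3NF condition can be checked one FD at a time: for each non-trivial X → A, either X is a superkey or A is a prime attribute. The sketch below checks EMP_DEPT using the two FDs above. It assumes EmpId is the only candidate key, so EmpId is the only prime attribute. The function name `violates_3nf` is our own.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(attrs, fds, prime):
    """List (lhs, attribute) pairs where lhs is not a superkey and the attribute is non-prime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attrs
        for a in sorted(rhs - lhs):  # ignore trivial parts of the FD
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad

EMP_DEPT = {"EmpId", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_no"}
F = [({"EmpId"}, {"Ename", "Bdate", "Address", "Dnumber"}),
     ({"Dnumber"}, {"Dname", "Dmgr_no"})]
print(violates_3nf(EMP_DEPT, F, prime={"EmpId"}))
# [(['Dnumber'], 'Dmgr_no'), (['Dnumber'], 'Dname')]
```

Both violations come from Dnumber → {Dname, Dmgr_no}. Moving those attributes into a relation keyed by Dnumber is what removes them.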
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 shown in Figure. The attributes that violate 3NF (Dname and Dmgr_no) are removed and placed in a separate relation, together with the attribute through which they are transitively dependent (Dnumber):
ED1(EmpId, Ename, Bdate, Address, Dnumber)
ED2(Dnumber, Dname, Dmgr_no)
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF when every determinant is a candidate key. In other words, BCNF normalization is needed when an attribute of one composite candidate key is dependent on an attribute of another composite candidate key.
When a table has only one candidate key, 3NF and BCNF are equivalent. BCNF can be violated when the table has more than one candidate key.
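Consider the relation TEACH(Student, Course, Instructor) with the FDs {Student, Course} → Instructor and Instructor → Course. TEACH is in 3NF since there are no partial or transitive dependencies of non-prime attributes. However, it is not in BCNF because, although Instructor is a determinant, it is not a candidate key.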
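The BCNF test can be run mechanically. For every non-trivial FD, the determinant's closure must be the whole relation, meaning the determinant is a superkey, which is the usual formal form of the test. The sketch assumes TEACH(Student, Course, Instructor) with the FDs {Student, Course} → Instructor and Instructor → Course, which match the decomposition into TEACH1 and TEACH2. The function name `bcnf_violations` is our own.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Determinants of non-trivial FDs that are not superkeys."""
    return [sorted(lhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]

TEACH = {"Student", "Course", "Instructor"}
F = [({"Student", "Course"}, {"Instructor"}),
     ({"Instructor"}, {"Course"})]
print(bcnf_violations(TEACH, F))  # [['Instructor']]
```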
We can convert TEACH into BCNF by dividing it into two relations. The attribute that is a determinant but not a candidate key must be placed in a separate relation, where it becomes the key of that relation:
TEACH1(Instructor, Course)
TEACH2(Instructor, Student)
Multivalued Dependency
When a relation is in BCNF, there are no longer any anomalies that result from functional dependencies. However, there may still be anomalies that result from multivalued dependencies (MVDs). For example:
StudentId Subject Activity
100 Music Swimming
100 Accounting Swimming
100 Music Tennis
100 Accounting Tennis
150 Math Jogging
The relation (StudentId, Subject, Activity) has the multivalued dependencies:
StudentId →→ Subject
StudentId →→ Activity
In general, a multivalued dependency exists when a relation has at least three attributes, two of which are multivalued, and their values depend only on the third attribute. In other words, in a relation R(A, B, C) a multivalued dependency exists if A determines multiple values of B (A →→ B), A determines multiple values of C (A →→ C), and B and C are independent of each other.
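The independence of Subject and Activity can be checked on the instance above. StudentId →→ Subject holds if, whenever two rows share a StudentId, swapping their Activity values yields rows that are also in the relation. For brevity, the sketch hard-codes the three-column layout (StudentId, Subject, Activity). The function name `mvd_holds` is our own.

```python
rows = {
    (100, "Music", "Swimming"),
    (100, "Accounting", "Swimming"),
    (100, "Music", "Tennis"),
    (100, "Accounting", "Tennis"),
    (150, "Math", "Jogging"),
}

def mvd_holds(rows):
    """StudentId ->> Subject: for rows t1, t2 with the same StudentId,
    (StudentId, t1's Subject, t2's Activity) must also be a row."""
    return all((sid1, subject1, activity2) in rows
               for sid1, subject1, _ in rows
               for sid2, _, activity2 in rows
               if sid1 == sid2)

print(mvd_holds(rows))  # True
# Dropping one combination breaks the independence of Subject and Activity.
print(mvd_holds(rows - {(100, "Accounting", "Tennis")}))  # False
```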
Fourth Normal Form (4NF)
A relation is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies (more precisely, for every non-trivial MVD X →→ Y, X must be a superkey). So 4NF is needed when a relation has undesirable multivalued dependencies. We eliminate these anomalies by creating two relations, each storing data for only one of the two multivalued attributes:
StudentId Subject
100 Music
100 Accounting
150 Math
StudentId Activity
100 Swimming
100 Tennis
150 Jogging
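This decomposition loses no information: joining the two relations on StudentId reproduces the original five rows. A quick check on the data from the example:

```python
rows = {
    (100, "Music", "Swimming"),
    (100, "Accounting", "Swimming"),
    (100, "Music", "Tennis"),
    (100, "Accounting", "Tennis"),
    (150, "Math", "Jogging"),
}

# Project onto (StudentId, Subject) and (StudentId, Activity).
subjects = {(sid, subject) for sid, subject, _ in rows}
activities = {(sid, activity) for sid, _, activity in rows}
print(len(subjects), len(activities))  # 3 3

# Natural join on StudentId.
rejoined = {(sid1, subject, activity)
            for sid1, subject in subjects
            for sid2, activity in activities
            if sid1 == sid2}
print(rejoined == rows)  # True
```

The two relations store 6 pairs instead of 5 triples. Adding a new activity for student 100 now needs only one new row, rather than one row per subject.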
Both relations are now in Fourth Normal Form, as each relation has only one multivalued attribute.
Fifth Normal Form (5NF)
The fifth normal form (5NF) is generally not implemented in real-life database design, but its concept is still worth understanding. 5NF is also known as Project-Join Normal Form (PJ/NF). A relation is in 5NF if:
• It is in 4NF
• It has no join dependency other than those implied by its candidate keys.
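A join dependency says that a relation always equals the join of certain projections of itself. The sketch below uses a small hypothetical relation SPJ(Supplier, Part, Project) with made-up rows. It assumes no FDs hold, so the only candidate key is all three attributes. Joining just the (Supplier, Part) and (Part, Project) projections produces a spurious row. Joining all three two-column projections reconstructs the relation exactly. The resulting three-way join dependency is not implied by the key, so under these assumptions the relation would not be in 5NF.

```python
# Hypothetical SPJ(Supplier, Part, Project) rows.
r = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1")}

sp = {(s, p) for s, p, _ in r}  # (Supplier, Part)
pj = {(p, j) for _, p, j in r}  # (Part, Project)
js = {(j, s) for s, _, j in r}  # (Project, Supplier)

# Two-way join of SP and PJ on Part.
two_way = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
# Three-way join: also require the (Project, Supplier) pair to exist.
three_way = {(s, p, j) for s, p, j in two_way if (j, s) in js}

print(sorted(two_way - r))  # [('s2', 'p1', 'j2')]
print(three_way == r)       # True
```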