Unit 2
2.1 Overview
2.2 ER – Model
2.3 Constraints
2.4 E-R Diagrams, ERD Issues, Weak Entity Sets
2.5 Codd’s Rules
2.6 Relational database model: Logical view of data, keys, integrity rules
2.7 Relational Database design: Features of good relational database design
2.8 Atomic domain and Normalization 1NF, 2NF, 3NF, BCNF
The Entity Relationship (ER) Model is a model for identifying the entities to be represented in the database and
representing how those entities are related. The ER data model specifies an enterprise schema that
represents the overall logical structure of a database graphically.
The Entity Relationship Diagram explains the relationship among the entities present in the database.
ER models are used to model real-world objects like a person, a car, or a company and the relation
between these real-world objects. In short, the ER Diagram is the structural format of the database.
ER diagrams are used to represent the E-R model of a database, which makes them easy to
convert into relations (tables).
ER diagrams model real-world objects, which makes them very useful.
These diagrams are very easy to understand and easy to create even for a naive user.
Lines: Lines link attributes to entity sets and entity sets to relationship types.
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database System.
Entity
An Entity may be an object with a physical existence – a particular person, car, house, or employee
– or it may be an object with a conceptual existence – a company, a job, or a university course.
Entity Set: An Entity is an object of Entity Type and a set of all entities is called an entity set. For
Example, E1 is an entity having Entity Type Student and the set of all students is called Entity Set.
In ER diagram, Entity Type is represented as:
1. Strong Entity
A Strong Entity is a type of entity that has a key attribute. A strong entity does not depend on any other
entity in the schema. It has a primary key, which identifies it uniquely, and it is represented
by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An entity type normally has a key attribute that uniquely identifies each entity in the entity set. But some
entity types exist for which key attributes cannot be defined. These are called Weak Entity Types.
For example, a company may store the information of dependents (Parents, Children, Spouse) of
an Employee. But the dependents cannot exist without the employee. So Dependent will be a
Weak Entity Type and Employee will be the Identifying Entity Type for Dependent, which means Employee is a
Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity types is
always total. The relationship between the weak entity type and its identifying strong entity type is
called identifying relationship and it is represented by a double diamond.
Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age,
Address, and Mobile_No are the attributes that define entity type Student. In an ER diagram, an attribute
is represented by an oval.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For
example, Roll_No will be unique for each student. In an ER diagram, the key attribute is represented by
an oval with its name underlined.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example, the
Address attribute of the Student entity type consists of Street, City, State, and Country. In an ER
diagram, a composite attribute is represented by an oval connected to the ovals of its component attributes.
Composite Attribute
3. Multivalued Attribute
A multivalued attribute can hold more than one value for a given entity. For example, Phone_No (a student
can have more than one). In an ER diagram, a multivalued attribute is represented by a double
oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute.
For example, Age can be derived from DOB. In an ER diagram, a derived attribute is represented by a dashed
oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For example, 'Enrolled in' is a
relationship type that exists between entity types Student and Course. In an ER diagram, a relationship
type is represented by a diamond connected to the participating entity types by lines.
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following relationship set
depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When only ONE entity set participates in a relationship, the relationship
is called a unary relationship. For example, one person is married to only one person.
Unary Relationship
2. Binary Relationship: When TWO entity sets participate in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. n-ary Relationship: When n entity sets participate in a relationship, the relationship is
called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the relationship, the
cardinality is one-to-one. Let us assume that a male can marry one female and a female can marry
one male. So the relationship will be one-to-one.
2. One-to-Many: When an entity in one entity set can be related to more than one entity in the other
entity set, but each entity in the other set is related to at most one entity in the first, the cardinality
is one-to-many. Such a relationship can be represented using two tables. Let us assume that one surgery
department can accommodate many doctors. So the cardinality will be 1 to M. It means one department
has many doctors.
Using sets, one-to-many cardinality can be represented as:
3. Many-to-One: When entities in one entity set can take part only once in the relationship set and
entities in other entity sets can take part more than once in the relationship set, cardinality is many to
one. Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students but for
one student, there will be only one course.
In this case, each student is taking only 1 course but 1 course has been taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the relationship, the
cardinality is many-to-many. Let us assume that a student can take more than one course and one
course can be taken by many students. So the relationship will be many-to-many.
A many-to-many relationship needs three tables: one for each entity set and one for the relationship itself.
In this example, student S1 is enrolled in C1 and C3, and course C3 has S1, S3, and S4 enrolled in it. So
it is a many-to-many relationship.
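The three-table layout for a many-to-many relationship can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original notes: the table names Student, Course and Enrolled and the column names are assumptions chosen for the example, and the rows reproduce only the enrollments stated above (S1 in C1 and C3; C3 taken by S1, S3 and S4).

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (stud_id TEXT PRIMARY KEY);
CREATE TABLE Course  (course_id TEXT PRIMARY KEY);
-- Third table: one row per (student, course) pair of the 'Enrolled in' relationship
CREATE TABLE Enrolled (
    stud_id   TEXT REFERENCES Student(stud_id),
    course_id TEXT REFERENCES Course(course_id),
    PRIMARY KEY (stud_id, course_id)
);
""")
conn.executemany("INSERT INTO Student VALUES (?)", [("S1",), ("S3",), ("S4",)])
conn.executemany("INSERT INTO Course VALUES (?)", [("C1",), ("C3",)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?)",
                 [("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")])


def courses_of(student):
    """All courses a student is enrolled in (one student, many courses)."""
    rows = conn.execute(
        "SELECT course_id FROM Enrolled WHERE stud_id = ? ORDER BY course_id", (student,))
    return [r[0] for r in rows]


def students_in(course):
    """All students enrolled in a course (one course, many students)."""
    rows = conn.execute(
        "SELECT stud_id FROM Enrolled WHERE course_id = ? ORDER BY stud_id", (course,))
    return [r[0] for r in rows]


print(courses_of("S1"))   # ['C1', 'C3']
print(students_in("C3"))  # ['S1', 'S3', 'S4']
```

The composite primary key on Enrolled also prevents the same student from being enrolled in the same course twice.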
The very first step in drawing an ER diagram is identifying all the entities, placing them in rectangles, and
labeling them accordingly.
The next step is to identify the relationships between them and place them accordingly using
diamonds, making sure that relationships are not connected to each other.
Types of Keys in Relational Model (Candidate, Super, Primary, Alternate
and Foreign)
Keys are one of the basic requirements of a relational database model. They are widely used to identify
the tuples (rows) in a table uniquely. We also use keys to set up relations among the various columns
and tables of a relational database.
1. Candidate Key
2. Primary Key
3. Super Key
4. Alternate Key
5. Foreign Key
6. Composite Key
1. Candidate Key: The minimal set of attributes that can uniquely identify a tuple is known as a
candidate key. For Example, STUD_NO in STUDENT relation.
A table can have multiple candidate keys but only one primary key.
The value of the Candidate Key is unique and may be null for a tuple.
Example:
The candidate key can be simple (having only one attribute) or composite as well.
Example:
Table STUDENT_COURSE
Note: In SQL Server, a unique constraint on a nullable column allows the value NULL in that
column only once. That is why the STUD_PHONE attribute can be a candidate key here, whereas a
primary key attribute can never hold a NULL value.
2. Primary Key: There can be more than one candidate key in a relation, out of which one can be
chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate keys
for the relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of many
candidate keys).
It is a unique key.
It cannot be NULL.
A primary key need not be a single column; more than one column can together form the
primary key of a table.
Example:
Table STUDENT
3. Super Key: The set of attributes that can uniquely identify a tuple is known as Super Key. For
Example, STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a group of single or multiple
keys that identifies rows in a table. It supports NULL values.
Adding zero or more attributes to the candidate key generates the super key.
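Because every super key is a candidate key plus zero or more extra attributes, the super keys can be listed mechanically. The sketch below is illustrative only; it assumes the STUDENT attributes used elsewhere in these notes (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) and assumes that STUD_NO and STUD_PHONE are its only candidate keys.

```python
from itertools import combinations

# Assumed schema: STUDENT attributes from these notes, with STUD_NO and
# STUD_PHONE taken to be the only candidate keys.
ATTRIBUTES = ["STUD_NO", "STUD_NAME", "STUD_PHONE",
              "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"]
CANDIDATE_KEYS = [{"STUD_NO"}, {"STUD_PHONE"}]


def super_keys(attributes, candidate_keys):
    """Return every attribute set that contains at least one candidate key."""
    result = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            s = frozenset(combo)
            if any(key <= s for key in candidate_keys):
                result.append(s)
    return result


keys = super_keys(ATTRIBUTES, CANDIDATE_KEYS)
print(len(keys))                                    # 48
print(frozenset({"STUD_NO", "STUD_NAME"}) in keys)  # True
```

Of the 64 subsets of six attributes, the 16 that contain neither STUD_NO nor STUD_PHONE are not super keys, which leaves 48.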
Super Key values may also be NULL.
Example:
4. Alternate Key: The candidate key other than the primary key is called an alternate key.
All the candidate keys which are not primary keys are called alternate keys.
It is a secondary key.
Example:
5. Foreign Key: If an attribute can only take values that are present as values of some other
attribute, it is a foreign key to the attribute it refers to. The relation being referenced is called the
referenced relation and the corresponding attribute is called the referenced attribute. The relation
that refers to the referenced relation is called the referencing relation and the corresponding
attribute is called the referencing attribute. The referenced attribute of the referenced relation should be
its primary key.
It is a key that acts as a primary key in one table and as a
secondary key in another table.
For example, DNO is a primary key in the DEPT table and a non-key attribute in EMP.
Example:
Table STUDENT_COURSE
It may be worth noting that, unlike the primary key of a relation, a foreign key can be NULL
and may contain duplicate values, i.e., it need not follow the uniqueness constraint. For example,
STUD_NO in the STUDENT_COURSE relation is not unique: it is repeated in the first and
third tuples. However, STUD_NO in the STUDENT relation is a primary key, so it must
always be unique and cannot be NULL.
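The behaviour described above can be checked with Python's sqlite3 module. In this minimal sketch the column types and the sample rows are hypothetical; only the table and attribute names come from the notes. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY)")
conn.execute("""
CREATE TABLE STUDENT_COURSE (
    STUD_NO   INTEGER REFERENCES STUDENT(STUD_NO),  -- foreign key: no UNIQUE, NULL allowed
    COURSE_NO TEXT
)""")
conn.executemany("INSERT INTO STUDENT VALUES (?)", [(1,), (2,)])

# Hypothetical rows: STUD_NO 1 repeats in the first and third tuples
conn.executemany("INSERT INTO STUDENT_COURSE VALUES (?, ?)",
                 [(1, "C1"), (2, "C2"), (1, "C2")])
conn.execute("INSERT INTO STUDENT_COURSE VALUES (NULL, 'C3')")  # a NULL foreign key is accepted


def enroll(stud_no, course_no):
    """Insert a row; return False if STUD_NO has no match in STUDENT."""
    try:
        conn.execute("INSERT INTO STUDENT_COURSE VALUES (?, ?)", (stud_no, course_no))
        return True
    except sqlite3.IntegrityError:  # FOREIGN KEY constraint failed
        return False


print(enroll(2, "C3"))   # True: student 2 exists
print(enroll(99, "C1"))  # False: no student 99 in STUDENT
```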
6. Composite Key: Sometimes a table does not have a single column/attribute that uniquely
identifies all its records. To identify the rows of such a table uniquely, a combination of two or
more columns/attributes can be used. Some combinations can still produce duplicate values in rare
cases, so we need to find the optimal set of attributes that uniquely identifies the rows of the table.
Different combinations of attributes may identify the rows uniquely with different degrees of
accuracy.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.
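A composite key such as FULLNAME + DOB can be declared as a multi-column primary key. The sketch below uses Python's sqlite3 module; the table layout, the CITY column, and the sample names and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table: neither FULLNAME nor DOB is unique on its own,
# but the pair (FULLNAME, DOB) is declared as the composite primary key.
conn.execute("""
CREATE TABLE STUDENT (
    FULLNAME TEXT NOT NULL,
    DOB      TEXT NOT NULL,
    CITY     TEXT,
    PRIMARY KEY (FULLNAME, DOB)
)""")


def add_student(fullname, dob, city):
    """Insert a student; return False if the (FULLNAME, DOB) pair already exists."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", (fullname, dob, city))
        return True
    except sqlite3.IntegrityError:  # UNIQUE constraint failed on the composite key
        return False


print(add_student("Asha Rao", "2003-04-01", "Pune"))   # True
print(add_student("Asha Rao", "2004-09-12", "Delhi"))  # True: same name, different DOB
print(add_student("Asha Rao", "2003-04-01", "Agra"))   # False: duplicate (FULLNAME, DOB)
```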
Functional Dependency and Attribute Closure
A functional dependency A->B holds in a relation if any two tuples having the same value of attribute A
also have the same value of attribute B. For example, in relation STUDENT shown in Table 1,
Functional Dependencies
but
Functional Dependencies in a relation are dependent on the domain of the relation. Consider the
STUDENT relation given in Table 1.
Functional Dependency Set: Functional Dependency set or FD set of a relation is the set of all
FDs present in the relation. For Example, FD set for relation STUDENT shown in table 1 is:
Attribute Closure: The attribute closure of an attribute set is the set of attributes that can
be functionally determined from it.
To find the attribute closure of an attribute set, first add the elements of the attribute set to the
result set. Then recursively add elements to the result set which can be functionally determined from
the elements of the result set.
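The procedure above is the standard fixed-point algorithm for attribute closure. A minimal Python sketch follows; it assumes single-letter attribute names, so that a string such as "EF" can stand for the set {E, F}, and it uses the FD set from the GATE question below as sample input.

```python
def closure(attrs, fds):
    """Attribute closure: start from attrs and keep adding the right side of
    every FD whose left side is already in the result, until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FD set of the GATE-CS-2014 question, written as (LHS, RHS) pairs
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]
print(sorted(closure("EF", FDS)))   # ['E', 'F', 'G', 'I', 'J']
print(sorted(closure("EFH", FDS)))  # ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']
```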
How to find Candidate Keys and Super Keys using Attribute Closure?
If the attribute closure of an attribute set contains all attributes of the relation, the attribute set is a
super key of the relation.
If, in addition, no proper subset of this attribute set can functionally determine all attributes of the
relation, the set is a candidate key as well. For example, using the FD set of Table 1,
(STUD_NO, STUD_NAME) is a super key but not a candidate key, because for its subset STUD_NO,
(STUD_NO)+ is already equal to all attributes of the relation. So STUD_NO is a candidate key.
GATE Question: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of
functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K -> {M}, L -> {N}} on
R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
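Answer: (EF)+ = {E, F, G, I, J}, which is not the set of all attributes, so {E, F} is not a key and
option A is wrong. Similarly, (E)+ = {E}, so option D is wrong.
(EFH)+ = {E, F, G, H, I, J, K, L, M, N}, which is the set of all attributes, so {E, F, H} is a super key.
E, F, and H never appear on the right side of any functional dependency, so every key of R must
contain all three of them; hence {E, F, H} is minimal, i.e., a candidate key.
{E, F, H, K, L} is a super key but not a candidate key, because its proper subset {E, F, H} is already
a key. Hence option C is wrong.
So the correct answer is B.
Attributes which are parts of any candidate key of a relation are called prime attributes; the others are
non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute and the others are
non-prime attributes.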
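The reasoning for the GATE question can also be automated by brute force: test attribute sets in order of increasing size and keep each set whose closure is the whole relation and that does not contain a key found earlier. This is a sketch for small textbook relations only (it examines up to 2^n subsets), and it again assumes single-letter attribute names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Check attribute sets by increasing size; keep super keys that contain
    no smaller key already found, i.e. the minimal super keys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # superset of a known key: a super key, not a candidate key
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


R = "EFGHIJKLMN"
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]
print([sorted(k) for k in candidate_keys(R, FDS)])  # [['E', 'F', 'H']]
```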
Advantages of Functional Dependencies:
They help in reducing data redundancy in a database by identifying and eliminating unnecessary
or duplicate data.
They improve data integrity by ensuring that data is consistent and accurate across the database.
They facilitate database maintenance by making it easier to modify, update, and delete data.
Disadvantages of Functional Dependencies:
The process of identifying functional dependencies can be time-consuming and complex, especially
in large databases with many tables and relationships.
Overly restrictive functional dependencies can result in slow query performance or data
inconsistencies, as data that should be related may not be properly linked.
Functional dependencies do not take into account the semantic meaning of data, and may not
always reflect the true relationships between data elements.
Advantages of Attribute Closure:
Attribute closures help to identify all possible attributes that can be derived from a set of given
attributes.
They facilitate database design by identifying relationships between attributes and tables, which
can help to optimize query performance.
They ensure data consistency by identifying all possible combinations of attributes that can exist in
the database.
Disadvantages of Attribute Closure:
The process of calculating attribute closures can be computationally expensive, especially for large
datasets.
Attribute closures can become too complex to manage, especially as the number of attributes and
tables in a database grows.
Attribute closures do not take into account the semantic meaning of data, and may not always
accurately reflect the relationships between data elements.
Introduction of Database Normalization
Database normalization is the process of organizing the attributes of the database to reduce or
eliminate data redundancy (having the same data but at different places).
Problems because of data redundancy: Data redundancy unnecessarily increases the size of the
database as the same data is repeated in many places. Inconsistency problems also arise during
insert, delete and update operations.
For example, employee_id → name means employee_id functionally determines the name of the
employee. As another example in a timetable database, {student_id, time} → {lecture_room},
student ID and time determine the lecture room where the student should be.
A functional dependency A → B means that for all instances of a particular value of A, there is the same
value of B. For example, in the table below A → B is true, but B → A is not true, as there are
different values of A for B = 3.
A B
------
1 3
2 3
4 0
1 3
4 0
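Checking whether A → B holds on a given table amounts to verifying that equal A-values always come with equal B-values. The sketch below is a hypothetical helper written for this example, using the rows of the table above; keep in mind that a table instance can only show that an FD is violated, while an FD proper is a rule about every legal instance of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# The A/B table above, one dict per row
rows = [{"A": 1, "B": 3}, {"A": 2, "B": 3}, {"A": 4, "B": 0},
        {"A": 1, "B": 3}, {"A": 4, "B": 0}]
print(fd_holds(rows, ["A"], ["B"]))  # True
print(fd_holds(rows, ["B"], ["A"]))  # False: B = 3 appears with A = 1 and A = 2
```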
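Trivial functional dependency: A → B is trivial when B is a subset of A. Examples:
ABC → AB
ABC → A
ABC → ABC
Non-trivial functional dependency: A → B is non-trivial when B is not a subset of A. Example:
Id → Name,
Name → DOB
Semi-non-trivial functional dependency: a non-trivial A → B in which A and B still share some
attributes (A ∩ B is not empty). Examples:
AB → BC,
AD → DC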
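The trivial, non-trivial and semi-non-trivial cases differ only in how the attribute sets on the two sides overlap, so they can be told apart with set operations. In this hypothetical helper, an FD that is not trivial but shares attributes with its left side is labelled "semi-non-trivial" (it is still non-trivial), and the remaining FDs, whose sides share no attributes, are labelled "non-trivial". Strings are treated as sets of single-letter attributes; sets of longer names such as {"Id"} also work.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by how the attribute sets X and Y overlap."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"           # Y is a subset of X
    if x & y:
        return "semi-non-trivial"  # Y is not a subset of X, but they overlap
    return "non-trivial"           # X and Y share no attribute


print(classify_fd("ABC", "AB"))        # trivial
print(classify_fd("AB", "BC"))         # semi-non-trivial
print(classify_fd({"Id"}, {"Name"}))   # non-trivial
```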
The features of database normalization are as follows:
Elimination of Data Redundancy: One of the main features of normalization is to eliminate the
data redundancy that can occur in a database. Data redundancy refers to the repetition of data in
different parts of the database. Normalization helps in reducing or eliminating this redundancy,
which can improve the efficiency and consistency of the database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in the database is
consistent and accurate. By eliminating redundancy, normalization helps in preventing
inconsistencies and contradictions that can arise due to different versions of the same data.
Improved Database Design: Normalization helps in improving the overall design of the database.
By organizing the data in a structured and systematic way, normalization makes it easier to design
and maintain the database. It also makes the database more flexible and adaptable to changing
business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update anomalies, which can occur
when updating a single record in a table affects multiple records in other tables. Normalization
ensures that each table contains only one type of data and that the relationships between the tables
are clearly defined, which helps in avoiding such anomalies.
Standardization: Normalization helps in standardizing the data in the database. By organizing the
data into tables and defining relationships between them, normalization helps in ensuring that the
data is stored in a consistent and uniform manner.
Normalization is an important process in database design that helps in improving the efficiency,
consistency, and accuracy of the database. It makes it easier to manage and maintain the data and
ensures that the database is adaptable to changing business needs.
Normalization helps to minimize the redundancy in relations. Normal forms are used to eliminate or
reduce redundancy in database tables.
First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell
should contain only a single value, and each column should have a unique name. The first normal
form helps to eliminate duplicate data and simplify queries.
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that every non-key (non-prime)
attribute depend on the whole primary key, not on just a part of it. This matters when the key is
composite: no column may be determined by only some of the key columns.
Third Normal Form (3NF): 3NF builds on 2NF by requiring that non-key attributes do not depend on
other non-key attributes, i.e., there are no transitive dependencies. This means that each column should
be directly related to the primary key, and not to any other non-key column in the same table.
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that requires every
determinant in a table to be a candidate key. In other words, for every non-trivial functional
dependency X → Y, X must be a super key.
Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not
contain any multi-valued dependencies.
Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a
table into smaller tables to remove data redundancy and improve data integrity, until every join
dependency in the table is implied by its candidate keys.
Normal forms help to reduce data redundancy, increase data consistency, and improve database
performance. However, higher levels of normalization can lead to more complex database designs
and queries. It is important to strike a balance between normalization and practicality when designing
a database.
Advantages of Normal Forms:
Improved data consistency: Normalization ensures that data is stored in a consistent and
organized manner, reducing the risk of data inconsistencies and errors.
Simplified database design: Normalization provides guidelines for organizing tables and data
relationships, making it easier to design and maintain a database.
Improved query performance: Normalized tables are typically easier to search and retrieve data
from, resulting in faster query performance.
Easier database maintenance: Normalization reduces the complexity of a database by breaking it
down into smaller, more manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database efficiency,
and simplify database design and maintenance.
Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF. The table below is in
1NF as there is no multi-valued attribute:
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
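Converting the first table into the second amounts to emitting one row per course. A minimal Python sketch, assuming the multi-valued Courses column is stored as a comma-separated string:

```python
def to_1nf(rows):
    """Split the comma-separated Courses value into one atomic row per course."""
    flat = []
    for row_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((row_id, name, course.strip()))
    return flat


unnormalized = [(1, "A", "c1, c2"), (2, "E", "c3"), (3, "M", "c2, c3")]
for row in to_1nf(unnormalized):
    print(row)
# (1, 'A', 'c1')
# (1, 'A', 'c2')
# (2, 'E', 'c3')
# (3, 'M', 'c2')
# (3, 'M', 'c3')
```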
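Second Normal Form: A relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on
any proper subset of any candidate key of the table. Partial Dependency: If a proper subset of a
candidate key determines a non-prime attribute, it is called a partial dependency.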
Example: Consider a relation with attributes STUD_NO, COURSE_NO, and COURSE_FEE. {Note that there are
many courses having the same course fee.} Here, COURSE_FEE alone cannot decide the value of
COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
and COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO. Hence COURSE_FEE is a
non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}. But
COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE depends on COURSE_NO, which is a proper subset of the
candidate key. A non-prime attribute depending on a proper subset of the candidate key is a partial
dependency, so this relation is not in 2NF. To convert the relation to 2NF, we split it into two
tables: Table 1 (STUD_NO, COURSE_NO) and Table 2 (COURSE_NO, COURSE_FEE).
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1

Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF tries to reduce the redundant data stored in memory. For instance, if there are
100 students taking course C1, we do not need to store its fee of 1000 in all 100 records;
instead, we can store it once in the second table as the course fee for C1.
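The 2NF check and split described above can be sketched as two small helpers. They are hypothetical and simplified: they only inspect the FDs they are given (not FDs implied by them), and each partial dependency X -> Y becomes a table (X, Y) while the non-prime attributes of Y are removed from the original table.

```python
def partial_dependencies(candidate_keys, fds):
    """Return (X, non-prime part of Y) for every FD X -> Y whose left side is a
    proper subset of some candidate key and whose right side has a non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        non_prime_rhs = [a for a in rhs if a not in prime]
        if non_prime_rhs and any(set(lhs) < key for key in candidate_keys):
            found.append((list(lhs), non_prime_rhs))
    return found


def decompose_2nf(attributes, candidate_keys, fds):
    """Move each partial dependency X -> Y into its own table (X, Y)
    and remove Y from the original table."""
    remaining = list(attributes)
    tables = []
    for lhs, rhs in partial_dependencies(candidate_keys, fds):
        tables.append(lhs + rhs)
        remaining = [a for a in remaining if a not in rhs]
    return [remaining] + tables


ATTRS = ["STUD_NO", "COURSE_NO", "COURSE_FEE"]
KEYS = [{"STUD_NO", "COURSE_NO"}]
FDS = [(["COURSE_NO"], ["COURSE_FEE"])]

print(partial_dependencies(KEYS, FDS))  # [(['COURSE_NO'], ['COURSE_FEE'])]
print(decompose_2nf(ATTRS, KEYS, FDS))
# [['STUD_NO', 'COURSE_NO'], ['COURSE_NO', 'COURSE_FEE']]
```

The two resulting tables have the same columns as Table 1 and Table 2 above.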
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper
subset of AB doesn’t determine any non-prime attribute.
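Third Normal Form: A relation is in 3NF if it is in 2NF and there is no transitive dependency for
non-prime attributes, i.e., for every non-trivial functional dependency X -> Y, at least one of the
following conditions holds:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).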
For the relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true,
so STUD_COUNTRY is transitively dependent on STUD_NO and the relation is not in 3NF.
To convert it into third normal form, we decompose the relation STUDENT (STUD_NO,
STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example 1: Consider relation R(A, B, C, D, E) with the FDs A -> BC, CD -> E, B -> D, E -> A. All
possible candidate keys of this relation are {A, E, CD, BC}. All attributes on the right sides of all
functional dependencies are prime, so the relation is in 3NF.
Example 2: Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC -> D,
AC -> BE, B -> E}
Step 1: As we can see, (AC)+ = {A, C, B, E, D}, but none of its subsets can determine all attributes of
the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the
relation, so there is only one candidate key, {AC}.
Step 2: Prime attributes are the attributes that are part of a candidate key, {A, C} in this example;
the others, {B, D, E}, are non-prime.
Step 3: The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or
composite attributes. The relation is in 2nd normal form because BC -> D is in 2nd normal form (BC
is not a proper subset of candidate key AC), AC -> BE is in 2nd normal form (AC is the candidate
key), and B -> E is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC -> D neither is BC a super key nor is D a prime
attribute, and in B -> E neither is B a super key nor is E a prime attribute; to satisfy 3rd normal
form, either the LHS of an FD should be a super key or the RHS should be a prime attribute. So the
highest normal form of the relation is 2nd normal form.
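The three steps above can be combined into a small checker. This is an illustrative sketch, not an algorithm from the notes: it finds candidate keys by brute force, tests 2NF by computing the closure of every proper subset of each candidate key, and tests 3NF and BCNF against the FDs as listed, which is the usual textbook check for a single relation. Attribute names are assumed to be single letters, and the relation is assumed to be in 1NF already.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' (1NF is assumed)."""
    all_attrs = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)

    # 2NF: no proper subset of a candidate key may determine a non-prime attribute
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                if (closure(sub, fds) - set(sub)) - prime:
                    return "1NF"

    is_3nf = is_bcnf = True
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # drop the trivial part of X -> Y
        if not extra or closure(lhs, fds) == all_attrs:
            continue                 # trivial FD, or X is a super key
        is_bcnf = False              # BCNF needs X to be a super key
        if extra - prime:
            is_3nf = False           # 3NF also fails: Y has a non-prime attribute
    if not is_3nf:
        return "2NF"
    return "BCNF" if is_bcnf else "3NF"


print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))             # 2NF
print(highest_normal_form("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]))  # 3NF
print(highest_normal_form("ABC", [("A", "BC"), ("B", "A")]))                             # BCNF
```

The first call reproduces Example 2. The second uses the relation with candidate keys {A, E, CD, BC}: it is in 3NF but not in BCNF, because B -> D has a left side that is not a super key.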
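Boyce-Codd Normal Form: A relation is in BCNF if it is in 3NF and, for every non-trivial functional
dependency X -> Y, X is a super key. For example, consider relation R(A, B, C) with the FDs A -> BC and
B -> A. A and B are both super keys, so the relation is in BCNF.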
For comparison, 3NF only requires at least one of the following conditions to hold for each non-trivial
functional dependency X -> Y, while BCNF always requires the first:
X is a Super Key.
Y is a Prime Attribute (i.e., each element of Y is part of some candidate key).
Fourth Normal Form (4NF): A relation is in 4NF if it satisfies the following rules:
1. It must be in BCNF.
2. It does not have any multi-valued dependency.
Applications of Normal Forms in DBMS
Data consistency: Normal forms ensure that data is consistent and does not contain any redundant
information. This helps to prevent inconsistencies and errors in the database.
Data redundancy: Normal forms minimize data redundancy by organizing data into tables that
contain only unique data. This reduces the amount of storage space required for the database and
makes it easier to manage.
Query performance: Normal forms can improve the performance of inserts, updates, and deletes by
keeping each fact in one place, although retrieving data from highly normalized tables may require
more joins.
Database maintenance: Normal forms make it easier to maintain the database by reducing the
amount of redundant data that needs to be updated, deleted, or modified. This helps to improve
database management and reduce the risk of errors or inconsistencies.
Database design: Normal forms provide guidelines for designing databases that are efficient,
flexible, and scalable. This helps to ensure that the database can be easily modified, updated, or
expanded as needed.
If all attributes of a relation are prime attributes, then the relation is always in 3NF.
If a relation has only singleton candidate keys (i.e., every candidate key consists of only one attribute),
then the relation is always in 2NF (because no partial functional dependency is possible).
Sometimes decomposing a relation into BCNF may not preserve all functional dependencies. In that case,
go for BCNF only if the lost FD(s) are not required; otherwise, normalize only up to 3NF.
There are many more normal forms after BCNF, such as 4NF and beyond. But in real-world
database systems it is generally not required to go beyond BCNF.