Unit 2
Roles
Entity sets of a relationship need not be distinct
The labels “manager” and “worker” are called roles; they specify how employee entities
interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional and are used to clarify the semantics of the relationship.
Cardinality Constraints
We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an
undirected line (—), signifying “many,” between the relationship set and the entity set.
E.g.: One-to-one relationship:
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower
One-To-Many Relationship
In a one-to-many relationship, a loan is associated with at most one customer via borrower, while a
customer is associated with several (including 0) loans via borrower
Many-To-One Relationships
In a many-to-one relationship, a loan is associated with several (including 0) customers via
borrower, while a customer is associated with at most one loan via borrower
Many-To-Many Relationship
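The four mapping cardinalities above can be told apart by counting how often each entity appears in the relationship set. The following is a minimal sketch, assuming borrower is given as (customer, loan) pairs; the function name and the sample identifiers are hypothetical. Note that it only classifies one particular instance, whereas a cardinality constraint is a rule that every instance must obey.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs.

    For borrower stored as (customer, loan) pairs, "one-to-many" means
    one customer may have many loans, as in the notes above.
    """
    pairs = set(pairs)  # a relationship set holds no duplicate pairs
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_once = all(n == 1 for n in left_counts.values())
    right_once = all(n == 1 for n in right_counts.values())
    if left_once and right_once:
        return "one-to-one"
    if right_once:
        return "one-to-many"  # each loan has at most one customer
    if left_once:
        return "many-to-one"  # each customer has at most one loan
    return "many-to-many"


# Hypothetical borrower instance: customer c1 holds two loans.
print(mapping_cardinality([("c1", "l1"), ("c1", "l2")]))  # one-to-many
```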
Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine
each entity.
A candidate key of an entity set is a minimal super key
customer-id is a candidate key of customer
account-number is a candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the
primary key.
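As a rough illustration of these definitions, the sketch below tests attribute sets against a small, hypothetical customer table (the rows and the name and city attributes are assumptions, not data from these notes). A set of attributes is a super key of the instance if no two rows agree on all of them, and a candidate key is a super key with no smaller super key inside it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal super keys of this instance, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical customer rows.
customers = [
    {"customer-id": "C1", "name": "Hayes", "city": "Harrison"},
    {"customer-id": "C2", "name": "Jones", "city": "Harrison"},
    {"customer-id": "C3", "name": "Hayes", "city": "Rye"},
]
print(candidate_keys(customers, ["customer-id", "name", "city"]))
# [('customer-id',), ('name', 'city')]
```

In this tiny instance (name, city) also happens to be unique, which shows why keys must be chosen from the semantics of the enterprise: data can only rule a key out, never prove that it will hold for all future rows.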
Keys for Relationship Sets
The combination of primary keys of the participating entity sets forms a super key of a
relationship set.
(customer-id, account-number) is a super key of depositor
NOTE: this means a pair of entities can have at most one relationship in a particular
relationship set.
• E.g. if we wish to track all access-dates to each account by each customer, we cannot
assume a relationship for each access. We can use a multivalued attribute though
Must consider the mapping cardinality of the relationship set when deciding what are the
candidate keys
Need to consider semantics of relationship set in selecting the primary key in case of more than
one candidate key
E-R Diagram with a Ternary Relationship
Design Issues
Use of entity sets vs. attributes
Choice mainly depends on the structure of the enterprise being modeled, and on the
semantics associated with the attribute in question.
Use of entity sets vs. relationship sets
Possible guideline is to designate a relationship set to describe an action that occurs
between entities
Binary versus n-ary relationship sets
Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a
number of distinct binary relationship sets, an n-ary relationship set shows more clearly that
several entities participate in a single relationship.
Placement of relationship attributes
Binary Vs. Non-Binary Relationships
Some relationships that appear to be non-binary may be better represented using binary
relationships
E.g. A ternary relationship parents, relating a child to his/her father and mother, is best replaced
by two binary relationships, father and mother
• Using two binary relationships allows partial information (e.g. only the mother being
known)
But there are some relationships that are naturally non-binary
• E.g. works-on
Converting Non-Binary Relationships to Binary Form
In general, any non-binary relationship can be represented using binary relationships by creating
an artificial entity set.
Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
Create a special identifying attribute for E
Add any attributes of R to E
For each relationship (ai , bi , ci) in R, create
1. a new entity ei in the entity set E
2. add (ei , ai ) to RA
3. add (ei , bi ) to RB
4. add (ei , ci ) to RC
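The conversion steps above can be sketched as follows. This is only an illustration: the identifiers e1, e2, ... stand in for the special identifying attribute of E, and the sample works-on triples are hypothetical.

```python
def to_binary(r):
    """Replace a ternary relationship set R by entity set E and RA, RB, RC.

    r is a list of (a, b, c) triples. The generated names e1, e2, ...
    play the role of the special identifying attribute of E.
    """
    e, ra, rb, rc = [], [], [], []
    for i, (a, b, c) in enumerate(r, start=1):
        ei = f"e{i}"          # 1. a new entity ei in the entity set E
        e.append(ei)
        ra.append((ei, a))    # 2. add (ei, ai) to RA
        rb.append((ei, b))    # 3. add (ei, bi) to RB
        rc.append((ei, c))    # 4. add (ei, ci) to RC
    return e, ra, rb, rc


def rebuild(e, ra, rb, rc):
    """Recover the original triples by following each ei through RA, RB, RC."""
    a_of, b_of, c_of = dict(ra), dict(rb), dict(rc)
    return [(a_of[ei], b_of[ei], c_of[ei]) for ei in e]


# Hypothetical works-on triples: (employee, branch, job).
works_on = [("Jones", "Downtown", "teller"), ("Jones", "Uptown", "auditor")]
e, ra, rb, rc = to_binary(works_on)
print(rebuild(e, ra, rb, rc) == works_on)  # True
```

Because every ei appears exactly once in each of RA, RB and RC, the original triples can be reassembled without loss.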
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.
Generalization
A bottom-up design process – combine a number of entity sets that share the same features into a
higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an
E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.
Can have multiple specializations of an entity set based on different features.
• E.g. permanent-employee vs. temporary-employee, in addition to
officer vs. secretary vs. teller
• Each particular employee would be
a member of one of permanent-employee or temporary-employee,
and also a member of one of officer, secretary, or teller
• The ISA relationship is also referred to as the superclass-subclass relationship
Design Constraints on a Specialization/Generalization
Constraint on which entities can be members of a given lower-level entity set.
Condition-defined
• E.g. all customers over 65 years are members of the senior-citizen entity set;
senior-citizen ISA person.
User-defined
Constraint on whether or not entities may belong to more than one lower-level
entity set within a single generalization.
Disjoint
• an entity can belong to only one lower-level entity set
• Noted in E-R diagram by writing disjoint next to the ISA triangle
Overlapping
• an entity can belong to more than one lower-level entity set
Completeness constraint -- specifies whether or not an entity in the higher-level entity set
must belong to at least one of the lower-level entity sets within a generalization.
total : an entity must belong to one of the lower-level entity sets
partial: an entity need not belong to one of the lower-level entity sets
Aggregation
Consider the ternary relationship works-on, which we saw earlier
Suppose we want to record managers for tasks performed by an employee at a branch
Relationship sets works-on and manages represent overlapping information
Every manages relationship corresponds to a works-on relationship
However, some works-on relationships may not correspond to any manages relationships
• So we can’t discard the works-on relationship
Eliminate this redundancy via aggregation
Treat relationship as an abstract entity
Allows relationships between relationships
Abstraction of relationship into new entity
Without introducing redundancy, the following diagram represents:
An employee works on a particular job at a particular branch
An employee, branch, job combination may have an associated manager
E-R Diagram for a Banking Enterprise
NORMALIZATION
The process of restructuring, adjusting or organizing the logical data model of a database
is termed normalization, or decomposition of a relation.
Normalization usually involves dividing larger tables into smaller tables and defining the
relationships between them.
The normal forms provide criteria for determining a table’s degree of vulnerability to
logical database inconsistencies and anomalies.
The higher the normal form, the lower the vulnerability to inconsistencies and anomalies.
NEED FOR NORMALIZATION
To overcome the pitfalls of a poor database design such as redundancy, dependency and
anomalies, normalization of complex relations is performed.
Normalization is basically performed to
– Eliminate redundancy
– Reduce anomalies
– Organize data efficiently for faster retrieval
– Improve the overall performance of the DBMS
Normalization was first proposed by Edgar F. Codd in the early 1970s. Hence the normal forms are
sometimes referred to as Codd’s Laws.
The above example is not in 1NF because it does not contain atomic values: it contains
multiple values for an attribute.
Solution:
One solution to the above problem is to store each of the multiple values in its own row.
User_ID can no longer act as a primary key, since it does not uniquely identify a row.
This introduces extra redundancy, even though atomicity is achieved.
User_ID   Name    Phone          Email ID
101       Arjun   403-555-1717   [email protected]
101       Arjun   403-555-1919   [email protected]
125       Ajay    403-555-1919   [email protected]
125       Ajay    NULL           [email protected]
143       Arav    403-555-1919   [email protected]
143       Arav    403-555-1111   NULL
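The row-splitting solution can be sketched as below. This is a simplified illustration that flattens a single multi-valued attribute; the un-normalized input rows are an assumption (only the phone numbers are borrowed from the table above), and the function name to_1nf is hypothetical.

```python
def to_1nf(rows, multivalued):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        values = row[multivalued] or [None]  # keep rows that have no value at all
        for value in values:
            atomic = dict(row)
            atomic[multivalued] = value
            flat.append(atomic)
    return flat


# Hypothetical un-normalized rows: Phone holds a list of values.
users = [
    {"User_ID": 101, "Name": "Arjun", "Phone": ["403-555-1717", "403-555-1919"]},
    {"User_ID": 125, "Name": "Ajay", "Phone": ["403-555-1919"]},
]
flat = to_1nf(users, "Phone")
print([row["User_ID"] for row in flat])  # [101, 101, 125]
```

User_ID 101 now appears twice, so it no longer identifies a row by itself; in this sketch the pair (User_ID, Phone) does.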
---------------------------------------------------------------------------------------------------------------------
SECOND NORMAL FORM (2NF)
A relation R is in second normal form (2NF) if and only if the relation R is in 1NF and every
non-prime attribute is fully functionally dependent on the primary key of R.
No Partial Dependency should exist.
FUNCTIONAL DEPENDENCY (FD)
Full Functional Dependency
A functional dependency X → Y is said to be a full functional dependency if Y can no longer be
determined once any attribute A is removed from X, i.e., (X – {A}) ↛ Y for every A in X.
Partial Dependency
A functional dependency X → Y is said to be partial if Y can still be determined after some
attribute A is removed from X, i.e., (X – {A}) → Y for some A in X.
Example:
• Attribute Grade is fully functionally dependent on the primary key (Sid, Course-
id) because both parts of the primary keys are needed to determine Grade.
• On the other hand both Sname, and Phone attributes are not fully functionally dependent
on the primary key, because only a part of the primary key namely Sid is enough to
determine both Sname and Phone.
• Similarly the attributes Credit-hours and Course-Description are not fully functionally
dependent on the primary key because only Course-id is enough to determine their
values.
• Hence Sname, Phone, Credit-hours and Course-Description are partially dependent on the primary key.
Decomposition into 2NF:
• Student_details(Sid, Sname, Phone)
• Grade_details (Sid, Course-id, Grade)
• Course_details (Course-id, Course-description, Credit-hours)
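The full and partial dependencies in the student example can be tested mechanically on sample data. The sketch below is only illustrative: the rows are hypothetical, only some attributes are included, and a check on sample rows can disprove a dependency but never prove that it holds for every possible instance.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full(rows, key, rhs):
    """True if key -> rhs holds and fails once any one attribute is dropped."""
    if not holds(rows, key, rhs):
        return False
    return not any(
        holds(rows, [a for a in key if a != dropped], rhs) for dropped in key
    )


# Hypothetical rows for the student example.
enrolment = [
    {"Sid": 1, "Course-id": "C1", "Grade": "A", "Sname": "Asha"},
    {"Sid": 1, "Course-id": "C2", "Grade": "B", "Sname": "Asha"},
    {"Sid": 2, "Course-id": "C1", "Grade": "B", "Sname": "Ravi"},
]
key = ["Sid", "Course-id"]
print(is_full(enrolment, key, ["Grade"]))  # True: both parts of the key are needed
print(is_full(enrolment, key, ["Sname"]))  # False: Sid alone determines Sname
```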
Example2:
Let us consider an inventory relation as shown below,
Inventory (supplier_no, status, city, part_no, quantity)
Functional Dependencies:
• (supplier_no, part_no) → quantity (composite primary key)
• (supplier_no) → status
• (supplier_no) → city
• city → status (supplier's status is determined by location)
Comments:
• Non-key attributes are not mutually independent (city → status).
• Non-key attributes are not fully functionally dependent on the primary key (i.e., status
and city are dependent on just part of the key, namely supplier_no).
The above relation is in 1NF but not in 2NF so it is decomposed into two relations as,
• SUPPLIER_DETAIL (supplier_no, status, city)
• SUPPLIER_PART (supplier_no, part_no, quantity) ----- Now in 2NF
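One way to find the partial dependencies in the Inventory relation is to compute the closure of each proper part of the primary key under the listed functional dependencies. The sketch below uses the standard attribute-closure procedure; the function and variable names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Functional dependencies of Inventory, as (determinant, dependent) pairs.
FDS = [
    (("supplier_no", "part_no"), ("quantity",)),
    (("supplier_no",), ("status",)),
    (("supplier_no",), ("city",)),
    (("city",), ("status",)),
]
KEY = ("supplier_no", "part_no")
NON_PRIME = ("status", "city", "quantity")


def partial_dependencies(key, non_prime, fds):
    """Map each part of the key to the non-prime attributes it determines alone."""
    found = {}
    for dropped in key:
        part = tuple(a for a in key if a != dropped)
        determined = sorted(closure(part, fds) & set(non_prime))
        if determined:
            found[part] = determined
    return found


print(partial_dependencies(KEY, NON_PRIME, FDS))
# {('supplier_no',): ['city', 'status']}
```

The result matches the comment above: status and city depend on supplier_no alone, so they move to SUPPLIER_DETAIL, while quantity stays with the full key in SUPPLIER_PART.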
---------------------------------------------------------------------------------------------------------------------
Third Normal Form (3NF)
A relation R is in third normal form (3NF) if and only if the relation R is in 2NF and
every non-key attribute is non-transitively dependent on the primary key.
Eliminates Transitive Dependency
A relation R with more than one candidate key will clearly have transitive dependencies
of the form: primary_key → other_candidate_key → any_non-key_column
A relation R having just one candidate key is in third normal form (3NF) if and only if
the non-prime attributes of R (if any) are:
1) Mutually independent, and
2) Fully functionally dependent on the primary key of R.
Example (2NF but not 3NF):
SUPPLIER_DETAIL (supplier_no, status, city)
• Functional Dependencies:
– supplier_no → status
– supplier_no → city
– city → status
Comments:
– Lacks mutual independence among non-key attributes (supplier_no → city, city →
status).
Decomposition (into 3NF):
– SUPPLIER_CITY (supplier_no, city)
– CITY_STATUS (city, status)
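A quick way to see that this decomposition loses no information is to project some sample rows onto the two new relations and join them back on city. The rows below are hypothetical and were chosen so that city → status holds.

```python
# Hypothetical SUPPLIER_DETAIL rows: (supplier_no, status, city).
supplier_detail = {
    ("S1", 20, "London"),
    ("S2", 10, "Paris"),
    ("S3", 10, "Paris"),
    ("S4", 20, "London"),
}

# Projections onto the two 3NF relations.
supplier_city = {(no, city) for no, _, city in supplier_detail}
city_status = {(city, status) for _, status, city in supplier_detail}

# Natural join of the projections on the shared attribute city.
rejoined = {
    (no, status, city)
    for no, city in supplier_city
    for other_city, status in city_status
    if city == other_city
}

print(rejoined == supplier_detail)  # True: the join restores the original rows
print(len(city_status))             # 2: each city's status is now stored once
```

Since each city's status is stored in exactly one row of CITY_STATUS, changing it no longer requires touching every supplier located in that city.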
---------------------------------------------------------------------------------------------------------------------
BOYCE CODD NORMAL FORM (BCNF)
A relation R is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a
candidate key
Eliminates dependencies whose determinant is not a candidate key
The definition of 3NF does not adequately deal with a relation that has multiple candidate keys,
where
o those candidate keys are composite, and
o the candidate keys overlap (i.e., have at least one common attribute)
Example (3NF but not BCNF)
SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity)
Functional Dependencies:
We assume that supplier_name's are always unique to each supplier. Thus we have two
candidate keys:
(supplier_no, part_no) and (supplier_name, part_no)
Thus we have the following dependencies:
o (supplier_no, part_no) → quantity
o (supplier_no, part_no) → supplier_name
o (supplier_name, part_no) → quantity
o (supplier_name, part_no) → supplier_no
o supplier_name → supplier_no
o supplier_no → supplier_name
Decomposition (into BCNF):
o SUPPLIER_ID (supplier_no, supplier_name)
o SUPPLIER_PARTS (supplier_no, part_no, quantity)
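The BCNF condition, that every determinant is a candidate key, can be checked with attribute closures. The sketch below is a simplification: it only examines the listed dependencies that lie entirely inside a relation, rather than computing every dependency projected onto it, and the function names are assumptions.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs inside the relation whose determinant is not a super key."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        applies = set(lhs) <= relation and set(rhs) <= relation
        trivial = set(rhs) <= set(lhs)
        if applies and not trivial and not relation <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


FDS = [
    (("supplier_no", "part_no"), ("quantity",)),
    (("supplier_no", "part_no"), ("supplier_name",)),
    (("supplier_name", "part_no"), ("quantity",)),
    (("supplier_name", "part_no"), ("supplier_no",)),
    (("supplier_name",), ("supplier_no",)),
    (("supplier_no",), ("supplier_name",)),
]

print(bcnf_violations(("supplier_no", "supplier_name", "part_no", "quantity"), FDS))
# [(('supplier_name',), ('supplier_no',)), (('supplier_no',), ('supplier_name',))]
print(bcnf_violations(("supplier_no", "supplier_name"), FDS))        # []
print(bcnf_violations(("supplier_no", "part_no", "quantity"), FDS))  # []
```

The two reported dependencies have determinants (supplier_name and supplier_no) that are not super keys of the original relation, which is why it fails BCNF; both decomposed relations pass the check.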
---------------------------------------------------------------------------------------------------------------------
Fourth Normal Form (4NF)
A relation is said to be in 4NF if and only if it is in BCNF (and therefore in 3NF) and no
non-trivial multi-valued dependencies exist in the relation.
Eliminates Multivalued Dependency
A Multi-valued Dependency (MVD) exists when there are at least three attributes
A, B and C in a relation, such that,
For each value of A there is a well-defined set of values for B and a well-defined
set of values for C
But the set of values of B is independent of the set of values of C
A multi-determines B (B is multi-dependent on A), written A →→ B, if and only if the set of B
values matching an (A, C) pair depends only on the A value.
Example:
employee →→ Skills
employee →→ Hobby
Employee Skills Hobby
1 Programming Golf
1 Programming Bowling
1 Analysis Golf
1 Analysis Bowling
2 Analysis Golf
2 Analysis Gardening
2 Management Golf
2 Management Gardening
Decomposition:
Table1: Skills (employee, skills)
Employee Skills
1 Programming
1 Analysis
2 Analysis
2 Management
Table2: Hobbies (employee, hobby)
Employee Hobby
1 Golf
1 Bowling
2 Golf
2 Gardening
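The multi-valued dependencies in this example can be confirmed by checking that the original table is exactly the join of its two projections on employee. The sketch below uses the rows from the example; the function name mvd_holds is hypothetical.

```python
# Rows of the example table: (employee, skill, hobby).
employee_table = {
    (1, "Programming", "Golf"),
    (1, "Programming", "Bowling"),
    (1, "Analysis", "Golf"),
    (1, "Analysis", "Bowling"),
    (2, "Analysis", "Golf"),
    (2, "Analysis", "Gardening"),
    (2, "Management", "Golf"),
    (2, "Management", "Gardening"),
}


def mvd_holds(rows):
    """True if the rows equal the join of their (employee, skill) and
    (employee, hobby) projections, i.e. employee ->-> skill holds."""
    skills = {(e, s) for e, s, _ in rows}
    hobbies = {(e, h) for e, _, h in rows}
    rejoined = {(e, s, h) for e, s in skills for e2, h in hobbies if e == e2}
    return rejoined == rows


print(mvd_holds(employee_table))  # True: skills and hobbies are independent

# Dropping one combination breaks the independence of skills and hobbies.
print(mvd_holds(employee_table - {(1, "Analysis", "Bowling")}))  # False
```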
---------------------------------------------------------------------------------------------------------------------
Fifth Normal Form (5NF) or Project Join Normal Form (PJNF)
A table is in Fifth Normal Form (5NF), or Project-Join Normal Form (PJNF), if it is in 4NF and it
cannot be losslessly decomposed any further into smaller tables.
When projection and join operations are performed on two or more tables, they should not create spurious
tuples, i.e. irrelevant tuples that are not present in the base table.
Eliminates Join Dependency
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class
for Semester 2. In this case, the combination of all three fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it,
so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't
leave the other two columns blank. So to bring the above table into 5NF, we can decompose it into three relations P1,
P2 and P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
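The role of the third projection can be checked directly on the example data. In the sketch below, joining only P1 and P2 produces spurious tuples, and filtering the result through P3 removes them again; the variable names are assumptions.

```python
# Rows of the example table: (subject, lecturer, semester).
timetable = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, sub) for sub, _, sem in timetable}
p2 = {(sub, lec) for sub, lec, _ in timetable}
p3 = {(sem, lec) for _, lec, sem in timetable}

# Join P1 and P2 on subject, then keep only rows whose (semester, lecturer) is in P3.
two_way = {(sub, lec, sem) for sem, sub in p1 for sub2, lec in p2 if sub == sub2}
three_way = {row for row in two_way if (row[2], row[1]) in p3}

print(sorted(two_way - timetable))
# [('Math', 'Akash', 'Semester 1'), ('Math', 'John', 'Semester 2')]
print(three_way == timetable)  # True
```

The two extra rows from the two-way join are exactly the spurious tuples described above; only the join of all three projections reproduces the base table.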
Anomalies
Anomalies in a DBMS arise when not all of the required changes to redundant data are made
successfully. They are of three types: insert, delete and update.
INSERT Anomaly in Database
An insert anomaly occurs when some attributes cannot be inserted into the database without the presence of
other attributes, usually when a child row is inserted without its parent.
Jerry is a new student with department id 6. There is no department with Dept_ID 6, hence the
anomaly. The expected behaviour is that a department with id 6 is created first, and only then can a student be
assigned to it.
An insertion anomaly is the inability to add data to the database due to the absence of other data.
UPDATE Anomaly in Database
An update anomaly occurs when duplicated data is updated in one instance but not in all the instances where it was duplicated.
See below: the English department now has Dept_ID 8, but unfortunately this was
not updated in the Student table.
DELETE Anomaly in Database
Now if someone decides to delete the Computer Science department, they may end up deleting the data of all
students who belonged to the Computer Science department. In other words, a deletion anomaly occurs when deleting some attribute
causes the unintended deletion of other attributes.
These anomalies are addressed by normalization.
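The three situations described above can be reproduced with Python's built-in sqlite3 module. This is only a sketch: the table and column names, the student Tom and the department id 5 are assumptions, and what it demonstrates is how foreign-key constraints react in each case rather than normalization itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " student_name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id) ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO department VALUES (5, 'English')")
conn.execute("INSERT INTO student VALUES ('Tom', 5)")

# Insert: Jerry refers to department 6, which does not exist.
try:
    conn.execute("INSERT INTO student VALUES ('Jerry', 6)")
    jerry_inserted = True
except sqlite3.IntegrityError:
    jerry_inserted = False

# Update: the new id is propagated to the student table by ON UPDATE CASCADE.
conn.execute("UPDATE department SET dept_id = 8 WHERE dept_name = 'English'")
toms_dept = conn.execute(
    "SELECT dept_id FROM student WHERE student_name = 'Tom'"
).fetchone()[0]

# Delete: removing a department that still has students is rejected.
try:
    conn.execute("DELETE FROM department WHERE dept_id = 8")
    department_deleted = True
except sqlite3.IntegrityError:
    department_deleted = False

print(jerry_inserted, toms_dept, department_deleted)  # False 8 False
```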