
Unit 2

The document covers database design principles, focusing on the Entity-Relationship (E-R) model, E-R diagrams, and normalization processes. It explains key concepts such as entity sets, attributes, relationship sets, and various normal forms to eliminate redundancy and anomalies in database design. Additionally, it discusses advanced E-R features like specialization, generalization, and aggregation, along with design decisions and the importance of normalization in creating efficient databases.

Uploaded by

santha.naga63
Copyright
© © All Rights Reserved

UNIT II DATABASE DESIGN

Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational Mapping – Functional Dependencies – Non-loss Decomposition – First, Second, Third Normal Forms, Dependency Preservation – Boyce/Codd Normal Form – Multi-valued Dependencies and Fourth Normal Form – Join Dependencies and Fifth Normal Form
Entity-Relationship Model
Entity Sets
A database can be modeled as a collection of entities and the relationships among those entities.
An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant
Entities have attributes
Example: people have names and addresses
An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays
Attributes
An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
Example:
customer = (customer-id, customer-name, customer-street, customer-city)
loan = (loan-number, amount)
Domain – the set of permitted values for each attribute
Attribute types:
Simple and composite attributes
Single-valued and multivalued attributes (e.g. multivalued attribute: phone-numbers)
Derived attributes, which can be computed from other attributes (e.g. age, given date of birth)
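The attribute types above can be sketched in Python. This is only an illustration, not part of the E-R notation: the Address class, the phone_numbers list, the age method and the sample values are all assumptions made for the example. The composite attribute becomes a nested record, the multivalued attribute becomes a list, and the derived attribute is computed rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: made up of the simple attributes street and city."""
    street: str
    city: str


@dataclass
class Customer:
    customer_id: str        # simple, single-valued attribute
    customer_name: str      # simple, single-valued attribute
    address: Address        # composite attribute
    date_of_birth: date
    # Multivalued attribute: zero or more phone numbers per customer.
    phone_numbers: list[str] = field(default_factory=list)

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
customer = Customer(
    "C-1", "Hayes", Address("Main", "Harrison"),
    date(1990, 6, 15), ["555-0100", "555-0101"],
)
```

Storing age directly would duplicate information already implied by date_of_birth, which is why it is modeled as a derived attribute.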
Relationship Sets
A relationship set is a collection of relationships that share the same characteristics.
A relationship is an association among several entities
Example: (Hayes, A-102) ∈ depositor, where Hayes is a customer entity, depositor is the relationship set, and A-102 is an account entity.
A relationship set is a mathematical relation among n ≥ 2 entities, each taken from an entity set:
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship.
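To make the set-theoretic definition concrete, here is a small Python sketch in which a relationship set is a set of tuples drawn from the Cartesian product of the participating entity sets. Only Hayes and A-102 come from the example above; the other values are assumptions added for illustration.

```python
from itertools import product

# Entity sets E1 and E2 (Jones and A-215 are hypothetical sample values).
customers = {"Hayes", "Jones"}
accounts = {"A-102", "A-215"}

# The relationship set depositor: each element is one relationship (e1, e2).
depositor = {("Hayes", "A-102"), ("Jones", "A-215")}

# Every relationship must take e1 from customers and e2 from accounts,
# so depositor has to be a subset of the Cartesian product.
all_pairs = set(product(customers, accounts))
is_valid_relationship_set = depositor <= all_pairs
```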
Degree of a Relationship Set
Refers to the number of entity sets that participate in a relationship set.
Relationship sets that involve two entity sets are binary (or degree two). Most relationship sets in a database system are binary.
Relationship sets may involve more than two entity sets, but such relationship sets are rare.
Mapping Cardinalities
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
One to one
One to many
Many to one
Many to many
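The four cases can be told apart by counting, for a given instance of a binary relationship set, how many partners each entity has on either side. The function below is an illustrative sketch and its name and sample data are assumptions. A single instance can only show which cardinalities it is consistent with; the actual constraint is a design decision about all legal instances.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify an instance of a binary relationship set from A to B."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B entities linked to each a
    per_b = Counter(b for _, b in pairs)  # number of A entities linked to each b
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"   # one A entity is linked to many B entities
    if b_has_many:
        return "many-to-one"   # many A entities are linked to one B entity
    return "one-to-one"


# Hypothetical borrower instances relating customers to loans.
one_to_many = {("c1", "l1"), ("c1", "l2"), ("c2", "l3")}
many_to_many = {("c1", "l1"), ("c1", "l2"), ("c2", "l1")}
```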
E-R Diagrams

Rectangles represent entity sets.


Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes
E-R Diagram With Composite, Multivalued, and Derived Attributes

Roles
The entity sets of a relationship need not be distinct. For example, the works-for relationship set relates the employee entity set to itself.
The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify the semantics of the relationship.

Cardinality Constraints
We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an
undirected line (—), signifying “many,” between the relationship set and the entity set.
E.g.: One-to-one relationship:
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower

One-To-Many Relationship
In a one-to-many relationship, a loan is associated with at most one customer via borrower, while a customer is associated with several (including 0) loans via borrower.

Many-To-One Relationships
In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, while a customer is associated with at most one loan via borrower.

Many-To-Many Relationship

A customer is associated with several (possibly 0) loans via borrower


A loan is associated with several (possibly 0) customers via borrower

Participation of an Entity Set in a Relationship Set


Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set.
E.g. participation of loan in borrower is total: every loan must have a customer associated with it via borrower.
Partial participation: some entities may not participate in any relationship in the relationship set.
E.g. participation of customer in borrower is partial.

Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine
each entity.
A candidate key of an entity set is a minimal super key
customer-id is a candidate key of customer
account-number is a candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the
primary key.
Keys for Relationship Sets
The combination of primary keys of the participating entity sets forms a super key of a
relationship set.
(customer-id, account-number) is the super key of depositor.
NOTE: this means a pair of entities can have at most one relationship in a particular relationship set.
E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute, though.
We must consider the mapping cardinality of the relationship set when deciding what the candidate keys are.
We need to consider the semantics of the relationship set when selecting the primary key if there is more than one candidate key.
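The definitions of super key and candidate key can be tried out on sample data. The sketch below is illustrative only: the function names and the three customer rows are assumptions. Keep in mind that a key is a constraint on every legal instance, so a sample can rule a key out but can never prove one.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, all_attrs):
    """Minimal super keys of this particular instance, smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_smaller_key and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical customer rows.
attrs = ["customer_id", "customer_name", "customer_city"]
rows = [
    {"customer_id": "C1", "customer_name": "Hayes", "customer_city": "Harrison"},
    {"customer_id": "C2", "customer_name": "Jones", "customer_city": "Harrison"},
    {"customer_id": "C3", "customer_name": "Hayes", "customer_city": "Rye"},
]
keys = candidate_keys(rows, attrs)
```

On this sample, (customer_name, customer_city) also comes out as minimal and unique, but only by accident: two customers with the same name could later live in the same city. This is why the semantics of the enterprise, not the data at hand, decide which keys are declared.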
E-R Diagram with a Ternary Relationship

Design Issues
• Use of entity sets vs. attributes
Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question.
• Use of entity sets vs. relationship sets
A possible guideline is to designate a relationship set to describe an action that occurs between entities.
• Binary versus n-ary relationship sets
Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.
• Placement of relationship attributes
Binary Vs. Non-Binary Relationships
Some relationships that appear to be non-binary may be better represented using binary
relationships
E.g. a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother.
Using two binary relationships allows partial information (e.g. only the mother being known).
But there are some relationships that are naturally non-binary.
E.g. works-on
Converting Non-Binary Relationships to Binary Form
In general, any non-binary relationship can be represented using binary relationships by creating
an artificial entity set.

Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
Create a special identifying attribute for E.
Add any attributes of R to E.
For each relationship (ai, bi, ci) in R:
1. create a new entity ei in the entity set E
2. add (ei, ai) to RA
3. add (ei, bi) to RB
4. add (ei, ci) to RC

We also need to translate constraints.
Translating all constraints may not be possible.
There may be instances in the translated schema that cannot correspond to any instance of R.
Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C.
We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets.
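The conversion procedure above can be sketched in a few lines of Python. The function names, the generated identifiers e1, e2, … and the sample works-on tuples are assumptions for illustration. The rebuild function shows that the original ternary relationship set can be recovered from E, RA, RB and RC, provided every artificial entity is linked to exactly one entity on each side.

```python
def to_binary(r):
    """Replace a ternary relationship set r by E and the binary sets RA, RB, RC."""
    e_set, ra, rb, rc = [], set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(r), start=1):
        e = f"e{i}"        # special identifying attribute of the new entity
        e_set.append(e)
        ra.add((e, a))
        rb.add((e, b))
        rc.add((e, c))
    return e_set, ra, rb, rc


def rebuild(e_set, ra, rb, rc):
    """Recover the ternary relationship set from the binary representation."""
    a_of, b_of, c_of = dict(ra), dict(rb), dict(rc)
    return {(a_of[e], b_of[e], c_of[e]) for e in e_set}


# Hypothetical works-on relationships: (employee, branch, job).
works_on = {
    ("Jones", "Downtown", "teller"),
    ("Hayes", "Downtown", "auditor"),
}
e_set, ra, rb, rc = to_binary(works_on)
```

Nothing in RA, RB or RC by itself prevents an artificial entity from being linked to two employees or to none, which is the kind of constraint the exercise above asks for.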
Weak Entity Sets
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set.
It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set.
Identifying relationship depicted using a double diamond
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes
among all the entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on
which the weak entity set is existence dependent, plus the weak entity set’s discriminator.
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment-number – discriminator of the payment entity set
Primary key for payment – (loan-number, payment-number)

Extended E-R Features


Specialization
Top-down design process; we designate subgroupings within an entity set that are distinctive
from other entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in
relationships that do not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (E.g. customer “is a” person).

Attribute inheritance – a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.

Generalization
A bottom-up design process – combine a number of entity sets that share the same features into a
higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an
E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.
Can have multiple specializations of an entity set based on different features.
E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller.
Each particular employee would be a member of one of permanent-employee or temporary-employee, and also a member of one of officer, secretary, or teller.
The ISA relationship is also referred to as a superclass-subclass relationship.
Design Constraints on a Specialization/Generalization
Constraint on which entities can be members of a given lower-level entity set:
Condition-defined
E.g. all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.
User-defined
Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization:
Disjoint
An entity can belong to only one lower-level entity set.
Noted in the E-R diagram by writing disjoint next to the ISA triangle.
Overlapping
An entity can belong to more than one lower-level entity set.
Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
Total: an entity must belong to one of the lower-level entity sets.
Partial: an entity need not belong to one of the lower-level entity sets.
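The disjointness and completeness constraints can be checked mechanically once membership of the lower-level entity sets is known. The following sketch uses plain Python sets; the function names and the sample employees are assumptions for illustration.

```python
def is_disjoint(lower_sets):
    """True if no entity belongs to more than one lower-level entity set."""
    seen = set()
    for members in lower_sets.values():
        if seen & members:
            return False
        seen |= members
    return True


def is_total(higher_set, lower_sets):
    """True if every higher-level entity belongs to some lower-level set."""
    covered = set().union(*lower_sets.values())
    return higher_set <= covered


# Hypothetical employees specialized into officer, teller and secretary.
employees = {"e1", "e2", "e3"}
by_job = {"officer": {"e1"}, "teller": {"e2"}, "secretary": set()}
```

Here the specialization is disjoint but only partial, because e3 belongs to none of the lower-level entity sets.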
Aggregation
Consider the ternary relationship works-on, which we saw earlier
Suppose we want to record managers for tasks performed by an employee at a branch

Relationship sets works-on and manages represent overlapping information
Every manages relationship corresponds to a works-on relationship
However, some works-on relationships may not correspond to any manages relationships
So we can’t discard the works-on relationship.
Eliminate this redundancy via aggregation
Treat relationship as an abstract entity
Allows relationships between relationships
Abstraction of relationship into new entity
Without introducing redundancy, the following diagram represents:
An employee works on a particular job at a particular branch
An employee, branch, job combination may have an associated manager

E-R Design Decisions


• The use of an attribute or entity set to represent an object.
• Whether a real-world concept is best expressed by an entity set or a relationship set.
• The use of a ternary relationship versus a pair of binary relationships.
• The use of a strong or weak entity set.
• The use of specialization/generalization – contributes to modularity in the design.
• The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.

E-R Diagram for a Banking Enterprise

Summary of Symbols Used in E-R Notation

NORMALIZATION
• The process of restructuring and organizing the logical data model of a database is termed normalization, or decomposition of a relation.
• Normalization usually involves dividing larger tables into smaller tables and defining the relationships between them.
• The normal forms provide criteria for determining a table’s degree of vulnerability to logical inconsistencies and anomalies.
• The higher the normal form, the lower the vulnerability to inconsistencies and anomalies.
NEED FOR NORMALIZATION
• Normalization of complex relations is performed to overcome the pitfalls of a poor database design, such as redundancy, undesirable dependencies and anomalies.

• Normalization is basically performed to
– Eliminate redundancy
– Reduce anomalies
– Organize data efficiently for faster retrieval
– Improve the overall performance of the DBMS
Normalization was first proposed by Edgar F. Codd in the early 1970s, which is why the normal forms are sometimes referred to as Codd’s laws.

Figure: Normal Forms & its Hierarchy


First Normal Form (1NF)
• A relation R is said to be in First Normal Form (1NF) if and only if all its attributes contain atomic values only.
• A table should follow these 4 rules to be in first normal form:
o The table must contain atomic values only. Atomic means a single, indivisible value; a value can be NULL.
o A column should contain values that are of the same type.
o Each column should have a unique name.
o The order of the columns doesn’t matter.
User_ID | Name  | Phone                      | Email ID
101     | Arjun | 403-555-1717, 403-555-1919 | [email protected], [email protected]
125     | Ajay  | 403-555-1919               | [email protected], [email protected]
143     | Arav  | 403-555-1919, 403-555-1111 | [email protected]

The above example is not in 1NF because the Phone and Email ID columns hold multiple values for a single row, so their values are not atomic.
Solution:
• One solution to the above problem is to store each of the multiple values in its own row.
• User_ID can no longer act as a primary key, since it does not uniquely identify a row.
• This brings extra redundancy even though atomicity is achieved.
User_ID | Name  | Phone        | Email ID
101     | Arjun | 403-555-1717 | [email protected]
101     | Arjun | 403-555-1919 | [email protected]
125     | Ajay  | 403-555-1919 | [email protected]
125     | Ajay  | NULL         | [email protected]
143     | Arav  | 403-555-1919 | [email protected]
143     | Arav  | 403-555-1111 | NULL
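The row-splitting step can be sketched in Python as follows. The e-mail addresses are obscured in the table above, so the addresses used here are placeholder assumptions, as is the function name to_1nf. Phones and e-mails are paired position by position, and the shorter list is padded with None, which plays the role of NULL.

```python
from itertools import zip_longest

# (User_ID, Name, phones, emails); the e-mail values are placeholders.
unnormalized = [
    (101, "Arjun", ["403-555-1717", "403-555-1919"],
     ["arjun1@example.com", "arjun2@example.com"]),
    (125, "Ajay", ["403-555-1919"],
     ["ajay1@example.com", "ajay2@example.com"]),
    (143, "Arav", ["403-555-1919", "403-555-1111"],
     ["arav1@example.com"]),
]


def to_1nf(rows):
    """Emit one row per (phone, email) pair so that every value is atomic."""
    flat = []
    for user_id, name, phones, emails in rows:
        for phone, email in zip_longest(phones, emails):
            flat.append((user_id, name, phone, email))
    return flat


flat_rows = to_1nf(unnormalized)
```

The result has six rows, and User_ID and Name are repeated in each pair of rows, which is the extra redundancy mentioned above.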
---------------------------------------------------------------------------------------------------------------------

SECOND NORMAL FORM (2NF)
• A relation R is in second normal form (2NF) if and only if R is in 1NF and every non-prime attribute is fully functionally dependent on the primary key of R.
• No partial dependency should exist.
FUNCTIONAL DEPENDENCY (FD)
Full Functional Dependency
• A functional dependency X → Y is said to be a full functional dependency if Y can no longer be determined once any attribute A is removed from X, i.e., (X – {A}) ↛ Y for every A in X.
Partial Dependency
• A functional dependency X → Y is said to be partial if Y can still be determined after some attribute A is removed from X, i.e., (X – {A}) → Y for some A in X.
Example: consider a relation with the attributes Sid, Sname, Phone, Course-id, Course-description, Credit-hours and Grade.
• In this relation, Sid alone does not uniquely identify a row.
• Course-id alone cannot be used to uniquely identify a row either.
• Hence, both Sid and Course-id are needed to uniquely identify a row.
• The key (Sid, Course-id) is said to be a composite key.
• If you know both Sid and Course-id for any student, you will be able to retrieve Sname, Phone, Course-description, Credit-hours and Grade, because these attributes are dependent on the primary key (Sid, Course-id).
The list of functional dependencies is:
• Sid → Sname
• Sid → Phone
• (Sid, Course-id) → Grade
• Course-id → Course-description
• Course-id → Credit-hours

• Attribute Grade is fully functionally dependent on the primary key (Sid, Course-id) because both parts of the primary key are needed to determine Grade.
• On the other hand, the Sname and Phone attributes are not fully functionally dependent on the primary key, because only a part of the primary key, namely Sid, is enough to determine both Sname and Phone.
• Similarly, the attributes Credit-hours and Course-description are not fully functionally dependent on the primary key, because Course-id alone is enough to determine their values.
• Hence Sname, Phone, Credit-hours and Course-description are partially dependent on the primary key.
Decomposition into 2NF:
• Student_details (Sid, Sname, Phone)
• Grade_details (Sid, Course-id, Grade)
• Course_details (Course-id, Course-description, Credit-hours)
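The reasoning above can be mirrored in a short Python sketch that sorts the listed functional dependencies into full and partial ones and then groups attributes by their determinant. It assumes, as in this example, that every determinant is a subset of the primary key; the variable and function names are illustrative.

```python
key = frozenset({"Sid", "Course-id"})

# The functional dependencies listed above, as (determinant, dependent).
fds = [
    ({"Sid"}, "Sname"),
    ({"Sid"}, "Phone"),
    ({"Sid", "Course-id"}, "Grade"),
    ({"Course-id"}, "Course-description"),
    ({"Course-id"}, "Credit-hours"),
]


def split_dependencies(key, fds):
    """Separate attributes that depend on the whole key from the others."""
    full, partial = [], []
    for lhs, rhs in fds:
        if set(lhs) < key:      # proper subset of the key: partial dependency
            partial.append(rhs)
        else:                   # the whole key is needed: full dependency
            full.append(rhs)
    return full, partial


def decompose_2nf(fds):
    """Build one relation per determinant, holding it and its dependents."""
    relations = {}
    for lhs, rhs in fds:
        relations.setdefault(frozenset(lhs), set(lhs)).add(rhs)
    return relations


full, partial = split_dependencies(key, fds)
relations = decompose_2nf(fds)
```

The three relations produced correspond to the Student_details, Grade_details and Course_details relations of the 2NF decomposition.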
Example2:
Let us consider an inventory relation as shown below,
Inventory (supplier_no, status, city, part_no, quantity)

Functional Dependencies:
• (supplier_no, part_no) → quantity (composite primary key)
• supplier_no → status
• supplier_no → city
• city → status (a supplier's status is determined by location)
Comments:
• Non-key attributes are not mutually independent (city → status).
• Non-key attributes are not fully functionally dependent on the primary key (i.e., status and city are dependent on just part of the key, namely supplier_no).
The above relation is in 1NF but not in 2NF, so it is decomposed into two relations, both of which are now in 2NF:
• SUPPLIER_DETAIL (supplier_no, status, city)
• SUPPLIER_PART (supplier_no, part_no, quantity)
---------------------------------------------------------------------------------------------------------------------
Third Normal Form (3NF)
• A relation R is in third normal form (3NF) if and only if the relation R is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
• Eliminates transitive dependency.
• A relation R with more than one candidate key will clearly have transitive dependencies of the form: primary_key → other_candidate_key → any_non-key_column
• A relation R having just one candidate key is in third normal form (3NF) if and only if the non-prime attributes of R (if any) are:
1) Mutually independent, and
2) Fully functionally dependent on the primary key of R.
Example (2NF but not 3NF):
SUPPLIER_DETAIL (supplier_no, status, city)
• Functional Dependencies:
– supplier_no → status
– supplier_no → city
– city → status
Comments:
– Lacks mutual independence among non-key attributes: status depends on supplier_no only transitively (supplier_no → city, city → status).
Decomposition (into 3NF):
– SUPPLIER_CITY (supplier_no, city)
– CITY_STATUS (city, status)
---------------------------------------------------------------------------------------------------------------------
BOYCE CODD NORMAL FORM (BCNF)
• A relation R is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.
• Eliminates every functional dependency whose determinant is not a candidate key.
• The definition of 3NF does not deal with a relation that has multiple candidate keys, where
o those candidate keys are composite, and
o the candidate keys overlap (i.e., have at least one common attribute)
Example (3NF but not BCNF)
SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity)
Functional Dependencies:
• We assume that each supplier_name is unique to one supplier. Thus we have two candidate keys:
(supplier_no, part_no) and (supplier_name, part_no)
• Thus we have the following dependencies:
o (supplier_no, part_no) → quantity
o (supplier_no, part_no) → supplier_name
o (supplier_name, part_no) → quantity
o (supplier_name, part_no) → supplier_no
o supplier_name → supplier_no
o supplier_no → supplier_name
Decomposition (into BCNF):
o SUPPLIER_ID (supplier_no, supplier_name)
o SUPPLIER_PARTS (supplier_no, part_no, quantity)
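Whether every determinant is a key can be checked with the standard attribute-closure computation: a determinant is a super key exactly when its closure contains every attribute of the relation. The sketch below is illustrative; the function names are assumptions, and only a subset of the dependencies listed above is needed because the others follow from them.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial dependencies whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


supplier_part = {"supplier_no", "supplier_name", "part_no", "quantity"}
fds = [
    (frozenset({"supplier_no", "part_no"}), frozenset({"quantity"})),
    (frozenset({"supplier_name", "part_no"}), frozenset({"quantity"})),
    (frozenset({"supplier_name"}), frozenset({"supplier_no"})),
    (frozenset({"supplier_no"}), frozenset({"supplier_name"})),
]
violations = bcnf_violations(supplier_part, fds)

# After decomposition, each relation is checked against its own dependencies.
supplier_id_ok = not bcnf_violations({"supplier_no", "supplier_name"}, fds[2:])
supplier_parts_ok = not bcnf_violations(
    {"supplier_no", "part_no", "quantity"}, fds[:1]
)
```

The two violations found are supplier_name → supplier_no and supplier_no → supplier_name, whose determinants are only parts of the candidate keys.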
---------------------------------------------------------------------------------------------------------------------
Fourth Normal Form (4NF)
• A relation is said to be in 4NF if and only if it is in BCNF and contains no non-trivial multi-valued dependencies other than those whose determinant is a super key.
• Eliminates multi-valued dependency.
• A multi-valued dependency (MVD) exists when there are at least three attributes A, B and C in a relation, such that:
o For each value of A there is a well-defined set of values for B and a well-defined set of values for C.
o But the set of values of B is independent of the set of values of C.
• A multi-determines B (B is multi-dependent on A), written A →→ B, if and only if the set of B values matching an (A, C) pair depends only on the A value.
Example:
employee →→ Skills
employee →→ Hobby
Employee | Skills      | Hobby
1        | Programming | Golf
1        | Programming | Bowling
1        | Analysis    | Golf
1        | Analysis    | Bowling
2        | Analysis    | Golf
2        | Analysis    | Gardening
2        | Management  | Golf
2        | Management  | Gardening
Decomposition:
Table1: Skills (employee, skills)
Employee | Skills
1        | Programming
1        | Analysis
2        | Analysis
2        | Management
Table2: Hobbies (employee, hobby)
Employee | Hobby
1        | Golf
1        | Bowling
2        | Golf
2        | Gardening
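As a quick check of this decomposition, the sketch below projects the original table onto the two smaller tables and joins them back on the employee column. The variable names are illustrative. Because skills and hobbies are independent for each employee, the join returns exactly the original eight rows.

```python
# The original (employee, skill, hobby) table from the example.
rows = {
    (1, "Programming", "Golf"), (1, "Programming", "Bowling"),
    (1, "Analysis", "Golf"), (1, "Analysis", "Bowling"),
    (2, "Analysis", "Golf"), (2, "Analysis", "Gardening"),
    (2, "Management", "Golf"), (2, "Management", "Gardening"),
}

# Projections: Table1 Skills(employee, skills) and Table2 Hobbies(employee, hobby).
skills = {(emp, skill) for emp, skill, _ in rows}
hobbies = {(emp, hobby) for emp, _, hobby in rows}

# Natural join of the two projections on employee.
rejoined = {
    (emp, skill, hobby)
    for emp, skill in skills
    for emp2, hobby in hobbies
    if emp == emp2
}
```

The two projections hold 4 rows each instead of 8, and no information is lost.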
---------------------------------------------------------------------------------------------------------------------

Fifth Normal Form (5NF) or Project Join Normal Form (PJNF)
• A table is in Fifth Normal Form (5NF) or Project-Join Normal Form (PJNF) if it is in 4NF and it cannot be losslessly decomposed any further into smaller tables.
• When projection and join operations are performed on 2 or more tables, they should not create spurious tuples, i.e. irrelevant tuples that are not present in the base table.
• Eliminates join dependency.
Example
SUBJECT   | LECTURER | SEMESTER
Computer  | Anshika  | Semester 1
Computer  | John     | Semester 1
Math      | John     | Semester 1
Math      | Akash    | Semester 2
Chemistry | Praveen  | Semester 1

• In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, the combination of all three fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank. So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER   | SUBJECT
Semester 1 | Computer
Semester 1 | Math
Semester 1 | Chemistry
Semester 2 | Math
P2
SUBJECT   | LECTURER
Computer  | Anshika
Computer  | John
Math      | John
Math      | Akash
Chemistry | Praveen
P3
SEMESTER   | LECTURER
Semester 1 | Anshika
Semester 1 | John
Semester 2 | Akash
Semester 1 | Praveen
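The effect of the three projections can be verified with a small Python sketch (variable names are illustrative). Joining only P1 and P2 produces spurious tuples; filtering that result through P3 removes them and restores the original table.

```python
# The original (subject, lecturer, semester) table from the example.
table = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, sub) for sub, _, sem in table}   # (SEMESTER, SUBJECT)
p2 = {(sub, lec) for sub, lec, _ in table}   # (SUBJECT, LECTURER)
p3 = {(sem, lec) for _, lec, sem in table}   # (SEMESTER, LECTURER)

# Join P1 and P2 on SUBJECT.
p1_p2 = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2
}
spurious = p1_p2 - table

# Join the result with P3 on (SEMESTER, LECTURER).
rejoined = {(sub, lec, sem) for sub, lec, sem in p1_p2 if (sem, lec) in p3}
```

The two spurious tuples claim that Akash teaches Math in Semester 1 and that John teaches Math in Semester 2. Note that on this small sample the join of P2 and P3 alone also happens to reproduce the table, because each lecturer appears in only one semester; a join dependency is a statement about all legal instances, not about one sample.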
Anomalies
Anomalies in a DBMS arise when not all of the required changes to redundant data are made successfully. They are of three types: insert, delete and update.
INSERT Anomaly in Database

An insert anomaly occurs when certain data cannot be inserted into the database without the presence of other data, usually when a child row is inserted without its parent.
• Jerry is a new Student with department id 6, but there is no Department with Dept_ID 6. Hence the anomaly. The expected behaviour is that a department with id 6 is created first, and only then can a Student be assigned to it.
• An insertion anomaly is the inability to add data to the database due to the absence of other data.
UPDATE Anomaly in Database

An update anomaly occurs when duplicated data is updated in one place but not in all the places where it is duplicated. For example, the English department now has Dept_ID 8, but the Student table was not updated to match.
DELETE Anomaly in Database

• Now if someone decides to delete the Computer Science department, they may end up deleting the data of all students in the Computer Science department. In other words, a deletion anomaly occurs when deleting some data causes the unintended loss of other data.
• These anomalies are addressed by normalization.
