DBMS Unit2 Print
Syllabus
Entity-Relationship model - E-R Diagrams - Enhanced-ER Model - ER-to-Relational Mapping
- Functional Dependencies - Non-loss Decomposition - First, Second, Third Normal Forms,
Dependency Preservation - Boyce/Codd Normal Form - Multi-valued Dependencies and Fourth
Normal Form - Join Dependencies and Fifth Normal Form.
Part I: Entity Relationship Model
Introduction to Entity Relationship Model
The Entity-Relationship model is a model for identifying the entities to be represented in the database and
for representing how those entities are related.
ER Model
The ER data model specifies an enterprise schema that represents the overall logical structure of
a database.
The E-R model is very useful in mapping the meanings and interactions of real-world entities
onto a conceptual schema.
The ER model consists of three basic concepts -
1) Entity Sets
• Entity: An entity is an object that exists and is distinguishable from other objects. An entity
can be concrete or abstract. A concrete entity can be a Person, Book or Bank. An abstract entity
can be a holiday or a concept. An entity is represented as a box.
• Entity set: An entity set is a set of entities of the same type. Every entity in an entity set has
the same set of attributes, and that set of attributes distinguishes it from other entity sets.
2) Relationship Sets
A relationship is an association among two or more entities.
A relationship set is a collection of similar relationships. Example - the relationship
works for associates the two entities Employee and Department.
The association between entity sets is called participation. That is, the entity sets E1,
E2,..., En participate in relationship set R. The function that an entity plays in a relationship is
called that entity's role.
3) Attributes
Attributes define the properties of a data object or entity. For example, if student is an
entity, then ID, name, address, date of birth and class are its attributes. The attributes help in
identifying a unique entity. An entity is shown by a rectangular box and attributes are shown in
ovals. The primary key is underlined.
Types of Attributes
1) Simple and Composite Attributes:
i) Simple attributes are attributes that are drawn from atomic value domains.
For example - Name = {Parth}; Age = {23}
ii) Composite attributes are attributes that consist of a hierarchy of attributes. For example -
Address may consist of "Number", "Street" and "Suburb" → Address = {59 + 'JM Road' +
'Shivaji Nagar'}
3) Derived attribute:
Derived attributes are attributes whose values are calculated from other
attributes. A derived attribute is drawn as a dotted ellipse inside a solid ellipse. For
example, Age can be derived from the attribute DateOfBirth. In this situation, DateOfBirth is
called a stored attribute.
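The attribute types above can be sketched in code. This is only an illustration: the class names, the date of birth and the reference date are assumptions chosen so that the derived age matches the Age = {23} example, and are not taken from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Address:
    """Composite attribute: built from the sub-attributes Number, Street, Suburb."""
    number: int
    street: str
    suburb: str


@dataclass
class Person:
    name: str             # simple (atomic) attribute
    address: Address      # composite attribute
    date_of_birth: date   # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical date of birth and reference date, for illustration only.
parth = Person("Parth", Address(59, "JM Road", "Shivaji Nagar"), date(2001, 3, 15))
print(parth.age(date(2024, 6, 1)))  # 23
```

Because the age is computed on demand, it cannot drift out of step with the stored DateOfBirth.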
Mapping Cardinality
Mapping Cardinality represents the number of entities to which another entity can be
associated via a relationship set.
The mapping cardinalities are used in representing the binary relationship sets. Various types
of mapping cardinalities are -
1) One to One: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
2) One to Many: An entity in A is associated with any number of entities in B. An entity in B,
however, can be associated with at most one entity in A.
3) Many to One: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number of entities in A.
4) Many to many: An entity in A is associated with any number (zero or more) of entities in
B, and an entity in B is associated with any number (zero or more) of entities in A.
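As a rough illustration of these definitions, the sketch below classifies one concrete set of (a, b) pairs of a binary relationship. The function name and the sample pairs are hypothetical. Note that it only describes the data it is given, whereas a mapping cardinality in an ER design is a constraint on all possible data.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity in A relate to several in B, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


print(mapping_cardinality([("cust1", "order1"), ("cust1", "order2")]))
print(mapping_cardinality([("stud1", "CS"), ("stud2", "CS")]))
```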
ER DIAGRAMS
An E-R diagram can express the overall logical structure of a database graphically.
E-R diagrams are used to model real-world objects like a person, a car, a company and the relation
between these real-world objects.
ii) One to many: When one entity is associated with more than one entity at a time, there
is a one to many relation. For example - One customer places many orders.
iii) Many to one: When more than one entity is associated with only one entity, there
is a many to one relation. For example - Many students take one ComputerSciCourse.
iv) Many to many: When more than one entity is associated with more than one entity, there
is a many to many relation. For example - Many teachers can teach many students.
Alternate representation can be
TERNARY RELATIONSHIP
A relationship in which three entities are involved is called a ternary relationship. For example -
• Single ternary relation: Suppose the customer buys
products, but the price depends not only on the product but also on the supplier. You would then
need a customerID, a productID, and a supplierID to identify a price.
Weak Entity Set
• A weak entity is an entity that cannot be uniquely identified by its attributes alone. An entity
set which does not have sufficient attributes to form a primary key is called a weak entity set.
• Example of subclasses (ISA hierarchy) - There can be two subclass entity sets, namely Hourly_Emps and Contract_Emps,
which are subclasses of the Employees class. We might have attributes hours_worked and hourly_wage
defined for Hourly_Emps and an attribute contractid defined for Contract_Emps.
Therefore, the attributes defined for an Hourly_Emps entity are the attributes of
Employees plus those of Hourly_Emps. We say that the attributes of the entity set Employees are
inherited by the entity set Hourly_Emps and that Hourly_Emps ISA (read "is a") Employees.
CONSTRAINTS ON SPECIALIZATION/GENERALIZATION
There are four types of constraints on specialization/generalization relationship. These are -
1) Membership constraints: This is a kind of constraints that involves determining which
entities can be members of a given lower-level entity. There are two types of membership
constraints –
i) Condition defined: In condition-defined lower-level entity sets, membership is evaluated
on the basis of whether or not an entity satisfies an explicit condition or predicate.
ii) User defined: In user-defined lower-level entity sets, the membership is assigned manually.
2) Disjoint constraints: The disjoint constraint only applies when a superclass has more than
one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only
one of the subclasses. For example - a Student entity is either a Postgraduate Student or an
Undergraduate Student.
3) Overlapping: An entity can be a member of more than one subclass. For example
- a Person can be both a Student and a Staff member. The keyword "And" can be used to represent this constraint.
4) Completeness: It specifies whether or not an entity in the higher-level entity set must belong
to at least one of the lower-level entity sets within the generalization/specialization.
i) Total generalization or specialization: Each higher-level entity must belong to a lower-level
entity set. For example - an Account in the bank must be either a Savings Account or a Current
Account. The keyword "Mandatory" can be used to represent this constraint.
ii) Partial generalization or specialization: Some higher-level entities may not belong to
any lower-level entity set.
Aggregation
Aggregation is a feature of the entity relationship model that allows a relationship set to participate in
another relationship set. This is indicated on an ER diagram by drawing a dashed box around
the aggregation. Example:
Review Question
1. Explain with suitable example, the constraints of specialization and generalization in ER
modeling. AU: Dec. 19, Marks 7
Examples based on ER Diagram
Example 2.5.1 Draw the ER diagram for banking systems (home loan applications). AU: Dec.-17, Marks 8
OR Draw an ER diagram corresponding to customers and loans. AU: May.-14, Marks 8
OR Write short notes on: E-R diagram for banking system. AU: Dec.-14,
Solution:
Construct an ER model for the car rental company database. AU: Dec.-15, Marks 16
Solution:
ER to Relational Mapping
AU: May-17, Dec.-19, Marks 13
In this section we will discuss how to map various ER model constructs to Relational Model
construct.
Mapping of Entity Set to Relation
• An entity set is mapped to a relation in a straightforward way.
• Each attribute of the entity set becomes an attribute of the table.
• The primary key attribute of the entity set becomes the primary key of the table.
• For example - Consider following ER diagram.
The following SQL statement captures the information for the relationship present in the above ER diagram -
CREATE TABLE Works_In (EmpID CHAR(11),
DeptID CHAR(11), EName CHAR(30), Salary INTEGER,
DeptName CHAR(20), Building CHAR(10),
PRIMARY KEY (EmpID, DeptID),
FOREIGN KEY (EmpID) REFERENCES Employee,
FOREIGN KEY (DeptID) REFERENCES Department)
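The Employee and Department tables referenced above are not shown in the text. A minimal sketch of how the two entity sets might be mapped is given below, run through Python's built-in sqlite3 module so that it can be tried directly. The split of attributes (EName and Salary to Employee, DeptName and Building to Department) and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EmpID  CHAR(11) PRIMARY KEY,   -- key attribute of the entity set
    EName  CHAR(30),
    Salary INTEGER
);
CREATE TABLE Department (
    DeptID   CHAR(11) PRIMARY KEY, -- key attribute of the entity set
    DeptName CHAR(20),
    Building CHAR(10)
);
""")

# Hypothetical sample row, for illustration only.
conn.execute("INSERT INTO Employee VALUES ('E1', 'Asha', 50000)")

# Each attribute of the entity set has become a column of the table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
print(columns)  # ['EmpID', 'EName', 'Salary']
```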
Mapping Relationship Sets (With Constraints) to Tables
• If a relationship set involves n entity sets and some m of them are linked via arrows in the ER
diagram, the key for any one of these m entity sets constitutes a key for the relation to which
the relationship set is mapped.
• Hence we have m candidate keys, and one of these should be designated as the primary key.
• There are two approaches used to convert a relationship set with key constraints into a table.
• Approach 1:
• In this approach the relationship set is represented by a table of its own, separate from the
tables of the participating entity sets. For example - Consider following ER diagram. Each Dept has at
most one manager, according to the key constraint on Manages.
Here the constraint is that each department has at most one manager to manage it. Hence no two
tuples can have the same DeptID, and there can be a separate table named Manages with DeptID
as primary key. The table can be defined using the following SQL statement
CREATE TABLE Manages (EmpID CHAR(11),
DeptID INTEGER,
Since DATE,
PRIMARY KEY (DeptID),
FOREIGN KEY (EmpID) REFERENCES Employees,
FOREIGN KEY (DeptID) REFERENCES Departments)
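To see the key constraint at work, the sketch below creates a cut-down Manages table in an in-memory SQLite database and tries to record a second manager for the same department. The foreign keys are left out so that the sketch is self-contained, and the employee IDs, department number and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Manages (
    EmpID  CHAR(11),
    DeptID INTEGER,
    Since  DATE,
    PRIMARY KEY (DeptID)   -- at most one manager row per department
)""")

conn.execute("INSERT INTO Manages VALUES ('E1', 101, '2020-01-01')")
try:
    # A second manager for department 101 violates the primary key.
    conn.execute("INSERT INTO Manages VALUES ('E2', 101, '2021-01-01')")
    second_manager_accepted = True
except sqlite3.IntegrityError:
    second_manager_accepted = False

print(second_manager_accepted)  # False
```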
Approach 2:
• This approach is generally preferred for translating a relationship set with key constraints.
• It is often considered the better approach because it avoids creating a distinct table for the relationship set.
• The idea is to include the information about the relationship set in the table corresponding
to the entity set with the key, taking advantage of the key constraint.
• This approach eliminates the need for a separate Manages relation, and queries asking for
a department's manager can be answered without combining information from two relations.
• The only drawback to this approach is that space could be wasted if several departments
have no managers.
• The following SQL statement, defining a Dep_Mgr relation that captures the information
in both Departments and Manages, illustrates the second approach to translating relationship
sets with key constraints:
CREATE TABLE Dep_Mgr (DeptID INTEGER,
DName CHAR(20), Budget REAL,
EmpID CHAR(11), Since DATE,
PRIMARY KEY (DeptID),
FOREIGN KEY (EmpID) REFERENCES Employees)
Method 1: All the entities in the relationship are mapped to individual tables
InventoryItem(ID, name)
Book(ID,Publisher)
DVD(ID, Manufacturer)
Method 2: Only subclasses are mapped to tables. The attributes in the superclass are duplicated
in all subclasses. For example -
Book(ID,name, Publisher)
DVD(ID, name, Manufacturer)
Method 3: Only the superclass is mapped to a table. The attributes in the subclasses are taken
to the superclass. For example -
InventoryItem(ID, name, Publisher, Manufacturer)
This method will introduce null values. When we insert a Book record in the table, the
Manufacturer column value will be null. In the same way, when we insert a DVD record in the
table, the Publisher value will be null.
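The null values introduced by Method 3 can be seen in a small sketch using an in-memory SQLite database. The column types and the sample item names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE InventoryItem (
    ID           INTEGER PRIMARY KEY,
    name         TEXT,
    Publisher    TEXT,   -- only meaningful for books
    Manufacturer TEXT    -- only meaningful for DVDs
)""")

# Hypothetical items: one book and one DVD.
conn.execute(
    "INSERT INTO InventoryItem (ID, name, Publisher) VALUES (1, 'Sample Book', 'Sample Press')"
)
conn.execute(
    "INSERT INTO InventoryItem (ID, name, Manufacturer) VALUES (2, 'Sample DVD', 'Sample Studio')"
)

rows = conn.execute(
    "SELECT ID, Publisher, Manufacturer FROM InventoryItem ORDER BY ID"
).fetchall()
print(rows)  # [(1, 'Sample Press', None), (2, None, 'Sample Studio')]
```

SQLite returns SQL NULL as Python's None, so each row carries one unused column.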
Part II: Relational Database Design
Concept of Relational Database Design AU: Dec.-19, Marks 7
• There are two primary goals of relational database design -
i) To store information without unnecessary redundancy,
ii) To allow us to retrieve information easily.
• For achieving these goals, the database design needs to be normalized. That means we have to
check whether the schema is in normal form or not.
• For checking the normal form of the schema, it is necessary to check the functional
dependencies and other data dependencies that exist within the schema.
Functional Dependencies
Definition: Let P and Q be sets of columns. P functionally determines Q, written P→Q,
if and only if any two rows that are equal on (all the attributes in) P must also be equal on (all the
attributes in) Q.
In other words, the functional dependency holds if, for any two rows T1 and T2, T1.P = T2.P implies T1.Q = T2.Q.
For example: Consider a relation in which the roll number of the student and his/her name are stored as
follows:
Here, R->N is true, where R is the roll number and N is the name. That means the functional dependency holds here, because for every
assigned RollNumber of a student there will be a unique name.
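The definition can be checked mechanically on a given set of rows. The sketch below is one simple way to do so; the function name and the sample rows are hypothetical (R stands for the roll number and N for the name). A check on sample data can only refute a dependency, since a functional dependency is a statement about every legal instance of the relation.

```python
def holds(rows, lhs, rhs):
    """True if the dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # Two rows equal on lhs must also be equal on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows: two students share a name but not a roll number.
students = [
    {"R": 1, "N": "AAA"},
    {"R": 2, "N": "BBB"},
    {"R": 3, "N": "AAA"},
]
print(holds(students, ["R"], ["N"]))  # True
print(holds(students, ["N"], ["R"]))  # False
```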
1) Redundant storage: Note that the information about DeptID, DeptName and DeptLoc is
repeated.
2) Update anomalies: In the above table, if we change the DeptLoc from Pune to Chennai in only one row, it will
result in inconsistency, as another row for DeptID 101 still has DeptLoc Pune. Otherwise, we need to update
multiple copies of DeptLoc from Pune to Chennai. Hence this is an update anomaly.
3) Insertion anomalies: In the above table, if we want to add a new tuple, say (5, EEE, 50000), for
DeptID 101, then the information (101, XYZ, Pune) has to be repeated once more.
4) Deletion anomalies: For above table, if we delete a record for EmpID 4, then automatically
information about the DeptID 102,DeptName PQR and DeptLoc Mumbai will get deleted and
one may not be aware about DeptID 102. This causes deletion anomaly.
Decomposition
• Decomposition is the process of breaking down one table into multiple tables.
• Formal definition of decomposition is -
• A decomposition of a relation schema R consists of replacing the relation schema by two (or more)
relation schemas that each contain a subset of the attributes of R and together include all attributes
of R. The instance is stored as its projections onto these schemas.
• For example - Consider the following table
Employee_Department table as follows -
We can decompose the above relation Schema into two relation schemas as Employee (Eid,
Ename, Age, City, Salary) and Department (Deptid, Eid, DeptName) as follows –
Employee Table
Department Table
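Whether such a decomposition loses information can be checked by projecting the original rows onto the two schemas and joining the projections back. The sketch below does this for the Employee and Department schemas named above. The rows are hypothetical, since the original table is not reproduced here, and the helper names are illustrative.

```python
def project(rows, cols):
    """Projection onto cols; using a set removes duplicate rows."""
    return {tuple((c, row[c]) for c in cols) for row in rows}


def natural_join(left, right):
    """Join rows that agree on every shared column."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[c] == rd[c] for c in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


# Hypothetical Employee_Department rows, for illustration only.
emp_dept = [
    {"Eid": 1, "Ename": "AAA", "Age": 30, "City": "Pune", "Salary": 40000,
     "Deptid": 101, "DeptName": "XYZ"},
    {"Eid": 2, "Ename": "BBB", "Age": 28, "City": "Pune", "Salary": 35000,
     "Deptid": 101, "DeptName": "XYZ"},
    {"Eid": 3, "Ename": "CCC", "Age": 35, "City": "Mumbai", "Salary": 50000,
     "Deptid": 102, "DeptName": "PQR"},
]

employee = project(emp_dept, ["Eid", "Ename", "Age", "City", "Salary"])
department = project(emp_dept, ["Deptid", "Eid", "DeptName"])
original = {frozenset(row.items()) for row in emp_dept}

# The join on the shared column Eid gives back exactly the original rows.
print(natural_join(employee, department) == original)  # True
```

The join is non-loss here because the shared attribute Eid is a key of the Employee table.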
Dependency Preservation
• Definition: A decomposition D = {R1, R2, R3, ..., Rn} of R is dependency preserving for a
set F of functional dependencies if (F1 U F2 U ... U Fn)+ = F+, where Fi is the set of
dependencies in F+ that involve only the attributes of Ri.
• If a decomposition is not dependency-preserving, some dependency is lost in the
decomposition.
Example 2.10.4 Consider the relation R (A, B, C) with the functional dependency set {A->B,
B->C}, which is decomposed into two relations R1 = (A, C) and R2 = (B, C). Check whether this
decomposition is dependency preserving or not.
Solution: This can be solved in the following steps:
Step 1: For checking whether the decomposition is dependency preserving or not, we need to
check the following condition
F+ = (F1 U F2)+
Step 2: We have F = {A->B, B->C}. Hence F+ contains A->B, B->C and, by transitivity, A->C.
Step 3: Let us find F1 for relation R1 and F2 for relation R2 by keeping the dependencies of F+
whose attributes all lie in that relation.
Step 4: After eliminating all the trivial and redundant dependencies, we obtain
F1 = {A->C} for R1 = (A, C) and F2 = {B->C} for R2 = (B, C).
Step 5: Under F1 U F2 the closure of A is {A, C}, which does not contain B. So A->B is not in
(F1 U F2)+, that is, F+ ≠ (F1 U F2)+. Hence the given decomposition is not dependency
preserving: the dependency A->B is lost.
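The check in this example can also be done mechanically. The sketch below uses the standard attribute-closure computation and tests each dependency of F against the decomposition without listing the projected sets explicitly. The function names are illustrative, and attributes are written as single letters so that a string such as "AC" stands for the schema (A, C). For R1 = (A, C) and R2 = (B, C) it reports that A->B cannot be derived from the projected dependencies, so that decomposition does not preserve dependencies, while R1 = (A, B), R2 = (B, C) does.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, parts):
    """True if every FD in fds follows from the FDs projected onto parts."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable while staying inside this one schema.
                gained = closure(reach & set(part), fds) & set(part)
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not set(rhs) <= reach:
            return False
    return True


F = [("A", "B"), ("B", "C")]
print(preserves(F, ["AC", "BC"]))  # False: A->B is lost
print(preserves(F, ["AB", "BC"]))  # True
```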
Part –III
NORMALIZATION IN DBMS
Normalization in DBMS is a technique using which you can organize the data in the database
tables so that:
There is less repetition of data,
A large set of data is structured into a bunch of smaller tables,
and the tables have a proper relationship between them.
DBMS Normalization is a systematic approach to decompose (break down) tables to
eliminate data redundancy(repetition) and undesirable characteristics like Insertion anomaly in
DBMS, Update anomaly in DBMS, and Delete anomaly in DBMS.
It is a multi-step process that puts data into tabular form, removes duplicate data, and sets up the
relationships between tables.
In the table above, we have data for four Computer Sci. students.
As we can see, data for the fields branch, hod (Head of Department), and office_tel are
repeated for the students who are in the same branch in the college, this is Data Redundancy.
1. Insertion Anomaly in DBMS
If a new student has not yet been assigned a branch, the data of that
student cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data for 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
If you have to repeat the same data in every row of data, it's better to keep the data
separately and reference that data in each row.
So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch
information.
2. Update Anomaly in DBMS
What if Mr. X leaves the college, or Mr. X is no longer the HOD of the computer
science department? In that case, all the student records will have to be updated, and if
by mistake we miss any record, it will lead to data inconsistency.
This is an update anomaly because you need to update all the records in your table
just because one piece of information changed.
As you can see in the table above, the student_id column is a primary key because
using the student_id value we can uniquely identify each row of data, hence the remaining
columns then become the non-key attributes.
TYPES OF DBMS NORMAL FORMS
emp_id emp_skill
2 JavaScript
3 Java
3 Linux
3 C++
ii) Add Multiple rows for Multiple skills
We can also simply add multiple rows to add multiple skills. This will lead to repetition
of the data, but that can be handled as you further Normalize your data using the
Second Normal form and the Third Normal form.
emp_id emp_name emp_mobile emp_skill
Score table:
student_id subject_id marks teacher_name
1 1 70 Miss. C
1 2 82 Mr. D
2 1 65 Mr. Op
Now in the above table, the primary key is student_id + subject_id, because both these
pieces of information are required to select any row of data.
But in the Score table, we have a column teacher_name, which depends on the subject
information or just the subject_id, so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. And then the entire system
will be Normalized as per the Second Normal Form.
Updated Subject table:
subject_id subject_name teacher_name
1 C Language Miss. C
2 DSA Mr. D
3 Operating System Mr. Op
Updated Score table:
student_id subject_id marks
1 1 70
1 2 82
2 1 65
student_id subject_id marks exam_type total_marks
1 1 70 Theory 100
1 2 82 Theory 100
2 1 42 Practical 50
In the table above, the column exam_type depends on
both student_id and subject_id, because,
o a student can be in the CSE branch or the Mechanical branch,
o and based on that they may have different exam types for different subjects.
o The CSE students may have both Practical and Theory for Compiler Design,
o whereas Mechanical branch students may only have Theory exams for Compiler
Design.
But the column total_marks depends only on the exam_type column, and
the exam_type column is not a part of the primary key, because the primary key
is student_id + subject_id. Hence we have a transitive dependency here.
How to Remove Transitive Dependency?
We create a separate table for ExamType and use it in the Score table.
New ExamType table,
exam_type_id exam_type total_marks duration
1 Practical 50 45
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is
also known as 3.5 Normal Form.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. For any dependency A → B, A should be a super key.
Multi-valued Dependency
A table is said to have a multi-valued dependency if the following conditions are true:
1. For a dependency A → B, if for a single value of A, multiple values of B exist, then the
table may have a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B,
then B and C should be independent of each other.
If all these conditions are true for any relation (table), it is said to have a multi-valued
dependency.
Example
Below we have a college enrolment table with columns s_id, course and hobby.
s_id course Hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
Two records for student with s_id 1, will give rise to two more records, as shown below,
s_id course Hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
And, in the table above, there is no relationship between the columns course and hobby. They
are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and other
anomalies as well.
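The need for the two extra records can be checked with a small sketch. The function below tests the usual definition of a multi-valued dependency X ->> Y on a set of rows: whenever two rows agree on X, the row that takes its Y values from the first and all other values from the second must also be present. The function name is illustrative, and columns are referred to by position (0 = s_id, 1 = course, 2 = hobby).

```python
def holds_mvd(rows, x, y):
    """True if X ->> Y holds in rows; x and y are lists of column positions."""
    present = set(rows)
    for t in present:
        for u in present:
            if all(t[i] == u[i] for i in x):
                # Y values from t, every other value from u.
                mixed = tuple(t[i] if i in y else u[i] for i in range(len(t)))
                if mixed not in present:
                    return False
    return True


two_rows = [(1, "Science", "Cricket"), (1, "Maths", "Hockey")]
four_rows = two_rows + [(1, "Science", "Hockey"), (1, "Maths", "Cricket")]

print(holds_mvd(two_rows, [0], [1]))   # False: combinations are missing
print(holds_mvd(four_rows, [0], [1]))  # True: s_id ->> course holds
```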
The above table is in 4th Normal Form as there is no multi-valued dependency. But it is not in
5th Normal Form because if we join the above two tables we may get
To avoid the above problem we can decompose the table into three tables:
Seller_Company, Seller_Product, and Company_Product.