CS8492 - DBMS - 1

Uploaded by

Antony Selvi

UNIT - I

Syllabus
Purpose of Database System - Views of data - Data Models- Database System Architecture -
Introduction to relational databases - Relational Model - Keys - Relational Algebra - SQL
fundamentals - Advanced SQL features - Embedded SQL - Dynamic SQL.

Contents
1.1 Introduction to Database Management
1.2 Purpose of Database System (AU : May-07, 12, Dec.-04; Marks 8)
1.3 Views of Data (AU : May-16; Marks 16)
1.4 Data Models (AU : Dec.-14; Marks 8)
1.5 Database System Architecture (AU : May-12, 13, 14, 16, 17, Dec.-08, 15, 17; Marks 16)
1.6 Data Independence
1.7 Introduction to Relational Databases
1.8 Relational Model
1.9 Keys (AU : May-06, 07, 12, Dec.-06; Marks 4)
1.10 Integrity Constraints (AU : Dec.-05; Marks 10)
1.11 Database Integrity
1.12 Relational Algebra (AU : May-03, 04, 05, 14, 15, 16, 17, 18, Dec.-02, 07, 08, 11, 15, 16, 17; Marks 16)
1.13 SQL Fundamentals (AU : Dec.-15; Marks 16)
1.14 Advanced SQL Features
1.15 Dynamic SQL (AU : May-17, Dec.-17; Marks 11)
1.16 Two Marks Questions with Answers

Database Management Systems 1-2 Relational Databases

Part I : Introduction of DBMS

1.1 Introduction to Database Management

• Definition : A Database Management System (DBMS) is a collection of
interrelated data and a set of programs used to access and manage that data.
• The primary goal of a DBMS is to provide a way to store and retrieve the required
information from the database in a convenient and efficient manner.
• Managing the data in a database involves two important tasks -
(i) Defining the structures for storage of information.
(ii) Providing mechanisms for manipulation of information.
• In addition, the database system must ensure the safety of the information stored.
Database System Applications
There is a wide range of applications that make use of database systems. Some of these
applications are -
1) Accounting : Database systems are used to maintain information about employees,
salaries and payroll taxes.
2) Manufacturing : Database systems are used for management of the supply chain and
for tracking production of items in factories.
3) Sales : Databases are used for maintaining customer, product and purchase
information.
4) Banking : In the banking sector, a DBMS is used for customer information, accounts
and loans, and for performing banking transactions.
5) Credit card transactions : Database systems are used for purchases on credit cards
and generation of monthly statements.
6) Universities : Database systems are used in universities for maintaining student
information, course registration and accounting.
7) Reservation systems : In airline/railway reservation systems, the database is used
to maintain the reservation and schedule information.
8) Telecommunication : Database systems are used for keeping records of the calls made,
generating monthly bills, maintaining balances on prepaid calling cards and
storing information about communication networks.
1.2 Purpose of Database System AU : May-07, 12, Dec.-04

• Early database systems were created to manage commercial data.
This data was typically stored in files. To allow users to manipulate these files,
various programs were written for
1) Addition of new data
2) Updating the data
3) Deleting the data.
• Whenever a new need arose, a separate application program had to be
written. Thus, as time went by, the system acquired more files and more
application programs.
• This typical file processing system is supported by a conventional operating
system. A file processing system can therefore be described as a system that
stores permanent records in files and needs different application programs to
extract or add records.
• Before database management systems were introduced, this file processing system was
in use. However, such a system has many drawbacks. Let us discuss them.
Disadvantages of Traditional File Processing System
The traditional file system has the following disadvantages :
1) Data redundancy : Data redundancy means duplication of data at several places.
Since different programmers create different files and these files might have
different structures, the same information may appear repeatedly,
in the same or different formats, at several places.
2) Data inconsistency : Data inconsistency occurs when the various copies of the same data
no longer match. For example, a changed address of an employee may be
reflected in one department while another department still holds the old address
or no address at all.
3) Difficulty in accessing data : The conventional file system does not allow the
desired data to be retrieved in an efficient and convenient manner.
4) Data isolation : As the data is scattered over several files and the files may be in
different formats, it becomes difficult to retrieve the desired data when writing a
new application.
5) Integrity problems : Data integrity means that the data values entered in the database fall
within a specified range and are of the correct format. With the use of several files,
enforcing such constraints on the data becomes difficult.
6) Atomicity problems : Atomicity means that a particular operation must be carried out
entirely or not at all. It is difficult to ensure atomicity in a
conventional file processing system.
7) Concurrent access anomalies : For efficient execution, multiple users update data
simultaneously, and in such a case the updates need to be synchronized. In traditional file
systems the data is distributed over multiple files that are accessed by many independent
application programs, so concurrent updates are difficult to supervise and may leave the
data inconsistent.
8) Security problems : Not every user is allowed to access all the data of the database
system. Since application programs in a file system are added in an ad hoc manner,
enforcing such security constraints becomes difficult.
Database systems offer solutions to all the above mentioned problems.
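The atomicity problem (item 6) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table, its columns and the transfer function are hypothetical names chosen only for this illustration. The sketch assumes a money transfer made of two updates that must either both happen or both be undone.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Refuse overdrafts; raising here undoes both updates above.
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates are kept
failed = transfer(conn, "A", "B", 500)  # both updates are rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the failed transfer neither account has changed, which is the "entirely or not at all" behaviour that is hard to guarantee when each application program writes to plain files on its own.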
Difference between Database System and Conventional File System
Sr. No. | Database systems | Conventional file systems
1. | Data redundancy is less. | Data redundancy is more.
2. | Security is high. | Security is very low.
3. | Database systems are used when security constraints are high. | Conventional file systems are used where there is less demand for security constraints.
4. | Database systems define the data in a structured manner. Also there is a well defined co-relation among the data. | File systems define the data in an unstructured manner. Data is usually in isolated form.
5. | Data inconsistency is less in database systems. | Data inconsistency is more in file systems.
6. | The user does not need to know the physical address of the data used in database systems. | The user must locate the physical address of the file to access the data in conventional file systems.
7. | We can retrieve the data in any desired format using database systems. | We cannot retrieve the data in any desired format using file systems.
8. | There is ability to access the data concurrently using database systems. | There is no ability to concurrently access the data using a conventional file system.

Characteristics of Database Systems


Following are the characteristics of a database system -
1) Representation of some aspects of real world applications.
2) Systematic management of information.
3) Representation of the data by multiple views.
4) Efficient and easy implementation of various operations such as insertion, deletion
and update.
5) Maintenance of data for some specific purpose.
6) Representation of the logical relationships between records and data.

Advantages of Database Systems


Following are the advantages of a DBMS -
1) A DBMS reduces data redundancy, that is, it avoids unnecessary duplication of data in the
database.
2) A DBMS allows the desired data to be retrieved in the required format.
3) Data can be organized in separate tables for convenient and efficient use.
4) Data can be accessed efficiently using a simple query language.
5) Data integrity can be maintained. That means constraints can be applied
on the data, for example that a value must lie in some specific range.
6) Atomicity can be maintained. That means an operation made up of several
steps is either carried out entirely or not carried out at all, so the database is never
left with only part of the change applied.
7) The DBMS allows concurrent access by multiple users by using synchronization
techniques.
8) Security policies can be applied in a DBMS to allow each user to access only the
permitted part of the database system.
Disadvantages of Database Systems
1) Complex design : Database design is complex, difficult and time consuming.
2) Hardware and software cost : A large investment is needed to set up the
required hardware or to repair software failures.
3) Damaged part : If one part of the database is corrupted or damaged, then the entire
database may get affected.
4) Conversion cost : If the current system is a conventional file system and we need
to convert it to a database system, then a large cost is incurred in purchasing
different tools and adopting different techniques as per the requirement.
5) Training : People need to be trained for designing and maintaining database
systems.

University Questions
1. Compare file system with database system AU : May-07, Marks 8, May-12, Marks 2

2. What are the advantages and disadvantages of DBMS ? AU : Dec.-04, Marks 4

1.3 Views of Data AU : May-16

• A database system is a collection of interrelated data and a set of programs that allow users to
access or modify the data.
• An abstract view of the system is a view in which the system hides certain details of
how the data are stored and maintained.
• The main purpose of database systems is to provide users with an abstract view of
the data.
• This view helps the user to retrieve data efficiently.
• To simplify user interaction with the system there are several levels of
abstraction. These levels are - physical level, logical level and view level.

1.3.1 Data Abstraction


Data abstraction : Data abstraction means showing only the required information
about the system and hiding the background details.
There are several levels of abstraction that simplify the user interactions with the
system. These are
1) Physical level :
o This is the lowest level.
o This level describes how the data are actually stored.
o This level describes complex low level data structures.
2) Logical level :
o This is the next higher level, which describes what data are stored in the database.
o This level also describes the relationships among the data.
o The logical level thus describes the entire database in terms of a small number of
relatively simple structures.
Database administrators use the logical level of abstraction for deciding what
information to keep in the database.
3) View level :
o This is the highest level of abstraction, which describes only part of the entire database.
o The view level can provide access to only part of the database.
o This level helps in simplifying the interaction with the system.
o The system can provide multiple views of the same database.
o For example, a clerk at a reservation system can see only part of the database and can access the
required information of the passenger.
Fig. 1.3.1 shows the relationship among the three levels of abstraction.

Fig. 1.3.1 : Levels of data abstraction



For example : Consider following record


type employee = record
empID:numeric(10)
empname:char(20)
dept_no:numeric(10)
salary:numeric(8,2)
end
This code defines a new record type employee with four fields. Each field has a
field name and a type. There may be several other record types such as
department with fields dept_no, dept_name, building
customer with fields cust_id, cust_name
o At the physical level, a record of type customer, employee or department can be
described as a block of consecutive storage locations. Many database systems
hide the lowest level storage details from the database programmer.
o The type definition of the records is decided at the logical level.
Programmers work with records at this level, and database
administrators also work at this level of abstraction.
o Only a specific view of the record is allowed at the view level. For instance,
a customer can view the name or the id of an employee but
cannot access the employee's salary.
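The logical and view levels described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the view name employee_public and the sample row are hypothetical, and the record type is mapped to a table by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the record type becomes a table definition.
conn.execute(
    """
    CREATE TABLE employee (
        empID    NUMERIC(10) PRIMARY KEY,
        empname  CHAR(20),
        dept_no  NUMERIC(10),
        salary   NUMERIC(8, 2)
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10, 45000.00)")  # sample row

# View level: expose only the id and the name; salary stays hidden.
conn.execute(
    "CREATE VIEW employee_public AS SELECT empID, empname FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_public")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

A user who is given access only to employee_public sees the two listed columns. How the rows are laid out on disk (the physical level) is not visible in either definition.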

1.3.2 Instances and Schemas


Schema : The overall design of the database is called the schema.
For example - In a program we declare variables and assign values to them.
The variable declaration is analogous to the schema, and the value held by the variable
at a given moment is analogous to an instance. The schema for the student record can be
RollNo Name Marks

Instances : When information is inserted into or deleted from the database, the
database changes. The collection of information stored in the database at a particular
moment is called an instance. For example - following is an instance of the student database
RollNo Name Marks
10 AAA 43
20 BBB 67
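The difference between schema and instance can be sketched as follows, assuming Python's built-in sqlite3 module and a student table with the three columns shown above. The column types are an assumption, since the text does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: declared once, changes rarely.
conn.execute("CREATE TABLE student (RollNo INTEGER, Name TEXT, Marks INTEGER)")
schema = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# Instance at one moment: the rows currently stored.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(10, "AAA", 43), (20, "BBB", 67)]
)
instance_before = conn.execute("SELECT * FROM student ORDER BY RollNo").fetchall()

# After a deletion the instance differs, but the schema is unchanged.
conn.execute("DELETE FROM student WHERE RollNo = 10")
instance_after = conn.execute("SELECT * FROM student ORDER BY RollNo").fetchall()
schema_after = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
```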
Types of Schema : The database has several schemas based on the levels of abstraction.
(1) Physical Schema : The physical schema is the database design described at the
physical level of abstraction.
(2) Logical Schema : The logical schema is the database design described at the logical level of
abstraction.
(3) Subschema : A database may have several schemas at the view level, which are
called subschemas.

1.3.3 Database Languages


There are two types of languages supported by database systems. These are -
(1) DDL
• Data Definition Language (DDL) is a specialized language used to specify a
database schema by a set of definitions.
• It is a language which is used for creating and modifying the structures of tables,
views, indexes and so on.
• DDL is also used to specify additional properties of data.
• Some of the common commands used in DDL are - CREATE, ALTER, DROP.
• The main use of the CREATE command is to build a new table. Using the ALTER
command, users can add columns to a table and drop existing
columns. Using the DROP command, the user can delete a table or view.
(2) DML
• DML stands for Data Manipulation Language.
• This language enables users to access or manipulate data as organized by the
appropriate data model.
• The types of access are -
o Retrieval of information stored in the database.
o Insertion of new information into the database.
o Deletion of information from the database.
o Modification of information stored in the database.
• There are two types of DML -
o Procedural DML - Requires a user to specify what data are needed and how to
get those data.
o Declarative DML - Requires a user to specify what data are needed without
specifying how to get those data.
• A query is a statement requesting the retrieval of information. The portion of a
DML that involves information retrieval is called a query language.
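The DDL and DML commands listed above can be tried out in a small sketch that uses Python's built-in sqlite3 module. The student table and its rows are illustrative assumptions. Only ALTER ... ADD COLUMN is shown, because dropping a column needs SQLite 3.35 or later. The plain Python loop stands in for the procedural style and is not a real procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: CREATE builds a table, ALTER adds a column.
conn.execute("CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN Marks INTEGER")

# DML: insertion, modification and deletion.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(10, "AAA", 43), (20, "BBB", 67), (30, "CCC", 55)],
)
conn.execute("UPDATE student SET Marks = 50 WHERE RollNo = 10")
conn.execute("DELETE FROM student WHERE RollNo = 30")

# Declarative retrieval: state what is wanted, not how to find it.
declarative = conn.execute(
    "SELECT Name FROM student WHERE Marks >= 50 ORDER BY RollNo"
).fetchall()

# Procedural style for contrast: the program spells out the scan itself.
procedural = []
for roll_no, name, marks in conn.execute("SELECT * FROM student ORDER BY RollNo"):
    if marks >= 50:
        procedural.append((name,))

# DDL again: DROP removes the table together with its data.
conn.execute("DROP TABLE student")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Both retrieval styles return the same rows. The declarative form leaves the choice of access path to the database system.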

University Question
1. Briefly explain about views of data. AU : May-16, Marks 16

1.4 Data Models AU : Dec.-14

• Definition : A data model is a collection of conceptual tools for describing data, relationships
among data, semantics (meaning) of data and constraints.
• The data model is the structure underlying a database.
• A data model provides a way to describe the design of a database at the physical, logical
and view levels.
• There are various data models used in database systems and these are as follows -
(1) Relational model :
o The relational model consists of a collection of tables which store data and also
represent the relationships among the data.
o A table is also known as a relation.
o A table contains one or more columns and each column has a unique name.
o Each table contains records of a particular type, and each record type defines a
fixed number of fields or attributes.
o For example - The following tables show the relational model by showing the
relationship between the Student and Result tables. Student
Ram lives in the city Chennai and his marks are 78. The relationship
between these two tables is maintained by the SeatNo column.

Student
SeatNo Name City
101 Ram Chennai
102 Shyam Pune

Result
SeatNo Marks
101 78
102 95
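The way the SeatNo column relates the two tables can be sketched with Python's built-in sqlite3 module. The column types and the choice of SeatNo as primary key are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (SeatNo INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
conn.execute("CREATE TABLE Result (SeatNo INTEGER PRIMARY KEY, Marks INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Ram", "Chennai"), (102, "Shyam", "Pune")],
)
conn.executemany("INSERT INTO Result VALUES (?, ?)", [(101, 78), (102, 95)])

# The shared SeatNo column relates each student to his or her marks.
rows = conn.execute(
    """
    SELECT s.Name, s.City, r.Marks
    FROM Student AS s JOIN Result AS r ON s.SeatNo = r.SeatNo
    ORDER BY s.SeatNo
    """
).fetchall()
```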

Advantages :
(i) Structural Independence : Structural independence is the ability to
make changes in one database structure without affecting the others. The relational
model has structural independence. Hence making required changes in the
database is convenient in the relational database model.
(ii) Conceptual Simplicity : The relational model allows the designer to focus
on the logical design and not on the physical design. Hence relational models are
conceptually simple to understand.
(iii) Query Capability : Using a simple query language (such as SQL) the user can get
information from the database and the designer can manipulate the database structure.
(iv) Easy design, maintenance and usage : Relational models can be designed
logically, hence they are easy to maintain and use.
Disadvantages :
(i) The relational model requires powerful hardware and large data storage devices.
(ii) It may lead to slower processing time.
(iii) Poorly designed systems lead to poor implementation of database systems.

(2) Entity relationship model :


o As the name suggests, the entity relationship model uses a collection of basic
objects called entities and the relationships among these objects.
o An entity is a thing or object in the real world.
o The entity relationship model is widely used in database design.
o For example - Following is a representation of the Entity Relationship model in
which the relationship works_for is between the entities Employee and
Department.

Advantages :
i) Simple : It is simple to draw an ER diagram when we know the entities and relationships.
ii) Easy to understand : The design of an ER diagram is very logical and hence ER diagrams are
easy to design and understand.
iii) Effective : It is an effective communication tool.
iv) Integrated : The ER model can be easily integrated with the relational model.
v) Easy conversion : The ER model can be converted easily into other types of models.
Disadvantages :
i) Loss of information : While drawing an ER model some information can be hidden
or lost.
ii) Limited relationships : The ER model can represent limited relationships as
compared to other models.
iii) No representation for data manipulation : It is not possible to represent data
manipulation in the ER model.
iv) No industry standard : There is no industry standard for the notations of ER diagrams.

(3) Object Based Data Model :


o Object oriented languages like C++, Java and C# have become dominant
in software development.
o This led to the object based data model.
o The object based data model combines object oriented features with the relational
data model.

Advantages :
i) Enriched modelling : The object based data model has the capability of modelling
real world objects.
ii) Reusability : Certain features of object oriented design, such as inheritance and
polymorphism, help in reusability.
iii) Support for schema evolution : There is a tight coupling between data and
applications, hence there is strong support for schema evolution.
iv) Improved performance : There can be a significant
improvement in performance using the object based data model.

Disadvantages :
i) Lack of universal data model : There is no universally agreed data model for an
object based data model, and most models lack a theoretical foundation.
ii) Lack of experience : In comparison with relational database management, the use of the
object based data model is limited. This model is more dependent on skilled
programmers.
iii) Complex : The additional functionality present in the object based data model makes the design
complex.

(4) Semi-structured data model :


o The semi-structured data model permits the specification of data where
individual data items of the same type may have different sets of attributes.
o The Extensible Markup Language (XML) is widely used to represent
semi-structured data.
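A minimal sketch of semi-structured data, assuming Python's built-in xml.etree.ElementTree module. The element names and values in the XML text are hypothetical and serve only to show two items of the same type carrying different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Two <student> items of the same type with different sets of fields.
document = """
<students>
  <student>
    <name>Ram</name>
    <city>Chennai</city>
  </student>
  <student>
    <name>Shyam</name>
    <phone>2222222222</phone>
    <email>shyam@example.com</email>
  </student>
</students>
""".strip()

root = ET.fromstring(document)

# For each student, list the names of the fields that are actually present.
fields_per_student = [[child.tag for child in student] for student in root]
```

No fixed schema forces both students to have the same fields, which is the flexibility the model offers. A query must therefore cope with fields that may be missing.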

Advantages
i) Data is not constrained by fixed schema.

ii) It is flexible.

iii) It is portable.
Disadvantage
i) Queries are less efficient than other types of data model.

University Question
1. Write short note on : Data model and its types. AU : Dec.-14, Marks 8

1.5 Database System Architecture AU : May-12, 13, 14, 16, 17, Dec.-08, 15, 17

• The structure of a typical DBMS based on the relational data model is shown in
Fig. 1.5.1.

• Consider the top part of Fig. 1.5.1. It shows application interfaces used by naïve
users, application programs created by application programmers, query tools used
by sophisticated users and administration tools used by the database administrator.

• The lowest part of the architecture is for disk storage.

• The two important components of the database architecture are - query processor and
storage manager.

Query processor :

• The interactive query processor helps the database system to simplify and
facilitate access to data. It consists of the DDL interpreter, the DML compiler and the query
evaluation engine.

• The components of the query processor perform the following functions -

i) DDL interpreter : This is basically a translator which interprets DDL
statements and records the definitions in the data dictionary.

ii) DML compiler : It translates the DML statements of a query language into an evaluation
plan. This plan consists of the instructions which the query evaluation engine
understands.

iii) Query evaluation engine : It executes the low-level instructions generated by the
DML compiler.

• When a user issues a query, the parsed query is presented to a query optimizer,
which uses information about how the data is stored to produce an efficient
execution plan for evaluating the query. An execution plan is a blueprint for
evaluating a query. It is evaluated by the query evaluation engine.
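The idea of an execution plan can be observed in a small sketch that uses Python's built-in sqlite3 module. SQLite's EXPLAIN QUERY PLAN statement is used here as one concrete example of a query processor reporting its plan. The exact wording of the plan differs between SQLite versions, so the sketch does not rely on it. The student table and its rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "AAA", 88), (2, "BBB", 83), (3, "CCC", 98)],
)

query = "SELECT Name FROM student WHERE Marks > 85"

# Ask the query processor for its plan instead of running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = [row[-1] for row in plan]  # the last column holds the readable step

# The evaluation engine then executes the chosen plan.
result = conn.execute(query + " ORDER BY RollNo").fetchall()
```

Printing plan_text shows the step SQLite chose for this query, which here is a scan of the student table because no index on Marks exists.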

Storage manager :

o The storage manager is the component of the database system that provides the interface
between the low level data stored in the database and the application programs and
queries submitted to the system.
o The storage manager is responsible for storing, retrieving and updating data in the
database. The storage manager components include -
i) Authorization and integrity manager : Validates the users who want to access
the data and tests for integrity constraints.

ii) Transaction manager : Ensures that the database remains in a consistent state despite
system failures and that concurrent transaction executions proceed without
conflicting.

iii) File manager : Manages the allocation of space on disk storage and the
representation of the information on disk.

iv) Buffer manager : Manages the fetching of data from disk storage into main
memory. The buffer manager also decides what data to cache in main memory.
The buffer manager is a crucial part of the database system.

o The storage manager implements several data structures such as -

i) Data files : Used for storing the database itself.

ii) Data dictionary : Used for storing metadata, particularly the schema of the database.

iii) Indices : Indices are used to provide fast access to data items present in the
database.
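The role of indices and of the data dictionary can be sketched with Python's built-in sqlite3 module. In SQLite the sqlite_master table plays the part of the data dictionary. The table, the index name idx_student_name and the helper plan_for are hypothetical names used only in this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "AAA", "1111111111"), (2, "BBB", "2222222222")],
)

def plan_for(sql):
    """Return the human-readable steps of SQLite's plan for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT * FROM student WHERE Name = 'BBB'"
plan_without_index = plan_for(lookup)

# An index is an extra data structure that gives fast access by Name.
conn.execute("CREATE INDEX idx_student_name ON student (Name)")
plan_with_index = plan_for(lookup)

# The data dictionary records the new index as metadata.
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

Before the index exists the plan is a scan of the whole table. Afterwards the plan names idx_student_name, showing that the lookup goes through the index.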

Fig. 1.5.1 Architecture of database



University Questions
1. Explain the overall architecture of database system in detail.
AU : May-14, 17, Dec.-17, Marks 8, May-16, Marks 16
2. With the help of a neat block diagram explain basic architecture of a database management system.
AU : May-12, May-13, Marks 16, Dec.-15, Marks 8
3. Explain component modules of a DBMS and their interactions with the architecture.
AU : Dec.-08, Marks 10

1.6 Data Independence


Definition : Data independence is the ability to change the schema at one
level without affecting the schema at the next higher level. Here the level can be physical, conceptual or
external.
Data independence is one of the important characteristics of a database management
system.
By this property, the structure of the database or the values stored in the database can
be easily modified without changing the application programs.
There are two types of data independence

Fig. 1.6.1 Data independence

1. Physical Independence : This is a kind of data independence which allows the
modification of the physical schema without requiring any change to the conceptual
schema. For example - a change in the memory size or storage organization of the database server
does not affect the logical structure of any data object.
2. Logical Independence : This is a kind of data independence which allows the
modification of the conceptual schema without requiring any change to the external
schema. For example - a change in the table structure, such as the addition of a column or the
deletion of a column that no view uses, does not affect user views.
Because of data independence, the time and cost incurred by changes at any one level can
be reduced and an abstract view of data can be provided to the user.
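Logical independence can be sketched with Python's built-in sqlite3 module. The table student and the view student_names are hypothetical names. The sketch assumes the external schema is a view that lists the columns it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'AAA')")

# External schema: a view that names the columns it needs.
conn.execute("CREATE VIEW student_names AS SELECT RollNo, Name FROM student")
before = conn.execute("SELECT * FROM student_names").fetchall()

# Change the conceptual schema by adding a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN Phone TEXT")
after = conn.execute("SELECT * FROM student_names").fetchall()
```

The view returns the same rows before and after the change, so a program written against student_names does not need to be modified.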

Part II Relational Databases

1.7 Introduction to Relational Databases

• A relational database is a collection of tables, each having a unique name.
• For example - Consider the Student table in which the information
about the students is stored.
RollNo Name Phone
001 AAA 1111111111
002 BBB 2222222222
003 CCC 3333333333
Fig. 1.7.1 Student table
The above table consists of three column headers RollNo, Name and Phone.
Each row of the table holds the information of one student by means of his
roll number, name and phone number.
Similarly consider another table named Course as follows -

CourseID CourseName Credits
101 Mechanical 4
102 Computer Science 6
103 Electrical 5
104 Civil 3
Fig. 1.7.2 Course table

In the above table the columns are CourseID, CourseName and Credits.
The CourseID 101 is associated with the course named Mechanical, and
the Mechanical course carries 4 credit points. Thus a relation is
represented by a table in the relational model. Similarly we can establish a
relationship between the two tables by defining a third table. For example -
Consider the table Admission as
RollNo CourseID
001 102
002 104
003 101
Fig. 1.7.3 Admission
From this third table we can easily find out that the course to which RollNo 001 is
admitted is Computer Science.
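The lookup described above can be sketched with Python's built-in sqlite3 module, using the three tables of Fig. 1.7.1 to Fig. 1.7.3. The column types and the helper function course_of are assumptions. RollNo is stored as text so that the leading zeros are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (RollNo TEXT PRIMARY KEY, Name TEXT, Phone TEXT);
    CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, CourseName TEXT, Credits INTEGER);
    CREATE TABLE Admission (RollNo TEXT, CourseID INTEGER);

    INSERT INTO Student VALUES ('001', 'AAA', '1111111111'),
                               ('002', 'BBB', '2222222222'),
                               ('003', 'CCC', '3333333333');
    INSERT INTO Course VALUES (101, 'Mechanical', 4), (102, 'Computer Science', 6),
                              (103, 'Electrical', 5), (104, 'Civil', 3);
    INSERT INTO Admission VALUES ('001', 102), ('002', 104), ('003', 101);
    """
)

def course_of(roll_no):
    """Return the name of the course a student is admitted to, or None."""
    row = conn.execute(
        """
        SELECT c.CourseName
        FROM Admission AS a JOIN Course AS c ON a.CourseID = c.CourseID
        WHERE a.RollNo = ?
        """,
        (roll_no,),
    ).fetchone()
    return row[0] if row else None
```

Calling course_of("001") follows RollNo 001 through the Admission table to CourseID 102 and returns the matching course name.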

1.8 Relational Model

There are some commonly used terms in the relational model and those are -

Table or relation : In the relational model, a table is a collection of data items arranged in
rows and columns. The table cannot have duplicate rows. Below is an example of a
Student table

RollNo Name Marks Phone
001 AAA 88 1111111111
002 BBB 83 2222222222
003 CCC 98 3333333333
004 DDD 67 4444444444

Tuple or record or row : A single entry in the table is called a tuple. The tuple
represents a set of related data. In the above Student table there are four tuples. One of the
tuples can be represented as
001 AAA 88 1111111111

Attribute or column : An attribute is a named column of the table. Each record
can be broken down into several small parts of data known as attributes. For example, the
above table consists of four attributes - RollNo, Name, Marks and Phone.

Relation schema : A relation schema describes the structure of the relation, with the
name of the relation (i.e. name of the table) and the names and types of its attributes.

Relation Instance : It refers to a specific instance of a relation, i.e. one containing a specific set
of rows. For example - the following is a relation instance which contains the records
with marks above 80.

RollNo Name Marks Phone
001 AAA 88 1111111111
002 BBB 83 2222222222
003 CCC 98 3333333333

Domain : For each attribute of a relation, there is a set of permitted values called the
domain. For example - in the above table, the domain of the attribute Marks is the set of all
permitted marks of the students. Similarly the domain of the Name attribute is the set of all possible
names of students.
The values 88, 83 and 98 that appear in the table are drawn from the domain of the Marks attribute.
Atomic : A domain is atomic if the elements of the domain are considered to be
indivisible units. For example, in the above Student table the attribute Phone would be non-atomic
if it held a set of phone numbers for one student. Holding a single number, it is treated as atomic.
NULL attribute : A null is a special symbol, independent of data type, which means
either unknown or inapplicable. It does not mean zero or blank. For example - Consider a
salary table that contains NULL

Emp# JobName Salary Commission
E10 Sales 12500 32090
E11 Null 25000 8000
E12 Sales 44000 0
E13 Sales 44000 Null
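That NULL is different from zero can be checked in a small sketch using Python's built-in sqlite3 module and the salary table above. The column is named Emp here because the # character is awkward in an SQL identifier. That renaming and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary (Emp TEXT, JobName TEXT, Salary INTEGER, Commission INTEGER)"
)
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?, ?)",
    [
        ("E10", "Sales", 12500, 32090),
        ("E11", None, 25000, 8000),   # Python None is stored as SQL NULL
        ("E12", "Sales", 44000, 0),
        ("E13", "Sales", 44000, None),
    ],
)

# Zero and NULL are different: E12 has commission 0, E13 has no known value.
zero_commission = conn.execute(
    "SELECT Emp FROM salary WHERE Commission = 0"
).fetchall()
null_commission = conn.execute(
    "SELECT Emp FROM salary WHERE Commission IS NULL"
).fetchall()

# Comparing with NULL using '=' yields unknown, so no row qualifies.
equals_null = conn.execute(
    "SELECT Emp FROM salary WHERE Commission = NULL"
).fetchall()
```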

Degree : It is the total number of columns (attributes) present in the relation. In the
given Student table -

RollNo Name Marks Phone
001 AAA 88 1111111111
002 BBB 83 2222222222
003 CCC 98 3333333333

The degree is 4.

Cardinality : It is the total number of tuples present in the relation. In the above
table the cardinality is 3.

Example 1.8.1 Find out the following for the given Staff table
i) No. of columns
ii) No. of tuples
iii) Different attributes
iv) Degree
v) Cardinality

StaffID Name Sex Designation Salary DOJ
S001 John M Manager 50000 1 Oct. 2012
S002 Ram M Executive 20000 20 Jan. 2015
S003 Meena F Supervisor 40000 12 Aug. 2011

Solution :
i) No. of columns = 6
ii) No. of tuples = 3
iii) Different attributes are StaffID, Name, Sex, Designation, Salary, DOJ
iv) Degree = Total number of columns = 6
v) Cardinality = Total number of rows = 3
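The solution above can be checked mechanically. The following sketch represents the Staff relation as a Python tuple of attribute names plus a list of rows. The function names degree and cardinality are chosen here for illustration only.

```python
# The Staff relation from Example 1.8.1: a header plus a list of tuples.
attributes = ("StaffID", "Name", "Sex", "Designation", "Salary", "DOJ")
staff = [
    ("S001", "John", "M", "Manager", 50000, "1 Oct. 2012"),
    ("S002", "Ram", "M", "Executive", 20000, "20 Jan. 2015"),
    ("S003", "Meena", "F", "Supervisor", 40000, "12 Aug. 2011"),
]

def degree(attrs):
    """Degree = number of attributes (columns) of the relation."""
    return len(attrs)

def cardinality(rows):
    """Cardinality = number of tuples (rows) in the relation instance."""
    return len(rows)

staff_degree = degree(attributes)
staff_cardinality = cardinality(staff)
```

Inserting or deleting a row changes the cardinality but leaves the degree unchanged, because the degree belongs to the schema while the cardinality belongs to the instance.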

1.9 Keys AU : May-06, 07, 12, Dec.-06

Keys are used to specify the tuples distinctly in the given relation.
Various types of keys used in relational model are – Superkey, Candidate Keys,
primary keys, foreign keys. Let us discuss them with suitable example
1) Super Key(SK): It is a set of one or more attributes within a table that can uniquely
identify each record within a table. For example – Consider the Student table as
follows –
Reg No. Roll No Phone Name Marks

R101 001 1111111111 AAA 88

R102 002 2222222222 BBB 83

R103 003 3333333333 CCC 98

R104 004 4444444444 DDD 67

Fig. 1.9.1 Student


The superkey can be represented as follows
Database Management Systems 1 - 20 Relational Databases

Clearly using the (RegNo) and (RollNo,Phone,Name) we can identify the records
uniquely but (Name, Marks) of two students can be same, hence this combination
not necessarily help in identifying the record uniquely.

2) Candidate Key (CK) : A candidate key is a minimal superkey. In other words, a candidate
key is a single attribute, or the smallest combination of attributes, that uniquely identifies
each record in the table : no attribute can be removed from it without losing uniqueness.
For example, in the Student table given above, RegNo and RollNo are candidate keys. A
combination such as (RollNo, Phone) also identifies each record, but it is only a superkey,
because RollNo alone is sufficient.

Thus every candidate key is a superkey but every superkey is not a candidate key.
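The difference between a superkey and a candidate key can be checked mechanically on sample data. The following Python sketch is an illustration only : the helper names `is_superkey` and `is_candidate_key` are assumptions, and one hypothetical row (R105) is added to the Student data so that two students share the same name and marks. Note that such a check only tests one instance of the relation, whereas a key is a property the designer declares for every possible instance.

```python
from itertools import combinations

# Student relation from Fig. 1.9.1, one dictionary per tuple.
student = [
    {"RegNo": "R101", "RollNo": "001", "Phone": "1111111111", "Name": "AAA", "Marks": 88},
    {"RegNo": "R102", "RollNo": "002", "Phone": "2222222222", "Name": "BBB", "Marks": 83},
    {"RegNo": "R103", "RollNo": "003", "Phone": "3333333333", "Name": "CCC", "Marks": 98},
    {"RegNo": "R104", "RollNo": "004", "Phone": "4444444444", "Name": "DDD", "Marks": 67},
    # Hypothetical extra row (not in the figure): a second student named AAA with 88 marks.
    {"RegNo": "R105", "RollNo": "005", "Phone": "5555555555", "Name": "AAA", "Marks": 88},
]


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    values = {tuple(row[a] for a in attrs) for row in rows}
    return len(values) == len(rows)


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper non-empty subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_superkey(student, ("RegNo",)))                    # RegNo alone is unique
print(is_superkey(student, ("RollNo", "Phone", "Name")))   # unique, but not minimal
print(is_superkey(student, ("Name", "Marks")))             # two rows share (AAA, 88)
print(is_candidate_key(student, ("RegNo",)))
print(is_candidate_key(student, ("RollNo", "Phone", "Name")))
```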
3) Primary Key (PK) : The primary key is the candidate key chosen by the database
designer to identify the tuples of the relation uniquely. For example, in the Student table
the designer may choose RegNo as the primary key.
Any other candidate key, such as RollNo, could have been chosen instead. Combinations
such as (RollNo, Name) or (RollNo, Phone) also identify each record, but they are
superkeys rather than candidate keys, because RollNo alone is sufficient.
The relation among candidate keys, the primary key and alternate keys can be denoted by
Alternate Keys = Candidate Keys - Primary Key
Rules for Primary Key
(i) The primary key may have one or more attributes.
(ii) There is only one primary key in the relation.
(iii) The value of a primary key attribute cannot be NULL.
(iv) The value of a primary key attribute should not change once it is assigned.
4) Alternate key : An alternate key is a candidate key which is not chosen by the
database designer as the primary key. For example, if RegNo is chosen as the primary key
of the Student table, every other candidate key of that table is an alternate key.

5) Foreign key : A foreign key is a single attribute or a collection of attributes in one table
that refers to the primary key of another table.
• Thus foreign keys refer to primary keys.
• The table containing the primary key is called the parent table and the table
containing the foreign key is called the child table.
• Example -

From the above example, we can see that the two tables are linked. For instance, we can
easily find out that the student CCC has opted for the ComputerSci course.

University Question

1. Explain distinction among the terms primary key, candidate key, foreign key and super key with
suitable example AU : May-06, 07, 12, Dec.-06, Marks 4

1.10 Integrity Constraints AU : Dec.-05

Database integrity means correctness or accuracy of data in the database. A database
may have a number of integrity constraints. For example -
(i) The Employee ID and Department ID must consist of two digits.
(ii) Every Employee ID must start with a letter.
The integrity constraints are classified based on the concepts of primary key and foreign
key. Let us discuss this classification as follows -

1.10.1 Entity Integrity Rule


This rule states that "in a relation, the value of an attribute of the primary key cannot be
null".
NULL represents a value for an attribute that is currently unknown or is not
applicable for this tuple. Nulls are a way to deal with incomplete or exceptional
data.

The primary key value helps in uniquely identifying every row in the table. Thus if the
users of the database want to retrieve any row from the table or perform any action on
that table, they must know the value of the key for that row. Hence it is necessary that the
primary key should not have the NULL value.
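The entity integrity rule can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout and the helper `try_insert` are assumptions made for the illustration. NOT NULL is written explicitly because SQLite, unlike standard SQL, otherwise accepts NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly: SQLite would otherwise allow NULL in a
# non-integer PRIMARY KEY column, which the SQL standard does not.
conn.execute("CREATE TABLE Student (RegNo TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Student VALUES ('R101', 'AAA')")


def try_insert(reg_no, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (reg_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


null_key_accepted = try_insert(None, "BBB")     # NULL primary key value
duplicate_accepted = try_insert("R101", "CCC")  # primary key value already present
valid_accepted = try_insert("R102", "DDD")      # new, non-null primary key value

print(null_key_accepted, duplicate_accepted, valid_accepted)
```

Only the third row is stored, since it is the only one with a non-null and unique primary key value.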

1.10.2 Referential Integrity Rule

• Referential integrity refers to the accuracy and consistency of data within a
relationship.
• In relationships, data is linked between two or more tables. This is achieved by
having the foreign key (in the associated table) reference a primary key value (in
the primary, or parent, table). Because of this, we need to ensure that data on
both sides of the relationship remains intact.
• The referential integrity rule states that "whenever a foreign key value is used, it
must reference a valid, existing primary key in the parent table".
• Example : Consider the situation where you have two tables : Employees and
Managers. The Employees table has a foreign key attribute named ManagedBy,
which points to the record of each employee's manager in the Managers table.
Referential integrity enforces the following three rules :
i) You cannot add a record to the Employees table unless the ManagedBy attribute
points to a valid record in the Managers table. Referential integrity prevents the
insertion of incorrect details into a table. Any operation that does not satisfy the
referential integrity rule fails.
ii) If the primary key of a record in the Managers table changes, all corresponding
records in the Employees table are modified, provided that a cascading update is
specified. Otherwise the change is rejected.
iii) If a record in the Managers table is deleted, all corresponding records in the
Employees table are deleted, provided that a cascading delete is specified. Otherwise
the deletion is rejected as long as such records exist.
Advantages of Referential Integrity
Referential integrity offers the following advantages :
i) Prevents the entry of duplicate data.
ii) Prevents one table from pointing to a nonexistent field in another table.
iii) Guarantees consistency between "partnered" tables.
iv) Prevents the deletion of a record that contains a value referred to by a foreign key in
another table.
v) Prevents the addition of a record to a table that contains a foreign key unless there is
a matching primary key value in the linked table.
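The three rules of the Employees and Managers example can be tried out with Python's sqlite3 module. This is a minimal sketch under stated assumptions : the column names ManagerID, EmpID and Name and the sample rows are hypothetical, the cascading behaviour is requested explicitly with ON UPDATE CASCADE and ON DELETE CASCADE, and SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Managers (
    ManagerID INTEGER PRIMARY KEY,
    Name      TEXT
);
CREATE TABLE Employees (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    ManagedBy INTEGER REFERENCES Managers(ManagerID)
              ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO Managers VALUES (1, 'M1'), (2, 'M2');
INSERT INTO Employees VALUES (10, 'E1', 1), (11, 'E2', 1), (12, 'E3', 2);
""")

# Rule i) An employee cannot point to a manager that does not exist.
try:
    conn.execute("INSERT INTO Employees VALUES (13, 'E4', 99)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Rule ii) Changing a manager's key updates the referring employees (cascade).
conn.execute("UPDATE Managers SET ManagerID = 5 WHERE ManagerID = 1")
managed_by_5 = [row[0] for row in conn.execute(
    "SELECT EmpID FROM Employees WHERE ManagedBy = 5 ORDER BY EmpID")]

# Rule iii) Deleting a manager deletes the referring employees (cascade).
conn.execute("DELETE FROM Managers WHERE ManagerID = 2")
remaining = [row[0] for row in conn.execute(
    "SELECT EmpID FROM Employees ORDER BY EmpID")]

print(insert_rejected, managed_by_5, remaining)
```

Without the two CASCADE clauses, the UPDATE and DELETE statements would be rejected instead of being propagated to the Employees table.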

University Question
1. Discuss the entity Integrity and referential integrity constraints. Why are they important ? Explain
them with suitable examples. AU : Dec.-05, Marks 10

1.11 Database integrity

• The foreign key is a key in one table that refers to the primary key of another table.
• The foreign key is basically used to link two tables. For example -
Consider the Customer and Order tables as follows -

Customer

CustID   Name   City
C101     AAA    Chennai
C102     BBB    Mumbai
C103     CCC    Pune

Order

OrderID   Description   CustID
111       Bolts         C103
222       Nuts          C103
333       Beams         C101
444       Screws        C102
555       Disks         C101

• Note that the "CustID" column in the "Order" table points to the "CustID" column
in the "Customer" table.
• The "CustID" column in the "Customer" table is the PRIMARY KEY of the
"Customer" table.
• The "CustID" column in the "Order" table is a FOREIGN KEY in the "Order" table.
• The table containing the foreign key is called the child table, and the table
containing the primary key is called the referenced or parent table.
• The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
• The FOREIGN KEY constraint also prevents invalid data from being inserted into
the foreign key column, because each value has to be one of the values contained
in the column it points to.
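The behaviour of the FOREIGN KEY constraint on the Customer and Order tables can be sketched with Python's sqlite3 module. The code is an illustration under assumptions : the helper `rejected` and the order (666, Rivets, C999) are hypothetical, the table name Order is written in double quotes because ORDER is a reserved word in SQL, and no cascading action is declared, so links cannot be destroyed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customer (
    CustID TEXT NOT NULL PRIMARY KEY,
    Name   TEXT,
    City   TEXT
);
CREATE TABLE "Order" (
    OrderID     INTEGER PRIMARY KEY,
    Description TEXT,
    CustID      TEXT REFERENCES Customer(CustID)
);
INSERT INTO Customer VALUES
    ('C101', 'AAA', 'Chennai'), ('C102', 'BBB', 'Mumbai'), ('C103', 'CCC', 'Pune');
INSERT INTO "Order" VALUES
    (111, 'Bolts', 'C103'), (222, 'Nuts', 'C103'), (333, 'Beams', 'C101'),
    (444, 'Screws', 'C102'), (555, 'Disks', 'C101');
""")


def rejected(sql):
    """Run one statement; return True if an integrity constraint rejects it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Invalid data: customer C999 does not exist in the parent table.
bad_insert_rejected = rejected("""INSERT INTO "Order" VALUES (666, 'Rivets', 'C999')""")

# Destroying a link: customer C103 is still referenced by orders 111 and 222.
delete_rejected = rejected("DELETE FROM Customer WHERE CustID = 'C103'")

customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(bad_insert_rejected, delete_rejected, customers_left)
```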

1.12 Relational Algebra AU : May-03,04,05,14,15,16,17,18, Dec.-02,07,08,11,15,16,17

• There are two formal query languages associated with the relational model : relational
algebra and relational calculus.
• Definition : Relational algebra is a procedural query language which is used to
access database tables and read data in different ways.
• The queries in relational algebra are expressed using operators.
• Every operator in relational algebra accepts relation instances (tables) as input
and returns a relation instance as output.
• Relational algebra is procedural. That means each relational query describes
a step-by-step procedure for computing the desired answer, based on the order in
which the operators are applied in the query.
• A sequence of relational algebra operations forms a relational algebra expression,
whose result is also a relation that represents the result of a database query.
By composing the operators in relational expressions, complex relations can
be defined.

1.12.1 Relational Operations


Various types of relational operations are as follows -

(1) Selection :
• This operation is used to fetch rows or tuples from a table (relation).
• Syntax : The syntax is
σ_{predicate}(relation)

• where σ represents the select operation. The predicate denotes the condition by
which the tuples of the relation (table) are selected.
• For example - Consider the relation Student as follows

sid   sname   age   gender
1     Ram     21    Male
2     Shyam   18    Male
3     Seeta   16    Female
4     Geeta   23    Female

Fig. 1.12.1 Student Table


Query : Fetch students with age more than 18
We can write it in relational algebra as
σ_{age > 18}(Student)

The selection returns complete tuples, so the output will be -

sid   sname   age   gender
1     Ram     21    Male
4     Geeta   23    Female

We can also combine conditions using the and, or operators.

σ_{age > 18 and gender = 'Male'}(Student)

sid   sname   age   gender
1     Ram     21    Male
(2) Projection :
• The project operation is used to project only a certain set of attributes of a relation.
That means if you want to see only the names of all the students in the Student
table, then you can use the project operation.
• Thus to display particular columns from the relation, the projection operator is
used.
• It will only project or show the columns or attributes asked for, and will also
remove duplicate tuples from the result.
• Syntax :
Π_{C1, C2, …}(r)
where C1, C2 etc. are attribute names (column names).
• For example - Consider the Student table given in Fig. 1.12.1.
Query : Display the name and age of all the students
This can be written in relational algebra as
Π_{sname, age}(Student)
The above statement will show us only the sname and age columns for all the rows of data
in the Student table.

sname   age
Ram     21
Shyam   18
Seeta   16
Geeta   23

Fig. 1.12.2
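Selection and projection can be imitated in Python to see what each operator keeps and what it drops. The following is a minimal sketch; the function names `select` and `project` are assumptions made for the illustration, and the relation is stored as a list of dictionaries.

```python
# Student relation from Fig. 1.12.1, one dictionary per tuple.
student = [
    {"sid": 1, "sname": "Ram", "age": 21, "gender": "Male"},
    {"sid": 2, "sname": "Shyam", "age": 18, "gender": "Male"},
    {"sid": 3, "sname": "Seeta", "age": 16, "gender": "Female"},
    {"sid": 4, "sname": "Geeta", "age": 23, "gender": "Female"},
]


def select(relation, predicate):
    """Selection: keep the complete tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


older = select(student, lambda row: row["age"] > 18)
names_and_ages = project(student, ["sname", "age"])
genders = project(student, ["gender"])   # duplicates collapse to two tuples

print(older)
print(names_and_ages)
print(genders)
```

Projecting on gender alone shows the duplicate elimination : four input tuples give only two output tuples.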

(3) Cartesian product :

• This is used to combine data from two different relations (tables) into one and
fetch data from the combined relation.
• Syntax : A × B
• For example : Suppose there are two tables named Student and Reserve as follows

Student
sid   sname   age
1     Ram     21
2     Shyam   18
3     Seeta   16
4     Geeta   23

Reserve
sid   isbn   day
1     005    07-07-18
2     005    03-03-17
3     007    08-11-16

• Query : Find the names of all the students who have reserved isbn = 005. To
satisfy this query we need to extract data from two tables. Hence the cartesian
product operator is used as
σ_{Student.sid = Reserve.sid ∧ Reserve.isbn = 005}(Student × Reserve)

As an output we will get

sid   sname   age   sid   isbn   day
1     Ram     21    1     005    07-07-18
2     Shyam   18    2     005    03-03-17

Note : Although both sid columns hold the same values, the column is repeated in the
result. Applying Π_{sname} to this result gives the required names, Ram and Shyam.
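The query above can be traced step by step in Python. This sketch is only an illustration : it stores each relation as a list of tuples and uses itertools.product from the standard library to form the Cartesian product; the variable names are assumptions.

```python
from itertools import product

# Student(sid, sname, age) and Reserve(sid, isbn, day) as lists of tuples.
student = [(1, "Ram", 21), (2, "Shyam", 18), (3, "Seeta", 16), (4, "Geeta", 23)]
reserve = [(1, "005", "07-07-18"), (2, "005", "03-03-17"), (3, "007", "08-11-16")]

# Cartesian product: every Student tuple is paired with every Reserve tuple.
cross = [s + r for s, r in product(student, reserve)]

# Selection on the product. Positions in each combined tuple:
# 0 = Student.sid, 1 = sname, 2 = age, 3 = Reserve.sid, 4 = isbn, 5 = day
matched = [t for t in cross if t[0] == t[3] and t[4] == "005"]

# Projection on sname gives the required names.
names = [t[1] for t in matched]

print(len(cross))
print(matched)
print(names)
```

The product has 4 × 3 = 12 tuples, of which only two satisfy both conditions.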


(4) Set operations : Various set operations are - union, intersection and set-difference.
Let us understand each of these operations with the help of examples.
(i) Union :

o This operation is used to fetch data from two relations (tables) or temporary
relations (results of other operations).

o For this operation to work, the relations (tables) specified should have the same
number of attributes (columns) and the same attribute domains. Also the duplicate
tuples are automatically eliminated from the result.

o Syntax : A ∪ B
o where A and B are relations.
o For example : If there are two tables Student and Book as follows -

Student
sid   sname   age
1     Ram     21
2     Shyam   18
3     Seeta   16
4     Geeta   23

Book
isbn   bname   Author
005    DBMS    XYZ
006    OS      PQR
007    DAA     ABC

o Query : We want to display both the student names and the book names from the
two tables, then
Π_{sname}(Student) ∪ Π_{bname}(Book)

(ii) Intersection :

o This operation is used to fetch the data which is common to both the tables.
o Syntax : A ∩ B
where A and B are relations.

o Example - Consider two tables - Student and Worker

Student
Name   Branch
AAA    ComputerSci
BBB    Mechanical
CCC    Civil
DDD    Electrical

Worker
Name   Salary
XXX    3000
AAA    2000
YYY    1500
DDD    2500

o Query : If we want to find out the names of the students who are working in a
company then
Π_{name}(Student) ∩ Π_{name}(Worker)

Name
AAA
DDD
(iii) Set-Difference : The result of the set-difference operation is the set of tuples which
are present in the first relation but not in the second relation.

Syntax : A - B
For example : Consider two relations Full_Time_Employee and Part_Time_Employee. If
we want to find all the employees who work full time only, that is, who do not also
appear as part-time employees, then the set-difference operator is used -
Π_{EmpName}(Full_Time_Employee) - Π_{EmpName}(Part_Time_Employee)
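Since a relation is a set of tuples, the three set operations correspond directly to Python's built-in set operators. The sketch below is an illustration that uses the projected name columns of the Student and Worker tables; the variable names are assumptions.

```python
# Results of projecting the Name column of Student and of Worker.
student_names = {"AAA", "BBB", "CCC", "DDD"}
worker_names = {"XXX", "AAA", "YYY", "DDD"}

union = student_names | worker_names          # names present in either relation
intersection = student_names & worker_names   # students who are also workers
difference = student_names - worker_names     # students who are not workers

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Both operands hold a single attribute of the same domain, which is the compatibility condition the set operations require.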

(5) Join : The join operation is used to combine information from two or more relations.
Formally, a join can be defined as a cross-product followed by selections and projections.
Joins arise much more frequently in practice than plain cross-products. The join operator
is written as ⋈
There are three types of joins used in relational algebra
i) Conditional join : This is an operation in which information from two tables is
combined using some condition, and this condition is specified along with the join
operator.

A ⋈_c B = σ_c(A × B)
Thus ⋈ is defined to be a cross-product followed by a selection. Note that the
condition c can refer to attributes of both A and B. The condition c can be specified
using the <, <=, >, >=, = or ≠ operators.
For example, consider two tables Student and Reserve as follows -

Student
sid   sname   age
1     Ram     21
2     Shyam   18
3     Seeta   16
4     Geeta   23

Reserve
sid   isbn   day
1     005    07-07-18
2     005    03-03-17
3     007    08-11-16

If we want the records of students with sid(Student) = sid(Reserve) and isbn = 005,
then we can write it using the Cartesian product as -

σ_{(Student.sid = Reserve.sid) ∧ (Reserve.isbn = 005)}(Student × Reserve)

Here there are two conditions, i) (Student.sid = Reserve.sid) and ii) (Reserve.isbn = 005),
which are joined by the ∧ operator.
Now we can use ⋈_c instead of the above statement and write it as -

Student ⋈_{(Student.sid = Reserve.sid) ∧ (Reserve.isbn = 005)} Reserve

The result will be -

sid   sname   age   sid   isbn   day
1     Ram     21    1     005    07-07-18
2     Shyam   18    2     005    03-03-17

ii) Equijoin : This is a kind of join in which the condition consists only of equalities
between attributes (columns) of the two relations (tables). For example - If there are two
tables Book and Reserve and we want to find the reservations of the book whose name
is 'DBMS', then :

Book
isbn   bname   Author
005    DBMS    XYZ
006    OS      PQR
007    DAA     ABC

Reserve
sid   isbn   day
1     005    07-07-18
2     005    03-03-17
3     007    08-11-16

σ_{bname = 'DBMS'}(Book ⋈_{Book.isbn = Reserve.isbn} Reserve)

Then we get
isbn   bname   Author   sid   day
005    DBMS    XYZ      1     07-07-18
005    DBMS    XYZ      2     03-03-17

iii) Natural Join : When the two relations have common columns and the join has to
equate all of these common columns, we use the natural join. The symbol for natural join
is simply ⋈ without any condition. For example, consider two tables -

Book
isbn   bname   Author
005    DBMS    XYZ
006    OS      PQR
007    DAA     ABC

Reserve
sid   isbn   day
1     005    07-07-18
2     005    03-03-17
3     007    08-11-16

Now if we want to list the books that are reserved, we have to match Book.isbn with
Reserve.isbn. Hence it will be simply
Book ⋈ Reserve
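The conditional join and the natural join can be sketched in Python on the Book and Reserve tables. The code is an illustration under assumptions : the function names `theta_join` and `natural_join` are hypothetical, and in the conditional join the attribute names are prefixed with "a." and "b." because both inputs may contain an attribute of the same name.

```python
book = [
    {"isbn": "005", "bname": "DBMS", "Author": "XYZ"},
    {"isbn": "006", "bname": "OS", "Author": "PQR"},
    {"isbn": "007", "bname": "DAA", "Author": "ABC"},
]
reserve = [
    {"sid": 1, "isbn": "005", "day": "07-07-18"},
    {"sid": 2, "isbn": "005", "day": "03-03-17"},
    {"sid": 3, "isbn": "007", "day": "08-11-16"},
]


def theta_join(a, b, condition):
    """Conditional join: cross-product of a and b followed by a selection."""
    result = []
    for ra in a:
        for rb in b:
            if condition(ra, rb):
                combined = {"a." + k: v for k, v in ra.items()}
                combined.update({"b." + k: v for k, v in rb.items()})
                result.append(combined)
    return result


def natural_join(a, b):
    """Natural join: equate all common attributes and keep one copy of each."""
    if not a or not b:
        return []
    common = [k for k in a[0] if k in b[0]]
    return [
        {**ra, **rb}
        for ra in a
        for rb in b
        if all(ra[k] == rb[k] for k in common)
    ]


equi = theta_join(book, reserve, lambda bk, rs: bk["isbn"] == rs["isbn"])
reserved_books = natural_join(book, reserve)
dbms_sids = [row["sid"] for row in reserved_books if row["bname"] == "DBMS"]

print(len(equi), len(reserved_books), dbms_sids)
```

The natural join result has a single isbn column, while the conditional join keeps the isbn of both inputs.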
(6) Rename operation : This operation is used to rename the output relation of any
query operation which returns a result, such as select or project, or simply to rename a
relation (table). The operator ρ (rho) is used for renaming.
Syntax : ρ(RelationNew, RelationOld)
For example : If you want to create a relation Student_names with sid and sname from
Student, it can be done using the rename operator as :
ρ(Student_names, Π_{sid, sname}(Student))
(7) Divide operation
The division operator is used to evaluate queries which contain the keyword ALL.
It is denoted by A/B, where A and B are instances of relations.
For example - Find all the customers having accounts in all the branches. For that,
consider two tables - Customer and Account as

Customer
Name   Branch
A      Pune
B      Mumbai
A      Mumbai
C      Pune

Account
Branch
Pune
Mumbai

Now Customer/Account will give us

Name
A
Here we check all the branches from the Account table against the branches listed for each
name in the Customer table. We find that only customer A has accounts in all the branches.
Formal Definition of Division Operation : The operation A/B is defined as the set of all x
values (in the form of unary tuples) such that for every y value in (a tuple of) B, there is a
tuple <x,y> in A.
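The formal definition of division translates almost word for word into Python. The sketch below is an illustration restricted to a binary relation of (x, y) pairs divided by a unary relation of y values; the function name `divide` is an assumption.

```python
# Customer(Name, Branch) as a set of pairs, Account(Branch) as a set of values.
customer = {("A", "Pune"), ("B", "Mumbai"), ("A", "Mumbai"), ("C", "Pune")}
account = {"Pune", "Mumbai"}


def divide(a, b):
    """A / B: all x such that for every y in b the pair (x, y) is in a."""
    xs = {x for x, _ in a}
    return {x for x in xs if all((x, y) in a for y in b)}


result = divide(customer, account)
print(result)
```

Customers B and C are dropped because each of them has an account in only one of the two branches.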
Example 1.12.1 Consider the following database :
reserves(sid, bid, day)
sailors(sid, sname, rating, age)
boats(bid, bname, color)
(i) Find the names of sailors who have reserved boat number 103
(ii) Find the names of sailors who have reserved a red boat
(iii) Find the ids of sailors with age over 20 who have not reserved a red boat
(iv) Find the names of sailors who have reserved at least one boat

Solution :
(i) Π_{sname}((σ_{bid = 103}(Reserves)) ⋈ Sailors)

(ii) Π_{sname}((σ_{color = 'red'}(Boats)) ⋈ Reserves ⋈ Sailors)

(iii) Π_{sid}(σ_{age > 20}(Sailors)) - Π_{sid}((σ_{color = 'red'}(Boats)) ⋈ Reserves)

(iv) Π_{sname}(Sailors ⋈ Reserves)


Example 1.12.2 Consider the following expressions, which use the result of a relational
algebra operation as the input to another operation. For each expression explain in words
what the expression does :
a) σ_{year ≥ 2009}(takes) ⋈ student
b) σ_{year ≥ 2009}(takes ⋈ student)
c) Π_{ID, name, course_id}(student ⋈ takes)

Solution :
a. For each student who takes at least one course in 2009 or later, display the student
information along with the information about the courses the student took. Here the
selection is applied before the join.
b. The result is the same as in (a). The only difference is that here the selection is applied
after the join operation, whereas in (a) it is applied before the join.
c. Display the ID, name and course_id of all the students who took any course in the
university.
Example 1.12.3 Consider following relational database
branch(branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
loan (loan_number, branch_name, amount)
borrower (customer_name, loan_number)
account (account_number, branch_name, balance)
depositor (customer_name, account_number)
i) Find the names of all branches located in "Chennai".
ii) Find the names of all borrowers who have a loan in branch "ABC".
Solution :
i) Π_{branch_name}(σ_{branch_city = 'Chennai'}(branch))

ii) Π_{customer_name}(σ_{branch_name = 'ABC'}(borrower ⋈ loan))

Example 1.12.4 Consider the following schema :
author(author_id, first_name, last_name)
author_pub(author_id, pub_id, author_position)
book(book_id, book_title, month, year, editor)
pub(pub_id, title, book_id)
(i) Give the relational algebra expression that returns the names of all the authors that are
book editors
(ii) Give the relational algebra expression that returns the names of all the authors that are
not book editors
(iii) Write a relational algebra expression that returns the names of all authors who have at
least one publication in the database.
Solution :

i) Π_{first_name, last_name}(author ⋈_{author_id = editor} book)

ii) Π_{first_name, last_name}((Π_{author_id}(author) - Π_{editor}(book)) ⋈ author)

iii) Π_{first_name, last_name}(author ⋈ author_pub)
Example 1.12.5 Consider the following schema :
Supplier(sid, sname, address)
Parts(pid, pname, color)
Cataloge(sid, pid, cost)
Write the relational algebraic queries for the following :
i) Find the sids of suppliers who supply some red or some green parts
ii) Find the sids of suppliers who supply every red part or some green part
iii) Find the pids of parts supplied by at least two different suppliers
Solution :
i) ρ(R1, Π_{sid}((Π_{pid}(σ_{color = 'red'}(Parts))) ⋈ Cataloge))
ρ(R2, Π_{sid}((Π_{pid}(σ_{color = 'green'}(Parts))) ⋈ Cataloge))
R1 ∪ R2
ii) ρ(R1, (Π_{sid, pid}(Cataloge)) / (Π_{pid}(σ_{color = 'red'}(Parts))))
ρ(R2, Π_{sid}((Π_{pid}(σ_{color = 'green'}(Parts))) ⋈ Cataloge))
R1 ∪ R2
iii) ρ(R1, Cataloge)
ρ(R2, Cataloge)
Π_{R1.pid}(σ_{R1.pid = R2.pid ∧ R1.sid ≠ R2.sid}(R1 × R2))

Example 1.12.6 Consider the relational database

employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
where primary keys are underlined.
(a) Find the names of all employees who work for First Bank Corporation
(b) Find the names, street addresses, and cities of residence of all employees who work for
First Bank Corporation and earn more than 200,000 per annum.
(c) Find the names of all employees in this database who live in the same city as the company
for which they work.
AU : Dec.-06, Marks 8

Solution :
a) Π_{person-name}(σ_{company-name = "First Bank Corporation"}(works))

b) Π_{person-name, street, city}(σ_{company-name = "First Bank Corporation" ∧ salary > 200000}(works ⋈ employee))

c) Π_{person-name}(works ⋈ employee ⋈ company)

University Questions
1. Explain select, project, cartesian product and join operations in relational algebra with an example
AU : May-18, Marks 13, Dec.-16, Marks 6
2. List operations of relational algebra and purpose of each with example
AU : May-17, Marks 5
3. Differentiate between foreign key constraints and referential Integrity constraints with suitable
example.
AU : Dec.-17, Marks 6
4. Explain various operations in relational algebra with examples
AU : May 03, Marks 10, Dec-07, Marks 8, Dec.- 08, Marks 10, May-14, Marks 16
5. Explain all join operations in relational algebra
AU : May 05, Marks 8
6. Briefly explain relational algebra
AU : May 04, Marks 8
7. What is rename operation in relational algebra ? Illustrate your answer with example
AU : Dec 02, Marks 2

Part III Structured Query Language(SQL)

1.13 SQL Fundamentals AU : Dec.-15

• Structured Query Language (SQL) is a database query language used for storing
and managing data in a relational DBMS.
• Various parts of SQL are -
o Data Definition Language (DDL) : It consists of a set of commands for
defining relation schemas, deleting relations, and modifying relation
schemas.
o Data Manipulation Language (DML) : It consists of a set of SQL commands for
inserting tuples into, deleting tuples from, and modifying tuples in the
database.
o Integrity : The SQL DDL includes commands for specifying integrity
constraints. These constraints must be satisfied by the data stored in the database.
o View definition : The SQL DDL contains the commands for defining views
for the database.
o Transaction control : SQL also includes a set of commands that
indicate the beginning and ending of transactions.
o Embedded SQL and Dynamic SQL : There is a facility for including SQL
commands in programming languages like C, C++, COBOL or Java.
o Authorization : The SQL DDL includes the commands for specifying access
rights to relations and views.

1.13.1 Data Abstraction


The basic data types used in SQL are -
(1) char(n) : This data type represents a fixed-length character string of user-specified
length n. For instance, it can be used for a name, designation or course name.
The full form character can be used instead of char.
(2) varchar(n) : varchar means character varying. This data type represents a
variable-length character string with user-specified maximum length n.
(3) int : This data type represents integer values, that is, numbers without a fractional
part. The full form integer can be used instead of int.
(4) numeric(p,d) : This data type represents a fixed-point number with user-specified
precision. The number consists of p digits (plus a sign), and d of the p digits are to the
right of the decimal point. For instance, numeric(5,2) allows 333.11 but it does not
allow 3333.11
(5) smallint : It is used to store a small integer value. It allows a machine-dependent subset
of the integer type.
(6) real : It allows floating point and double precision numbers with machine-dependent
precision.
(7) float(n) : This data type represents a floating point number with a precision of at least
n digits.

1.13.2 Basic Schema Definition


In this section, we will discuss various SQL commands for creating the schema
definition.
SQL commands are grouped into three sublanguages -
1. DDL commands : DDL or Data Definition Language consists of the SQL
commands that can be used to define the database schema. It deals with
descriptions of the database schema and is used to create and modify the structure
of database objects in the database.
Examples of DDL commands are :
• CREATE - is used to create the database or its objects such as tables, functions,
views and so on.
• DROP - is used to delete objects from the database.
• ALTER - is used to alter the structure of the database.
• TRUNCATE - is used to remove all records from a table and to release the space
allocated for those records.
• COMMENT - is used to add comments to the data dictionary.
• RENAME - is used to rename an object existing in the database.
2. DML commands : DML stands for Data Manipulation Language. These commands
deal with the manipulation of data present in the database.
Examples of DML commands are :
• SELECT - is used to retrieve data from a database.
• INSERT - is used to insert data into a table.
• UPDATE - is used to update existing data within a table.
• DELETE - is used to delete records from a database table.
3. DCL commands : DCL stands for Data Control Language. It includes commands such
as GRANT and REVOKE which mainly deal with the rights, permissions and other
controls of the database system.
Examples of DCL commands :
• GRANT - gives users access privileges to the database.
• REVOKE - withdraws access privileges that were given by using the GRANT
command.
Let us discuss various commonly used SQL commands that help in building the basic
schema.
(1) Create Table : A database relation can be created by using the create table command.

Syntax
create table table_name
(column1 datatype,
column2 datatype,
...,
integrity-constraints);
Example
create table Student
(RollNo int,
Name varchar(10),
Marks numeric(5,2),
primary key(RollNo));
The primary key attribute must be non-null and unique.
(2) Insert : The insert command is used to insert data into a table. There are two
syntaxes for inserting data in SQL

Syntax
i) insert into table_name (column1, column2, column3, ...)
values (value1, value2, value3, ...);
ii) insert into table_name
values (value1, value2, value3, ...);
Example
(i) insert into Student(RollNo, Name, Marks) values (101, 'AAA', 56.45);
(ii) insert into Student values (101, 'AAA', 56.45);
(3) Delete : This command is used to delete existing records.

Syntax
delete from table_name
where condition;
Example
delete from Student
where RollNo = 10;
(4) Alter : The alter table statement is used to add, delete, or modify columns in an
existing table.
The alter table statement is also used to add and drop various constraints on an existing
table.

Syntax for adding a column
alter table table_name
add column_name datatype;
Example
alter table Student
add address varchar(20);
Syntax for dropping a column
alter table table_name
drop column column_name;
Example
alter table Student
drop column address;
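The commands discussed so far can be run end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions : Marks is declared as numeric(5,2) so that a value such as 56.45 fits, the second row (102, BBB) is hypothetical and exists only to be deleted, and the drop column step is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# (1) Create Table
conn.execute("""
    create table Student
    (RollNo int,
     Name varchar(10),
     Marks numeric(5,2),
     primary key(RollNo))
""")

# (2) Insert, using both syntaxes
conn.execute("insert into Student(RollNo, Name, Marks) values (101, 'AAA', 56.45)")
conn.execute("insert into Student values (102, 'BBB', 83.00)")  # hypothetical row

# (3) Delete
conn.execute("delete from Student where RollNo = 102")

# (4) Alter: add a column; existing rows get NULL in the new column
conn.execute("alter table Student add address varchar(20)")

cursor = conn.execute("select * from Student")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```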

1.13.3 Basic Structure of SQL Queries


The basic form of an SQL query is
SELECT-FROM-WHERE. The syntax is as follows :
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
• SELECT : This is one of the fundamental query commands of SQL. It is similar to
the projection operation of relational algebra. It lists the attributes to be returned for
the tuples that satisfy the condition described by the WHERE clause.
• FROM : This clause takes one or more relation names as arguments, from which the
attributes are to be selected/projected. In case more than one relation name is given,
this clause corresponds to the Cartesian product.
• WHERE : This clause defines the predicate or conditions which a tuple must satisfy
for its attributes to be projected. It corresponds to the selection operation of
relational algebra.
• relation-list : A list of relation names (tables)
• target-list : A list of attributes of the relations in the relation-list
• qualification : Comparisons of attributes with values or with other attributes,
combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain
duplicates. If we write the SQL query without the DISTINCT keyword, then the
duplicates are not eliminated.
Example
SELECT sname
FROM Student
WHERE age > 18
• The above query will return the names of all the students from the Student table
whose age is greater than 18
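The SELECT-FROM-WHERE form and the effect of DISTINCT can be tried out with Python's sqlite3 module. The sketch below is an illustration that loads the Student data of Fig. 1.12.1 into an in-memory database; the ORDER BY clauses are added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, age INTEGER, gender TEXT);
INSERT INTO Student VALUES
    (1, 'Ram', 21, 'Male'), (2, 'Shyam', 18, 'Male'),
    (3, 'Seeta', 16, 'Female'), (4, 'Geeta', 23, 'Female');
""")

# SELECT-FROM-WHERE: names of the students older than 18.
names = [row[0] for row in conn.execute(
    "SELECT sname FROM Student WHERE age > 18 ORDER BY sid")]

# Without DISTINCT the duplicates are kept, with DISTINCT they are removed.
all_genders = [row[0] for row in conn.execute("SELECT gender FROM Student")]
distinct_genders = [row[0] for row in conn.execute(
    "SELECT DISTINCT gender FROM Student ORDER BY gender")]

print(names)
print(len(all_genders), distinct_genders)
```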

1.13.3.1 Queries on Multiple Relations


Many times it is required to access multiple relations (tables) to answer a query.
For example, consider two tables Student and Reserve.

Student
sid   sname   age
1     Ram     21
2     Shyam   18
3     Seeta   16
4     Geeta   23

Reserve
sid   isbn   day
1     005    07-07-18
2     007    03-03-18
3     009

Query : Find the names of the students who have reserved books, along with the isbn of
each reserved book
SELECT Student.sname, Reserve.isbn
FROM Student, Reserve
WHERE Student.sid = Reserve.sid
Use of SQL Join
The SQL JOIN clause is used to combine records from two or more tables in a database. A
JOIN is a means of combining fields from two tables by using values common to each.
Example : Consider the following two tables, Student and City, for using joins in SQL.
Note that cid is the common column in the two tables.

Student
sid   cid    sname
1     101    Ram
2     101    Shyam
3     102    Seeta
4     NULL   Geeta

City
cid   cname
101   Pune
102   Mumbai
103   Chennai

1) Inner Join :
• The most important and frequently used of the joins is the INNER JOIN. When the
join-predicate is an equality, as in the examples here, it is also known as an EQUIJOIN.
• The INNER JOIN creates a new result table by combining column values of two
tables (Table1 and Table2) based upon the join-predicate.
• The query compares each row of Table1 with each row of Table2 to find all pairs of
rows which satisfy the join-predicate.
• When the join-predicate is satisfied, the column values of each matched pair of rows
of Table1 and Table2 are combined into a result row.
• Syntax : The basic syntax of the INNER JOIN is as follows.

SELECT Table1.column1, Table2.column2...
FROM Table1
INNER JOIN Table2
ON Table1.common_field = Table2.common_field;
• Example : For the two tables Student and City given above, we can apply an
inner join. It will return the records that match in both tables on the
common column cid. The query will be
SELECT *
FROM Student INNER JOIN City ON Student.cid = City.cid
The result will be

sid   cid   sname   cid   cname
1     101   Ram     101   Pune
2     101   Shyam   101   Pune
3     102   Seeta   102   Mumbai

2) Left Join :
• The SQL LEFT JOIN returns all rows from the left table, even if there are no
matches in the right table. This means that if the ON clause matches 0 (zero)
records in the right table, the join will still return a row in the result, but with
NULL in each column from the right table.
• In other words, a left join returns all the values from the left table, plus the matched
values from the right table, or NULL in case of no matching join predicate.
• Syntax : The basic syntax of a LEFT JOIN is as follows.

SELECT Table1.column1, Table2.column2...
FROM Table1
LEFT JOIN Table2
ON Table1.common_field = Table2.common_field;
• Example : For the two tables Student and City given above, we can apply a left
join. It will return all records from the left table, and the matched records from
the right table, using the common column cid. The query will be
SELECT *
FROM Student LEFT JOIN City ON Student.cid = City.cid
The result will be

sid   cid    sname   cid    cname
1     101    Ram     101    Pune
2     101    Shyam   101    Pune
3     102    Seeta   102    Mumbai
4     NULL   Geeta   NULL   NULL

3) Right Join :
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no
matches in the left table.
• This means that if the ON clause matches 0 (zero) records in the left table, the join
will still return a row in the result, but with NULL in each column from the left
table.
• In other words, a right join returns all the values from the right table, plus the
matched values from the left table, or NULL in case of no matching join predicate.
• Syntax : The basic syntax of a RIGHT JOIN is as follows -

SELECT Table1.column1, Table2.column2...
FROM Table1
RIGHT JOIN Table2
ON Table1.common_field = Table2.common_field;
• Example : For the two tables Student and City given above, we can apply a
right join. It will return all records from the right table, and the matched records
from the left table, using the common column cid. The query will be
SELECT *
FROM Student RIGHT JOIN City ON Student.cid = City.cid

The result will be -

sid    cid    sname   cid   cname
1      101    Ram     101   Pune
2      101    Shyam   101   Pune
3      102    Seeta   102   Mumbai
NULL   NULL   NULL    103   Chennai

4) Full Join :
 The SQL FULL JOIN combines the results of both left and right outer joins.
 The joined table will contain all records from both the tables and fill in NULLs for
missing matches on either side.
 It can be represented as

 Syntax : The basic syntax of a FULL JOIN is as follows :


SELECT Table1.column1, Table2.column2...
FROM Table1 FULL JOIN Table2 ON Table1.common_field = Table2.common_field;
• Example : For the Student and City tables given above, we can apply a full join. It
returns all rows from both tables : rows that match on the common column cid are
combined, and unmatched rows from either side are padded with NULLs. The query will be -
SELECT *
FROM Student FULL JOIN City ON Student.cid = City.cid;
The result will be -

sid cid sname cid cname

1 101 Ram 101 Pune

2 101 Shyam 101 Pune

3 102 Seeta 102 Mumbai

4 NULL Geeta NULL NULL

NULL NULL NULL 103 Chennai
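Not every SQL engine accepts FULL JOIN (older SQLite releases, for instance, do not). A common workaround, sketched below, builds the same result from a LEFT JOIN plus the unmatched rows of the other table combined with UNION ALL. The sketch assumes Python's sqlite3 module and the same reconstructed Student and City rows; it is an emulation, not the FULL JOIN syntax itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (cid INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE Student (sid INTEGER PRIMARY KEY, cid INTEGER, sname TEXT);
INSERT INTO City VALUES (101, 'Pune'), (102, 'Mumbai'), (103, 'Chennai');
INSERT INTO Student VALUES (1, 101, 'Ram'), (2, 101, 'Shyam'),
                           (3, 102, 'Seeta'), (4, NULL, 'Geeta');
""")

full_rows = conn.execute("""
    -- every student, with the matching city if there is one
    SELECT s.sid, s.sname, c.cid, c.cname
    FROM Student s LEFT JOIN City c ON s.cid = c.cid
    UNION ALL
    -- plus the cities that no student refers to
    SELECT s.sid, s.sname, c.cid, c.cname
    FROM City c LEFT JOIN Student s ON s.cid = c.cid
    WHERE s.sid IS NULL
""").fetchall()

for row in full_rows:
    print(row)
```

The second SELECT on its own yields exactly the extra row that a right join adds (Chennai, which no student refers to).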


1.13.4 Additional Basic Operations


1) The Rename Operation : The SQL AS clause temporarily assigns a new name to a
column or to a table (relation) itself. One reason to rename a relation is to replace a
long relation name with a shortened version that is more convenient to use
elsewhere in the query. For example – "Find the names of students and the isbn of the
books they have reserved".
Student
sid sname age
1 Ram 21
2 Shyam 18
3 Seeta 16
4 Geeta 23
Reserve
sid isbn day
1 005 07-07-18
2 007 03-03-18
3 009 05-05-18

Select S.sname,R.isbn
From Student as S, Reserve as R
Where S.sid=R.sid
In above case we could shorten the names of tables Student and Reserve as S and R
respectively.
Another reason to rename a relation is a case where we wish to compare tuples in the
same relation. We then need to take the Cartesian product of the relation with itself, and
the two copies must be given different names so that their attributes can be told apart.
Renaming is equally convenient in ordinary joins. For example, if the query is – Find the
names of students who reserve the book of isbn 005 – then the SQL statement will be –
Select S.sname,R.isbn
From Student as S, Reserve as R
Where S.sid=R.sid and R.isbn='005'
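The case of comparing tuples within one relation can be illustrated with a self-join. The sketch below assumes Python's sqlite3 module and the Student rows listed above; the question it answers ("which students are older than Shyam ?") is an illustrative query, not one taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, age INTEGER);
INSERT INTO Student VALUES (1, 'Ram', 21), (2, 'Shyam', 18),
                           (3, 'Seeta', 16), (4, 'Geeta', 23);
""")

# S1 and S2 are two names for the same table, so each row of Student
# can be compared with the row that describes Shyam.
older_than_shyam = [name for (name,) in conn.execute("""
    SELECT S1.sname
    FROM Student AS S1, Student AS S2
    WHERE S2.sname = 'Shyam' AND S1.age > S2.age
    ORDER BY S1.sname
""")]

print(older_than_shyam)
```

Without the aliases the query could not say which copy of Student a column such as age refers to.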
2) Attribute Specification in Select clause : The symbol * is used in select clause to
denote all attributes. For example – To select all the records from Student table we
can write
Select * from Student
3) Ordering the display of tuples : For displaying the records in particular order we
use order by clause.
The general syntax with ORDER BY is
SELECT column_name(s)
FROM table_name
WHERE condition
ORDER BY column_name(s)
• Example : Consider the Student table as follows –
sid sname marks city
1 AAA 60 Pune
2 BBB 70 Mumbai
3 CCC 90 Pune
4 DDD 55 Mumbai
Query : Find the names of students from highest marks to lowest
Select sname
From Student
Order By marks desc
By default ORDER BY sorts in ascending order. We can state the order explicitly with asc
for ascending order and desc for descending order. For example, to display the names of
the students in descending order of city we can specify
Select sname
From Student
Order by city desc;
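The effect of ORDER BY can be checked with a small script. This is a sketch that assumes Python's sqlite3 module and the four Student rows shown above; the second query, which sorts on two columns, is an added illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, marks INTEGER, city TEXT);
INSERT INTO Student VALUES (1, 'AAA', 60, 'Pune'), (2, 'BBB', 70, 'Mumbai'),
                           (3, 'CCC', 90, 'Pune'), (4, 'DDD', 55, 'Mumbai');
""")

def names(sql):
    """Run a one-column query and return its values as a list."""
    return [name for (name,) in conn.execute(sql)]

# Highest marks first.
by_marks = names("SELECT sname FROM Student ORDER BY marks DESC")

# City in ascending order; within each city, highest marks first.
by_city_then_marks = names("SELECT sname FROM Student ORDER BY city ASC, marks DESC")

print(by_marks)
print(by_city_then_marks)
```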
(4) Where clause Predicate :
(i) The between operator simplifies a where clause that requires a value to be greater
than or equal to one value and less than or equal to another; both bounds are
included. For example – if we want the names of the students whose marks are
between 80 and 90, then the SQL statement will be
Select sname
From Student
Where marks between 80 and 90;
Similarly we can use the comparison operators on tuples of attributes. For example - if
the query is – Find the names of students who reserve the book of isbn 005 – then the
SQL statement will be –
Select sname
From Student , Reserve
Where (Student.sid,Reserve.isbn)=(Reserve.sid,'005');
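That between includes both of its bounds is easy to confirm. The sketch below assumes Python's sqlite3 module and the Student table with marks 60, 70, 90 and 55 used in this section; the range 60 to 70 is chosen here only so that both end points occur in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, marks INTEGER, city TEXT);
INSERT INTO Student VALUES (1, 'AAA', 60, 'Pune'), (2, 'BBB', 70, 'Mumbai'),
                           (3, 'CCC', 90, 'Pune'), (4, 'DDD', 55, 'Mumbai');
""")

with_between = [n for (n,) in conn.execute(
    "SELECT sname FROM Student WHERE marks BETWEEN 60 AND 70 ORDER BY sid")]

# The same condition written with explicit comparison operators.
with_comparisons = [n for (n,) in conn.execute(
    "SELECT sname FROM Student WHERE marks >= 60 AND marks <= 70 ORDER BY sid")]

print(with_between, with_comparisons)
```

Both queries select the students with exactly 60 and exactly 70 marks, which shows that the end points belong to the range.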
(ii) We can use the AND, OR and NOT operators in the Where clause. To filter records
on more than one condition, the AND and OR operators can be used. The NOT operator
selects the records for which a condition is not TRUE.
Consider the following sample table, Students, for applying the AND, OR and NOT
operators
sid sname marks city
1 AAA 60 Pune
2 BBB 70 Mumbai
3 CCC 90 Pune
4 DDD 55 Mumbai
Syntax of AND
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
Example : Find the student whose name is 'AAA' and who lives in the city 'Pune'
SELECT *
FROM Students
WHERE sname='AAA' AND city='Pune'
Output
sid sname marks city
1 AAA 60 Pune
Syntax of OR
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
Example : Find the students whose name is 'AAA' or who live in the city 'Pune'
SELECT *
FROM Students
WHERE sname='AAA' OR city='Pune'
Output
sid sname marks city
1 AAA 60 Pune
3 CCC 90 Pune
Syntax of NOT
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition
Example : Find the students who do not live in the city 'Pune'
SELECT *
FROM Students
WHERE NOT city='Pune'
Output
sid sname marks city
2 BBB 70 Mumbai
4 DDD 55 Mumbai
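The three examples above can be run together. This sketch assumes Python's sqlite3 module and the Students rows shown above; the helper sids() and the last, combined condition are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, sname TEXT, marks INTEGER, city TEXT);
INSERT INTO Students VALUES (1, 'AAA', 60, 'Pune'), (2, 'BBB', 70, 'Mumbai'),
                            (3, 'CCC', 90, 'Pune'), (4, 'DDD', 55, 'Mumbai');
""")

def sids(condition):
    # condition is a fixed string written in this script, never user input
    sql = f"SELECT sid FROM Students WHERE {condition} ORDER BY sid"
    return [sid for (sid,) in conn.execute(sql)]

and_ids = sids("sname = 'AAA' AND city = 'Pune'")
or_ids = sids("sname = 'AAA' OR city = 'Pune'")
not_ids = sids("NOT city = 'Pune'")

# Parentheses make the scope of NOT explicit when operators are combined.
combined_ids = sids("NOT (city = 'Pune' OR marks < 60)")

print(and_ids, or_ids, not_ids, combined_ids)
```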
1.13.5 Domain and Key Constraint


Domain Constraint
 A domain is defined as the set of all unique values permitted for an attribute. For
example, a domain of date is the set of all possible valid dates, a domain of Integer
is all possible whole numbers, and a domain of day-of-week is Monday, Tuesday
... Sunday.
 This in effect is defining rules for a particular attribute. If it is determined that an
attribute is a date then it should be implemented in the database to prevent
invalid dates being entered.
• A domain constraint is a user-defined data type, which can be described as :
• Domain constraint = Data type + Constraints
• The constraints can be specified using NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT.
• For example –
Create domain id_value integer
constraint id_test
check(value > 100); -- checks that the stu_id value is greater than 100
create table student (
stu_id id_value PRIMARY KEY,
stu_name CHAR(30),
stu_age integer
);
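Engines without CREATE DOMAIN can still enforce the same rule with a column-level CHECK constraint. The sketch below assumes Python's sqlite3 module (SQLite has no CREATE DOMAIN statement), so the condition is attached directly to stu_id; the helper try_insert() is a hypothetical name used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY CHECK (stu_id > 100),  -- plays the role of id_value
        stu_name CHAR(30),
        stu_age  INTEGER
    )
""")

def try_insert(stu_id, name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (stu_id, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

ok_150 = try_insert(150, "AAA", 20)  # satisfies stu_id > 100
ok_50 = try_insert(50, "BBB", 21)    # violates the CHECK constraint

print(ok_150, ok_50)
```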
Key Constraint
 A key constraint is a statement that a certain minimal subset of the fields of a
relation is a unique identifier for a tuple.
 For example - Consider the students relation and the constraint that no two
students have the same student id. This IC is an example of a key constraint.
• The definition of a key constraint contains two parts -
o Two distinct tuples in a legal instance (an instance that satisfies all Integrity
Constraints including the key constraint) cannot have identical values in all the
fields of a key.
o No subset of the set of fields in a key is a unique identifier for a tuple.
 The first part of the definition means that, in any legal instance, the values in the
key fields uniquely identify a tuple in the instance. When specifying a key
constraint, the DBA or user must be sure that this constraint will not prevent them
from storing a 'correct' set of tuples. For example, several students may have the
same name, although each student has a unique student id. If the name field is
declared to be a key, the DBMS will not allow the Students relation to contain two
tuples describing different students with the same name.
 The second part of the definition means, for example, that the set of fields
{RollNo, Name} is not a key for Students, because this set properly contains the
key {RollNo}. The set {RollNo, Name} is an example of a superkey, which is a set
of fields that contains a key.
 The key constraint can be specified using SQL as follows -

o In SQL, we can declare that a subset of the columns of a table constitute a key by
using the UNIQUE constraint.
o At most one of these candidate keys can be declared to be a primary key, using
the PRIMARY KEY constraint. For example -
CREATE TABLE Student(RollNo integer,
Name CHAR(20),
age integer,
UNIQUE(Name,age),
CONSTRAINT StudentKey PRIMARY KEY(RollNo))
This definition says that RollNo is a Primary key and Combination of Name and
age is also a key.
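How a DBMS enforces these two keys can be observed directly. The sketch below assumes Python's sqlite3 module and reuses the CREATE TABLE statement above; the inserted rows and the helper try_insert() are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student(RollNo integer,
                         Name CHAR(20),
                         age integer,
                         UNIQUE(Name, age),
                         CONSTRAINT StudentKey PRIMARY KEY(RollNo))
""")

def try_insert(roll_no, name, age):
    """Return True if the row is stored, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (roll_no, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

outcomes = [
    try_insert(1, "AAA", 20),  # first row: accepted
    try_insert(2, "AAA", 21),  # same name, different age: accepted
    try_insert(3, "AAA", 20),  # repeats (Name, age): rejected by UNIQUE
    try_insert(1, "BBB", 22),  # repeats RollNo: rejected by PRIMARY KEY
]
print(outcomes)
```

The second row is accepted because the key is the combination (Name, age), not Name alone.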

1.13.6 String Operations


• For string comparisons, we can use the comparison operators =, <, >, <=, >=, <>
with the ordering of strings determined alphabetically as usual.
• SQL also permits a variety of functions on character strings, such as concatenation
using the operator ||, extracting substrings, finding the length of a string, converting
strings to upper case (using the function upper(s)) and lower case (using the function
lower(s)), removing spaces at the end of a string (using the function trim(s)) and so on.
• Pattern matching can also be performed on strings using two types of special
characters –
o Percent (%) : It matches zero, one or multiple characters.
o Underscore (_) : It matches any single character.
• The percent and underscore characters can be used in combination.
• Patterns are case sensitive in standard SQL. That means upper case characters do not
match lower case characters or vice versa. Some systems, such as MySQL and SQLite,
compare case-insensitively by default.
• For instance :
o 'Data%' matches any string beginning with "Data", for instance "Database",
"DataMining", "DataStructure".
o '___' (three underscores) matches any string of exactly three characters.
o '___%' (three underscores followed by a percent sign) matches any string of at
least three characters.
 The LIKE clause can be used in WHERE clause to search for specific patterns.
• For example – Consider the following Employee table
EmpID EmpName Department Date_of_Join
1 Sunil Marketing 1-Jan
2 Mohsin Manager 2-Jan
3 Supriya Manager 3-Jan
4 Sonia Accounts 4-Jan
5 Suraj Sales 5-Jan
6 Archana Purchase 6-Jan
(1) Find all the employees with EmpName starting with "S"
SQL Statement :
SELECT * FROM Employee
WHERE EmpName LIKE 'S%'
Output
EmpID EmpName Department Date_of_Join
1 Sunil Marketing 1-Jan
3 Supriya Manager 3-Jan
4 Sonia Accounts 4-Jan
5 Suraj Sales 5-Jan
(2) Find the names of employees whose name begins with S and ends with a
SQL Statement :
SELECT EmpName FROM Employee
WHERE EmpName LIKE 'S%a'
Output
EmpName
Supriya
Sonia
(3) Find the names of employees whose name begins with S followed by exactly
four characters
SQL Statement :
SELECT EmpName FROM Employee
WHERE EmpName LIKE 'S____'
Output
EmpName
Sunil
Sonia
Suraj
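The pattern queries can be reproduced as follows. The sketch assumes Python's sqlite3 module and the Employee rows shown above. Note that SQLite's LIKE ignores case for ASCII letters by default; the patterns used here give the same answer either way, because every matching name starts with a capital S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, EmpName TEXT,
                       Department TEXT, Date_of_Join TEXT);
INSERT INTO Employee VALUES
    (1, 'Sunil', 'Marketing', '1-Jan'), (2, 'Mohsin', 'Manager', '2-Jan'),
    (3, 'Supriya', 'Manager', '3-Jan'), (4, 'Sonia', 'Accounts', '4-Jan'),
    (5, 'Suraj', 'Sales', '5-Jan'),     (6, 'Archana', 'Purchase', '6-Jan');
""")

def names_like(pattern):
    """Return the employee names matching a LIKE pattern, in EmpID order."""
    sql = "SELECT EmpName FROM Employee WHERE EmpName LIKE ? ORDER BY EmpID"
    return [name for (name,) in conn.execute(sql, (pattern,))]

starts_with_s = names_like("S%")     # S followed by anything
s_then_a = names_like("S%a")         # begins with S and ends with a
s_plus_four = names_like("S____")    # S followed by exactly four characters

print(starts_with_s, s_then_a, s_plus_four)
```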

1.13.7 Set Operations


1) UNION : To use the UNION clause, each SELECT statement must have
i) The same number of columns selected
ii) The same number of column expressions
iii) The same data type and
iv) Have them in the same order
The UNION operator combines the results of two SELECT statements into a single result.
It can often take the place of an OR between conditions in a single query. The UNION
operator eliminates duplicate rows, while UNION ALL retains them.

Syntax
The basic syntax of a UNION clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here, the given condition could be any given expression based on your requirement.
Consider the following relations –
Student
sid sname age
1 Ram 21
2 Shyam 18
3 Seeta 16
4 Geeta 23
Reserve
sid isbn day
1 005 07-07-18
2 005 03-03-17
3 007 08-11-16
Book
isbn bname Author
005 DBMS XYZ
006 OS PQR
007 DAA ABC
Example : Find the names of the students who have reserved the 'DBMS' book or the 'OS'
book
The query can then be written by considering the Student, Reserve and Book tables as
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='DBMS'
UNION
SELECT S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='OS'
2) Intersect : The entries common to the results of two queries can be obtained with the
help of the INTERSECT operator. It can often take the place of an AND between
conditions. Both SELECT statements must list the same columns.
Syntax
The basic syntax of an INTERSECT clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
INTERSECT
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Example : Find the students who have reserved both the 'DBMS' book and the 'OS' book
The query can then be written by considering the Student, Reserve and Book tables as
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='DBMS'
INTERSECT
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='OS'
3) Except : The EXCEPT clause is used to represent the set-difference in the query.
It returns the entries that are present in the result of the first query and not in the result
of the second.
Syntax :
The basic syntax of an EXCEPT clause is as follows –
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
EXCEPT
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Example : Find the students who have reserved the 'DBMS' book but have not reserved
the 'OS' book
The query can then be written by considering the Student, Reserve and Book tables as
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='DBMS'
EXCEPT
SELECT S.sid, S.sname
FROM Student S, Reserve R, Book B
WHERE S.sid=R.sid AND R.isbn=B.isbn AND B.bname='OS'
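The three set operations can be compared side by side on the sample relations. This sketch assumes Python's sqlite3 module and the Student, Reserve and Book rows given above; the helper combine() is a hypothetical name, and both branches select the same two columns so that the operators are applicable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, age INTEGER);
CREATE TABLE Book (isbn TEXT PRIMARY KEY, bname TEXT, author TEXT);
CREATE TABLE Reserve (sid INTEGER, isbn TEXT, day TEXT);
INSERT INTO Student VALUES (1, 'Ram', 21), (2, 'Shyam', 18), (3, 'Seeta', 16), (4, 'Geeta', 23);
INSERT INTO Book VALUES ('005', 'DBMS', 'XYZ'), ('006', 'OS', 'PQR'), ('007', 'DAA', 'ABC');
INSERT INTO Reserve VALUES (1, '005', '07-07-18'), (2, '005', '03-03-17'), (3, '007', '08-11-16');
""")

# Students who reserved the book with a given name.
BRANCH = """
    SELECT S.sid, S.sname
    FROM Student S, Reserve R, Book B
    WHERE S.sid = R.sid AND R.isbn = B.isbn AND B.bname = '{}'
"""

def combine(operator):
    """Combine the 'DBMS' branch and the 'OS' branch with a set operator."""
    sql = BRANCH.format("DBMS") + operator + BRANCH.format("OS") + " ORDER BY 1"
    return conn.execute(sql).fetchall()

either = combine("UNION")        # reserved DBMS or OS
both = combine("INTERSECT")      # reserved DBMS and OS
only_dbms = combine("EXCEPT")    # reserved DBMS but not OS

print(either, both, only_dbms)
```

With this data nobody has reserved the OS book, so the intersection is empty and the difference equals the DBMS branch.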

1.13.8 Aggregate Functions


 An aggregate function allows you to perform a calculation on a set of values to
return a single scalar value.
 SQL offers five built-in aggregate functions :
1. Average : avg
2. Minimum : min
3. Maximum : max
4. Total: sum
5. Count : count

1.13.8.1 Basic Aggregation


 The aggregate functions that accept an expression parameter can be modified by
the keywords DISTINCT or ALL. If neither is specified, the result is the same as if
ALL were specified.
DISTINCT : Modifies the expression to include only distinct values that are not NULL
ALL : Includes all rows where expression is not NULL
 Syntax of all the Aggregate Functions


AVG( [ DISTINCT | ALL ] expression)
COUNT(*)
COUNT( [ DISTINCT | ALL ] expression )
MAX( [ DISTINCT | ALL ] expression)
MIN( [ DISTINCT | ALL ] expression)
SUM( [ DISTINCT | ALL ] expression)
 The avg function is used to compute average value. For example – To compute
average marks of the students we can use
SQL Statement
SELECT AVG(marks)
FROM Students
 The Count function is used to count the total number of values in the specified
field. It works on both numeric and non-numeric data type. COUNT (*) is a
special implementation of the COUNT function that returns the count of all the
rows in a specified table. COUNT (*) also considers Nulls and duplicates. For
example Consider following table
Test
id value
11 100
22 200
33 300
NULL 400
SQL Statement
SELECT COUNT(*)
FROM Test
Output
4
SELECT COUNT(ALL id)
FROM Test
Output
3
 The min function is used to get the minimum value from the specified column.
For example – Consider the above created Test table
SQL Statement
SELECT Min(value)
FROM Test
Output
100
 The max function is used to get the maximum value from the specified column.
For example – Consider the above created Test table
SQL Statement
SELECT Max(value)
FROM Test
Output
400
 The sum function is used to get total sum value from the specified column. For
example – Consider the above created Test table
SQL Statement
SELECT sum(value)
FROM Test
Output
1000
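All of the aggregates above can be computed in a single query. The sketch assumes Python's sqlite3 module and the Test table shown earlier, including its row with a NULL id; COUNT(id) is written without the optional ALL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (id INTEGER, value INTEGER);
INSERT INTO Test VALUES (11, 100), (22, 200), (33, 300), (NULL, 400);
""")

count_all, count_id, lowest, highest, total, average = conn.execute("""
    SELECT COUNT(*),     -- counts every row, including the one with a NULL id
           COUNT(id),    -- counts only the rows where id is not NULL
           MIN(value), MAX(value), SUM(value), AVG(value)
    FROM Test
""").fetchone()

print(count_all, count_id, lowest, highest, total, average)
```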

1.13.8.2 Use of Group By and Having Clause


(i) Group By :
 The GROUP BY clause is a SQL command that is used to group rows that have the
same values.
 The GROUP BY clause is used in the SELECT statement.
 Optionally it is used in conjunction with aggregate functions.
 The queries that contain the GROUP BY clause are called grouped queries
 This query returns a single row for every grouped item.
 Syntax :
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
The general syntax with ORDER BY is
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s)
 Example : Consider the Student table as follows -

sid sname marks city


1 AAA 60 Pune
2 BBB 70 Mumbai
3 CCC 90 Pune
4 DDD 55 Mumbai
Query : Find the total marks of the students in each city
SELECT SUM(marks), city
FROM Student
GROUP BY city
Output
SUM(marks) city
150 Pune
125 Mumbai
(ii) Having :
• HAVING filters the summarized groups produced by GROUP BY.
 HAVING applies to summarized group records, whereas WHERE applies to
individual records.
 Only the groups that meet the HAVING criteria will be returned.
 HAVING requires that a GROUP BY clause is present.
 WHERE and HAVING can be in the same query.
 Syntax :
SELECT column-names
FROM table-name
WHERE condition
GROUP BY column-names
HAVING condition
 Example : Consider the Student table as follows -
sid sname marks city

1 AAA 60 Pune

2 BBB 70 Mumbai

3 CCC 90 Pune

4 DDD 55 Mumbai

5 EEE 84 Chennai

Query : Find the total marks of the students in the cities 'Pune' and 'Mumbai' only
SELECT SUM(marks), city
FROM Student
GROUP BY city
HAVING city IN('Pune','Mumbai')
Output
• The result will be as follows –
SUM(marks) city
150 Pune
125 Mumbai
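The difference between WHERE and HAVING shows most clearly when the HAVING condition uses an aggregate. The sketch below assumes Python's sqlite3 module and the five Student rows above; the thresholds (total above 100, marks of at least 60, at least two students) are illustrative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, marks INTEGER, city TEXT);
INSERT INTO Student VALUES (1, 'AAA', 60, 'Pune'), (2, 'BBB', 70, 'Mumbai'),
                           (3, 'CCC', 90, 'Pune'), (4, 'DDD', 55, 'Mumbai'),
                           (5, 'EEE', 84, 'Chennai');
""")

# One row per city.
per_city = conn.execute(
    "SELECT city, SUM(marks) FROM Student GROUP BY city ORDER BY city").fetchall()

# HAVING keeps only the groups whose total exceeds 100.
large_totals = conn.execute("""
    SELECT city, SUM(marks) FROM Student
    GROUP BY city HAVING SUM(marks) > 100 ORDER BY city
""").fetchall()

# WHERE removes individual rows first; HAVING then filters the remaining groups.
where_then_having = conn.execute("""
    SELECT city, SUM(marks) FROM Student
    WHERE marks >= 60
    GROUP BY city HAVING COUNT(*) >= 2 ORDER BY city
""").fetchall()

print(per_city, large_totals, where_then_having)
```

A condition on a plain column, such as city IN ('Pune','Mumbai'), could equally be placed in WHERE; a condition on SUM or COUNT can only be placed in HAVING.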

1.13.9 Nested Queries


In nested queries, a query is written inside a query. The result of inner query is used
in execution of outer query.
There are two types of nested queries :
i) Independent Query :
 In independent nested queries, query execution starts from innermost query
to outermost queries.
 The execution of inner query is independent of outer query, but the result of
inner query is used in execution of outer query.
 Various operators like IN, NOT IN, ANY, ALL etc are used in writing
independent nested queries.
• For example - Consider three tables namely Student, City and Student_City
as follows -
Student
sid sname phone
1 Ram 1111
2 Shyam 2222
3 Seeta 3333
4 Geeta 4444
City
cid cname
101 Pune
102 Mumbai
103 Chennai
Student_City
sid cid
1 101
1 103
2 101
3 102
4 102
4 103
• Example 1 - Suppose we want to find the sid of the students who live in the city 'Pune'
or 'Chennai'. We can write an independent nested query using the IN operator. The
IN operator allows you to specify multiple values in a WHERE clause; it is a
shorthand for multiple OR conditions.
Step 1 : Find the cid for cname = 'Pune' or 'Chennai'. The query will be
SELECT cid
FROM City
WHERE cname='Pune' or cname='Chennai'
Step 2 : Using the cid values obtained in step 1 we can find the sid. The query will be
SELECT sid
FROM Student_City
WHERE cid IN
(SELECT cid FROM City WHERE cname='Pune' or cname='Chennai')
The inner query returns a set with members 101 and 103, and the outer query returns
those sid for which cid is equal to any member of that set. So it returns the sid values
1, 2 and 4 (sid 1 is listed twice, once for each matching city, unless SELECT DISTINCT
is used).
• Example 2 : Suppose we want to find the sname of the students who live in the city
'Pune' or 'Chennai'.
SELECT sname FROM Student WHERE sid IN
(SELECT sid FROM Student_City WHERE cid IN
(SELECT cid FROM City WHERE cname='Pune' or cname='Chennai'))

ii) Co-related Query :


In co-related nested queries, the output of the inner query depends on the row which is
currently being processed in the outer query. For example,
if we want to find the sname of the students who live in the city with cid 101, it can be
done with the help of a co-related nested query as :
SELECT sname FROM Student S WHERE EXISTS
(SELECT * FROM Student_City SC WHERE S.sid=SC.sid and SC.cid=101)
Here, for each row of Student S, the inner query finds the rows of Student_City where
S.sid = SC.sid and SC.cid=101.
If, for a sid from Student S, at least one row exists in Student_City SC with cid=101, then
EXISTS is true and the corresponding sname is returned as output.
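Both kinds of nested query can be run against the sample tables. This sketch assumes Python's sqlite3 module and the Student, City and Student_City rows above; DISTINCT is added to the first query so that each sid is reported once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT, phone TEXT);
CREATE TABLE City (cid INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE Student_City (sid INTEGER, cid INTEGER);
INSERT INTO Student VALUES (1, 'Ram', '1111'), (2, 'Shyam', '2222'),
                           (3, 'Seeta', '3333'), (4, 'Geeta', '4444');
INSERT INTO City VALUES (101, 'Pune'), (102, 'Mumbai'), (103, 'Chennai');
INSERT INTO Student_City VALUES (1, 101), (1, 103), (2, 101), (3, 102), (4, 102), (4, 103);
""")

# Independent nested query: the inner SELECT can be evaluated once, on its own.
pune_or_chennai = [sid for (sid,) in conn.execute("""
    SELECT DISTINCT sid FROM Student_City
    WHERE cid IN (SELECT cid FROM City WHERE cname IN ('Pune', 'Chennai'))
    ORDER BY sid
""")]

# Co-related nested query: the inner SELECT refers to S.sid of the current outer row.
lives_in_101 = [name for (name,) in conn.execute("""
    SELECT sname FROM Student S
    WHERE EXISTS (SELECT * FROM Student_City SC
                  WHERE S.sid = SC.sid AND SC.cid = 101)
    ORDER BY S.sid
""")]

print(pune_or_chennai, lives_in_101)
```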

1.13.10 Modification of Databases


The modification of database is an operation for making changes in the existing
databases. Various operations of modification of database are – insertion, deletion and
updation of databases.
1. Deletion : The delete command is used to delete existing records.
Syntax
delete from table_name
where condition;
Example
delete from student
where RollNo=10
2. Insertion : The insert command is used to insert data into a table. There are two
forms of the insert statement.
Syntax
(i) insert into table_name (column1, column2, column3, ...)
values (value1, value2, value3, ...);
(ii) insert into table_name
values (value1, value2, value3, ...);
Example
(i) insert into Student(RollNo,Name,Marks) values(101,'AAA',56.45)
(ii) insert into Student values(101,'AAA',56.45)
3. Update : The update statement is used to modify the existing records in a table.
Syntax
update table_name
set column1=value1, column2=value2,…
where condition;
Example
update student
set Name='WWW'
where RollNo=101
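The three modification statements can be exercised in sequence. The sketch below assumes Python's sqlite3 module and a Student(RollNo, Name, Marks) table as used in the examples; the second inserted row (RollNo 10, 'BBB') is made up so that the delete statement has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT, Marks REAL)")

# Insertion: with and without an explicit column list.
conn.execute("INSERT INTO Student (RollNo, Name, Marks) VALUES (?, ?, ?)", (101, "AAA", 56.45))
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (10, "BBB", 72.0))

# Update: rowcount reports how many rows the WHERE clause selected.
updated = conn.execute(
    "UPDATE Student SET Name = ? WHERE RollNo = ?", ("WWW", 101)).rowcount

# Deletion: without a WHERE clause every row of the table would be removed.
deleted = conn.execute("DELETE FROM Student WHERE RollNo = ?", (10,)).rowcount

remaining = conn.execute("SELECT RollNo, Name, Marks FROM Student").fetchall()
print(updated, deleted, remaining)
```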
Example 1.13.1 Write the DDL, DML and DCL commands for the students database, which
contains student details : name, id, DOB, branch, DOJ.
Course details : Course name, Course id, Stud. id, Faculty name, id, marks
AU : Dec.-17, Marks 15
Solution :
DDL Commands
CREATE TABLE Student
(
stud_name varchar(20),
stud_id int,
DOB varchar(15),
branch varchar(10),
DOJ varchar(15)
);
CREATE TABLE Course
(
course_name varchar(20),
course_id int,
stud_id int,
faculty_name varchar(20),
faculty_id varchar(5),
marks real
);
DML Commands
The commands which we will use here are insert and select. The insert command is
used to insert values into the database tables. Using the select command, the database
values can be displayed.
(1) Inserting values into Student table
insert into Student(stud_name,stud_id,DOB,branch,DOJ)
values('AAA',11,'01-10-1999','computers','5-3-2018');
insert into Student(stud_name,stud_id,DOB,branch,DOJ)
values('BBB',12,'24-5-1988','Mechanical','17-2-2016');
insert into Student(stud_name,stud_id,DOB,branch,DOJ)
values('CCC',13,'8-1-1990','Electrical','22-9-2017');
(2) Inserting values into Course table
insert into Course(course_name,course_id,stud_id,faculty_name,faculty_id,marks)
values('Basic',101,11,'Archana','F001',50);
insert into Course(course_name,course_id,stud_id,faculty_name,faculty_id,marks)
values('Intermediate',102,12,'Rupali','F002',70);
insert into Course(course_name,course_id,stud_id,faculty_name,faculty_id,marks)
values('Advanced',103,13,'Sunil','F003',100);
(3) Displaying records of Student table
Select * from Student;
(4) Displaying records of Course table
Select * from Course;
DCL Commands
The DCL commands are used to control privileges in the database. To perform any
operation in the database, such as creating tables, sequences or views, a user needs
privileges. We will use the command GRANT.
To allow a user to create tables in the database, we can use the command below.
Grant create table to user1;
Example 1.13.2 Write the following queries in relational algebra and SQL
(i) Find the names of employees who have borrowed a book published by McGraw Hill
(ii) Find the names of employees who have borrowed all books published by McGraw-Hill
AU : May 17, Marks 10
Solution :
We will assume the database as –
member(memb_no, name, dob)
books(isbn, title, authors, publisher)
borrowed(memb_no, isbn, date)
(i) Relational Algebra :
π_name((σ_publisher='McGraw Hill'(books)) ⋈ borrowed ⋈ member)
SQL :
SELECT DISTINCT member.name
FROM member, borrowed, books
WHERE member.memb_no=borrowed.memb_no
AND books.isbn=borrowed.isbn
AND books.publisher='McGraw Hill';
(ii) Relational Algebra :
ρ(Tempname, (π_memb_no,isbn(borrowed)) ÷ (π_isbn(σ_publisher='McGraw Hill'(books))))
π_name(Tempname ⋈ member)
SQL :
SELECT DISTINCT M.name
FROM member M
WHERE NOT EXISTS
(
(SELECT isbn
FROM books
WHERE publisher = 'McGraw Hill'
)
EXCEPT
(SELECT isbn
FROM borrowed R
WHERE R.memb_no = M.memb_no
)
)
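Query (ii) is a relational division. An equivalent way to write it, which avoids EXCEPT, uses two nested NOT EXISTS conditions : a member qualifies if there is no McGraw Hill book that the member has not borrowed. The sketch below assumes Python's sqlite3 module, and the sample members, books and loans are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (memb_no INTEGER PRIMARY KEY, name TEXT, dob TEXT);
CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, authors TEXT, publisher TEXT);
CREATE TABLE borrowed (memb_no INTEGER, isbn TEXT, date TEXT);
-- hypothetical sample data
INSERT INTO member VALUES (1, 'Asha', '1990-01-01'), (2, 'Ravi', '1992-05-05');
INSERT INTO books VALUES ('b1', 'Databases', 'X', 'McGraw Hill'),
                         ('b2', 'Operating Systems', 'Y', 'McGraw Hill'),
                         ('b3', 'Networks', 'Z', 'Pearson');
INSERT INTO borrowed VALUES (1, 'b1', '2018-01-10'), (1, 'b2', '2018-02-11'),
                            (2, 'b1', '2018-03-12'), (2, 'b3', '2018-04-13');
""")

borrowed_all = [name for (name,) in conn.execute("""
    SELECT DISTINCT M.name
    FROM member M
    WHERE NOT EXISTS (              -- there is no McGraw Hill book ...
        SELECT 1 FROM books B
        WHERE B.publisher = 'McGraw Hill'
          AND NOT EXISTS (          -- ... that this member has not borrowed
              SELECT 1 FROM borrowed R
              WHERE R.memb_no = M.memb_no AND R.isbn = B.isbn))
    ORDER BY M.name
""")]

print(borrowed_all)
```

In this sample Ravi has borrowed only one of the two McGraw Hill titles, so only Asha is returned.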

Example 1.13.3 Assume the following tables.
Degree (degcode, name, subject)
Candidate (seatno, degcode, name, semester, month, year, result)
Marks (seatno, degcode, semester, month, year, papcode, marks)
[degcode – degree code, name – name of the degree (Eg. MSc.), subject – subject of the course
(Eg. Physics), papcode – paper code (Eg. A1)]
Solve the following queries using SQL;
Write a SELECT statement to display,
(i) all the degree codes which are there in the candidate table but not present in degree table in
the order of degcode.
(ii) the name of all the candidates who have got less than 40 marks in exactly 2 subjects.
(iii) the name, subject and number of candidates for all degrees in which there are less than 5
candidates.
(iv) the names of all the candidate who have got highest total marks in MSc. Maths.
AU : Dec.-15, Marks 4 + 4 + 4 + 4
Solution :
(i) SELECT DISTINCT C.degcode
FROM Candidate C
WHERE NOT EXISTS
(SELECT D.degcode
FROM Degree D
WHERE D.degcode=C.degcode)
ORDER BY C.degcode;
(ii) SELECT C.name
FROM Candidate C, Marks M
WHERE C.seatno=M.seatno AND C.degcode=M.degcode AND M.marks<40
GROUP BY C.seatno, C.name
HAVING count(DISTINCT M.papcode)=2;
(iii) SELECT D.name, D.subject, count(*)
FROM Degree D, Candidate C
WHERE D.degcode=C.degcode
GROUP BY D.degcode, D.name, D.subject
HAVING count(*)<5;
(iv) SELECT DISTINCT C.name
FROM Candidate C
WHERE C.seatno IN
(SELECT M.seatno
FROM Marks M, Degree D
WHERE M.degcode=D.degcode AND D.name='MSc' AND D.subject='Maths'
GROUP BY M.seatno
HAVING sum(M.marks) >= ALL
(SELECT sum(M2.marks)
FROM Marks M2, Degree D2
WHERE M2.degcode=D2.degcode AND D2.name='MSc' AND D2.subject='Maths'
GROUP BY M2.seatno));
Example 1.13.4 Consider a student registration database comprising of the below given table
schema.
Student File
Student Number Student Name Address Telephone
Course File
Course Number Description Hours Professor Number
Professor File
Professor Number Name Office
Registration File
Student Number Course Number Date
Consider a suitable sample of tuples / records for the above mentioned tables and
write DML statements (SQL) to answer for the queries listed below.
i) Which courses does a specific professor teach ?
ii) What courses are taught by two specific professors ?
iii) Who teaches a specific course and where is his/her office ?
iv) For a specific student number, in which courses is the student registered and what
is his/her name ?
v) Who are the professors for a specific student ?
vi) Who are the students registered in a specific course ?
AU : May 15, Marks 16

Solution :
In the queries below the column names are written without spaces (for example
ProfessorNumber). The literal values are sample values only : professor numbers 101
and 102, course number 501 and student number 1.
(i)
SELECT C.CourseNumber, C.Description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
AND P.ProfessorNumber=101
(ii)
SELECT P.Name, C.Description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
AND P.ProfessorNumber IN (101, 102)
(iii)
SELECT P.Name, P.Office, C.Description
FROM Professor P, Course C
WHERE P.ProfessorNumber=C.ProfessorNumber
AND C.CourseNumber=501
(iv)
SELECT S.StudentNumber, S.StudentName, C.Description
FROM Student S, Course C, Registration R
WHERE S.StudentNumber=R.StudentNumber AND C.CourseNumber=R.CourseNumber
AND S.StudentNumber=1
(v)
SELECT DISTINCT P.Name
FROM Course C, Professor P, Registration R
WHERE C.ProfessorNumber=P.ProfessorNumber
AND C.CourseNumber=R.CourseNumber
AND R.StudentNumber=1
(vi)
SELECT S.StudentNumber, S.StudentName
FROM Student S, Registration R
WHERE S.StudentNumber=R.StudentNumber
AND R.CourseNumber=501

University Questions
1. Explain aggregate functions in SQL with example. AU : May 18, Marks 13

2. Write DDL, DML,DCL commands for the students database. AU : Dec 17, Marks 7

3. Explain about SQL fundamentals. AU : May 16, Marks 8

4. Explain about Data Definition Language. AU : May 16, Marks 8

5. Explain the six clauses in the syntax of SQL query and show what type of constructs can be specified in
each of the six clauses. Which of the six clauses are required and which are optional.
AU : Dec 15, Marks 16
6. Explain- DDL and DML AU : Dec 14, Marks 8

1.14 Advanced SQL Features

1.14.1 Embedded SQL


• The programming module in which SQL statements are embedded is called an
embedded SQL module.
• It is possible to embed SQL statements inside a programming language such as C,
C++, PASCAL, Java and so on.
• It allows application programs to communicate with the database and get the
requested results.
• A high level language that supports embedding SQL statements within it is known as
the host language.
• An embedded SQL program must be processed by a special preprocessor prior to
compilation. The preprocessor replaces embedded SQL requests with host-language
declarations and procedure calls that allow runtime execution of the database accesses.
Then, the resulting program is compiled by the host-language compiler. This is the
main distinction between embedded SQL and JDBC or ODBC.
Example of Embedded SQL – The following program prompts the user for an order
number, retrieves the customer number, salesperson, and status of the order, and displays
the retrieved information on the screen.
#include <stdio.h>
#include <stdlib.h>

EXEC SQL INCLUDE SQLCA;

int main() {
EXEC SQL BEGIN DECLARE SECTION;
int OrderID; /* Order ID (from user) */
int CustID; /* Retrieved customer ID */
char SalesPerson[10]; /* Retrieved salesperson name */
char Status[6]; /* Retrieved order status */
EXEC SQL END DECLARE SECTION;

/* Set up error processing */
EXEC SQL WHENEVER SQLERROR GOTO query_error;
EXEC SQL WHENEVER NOT FOUND GOTO bad_number;

/* Prompt the user for order number */
printf ("Enter order number: ");
scanf("%d", &OrderID);

/* Execute the SQL query; INTO names the host variables that receive the row */
EXEC SQL SELECT CustID, SalesPerson, Status
INTO :CustID, :SalesPerson, :Status
FROM Orders
WHERE OrderID = :OrderID;

/* Display the results */
printf ("Customer number: %d\n", CustID);
printf ("Salesperson: %s\n", SalesPerson);
printf ("Status: %s\n", Status);
exit(0);

query_error:
printf ("SQL error: %ld\n", (long) sqlca.sqlcode);
exit(1);

bad_number:
printf ("Invalid order number.\n");
exit(1);
}

Features of Embedded SQL


(1) It is easy to use.
(2) It follows the ANSI/ISO SQL standard.
(3) It requires less coding
(4) The precompiler can optimize execution time by generating stored procedures for
the Embedded SQL statements.
(5) It is identical over different host languages, hence writing applications using
different programming languages is quite easy.

University Questions
1. What is the need of embedded SQL. AU : May 17, Dec 17, Marks 2

2. What is embedded SQL ? Give an example AU : Dec 16, Marks 5, May-14, Dec 14, Marks 8

1.15 Dynamic SQL AU : May-17, Dec.-17

• Dynamic SQL is a programming technique that allows SQL statements to be built
dynamically at runtime.
• Dynamic SQL statements are not embedded in the source program but are stored as
strings of characters that are manipulated during the program's runtime.
• These SQL statements are either entered by a programmer or automatically generated
by the program.
• Dynamic SQL statements may also change from one execution to the next without
manual intervention.
• Dynamic SQL lets a program generate statements automatically, which is useful for
repetitive tasks whose exact statements are not known when the program is written.
• Dynamic SQL facilitates the development of powerful applications that can create and
manipulate database objects according to user input.
• The simplest way to execute a dynamic SQL statement is with an EXECUTE
IMMEDIATE statement. This statement passes the SQL statement to the DBMS for
compilation and execution.
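Example 1.15.1 Consider the relation Student(Reg_No, name, marks, grade). Write an
embedded dynamic SQL program in C language to retrieve the records of all students whose
marks are more than 90. AU : May 17, Marks 11, Dec 17, Marks 6
Solution :
#include <stdio.h>

EXEC SQL INCLUDE SQLCA;

int main() {
/* Begin program */
EXEC SQL BEGIN DECLARE SECTION;
int Reg_No;
char name[31];
float marks;
char grade;
/* The query is held as a string, so it can be built or changed at run time */
char stmt[100] = "SELECT Reg_No, name, marks, grade FROM Student WHERE marks > 90";
EXEC SQL END DECLARE SECTION;

EXEC SQL WHENEVER SQLERROR STOP;

/* Compile the statement text at run time. A cursor is used because
   the query can return more than one row. */
EXEC SQL PREPARE stmt_id FROM :stmt;
EXEC SQL DECLARE student_cursor CURSOR FOR stmt_id;
EXEC SQL OPEN student_cursor;

for (;;) {
EXEC SQL FETCH student_cursor INTO :Reg_No, :name, :marks, :grade;
if (sqlca.sqlcode != 0) break; /* no more rows */
/* Display the results */
printf ("Registration number: %d\n", Reg_No);
printf ("Name: %s\n", name);
printf ("Marks: %f\n", marks);
printf ("Grade: %c\n", grade);
}

EXEC SQL CLOSE student_cursor;
EXEC SQL DISCONNECT;
return 0;
/* End program */
}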

1.16 Two Marks Questions with Answers


Q.1 What is Database Management System ? Why do we need a DBMS ?
AU : May 05,Dec - 08

Ans. :
• A Database Management System (DBMS) is a collection of interrelated data and various
programs that are used to handle the data.
• The primary goal of a DBMS is to provide a way to store and retrieve the required
information from the database in a convenient and efficient manner.

Q.2 What is the purpose of database management system ? AU : Dec 14

Ans. : The purpose of a database management system is to –

• Define the structure for storage of information.

• Provide mechanisms for manipulation of information.

• Ensure the safety of the information stored.

Q.3 List any two advantages of database systems AU : Dec 07

Ans. : Following are the advantages of DBMS -

1) DBMS reduces data redundancy, that is, it avoids unnecessary duplication of data in
the database.
2) DBMS allows the desired data to be retrieved in the required format.
3) Data can be isolated in separate tables for convenient and efficient use.
4) Data can be accessed efficiently using a simple query language.

Q.4 Define data abstraction AU : May 05

Ans. : Data abstraction means showing only the required information about the
system and hiding the background details.

Q.5 What are three levels of data abstraction ? AU : Dec 02, 04,May 14, Dec 17

Ans. : The three levels of data abstraction are –

1. Physical Level
2. Logical Level
3. View Level

Q.6 Is it possible for several attributes to have same domain ? Illustrate your answer
with suitable example AU : Dec 04, Dec 15

Ans. : Yes. A domain is the set of legal values that can be assigned to an attribute. Each
attribute in a database must have a well-defined domain, and we can't mix values from
different domains in the same attribute. However, nothing prevents several attributes
from drawing their values from the same domain.
For example - Consider Student(RollNo, Name, Address) and
Employee(EmpID, Ename, Salary, Address). The attributes Name and Ename both take
values from the domain of person names, and the two Address attributes share the
domain of addresses.

Q.7 Write the characteristic that distinguish the database approach with File based
approach AU : May 15, Dec 16

OR What are main differences between file processing system and a DBMS ?
AU : May 06, Dec 06

Ans. : Refer section 1.2

Q.8 Discuss briefly three major disadvantages of keeping organizational information in
a file processing system AU : Dec 04, May 16

Ans. : Refer Section 1.2

Q.9 What is data model ? AU : Dec 11


Ans. :
• It is a collection of conceptual tools for describing data, relationships among data,
semantics (meaning) of data and constraints.
• The data model is the structure that underlies the database.

Q.10 What are different types of data models ? AU : May 12

Ans. : Various types of data models are –

(1) Relational Data Model
(2) Entity Relationship Data Model
(3) Object Based Data Model
(4) Semi-structured Data Model

Q.11 Name the categories of SQL commands AU : May 12

Ans. : The categories of SQL commands are –

(1) Data Definition Language (DDL)
(2) Data Manipulation Language (DML)
(3) Data Control Language (DCL)

Q.12 What is data definition language ? Give example AU : Dec 16, May 18

Ans. :
 Data Definition Language (DDL) is a specialized language used to specify a database
schema by a set of definitions.
 It is a language which is used for creating and modifying the structures of tables,
views, indexes and so on.
 Some of the common commands used in DDL are -CREATE, ALTER, DROP.
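
The three DDL commands can be tried out with a small sketch. It uses Python's standard sqlite3 module with an in-memory database; the Student table and its columns are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE defines a new table in the schema.
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT)")

# ALTER changes the structure of an existing table.
conn.execute("ALTER TABLE Student ADD COLUMN Address TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]

# DROP removes the table definition together with its data.
conn.execute("DROP TABLE Student")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns, remaining_tables)
```

PRAGMA table_info and sqlite_master are SQLite-specific ways of inspecting the schema; other database systems expose the catalog differently.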

Q.13 Give brief description of DCL command AU : Dec 14

Ans. : DCL stands for Data Control Language. It includes commands such as GRANT
and REVOKE, which mainly deal with the rights, permissions and other controls of
the database system.

Q.14 Define the term tuple AU : Dec 05

Ans. : A tuple is a row of a table (relation).

Q.15 Why does SQL allow duplicate tuples in a table or in a query result ? AU : Dec 15

Ans. :
• Real-world data can repeat. For example, two people may have the same name. An SQL
table stores whatever rows are inserted, so the same values can appear in more than
one row of a table or of a query result.
• But we can apply primary key constraints, unique constraints or the DISTINCT keyword
to identify records uniquely or to remove duplicates from a result.
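
A short sketch of the difference between a plain query and one that uses DISTINCT, written with Python's standard sqlite3 module. The Person table and its rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(1, "Anita"), (2, "Kiran"), (3, "Anita")],  # two different people share a name
)

# A plain projection keeps duplicate values in the result.
all_names = [r[0] for r in conn.execute("SELECT Name FROM Person ORDER BY Name")]

# DISTINCT removes the duplicates from the result only; the table is unchanged.
distinct_names = [
    r[0] for r in conn.execute("SELECT DISTINCT Name FROM Person ORDER BY Name")
]

print(all_names, distinct_names)
```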

Q.16 Why key is essential? Write the different types of keys AU : Dec 04

Ans. :
• Keys are used to identify the tuples of a given relation uniquely.

• Various types of keys used in the relational model are – super key, candidate key,
primary key and foreign key.

Q.17 Define primary key. Give example. AU : May 09

Ans. :
• The primary key is a candidate key chosen by the database designer to identify the
tuples in the relation uniquely.
• For example – Consider a Student relation Student (RollNo, Name, Address). The
primary key of this relation is RollNo. The primary key is shown underlined.
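
The uniqueness that a primary key enforces can be observed with a minimal sketch in Python's standard sqlite3 module. The sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
conn.execute("INSERT INTO Student VALUES (1, 'Asha', 'Chennai')")

# A second tuple with the same RollNo violates the primary key constraint.
try:
    conn.execute("INSERT INTO Student VALUES (1, 'Ravi', 'Madurai')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(duplicate_rejected, row_count)
```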

Q.18 Define foreign key. Give example AU : May 18

Ans. :
• A foreign key is a single attribute or a collection of attributes in one table that refers
to the primary key of another table.
• For example - Consider a Student database with the relations Student (RollNo, Name,
Address) and Course (CourseId, CourseName, RollNo). Here RollNo in Course is a
foreign key that refers to the primary key RollNo of Student.

Q.19 What is the difference between primary key and foreign key ? AU : Dec 05

Ans. :
Primary Key | Foreign Key
A primary key is a column or a set of columns that can be used to uniquely identify a row in a table. | A foreign key is a column or a set of columns that refers to a primary key or a candidate key of another table.
A table can have only a single primary key. | A table can have multiple foreign keys that can reference different tables.

Q.20 What is referential integrity ? AU : May 04,08

Ans. :
 The referential integrity rule states that “whenever a foreign key value is used it
must reference a valid, existing primary key in the parent table”.
 Example : Consider the situation where you have two tables : Employees and
Managers. The Employees table has a foreign key attribute entitled ManagedBy,
which points to the record for each employee’s manager in the Managers table.
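
The Employees and Managers example can be sketched with Python's standard sqlite3 module. The column names ManagerID, EmpID and Name, and the sample rows, are assumptions; only ManagedBy comes from the example above. SQLite checks foreign keys only when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Managers (ManagerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Employees ("
    " EmpID INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " ManagedBy INTEGER REFERENCES Managers(ManagerID))"
)
conn.execute("INSERT INTO Managers VALUES (10, 'Latha')")


def add_employee(emp_id, name, managed_by):
    """Return True if the row is accepted, False if it violates a constraint."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", (emp_id, name, managed_by))
        return True
    except sqlite3.IntegrityError:
        return False


valid = add_employee(1, "Arun", 10)     # manager 10 exists in the parent table
dangling = add_employee(2, "Bala", 99)  # there is no manager 99, so the row is refused
print(valid, dangling)
```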

Q.21 What is domain integrity? Give example AU : Dec 08

Ans. : Domain integrity ensures that all the data items in a column fall within a
defined set of valid values. Each column in a table has a defined set of values, such as the
set of all five-digit numbers for zip code, or the set of all character strings for name.
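
One way to enforce such a domain is a CHECK constraint. The sketch below uses Python's standard sqlite3 module; the Customer table, the helper accepts and the sample values are assumptions made for illustration, and GLOB is SQLite's pattern-matching operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    " Name TEXT NOT NULL,"
    # The domain of Zip: strings made of exactly five digits.
    " Zip TEXT CHECK (Zip GLOB '[0-9][0-9][0-9][0-9][0-9]'))"
)


def accepts(name, zip_code):
    """Return True if the row lies inside the column domains, False otherwise."""
    try:
        conn.execute("INSERT INTO Customer VALUES (?, ?)", (name, zip_code))
        return True
    except sqlite3.IntegrityError:
        return False


results = [accepts("Asha", "60025"), accepts("Ravi", "6002"), accepts("Meena", "ABCDE")]
print(results)
```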

Q.22 What are different types of integrity constraints used in designing relational
databases ? AU : Dec 07

Ans. : Different types of integrity constraints are –

(1) Entity Integrity Constraint
(2) Referential Integrity Constraint
(3) Domain Integrity Constraint
(4) Key Integrity Constraint

Q.23 List the reasons why null value might be introduced into the database AU : May 06

Ans. : NULL is a special value provided by the database for two cases – i) when the
value of a field is unknown for some tuple (for e.g. the city name is not yet assigned) and
ii) when the field is inapplicable for that tuple (for e.g. the person has no middle name).

Q.24 List various operators used in relational algebra AU : May 06

Ans. : Various operators used in Relational algebra are –

(1) Selection Operator (σ)
(2) Projection Operator (∏)
(3) Cartesian Product (×)
(4) Rename Operator (ρ)
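
The behaviour of these four operators can be sketched in a few lines of Python, treating a relation as a list of dictionaries. The function names and the sample relations are assumptions made for illustration; this is not how a DBMS implements the operators.

```python
from itertools import product


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def cartesian_product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s.
    Assumes the two relations have no attribute names in common."""
    return [{**a, **b} for a, b in product(r, s)]


def rename(relation, mapping):
    """Rename: change attribute names according to the mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]


student = [
    {"RollNo": 1, "Name": "Asha"},
    {"RollNo": 2, "Name": "Ravi"},
    {"RollNo": 3, "Name": "Asha"},
]
course = [{"CourseId": "C1"}, {"CourseId": "C2"}]

print(select(student, lambda t: t["RollNo"] >= 2))
print(project(student, ["Name"]))
print(len(cartesian_product(student, course)))
print(rename(course, {"CourseId": "CID"}))
```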

Q.25 Describe briefly any two undesirable properties that a database design may have ?
AU : Dec 02

Ans. : The two undesirable properties that a database design may have –

(1) Repetition of data
(2) Inability to represent certain information in the database.

Q.26 Specify with suitable examples, the different types of keys used in database
management system. AU : Dec 02

Ans. : Refer section 1.9

Q.27 Define data independence. AU : May 08

Ans. : Data independence is the ability to change the schema at one level without
affecting the schema at the next higher level. Here level can be physical, conceptual or
external.

Q.28 Distinguish between Physical and logical data independence AU : May 03

Ans. : Refer Section 1.6

Q.29 What is meant by instance and Schema of the database AU : May 04, Dec 05

Ans. :

• When information is inserted into or deleted from the database, the database changes.
The collection of information stored in the database at a particular moment is called
an instance of the database.
• The overall design of the database is called the schema.

Q.30 Differentiate between Dynamic SQL and Static SQL
AU : Dec 14, May 15, Dec 15, Dec 16, Dec 17

Ans. :
Sr.No. | Static SQL | Dynamic SQL
1 | SQL statements are compiled at compile time. | SQL statements are compiled at run time.
2 | It is more efficient. | It is less efficient.
3 | It is less flexible. | It is more flexible.
4 | It is used in situations where data is distributed uniformly. | It is used in situations where data is distributed non-uniformly.


UNIT - II

Syllabus
Entity-Relationship model - E-R Diagrams - Enhanced-ER Model - ER-to-Relational Mapping -
Functional Dependencies - Non-loss Decomposition - First, Second, Third Normal Forms,
Dependency Preservation - Boyce/Codd Normal Form - Multi-valued Dependencies and Fourth
Normal Form - Join Dependencies and Fifth Normal Form.

Contents
2.1 Introduction to Entity Relationship Model
2.2 Mapping Cardinality
2.3 ER Diagrams
2.4 Enhanced ER Model
2.5 Examples based on ER Diagram
2.6 ER to Relational Mapping AU : May-17, Marks 13
2.7 Concept of Relational Database Design
2.8 Functional Dependencies
2.9 Concept of Redundancy and Anomalies
2.10 Decomposition AU : Dec.-17, Marks 7
2.11 Normal Forms AU : Dec.-14, 15, May-18, Marks 16
2.12 Boyce / Codd Normal Form (BCNF)
2.13 Multivalued Dependencies and Fourth Normal Form AU : May-14, Dec.-16, Marks 16
2.14 Join Dependencies and Fifth Normal Form
2.15 Two Marks Questions with Answers

Database Management Systems 2-2 Database Design

Part I Entity Relationship Model

2.1 Introduction to Entity Relationship Model


The Entity Relationship (ER) model is a model for identifying the entities to be
represented in the database and for representing how those entities are related.
Let us first understand the database design process.

2.1.1 Design Phases


Following are the six steps of the database design process. The ER model is most relevant
to the first three steps.

Fig. 2.1.1 : Database design process

Step 1 : Requirement analysis :

• In this step, it is necessary to understand what data needs to be stored in the
database, what applications must be built, and which operations are used most
frequently by the system.
• Requirement analysis is an informal process and it requires proper
communication with user groups.
• There are several methods for organizing and presenting the information gathered in
this step.
• Some automated tools can also be used for this purpose.

Step 2 : Conceptual database design :

• This is the step in which the E-R (Entity Relationship) model is built.
• The E-R model is a high level data model used in database design.
• The goal of this design is to create a simple description of the data that matches
the requirements of users.

Step 3 : Logical database design :

• This is the step in which the ER model is converted to a relational database schema,
sometimes called the logical schema in the relational data model.

Step 4 : Schema refinement :


 In this step, relational database schema is analyzed to identify the potential
problems and to refine it.
 The schema refinement can be done with the help of normalizing and
restructuring the relations.

Step 5 : Physical database design :


 In this step, the design of database is refined further.
 The tasks that are performed in this step are - building indexes on tables and
clustering tables, redesigning some parts of schema obtained from earlier design
steps.

Step 6 : Application and security design :

• The design can be accomplished using design methodologies like UML (Unified
Modeling Language).
• The role of each entity in every process must be reflected in the application task.
• For each role, there must be a provision for accessing some parts of the database
and a prohibition on accessing other parts of the database.
• Thus some access rules must be enforced on the application (which is accessing
the database) to protect the security features.

2.1.2 ER Model
The ER data model specifies an enterprise schema that represents the overall logical
structure of a database.
The E-R model is very useful in mapping the meanings and interactions of real-world
entities onto a conceptual schema.
The ER model consists of three basic concepts –

1) Entity Sets
• Entity : An entity is an object that exists and is distinguishable from other objects.
For example - Student named “Poonam” is an entity and can be identified by her
name. An entity can be concrete or abstract. A concrete entity can be a Person,
Book or Bank. An abstract entity can be a holiday or a concept. An entity is
represented as a box, for example Student, Employee, Department.
• Entity set : An entity set is a set of entities of the same type. For example - All
students studying in class X of the School. Entity sets need not be disjoint. Each
entity in an entity set has the same set of attributes, and this set of attributes
distinguishes the entity set from other entity sets. No other entity set will have
exactly the same set of attributes.

2) Relationship Sets
Relationship is an association among two or more entities.
The relationship set is a collection of similar relationships. For example - Following
Fig. 2.1.2 shows the relationship works_for between the two entity sets Employee and
Departments.

Fig. 2.1.2 : Relationship set

The association between entity sets is called participation. That is, the entity sets E1,
E2, . . . , En participate in relationship set R.
The function that an entity plays in a relationship is called that entity’s role.

3) Attributes
Attributes define the properties of an entity. For example, if Student is an
entity, then ID, name, address, date of birth and class are its attributes. The attributes
help in identifying a particular entity uniquely. Refer Fig. 2.1.3 for Student entity set
with attributes - ID, name, address. Note that an entity is shown by a rectangular box and
attributes are shown in ovals. The primary key is underlined.

Fig. 2.1.3 : Student entity set with attributes

Types of Attributes

1) Simple and Composite Attributes :

i) Simple attributes : Attributes that are drawn from atomic value domains.
For example - Name = {Parth} ; Age = {23}
ii) Composite attributes : Attributes that consist of a hierarchy of attributes.
For example - Address may consist of “Number”, “Street” and “Suburb”
→ Address = {59 + ‘JM Road’ + ‘ShivajiNagar’}

2) Single valued and multivalued :

• Single valued attributes : Attributes that hold a single value for each entity. For
example - The StudentID attribute holds exactly one value for a given Student.
• Multivalued attributes : Attributes that have a set of values for each entity. A
multivalued attribute is represented by concentric ovals.
For example - Degrees of a person : ‘BSc’, ‘MTech’, ‘PhD’

3) Derived attribute :
Derived attributes are attributes whose values are calculated from other
attributes. A derived attribute is represented by a dotted ellipse inside the solid ellipse.
For example – Age can be derived from the attribute DateOfBirth. In this situation,
DateOfBirth is called a stored attribute.
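
Deriving Age from the stored attribute DateOfBirth can be sketched as a small Python function. The function name age_on and the sample dates are assumptions made for illustration.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive the age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the current year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(2000, 6, 15)
print(age_on(dob, date(2023, 6, 14)))  # the day before the birthday
print(age_on(dob, date(2023, 6, 15)))  # on the birthday
```

Because the value can always be recomputed from DateOfBirth, Age does not need to be stored in the database.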

Fig. 2.1.4

2.2 Mapping Cardinality


Mapping cardinality represents the number of entities to which another entity can be
associated via a relationship set.
The mapping cardinalities are used in representing binary relationship sets. For a
binary relationship set between entity sets A and B, the types of mapping cardinalities are -
1) One to One : An entity in A is associated with at most one entity in B, and an entity
in B is associated with at most one entity in A.

2) One to Many : An entity in A is associated with any number of entities in B. An
entity in B, however, can be associated with at most one entity in A.

3) Many to One : An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number of entities in A.

4) Many to many : An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more) of
entities in A.
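
The four cases can be told apart mechanically. The sketch below, in Python, classifies one instance of a binary relationship set given as (a, b) pairs; the function name and the sample pairs are assumptions. It inspects only the data it is given, whereas a mapping cardinality in a design is a constraint on every possible instance.

```python
def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    # Does some entity of A relate to several entities of B, and vice versa?
    a_has_many = any(len(partners) > 1 for partners in a_to_b.values())
    b_has_many = any(len(partners) > 1 for partners in b_to_a.values())
    if not a_has_many and not b_has_many:
        return "one to one"
    if a_has_many and not b_has_many:
        return "one to many"
    if not a_has_many and b_has_many:
        return "many to one"
    return "many to many"


print(mapping_cardinality([("manager1", "project1"), ("manager2", "project2")]))
print(mapping_cardinality([("customer1", "order1"), ("customer1", "order2")]))
print(mapping_cardinality([("student1", "course1"), ("student2", "course1")]))
print(mapping_cardinality([("t1", "s1"), ("t1", "s2"), ("t2", "s1")]))
```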

2.3 ER Diagrams
An E-R diagram can express the overall logical structure of a database graphically. E-R
diagrams are used to model real-world objects like a person, a car, a company and the
relationships between these real-world objects.

Features of ER model
i) E-R diagrams represent the E-R model of a database in a form that is easy to
convert into relations (tables).
ii) E-R diagrams model real-world objects directly, which makes them very useful.
iii) E-R diagrams require no technical knowledge and no hardware support.
iv) These diagrams are very easy to understand and easy to create even by a naive user.
v) They give a standard way of visualizing the data logically.

Various components used in ER model are -

Entity : Any real-world object about which data can be stored in a database can be
represented as an entity. Real-world objects like a book, an organization, a product,
a car and a person are examples of entities. An entity is drawn as a rectangle.

Relationship : A rhombus is used to set up relationships between two or more entities.

Attribute : Each entity has a set of properties. These properties of each entity are
termed attributes. For example, a car entity would be described by attributes such as
price, registration number, model number, color etc. An attribute is drawn as an oval.

Derived attribute : Derived attributes are those which are derived based on other
attributes, for example, age can be derived from date of birth. To represent a derived
attribute, another dotted ellipse is created inside the main ellipse.

Multivalued attribute : An attribute that can hold multiple values is known as a
multivalued attribute. We represent it with double ellipses in an E-R diagram. E.g. a
person can have more than one phone number, so the phone number attribute is
multivalued.

Total participation : Each entity is involved in the relationship. Total participation
is represented by double lines.

2.3.1 Mapping Cardinality Representation using ER Diagram


There are four types of relationships that are considered for key constraints.
i) One to one : When an entity in A is associated with at most one entity in B, and
vice versa, the relation is one to one. For example - There is one project manager who
manages only one project.

ii) One to many : When an entity in A is associated with more than one entity in B at
a time, the relation is one to many. For example - One customer places many orders.

iii) Many to one : When more than one entity is associated with only one entity, the
relation is many to one. For example - Many students take one ComputerSciCourse.

iv) Many to many : When more than one entity is associated with more than one
entity, the relation is many to many. For example - Many teachers can teach many
students.

2.3.2 Ternary Relationship


The relationship in which three entities are involved is called a ternary relationship.
For example -

2.3.3 Binary and Ternary Relationships


 Although binary relationships seem natural to most of us, in reality it is
sometimes necessary to connect three or more entities. If a relationship connects
three entities, it is called ternary or "3-ary."
 Ternary relationships are required when binary relationships are not sufficient to
accurately describe the semantics of an association among three entities.
 For example - Suppose, you have a database for a company that contains the
entities, PRODUCT, SUPPLIER, and CUSTOMER. The usual relationships might
be PRODUCT/ SUPPLIER where the company buys products from a supplier - a
normal binary relationship. The intersection attribute for PRODUCT/SUPPLIER is
wholesale_price
Fig. 2.3.1 : A binary relationship of PRODUCT and SUPPLIER and an intersection attribute, wholesale_price

 Now consider the CUSTOMER entity, and that the customer buys products. If all
customers pay the same price for a product, regardless of supplier, then you have
a simple binary relationship between CUSTOMER and PRODUCT. For the
CUSTOMER/ PRODUCT relationship, the intersection attribute is retail_price.

Fig. 2.3.2 : A binary relationship of PRODUCT and CUSTOMER and an intersection attribute, retail_price

 Single ternary relation : Now consider a different scenario. Suppose the customer
buys products but the price depends not only on the product, but also on the
supplier. Suppose you needed a customerID, a productID, and a supplierID to
identify a price. Now you have an attribute that depends on three things and
hence you have a relationship between three entities (a ternary relationship) that
will have the intersection attribute, price.

Fig. 2.3.3 : Ternary relation
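
The ternary relationship can be sketched as a single table whose key combines the three identifiers. The example uses Python's standard sqlite3 module; the table name Price and the sample rows are assumptions, while customerID, productID, supplierID and price come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Price ("
    " customerID TEXT, productID TEXT, supplierID TEXT, price REAL,"
    # The intersection attribute price depends on all three identifiers together.
    " PRIMARY KEY (customerID, productID, supplierID))"
)
conn.executemany(
    "INSERT INTO Price VALUES (?, ?, ?, ?)",
    [("C1", "P1", "S1", 100.0), ("C1", "P1", "S2", 95.0)],
)

# The same customer and product have a different price for each supplier.
prices = conn.execute(
    "SELECT supplierID, price FROM Price"
    " WHERE customerID = 'C1' AND productID = 'P1' ORDER BY supplierID"
).fetchall()

# A second price for the same (customer, product, supplier) triple is refused.
try:
    conn.execute("INSERT INTO Price VALUES ('C1', 'P1', 'S1', 120.0)")
    repeated_triple_rejected = False
except sqlite3.IntegrityError:
    repeated_triple_rejected = True

print(prices, repeated_triple_rejected)
```

Two separate binary tables could not hold this information, because neither (customer, product) nor (product, supplier) alone determines the price.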

2.3.4 Weak Entity Set


 A weak entity is an entity that cannot be uniquely identified by its attributes
alone. The entity set which does not have sufficient attributes to form a primary
key is called as weak entity set.

Fig. 2.3.4 : Weak entity set

• Strong entity set : The entity set that has a primary key is called a strong entity set.

Weak entity rules


 A weak entity set has one or more many-one relationships to other (supporting)
entity sets.
 The key for a weak entity set is its own underlined attributes and the keys for the
supporting entity sets. For example - player-number and team-name is a key for
Players.
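
The Players example can be sketched with Python's standard sqlite3 module. The Teams table, the column player_name, the helper add_player and the sample rows are assumptions; the key (team-name, player-number) is taken from the rule above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Teams (team_name TEXT PRIMARY KEY)")  # supporting entity set
conn.execute(
    "CREATE TABLE Players ("
    " team_name TEXT NOT NULL REFERENCES Teams(team_name),"
    " player_number INTEGER NOT NULL,"  # partial key: unique only within a team
    " player_name TEXT,"
    " PRIMARY KEY (team_name, player_number))"
)
conn.executemany("INSERT INTO Teams VALUES (?)", [("Lions",), ("Tigers",)])


def add_player(team, number, name):
    """Return True if the player row is accepted, False otherwise."""
    try:
        conn.execute("INSERT INTO Players VALUES (?, ?, ?)", (team, number, name))
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    add_player("Lions", 7, "Arun"),   # accepted
    add_player("Tigers", 7, "Bala"),  # same number on another team: accepted
    add_player("Lions", 7, "Chand"),  # same team and number: rejected
    add_player("Eagles", 1, "Dev"),   # no supporting team: rejected
]
print(outcomes)
```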

Difference between Strong and Weak Entity Set


Sr. No. | Strong entity set | Weak entity set
1 | It has its own primary key. | It does not have sufficient attributes to form a primary key on its own.
2 | It is represented by a rectangle. | It is represented by a double rectangle.
3 | It has a primary key, which is underlined. | It has a partial key or discriminator, which is shown with a dashed underline.
4 | The member of a strong entity set is called the dominant entity set. | The member of a weak entity set is called the subordinate entity set.
5 | The relationship between two strong entity sets is represented by a diamond symbol. | The relationship between a strong entity set and a weak entity set is represented by a double diamond symbol.
6 | The primary key is one of the attributes which uniquely identifies its member. | The primary key of a weak entity set is a combination of the partial key and the primary key of the strong entity set.

2.4 Enhanced ER Model

2.4.1 Specialization and Generalization

• Some entities have relationships that form hierarchies. For instance, an employee can
be an hourly employee or a contracted employee.
• In such hierarchies, some entities act as superclasses and other entities act as
subclasses.
• Superclass : An entity type that represents a general concept at a high level is
called a superclass.
• Subclass : An entity type that represents a specific concept at a lower level is
called a subclass.
• The subclass is said to inherit from the superclass. When a subclass inherits from one
or more superclasses, it inherits all their attributes. In addition to the inherited
attributes, a subclass can also define its own specific attributes.
• The process of making subclasses from a general concept is called specialization.
This is a top-down process. In this process, sub-groups are identified within an
entity set which have attributes that are not shared by all entities.
• The process of making a superclass from subclasses is called generalization. This is
a bottom-up process. In this process multiple entity sets are synthesized into a
higher level entity set.
• The symbol used for specialization/generalization is shown in Fig. 2.4.1.
• For example – There can be two subclass entity sets namely Hourly_Emps and
Contract_Emps which are subclasses of the Employees entity set. We might have
attributes hours_worked and hourly_wage defined for Hourly_Emps and an attribute
contractid defined for Contract_Emps.
Therefore, the attributes defined for an Hourly_Emps entity are the attributes of
Employees plus those of Hourly_Emps. We say that the attributes of the entity set
Employees are inherited by the entity set Hourly_Emps and that Hourly_Emps
ISA (read is a) Employees. It can be represented by following Fig. 2.4.1.

Fig. 2.4.1
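
Attribute inheritance in an ISA hierarchy behaves much like class inheritance in a programming language. The following Python sketch mirrors the example; the Employees attributes ssn and name are assumptions, since the attributes of the superclass are not listed in the text, and the class names drop the underscores used in the entity set names.

```python
from dataclasses import dataclass, fields


@dataclass
class Employees:          # superclass: the general concept
    ssn: str              # assumed attribute
    name: str             # assumed attribute


@dataclass
class HourlyEmps(Employees):    # Hourly_Emps ISA Employees
    hours_worked: float
    hourly_wage: float


@dataclass
class ContractEmps(Employees):  # Contract_Emps ISA Employees
    contractid: str


hourly_attributes = [f.name for f in fields(HourlyEmps)]
worker = HourlyEmps(ssn="123", name="Asha", hours_worked=40.0, hourly_wage=250.0)

print(hourly_attributes)
print(isinstance(worker, Employees))
```

The attribute list of HourlyEmps is the inherited attributes followed by its own, which matches the statement that an Hourly_Emps entity has the attributes of Employees plus those of Hourly_Emps.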

2.4.2 Constraints on Specialization/Generalization


There are four types of constraints on specialization/generalization relationship. These
are -
1) Membership constraints : These constraints determine which entities can be
members of a given lower-level entity set. There are two types of membership
constraints -
i) Condition defined : In condition-defined lower-level entity sets, membership
is evaluated on the basis of whether or not an entity satisfies an explicit
condition or predicate. For example - Consider the higher-level entity set
Employee that has the attribute Employee_type. All Employee entities are
evaluated on the defining attribute Employee_type. All entities that satisfy the
condition Employee_type = “ContractEmployee” are included in Contracted
Employee. Since all the lower-level entities are evaluated on the basis of the
same attribute, this type of generalization is said to be attribute-defined.
ii) User defined : In user-defined lower-level entity sets, membership is
assigned manually by the database user.
2) Disjoint constraints : The disjoint constraint only applies when a superclass has
more than one subclass. If the subclasses are disjoint, then an entity occurrence can
be a member of only one of the subclasses. For example - A Student entity is either a
Postgraduate_Student or an Undergraduate_Student.

3) Overlapping : An entity can be a member of more than one subclass. For
example - A Person can be both a Student and a Staff. The keyword And can be used to
represent this constraint.

4) Completeness : It specifies whether or not an entity in the higher-level entity set
must belong to at least one of the lower-level entity sets within the
generalization/specialization. This constraint may be one of the following -
i) Total generalization or specialization : Each higher-level entity must belong
to a lower-level entity set. For example - An Account in the bank must be either a
Savings account or a Current Account. The keyword Mandatory can be used to
represent this constraint.

ii) Partial generalization or specialization : Some higher-level entities may not
belong to any lower-level entity set.

2.4.3 Aggregation
Aggregation is a feature of the entity relationship model that allows a relationship set
to participate in another relationship set. This is indicated on an ER diagram by drawing
a dashed box around the aggregation.
For example - We treat the relationship set work, together with the entity sets employee
and project, as a higher-level entity set called work.

Fig. 2.4.2 : ER model with aggregation

2.5 Examples based on ER Diagram


Example 2.5.1 Draw the ER diagram for banking systems (home loan applications).
AU : Dec.-17, Marks 8
OR Draw an ER diagram corresponding to customers and loans. AU : May.-14, Marks 8

OR Write short notes on : E-R diagram for banking system . AU : Dec.-14, Marks 8

Solution :

Example 2.5.2 Consider the relation schema given below. Design and draw an ER
diagram that captures the information of this schema. AU : May-17, Marks 5

Employee(empno,name,office,age)
Books(isbn,title,authors,publisher)
Loan(empno,isbn,date)

Solution :

Example 2.5.3 Construct an E-R diagram for a car insurance company whose customers own
one or more cars each. Each car has associated with it zero to any number of recorded
accidents. Each insurance policy covers one or more cars and has one or more premium
payments associated with it. Each payment is for a particular period of time and has an
associated due date and the date when the payment was received. AU : Dec.-16, Marks 7

Solution :

Example 2.5.4 A car rental company maintains a database for all vehicles in its current fleet.
For all vehicles, it includes the vehicle identification number, license number, manufacturer,
model, date of purchase and color. Special data are included for certain types of vehicles.
Trucks : cargo capacity
Sports cars : horsepower, renter age requirement
Vans : number of passengers
Off-road vehicles : ground clearance, drivetrain (four-or two-wheel drive)
Construct an ER model for the car rental company database. AU : Dec.-15, Marks 16

Solution :

Example 2.5.5 Draw E-R diagram for the "Restaurant Menu Ordering System", which will
facilitate the food items ordering and services within a restaurant. The entire restaurant
scenario is detailed as follows. The customer is able to view the food items menu, call the
waiter, place orders and obtain the final bill through the computer kept in their table. The
Waiters through their wireless tablet PC are able to initialize a table for customers, control
the table functions to assist customers, orders, send orders to food preparation staff (chef)
and finalize the customer's bill. The Food preparation staffs (chefs), with their touch-display
interfaces to the system, are able to view orders sent to the kitchen by waiters. During
preparation they are able to let the waiter know the status of each item, and can send
notifications when items are completed. The system should have full accountability and
logging facilities, and should support supervisor actions to account for exceptional
circumstances, such as a meal being refunded or walked out on. AU : May-15, Marks 16

Solution :

Example 2.5.6 A university registrar’s office maintains data about the following entities :
(1) courses, including number, title, credits, syllabus, and prerequisites;
(2) course offerings, including course number, year, semester, section number,
instructor(s), timings, and classroom;
(3) students, including student-id, name, and program; and
(4) instructors, including identification number, name, department, and title.
Further, the enrollment of students in courses and grades awarded to students in each
course they are enrolled for must be appropriately modeled. Construct an E-R diagram for
the registrar’s office. Document all assumptions that you make about the mapping
constraints.
AU : Dec.-13, Marks 10

Solution :

Example 2.5.7 What is aggregation in ER model ? Develop an ER diagram using


aggregation that captures following information : Employees work for projects. An
employee working for particular project uses various machinery. Assume necessary
attributes. State any assumptions you make. Also discuss about the ER diagram you have
designed. AU : Dec.-11, Marks 8

Solution : Aggregation : Refer section 2.4.3.


ER Diagram : The ER diagram for above described scenario can be drawn as follows -

The above ER model contains redundant information, because every (Employee,
Project, Machinery) combination in the works_on relationship also appears in the manages
relationship. To avoid this redundancy we can make use of aggregation
in the ER diagram as follows -

We can then create a binary relationship manages between Manager and the aggregate
(Employee, Project, Machinery).
Example 2.5.8 Construct an E-R diagram for a hospital with a set of patients and a set of
medical doctors. Associate with each patient a log of the various tests and examinations
conducted. AU : Dec.-07, Marks 8

Solution :

2.6 ER to Relational Mapping AU : May-17, Marks 13

In this section we will discuss how to map the various ER model constructs to relational
model constructs.

2.6.1 Mapping of Entity Set to Relation

• An entity set is mapped to a relation in a straightforward way.
• Each attribute of the entity set becomes an attribute (column) of the table.
• The primary key attribute of the entity set becomes the primary key of the table.
• For example - Consider following ER diagram.

The converted employee table is as follows -

EmpID EName Salary

201 Poonam 30000

202 Ashwini 35000

203 Sharda 40000

The following SQL statement captures the information in the above ER diagram -

CREATE TABLE Employee (EmpID CHAR(11),
                       EName CHAR(30),
                       Salary INTEGER,
                       PRIMARY KEY (EmpID))
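
The mapping can be tried out with a short script. The following is only a minimal sketch : it assumes Python's built-in sqlite3 module and an in-memory database (neither is part of the original example), creates the Employee table, loads the three sample rows and checks that the primary key rejects a repeated EmpID.

```python
import sqlite3

# In-memory database, used here only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpID  CHAR(11),
        EName  CHAR(30),
        Salary INTEGER,
        PRIMARY KEY (EmpID)
    )
""")

# The three sample rows of the converted employee table
rows = [("201", "Poonam", 30000), ("202", "Ashwini", 35000), ("203", "Sharda", 40000)]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", rows)

# A second row with an existing EmpID violates the primary key
try:
    conn.execute("INSERT INTO Employee VALUES ('201', 'Duplicate', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(employee_count, duplicate_rejected)
```

The table still holds exactly three rows after the failed insert, because the key constraint is enforced by the database and not by the application.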

2.6.2 Mapping Relationship Sets (without Constraints) to Tables

• Create a table for the relationship set.
• Add the primary keys of all participating entity sets as fields of the table.
• Add a field for each attribute of the relationship.
• Declare a primary key using all key fields from the entity sets.
• Declare foreign key constraints for all these fields from the entity sets.
For example - Consider following ER model

The following SQL statement captures the relationship present in the above ER
diagram. Only the key fields of the participating entity sets appear in Works_In. Descriptive
attributes such as EName, Salary, DeptName and Building belong to the entity sets and
therefore remain in the Employee and Department tables -

CREATE TABLE Works_In (EmpID CHAR(11),
                       DeptID CHAR(11),
                       PRIMARY KEY (EmpID, DeptID),
                       FOREIGN KEY (EmpID) REFERENCES Employee,
                       FOREIGN KEY (DeptID) REFERENCES Department
                       )
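
The effect of the foreign key constraints can be seen with a small experiment. This is a hedged sketch, assuming Python's sqlite3 module and a Works_In table that holds only the two key fields; the sample department row ('D1', 'Accounts', 'B1') is made up for illustration. SQLite checks foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless enabled
conn.executescript("""
    CREATE TABLE Employee   (EmpID CHAR(11) PRIMARY KEY, EName CHAR(30), Salary INTEGER);
    CREATE TABLE Department (DeptID CHAR(11) PRIMARY KEY, DeptName CHAR(20), Building CHAR(10));
    CREATE TABLE Works_In (
        EmpID  CHAR(11),
        DeptID CHAR(11),
        PRIMARY KEY (EmpID, DeptID),
        FOREIGN KEY (EmpID)  REFERENCES Employee (EmpID),
        FOREIGN KEY (DeptID) REFERENCES Department (DeptID)
    );
    INSERT INTO Employee   VALUES ('201', 'Poonam', 30000);
    INSERT INTO Department VALUES ('D1', 'Accounts', 'B1');  -- hypothetical sample row
    INSERT INTO Works_In   VALUES ('201', 'D1');
""")

# Employee '999' does not exist, so this relationship tuple must be rejected
try:
    conn.execute("INSERT INTO Works_In VALUES ('999', 'D1')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

works_in_rows = conn.execute("SELECT EmpID, DeptID FROM Works_In").fetchall()
print(works_in_rows, dangling_rejected)
```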

2.6.3 Mapping Relationship Sets (with Constraints) to Tables

• If a relationship set involves n entity sets and some m of them are linked via
arrows in the ER diagram, the key for any one of these m entity sets constitutes a
key for the relation to which the relationship set is mapped.
• Hence we have m candidate keys, and one of these should be designated as the
primary key.
• There are two approaches used to convert a relationship set with key constraints
into a table.
• Approach 1 :

o In this approach the relationship set is represented by a separate table of
its own. For example - Consider following ER
diagram. Each Dept has at most one manager, according to the key
constraint on Manages.

Here the constraint is that each department has at most one manager to manage it.
Hence no two tuples can have the same DeptID, and there can be a separate table
named Manages with DeptID as primary key. The table can be defined using the
following SQL statement -

CREATE TABLE Manages (EmpID CHAR(11),
                      DeptID INTEGER,
                      Since DATE,
                      PRIMARY KEY (DeptID),
                      FOREIGN KEY (EmpID) REFERENCES Employees,
                      FOREIGN KEY (DeptID) REFERENCES Departments)

• Approach 2 :

o In this approach no separate table is created for a relationship set with key
constraints.
o It is usually the preferred approach because it avoids creating a distinct table for the
relationship set.
o The idea is to include the information about the relationship set in the
table corresponding to the entity set with the key, taking advantage of the
key constraint.
o This approach eliminates the need for a separate Manages relation, and
queries asking for a department's manager can be answered without
combining information from two relations.
o The only drawback to this approach is that space could be wasted if
several departments have no managers, because EmpID and since are then null.
o The following SQL statement, defining a Dep_Mgr relation that captures
the information in both Departments and Manages, illustrates the second
approach to translating relationship sets with key constraints :

CREATE TABLE Dep_Mgr (DeptID INTEGER,
                      DName CHAR(20),
                      Budget REAL,
                      EmpID CHAR(11),
                      since DATE,
                      PRIMARY KEY (DeptID),
                      FOREIGN KEY (EmpID) REFERENCES Employees)

2.6.4 Mapping Weak Entity Sets to Relational Mapping


A weak entity can be identified uniquely only by considering the primary key of
another (owner) entity. The following steps are used for mapping a weak entity set to
a relation -
• Create a table for the weak entity set.
• Make each attribute of the weak entity set a field of the table.
• Add fields for the primary key attributes of the identifying owner.
• Declare a foreign key constraint on these identifying owner fields.
• Instruct the system to automatically delete any tuples in the table for which there
are no owners.
For example - Consider following ER model

The following SQL statement illustrates this mapping -

CREATE TABLE Department (DeptID CHAR(11),
                         DeptName CHAR(20),
                         Bldg_No CHAR(5),
                         PRIMARY KEY (DeptID, Bldg_No),
                         FOREIGN KEY (Bldg_No) REFERENCES Buildings ON DELETE CASCADE
                         )
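
The ON DELETE CASCADE clause is what implements the last step (deleting tuples that have no owner). A minimal sketch of this behaviour is given below, assuming Python's sqlite3 module; the Buildings table layout (Bldg_No, Bldg_Name) and all sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for cascades to fire
conn.executescript("""
    CREATE TABLE Buildings (Bldg_No CHAR(5) PRIMARY KEY, Bldg_Name CHAR(20));
    CREATE TABLE Department (
        DeptID   CHAR(11),
        DeptName CHAR(20),
        Bldg_No  CHAR(5),
        PRIMARY KEY (DeptID, Bldg_No),
        FOREIGN KEY (Bldg_No) REFERENCES Buildings (Bldg_No) ON DELETE CASCADE
    );
    INSERT INTO Buildings  VALUES ('B1', 'Main'), ('B2', 'Annex');
    INSERT INTO Department VALUES ('D1', 'Accounts', 'B1'),
                                  ('D2', 'Sales',    'B1'),
                                  ('D1', 'Stores',   'B2');
""")

# Deleting the owner building also deletes the departments that depend on it
conn.execute("DELETE FROM Buildings WHERE Bldg_No = 'B1'")
remaining = conn.execute("SELECT DeptID, DeptName, Bldg_No FROM Department").fetchall()
print(remaining)
```

Note that DeptID 'D1' appears under two buildings. This is legal because the key of the weak entity is the combination (DeptID, Bldg_No), not DeptID alone.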

2.6.5 Mapping of Specialization / Generalization (EER Construct) to Relational Mapping

The specialization / generalization relationship (Enhanced ER construct) can be
mapped to database tables (relations) using three methods. To demonstrate the methods,
we will take the entity sets InventoryItem, Book and DVD, where InventoryItem is the
superclass and Book and DVD are its subclasses.

Method 1 : All the entities in the relationship are mapped to individual tables

InventoryItem(ID , name)
Book(ID,Publisher)
DVD(ID, Manufacturer)

Method 2 : Only subclasses are mapped to tables. The attributes in the superclass
are duplicated in all subclasses. For example -

Book(ID,name,Publisher)
DVD(ID, name,Manufacturer)

Method 3 : Only the superclass is mapped to a table. The attributes in the subclasses
are taken to the superclass. For example -

InventoryItem(ID , name,Publisher,Manufacturer)

This method will introduce null values. When we insert a Book record in the table, the
Manufacturer column value will be null. In the same way, when we insert a DVD record
in the table, the Publisher value will be null.

Example 2.6.1 Construct an E-R diagram for a hospital with a set of patients and a set of
medical doctors. Associate with each patient a log of the various tests and examinations
conducted. Also construct appropriate tables for the ER diagram you have drawn.
Solution :
ER Diagram - Refer example 2.5.8.
Relational Mapping

patients (P_id, name, insurance, date-admitted, date-checked-out)
doctors (Dr_id, name, specialization)
test (testid, testname, date, time, result)
doctor-patient (P_id, Dr_id)
test-log (testid, P_id)
performed-by (testid, Dr_id)

University Question
1. Discuss the correspondence between the ER model construct and the relational model constructs.
Show how each ER model construct can be mapped to the relational model. Discuss the option for
mapping EER construct. AU : May-17, Marks 13

Part II Relational Database Design

2.7 Concept of Relational Database Design


• There are two primary goals of relational database design – i) to generate a set of
relation schemas that allows us to store information without unnecessary
redundancy, and ii) to allow us to retrieve information easily.
• To achieve these goals, the database design needs to be normalized. That means
we have to check whether the schema is in normal form or not.
• To check the normal form of a schema, it is necessary to examine the functional
dependencies and other data dependencies that exist within the schema.
Hence, before discussing what normalization means, it is necessary to
understand the concept of functional dependencies.
2.8 Functional Dependencies
Definition : Let P and Q be sets of columns. P functionally determines Q,
written P → Q, if and only if any two rows that are equal on (all the attributes in) P must
be equal on (all the attributes in) Q.
In other words, the functional dependency holds if, for every pair of tuples T1 and T2,
T1.P = T2.P implies T1.Q = T2.Q
where the notation T1.P denotes the projection of the tuple T1 onto the attributes in P.


For example : Consider a relation in which the roll number (R) of the student and
his/her name (N) are stored as follows :

R N
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE

Fig. 2.8.1 : Table which holds functional dependency i.e. R->N

Here, R->N is true, i.e. the functional dependency holds, because
every roll number is associated with exactly one name. For instance : The
name of the student whose roll number is 1 is AAA. But if we get two different names for the
same roll number, then the table does not satisfy the functional dependency.
Following is such a table -

R N
1 AAA
2 BBB
3 CCC
1 XXX
2 YYY

Fig. 2.8.2 : Table which does not hold functional dependency

In the above table, for roll number 1 we get two different names - “AAA” and
“XXX” (and likewise “BBB” and “YYY” for roll number 2). Hence the functional
dependency R->N does not hold here.
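
The check described above can be written down directly. The sketch below is illustrative only (the function name fd_holds and the row representation as Python dictionaries are assumptions, not part of the text). It tests whether a given table instance violates a dependency. Keep in mind that a single instance can disprove a functional dependency, but it cannot prove that the dependency holds for every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True

# Rows of Fig. 2.8.1 and Fig. 2.8.2
fig_2_8_1 = [{"R": r, "N": n} for r, n in
             [(1, "AAA"), (2, "BBB"), (3, "CCC"), (4, "DDD"), (5, "EEE")]]
fig_2_8_2 = [{"R": r, "N": n} for r, n in
             [(1, "AAA"), (2, "BBB"), (3, "CCC"), (1, "XXX"), (2, "YYY")]]

print(fd_holds(fig_2_8_1, ["R"], ["N"]))  # R->N is satisfied by Fig. 2.8.1
print(fd_holds(fig_2_8_2, ["R"], ["N"]))  # R->N is violated by Fig. 2.8.2
```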

2.8.1 Computing Closure Set of Functional Dependency

The closure set is the set of all functional dependencies implied by a given set F. It is
denoted by F+.
The closure set of functional dependencies can be computed using three basic rules,
which are also called Armstrong’s Axioms.
These are as follows -
i) Reflexivity : If X ⊇ Y, then X → Y
ii) Augmentation : If X → Y, then XZ → YZ for any Z
iii) Transitivity : If X → Y and Y → Z, then X → Z
In addition to the above axioms, some additional rules for computing the closure set of
functional dependencies are as follows -
• Union : If X → Y and X → Z, then X → YZ
• Decomposition : If X → YZ, then X → Y and X → Z
Example 2.8.1 Compute the closure of the following set of functional dependencies for a
relation scheme R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A}
Solution : Consider F as follows
A->BC
CD->E
B->D
E->A
The closure can be written for each attribute of the relation as follows
• (A)+ = Step 1 : {A} -> the attribute itself
Step 2 : {ABC} as A->BC
Step 3 : {ABCD} as B->D
Step 4 : {ABCDE} as CD->E
Step 5 : {ABCDE} as E->A and A is already present
Hence (A)+ ={ABCDE}
• (B)+ = Step 1 : {B}
Step 2 : {BD} as B->D
Step 3 : {BD} as no other FD in F has its LHS contained in {BD}
Hence (B)+ ={BD}
• (C)+ = Step 1 : {C}
Step 2 : {C} as there is no FD in F with only C on its LHS
Hence (C)+ ={C}
• (D)+ = Step 1 : {D}
Step 2 : {D} as there is no FD in F with only D on its LHS
Hence (D)+ ={D}
• (E)+ = Step 1 : {E}
Step 2 : {EA} as E->A
Step 3 : {EABC} as A->BC
Step 4 : {EABCD} as B->D
Step 5 : {EABCD} as CD->E and E is already present
By rearranging we get {ABCDE}
Hence (E)+ ={ABCDE}
• (CD)+ = Step 1 : {CD}
Step 2 : {CDE} as CD->E
Step 3 : {CDEA} as E->A
Step 4 : {CDEAB} as A->BC
By rearranging we get {ABCDE}
Hence (CD)+ ={ABCDE}
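
The step-by-step procedure used in this example is mechanical, so it can be expressed as a short routine. The following is a minimal sketch (the function name closure and the encoding of each FD as a pair of attribute strings are assumptions made for illustration). It repeatedly applies every FD whose left side is already contained in the result until nothing new can be added.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is already covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = {A->BC, CD->E, B->D, E->A} from Example 2.8.1
F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

for x in ["A", "B", "C", "D", "E", "CD"]:
    print(x, "".join(sorted(closure(x, F))))
```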
Example 2.8.2 Compute the closure of the following set of functional dependencies for a
relation scheme R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A} and find the candidate
keys.
Solution : For finding the closure of functional dependencies - Refer example 2.8.1.
We can identify the candidate keys of the given relation schema with the help of the functional
dependencies. For that purpose we need to compute the closure sets of attributes and
find the minimal sets whose closure covers the complete relation R(A,B,C,D,E).
Let, (A)+ = {ABCDE}
(B)+ = {BD}
(C)+ = {C}
(D)+ = {D}
(E)+ = {ABCDE}
(CD)+ = {ABCDE}
(BC)+ = {ABCDE}, since B->D gives {BCD}, CD->E gives {BCDE} and E->A gives {ABCDE}
Clearly, (A)+, (E)+, (CD)+ and (BC)+ give us {ABCDE} i.e. the complete relation R, and no
proper subset of CD or of BC does so. Hence A, E, CD and BC are the candidate keys.
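
For a small schema the candidate keys can also be found by brute force. The sketch below is illustrative only (the names closure and candidate_keys are assumptions). It tries attribute sets in increasing size and keeps every set whose closure is the whole relation and which does not contain a smaller key. The search is exponential in the number of attributes, so it is suitable only for textbook-sized examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full attribute set."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", F))
```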

2.8.2 Canonical Cover or Minimal Cover


Formal Definition : A minimal cover for a set F of FDs is a set G of FDs such that :
1) Every dependency in G is of the form X->A, where A is a single attribute.
2) The closure F+ is equal to the closure G+.
3) If we obtain a set H of dependencies from G by deleting one or more dependencies
or by deleting attributes from a dependency in G, then F+ ≠ H+.

Concept of Extraneous Attributes

Definition : An attribute of a functional dependency is said to be extraneous if we can
remove it without changing the closure of the set of functional dependencies. The formal
definition of extraneous attributes is as follows :
Consider a set F of functional dependencies and the functional dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α, and F logically implies
(F – {α → β}) ∪ {(α – A) → β}.
• Attribute A is extraneous in β if A ∈ β, and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – A)} logically implies F.
Algorithm for computing Canonical Cover for set of functional Dependencies F
Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 → β1 and α1 → β2 with α1 → β1β2
Find a functional dependency α → β in Fc with an extraneous attribute either in α or in β.
/* The test for extraneous attributes is done using Fc, not F */
If an extraneous attribute is found, delete it from α → β in Fc.
until (Fc does not change)
Example 2.8.3 Consider the following functional dependencies over the attribute set
R(ABCDE) for finding minimal cover FD = {A->C, AC->D, B->ADE}
Solution :
Step 1 : Split the FDs such that the R.H.S. contains a single attribute. Hence we get
A->C
AC->D
B->A
B->D
B->E
Step 2 : Find the redundant FDs and delete them. This can be done as follows -

o For A->C : We find (A)+ by assuming that we delete A->C temporarily. We
get (A)+={A}. Thus C can not be obtained from A once A->C is deleted.
This means we can not delete A->C.
o For AC->D : We find (AC)+ by assuming that we delete AC->D
temporarily. We get (AC)+={AC}. Thus by such deletion it is not possible to
obtain D. This means we can not delete AC->D.
o For B->A : We find (B)+ by assuming that we delete B->A temporarily. We
get (B)+={BDE}. Thus by such deletion it is not possible to obtain A. This
means we can not delete B->A.
o For B->D : We find (B)+ by assuming that we delete B->D temporarily. We
get (B)+={BEACD}. This shows clearly that even if we delete B->D we can
obtain D. This means we can delete B->D. Thus it is redundant.
o For B->E : We find (B)+ by assuming that we delete B->E temporarily. We
get (B)+={BDAC}. Thus by such deletion it is not possible to obtain E. This
means we can not delete B->E.
To summarize we get now
A->C
AC->D
B->A
B->E
Thus R.H.S gets simplified.
Step 3 : Now we will simplify the L.H.S.
Consider AC->D. Here we check whether A or C is extraneous. For that we find the
closure sets of A and C.
(A)+ = {ACD}
(C)+ = {C}
Thus (A)+ already contains C and D, whereas (C)+ does not contain A. That means C is
extraneous on the L.H.S. of AC->D : only A needs to be kept and C can be eliminated. Thus after
simplification we get
A->D
To summarize we get now
A->C
A->D
B->A
B->E
Thus L.H.S gets simplified.
Step 4 : The FDs having the same L.H.S. can be combined using the union rule to form

A->CD
B->AE
This is a minimal cover or canonical cover of the functional dependencies.
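
The computation of a minimal cover can be sketched in code as follows. This is an illustrative sketch, not a definitive implementation : the names closure and minimal_cover are assumptions, and the routine removes extraneous left-side attributes before it removes redundant FDs, which is a commonly used order and gives the same answer as the worked example here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Split right-hand sides so that every FD has a single attribute on the right
    cover = [(set(lhs), a) for lhs, rhs in fds for a in rhs]

    # Remove extraneous attributes from left-hand sides
    for i, (lhs, a) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and a in closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, a)

    # Remove FDs that follow from the remaining ones
    result = list(cover)
    for fd in cover:
        others = [g for g in result if g is not fd]
        if fd[1] in closure(fd[0], others):
            result = others

    # Combine FDs with the same left-hand side (union rule)
    merged = {}
    for lhs, a in result:
        merged.setdefault("".join(sorted(lhs)), set()).add(a)
    return {lhs: "".join(sorted(rhs)) for lhs, rhs in merged.items()}

FD = [("A", "C"), ("AC", "D"), ("B", "ADE")]
print(minimal_cover(FD))
```

The result maps each left-hand side to its combined right-hand side, i.e. A->CD and B->AE for the FDs of Example 2.8.3.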
2.9 Concept of Redundancy and Anomalies
Definition : Redundancy is a condition created in a database in which the same piece of
data is held at two different places.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy : Following problems can be caused by redundancy -
i) Redundant storage : Some information is stored repeatedly.
ii) Update anomalies : If one copy of such repeated data is updated, then an inconsistency
is created unless all other copies are similarly updated.
iii) Insertion anomalies : Inserting a new record forces the repeated information to be
added to the relation once again.
iv) Deletion anomalies : Deleting a particular record also deletes other important
information that is stored only in that record, and thus we may lose
that information from the relation.
Example : Following example illustrates the above discussed anomalies or redundancy
problems
Consider following Schema in which all possible information about Employee is
stored.

1) Redundant storage : Note that the information about DeptID, DeptName and
DeptLoc is repeated.
2) Update anomalies : In the above table, if we change the DeptLoc Pune to Chennai in only
one tuple, then it results in inconsistency, because for DeptID 101 another tuple still gives the
DeptLoc as Pune. Otherwise, we need to update multiple copies of DeptLoc from Pune to
Chennai. Hence this is an update anomaly.
3) Insertion anomalies : For the above table, if we want to add a new tuple, say
(5, EEE, 50000) for DeptID 101, then the information
(101, XYZ, Pune) has to be repeated once more.
4) Deletion anomalies : For the above table, if we delete the record for EmpID 4, then
the information about DeptID 102, DeptName PQR and DeptLoc
Mumbai also gets deleted, and one may no longer be aware of DeptID 102. This is a
deletion anomaly.
2.10 Decomposition AU : Dec.-17, Marks 7

• Decomposition is the process of breaking down one table into multiple tables.
• Formal definition of decomposition is -
• A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and together
include all attributes of R. The data is stored as projections of the original instance.
• For example - Consider the following Employee_Department table -

Eid Ename Age City Salary Deptid DeptName


E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Hyderabad 25000 D005 Human Resource
We can decompose the above relation schema into two relation schemas, Employee
(Eid, Ename, Age, City, Salary) and Department (Deptid, Eid, DeptName), as follows -

Employee Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Hyderabad 25000

Department Table
Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource
• Decomposition is used for eliminating redundancy.
• For example : Consider following relation Schema R in which we assume that the
grade determines the salary. Because of this dependency, redundancy is caused -

Schema R

• Hence, the above table can be decomposed into two Schemas S and T as follows :

Schema S
Name eid deptname Grade
AAA 121 Accounts 2
AAA 132 Sales 3
BBB 101 Marketing 4
CCC 106 Purchase 2

Schema T
Grade Salary
2 8000
3 7000
4 7000

Problems Related to Decomposition :


Following are the potential problems to consider :
1) Some queries become more expensive.
2) Given instances of the decomposed relations, we may not be able to reconstruct the
corresponding instance of the original relation!
3) Checking some dependencies may require joining the instances of the decomposed
relations.
4) There may be loss of information during decomposition.

Properties Associated With Decomposition


There are two properties associated with decomposition and those are –
1) Loss-less Join or non Loss Decomposition : When all information found in the
original database is preserved after decomposition, we call it as loss less or non loss
decomposition.
2) Dependency Preservation : This is a property in which the constraints on the
original table can be maintained by simply enforcing some constraints on each of
the smaller relations.

2.10.1 Non-loss Decomposition or Loss-less Join


The lossless join property for a decomposition of R into R1 and R2 can be checked using
the following three conditions :
i) Union of attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R
must be either in R1 or in R2.
Att(R1) ∪ Att(R2) = Att(R)
ii) Intersection of attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
iii) The common attributes must form a key for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1)
or Att(R1) ∩ Att(R2) -> Att(R2)
Example 2.10.1 Consider the relation R(A,B,C,D) with FD A->BC, and the decomposition
of R into R1(A,B,C), R2(A,D). Check if the decomposition is lossless join or not.
Solution :
Step 1 : Here Att(R1) ∪ Att(R2) = Att(R) i.e. R1(A,B,C) ∪ R2(A,D) = (A,B,C,D) i.e. R.
Thus the first condition gets satisfied.
Step 2 : Here R1 ∩ R2 = {A}. Thus Att(R1) ∩ Att(R2) ≠ Φ. Here the second condition
gets satisfied.
Step 3 : Att(R1) ∩ Att(R2) = {A}. Now (A)+ = {A,B,C} ⊇ attributes of R1. Thus the
third condition gets satisfied.
This shows that the given decomposition is a lossless join.
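
The three conditions translate almost line by line into code. The following minimal sketch (function names closure and is_lossless_binary are assumptions for illustration) applies the test to a decomposition into exactly two relations. It is not meant for decompositions into three or more relations.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r, r1, r2, fds):
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    # Conditions i) and ii): attributes are covered and the intersection is non-empty
    if r1 | r2 != r or not common:
        return False
    # Condition iii): the common attributes determine all of R1 or all of R2
    common_closure = closure(common, fds)
    return r1 <= common_closure or r2 <= common_closure

# Example 2.10.1 : R(A,B,C,D), A->BC, decomposed into R1(A,B,C) and R2(A,D)
print(is_lossless_binary("ABCD", "ABC", "AD", [("A", "BC")]))
# R(A,B,C,D,E) split into (A,B,C) and (C,D,E) with no FDs : C is not a key of either part
print(is_lossless_binary("ABCDE", "ABC", "CDE", []))
```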

Example 2.10.2 Consider the relation R(A,B,C,D,E,F) with FDs A->BC, C->A,
D->E, F->A, E->D, and the decomposition of R into R1(A,C,D), R2(B,C,D), and R3(E,F,D).
Check whether the decomposition is lossless.
Solution :
Step 1 : R1 ∪ R2 ∪ R3 = R. Here the first condition for checking lossless join is satisfied
as (A,C,D) ∪ (B,C,D) ∪ (E,F,D) = {A,B,C,D,E,F} which is nothing but R.
Step 2 : Consider R1 ∩ R2 = {CD} and R2 ∩ R3 = {D}. Hence the second condition of
intersection not being Φ gets satisfied.
Step 3 : Now, consider R1(A,C,D) and R2(B,C,D). We find R1 ∩ R2 = {CD}.
(CD)+ = {ABCDE} ⊇ attributes of R1 i.e. {A,C,D}. Hence condition 3 for checking
lossless join for R1 and R2 gets satisfied.
Step 4 : Now, consider R2(B,C,D) and R3(E,F,D). We find R2 ∩ R3 = {D}.
(D)+ = {D,E}, which contains neither the complete set of attributes of R2 nor that of R3.
[Note that F is missing, so (D)+ does not cover R3.]
Hence it is not a lossless join decomposition. In other words, it is a
lossy decomposition.
Example 2.10.3 Suppose that we decompose schema R=(A,B,C,D,E) into (A,B,C) (C,D,E)
Show that it is not a lossless decomposition.
Solution :
Step 1 : Here we need to assume some data for the attributes A, B, C, D, and E.
Using this data we can represent the relation as follows –
Relation R
A B C D E
a 1 x p q
b 2 x r s

Relation R1 = (A,B,C)
A B C
a 1 x
b 2 x

Relation R2 = (C,D,E)
C D E
x p q
x r s

Step 2 : Now we will join these tables using natural join, i.e. the join based on
the common attribute C. We get R1 ⋈ R2 as

A B C D E
a 1 x p q
a 1 x r s
b 2 x p q
b 2 x r s
Here we get more rows (tuples) than in the original relation R.
Clearly R1 ⋈ R2 ≠ R. Hence it is not a lossless decomposition.
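
The same experiment can be repeated programmatically. In the sketch below, rows are represented as Python dictionaries and the helper names project and natural_join are assumptions made for illustration. It projects R onto (A,B,C) and (C,D,E), joins the projections on the common attribute and compares the result with R.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

def natural_join(left, right):
    """Join rows that agree on all attributes the two relations have in common."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined

# Relation R of Example 2.10.3
R = [dict(zip("ABCDE", row)) for row in [("a", 1, "x", "p", "q"),
                                         ("b", 2, "x", "r", "s")]]
R1 = project(R, "ABC")
R2 = project(R, "CDE")
joined = natural_join(R1, R2)

# Tuples that appear in the join but were never in R are spurious
spurious = [row for row in joined if row not in R]
print(len(R), len(joined), len(spurious))
```

The join returns every original tuple plus two spurious ones, which is exactly why the decomposition is lossy : information about which (A,B) pair belongs to which (D,E) pair has been lost.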

2.10.2 Dependency Preservation


• Definition : A decomposition D = {R1, R2, R3, …, Rn} of R is dependency
preserving for a set F of functional dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+,
where Fi is the set of dependencies in F+ that involve only the attributes of Ri.
• If a decomposition is not dependency-preserving, some dependency is lost in the
decomposition.

Example 2.10.4 Consider the relation R (A, B, C) with functional dependency set {A -> B and
B -> C}, which is decomposed into two relations R1 = (A, C) and R2 = (B, C). Check if
this decomposition is dependency preserving or not.
Solution : This can be solved in following steps :
Step 1 : For checking whether the decomposition is dependency preserving or not
we need to check the following condition

F+ = (F1 ∪ F2)+
Step 2 : We have with us F = {A->B, B->C}
Step 3 : Let us find (F1)+ for relation R1 and (F2)+ for relation R2

R1(A,C)
A->A Trivial
C->C Trivial
A->C In (F)+ as A->B->C, and it is non-trivial
AC->AC Trivial
A->B is not useful as B is not part of R1
We can not obtain C->A

R2(B,C)
B->B Trivial
C->C Trivial
B->C In (F)+, and it is non-trivial
BC->BC Trivial
We can not obtain C->B

Step 4 : We will eliminate all the trivial and useless dependencies. Hence we
obtain for R1 and R2

R1(A,C) : A->C (non-trivial)
R2(B,C) : B->C (non-trivial)

(F1 ∪ F2)+ = {A->C, B->C}+ ≠ {A->B, B->C}+ i.e. (F)+, because A->B can not be
derived from A->C and B->C.
Thus the condition specified in step 1 i.e. F+ = (F1 ∪ F2)+ is not true. Hence it is not a
dependency preserving decomposition.

Example 2.10.5 Let R(A,B,C,D) be a relational schema with the following functional
dependencies {A->B, B->C, C->D, and D->B}. R is decomposed into (A,B), (B,C)
and (B,D). Check whether this decomposition is dependency preserving or not.
Solution :
Step 1 : Let F = {A->B, B->C, C->D, D->B}.
Step 2 : We will find (F1)+, (F2)+, (F3)+ for relations R1(A,B), R2(B,C) and R3(B,D) as
follows -

R1(A,B)
A->A Trivial
B->B Trivial
A->B In (F)+ and it is non-trivial
B->A can not be obtained
AB->AB Trivial

R2(B,C)
B->B Trivial
C->C Trivial
B->C In (F)+ and it is non-trivial
C->B In (F)+ as C->D->B, and it is non-trivial
BC->BC Trivial

R3(B,D)
B->B Trivial
D->D Trivial
B->D In (F)+ as B->C->D, and it is non-trivial
D->B In (F)+ and it is non-trivial
BD->BD Trivial

Step 3 : We will eliminate all the trivial and useless dependencies. Hence we
obtain F1 ∪ F2 ∪ F3 as

R1(A,B) : A->B
R2(B,C) : B->C, C->B
R3(B,D) : B->D, D->B

Step 4 : From the above FDs, A->B, B->C and D->B of F are directly present. The remaining
dependency C->D follows from C->B and B->D by transitivity.
Step 5 : This proves that F+ = (F1 ∪ F2 ∪ F3)+. Hence the given decomposition is
dependency preserving.
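
Listing every projected dependency by hand becomes tedious for larger schemas. A commonly used alternative avoids computing the projections explicitly : for each FD X->Y in F, grow X using only what each Ri can contribute, and check that Y is reached. The sketch below illustrates this idea under the assumption that FDs are pairs of attribute strings; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_dependency_preserving(fds, decomposition):
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for relation in decomposition:
                attrs = set(relation)
                # What the projection of F onto this relation lets us derive
                gained = closure(result & attrs, fds) & attrs
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # lhs -> rhs can not be enforced on the parts alone
    return True

example_2_10_4 = [("A", "B"), ("B", "C")]
example_2_10_5 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]

print(is_dependency_preserving(example_2_10_4, ["AC", "BC"]))
print(is_dependency_preserving(example_2_10_5, ["AB", "BC", "BD"]))
```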

University Question
1. Differentiate between lossless join decomposition and dependency preserving decomposition.
AU : Dec.-17, Marks 7

2.11 Normal Forms AU : Dec.-14, 15, May-18, Marks 16

• Normalization is the process of reorganizing data in a database so that it meets
two basic requirements :
1) There is no redundancy of data (all data is stored in only one place), and
2) Data dependencies are logical (all related data items are stored together).
• Normalization is important because removing redundancy allows the database to take
up less disk space.
• It also helps the performance of updates, because each fact needs to be changed in
only one place.

2.11.1 First Normal Form


A table is said to be in 1NF if it follows these rules -
i) It should only have single (atomic) valued attributes/columns.
ii) Values stored in a column should be of the same domain.
iii) All the columns in a table should have unique names.
iv) The order in which data is stored does not matter.
Consider following Student table
Student
sid sname Phone
1 AAA 11111
22222
2 BBB 33333
3 CCC 44444
55555

As there are multiple phone numbers for sid 1 and sid 3, the above table is not in
1NF. It can be converted to 1NF as follows -

sid sname Phone


1 AAA 11111
1 AAA 22222
2 BBB 33333
3 CCC 44444
3 CCC 55555

2.11.2 Second Normal Form


Before understanding the second normal form let us first discuss the concept of partial
functional dependency and prime and non prime attributes.

Concept of Partial Functional Dependency


Partial dependency means that a nonprime attribute is functionally dependent on part
of a candidate key.
For example : Consider a relation R(A,B,C,D) with functional dependency
{AB->CD,A->C}
Here (AB) is a candidate key because
(AB)+ = {ABCD}={R}
Hence {A,B} are prime attributes and {C,D} are non prime attribute. In A->C, the non
prime attribute C is dependent upon A which is actually a part of candidate key AB.
Hence due to A->C we get partial functional dependency.

Prime and Non Prime Attributes


• Prime attribute : An attribute which is a part of any candidate key is known as a
prime attribute.
• Non-prime attribute : An attribute which is not a part of any candidate key is said to
be a non-prime attribute.
• Example : Consider a relation R={A,B,C,D} with candidate key AB. Then the
Prime attributes : A, B
Non Prime attributes : C, D

The Second Normal Form


For a table to be in the Second Normal Form, following conditions must be followed
i) It should be in the First Normal form.
ii) It should not have partial functional dependency.

For example : Consider following table in which all information about a
student is maintained in one table : student id (sid), student name (sname), course
id (cid) and course name (cname).

Student_Course
sid sname cid cname
1 AAA 101 C
2 BBB 102 C++
3 CCC 101 C
4 DDD 103 Java
This table is not in 2NF. For converting above table to 2NF we must follow the
following steps -
Step 1 : The above table is in 1NF.
Step 2 : Here sname depends only on sid, and similarly cname depends only
on cid. Now if we delete the record with sid=2, then automatically the
course C++ will also get deleted. Thus,
sid->sname and cid->cname are partial functional dependencies, because {sid,cid}
is the candidate key for the above table. Hence to bring the above table
to 2NF we must decompose it as follows :

Student
sid sname
1 AAA
2 BBB
3 CCC
4 DDD
Here candidate key is sid and sid->sname

Course
cid cname
101 C
102 C++
103 Java
Here candidate key is cid and cid->cname

Enrollment
sid cid
1 101
2 102
3 101
4 103
Here candidate key is (sid,cid)

Thus now the tables are in 2NF as there is no partial functional dependency.



2.11.3 Third Normal Form


Before understanding the third normal form let us first discuss the concept of
transitive dependency, super key and candidate key

Concept of Transitive Dependency


A functional dependency is said to be transitive if it is indirectly formed by two
functional dependencies. For example -
X -> Z is a transitive dependency if the following functional dependencies hold true :
X->Y
Y->Z

Concept of Super key and Candidate Key


Superkey : A super key is a set of one or more columns (attributes) that uniquely
identifies rows in a table.
Candidate key : A minimal set of attributes which can uniquely identify a tuple is
known as a candidate key. For example consider following table

RegID RollNo Sname

101 1 AAA
102 2 BBB
103 3 CCC
104 4 DDD

Superkeys
• {RegID}
• {RollNo}
• {RegID, RollNo}
• {RegID, Sname}
• {RollNo, Sname}
• {RegID, RollNo, Sname}

Candidate Keys
• {RegID}
• {RollNo}

Third Normal Form


A table is said to be in the Third Normal Form when,
i) It is in the Second Normal form (i.e. it does not have partial functional dependency).
ii) It doesn't have transitive dependency of a non-prime attribute on a candidate key.
In other words, 3NF can be defined as : A table is in 3NF if it is in 2NF and for each
non-trivial functional dependency
X-> Y
at least one of the following conditions holds :
i) X is a super key of the table
ii) Y is a prime attribute of the table
For example : Consider following table Student_details -

sid sname zipcode cityname state
1 AAA 11111 Pune Maharashtra
2 BBB 22222 Surat Gujarat
3 CCC 33333 Chennai Tamilnadu
4 DDD 44444 Jaipur Rajasthan
5 EEE 55555 Mumbai Maharashtra
Here
Super keys : {sid},{sid,sname},{sid,sname,zipcode}, {sid,zipcode,cityname}… and so on.
Candidate keys : {sid}
Non-Prime attributes : {sname,zipcode,cityname,state}
The dependencies can be denoted as
sid->sname
sid->zipcode
zipcode->cityname
zipcode->state
Here cityname and state depend on sid only through zipcode (sid->zipcode and
zipcode->cityname, state), which is a transitive dependency. Hence the above table is not
in 3NF. We can convert it into 3NF as follows :
Student
sid sname zipcode
1 AAA 11111
2 BBB 22222
3 CCC 33333
4 DDD 44444
5 EEE 55555

Zip
zipcode cityname state
11111 Pune Maharashtra
22222 Surat Gujarat
33333 Chennai Tamilnadu
44444 Jaipur Rajasthan
55555 Mumbai Maharashtra

Example 2.11.1 Consider the relation R = {A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies F = { {A, B} -> C, A -> {D, E}, B -> F, F -> {G, H}, D -> {I, J} }
1. What is the key for R ? Demonstrate it using the inference rules.
2. Decompose R into 2NF, then 3NF relations.
Solution : Let,
A -> DE (given)
∴ A -> D, A -> E (decomposition rule)
As D -> IJ, A -> IJ (transitivity)
Using union rule we get
A -> DEIJ
As A -> A (reflexivity)
we get A -> ADEIJ
Using augmentation rule with B we get
AB -> ABDEIJ
But AB -> C (given)
∴ AB -> ABCDEIJ
B -> F (given), F -> GH ∴ B -> GH (transitivity)
∴ AB -> AGH is also true (augmentation with A)
Similarly AB -> AF ∵ B -> F (given)
Thus now using union rule
AB -> ABCDEFGHIJ
∴ AB is a key
Neither A alone (A -> ADEIJ) nor B alone (B -> BFGH) determines all the attributes,
hence AB is minimal.
The table can be converted to 2NF by removing the partial dependencies on the key AB
(A -> DEIJ and B -> FGH) as

R1 = (A, B, C)

R2 = (A, D, E, I, J)

R3 = (B, F, G, H)

The above 2NF relations can be converted to 3NF by removing the transitive
dependencies (A -> D -> IJ in R2 and B -> F -> GH in R3) as follows

R1 = (A, B, C)

R2 = (A, D, E)

R3 = (D, I, J)

R4 = (B, F)

R5 = (F, G, H).
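
The key found in Example 2.11.1 can be cross-checked by computing attribute closures instead of applying the inference rules by hand. The following is a small Python sketch under that idea. The function name `closure` and the encoding of attribute sets as strings are choices made for this illustration only.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already inside the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("ABCDEFGHIJ")
# F from Example 2.11.1, each FD written as (left side, right side).
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

if __name__ == "__main__":
    for x in ("A", "B", "AB"):
        print(x + "+ =", "".join(sorted(closure(x, F))))
```

The closure of AB is the whole of R, while the closures of A and of B alone are ADEIJ and BFGH. This agrees with the partial dependencies that are removed in the 2NF step.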

University Questions
1. What is database normalization ? Explain the first normal form, second normal form and third
normal form. AU : May-18, Marks 13; Dec.-15, Marks 16

2. What are normal forms ? Explain the types of normal form with an example.
AU : Dec.-14, Marks 16

2.12 Boyce / Codd Normal Form (BCNF)


Boyce and Codd Normal Form is a stricter version of the Third Normal Form. This
form deals with a certain type of anomaly that is not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is always in
BCNF.
Or in other words,
For a table R to be in BCNF, following conditions must be satisfied :
i) R must be in 3rd Normal Form
ii) For each non-trivial functional dependency ( X → Y ), X should be a super key. In
simple words, a set of attributes that is not a super key must not determine any
other attribute, not even a prime attribute.
For example - Consider the following table that represents student enrollment for
courses -

Enrollment Table
sid   course   Teacher
1     C        Ankita
1     Java     Poonam
2     C        Ankita
3     C++      Supriya
4     C        Archana
From above table following observations can be made :
• One student can enroll for multiple courses. For example student with sid=1 can
enroll for C as well as Java.
• For each course, a teacher is assigned to the student.
• There can be multiple teachers teaching one course. For example, course C can be
taught by both the teachers namely - Ankita and Archana.
• The candidate key for above table can be (sid,course), because using these two
columns we can find the remaining column, Teacher. As each teacher teaches only one
course, (sid,Teacher) is also a candidate key.
• The above table holds following dependencies
o (sid,course)->Teacher
o Teacher->course
• The above table is not in BCNF because of the dependency Teacher->course. Note
that Teacher alone is not a superkey, yet it determines the attribute course.
• To convert the above table to BCNF we must decompose above table into Student
and Course tables

Student
sid Teacher
1 Ankita
1 Poonam
2 Ankita
3 Supriya
4 Archana

Course
Teacher course
Ankita C
Poonam Java
Supriya C++
Archana C
Now both the tables are in BCNF
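
A decomposition is only useful if joining the parts gives back the original rows. The following Python sketch checks this for the Enrollment example using plain sets of tuples. The variable names are made up for the illustration, and the join is a simple nested loop rather than anything a real DBMS would do.

```python
# Rows of the Enrollment table as (sid, course, teacher).
ENROLLMENT = {
    (1, "C", "Ankita"),
    (1, "Java", "Poonam"),
    (2, "C", "Ankita"),
    (3, "C++", "Supriya"),
    (4, "C", "Archana"),
}

# Projections; a set removes duplicate rows automatically.
student = {(sid, teacher) for sid, _, teacher in ENROLLMENT}
course = {(teacher, crs) for _, crs, teacher in ENROLLMENT}

# Natural join of the two projections on the common column Teacher.
rejoined = {
    (sid, crs, teacher)
    for sid, teacher in student
    for teacher2, crs in course
    if teacher == teacher2
}

if __name__ == "__main__":
    print("Course rows  :", sorted(course))
    print("lossless join:", rejoined == ENROLLMENT)
```

Because Teacher->course holds, every Student row matches exactly one Course row, so the join reproduces the five original rows. Note that the dependency (sid,course)->Teacher can no longer be checked inside a single table after this decomposition.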

Example 2.12.1 Consider a relation R(A,B,C,D) having following FDs {AB->C, AB->D,
C->A, B->D}. Find out the normal form of R.
Solution :
Step 1 : We will first find out the candidate keys from the given FDs.
(AB)+ = {ABCD} = R
(BC)+ = {ABCD} = R
(AC)+ = {AC} ≠ R
There is no involvement of D on LHS of the FD rules, and D is determined by B. Hence
D can not be part of any candidate key. Thus we obtain two candidate keys AB and BC.
Hence
prime attributes = {A,B,C}
Non prime attributes = {D}
Step 2 : Now, we will start checking in reverse manner, that means from BCNF,
then 3NF, then 2NF.
Step 3 : For R being in BCNF, for each X->Y the X should be candidate key or super key.
From above FDs consider C->A in which C is not a candidate key or super key.
Hence given relation is not in BCNF.
Step 4 : For R being in 3NF, for each X->Y either i) the X should be candidate key or
super key or ii) Y should be prime attribute. (For prime and non prime attributes refer
step 1)
o For AB->C or AB->D the AB is a candidate key. Condition for 3NF is satisfied.
o Consider C->A. In this FD the C is not candidate key but A is a prime
attribute. Condition for 3NF is satisfied.
o Now consider B->D. In this FD, the B is not candidate key, similarly D is
not a prime attribute. Hence condition for 3NF fails over here.
Hence given relation is not in 3NF.
Step 5 : For R being in 2NF following condition should not occur.
Let X->Y, if X is a proper subset of candidate key and Y is a non prime attribute. This
is a case of partial functional dependency.
For relation to be in 2NF there should not be any partial functional dependency.
o For AB->C or AB->D the AB is a complete candidate key. Condition for 2NF is
satisfied.
o Consider C->A. In this FD the C is a part of candidate key BC, but A is a prime
attribute. Hence this is not a partial functional dependency. Condition for 2NF is
satisfied.
o Now consider B->D. In this FD, the B is a part of candidate key (AB or BC),
similarly D is not a prime attribute. That means partial functional
dependency occurs here. Hence condition for 2NF fails over here.
Hence given relation is not in 2NF.
Therefore we can conclude that the given relation R is in 1NF.
Example 2.12.2 Consider a relation R(ABC) with following FD A->B, B->C and C->A.
What is the normal form of R ?
Solution :
Step 1 : We will find the candidate key
(A)+ = {ABC} =R
(B)+ = {ABC} =R
(C)+ = {ABC} =R
Hence A, B and C all are candidate keys
Prime attributes = {A,B,C}
Non prime attributes = {}
Step 2 : For R being in BCNF, for each X->Y the X should be candidate key or super key.
From above FDs
o Consider A->B in which A is a candidate key or super key. Condition for
BCNF is satisfied.
o Consider B->C in which B is a candidate key or super key. Condition for
BCNF is satisfied.
o Consider C->A in which C is a candidate key or super key. Condition for
BCNF is satisfied.
This shows that the given relation R is in BCNF.
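
The step-by-step checks used in Examples 2.12.1 and 2.12.2 can be written as a short program. The following Python sketch is one possible way to do it, under the assumption that the relation is already in 1NF. The function names (`closure`, `candidate_keys`, `normal_form`) and the string encoding of attribute sets are invented for this illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Smaller keys are found first, so skip supersets of a known key.
            if closure(s, fds) == set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def normal_form(schema, fds):
    """Highest of 1NF, 2NF, 3NF, BCNF satisfied (relation assumed to be in 1NF)."""
    full = set(schema)
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    # Keep only the non-trivial part of each FD.
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]

    def is_superkey(x):
        return closure(x, fds) == full

    if all(is_superkey(l) for l, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(l) or r <= prime for l, r in nontrivial):
        return "3NF"
    # 2NF fails if part of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"

if __name__ == "__main__":
    print(normal_form("ABCD", [("AB", "C"), ("AB", "D"), ("C", "A"), ("B", "D")]))
    print(normal_form("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))
```

For the FDs of Example 2.12.1 the sketch finds the candidate keys AB and BC and reports 1NF, and for Example 2.12.2 it reports BCNF, matching the manual solutions.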
Example 2.12.3 Prove that any relational schema with two attributes is in BCNF.
Solution : Here, we will consider R={A,B} i.e. a relational schema with two attributes.
The only possible non-trivial FDs are A->B and B->A. Four cases arise :
o Only A->B holds. Then A is a candidate key, so the LHS of the FD is a super
key. Condition for BCNF is satisfied.
o Only B->A holds. Then B is a candidate key, so the LHS of the FD is a super
key. Condition for BCNF is satisfied.
o Both A->B and B->A hold. Then both A and B are candidate keys. Condition
for BCNF is satisfied.
o No non-trivial FD holds in relation R. In this case {A,B} is the candidate key
and there is no FD that can violate the condition. Still condition for BCNF is
satisfied.
This shows that any relation R with two attributes is in BCNF.
2.13 Multivalued Dependencies and Fourth Normal Form
AU : May-14, Dec.-16, Marks 16

Concept of Multivalued Dependencies

• A table is said to have multi-valued dependency, if the following conditions are
true,
1) For a dependency A ->-> B, if for a single value of A, multiple values of B
exist, then the table may have multi-valued dependency.
2) Also, a table should have at least 3 columns for it to have a multi-valued
dependency.
3) And, for a relation R(A,B,C), if there is a multi-valued dependency between
A and B, then B and C should be independent of each other.
If all these conditions are true for any relation (table), it is said to have multi-valued
dependency.
• In simple terms, if for each value of column A there is a set of values of column B,
and that set does not depend on the values in the remaining columns, then we say
that MVD exists between A and B
• The multivalued dependency is denoted by A ->-> B
• If there exists a non-trivial multivalued dependency A ->-> B in which A is not a
super key, then the table is not in 4th normal form.
• For example : Consider following table for information about student

Student
sid   Course   Skill
1     C        English
      C++      German
2     Java     English
               French
Here sid = 1 leads to multiple values for Course and Skill. Following table shows this

sid   Course   Skill
1     C        English
1     C++      German
1     C        German
1     C++      English
2     Java     English
2     Java     French

Here Course and Skill both depend on sid, but the Course and Skill are independent of
each other. The multivalued dependency is denoted as :
sid ->-> Course
sid ->-> Skill

Fourth Normal Form


Definition : For a table to satisfy the Fourth Normal Form, it should satisfy the following
two conditions :
1) It should be in the Boyce-Codd Normal Form (BCNF).
2) And, the table should not have any non-trivial multi-valued dependency, other than
one whose left side is a super key.
For example : Consider following student relation which is not in 4NF as it contains
the multivalued dependencies sid ->-> Course and sid ->-> Skill.

Student Table
sid Course Skill

1 C English
1 C++ German
1 C German
1 C++ English
2 Java English
2 Java French
Now to convert the above table to 4NF we must decompose the table into following
two tables.

Student_Course Table

Key : (sid,Course)
sid Course
1 C
1 C++
2 Java

Student_Skill Table

Key : (sid,Skill)
sid Skill
1 English
1 German
2 English
2 French
Thus the tables are now in 4NF.
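
The 4NF decomposition above works because the multivalued dependency on sid really holds in the data. The following Python sketch checks the dependency on the sample rows and then confirms that the two smaller tables join back to the original. The function names are hypothetical, and the check is written only for this three-column table, not for relations in general.

```python
# Rows of the Student table as (sid, course, skill).
STUDENT = {
    (1, "C", "English"), (1, "C++", "German"),
    (1, "C", "German"), (1, "C++", "English"),
    (2, "Java", "English"), (2, "Java", "French"),
}

def mvd_sid_course_holds(rows):
    """sid ->-> Course holds if, for any two rows with the same sid,
    the row (sid, course of the first, skill of the second) is also present."""
    return all(
        (s1, c1, k2) in rows
        for (s1, c1, _k1) in rows
        for (s2, _c2, k2) in rows
        if s1 == s2
    )

def decompose(rows):
    """Project onto (sid, course) and (sid, skill)."""
    student_course = {(s, c) for s, c, _ in rows}
    student_skill = {(s, k) for s, _, k in rows}
    return student_course, student_skill

def rejoin(student_course, student_skill):
    """Natural join of the two projections on sid."""
    return {
        (s, c, k)
        for s, c in student_course
        for s2, k in student_skill
        if s == s2
    }

if __name__ == "__main__":
    sc, sk = decompose(STUDENT)
    print("MVD holds    :", mvd_sid_course_holds(STUDENT))
    print("lossless join:", rejoin(sc, sk) == STUDENT)
```

If any one of the four rows for sid = 1 is removed, the dependency no longer holds and the join of the two projections produces a row that is missing from the table, which is why the independence of Course and Skill matters.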

University Questions
1. Explain first normal form, second normal form, third normal form and BCNF with example.
AU : Dec.-16, Marks 13
2. Explain Boyce Codd Normal form and fourth normal form with suitable example.
AU : May-14, Marks 16

2.14 Join Dependencies and Fifth Normal Form


Concept of Join Dependencies
o Join dependency is a further generalization of multivalued
dependency.
o Let R1(A, B, C) and R2(C, D) be the decompositions (projections) of a
given relation R (A, B, C, D).
o If the join of R1 and R2 over C is equal to relation R, then we can say that a
Join Dependency (JD) exists.
o Alternatively, R1 and R2 are a lossless decomposition of R.
o In general, a JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if
R1, R2,....., Rn is a lossless-join decomposition of R.
o The JD is also written as *(R1, R2,..., Rn). For example, *((A, B, C), (C, D))
is a JD of R if the join of the projections on (A, B, C) and (C, D) is equal to
the relation R.

Concept of Fifth Normal Form


The database is said to be in 5NF if -
i) It is in 4th Normal Form
ii) It cannot be decomposed further into smaller tables without loss of data, i.e.
when the smaller tables are joined back we should neither lose the original records
nor get a new record (Join Dependency Principle)
The fifth normal form is also called project-join normal form (PJNF).
For example - Consider following table

Seller   Company    Product
Rupali   Godrej     Cinthol
Sharda   Dabur      Honey
Sharda   Dabur      HairOil
Sharda   Dabur      Rosewater
Sunil    Amul       Icecream
Sunil    Britania   Biscuits
Here we assume the key as {Seller, Company, Product}.
In the above table a seller is associated with several companies and with several
products. Treating this like the multivalued dependencies
Seller ->-> Company
Seller ->-> Product
we may try to decompose the above table into two tables as

Seller_Company
Seller   Company
Rupali   Godrej
Sharda   Dabur
Sunil    Amul
Sunil    Britania

Seller_Product
Seller   Product
Rupali   Cinthol
Sharda   Honey
Sharda   HairOil
Sharda   Rosewater
Sunil    Icecream
Sunil    Biscuits

Each of the above tables is free of multivalued dependency. But this design does not
satisfy the join dependency principle of the 5th normal form, because if we join the
above two tables on Seller we get

Seller   Company    Product
Rupali   Godrej     Cinthol
Sharda   Dabur      Honey
Sharda   Dabur      HairOil
Sharda   Dabur      Rosewater
Sunil    Amul       Icecream
Sunil    Amul       Biscuits    (newly added record)
Sunil    Britania   Icecream    (newly added record)
Sunil    Britania   Biscuits

The two newly added records are not present in the original table, i.e. the
decomposition into two tables is not lossless.

To avoid the above problem we can decompose the table into three tables as
Seller_Company, Seller_Product and Company_Product

Seller_Company
Seller   Company
Rupali   Godrej
Sharda   Dabur
Sunil    Amul
Sunil    Britania

Seller_Product
Seller   Product
Rupali   Cinthol
Sharda   Honey
Sharda   HairOil
Sharda   Rosewater
Sunil    Icecream
Sunil    Biscuits

Company_Product
Company    Product
Godrej     Cinthol
Dabur      Honey
Dabur      HairOil
Dabur      Rosewater
Amul       Icecream
Britania   Biscuits

Joining all three tables gives back exactly the six records of the original table.
Thus the tables are in 5th normal form.
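
The effect of the two-table and three-table decompositions can be verified directly on the sample data. The following Python sketch uses sets of tuples and simple comprehensions for the projections and joins. The variable names are chosen for this illustration only.

```python
# Rows of the original table as (seller, company, product).
ORIGINAL = {
    ("Rupali", "Godrej", "Cinthol"),
    ("Sharda", "Dabur", "Honey"),
    ("Sharda", "Dabur", "HairOil"),
    ("Sharda", "Dabur", "Rosewater"),
    ("Sunil", "Amul", "Icecream"),
    ("Sunil", "Britania", "Biscuits"),
}

# The three binary projections.
seller_company = {(s, c) for s, c, _ in ORIGINAL}
seller_product = {(s, p) for s, _, p in ORIGINAL}
company_product = {(c, p) for _, c, p in ORIGINAL}

# Join of Seller_Company and Seller_Product on Seller.
two_way = {
    (s, c, p)
    for s, c in seller_company
    for s2, p in seller_product
    if s == s2
}

# Joining Company_Product as well keeps only rows whose
# (company, product) pair really exists.
three_way = {row for row in two_way if (row[1], row[2]) in company_product}

spurious = two_way - ORIGINAL

if __name__ == "__main__":
    print("spurious rows after two-way join:", sorted(spurious))
    print("three-way join equals original  :", three_way == ORIGINAL)
```

The two-way join contains the two extra Sunil rows, while the three-way join removes them because the pairs (Amul, Biscuits) and (Britania, Icecream) do not occur in Company_Product.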



2.15 Two Marks Questions with Answers

Q.1 Explain Entity Relationship model. AU : May-16


Ans. : • The ER data model specifies an enterprise schema that represents the overall
logical structure of a database.
• The E-R model is very useful in mapping the meanings and interactions of real-
world entities onto a conceptual schema.

Q.2 Give the limitations of E-R model ? How do you overcome this ? AU : May-07
Ans. : 1) Loss of information content : Some information may be lost or hidden in the
ER model.
2) Limited relationship representation : The ER model represents limited relationships
as compared to other data models like the relational model.
3) No representation of data manipulation : It is difficult to show data manipulation
in the ER model.
4) Popular for high level design : The ER model is popular mainly for high level
(conceptual) design.

Q.3 List the design phases of Entity Relationship model.


Ans. : 1) Requirement Analysis, 2) Conceptual Database Design, 3) Logical
Database Design, 4) Schema Refinement, 5) Physical Database Design,
6) Application and Security Design.

Q.4 What is an entity ? AU : May-14


Ans. : • An entity is an object that exists and is distinguishable from other objects.
• For example - Student named "Poonam" is an entity and can be identified by her
name. An entity is represented as a box in the ER model.

Q.5 What do you mean by derived attributes ?


Ans. : • Derived attributes are the attributes that contain values that are calculated
from other attributes.
• A derived attribute is represented by a dotted ellipse inside the solid ellipse. For
example - Age can be derived from the attribute DateOfBirth. In this situation,
DateOfBirth might be called a Stored Attribute.

Q.6 What is a weak entity ? Give example. AU : Dec.-16, May-18


Ans. : Refer section 2.3.4

Q.7 What are the problems caused by redundancy ? AU : Dec.-17

Ans. : Problems caused by Redundancy : Following problems can be caused by
redundancy -
i) Redundant Storage : Some information is stored repeatedly.
ii) Update Anomalies : If one copy of such repeated data is updated then an
inconsistency is created unless all other copies are similarly updated.
iii) Insertion Anomalies : Inserting a new record forces the repeated information to
be added to the relation again.
iv) Deletion Anomalies : Deleting a particular record may also delete other important
information associated with it, and thus we may lose that information from the
database.

Q.8 Define functional dependency. AU : Dec 04,05, May 05,14,15


Ans. : Let P and Q be sets of columns. Then P functionally determines Q, written
P → Q, if and only if any two rows that are equal on (all the attributes in) P must be
equal on (all the attributes in) Q.
In other words, the functional dependency holds if, for every pair of tuples T1 and T2,
T1.P = T2.P implies T1.Q = T2.Q
where the notation T1.P denotes the projection of the tuple T1 onto the attributes in P.
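
The definition can be tested against the rows of a table. The following Python sketch checks whether P → Q holds on a list of rows. The function name `fd_holds` is hypothetical, and the rows are taken from the Student_details example. A check like this can only show that a dependency is violated by the data. Whether a dependency holds for the schema in general is a design decision.

```python
def fd_holds(rows, p, q):
    """Return True if rows that agree on the columns in `p` also agree on `q`."""
    seen = {}
    for row in rows:
        p_val = tuple(row[a] for a in p)
        q_val = tuple(row[a] for a in q)
        # Remember the first Q value seen for each P value and compare later ones.
        if seen.setdefault(p_val, q_val) != q_val:
            return False
    return True

ROWS = [
    {"sid": 1, "sname": "AAA", "zipcode": 11111, "cityname": "Pune", "state": "Maharashtra"},
    {"sid": 2, "sname": "BBB", "zipcode": 22222, "cityname": "Surat", "state": "Gujarat"},
    {"sid": 3, "sname": "CCC", "zipcode": 33333, "cityname": "Chennai", "state": "Tamilnadu"},
    {"sid": 4, "sname": "DDD", "zipcode": 44444, "cityname": "Jaipur", "state": "Rajasthan"},
    {"sid": 5, "sname": "EEE", "zipcode": 55555, "cityname": "Mumbai", "state": "Maharashtra"},
]

if __name__ == "__main__":
    print(fd_holds(ROWS, ["zipcode"], ["cityname"]))  # rows agree: zipcode -> cityname
    print(fd_holds(ROWS, ["state"], ["cityname"]))    # Maharashtra has two cities
```

Here zipcode → cityname is satisfied by the rows, while state → cityname is violated because Maharashtra appears with both Pune and Mumbai.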

Q.9 Why certain functional dependencies are called trivial functional dependencies ?
AU : May-06,12
Ans. : • A functional dependency FD : X → Y is called trivial if Y is a subset of X.
This kind of dependency is called trivial because it holds for every relation and
therefore conveys no information : if the right side is a subset of the left side, the
dependency is satisfied automatically. The left side is called the determinant and the
right side the dependent.
• For example - {A,B} → B is a trivial functional dependency because B is a subset of
{A,B}. Since {A,B} already includes B, two rows that agree on {A,B} necessarily agree
on B.

Q.10 Define normalization. AU : May -14


Ans. : Normalization is the process of reorganizing data in a database so that it meets
two basic requirements :
1) There is no redundancy of data (all data is stored in only one place), and
2) data dependencies are logical (all related data items are stored together)

Q.11 State anomalies of 1NF. AU : Dec.-15


Ans. : A relation that is only in 1NF can suffer from all of the insertion, deletion and
update anomalies.

Q.12 What is multivalued dependency ? AU : Dec. -06


Ans. : A table is said to have multi-valued dependency, if the following
conditions are true,
1) For a dependency A ->-> B, if for a single value of A, multiple values of B exist, then
the table may have multi-valued dependency.
2) Also, a table should have at-least 3 columns for it to have a multi-valued
dependency.
3) And, for a relation R(A,B,C), if there is a multi-valued dependency between,
A and B, then B and C should be independent of each other.

Q.13 Describe BCNF and describe a relation which is in BCNF. AU : Dec. -02
Ans. : Refer section 2.12.

Q.14 Why 4NF in normal form is more desirable than BCNF ? AU : Dec. -14
Ans. :
• 4NF is more desirable than BCNF because it reduces the repetition of information.
• If we consider a BCNF schema that is not in 4NF, we observe that decomposing it into
4NF does not lose information, provided that a lossless join decomposition is used,
yet redundancy is reduced.

Q.15 Give an example of a relation schema R and set of dependencies such that R is in
BCNF but not in 4NF. AU : May -12
Ans. : Consider relation R(A,B,C,D) with dependencies
AB -> C
ABC -> D
AC ->-> B
Here the only key is AB. Thus each functional dependency has a superkey on the left,
so R is in BCNF. But the MVD AC ->-> B has a non-superkey on its left. So R is not in 4NF.

Q.16 Show that if a relation is in BCNF, then it is also in 3NF. AU : Dec.-12


Ans. :
• Boyce and Codd Normal Form is a stricter version of the Third Normal Form.
• A relation is in BCNF if, for every non-trivial functional dependency X → Y, X is a
super key.
• A relation is in 3NF if, for every non-trivial functional dependency X → Y, either X
is a super key or Y is a prime attribute.
• If the relation is in BCNF, the first alternative of the 3NF condition holds for every
functional dependency. It follows that the relation has no partial functional
dependency and no transitive dependency.
• Hence it is true that if a relation is in BCNF then it is also in 3NF.

Q.17 Why it is necessary to decompose a relation ? AU : May-07

Ans. : • Decomposition is the process of breaking down one table into multiple
tables.
• Decomposition is used for eliminating redundancy and the insertion, update and
deletion anomalies that redundancy causes.

Q.18 Explain at least two desirable properties of decomposition. AU : May-03,17, Dec.-05


Ans. :
There are two properties associated with decomposition and those are –
1) Loss-less Join or non Loss Decomposition : When all information found in the
original database is preserved after decomposition, we call it as loss less or non loss
decomposition.
2) Dependency Preservation : This is a property in which the constraints on the
original table can be maintained by simply enforcing some constraints on each of
the smaller relations.

Q.19 Explain with simple example lossless join decomposition. AU : May-03


Ans. : Refer section 2.10.1.


