Database Management System

The document provides an overview of Database Management Systems (DBMS), defining it as a collection of related data and programs for data access. It discusses the functions of DBMS, limitations of file processing systems, features of database systems, and various data models, including object-based and record-based models. The relational model is highlighted as the most commonly used due to its simplicity and strong mathematical foundation.

Uploaded by

D Pavan

LECTURE NOTES

UNIT - 1

P S Gill

CHAPTER 1
INTRODUCTION

Overview of DBMS (Database Management System)


A DBMS is often loosely defined as a collection of logically related data and a set of programs
to access the data. Strictly speaking, this is the definition of a “Database System”, which
comprises two components i.e. (i) Database and (ii) DBMS.

[Figure: A DATABASE SYSTEM. USER QUERIES enter the DBMS, which comprises Query Processing
Software and Storage Management Software. The DBMS accesses the DATABASE, which holds the
Schema Definition and the DATA.]

Database. A Database is a collection of logically related data that can be recorded.


The information stored in the database must have the following implicit properties:-

(a) It must represent some real-world aspect; like a college or a company etc.
The aspect represented by the database is called its “Mini-world”.

(b) It must comprise a logically coherent collection of data, which should
have well-understood inherent meaning (semantics).

(c) The repository of data must be designed, developed and implemented for a
specific purpose. There must exist an intended group of users, who must have
some pre-conceived applications of the data.

A Database System will have the following major elements:-

- Sources of information, from where it derives its data.
- Some related real-world events, which influence its data.
- Some intended users, who would be interested in its data.

For example, in the college database, sources of information will be students, faculty,
labs etc. The real-world events affecting the information in the database will be
admissions, exams, results & placements etc. The set of intended users will be faculty,
students, admin staff etc.

Database Management System (DBMS)

A Database Management System (DBMS) refers to a set of programs for the definition,
creation, maintenance and manipulation of a database. A DBMS must facilitate the
following major functions:-

- Defining of Database Schema:- The DBMS must facilitate defining the
database structure i.e. defining of data types, relationships amongst the data and
specification of the integrity constraints to be enforced on the database. It should
also facilitate specifying the access rights of authorized users.

- Manipulation of the Database:- The DBMS must facilitate functions
like:-
Insertion of new data into the database
Update of changed information
Deletion of data, which might have been rendered defunct
Reading of stored information, including generation of reports

- Sharing of a Database:- The DBMS must enable concurrent access of
shared data items by multiple users, while preserving the consistency of
the database.

- Protection of a Database:- The DBMS must protect the database against
unauthorized/malicious access.

- Database Recovery:- In the event of system failures, the DBMS must facilitate
database recovery.
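
The schema-definition and manipulation functions listed above can be sketched with Python's built-in sqlite3 module standing in for a DBMS. This is only an illustration: the tables, columns and sample rows are hypothetical, and SQLite does not cover the access-rights function mentioned above.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Defining the schema: data types, a relationship and integrity constraints.
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    roll_no   INTEGER PRIMARY KEY,
    s_name    TEXT NOT NULL,
    semester  INTEGER CHECK (semester BETWEEN 1 AND 8),
    dept_code TEXT REFERENCES department(dept_code)
);
""")

# Manipulation: insertion, update, deletion and reading.
conn.execute("INSERT INTO department VALUES ('CSE', 'Computer Science')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", 3, "CSE"), (2, "Ravi", 5, "CSE"), (3, "Meena", 7, "CSE")],
)
conn.execute("UPDATE student SET semester = 4 WHERE roll_no = 1")
conn.execute("DELETE FROM student WHERE roll_no = 3")
conn.commit()

rows = conn.execute(
    "SELECT roll_no, s_name, semester FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

The structure is declared once, in the CREATE TABLE statements, and every later statement relies on that declaration instead of carrying its own description of the data.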

File Processing System

Before the evolution of DBMS, dedicated systems known as “File-Processing
Systems” were in vogue to handle the data repositories of organizations. Such systems
needed a dedicated set of application programs, to add information to the files, to extract
information from the files and to update the existing information. The structure of the
files used to be hard-coded in the application programs. Normally, such application
programs used to be written by different programmers in different languages, as and
when the need arose. Also, the same information used to be stored at multiple places, in
different formats, on different machines, which were not even interconnected.

Limitations of a File Processing System

(i) Data Redundancy and Inconsistency:- Since the same information is stored
at multiple places, it causes data-inconsistency problems during updates.

(ii) Difficulty in Accessing of Data:- Suppose there exists some information in the
files, but the existing set of application programs does not support extraction of that
information. In such situations, the application programs need to be updated, which
is a very inconvenient, time-consuming and costly solution.

(iii) Data Isolation:- The information is scattered over a large number of files,
on a number of stand-alone (not networked) machines, making it very difficult to process
certain queries, which need information to be extracted from multiple locations.

(iv) Difficulty in Enforcing Integrity Constraints:- Enforcing of integrity
constraints has to be handled at application program level, making the programs very
complex. The redundancy of information makes this task all the more difficult.

(v) Atomicity Problems:- Since the information needed to roll back a transaction may
not be readily available in a file-processing system, ensuring atomicity of transactions
will be difficult.

(vi) Difficulty in Concurrency Control:- It is complex to build
concurrency control features at the application program level.

(vii) Security Problems:- Since the information is scattered and does not have a
centralized access path, user access rights cannot be enforced reliably.
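
The first limitation, redundancy leading to inconsistency, can be illustrated with a small Python sketch. The two "files" below are hypothetical and are held in memory as CSV text; they represent copies of the same student data kept by two separate application programs.

```python
import csv
import io

# Two applications each keep their own file containing the student's address
# (hypothetical contents, held in memory for the sketch).
exam_file = "roll_no,name,address\n1,Asha,Old Street\n"
library_file = "roll_no,name,address\n1,Asha,Old Street\n"


def load(text):
    """Parse CSV text into a dict of records keyed by roll number."""
    return {row["roll_no"]: row for row in csv.DictReader(io.StringIO(text))}


exam_records = load(exam_file)
library_records = load(library_file)

# The exam application records a change of address; the library
# application is never told about it.
exam_records["1"]["address"] = "New Street"

# Roll numbers whose address now differs between the two copies.
inconsistent = [
    roll_no
    for roll_no in exam_records
    if exam_records[roll_no]["address"] != library_records[roll_no]["address"]
]
print(inconsistent)
```

With a single shared repository there would be only one copy of the address to update, so this kind of divergence could not arise.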

Features of a Database System

A DBMS will support the following features:-

(a) Data Dictionary. A Database System will support a Data Dictionary (or Data
Directory or DBMS Catalog), which contains information like Data Types,
Relationships amongst the data and Data Constraints of the underlying
database. In addition, it also contains information about the Authorized Users
of the database, like their Access Rights. Since this information defines the
nature of the data stored in the database, it is called metadata (data about
data). This information makes the DBMS software independent of its
underlying database. When a need arises to change the structure of data, no
changes need to be made to the DBMS software; only the dictionary is
updated to reflect the changes. In a file processing system, on the other hand, the
application programs would need to be changed. Also, this feature makes the
DBMS software generic. The same DBMS can be used for different
organizations having entirely different sets of data; the distinguishing feature
will be the information stored in the Data Dictionary. This feature of DBMS is
generally referred to as the ‘Self Describing Nature of a Database’, since the
information stored in the Data Dictionary fully describes the nature of the data
stored in the Database.

(b) Storage Management. DBMS supports a File Manager to manage the
allocation of disk space for the DBMS files. Also, it supports a Buffer
Manager to manage the memory buffers, used for processing database
information. Whenever some information is to be updated, it is first read from
the files into the buffer, where it is manipulated and then the updated
information is written back into the files.

(c) Language Interfaces. DBMS supports language interfaces with 4GLs
like PL/SQL for data manipulation applications.

(d) Transaction Management. DBMS ensures atomicity of transaction
processing. A Transaction, when executed, transforms the database from one
consistent state to another consistent state. During its execution, a log of all
the operations performed by the Transaction is maintained in a system Log File.
If a Transaction fails during its execution, then the log file is
used to roll back the transaction during recovery of the database. This ensures
atomicity of transaction processing.

(e) Concurrency control DBMS will support concurrency control tools for
permitting multiple users or application programs to access the database
concurrently, while preserving the consistency of database.

(f) Security Management. The Security Mechanism of a Database System will
ensure that only the authorized users can access the database; and that too
only to the extent which is explicitly authorized by the Database
Administrator. The authorized Access Rights are explicitly stored in the Data
Dictionary. The access by each user and the type of operations performed on
various data will be monitored and controlled by the DBMS. This will protect the
database against unauthorized/malicious access.

(g) Database Recovery. Since the DBMS maintains a log of all transactions
being executed, it will enable recovery of the underlying database in the event
of failures. For example, if a Transaction fails during its execution, it is rolled
back to its initial state; thus reverting to the consistent state that existed
prior to the commencement of the failed transaction. This is made possible by
the information stored in the system log file. Also, the DBMS will support taking
of periodic backups, which are used to recover databases in case of
catastrophic failures, like a Disk Crash.
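
The atomicity and rollback behaviour described under Transaction Management and Database Recovery can be sketched as follows, again using Python's sqlite3 module as a stand-in DBMS. The account table, the CHECK constraint and the transfer() helper are hypothetical, introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single atomic transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception is raised inside it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(conn, 1, 2, 30)    # both updates succeed and are committed
second = transfer(conn, 1, 2, 500)  # the debit violates CHECK; the credit is undone
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
print(first, second, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected; the rollback removes it, so the database returns to the consistent state that existed before the transaction began.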

DATA MODELS

A Data Model defines the underlying structure of a Database. It comprises a collection of
conceptual tools for describing the Data, the Data Relationships, the Data Semantics and
the Data Integrity Constraints.

CATEGORIES OF DATA MODELS

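Basically, there are three categories of data models: Object Based Logical Models,
Record Based Logical Models and Physical Data Models. The two logical categories are
discussed below:-

(a) Object Based Logical Models.
(b) Record Based Logical Models.

(a) Object Based Logical Models.

The Object Based Logical Models view the universe as a collection of objects.

(i) Entity-Relationship Model.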

- An Entity refers to a real-world ‘object’ or a ‘concept’ that is distinguishable
from other objects and other concepts in the real-world. For example, a
person, a bank-account, a payment are all entities of different kinds.

- An Entity will have a set of properties, known as Attributes; for example, the
Entity “Account” may have attributes like “Account-Number”, “Current-Balance” etc.

- Each attribute will have a set of permitted values, called its Domain; for
example, the domain of Balance of an account can be the set of positive real
numbers.

- A collection of entities of the same kind, having the same set of attributes, is
called an “Entity Set”.

- A relationship refers to the association amongst entities. For example, in a
banking database, an entity ‘Customer’ can have relationship ‘Depositor’ with
another entity ‘Account’.

- A set of Relationships of the same kind, having the same set of attributes, is
called a Relationship Set.

- A database in E-R Model is modeled as a collection of Entity Sets and
Relationship Sets.

- E-R Model also specifies certain constraints, like Mapping Cardinalities i.e.
whether the relationship is one-to-one, one-to-many, many-to-one or many-to-many.

- The E-R Diagram below depicts two Entity Sets “STUDENT”, “COURSE”
and a relationship set “RESULT” indicating the marks obtained by students in
different Courses.

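[E-R Diagram: Entity Set STUDENT (attributes Roll_No, S_Name, S_Address) and Entity Set
COURSE (attributes Sub_Code, Sub_Title), connected by Relationship Set RESULT (attribute Marks).]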

(ii) Object-Oriented Model. Like the E-R Model, this model also
models a database as a Collection of Objects. An Object Body
encapsulates Data (Variables) as well as Methods (Functions) to
manipulate the Data (Variables). The Objects that contain same Type of
Data Variables and same Type of Functions are grouped together as a
Class. Thus, a Class may be viewed as a Type Definition of the Objects.
The only way an Object “A” can access the Data Items of another Object
“B” is by invoking the Methods of “B”. “A” can accomplish this by
making calls to the methods of “B”, through B’s Interface. The methods
defined within an object are made visible to the external world, through its
Interface.

[Figure: An OBJECT, encapsulating its Variables and Functions behind an Interface.]

The structure of an object-oriented database is modeled as a set of classes
and the database will comprise objects belonging to those classes.
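
The idea of an object reaching another object's data only through its interface can be sketched in Python. The Account and Customer classes below are hypothetical; note also that Python treats the leading underscore as a convention only and does not enforce encapsulation the way an object-oriented DBMS might.

```python
class Account:
    """Object "B": encapsulates its variables behind a small interface."""

    def __init__(self, account_number, balance=0.0):
        self._account_number = account_number  # data (variables)
        self._balance = balance

    # The methods below form the interface visible to other objects.
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


class Customer:
    """Object "A": holds a reference to an Account but not to its data."""

    def __init__(self, name, account):
        self.name = name
        self._account = account

    def pay_in(self, amount):
        # "A" reaches B's data only by invoking B's methods.
        self._account.deposit(amount)
        return self._account.balance()


acct = Account("A-101", 500.0)
cust = Customer("Asha", acct)
new_balance = cust.pay_in(250.0)
print(new_balance)
```

Here Account plays the role of a Class in the sense used above: every Account object has the same type of variables and the same methods.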

(b) Record Based Logical Models. These models describe data at the Logical
level, as a collection of fixed-format Records of different types. Each Record Type can
have a fixed number of Fields (or Attributes) and each Field is usually of fixed length.
Use of fixed-length Records simplifies the Physical Level implementation of a database.
The most widely used Record Based Logical Models are:-

(i) Hierarchical Model. This is one of the oldest models, dating back to
the 1960s. The first commercial DBMS based on this model was “Information
Management System” (IMS), which IBM began developing in 1966. At one time, it was the
most used DBMS. In the Hierarchical Model, the Data is represented as Records;
and the Records are organized as a collection of Trees. The relationships among
the data are represented by Links, which can be viewed as pointers. The tree
structure requires that each record can have only one parent record. Thus, it
permits modeling of only one-to-many relationships (not many-to-many
relationships) amongst the Records.

The following diagram, showing an Academic Database in Hierarchical Model,
represents Records of three types “Course”, “Teacher” and “Student”; and links
indicating relationship “Offered By” from “Course” to “Teacher”- indicating the
faculty offering a course and the relationship “Attended By” from “Course” to
“Student”- indicating the students attending a course.

[Figure: HIERARCHICAL MODEL. The parent record “Course” is linked to the child record
“Teacher” by “Offered By” and to the child record “Student” by “Attended By”.]

It does not indicate the relationships “What are the courses being offered
by a faculty”, “What are the courses being attended by a student”, “Who are the
students being taught by a faculty” and “Who are the faculty teaching a student”.
This is due to the limitation of the tree structure that a node can have only one parent
node; thus we can represent only one-to-many relationships but not many-to-many
relationships.

(ii) Network Model. Like the Hierarchical Model, this Model also
models a database as a collection of Records; but the Records are organized as a
collection of arbitrary graphs (or Networks). Thus a Record can have any number
of parent records, which supports many-to-many relationships amongst records.
The relationships among the records are represented as links (pointers). Since this
Model supports many-to-many relationships amongst the records, it is considered
more versatile than the Hierarchical Model.

The above database can be better modeled in the Network Model, as indicated below. It
contains additional information i.e. relationship “Offers” from “Teacher” to
“Course” and relationship “Attends” from “Student” to “Course”. Since the
Hierarchical Model can strictly model only Tree Structures, it was not possible to
depict “Offers” and “Attends” in the Hierarchical Model. Also, it depicts
relationships “Teaches” and “Taught By” between “Teacher” and “Student”.

[Figure: NETWORK MODEL. “Course” and “Teacher” are linked by “Offered By” and “Offers”;
“Course” and “Student” are linked by “Attended By” and “Attends”; “Teacher” and “Student”
are linked by “Teaches” and “Taught By”.]

(iii) Relational Model. This is the most modern and most commonly used
model amongst the Record Based Models. It has been widely accepted. The
Relational Model models a database as a collection of Tables to represent both
data and the relationships amongst the data. Each Table is called a Relation,
which is assigned a unique name. Each relation has a number of Columns,
representing the Fields (or Attributes) of the relation. Each Field is also uniquely
named. A Relation (or Table) can have an unlimited number of Rows and each
Row represents an Instance of the Relation. A Row is also termed a Tuple.
Each Tuple will be unique in a Relation. So, a Relation can be viewed as a set of
Tuples of the same type. The relationships amongst the tables will be modeled as
Foreign Key - Primary Key Relationships.

The “Course-Student-Teacher” Database Schema in Relational Model
will be represented by six Tables: three tables to represent entities i.e. STUDENT
(giving details of all students), TEACHER (giving details of all teachers),
COURSE (giving details of all courses); and three tables to represent relationships
i.e. COURSE-TEACHER (indicating relationships OFFERED BY and
OFFERS), COURSE-STUDENT (indicating relationships ATTENDS and
ATTENDED BY) and TEACHER-STUDENT (indicating relationships
TAUGHT BY and TEACHES).

STUDENT
Roll_No S_Name Branch Semester Section S_Address

COURSE
Sub_Code Sub_Title Semester Branch Contact_Hrs

TEACHER
Fac_Code Fac_Name Desig Dept Fac_Address

COURSE-TEACHER
Sub_Code Fac_Code

COURSE-STUDENT
Sub_Code Roll_No

TEACHER-STUDENT
Fac_Code Roll_No

The Relational Model has become extremely popular because:-

(a) It is extremely simple and easy to implement.
(b) It has a strong mathematical foundation.
(c) It has been highly standardized.
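
The six-table “Course-Student-Teacher” schema described above can be sketched in SQLite via Python's sqlite3 module. This is a cut-down illustration under stated assumptions: only a few columns are kept, the relationship tables are named with underscores because hyphens are not valid in unquoted SQL identifiers, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE STUDENT (Roll_No INTEGER PRIMARY KEY, S_Name TEXT NOT NULL);
CREATE TABLE COURSE  (Sub_Code TEXT PRIMARY KEY, Sub_Title TEXT NOT NULL);
CREATE TABLE TEACHER (Fac_Code TEXT PRIMARY KEY, Fac_Name TEXT NOT NULL);
CREATE TABLE COURSE_TEACHER (
    Sub_Code TEXT REFERENCES COURSE(Sub_Code),
    Fac_Code TEXT REFERENCES TEACHER(Fac_Code),
    PRIMARY KEY (Sub_Code, Fac_Code)
);
CREATE TABLE COURSE_STUDENT (
    Sub_Code TEXT REFERENCES COURSE(Sub_Code),
    Roll_No  INTEGER REFERENCES STUDENT(Roll_No),
    PRIMARY KEY (Sub_Code, Roll_No)
);
CREATE TABLE TEACHER_STUDENT (
    Fac_Code TEXT REFERENCES TEACHER(Fac_Code),
    Roll_No  INTEGER REFERENCES STUDENT(Roll_No),
    PRIMARY KEY (Fac_Code, Roll_No)
);
INSERT INTO STUDENT VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO COURSE VALUES ('CS101', 'DBMS'), ('CS102', 'Networks');
INSERT INTO TEACHER VALUES ('F1', 'Dr. Rao');
INSERT INTO COURSE_TEACHER VALUES ('CS101', 'F1'), ('CS102', 'F1');
INSERT INTO COURSE_STUDENT VALUES ('CS101', 1), ('CS101', 2), ('CS102', 2);
INSERT INTO TEACHER_STUDENT VALUES ('F1', 1), ('F1', 2);
""")

# ATTENDS: the courses attended by student 2, found by following
# the foreign key - primary key relationship.
attends = [
    row[0]
    for row in conn.execute(
        "SELECT c.Sub_Title FROM COURSE c "
        "JOIN COURSE_STUDENT cs ON cs.Sub_Code = c.Sub_Code "
        "WHERE cs.Roll_No = ? ORDER BY c.Sub_Title",
        (2,),
    )
]

# A row that refers to a non-existent course is rejected.
try:
    conn.execute("INSERT INTO COURSE_STUDENT VALUES ('CS999', 1)")
    dangling_accepted = True
except sqlite3.IntegrityError:
    dangling_accepted = False

print(attends, dangling_accepted)
```

The same COURSE_STUDENT table answers both ATTENDS (courses of a student) and ATTENDED BY (students of a course), which is why one relationship table suffices for each pair of relationships.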

SCHEMAS AND INSTANCES

Schema. Database Schema refers to the overall structure of a database. Once
defined, the schemas are rarely changed. A Database System will have several Schemas,
partitioned according to the levels of its abstraction.

Instance. It refers to the actual collection of data (a Snapshot of data) existing in the
database at a particular moment of time. Since a database will continuously experience
insertion of new data, deletion of defunct data and update of changed data, the Instance
will be under continuous change.

DATA ABSTRACTION & VARIOUS SCHEMAS OF A DATABASE

There are three levels of data abstraction in a database; and each level is described by a
schema as explained below:-

(a) Physical Level. This is the lowest level of abstraction. At this level, a
Physical Schema describes “how data is physically stored”. The Physical Schema may
describe complex structures, used to store the data, with the sole aim of achieving an
efficient access of the data.

(b) Logical Level. This is the intermediate level of abstraction. At this level, a
Logical Schema (or Conceptual Schema) would describe “what data is stored in the
database” and “what are the relationships amongst the data”. This Schema is used by
Database Administrators, who decide what information is to be kept in the Database. It
would describe the logical structure of database, data types and integrity constraints. As
compared to Physical Level, Database at Logical Level is described by relatively smaller
number of simpler structures. But, the implementation of these simple structures may be
quite complex at the Physical Level. The user operating at Logical Level need not be
aware of the complexities at the Physical Level.

(c) View Level. This is the highest level of abstraction. At this level, there will be
many Views, tailored to the specific needs of different categories of users. A View for a
certain group of users describes “what subset of the database is to be made visible” to that
group, i.e. only the subset of the underlying database which that group needs to access.
At the view level, the main goal is to provide efficient and user-friendly human
interaction with the system. So, the interface at this level is
made as simple and user-friendly as possible. A user doesn’t have to be aware of the
complexities at the conceptual level and physical level.

DATA INDEPENDENCE

The ability of a DBMS to modify its Schema definition at one level, without affecting a
Schema definition at the next higher level, is called Data Independence. There are two
levels of Data Independence:-

(a) Physical Data Independence. It is the ability of DBMS to modify the
Physical Schema without causing any changes in the schema at the logical level and at
the view level. Modifications at Physical Level are driven by advancements in hardware
technology and by the requirements to upgrade hardware for improving system
performance.

(b) Logical Data Independence. This refers to the ability of DBMS to modify the
Logical Schema without causing any changes in the application programs at the view
level. Modifications at Logical Level are necessitated by the need to alter the Logical
Structure of the database. Logical Data Independence is much more difficult to
achieve than Physical Data Independence, since the application programs are heavily
dependent on the logical structure of the database.
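
Both kinds of data independence can be sketched with SQLite through Python's sqlite3 module. The assumptions here are loud ones: an index stands in for a physical-level change, a view named student_contact stands in for the view level, and the tables and the APP_QUERY string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, s_name TEXT, s_address TEXT);
INSERT INTO student VALUES (1, 'Asha', 'Pune');
CREATE VIEW student_contact AS
    SELECT roll_no, s_name, s_address FROM student;
""")

# The "application program": it only ever talks to the view.
APP_QUERY = "SELECT s_name, s_address FROM student_contact WHERE roll_no = 1"
before = conn.execute(APP_QUERY).fetchone()

# Physical-level change: add an index. Neither the table definition
# nor the view changes.
conn.execute("CREATE INDEX idx_student_name ON student (s_name)")
after_physical = conn.execute(APP_QUERY).fetchone()

# Logical-level change: split the table in two, then redefine the view
# so that it presents the same columns as before.
conn.executescript("""
DROP VIEW student_contact;
CREATE TABLE student_core (roll_no INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE student_address (roll_no INTEGER PRIMARY KEY, s_address TEXT);
INSERT INTO student_core SELECT roll_no, s_name FROM student;
INSERT INTO student_address SELECT roll_no, s_address FROM student;
DROP TABLE student;
CREATE VIEW student_contact AS
    SELECT c.roll_no, c.s_name, a.s_address
    FROM student_core c JOIN student_address a ON a.roll_no = c.roll_no;
""")
after_logical = conn.execute(APP_QUERY).fetchone()

print(before, after_physical, after_logical)
```

The physical change needed no other adjustment at all, whereas the logical change required the view definition to be rewritten by hand. This reflects why Logical Data Independence is the harder of the two to achieve.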

DATABASE LANGUAGES

A DBMS will support two kinds of languages; one called Data Definition Language
(DDL) to specify the Database Schema and the other called Data Manipulation Language
(DML) to enable accessing and manipulation of the data stored in the database.

(a) DDL. A database schema is specified by a set of definitions expressed in DDL.
In a Relational Database, the result of interpretation of DDL statements will be a set of
Tables that are stored in a special file called Data Dictionary or Data Directory or DBMS
Catalog. The data stored in the Data Dictionary is called Metadata i.e. data about data.
Whenever the database is to be accessed, the DBMS will first make a reference to the
Data Dictionary to determine the structure of the data to be accessed; only then will it
access the actual data in the database. Thus the data dictionary is accessed during
processing of each query.

The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called Data Storage and
Definition Language. The result of interpretation of these definitions will be a set of
physical schema structures and a set of access methods supported by the system. These
details are usually hidden from the database-users.

(b) DML. A DML is a language that enables users to access and manipulate the data
stored in the database. A DML query is a statement specifying information to be accessed
for retrieval or insert/update/delete. The portion of a DML that involves information
retrieval is called a query language. The goal of a DML is to provide an efficient and
friendly human interface for the following operations in a database:-

(i) Retrieval of information stored in the database.
(ii) Insertion of new information into the database.
(iii) Deletion of information from the database.
(iv) Update of information stored in the database.

There are two types of DMLs:-

(i) Procedural DMLs. A query in a Procedural DML requires the user to
specify not only “what data is required to be extracted from the database” but also
“how to extract those data”.

(ii) Non-Procedural DMLs. A query in a Non-Procedural DML requires
the user to specify only “what data is needed”, without specifying how to get
those data.

Non-procedural DMLs are easier to learn and to use than the procedural DMLs.
However, since non-Procedural DMLs do not specify “how to get the data”, the
queries in Non-Procedural DMLs may not generate as efficient code as the
equivalent queries in Procedural DMLs. This limitation of Non-Procedural DMLs
is overcome by performing query optimization at the System Level.
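
The difference between the two styles can be sketched in Python with the sqlite3 module. SQL plays the part of the Non-Procedural DML, while a hand-written loop over the records imitates the procedural style; the result table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (roll_no INTEGER, sub_code TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [(1, "CS101", 72), (2, "CS101", 35), (3, "CS101", 88), (1, "CS102", 40)],
)

# Non-procedural: state WHAT is wanted; the system decides how to fetch it.
declarative = conn.execute(
    "SELECT roll_no FROM result "
    "WHERE sub_code = 'CS101' AND marks >= 50 ORDER BY roll_no"
).fetchall()

# Procedural style: spell out HOW, i.e. scan every record, test it,
# collect the matches and sort them.
procedural = []
for roll_no, sub_code, marks in conn.execute(
    "SELECT roll_no, sub_code, marks FROM result"
):
    if sub_code == "CS101" and marks >= 50:
        procedural.append((roll_no,))
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same rows. In the first, the choice of access path (for instance, whether to use an index) is left to the query optimizer; in the second, the full scan is fixed by the program.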

OVERALL STRUCTURE OF DBMS

[Figure: Overall structure of a DBMS, from top to bottom.
Users: Unskilled Users, Application Programmers, DML Users and the DBA.
Tools: Application Program Interfaces, Application Development Tools, DML Tools and DBA Tools.
Inputs to the DBMS: 4GL Programs, DML Queries and DDL Statements.
Query Processor: Pre-Compiler for 4GL Programs, DML Compiler, DDL Interpreter, Application
Programs Object Code and the Query Evaluation Engine.
Storage Manager: Buffer Manager, Authorization & Integrity Manager, Transaction Manager and
File Manager.
Disk Storage: Application Database Data Files, Index information, Query Evaluation Statistics
and the Data Dictionary (Schema + Access Rights).]

Functions of a Database Administrator (DBA)

The DBA is the custodian of the Database System and is responsible
for the following functions:-

1. Create the Conceptual Schema and periodically update it to adapt to changed
requirements.

2. Implement efficient Storage Structures and Access Methods.

3. Liaise with the Users to ensure that the information required by the Users is made
available.

4. Ensure system security, through Grant and Revoke of Access Rights to the Users.
A user must have only as many rights as required by the user's role in the organization:
nothing more, nothing less.

5. Ensure Physical Security of the Database against malicious access and accidents
like fire etc.

6. Take periodic backups and keep the archived data safely.

7. Execute immediate recovery procedures in case of failures.

8. Monitor the system performance. In case of degradation in system performance,
perform tuning procedures. If necessary, upgrade the system (hardware / software) to
meet the changed requirements of the organization.

9. Ensure sufficient Disk Space is always available. If needed, upgrade the Disk
Drives to meet the increased requirements.

10. Liaise with the DBMS vendor to obtain necessary technical support and to
obtain the necessary tools & software upgrades, whenever made available by the vendor.

Characteristics of a Database System, which distinguish it from a conventional File-Processing System

In a traditional file system, each user defines & implements the files needed for a specific
application, as a part of programming the application itself. Multiple users of the same set
of data will create replicated sets of files, specific to their respective applications. This
redundancy in defining & storage of data results in higher storage costs and database
inconsistencies during updates. On the other hand, in a database approach, a single
repository of data is maintained, which is defined once and then accessed by various
users of the data.

The main characteristics of a database approach, which distinguish it from a file-processing
approach are:-

(i) Self-Describing nature of a Database System. A database contains not only
the data, but also a complete definition of the data structure, data types & data
constraints. This additional information is called meta-data, which is stored in
a file called Data-Dictionary (also called DBMS Catalog). The information
stored in the Data Dictionary is accessible to the DBMS software. This
additional information makes the DBMS software independent of its
applications. When a new need arises to change the structure of data, no
changes need to be made to the DBMS software; only the meta-data in the
Data Dictionary needs to be changed, to reflect the changes. This feature
enables the DBMS software to be adapted for any application. The same
DBMS will work for a college, a bank or a factory. Whereas in a traditional
file processing system, the application programs would need major changes
while shifting from one application to another.

(ii) Data Abstraction. In a traditional file processing system, the structure of the
data files is hard coded in the application programs; thus any changes in
structure would need the related application programs to be modified
accordingly. Whereas in a Database System, the application programs are
insulated from the data stored in the database. The application programs are
only concerned with ‘what data’ is stored in the database and not concerned
with ‘how the data is stored’. As long as the contents of data remain
unchanged, the database structure can be changed, without affecting the
existing application programs. This feature is called Data Abstraction.

(iii) Support for Multiple Views of the Data. Depending on different
needs and different levels of authorizations, different users would be provided
different perspectives of the same data, called Views. A View refers to a
subset of the stored data or a set of Virtual Data i.e. data derived from the
stored data. A View is not explicitly stored in the Database; only its Definition
is stored in the DBMS Catalog. Whenever a user or a program submits a
query to access a View, the View is instantly computed and presented to the
User or the Program. Next time, when the same view is again accessed, it is
re-computed fresh.

(iv) Multi-User Access & Concurrency Control. A Multi-User DBMS allows
multiple users to access the same database concurrently. This is achieved by
including Concurrency Control Software in the DBMS, to ensure that
database remains consistent, despite access by multiple users concurrently.

(v) Effective System Protection through grant of Access Rights. Access Rights
are granted to the users, to the extent required for their roles in the
organization. These rights are stored in the data dictionary itself. When a
query is to be processed, the DBMS will first ensure that the user submitting
the query has sufficient rights for the processing of that query; only then is the
query processed.

(vi) Support for efficient Recovery. When a system is restarted after a failure,
log-based recovery recovers the database efficiently.
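
Two of the characteristics above, the self-describing catalog and views that are recomputed on every access, can be observed in SQLite through Python's sqlite3 module. The sqlite_master table and the table_info pragma are SQLite's own catalog facilities (other DBMSs expose their catalogs differently); the student table and the final_year view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, s_name TEXT, semester INTEGER);
CREATE VIEW final_year AS
    SELECT roll_no, s_name FROM student WHERE semester >= 7;
INSERT INTO student VALUES (1, 'Asha', 7), (2, 'Ravi', 3);
""")

# Self-describing nature: the catalog lists every table and view.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# Only the definition of the view is stored, not its rows.
view_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'final_year'"
).fetchone()[0]

# The view is computed afresh each time it is queried.
first = conn.execute("SELECT s_name FROM final_year ORDER BY roll_no").fetchall()
conn.execute("UPDATE student SET semester = 8 WHERE roll_no = 2")
second = conn.execute("SELECT s_name FROM final_year ORDER BY roll_no").fetchall()

print(catalog, columns, first, second)
```

The second query on the view picks up the updated row without the view being touched, because only its definition lives in the catalog.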

Advantages of using a DBMS vis-à-vis File Processing System

(a) Controlling Redundancy. While designing a database, the Views
of different users are integrated into a single database, thus controlling
redundancy. This results in reduced effort and reduced storage space. Also, it
ensures database consistency in case of updates.

(b) Restricting unauthorized access. The user access rights are stored in
the data dictionary. Whenever a query is received from any user, it is
checked for valid access rights. If access rights exist, the query is processed;
else it is rejected as an ‘Invalid Query’. This prevents unauthorized access of
data.

(c) Providing Multiple User-Interfaces. A DBMS provides various types of
user interfaces for various categories of users:-

- Query Languages (like SQL) for skilled users

- Programming Languages (like PL/SQL) for application programmers

- Menus, Forms for Naive Users

- DDL for Database Administrator

(d) Enforcing of Data Integrity Constraints. The Data Integrity
Constraints are stored in the data dictionary itself. Whenever some data is
inserted/updated/deleted, the data constraints are automatically applied to the
related data items and invalid operations are rejected.

(e) Supporting Concurrent Access. A DBMS supports concurrent access by
multiple users. Despite concurrent access by multiple users, database
consistency is maintained.

(f) Providing backup & recovery. A DBMS supports data backup & recovery
in case of failures.

(g) Reduced Application Development Time. The development time of a new
application using a DBMS is of the order of 15 – 25% of the time
needed to develop an equivalent application in a traditional file
processing system.

(h) Easy Adaptability. A database system can be easily adapted to changed
requirements, with minimal time and cost implications.

(i) Potential for enforcing Standards. It permits the Database
Administrator (DBA) to define & enforce standards among the database users.
The standards can be defined for naming conventions, formats of data items,
display formats or report structures etc.
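
The automatic enforcement of integrity constraints, listed at (d) above, can be sketched with SQLite through Python's sqlite3 module. The result table, its constraints and the try_insert() helper are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraints are declared once, as part of the schema.
conn.execute("""
CREATE TABLE result (
    roll_no  INTEGER NOT NULL,
    sub_code TEXT NOT NULL,
    marks    INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100),
    PRIMARY KEY (roll_no, sub_code)
)""")


def try_insert(row):
    """Attempt one insertion and report whether the DBMS accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO result VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


outcomes = [
    try_insert((1, "CS101", 72)),    # valid row
    try_insert((1, "CS101", 80)),    # duplicate primary key
    try_insert((2, "CS101", 140)),   # marks outside 0..100
    try_insert((3, "CS101", None)),  # marks missing
]
stored = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(outcomes, stored)
```

No application code checks the marks or the key here; the invalid operations are rejected by the database itself, which is what a file-processing system would leave to every individual program.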


Exercises

Ex.1.1 Explain the three levels of data abstraction. Distinguish between Physical Data
Independence and Logical Data Independence. Which is more difficult to achieve and
why?

Ex.1.2 Explain the characteristics of DBMS that distinguish it from a File
Processing System. Explain how application development is much shorter in a DBMS
environment than in a File Processing Environment.

Ex.1.3 Compare the three data models: Hierarchical, Network and Relational.
What are the distinguishing features of Relational Model that make it so popular?

Ex.1.4 Distinguish between:-

(a) DDL & DML
(b) Schema & Instance
(c) Procedural DML & Non-Procedural DML

Ex.1.5 Explain the role of the following components of DBMS:-

(a) DML Compiler
(b) Query Processing Engine
(c) Buffer Manager
(d) Transaction Manager

Ex.1.5 What is the role of a Data Dictionary in DBMS? How does this feature
make the DBMS independent of the underlying database?

Ex.1.6 Explain the major functions of a Database Administrator (DBA).

Ex.1.7 Explain what is implied by the statement “In DBMS, views of different
users can be integrated into a single database”.

Ex.1.8 What is meant by “Self-describing nature of a database”?

Ex.1.9 Compare Procedural DMLs and Non-Procedural DMLs from the
viewpoints of (i) User Friendliness (ii) Query Optimization.


CHAPTER 2

ENTITY-RELATIONSHIP MODELING
The Entity Relationship Model (ER Model) models the real world situations as a
collection of entities and relationships amongst the entities.

Entity An Entity is an object (like a “CAR”) or a concept (like an “ACCOUNT”)
from the real world, which is distinguishable from other objects and other concepts. Each
Entity is defined by a set of properties (called Attributes). For example, the entity
“ACCOUNT” may be defined by Attributes like “ACCOUNT-NUMBER”,
“BRANCH-NAME” and “BALANCE”.

Entity Set An Entity-Set refers to a collection of entities of the same kind. Each
entity in an Entity-Set will have the same set of attributes and the set of attributes will
distinguish it from other Entity Sets. No other entity set will have exactly the same set of
attributes. Some of the attributes of an entity set may overlap with other entity sets.

Relationship A Relationship refers to an association amongst Entity Sets. For example,
there may be a relationship “DEPOSITOR” between Entity Set “CUSTOMER” and Entity Set
“ACCOUNT”.

Relationship Set A Relationship Set refers to the collection of Relationships of the
same kind (i.e. having exactly the same set of Attributes). A Relationship Set will inherit
some of the Attributes (properties) of the associated Entity Sets. For example, the
Relationship Set “DEPOSITOR” between Entity Sets “CUSTOMER” and “ACCOUNT” will
inherit Attribute “CUSTOMER-ID” from “CUSTOMER” and Attribute
“ACCOUNT-NUMBER” from “ACCOUNT”. In addition, a Relationship Set may have some
attributes of its own, called “Descriptive Attributes”; for example the relationship set
“DEPOSITOR” may have a descriptive attribute “DATE-OF-OPERATION”, indicating
the date on which a customer last operated an account.

Domain of an Attribute

Each attribute has a set of permitted values called its domain or value set; for example, the
attribute ‘NAME’ may have a domain that is the set of character strings of a specified
maximum length.

A database will consist of a set of Entity-Sets and Relationship-Sets, each of
which will contain a number of entities of the same type or Relationships of the same
type. An entity in a database may be described by a set of (attribute, data value) pairs;
like a student in Entity-Set “STUDENT” may be described by {(ROLL-NUMBER,
0990013010), (NAME, ‘Karan Singh’), (DATE-OF-BIRTH, ‘10-DEC-1985’)}.


Attribute Types:-

(i) Simple Vs Composite Attributes. A Simple attribute is one which
is not divisible into sub-parts, like ‘BRANCH’. On the other hand, a Composite
attribute is one which can be divided into sub-parts, like ‘DATE-OF-BIRTH’,
which may be divided into ‘birth-date’, ‘birth-month’ & ‘birth-year’.

(ii) Single-Valued Vs Multi-Valued Attributes. An attribute which
can assume only one value at a time is called a Single-Valued attribute; like ‘name’ of
an EMPLOYEE entity. On the other hand, an attribute which may assume a set
of values at a time is called a Multi-Valued attribute; like attribute ‘dependent’ of the
Entity Set “EMPLOYEE”, which may have none, one or multiple values,
depending upon the number of dependents of an employee.

(iii) Null Attribute. A null value is assigned to an attribute under any of
the following three conditions:-

(a) If the attribute value is not applicable to an entity; like SPOUSE-NAME
will not be applicable if an employee is unmarried.

(b) If the value is applicable but not specified; like TEL#, where an employee
may not own a telephone.

(c) If the value is applicable and exists but is not known to the agency entering
the information; like an employee may own a telephone but the
number may not be known to the organization.

Null value can only be assigned to an Attribute, if assigning value to that attribute
is optional (not mandatory). The Mandatory attributes cannot be assigned a
“Null” value.

(iv) Derived Attribute Vs Stored Attribute. A derived attribute is one
whose value is not stored in the database, but is derived from the values of other
stored attributes; like the value of attribute ‘age’ can be derived from attribute
‘date-of-birth’ and the current date obtained from the system.
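The idea of a derived attribute can be sketched in Python as follows. The function name age_on and the dictionary layout are assumptions made for this illustration; the student values are the ones used earlier in these notes. Only the date of birth is stored, and the age is computed whenever it is needed.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# An entity described by (attribute, value) pairs; 'age' is not stored.
student = {
    "ROLL-NUMBER": "0990013010",
    "NAME": "Karan Singh",
    "DATE-OF-BIRTH": date(1985, 12, 10),
}
print(age_on(student["DATE-OF-BIRTH"], date.today()))
```

Because the value is recomputed from the stored attribute and the system date, it can never become stale, which is why no storage is assigned to it.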

Degree of Relationship Sets. Degree of a Relationship Set refers to the number of
Entity Sets participating in the Relationship. Most relationships are binary.

E-R Diagram Notations

Rectangle represents an entity set.


Ellipse represents an attribute.

Diamond represents a relationship set.

Line links an attribute to an entity set or an entity set to a
relationship set.

Double Line indicates total participation of an entity set in a
relationship set.

Dashed Ellipse indicates derived attribute.

Double Ellipse indicates multi-valued attribute.

Double Rectangle indicates weak entity set.

Double Diamond indicates a relationship set with participation
of some weak entity sets.

RELATIONSHIP constraints

- Mapping Cardinalities
- Participation Constraint

Mapping Cardinalities. For a binary relationship set R between entity sets A and B,
the mapping cardinality can be one of the following:-

(a) One-to-one. An entity in A is associated with at most one entity in B and an
entity in B is associated with at most one entity in A. In an E-R Diagram, one-to-one
cardinality is represented by directed lines drawn from R to both A and B.

[E-R diagram: A <--- R ---> B]


(b) One-to-many. One-to-many cardinality from A to B implies that an entity in A is
associated with any number (nil, one or many) of entities in B; however, an entity in B is
associated with at most one entity in A.

[E-R diagram: entity sets A and B linked through relationship set R]

(c) Many-to-one. Many-to-one cardinality from A to B implies that an entity in A is
associated with at most one entity in B; however, one entity in B can be associated with
any number of entities in A.

[E-R diagram: entity sets A and B linked through relationship set R]

(d) Many-to-many. Many-to-many cardinality from A to B implies that an
entity in A can be associated with any number of entities in B and one entity in B can be
associated with any number of entities in A.

[E-R diagram: entity sets A and B linked through relationship set R]

Example:-

One-to-One relationship from CUSTOMER to ACCOUNT implies that each customer
can have only one account and each account is Single (held by only one customer).

[E-R diagram: CUSTOMER and ACCOUNT related by DEPOSITOR (One-to-One Relationship)]

One-to-Many relationship from CUSTOMER to ACCOUNT implies that each customer
can have any number (nil, one or more than one) of accounts, but each account is
Single (held by only one customer).

[E-R diagram: CUSTOMER and ACCOUNT related by DEPOSITOR (One-to-Many Relationship)]

Many-to-One relationship from CUSTOMER to ACCOUNT implies that each customer
can have only one account, but each account can be Joint (held by one or more customers).


[E-R diagram: CUSTOMER and ACCOUNT related by DEPOSITOR (Many-to-One Relationship)]

Many-to-Many relationship from CUSTOMER to ACCOUNT implies that each
customer can have any number (nil, one or more than one) of accounts and each account
can be Joint (held by one or more customers).

[E-R diagram: CUSTOMER and ACCOUNT related by DEPOSITOR (Many-to-Many Relationship)]
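The four mapping cardinalities can be told apart mechanically once a relationship set is written as a set of (customer, account) pairs. The sketch below is an illustration only: the function name mapping_cardinality and the sample pairs are assumptions, and the function reports the most restrictive cardinality that a given instance happens to satisfy. The actual cardinality is a design-time constraint on the schema and cannot be proved from one instance.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set, given as (a, b) pairs, from A to B."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)  # number of B entities per a
    b_counts = Counter(b for _, b in pairs)  # number of A entities per b
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if not a_many and not b_many:
        return "one-to-one"
    if a_many and not b_many:
        return "one-to-many"
    if not a_many and b_many:
        return "many-to-one"
    return "many-to-many"


# Hypothetical DEPOSITOR instances (customer id, account number).
single_accounts = [("C-001", "A-310"), ("C-310", "A-203"), ("C-310", "A-670")]
joint_accounts = [("C-001", "A-101"), ("C-310", "A-101")]

print(mapping_cardinality(single_accounts))
print(mapping_cardinality(joint_accounts))
```

In the first instance a customer holds two accounts but no account is shared; in the second, one account is held jointly by two customers.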

Participation Constraints in Relationship Sets

- Total Participation
- Partial Participation

Total Participation

An Entity Set E is said to have total participation in relationship set R if each entity in E
participates in at least one relationship through R. In an E-R Diagram, Total
Participation is represented by a “Double Line” drawn between the Entity Set symbol and
the Relationship Set symbol.

Partial Participation

An Entity Set E is said to have partial participation in relationship set R if some of the
entities in E are not participating in any relationship through R. In E-R Diagram, the
Partial Participation is represented by a “Single Line” drawn between the Entity Set
symbol and the Relationship Set symbol.

Example:- Suppose Entity Sets “CUSTOMER” and “ACCOUNT” are related by
Relationship Set “DEPOSITOR” and Entity Sets “CUSTOMER” and “LOAN” are
related by Relationship Set “BORROWER”. Suppose a customer may
have only an account, only a loan, or both; then the situation can be modeled as follows:-


[E-R diagram: CUSTOMER has Partial Participation in DEPOSITOR and in BORROWER;
ACCOUNT has Total Participation in DEPOSITOR; LOAN has Total Participation in
BORROWER.]
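Total and partial participation can also be checked on a concrete instance. The following is a minimal sketch under stated assumptions: the function name participation, its position argument and the sample data are all hypothetical, chosen to mirror the CUSTOMER / ACCOUNT example above.

```python
def participation(entity_set, relationship_pairs, position):
    """Classify participation of an entity set in a relationship set.

    position is 0 if the entity set supplies the first element of each
    pair, 1 if it supplies the second.
    """
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


# Hypothetical instance: C-310 has only a loan, so it is absent from DEPOSITOR.
customers = {"C-001", "C-220", "C-310"}
accounts = {"A-101", "A-203"}
depositor = {("C-001", "A-101"), ("C-220", "A-203")}

print(participation(customers, depositor, 0))  # CUSTOMER in DEPOSITOR
print(participation(accounts, depositor, 1))   # ACCOUNT in DEPOSITOR
```

As with cardinalities, the participation constraint itself is declared in the schema; a check like this only confirms whether one instance respects it.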

Concept of Key
Super Key. A Super Key of an Entity Set or Relationship Set refers to a set of
attributes which, when taken collectively, uniquely determine an entity within the
Entity Set or a Relationship within the Relationship Set. If K forms a Super Key (SK) of
an Entity Set E then any super set of K will also be a Super Key of E. So, a Super Key
may have some extraneous (unnecessary) attributes; even if these are removed, the
remaining set may still form a Super Key of E.

Example:- Suppose each student in the Entity Set STUDENT (ROLL-NO, NAME,
BRANCH, FATHERS-NAME, ADDRESS, DOB, TEL-NO) has a unique value of
ROLL-NO. This implies that no two students can have the same ROLL-NO. Then
{ROLL-NO, NAME} forms a Super Key of Entity Set STUDENT. In this, the attribute NAME is
extraneous; if it is removed, the remaining set i.e. {ROLL-NO} still forms a Super Key of
STUDENT.

Candidate Key. A Super Key, no proper subset of which forms a Super Key, is called
a Candidate Key. Thus, a Candidate Key is a minimal Super Key (i.e. a Super Key having
no extraneous attributes). An Entity Set may have more than one Candidate Key.

Example:- The Entity Set STUDENT may have at least two Candidate Keys i.e.
{ROLL-NO} and {NAME, FATHERS-NAME, DOB, ADDRESS}, assuming that no two
students share the same values for all four of the latter attributes.

Primary Key. Primary Key is one of the Candidate Keys that is designated by the
database designers as the primary means of identifying entities within an entity set. In the
E-R Diagram, the Primary Key Attributes are underlined with a firm line.
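The relationship between Super Keys and Candidate Keys can be sketched in Python. The function names is_super_key and candidate_keys and the three sample rows are assumptions made for this illustration. Note that keys are constraints on every legal instance of an entity set, so a check against sample rows can only show which attribute sets are consistent with that data.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Return the minimal super keys of the sample rows, smallest first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of a key already found is not minimal; skip it.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical STUDENT rows, restricted to three attributes for brevity.
students = [
    {"ROLL-NO": 1, "NAME": "Karan", "BRANCH": "CSE"},
    {"ROLL-NO": 2, "NAME": "Karan", "BRANCH": "ECE"},
    {"ROLL-NO": 3, "NAME": "Asha", "BRANCH": "CSE"},
]
print(is_super_key(students, ["ROLL-NO", "NAME"]))
print(candidate_keys(students, ["ROLL-NO", "NAME", "BRANCH"]))
```

{ROLL-NO, NAME} is a Super Key but not a Candidate Key, because its proper subset {ROLL-NO} is already a Super Key.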


Primary Key of a Relationship Set

Let R be a binary relationship set between Entity Sets E1 and E2. Let K1 and K2 be the
respective Primary Keys of E1 and E2. Then the Primary Key of Relationship Set R will
depend upon the mapping cardinality of the relationship set, as explained below:-

(i) One to One Relationship
PK (R) = PK (E1) or PK (E2)

(ii) One to Many Relationship from E1 to E2
Here E2 is called “Many-Side” Entity Set and E1 is called “One-Side” Entity-Set.
PK (R) = PK (E2) i.e. Primary Key of “Many-Side” Entity-Set.

(iii) Many to One Relationship from E1 to E2
Here E1 is called “Many-Side” Entity Set and E2 is called “One-Side” Entity-Set.
PK (R) = PK (E1) i.e. Primary Key of “Many-Side” Entity-Set.

(iv) Many to Many Relationship from E1 to E2
PK (R) = PK (E1) U PK (E2)


Example

In each case below, the E-R diagram shows CUSTOMER (Primary Key CN) and ACCOUNT
(Primary Key AN) related by DEPOSITOR; only the mapping cardinality differs.

(i) One-to-One Relationship
PK (DEPOSITOR) = CN or AN

(ii) One-to-Many Relationship from CUSTOMER to ACCOUNT
PK (DEPOSITOR) = AN i.e. PK of ACCOUNT

(iii) Many-to-One Relationship from CUSTOMER to ACCOUNT
PK (DEPOSITOR) = CN i.e. PK of CUSTOMER

(iv) Many-to-Many Relationship
PK (DEPOSITOR) = {CN, AN} i.e. PK (CUSTOMER) U PK (ACCOUNT)
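The four rules above can be written as a small lookup. This is only a sketch: the function name relationship_primary_key and the cardinality strings are assumptions for illustration, and E1 is always taken as the "from" side of the relationship.

```python
def relationship_primary_key(cardinality, pk_e1, pk_e2):
    """Primary key choices for a binary relationship set R from E1 to E2.

    Returns a list of alternatives; each alternative is a tuple of attributes.
    """
    pk_e1, pk_e2 = tuple(pk_e1), tuple(pk_e2)
    if cardinality == "one-to-one":
        return [pk_e1, pk_e2]         # either key may be chosen
    if cardinality == "one-to-many":  # E2 is the many side
        return [pk_e2]
    if cardinality == "many-to-one":  # E1 is the many side
        return [pk_e1]
    if cardinality == "many-to-many":
        return [pk_e1 + pk_e2]        # union of both keys
    raise ValueError(f"unknown cardinality: {cardinality}")


for kind in ("one-to-one", "one-to-many", "many-to-one", "many-to-many"):
    print(kind, relationship_primary_key(kind, ["CN"], ["AN"]))
```

The rule of thumb is that the key of the "many" side is enough, because each entity on that side takes part in at most one relationship.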

Concept of Weak Entity Set

An Entity Set is said to be a Weak Entity Set if it does not have sufficient attributes to
form its Primary Key. On the other hand, an entity set having a primary key of its own is
called a Strong Entity Set. A Weak Entity Set (say E2) depends on a Strong Entity Set
(say E1) to form its Candidate Key. Entity Set E2 is then said to be
“Existence-Dependent” on E1 and E1 is said to be the “Owner Entity Set” of E2. The
relationship R between E2 and E1 is called the “Identifying Relationship”. The Weak Entity
Set E2 will have a set of attributes called its “Discriminator”, which together with the
Primary Key of E1 will form the Primary Key of E2.

[E-R diagram: Owner Entity Set E1 linked through the Identifying Relationship R to the
Weak Entity Set E2.]


Example:- Suppose an Entity Set EMPLOYEE (EMP-ID, EMP-NAME, SALARY,
DEPENDENT) has an attribute DEPENDENT which is multi-valued i.e. an employee
may have none, one or more than one dependent. This situation can be best modeled as
follows:-

[E-R diagram: Owner Entity Set EMPLOYEE (EMP-ID, EMP-NAME, SALARY) linked through
an Identifying Relationship to the Weak Entity Set DEPENDENT (D-NAME, RELATION, DOB).]

- The Weak Entity Set DEPENDENT is Existence Dependent on the Strong Entity
Set EMPLOYEE.

- The Weak Entity Set “DEPENDENT” has a Discriminator Attribute D-NAME,
which along with primary key EMP-ID of EMPLOYEE, forms the Primary Key of the weak
entity set DEPENDENT. In an E-R Diagram, the Discriminator (also called Partial Key) of
a weak entity set is marked by underlining with a broken line.
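The EMPLOYEE / DEPENDENT example can be sketched with Python's built-in sqlite3 module. The table and column names follow the attributes above, but the sample rows, the helper function and the choice of ON DELETE CASCADE to express existence dependence are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE employee (
        emp_id   TEXT PRIMARY KEY,
        emp_name TEXT NOT NULL,
        salary   INTEGER
    );
    CREATE TABLE dependent (
        emp_id   TEXT NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        d_name   TEXT NOT NULL,        -- discriminator (partial key)
        relation TEXT,
        dob      TEXT,
        PRIMARY KEY (emp_id, d_name)   -- owner key + discriminator
    );
    """
)
conn.execute("INSERT INTO employee VALUES ('E-1', 'Ajay', 40000)")
conn.execute("INSERT INTO employee VALUES ('E-2', 'Vijay', 35000)")
# The same D-NAME may occur under different owners.
conn.execute("INSERT INTO dependent VALUES ('E-1', 'Ravi', 'Son', '2001-05-14')")
conn.execute("INSERT INTO dependent VALUES ('E-2', 'Ravi', 'Brother', '1980-01-02')")


def orphan_rejected():
    """A dependent without an owning employee must be refused."""
    try:
        conn.execute("INSERT INTO dependent VALUES ('E-9', 'Mina', 'Daughter', NULL)")
        return False
    except sqlite3.IntegrityError:
        return True


print(orphan_rejected())
```

D-NAME alone does not identify a dependent, since two employees may each have a dependent named Ravi; the pair (EMP-ID, D-NAME) does.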

Special Features of an Identifying Relationship

Normally, a situation modeled by a Weak Entity Set will have the following features:-

(i) The Identifying Relationship will be one-to-many from Owner Entity Set to Weak
Entity Set.

(ii) The Participation of Owner Entity Set in the Identifying Relationship will be
partial and the participation of the Weak Entity Set in the Identifying Relationship
will be Total.

Modeling of a Multi-valued Attribute as Weak Entity Set

In the above example, the Weak Entity Set DEPENDENT can also be modeled as a
multi-valued attribute of Entity Set EMPLOYEE. The multi-valued attribute can be used
to indicate the names of the dependents of employees. But if we also want to record
other details of dependents, like a dependent’s relationship with the employee, then the
multi-valued approach will not be suitable. In this case, the Weak Entity approach is
the better choice, since the weak entity set DEPENDENT can have any number of
attributes.


Extended E-R Features


Specialization. An entity set E may include some sub-groups of entities (say E1,
E2, ….. En), such that each of these sub-groups has some distinct attributes
that differ from those of the other sub-groups. There will also be some attributes that are
common to all sub-groups. The process of designating these sub-groups within an entity
set is called specialization.

[E-R diagram: Higher Level Entity Set (Super Class) E with attributes A1 and A2,
connected through ISA to the Lower Level Entity Sets (Sub Classes) E1, E2, ….. En, each
with its own distinct attributes.]

In the above example, an Entity Set E has been specialized into sub-groups designated as
E1, E2 ….. En. E is called the “Super Class” or “Higher Level Entity Set” and the entity sets
E1, E2 ….. En are called “Sub Classes” or “Lower Level Entity Sets” of E. The attributes
common to all sub classes are represented with the super class, and the distinct
attributes of each sub class are represented with that sub class.

The relationship of Higher Level Entity Set with its Lower Level Entity Sets is called
ISA relationship. It is read as “is a”.

Inheritance of Attributes in Specialization

Each Sub Class will inherit the Attributes of its Super Class; plus it will have its own
distinct Attributes. For example, in the above case, each lower level entity set will inherit
attributes A1 and A2 of the Super Class E.

Example:- Consider an entity set ACCOUNT with attributes Account-Number and
Balance. The Entity Set ACCOUNT may be specialized into different types of accounts
like SAVINGS-ACCOUNT, CURRENT-ACCOUNT, FIXED-DEPOSIT (FD) and
RECURRING-DEPOSIT (RD). The SAVINGS-ACCOUNT may have an attribute
Interest-Rate and CURRENT-ACCOUNT may have an attribute Over-Draft. Similarly, FD and RD
have distinct attributes of their own.


[E-R diagram: ACCOUNT (Account-Number, Balance) connected through ISA to
SAVINGS-ACCOUNT (Interest-Rate), CURRENT-ACCOUNT (Over-Draft),
FD (Int-Rate, Mat-Date) and RD (Int-Rate, Mat-Date, Installment).]
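Attribute inheritance in the ISA relationship has a close analogue in class inheritance. The sketch below uses Python dataclasses purely as an analogy; the class names, field names and sample values are assumptions for illustration, and only two of the four sub classes are shown.

```python
from dataclasses import dataclass, fields


@dataclass
class Account:                  # higher-level entity set (super class)
    account_number: str
    balance: int


@dataclass
class SavingsAccount(Account):  # SAVINGS-ACCOUNT ISA ACCOUNT
    interest_rate: float


@dataclass
class CurrentAccount(Account):  # CURRENT-ACCOUNT ISA ACCOUNT
    over_draft: int


s = SavingsAccount("A-101", 10000, 4.0)
c = CurrentAccount("A-203", 30000, 5000)

# Each sub class carries the inherited attributes plus its own.
print([f.name for f in fields(s)])
print(isinstance(s, Account), isinstance(c, Account))
```

Every SavingsAccount is an Account, which is exactly how the ISA relationship is read.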

Specialization Constraints

Disjoint Vs Overlapping Specialization

Disjoint. It implies that an entity does not belong to more than one
lower-level entity set i.e. an account is either a savings-account or a current-account but not
both.

Overlapping. In an overlapping specialization, an entity may belong to more than
one lower-level entity set within a single specialization.

Total Vs Partial Specialization

Total. Each higher-level entity must belong to a lower-level entity set.
Partial. Some higher-level entities may not belong to any lower-level entity set.

Generalization. Specialization is a top-down approach, whereas Generalization is
exactly the inverse of that. Generalization refers to the process of fusing several distinct
entity sets into a single Higher Level Entity Set, on the basis of commonality of their
attributes. The original entity sets then become sub classes or lower level entity sets. The
common attributes of the Lower Level Entity Sets will be assigned to the Higher Level Entity Set.
Thus, generalization is a process which proceeds in a bottom-up manner, in which
multiple entity sets are synthesized into a single higher-level entity set, on the basis of their
common features. The higher-level entity set is termed the super-class and each lower level
entity set is termed a sub-class. In an E-R Diagram, both Specialization and
Generalization are represented in exactly the same manner.


Aggregation. One limitation of the E-R Model is that it fails to express relationships
among relationship sets, or a relationship between a relationship set on one side and an
entity set on the other side. Aggregation provides a solution in this case. Aggregation is
an abstraction through which relationships are treated as higher-level entities, which can
then participate in relationships with other Entity Sets or with other relationship sets. An
example is the relationship between R1 and E3 indicated below.

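[E-R diagram: E1 (attributes A1, A2) and E2 (attributes B1, B2) are related by R1; the
relationship set R1 is enclosed as the Aggregated Higher Level Entity Set “R1”, which is
related by R2 to E3 (attributes C1, C2, C3).]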

Here, the Relationship Set R1 between Entity Set E1 and Entity Set E2 has been
aggregated as Higher Level Entity Set “R1”. This Higher Level Entity Set is participating
in a Relationship R2 with Entity Set E3. Thus, through aggregation, we are able to
represent a Relationship between Relationship Set R1 and Entity Set E3.

Example:- Suppose we have Entity Sets “EMPLOYEE”, “BRANCH” and “JOB”
which are related through a Relationship “EBJ” which indicates “which employee” is
performing “what jobs” at “which branch”. There will be multiple jobs at each
branch; assume that each employee may be performing multiple jobs at one of the
branches. Suppose we want to relate another Entity Set “MANAGER” to indicate:-

(i) The set of Employees managed by a manager.

(ii) The set of jobs managed by a manager.

(iii) The Branches managed by a manager (assume a manager can manage only
one branch).


If we represent this scenario without use of aggregation, then the E-R Diagram will be as
follows:-
[E-R diagram: EMPLOYEE, BRANCH and JOB are related by EBJ; MANAGER is related to
EMPLOYEE by EM, to BRANCH by BM and to JOB by JM.]

The above scenario can be better modeled by aggregating the Relationship Set “EBJ” as a
higher level Entity Set and then creating a relationship between this higher level entity set
and the Entity Set “MANAGER”, as indicated below:-

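[E-R diagram: the relationship set EBJ among EMPLOYEE, BRANCH and JOB is enclosed as
the Aggregated Higher-Level-Entity-Set “EBJ”, which is related to MANAGER by the
relationship set EBJM.]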

This modeling represents the situation more realistically, wherein the Relationship Set
“EBJM” indicates “which combinations of employee-branch-job” are being managed by
each manager.

Reduction of E-R Schema to Tables


An E-R Diagram can be reduced to a set of Tables, as explained below:-

(a) Tabular representation of a Strong Entity Set. A Strong Entity Set E will be
represented by a Table named “E”. The Table will have columns as follows:-

(i) Simple, Single-valued Attributes There will be a column for each
simple, single-valued attribute of Entity Set E.

(ii) Composite Attributes There will be a column for each sub-part of
a Composite Attribute; no column needs to be assigned for the composite attribute as
such. For example, for NAME comprising First Name (FN), Middle Name
(MN) and Last Name (LN) there will be three columns for FN, MN and LN. No
column needs to be assigned for NAME. If NAME needs to be produced, it can
be done by combining the sub-parts.

(iii) Derived Attributes No column needs to be assigned for the derived
attributes, since the values of these attributes are not stored in the database.

(iv) Multi-Valued Attribute Each Multi-Valued Attribute (say M) will be
represented by a separate Table (say named E-M) which will have a column each
for the primary key attributes of E and a column for Attribute M. Each value of
the multi-valued attribute will be represented in a separate row in this table.

Let E be a Strong Entity Set with simple single-valued attributes a1, a2, ..., an. This
Entity Set will be represented by a Table called E with n distinct columns, each of which
will correspond to one of the attributes. Let D1, D2, ..., Dn be the domains of attributes
a1, a2, ..., an respectively. The Table E will comprise a set of rows, which will be a
subset of the Cartesian Product D1 X D2 X ... X Dn.


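Example

[E-R diagram: entity set STUDENT with attributes Univ_Roll_No (Primary Key), Name, DOB,
Age (derived), Tel_No (multi-valued) and composite attribute Address (H-No, Street,
City, Pin).]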

The derived attribute Age will not be represented in the STUDENT table. When required,
its value will be derived from DOB.

The multi-valued attribute Tel_No will be represented in a separate table (say named
STUDENT-TEL-NO), which will have a column for the Primary Key of STUDENT i.e.
Univ_Roll_No and a column for Tel_No. If a student has more than one Tel_No then his
Univ_Roll_No will appear that many times in this table.

The Above E-R Diagram will be reduced to following two Tables:-

STUDENT
Univ_Roll_No Name DOB H-No Street City Pin

STUDENT-TEL-NO
Univ_Roll_No Tel_No
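The two tables above can be sketched with Python's built-in sqlite3 module. The column names follow the example, written in lower case with underscores; the column types and the sample values (address and telephone numbers) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        univ_roll_no TEXT PRIMARY KEY,
        name   TEXT,
        dob    TEXT,  -- Age is derived from this column, so it has no column
        h_no   TEXT,  -- H-No, Street, City, Pin: sub-parts of composite Address
        street TEXT,
        city   TEXT,
        pin    TEXT
    );
    CREATE TABLE student_tel_no (
        univ_roll_no TEXT REFERENCES student(univ_roll_no),
        tel_no       TEXT,
        PRIMARY KEY (univ_roll_no, tel_no)
    );
    """
)
conn.execute(
    "INSERT INTO student VALUES "
    "('0990013010', 'Karan Singh', '1985-12-10', '12', 'MG Road', 'Noida', '201301')"
)
# One row per telephone number: the roll number repeats.
conn.executemany(
    "INSERT INTO student_tel_no VALUES (?, ?)",
    [("0990013010", "555-0101"), ("0990013010", "555-0102")],
)
tel_rows = conn.execute(
    "SELECT univ_roll_no, tel_no FROM student_tel_no ORDER BY tel_no"
).fetchall()
print(tel_rows)
```

The composite key (univ_roll_no, tel_no) lets one student hold several numbers while preventing the same number from being recorded twice for that student.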

(b) Tabular representation of Relationship Sets. Let R be a Relationship Set and
let a1, a2, ..., am be the set of attributes formed by the union of the primary keys of all the
Entity Sets participating in R, and let the descriptive attributes of R (if any) be
b1, b2, ..., bn. Then the Relationship Set R will be represented by a Table named say “R”,
which will have (m+n) columns, each column representing one of the attributes from the
set {a1, a2, ..., am} U {b1, b2, ..., bn}.


Example

[E-R diagram: CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) related by DEPOSITOR, which has the descriptive attribute
Date-of-Operation.]

The Relationship Set DEPOSITOR will be represented by a table named DEPOSITOR.
The Entity Sets CUSTOMER and ACCOUNT have Primary Keys C-Id and Account-No
respectively, which will also form part of the DEPOSITOR table. In addition, the
DEPOSITOR table will have a column for its Descriptive Attribute “Date-of-Operation”.
The above E-R Diagram will be reduced to the following set of tables:-

CUSTOMER
C-Id C-Name C-address

ACCOUNT
Account-Number Balance Branch-Name

DEPOSITOR
C-Id Account-Number Date-of-Operation

Shifting of Descriptive Attributes of a Relationship Set and Merging of Relationship
Set Table with the tables of participating Entity Sets. Depending on the Mapping
Cardinality of the participating Entity Sets, the Descriptive Attributes of the Relationship
Set can be shifted to one of the participating Entity Sets. Also, a Relationship Set Table
can be combined with the table of one of the participating Entity Sets, as per the
following conditions:-


(1) One-to-One Relationship Suppose there is a One-to-One relationship between
two entity sets; then the rows in the Relationship Set table will have a one-to-one mapping
with the rows in the tables of the participating entity sets. Under this condition, it is
possible to shift the descriptive attributes of the relationship set to any of the participating
Entity Sets, and it is also possible to merge the table of the Relationship Set with the table
of any of the participating Entity Sets, without loss of any information.

Example:-

[E-R diagram: CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) related by DEPOSITOR (Date-of-Operation).]

As indicated above, there is a One-to-One Relationship between CUSTOMER and
ACCOUNT i.e. each Customer has at most one account and each account is “Single”
(i.e. owned by only one customer).
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP


DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007

As can be seen, the rows in the DEPOSITOR table have a one-to-one mapping with the rows
in the CUSTOMER Table and also with the rows in the ACCOUNT Table. That is, the
first row of DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of
DEPOSITOR maps onto the first row of ACCOUNT, the third row of DEPOSITOR
maps onto the second row of ACCOUNT and the last row of DEPOSITOR maps onto the
third row of ACCOUNT. Thus, the descriptive attribute Date-of-Operation of the
Relationship Set DEPOSITOR can be shifted to either CUSTOMER or ACCOUNT.
Also, the DEPOSITOR Table can be combined either with the CUSTOMER Table or
with the ACCOUNT Table, without losing any information. The combined table will
have the union of the columns of the two merged tables. Suppose the DEPOSITOR Table is
merged with the CUSTOMER Table; then the CUSTOMER Table will also include
attributes Account-Number and Date-of-Operation. The resulting set of tables will then
be:-

CUSTOMER
C-Id C-Name C-address Account-Number Date-of-Operation
C-001 Ajay 320, Sector-26, Noida A-310 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-101 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-203 03-Feb-2007
C-505 Shyam 303,Sector-22,RKP A-305 27-Dec-2007

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP

The combined CUSTOMER Table now includes the Primary Key (Account-Number) of ACCOUNT
and the descriptive attribute Date-of-Operation of DEPOSITOR.

(2) One-To-Many Relationship Suppose there is a One-to-Many relationship
between CUSTOMER and ACCOUNT i.e. each customer can have many accounts, but
each account has to be Single.


Example

[E-R diagram: CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) related by DEPOSITOR (Date-of-Operation).]

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
A-550 35000 CP
A-670 60000 Sec-18

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007
C-101 A-550 22-Dec-2006
C-310 A-670 01-Jan-2007

The rows in the DEPOSITOR table have one-to-one mapping onto the rows in
ACCOUNT Table i.e. with the “Many-Side Entity Set” Table. That is, the first row of
DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of DEPOSITOR
maps onto the first row of ACCOUNT, the third row of DEPOSITOR maps onto the
second row of ACCOUNT, the fourth row of DEPOSITOR maps onto the third row of
ACCOUNT, the fifth row of DEPOSITOR maps onto the fifth row of ACCOUNT and
the last row of DEPOSITOR maps onto the last row of ACCOUNT table. Thus, the


descriptive attribute Date-of-Operation can be shifted to ACCOUNT (the “Many-Side”
Entity Set) and the DEPOSITOR Table can be merged with the ACCOUNT Table (i.e. with the
table of the “Many-Side” Entity Set), without losing any information. The resultant
ACCOUNT table will also include the Primary Key C-Id of the CUSTOMER table and the
descriptive attribute Date-of-Operation of the DEPOSITOR table. The resulting set of
tables will then be:-

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP

ACCOUNT
Account-Number Balance Branch-Name Customer_Id Date_of_Operation
A-101 10000 Sec-18 C-220 23-Dec-2006
A-203 30000 Sec-26 C-310 03-Feb-2007
A-305 50000 CP C-505 27-Dec-2007
A-310 25000 RKP C-001 10-Jan-2007
A-550 35000 CP C-101 22-Dec-2006
A-670 60000 Sec-18 C-310 01-Jan-2007

(3) Many-to-One Relationship Suppose there is a many-to-one relationship between
CUSTOMER and ACCOUNT, which implies that each account can be “Joint” but each
customer can hold only one account. In this case, the table DEPOSITOR can be
combined with the “Many-Side” Entity-Set table CUSTOMER.

Example

[E-R diagram: CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) related by DEPOSITOR (Date-of-Operation).]

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
The rows in the DEPOSITOR table have a one-to-one mapping onto the rows in the
CUSTOMER Table i.e. with the “Many-Side Entity Set” Table. Thus, the descriptive
attributes of DEPOSITOR can be shifted to the “Many-Side” Entity Set CUSTOMER and
the DEPOSITOR Table can be merged with the CUSTOMER Table, without losing any
information. The resultant CUSTOMER table will also include the Primary Key
Account_Number of the ACCOUNT table and the descriptive attribute Date-of-Operation
(DOO) of the DEPOSITOR table. The resulting set of tables will then be:-

CUSTOMER
C-Id C-Name C-address Account_Number DOO
C-001 Ajay 320, Sector-26, Noida A-101 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-203 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-101 03-Feb-2007
C-505 Shyam 303,Sector-22, RKP A-203 27-Dec-2007

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26

(4) Many-to-Many Relationship Suppose there is a many-to-many relationship
between CUSTOMER and ACCOUNT, which implies that each account can be “Joint”
and each customer can hold many accounts. In this case, the table DEPOSITOR cannot be
combined with any Entity Set table and it has to be created as a separate table. If we
were to combine it, we would have to repeat the rows of an Entity Set table once for each
relationship it takes part in, and that would add unnecessary data redundancy, which is
not acceptable.


Example

[E-R diagram: CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) related by DEPOSITOR (Date-of-Operation).]

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
C-101 A-305 30-Dec-2007
C-505 A-310 02-Jan-2007

Now, the rows in the DEPOSITOR table do not have one-to-one mapping with
CUSTOMER table and also with the ACCOUNT table. So, the DEPOSITOR table can
neither be merged with CUSTOMER table nor with ACCOUNT table. Thus, there has to
be a separate table for DEPOSITOR as indicated above. Also, the descriptive attributes of
the Relationship Set cannot be shifted to the participating Entity Sets; the descriptive
attributes have to remain with the relationship set itself.
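
The observation above can be checked mechanically. The following is a minimal sketch in Python (the notes themselves use no programming language, so the variable names here are assumptions for illustration). It uses the DEPOSITOR rows listed above and shows that neither C-Id nor Account-Number alone identifies a row, while the pair does.

```python
from collections import Counter

# (C-Id, Account-Number) pairs copied from the DEPOSITOR table above.
depositor = [
    ("C-001", "A-101"), ("C-220", "A-203"), ("C-310", "A-101"),
    ("C-505", "A-203"), ("C-101", "A-305"), ("C-505", "A-310"),
]

# Customers that appear in more than one row (they hold several accounts).
per_customer = Counter(c for c, _ in depositor)
customers_with_many = [c for c, n in per_customer.items() if n > 1]

# Accounts that appear in more than one row (joint accounts).
per_account = Counter(a for _, a in depositor)
accounts_with_many = [a for a, n in per_account.items() if n > 1]

# The combination {C-Id, Account-Number} is still unique for every row.
pair_is_unique = len(set(depositor)) == len(depositor)

print(customers_with_many, accounts_with_many, pair_is_unique)
```

Since both lists are non-empty, DEPOSITOR cannot be folded into either entity table, which is the reason it stays a separate table.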


(c) Tabular representation of Weak Entity Sets. Let A be a Weak Entity Set
with descriptive Attributes a1,a2,……,am. Let B be the Strong Entity Set on which A is
existence dependent. Let the primary key of B consist of attributes b1,b2,….bn. The
Entity Set A is represented by a Table called A with (m+n) columns, each column
representing one of the attributes from the set {a1,a2,……am} U {b1,b2,….bn}.

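Example:
[E-R diagram: strong entity set LOAN and weak entity set PAYMENT, connected by the identifying relationship LOAN-PAYMENT. Attributes shown: Loan-No, Amount, Payment-No, Payment-Date, Installment.]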

There will be Tables LOAN and PAYMENT; the PAYMENT table will also include the
Primary Key of Loan i.e. Loan-No. The Primary Key of table PAYMENT will be
{Loan-No, Payment-No} where the attribute Payment-No is called a “Discriminator” or
“Partial Key” of the table PAYMENT.

Redundancy of Tables in Weak Entity Sets The Table for the Identifying
Relationship LOAN-PAYMENT is not required because if we create such a table, it will
have only two attributes i.e. Loan-No and Payment-No, which as such form part of table
PAYMENT. Thus, no table needs to be created for an Identifying Relationship. In case
there exists a Descriptive Attribute of an Identifying Relationship, it can be shifted to the
“Many-Side Entity Set” i.e. the Weak Entity Set.

(e) Tabular representation of Generalization. The steps involved are:-

Create a Table each for the higher-level entity set and for each lower-level entity
set. The table for lower-level entity set will include its own attributes plus all the
Primary-Key attributes of its higher-level entity set.


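[E-R diagram: higher-level entity set ACCOUNT (Account-Number, Balance) connected through ISA to the lower-level entity sets SAVINGS-ACCOUNT (Interest-Rate), CURRENT-ACCOUNT (Over-Draft), FD (Int-Rate, Mat-Date) and RD (Int-Rate, Mat-Date, Installment).]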

For example, in the above case there will be five tables i.e. ACCOUNT, SAVINGS-ACCOUNT,
CURRENT-ACCOUNT, FD and RD. The table ACCOUNT will have
columns Account-Number and Balance; the table SAVINGS-ACCOUNT will have
columns Account-Number and Interest-Rate; and the table CURRENT-ACCOUNT will have
columns Account-Number and Over-Draft. The same is applicable to the tables FD and RD.

Combining of Tables in Generalization If a generalization is “Total”, which implies
that each entity in the super-class (higher-level entity set) is a member of at least one
sub-class (lower-level entity set), no table is required to be created for the higher-level entity
set. Instead a table needs to be created for each lower-level entity set; and each such table
will include all the attributes of the higher-level entity set, in addition to its own distinct
attributes. For example, the table SAVINGS-ACCOUNT will have columns Account-Number,
Balance and Interest-Rate; and the table CURRENT-ACCOUNT will have
the columns Account-Number, Balance and Over-Draft. The same is applicable for FD
and RD tables.


(f) Tabular representation of Aggregation. Take the following Example:-

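[E-R diagram: entity sets EMPLOYEE (E#, E-Name), BRANCH (B#, B-Name) and JOB (J#) participate in the relationship set EBJ. EBJ is aggregated into a higher-level entity set, which is related to the entity set MANAGER (Mgr-Id) through the relationship set EBJM.]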

In the above scenario, there will be tables for Entity Sets EMPLOYEE, BRANCH, JOB
and MANAGER. There will be one table for Relationship Set EBJM having Attributes
E#, B#, J# and Mgr-Id. No table is required for the Relationship Set EBJ because this
table would be a subset of table EBJM.

EBJM
E# B# J# Mgr-Id


E-R DIAGRAM OF AN AIRLINE RESERVATION SYSTEM


[E-R diagram with the following elements: AIRCRAFT (AC_NO, AC_TYPE, CAPACITY); FLT_SCHEDULE (FLT_NO, FROM_PLACE, TO_PLACE, ETD, ETA); FLIGHT (DATE, ATD, ATA); CREW (CREW-ID, CREW_NAME, DESIGNATION); FLT_CREW; RESERVATION (CONFIRMED, SEAT_NO); CANCELLATION (C_DATE); TICKET (TICKET_NO, ISSUE_DATE, FARE); REFUND (VOUCHER_NO, AMOUNT); PASSENGER (P_NAME, P_ADDRESS, P_TEL_NO).]


Where ETD: Estimated Time of Departure (i.e. scheduled take-off time).


ETA: Estimated Time of Arrival (i.e. scheduled landing time at the destination)
ATD: Actual Time of Departure. Initially, it will have a NULL value. It will get
defined after aircraft actually takes-off.
ATA: Actual Time of Arrival. Initially, it will have a NULL value. Its value will
get defined only after aircraft actually lands at the destination.

The above E-R Diagram can be reduced to the following set of tables:-

FLIGHT_SCHEDULE (FLT_NO, FROM_PLACE, TO_PLACE, ETD, ETA)


AIRCRAFT (AC_NO, AC_TYPE, CAPACITY)
CREW (CREW_ID, CREW_NAME, DESIGNATION)
FLIGHT (FLT_NO, DATE, AC_NO, ATD, ATA)
FLT_CREW (FLT_NO, DATE, CREW_ID)
TICKET (TICKET_NO, ISSUE_DATE, FARE, P_NAME, P_ADDR, P_TEL_NO)
RESERVATION (TICKET_NO, FLT_NO, DATE, CONFIRMED, SEAT_NO)
CANCELLATION (TICKET_NO, FLT_NO, DATE, VOUCHER_NO, C_DATE)
REFUND (VOUCHER_NO, AMOUNT)

Since there is a one-to-one relationship between TICKET and PASSENGER, a common
table TICKET will suffice for these two entity sets. Even the CANCELLATION table
can be combined with the RESERVATION table, since there is a many-to-one relationship
between RESERVATION and REFUND.

The SEAT_NO will get defined only after a passenger checks in for a flight.


E-R DIAGRAM FOR VEHICLE INSURANCE

[E-R diagram with the following elements: OWNER (O-NAME, O_ADDRESS, O_TEL_NO); VEHICLE (REG_NO, MAKE, MODEL, COLOR); INSURANCE_POLICY (POLICY_NO, PREMIUM, EXPIRY_DATE, BONUS); ACCIDENT (A_REPORT_NO, A_DATE, PLACE); CLAIM_PAYMENT (PAYMENT_VOUCHER_NO, P_DATE, P_AMOUNT); SURVEYOR REPORT (S_REPORT_NO, ASSESSED_DAMAGE); REPAIRS (REF_NO, REPAIR_ITEM, COST).]


Exercises

Ex.2.1 Explain the concept of Entity Sets and Relationship Sets.

Ex.2.2 Distinguish between the following:-

(a) Simple and Composite Attributes


(b) Single-Valued and Multi-Valued Attributes
(c) Stored and Derived Attributes

Ex.2.3 Explain the concept of Super Keys, Candidate Keys and Primary Keys of
an Entity Set. Explain the determination of Primary Key of a binary Relationship set.
How is it influenced by the Cardinality Mapping of the Entity Sets participating in the
Relationship Set?

Ex.2.4 Explain the concept of Weak Entity Sets and Identifying Relationships.
Explain how a multi-valued attribute can be better modeled as a Weak Entity Set.

Ex.2.5 Explain the concepts of Specialization and Generalization. What are the
different types of constraints involved in specialization? Distinguish between Total &
Partial Specialization and between Disjoint & Overlapping Specialization.

Ex.2.6 With an example illustrate how a Relationship Set can be aggregated as a


higher-level entity set, which can then participate in relationships with other relationship
sets and entity sets.

Ex.2.7 Draw E-R diagrams to indicate the following relationships between entity set
Operator and entity set Machine:-

(a) Each Machine can be operated by many Operators but each Operator can
operate only one machine.

(b) An Operator can operate many Machines and each Machine can be operated
by many Operators.

Ex.2.8 Make E-R Diagrams for the following real-world situations (Indicate clearly
entity sets, relationship sets, cardinalities, attributes and candidate Keys. Also indicate
any weak entity sets, specialization, generalization and aggregation etc.) Also reduce the
E-R diagrams to Tables:-

(a) An organization having a set of employees to execute a set of projects.


Each employee may be working on more than one project, each project is
managed by a manager and a manager is also one of the employees.


(b) A Tourist Management System catering to booking of hotels, taxis &
guides. It should also cater for cancellations, billing, payments etc.

(c) Preparation of time table of an Engineering College, catering for a number


of Sections (Year/Branch/Section), a number of courses, a number of
faculty members teaching the courses and a number of class rooms (ignore
labs). Make use of Aggregation and identify candidate keys.


LECTURE NOTES

UNIT - 2

Pradeep Kumar Kushwaha



CHAPTER 3

RELATIONAL DATA MODEL

Relational Database Concepts

A Relational Database is a collection of Tables called Relations.

Relation Schema

A Relation Schema refers to the structure of a Table. It indicates the Name of the
Relation Schema and the Names of its Attributes (represented by Columns in the
table). For example R (A1, A2, ……, An) represents a Relation Schema named
“R” having n columns, representing the Attributes A1, A2, …., An.

Domain of an Attribute The Domain of an Attribute Ai is the set of permitted
(legal) values that can be assigned to Attribute Ai. It is possible for more than
one Attribute to have the same Domain or to have overlapping Domains.

- The domains of all attributes of a relation should be atomic (indivisible).

- One value that is member of all domains is null, which implies that the
value is either unknown or non-existent; for example suppose an entity has
attribute telephone-number. It may have null value if an entity does not have a
telephone number or if the entity has telephone number, but number is not
known.

Degree of a Relation Schema

The Degree of a Relation Schema “R” refers to the number of Attributes


(Columns) in the Schema. For example, the Relation Schema R (A1, A2, ……,An)
has a Degree of n.

Relation or Relation Instance A Relation or relation instance, denoted as


r(R), refers to an actual Table named “r” created on the Relation-Schema “R”. A
table will comprise a set of rows (called tuples).

Let r(R) = {t1, t2 , ……, tm} where t1, t2 , ……, tm are the tuples in the relation.

Each tuple t will be an ordered list of n values.

Let t = <v1, v2, …..vn>

The n-values in a tuple will represent an entity or a relationship. Thus, all the n
values in the tuple will be related to each other. That is why the resulting
database is called a “relational database”.


Note that a tuple is an Ordered List (not a Set) of n values. That is why the
order of the values in the tuple matters. A value vi (1 ≤ i ≤ n) in the tuple t will be
from the Domain of attribute Ai.

i.e. vi ∈ Domain (Ai)

A Relation can also be defined as a subset of the Cartesian Product of all the
Domains of its Attributes.
i.e. r (R) ⊆ Domain (A1) × Domain (A2) × …… × Domain (An)
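
The subset definition can be made concrete with a small sketch in Python. The two-attribute schema and its domains below are assumptions chosen only for illustration; they are not part of the notes.

```python
from itertools import product

# Hypothetical domains for a schema R (Branch, Section).
domain_branch = {"CS", "IT", "EC"}
domain_section = {1, 2}

# The Cartesian product contains every tuple that could legally appear.
all_possible = set(product(domain_branch, domain_section))

# A relation instance r(R) is some set of tuples drawn from that product.
r = {("CS", 1), ("CS", 2), ("IT", 1)}

print(r <= all_possible)    # subset test: is r a legal relation on R?
print(len(all_possible))    # size of the Cartesian product
print(len(r))               # number of tuples currently in r
```

Any tuple with a value outside its domain, such as ("CS", 3), would make the subset test fail.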

Cardinality of a Relation The Cardinality of a Relation r(R) at any moment is
defined as the number of rows (tuples) existing in the Relation (Table) at that
moment of time.

The cardinality of relation r(R) = {t1, t2 , ……, tm} will be m.

There can be more than one Relation (Table) defined on the same Schema.
For example r1(R) and r2(R) are two independent relations defined on the same
Schema R. Both will have the same Degree but may have different cardinalities.

The Cardinality of a relation will vary from moment to moment; but the degree will
change very rarely.

Some Notations

Since a relation is defined as a set of tuples, the notation t ∈ r implies that tuple t
is in relation r.

t[Ai] denotes the value of tuple t on attribute Ai.

Consider the following set of Schemas, representing information about


Students, Subjects and Results:-

STUDENT (Univ_Roll_No, Class_Roll_No, S_Name, DOB, Year,


Branch, Section, Father_Name, Address, Tel_No)

It represents a Relation Schema “STUDENT” with attributes


Univ_Roll_No, Class_Roll_No, S_Name, DOB, Year, Branch, Section,
Father_Name, Address, Tel_No. Let us assume that each student has a
unique Univ_Roll_No. Also, Class_Roll_No of a student is unique in a
class which is defined by the combination of {Year, Branch, Section}


SUBJECT (Sub_Code, Title)

It represents a Relation Schema “SUBJECT” having attributes Sub_Code


(like “TCS-402”), Title (like “DBMS”) such that each subject has a unique
Sub_Code.

RESULT (Univ_Roll_No, Sub_Code, Semester, Marks)

It represents relation Schema “RESULT” having attributes Univ_Roll_No,
Sub_Code, Semester and Marks.

Super Key of a Relation Schema R

The Super Key of a Relation Schema R refers to the set of attributes K (K ⊆ R),
which when taken collectively, will uniquely identify each tuple in a relation r(R).
The superset of a super-key will also be a super-key. Thus, a super-key may
have some extraneous attributes, without which the balance set still remains a
super-key. Such extraneous attributes, which need not be there in a super-key,
can be eliminated from the key.

For example in the above schemas, {Univ_Roll_No, S_Name} will form a


super-key of Relation Schema “STUDENT”. This super-key has an extraneous
attribute “S_Name”, without which the balance set i.e. {Univ_Roll_No} still
remains a super-key, since each student has a unique Univ_Roll_No.

Candidate Key of a Relation Schema R

A Candidate Key is defined as a super-key, no proper subset of which is a
super-key i.e. a Candidate Key is a minimal super key (having no extraneous
attributes).

A Relation Schema may have more than one Candidate Keys.

In the above schemas, the relation schema “STUDENT” has the following
Candidate Keys:-

{Univ_Roll_No}
{Class_Roll_No, Year, Branch, Section}
{S_Name, DOB, Father_Name, Address} (assuming that twins will not
have the same name)

And Relation Schema “SUBJECT” has {Sub_Code} as its Candidate Key and
Relation Schema “RESULT” has {Univ_Roll_No, Sub_Code} as its candidate key.
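
The difference between a super-key and a candidate key can be sketched in Python as follows. The function names and the three sample rows are assumptions for illustration. Note also that a key is a property of the schema: a test on one relation instance can only show that the instance does not violate the key, not that the key holds for every legal instance.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A super-key none of whose proper, non-empty subsets is a super-key."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical STUDENT rows (only three attributes shown).
students = [
    {"Univ_Roll_No": "0209130010", "S_Name": "Ajay", "Year": 2},
    {"Univ_Roll_No": "0209130011", "S_Name": "Vijay", "Year": 2},
    {"Univ_Roll_No": "0209130012", "S_Name": "Ajay", "Year": 3},
]

print(is_superkey(students, ["Univ_Roll_No", "S_Name"]))       # has an extraneous attribute
print(is_candidate_key(students, ["Univ_Roll_No", "S_Name"]))  # so it is not minimal
print(is_candidate_key(students, ["Univ_Roll_No"]))
```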


Primary Key of a Relation Schema R

It refers to one of the Candidate Keys of R, which is designated as the primary
means of uniquely identifying tuples in a relation r(R).

For example, {Univ_Roll_No} will be the most appropriate choice of


Primary Key of Relation Schema “STUDENT”.

Since the other two relation schemas have only one candidate key each,
the same will be designated as their respective Primary Keys.

Composite Key A Key that includes more than one attribute is called a
Composite Key.

For example, the relation schema “RESULT” has Composite Primary


Key {Univ_Roll_No, Sub_Code}.

Foreign Key A relation schema R may include among its attributes some primary
key of another schema S. This is called a Foreign Key in schema R.

For example, {Univ_Roll_No} forms a Primary Key in STUDENT but it


forms Foreign Key in RESULT. Similarly {Sub_Code} forms a Primary Key of
SUBJECT but a Foreign Key in RESULT.

Integrity Constraints of a Relational Database

The following three types of Integrity Constraints are applicable to all Relational
Databases:-

(i) Domain Constraint


(ii) Key Constraint
(iii) Foreign Key (FK) Constraint or Referential Integrity Constraint

(i) Domain Constraint


Let there be a Relation Schema R (A1, A2, ……, An).
A Relation r (R) would be legal only if for each tuple t ∈ r and for each
value t[Ai] in t, it satisfies t[Ai] ∈ Domain (Ai); else the Relation will be
illegal (invalid).
This means that in a legal relation r(R), all the values appearing under
each attribute must be from the domain of that attribute.

Specifying Domain Constraints in SQL The domain constraints in
SQL are specified by the data types of the attributes, by the CHECK
clause or by the “NOT NULL” clause. Various data types are NUMBER,
CHAR, VARCHAR, DATE etc.

CREATE TABLE STUDENT
( Univ_Roll_No CHAR (10),
Class_Roll_No INT,
S_Name VARCHAR (20) NOT NULL,
DOB DATE NOT NULL,
Year INT,
Branch CHAR(3),
Section INT,
Father_Name VARCHAR(20) NOT NULL,
Address VARCHAR (30) NOT NULL,
Tel_No CHAR (11),
PRIMARY KEY (Univ_Roll_No),
CHECK (Year BETWEEN 1 AND 4),
CHECK (Section BETWEEN 1 AND 2));

CREATE TABLE SUBJECT


( Sub_Code CHAR(7),
Title VARCHAR(20) NOT NULL,
PRIMARY KEY (Sub_Code));

For each attribute, the permitted domain is specified in the schema definition;
for example the domain of “Year” is the set of Integers.

Additionally, the ‘CHECK’ clause can be used to further specify a sub-domain


within the specified domain; for example the permitted domain of “Year” in
“STUDENT” is set of integers between 1 and 4.
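
The effect of these clauses can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module with a cut-down STUDENT table; the helper try_insert and the sample rows are assumptions for illustration. SQLite is used only because it ships with Python: it enforces CHECK and NOT NULL, but unlike most other DBMSs it does not enforce declared lengths such as CHAR (10).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        Univ_Roll_No CHAR(10) PRIMARY KEY,
        S_Name       VARCHAR(20) NOT NULL,
        Year         INT CHECK (Year BETWEEN 1 AND 4)
    )
""")

def try_insert(values):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", values)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(("0209130010", "Ajay", 2)))    # inside the domain
print(try_insert(("0209130011", "Vijay", 7)))   # Year outside 1..4
print(try_insert(("0209130012", None, 1)))      # S_Name is NOT NULL
```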

(ii) Key Constraint


Let R (A1, A2,…, An) be a relation schema. Then for each key K of R
(K ⊆ R), for each legal Relation r(R) and for each pair of distinct tuples t1, t2 ∈ r,
it must hold that t1[K] ≠ t2[K]; else the Relation will be illegal (invalid).
This means that no two tuples in a legal relation may agree on all the
attributes of a key of that schema.

In SQL, the PRIMARY KEY of a Table is specified explicitly in its CREATE TABLE
statement.

Suppose a relation r(R) with Key {A,B} has the following state:-

r(R)
A B C
a1 b1 c3
a2 b1 c1
a3 b2 c5
a1 b2 c7
a3 b2 c1

This relation state is invalid, since there are two tuples with value of {A,B}
equaling {a3, b2}. One of these two tuples must be deleted, only then it will be a
valid relation.
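
A check of this kind is easy to sketch in Python. The code below counts how often each {A,B} value occurs in the relation state shown above; the variable names are assumptions for illustration.

```python
from collections import Counter

# The relation state r(R) from the example, as (A, B, C) tuples.
rows = [
    ("a1", "b1", "c3"),
    ("a2", "b1", "c1"),
    ("a3", "b2", "c5"),
    ("a1", "b2", "c7"),
    ("a3", "b2", "c1"),
]
key_positions = (0, 1)   # the key {A, B}

# Count the occurrences of each key value; a count above 1 is a violation.
counts = Counter(tuple(row[i] for i in key_positions) for row in rows)
violations = [key for key, n in counts.items() if n > 1]

print(violations)
```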

(iii) Foreign Key (FK) Constraint OR Referential Integrity Constraint

Let r(R) and s(S) be two relations and let α (α ⊆ S) be a Foreign Key
(FK) in S referencing Primary Key “K” of R. The relation s(S) would be
legal (valid) only if for each tuple ts ∈ s, there exists a tuple tr ∈ r such
that tr [K] = ts [α]. A tuple ts ∈ s that does not satisfy this condition will
be called a “Dangling Tuple”, in the sense that such tuples do not
have the necessary support from relation r. Such Dangling Tuples are
invalid; and the DBMS has to ensure that such tuples do not exist in the
database.

Specification of Primary Keys and Foreign Keys in SQL

In SQL, the FOREIGN KEY – PRIMARY KEY mapping is explicitly


specified in a CREATE TABLE statement.

CREATE TABLE RESULT
( Roll_No CHAR (10),
Sub_Code CHAR (7),
Semester INT,
Marks INT,
PRIMARY KEY (Roll_No, Sub_Code),
FOREIGN KEY (Roll_No) REFERENCES STUDENT (Univ_Roll_No),
FOREIGN KEY (Sub_Code) REFERENCES SUBJECT (Sub_Code),
CHECK (Semester BETWEEN 1 AND 8),
CHECK (Marks BETWEEN 0 AND 100));


Example:- Let there be two relations subject (SUBJECT) and result


(RESULT). Attribute Sub_Code in RESULT is a Foreign Key referencing
primary key Sub_Code of relation SUBJECT.

subject
Sub_Code Title
TCS-401 Comp Org
TCS-402 DBMS
TCS-301 Data Structure

result
Roll_No Sub_Code Semester Marks
0209130010 TCS-301 3 66
0209130010 TCS-401 4 57
0209130010 TCS-403 4 78

The relation result has a tuple <“0209130010”, “TCS-403”, 4, 78>,
which is attempting to reference a “Sub_Code” value of “TCS-403” which
does not exist in the relation subject, where “Sub_Code” is the Primary Key.
Thus, this tuple is a dangling tuple, which is invalid. This tuple must be
deleted from table result, or a tuple must be added to relation “subject”
with “Sub_Code” value = “TCS-403”.
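
Dangling tuples can be located by comparing the foreign-key values of one relation with the primary-key values of the other. A minimal sketch in Python, using the two tables of this example (the variable names are assumptions for illustration):

```python
# Relation subject: Sub_Code (primary key) -> Title.
subject = {
    "TCS-401": "Comp Org",
    "TCS-402": "DBMS",
    "TCS-301": "Data Structure",
}

# Relation result: (Roll_No, Sub_Code, Semester, Marks).
result = [
    ("0209130010", "TCS-301", 3, 66),
    ("0209130010", "TCS-401", 4, 57),
    ("0209130010", "TCS-403", 4, 78),
]

# A tuple dangles when its Sub_Code matches no primary-key value in subject.
dangling = [t for t in result if t[1] not in subject]

print(dangling)
```

A DBMS performs the equivalent check itself on every insert, update and delete once the FOREIGN KEY clause has been declared.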

How to determine the Primary Keys of Relation Schemas?

1. Strong Entity Set. The Primary key of the entity set forms the primary
key of the relation.

2. Weak Entity Set. Primary key of the relation comprises the union of the
primary key of the strong entity set on which the weak entity set is existence-
dependent and the discriminator of the weak entity set.

3. Relationship Set. The union of the primary keys of the related entity
sets becomes a super key of the relation. If the relationship is many to many,
then this super key is also the primary key of the relation. If the relationship is
many-to-one, then primary key of the “many-side” entity set is the primary key of
the relation. If the relationship is one-to-one, then primary key of any of the
related entity sets can be the primary key of the relation.

4. Combining of Tables. A many-to-one binary relationship R from A to
B can be represented by a table whose attributes are the union of
the Primary Keys of A and B and the descriptive attributes of R. Such a table can
be combined with the table of the “Many-Side” Entity Set. The attributes of the
combined table will be a union of the attributes of the two fused tables. So, the
Primary Key of the “One-Side” Entity Set will also form part of the table for the
“Many-Side” Entity Set. For example, the table for ACCOUNT also contains the
key of Entity Set BRANCH.

For a one-to-one relationship set, the relationship table can be combined with the
table of any of the participating entity sets.

5. Multi-valued Attributes. A multi-valued attribute M is represented by a


table consisting of the primary key of the entity set of which M is an attribute plus
a column C holding individual values of attribute M. Entire set of attributes of the
resultant relation forms a primary key of the relation.

A SAMPLE DATABASE “ACADEMICS DB”

Let us consider an Academics Database “ACADEMICS DB”, with following Entity


Sets & Relationship Sets:-

Entity Sets:-

(i) STUDENT with Attributes Roll_No & S_Name. Suppose attribute


Roll_No has a unique value for each student.
(ii) SUBJECT with Attributes Sub-Code,Title and Semester
(iii) FACULTY with Attributes Fac_Code & Fac_Name
(iv) DEPT with Attributes D_Name & HOD

Relationship Sets:-

(i) SUB-OFFERED A many-to-many relationship between entity


sets FACULTY & SUBJECT. This implies that a Faculty can offer any
number of subjects in a semester and a subject can be offered by any
number of faculty members.

(ii) RESULT A many-to-many relationship between the entity sets


STUDENT and the aggregated relationship SUB_OFFERED with
descriptive attribute MARKS. This relationship represents the Marks
obtained by the students in various subjects. Assume that each student
can take many of the offered subjects and each offered subject can be
taken by many students.

(iii) FAC-DEPT A many-to-one relationship between the entity sets


FACULTY & DEPT with descriptive attribute DOJ (Date of Joining). This
implies that a Dept can have many faculty but a faculty can belong to only
one Dept.


The above Database can be represented by the following E-R Diagram:-

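[E-R diagram: entity sets SUBJECT (Sub_Code, Title, Semester), STUDENT (Roll_No, S_Name, S_Addr), FACULTY (Fac_Code, Fac_Name) and DEPT (Dept_Name, HOD). FACULTY and SUBJECT are related through SUB-OFFERED, which is aggregated and related to STUDENT through RESULT (descriptive attribute Marks). FACULTY and DEPT are related through FAC-DEPT.]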
Relational Database:-

The above scenario can be represented by the following Relational


Schemas:-

STUDENT (Roll_No, S_Name)


SUBJECT(Sub_Code, Title, Semester)
DEPT (D_Name, HOD)
FACULTY (Fac_Code, Fac_Name, D_Name)
SUB_OFFERED (Fac_Code, Sub_Code)
RESULT (Roll_No, Fac_Code, Sub_Code, Marks)

No table is required for the “many-to-one” relationship set “FAC-DEPT”; it
is combined with the “many-side” entity-set “FACULTY”; that is why the table for
“FACULTY” also includes the Primary Key “D_Name” (labelled “Dept_Name” in the
diagrams) of entity set “DEPT”. The attribute “D_Name” forms a Foreign Key in table
“FACULTY”, referencing the Primary Key of table “DEPT”.


“SUB_OFFERED” represents a “many-to-many” relationship between the


entity sets “FACULTY” and “SUBJECT”; that is why the table “SUB_OFFERED”
contains primary keys of both participating entity sets. The attributes “Fac_Code”
and “Sub_Code” form foreign keys in this table; “Fac_Code” is referencing the
primary key of “FACULTY” and “Sub_Code” is referencing the primary key of
“SUBJECT”.

The relationship set “RESULT” represents a “many-to-many” relationship


between aggregated relationship set “SUB_OFFERED” and entity set
“STUDENT”. The table “RESULT” includes its descriptive attribute “Marks” and
the primary keys of both “SUB-OFFERED” and “STUDENT”. The attributes
“Roll_No”, “Sub_Code” and “Fac_Code” form foreign keys in table “RESULT”.

Schema Diagram The Schema diagram represents diagrammatically the Relation


Schemas of a database. A Relation Schema is depicted by a rectangle, divided into two
parts- the top part depicts key-attributes and the bottom part depicts the non-key
attributes. As compared to E-R Diagram, the Schema Diagram also depicts PK-FK
relationship between various relation schemas, by a directed edge drawn from FK to PK.
The above E-R Diagram can be represented by a Schema Diagram, as follows:-

[Schema diagram, listed as relation: key attributes | non-key attributes]

SUBJECT: Sub_Code | Title, Semester
STUDENT: Roll_No | S_Name, S_Addr
RESULT: Roll_No, Sub_Code, Fac_Code | Marks
SUBJECT-OFFERED: Fac_Code, Sub_Code |
FACULTY: Fac_Code | Fac_Name, Dept_Name
DEPT: Dept_Name | HOD


Assignment: 2

Ex.3.1 Explain mathematically a relation (table) “r” defined on a Schema R (A1,


A2, ….., An). Define its degree and cardinality.

Ex.3.2 Explain the concepts of Domain Constraint, Key Constraint and


Referential Integrity Constraint.

Ex.3.3 Make Schema diagrams for the following set of schemas:-

(a) Student (Roll_No, S_Name, S_Address)


Subject (Sub_Code, Title, Credits, Semester)
Dept (D_Code, D_Name, HOD)
Faculty (F_Code, F_Name, D_Code)
Assigned (F_Code, Sub_Code)
Result (Roll_No, Sub_Code, Semester, Marks)

(b) Emp (E#, E_Name, Salary, D#)


Dept (D#, D_Name, Mgr#, Total_Salary)

Where D# is foreign key in Emp referencing D# of Dept and Mgr# is


foreign key in Dept referencing E# of Emp.

(c) Supplier (S#, S_Name, S_City)


Part (P#, P_Name, P_Weight)
Project (J#, J_Name, J_City, Manager)
Order (S#, P#, J#, Qty)

Ex.3.4 What additional information is conveyed by a Schema Diagram as


compared to an E-R Diagram?


CHAPTER 4
RELATIONAL ALGEBRA

The Relational Algebra is a Procedural query language. A query in Relational
Algebra (RA) has to specify not only ‘what information is required’ but also ‘how
to extract this information’. It is capable of performing operations pertaining to
information retrieval and information update. A Query in Relational Algebra will
involve the following operations:-

Basic Operations

1. Select (σ)
2. Project (π)
3. Set Union (∪)
4. Set Difference (−)
5. Cartesian Product (×)
6. Rename (ρ)

Additional Operations

7. Set Intersection (∩)
8. Natural Join (⋈ / *)
9. Theta Join (⋈θ)
10. Divide (÷)
11. Assignment (←)

Extended RA Operations

12. Generalized Project (π)
13. Aggregate Functions (G)
14. Outer Join
(a) Left Outer Join (⟕)
(b) Right Outer Join (⟖)
(c) Full Outer Join (⟗)

1. Select (σ) The Select operation σ P (r) selects those tuples from relation
r, which satisfy predicate P.

The predicate P will involve:-
(a) Attributes from Schema R of r
(b) Literals
(c) Comparison Operators: <, >, ≤, ≥, =, ≠
(d) Logical Operators: ∧ (and), ∨ (or), ¬ (not)


Degree of the resultant relation will be equal to the degree of the argument
relation r = Degree (R).

Cardinality of the resultant relation will be ≤ cardinality (r).

Example: Consider the schema:-


EMP (E#, E_Name, E_City, E_Street, Salary, D#)
DEPT (D#, D_Name, D_City, Total_Sal)

Where Total-Sal is the total salary of all employees of a


Department.

And consider the following relations (tables) on the above


schemas:-

Emp
E# E_Name E_City E_Street Salary D#

001 Ajay Noida Sec-25 50000 03


003 Vijay G Noida Alpha 15000 03
004 Ram Delhi RKP 15000 02
005 Shyam Noida Sec-26 45000 03
007 Vishal Noida Sec-25 37000 02
010 Raju G Noida Beta 32000 01
012 Vikash Noida Sec-27 35000 01

Dept
D# D_Name D_City Total_Sal

01 Marketing Delhi 67000


02 Dispatch G Noida 52000
03 Finance Delhi 1100000

Query 1: Get information of those employees of Department Number 03,
who are drawing salary more than 25000.

σ D# = ‘03’ ∧ Salary > 25000 (emp)

The resultant relation will be:-

E# E_Name E_City E_Street Salary D#

001 Ajay Noida Sec-25 50000 03


005 Shyam Noida Sec-26 45000 03
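
Query 1 can be reproduced with a few lines of Python. In this sketch a relation is modelled as a list of dictionaries and the predicate P as a function; the helper name select is an assumption for illustration, while the data is the Emp table given above.

```python
ATTRS = ("E#", "E_Name", "E_City", "E_Street", "Salary", "D#")
ROWS = [
    ("001", "Ajay", "Noida", "Sec-25", 50000, "03"),
    ("003", "Vijay", "G Noida", "Alpha", 15000, "03"),
    ("004", "Ram", "Delhi", "RKP", 15000, "02"),
    ("005", "Shyam", "Noida", "Sec-26", 45000, "03"),
    ("007", "Vishal", "Noida", "Sec-25", 37000, "02"),
    ("010", "Raju", "G Noida", "Beta", 32000, "01"),
    ("012", "Vikash", "Noida", "Sec-27", 35000, "01"),
]
emp = [dict(zip(ATTRS, row)) for row in ROWS]

def select(relation, predicate):
    """Select: keep the tuples satisfying the predicate; attributes are unchanged."""
    return [t for t in relation if predicate(t)]

# Predicate of Query 1: D# = '03' AND Salary > 25000.
query1 = select(emp, lambda t: t["D#"] == "03" and t["Salary"] > 25000)

for t in query1:
    print(t["E#"], t["E_Name"], t["Salary"], t["D#"])
```

The result keeps all six attributes (same degree) and can never have more tuples than emp.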


2. Project (π) The Project operation π S (r) projects the attribute list S from
r(R), where S ⊆ R. Any duplicate tuples in the result are automatically
eliminated.

Degree of the resultant relation will be ≤ degree (R).
Cardinality of the resultant relation will be ≤ cardinality (r). The cardinality of the resultant
relation will be equal to the cardinality of r if S includes a key of R.

Query 2: Get E#, E_City and E_Street of all employees.

π E#, E_City, E_Street (emp)

The result will be:-

E# E_City E_Street

001 Noida Sec-25


003 G Noida Alpha
004 Delhi RKP
005 Noida Sec-26
007 Noida Sec-25
010 G Noida Beta
012 Noida Sec-27

Query 3: Get D# and D_Name of those departments, which have total salary
more than 1000000.

π D#, D_Name (σ Total_Sal > 1000000 (dept))

Result:-

D# D_Name

03 Finance
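
The elimination of duplicates is the part of Project that is easiest to overlook. The sketch below, in Python, projects the Emp relation given above first on E_City alone and then on a list that includes the key E#; the helper name project is an assumption for illustration.

```python
# (E#, E_City) pairs taken from the Emp table above.
emp = [
    {"E#": "001", "E_City": "Noida"},
    {"E#": "003", "E_City": "G Noida"},
    {"E#": "004", "E_City": "Delhi"},
    {"E#": "005", "E_City": "Noida"},
    {"E#": "007", "E_City": "Noida"},
    {"E#": "010", "E_City": "G Noida"},
    {"E#": "012", "E_City": "Noida"},
]

def project(relation, attrs):
    """Project: keep only attrs and drop duplicate tuples (first occurrence wins)."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

cities = project(emp, ["E_City"])             # no key in the list: duplicates collapse
with_key = project(emp, ["E#", "E_City"])     # key included: cardinality is preserved

print(cities)
print(len(with_key))
```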

3. Set Union (∪) The ‘Set Union’ operation r ∪ s is a binary operation
that takes two argument relations r and s as input and produces a single
resultant relation, which is the union of the sets of tuples of the two argument
relations. Duplicate tuples are automatically eliminated from the result. This
implies that a tuple t will appear in r ∪ s if t exists in r or in s or in both r and s.
For the r ∪ s operation to be feasible, the relations r and s must be compatible,
which implies that:-


(a) Both r and s must be of same degree i.e. they must have same
number of attributes.

(b) For all i, the domain of ith attribute of r must be same as the domain of
the ith attribute of s.

Degree (r ∪ s) = degree (r) = degree (s)
Cardinality (r ∪ s) = cardinality (r) + cardinality (s) – cardinality (r ∩ s)

Example Consider the Schema:-

DEPOSIT (Cust_Name, Account_No)


LOAN (Cust_Name, Loan_No)

Let the Tables on the above schemas be:-

Deposit
Cust_Name Account_No

Ajay A-101
Vijay A-103
Ram A-107

Loan
Cust_Name Loan_No

Vishal L-103
Ram L-102

Query 4: Get the names of those customers, who have either account
or loan in the bank.

π Cust_Name (Deposit) ∪ π Cust_Name (Loan)

Result:-

Cust_Name


Ajay
Vijay
Ram
Vishal

4. Set Difference The ‘Set Difference’ operation r – s, between two


relations ‘r’ and ‘s’, produces a relation with tuples which are there in ‘r’ but not
there in ‘s’. For the operation r - s to be feasible, the relations r and s must be
compatible as in the case of Set Union.

Degree (r - s) = degree(r) = degree(s)


Cardinality (r - s) = cardinality (r) – cardinality (r ∩ s)

Query 5: Get the names of those customers who have account in the bank,
but do not have a loan.

π Cust_Name (Deposit) − π Cust_Name (Loan)

Result:-

Cust_Name

Ajay
Vijay
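
Queries 4 and 5 map directly onto Python's built-in set type, which also eliminates duplicates automatically. The sketch below uses the customer names from the Deposit and Loan tables above and checks the two cardinality formulas; the variable names are assumptions for illustration.

```python
# Projection of Deposit and of Loan on Cust_Name.
deposit_names = {"Ajay", "Vijay", "Ram"}
loan_names = {"Vishal", "Ram"}

union = deposit_names | loan_names          # Query 4
difference = deposit_names - loan_names     # Query 5
common = deposit_names & loan_names         # tuples present in both

# Cardinality (r union s) = cardinality (r) + cardinality (s) - cardinality (common)
union_ok = len(union) == len(deposit_names) + len(loan_names) - len(common)
# Cardinality (r - s) = cardinality (r) - cardinality (common)
difference_ok = len(difference) == len(deposit_names) - len(common)

print(sorted(union), sorted(difference), union_ok, difference_ok)
```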

5. Cartesian Product The Cartesian Product of two relations r and s is


expressed as r x s .

The resultant relation will be on schema, that will be concatenation of the two
schemas R and S, expressed as (R,S).

For each tuple tr ∈ r and each tuple ts ∈ s, there will be a tuple t in r x s, such that
t [R] = tr and t [S] = ts.

Degree (r x s) = degree (R) + degree (S)


And Cardinality (r x s) = cardinality (r) * cardinality (s)

Since the same attribute names may appear in R and S, the notation r.attribute-name or
s.attribute-name is used to distinguish such attributes. For those attributes which
appear in only one of the schemas, the relation prefix is not required.

Query 6:
Deposit X Loan


This will produce a resultant relation having 04 attributes, 02 from


DEPOSIT and 02 from LOAN i.e. Deposit.Cust_Name, Account_No,
Loan.Cust_Name and loan_No. The resultant Table will have 6 tuples,
as shown below:-

Result:- Deposit X Loan

Deposit.Cust_Name Account_No Loan.Cust_Name Loan_No

Ajay A-101 Vishal L-103


Ajay A-101 Ram L-102
Vijay A-103 Vishal L-103
Vijay A-103 Ram L-102
Ram A-107 Vishal L-103
Ram A-107 Ram L-102
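
The same table can be generated with itertools.product from Python's standard library. This is only a sketch of the operation on the Deposit and Loan tables above; each result tuple is the concatenation of one Deposit tuple and one Loan tuple.

```python
from itertools import product

deposit = [("Ajay", "A-101"), ("Vijay", "A-103"), ("Ram", "A-107")]
loan = [("Vishal", "L-103"), ("Ram", "L-102")]

# Pair every tuple of deposit with every tuple of loan.
cross = [d + l for d, l in product(deposit, loan)]

for row in cross:
    print(*row)

print(len(cross))       # cardinality (r) * cardinality (s)
print(len(cross[0]))    # degree (R) + degree (S)
```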

If the names of the argument relations are not distinct (which is the case when
Cartesian Product of a relation with itself is specified), rename operation, as
explained below, is used to rename one of the arguments.

6. Rename The rename operation, denoted by the Greek letter rho (ρ),
is used to rename a relation and, optionally, its attributes.

Example ρ x (A1, A2, ….., An) (E)

The result of relational-algebra expression E is returned under the name x. The n
attributes of the resultant relation are named as A1, A2,…..,An respectively.

Query 7:

Loan x ρ k (CN, LN) (Loan)

Result:-

Cust_Name   Loan_No   CN       LN

Vishal      L-103     Vishal   L-103
Vishal      L-103     Ram      L-102
Ram         L-102     Vishal   L-103
Ram         L-102     Ram      L-102


Query 8: Determine the largest salary amongst all employees.

π Salary (emp) - π emp.Salary (σ emp.Salary < k.Salary (emp x ρ k (emp)))

Result:-

Salary

50000
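The idea behind Query 8 is that the inner expression collects every salary that is smaller than some other salary, and the set difference then leaves only the largest one. A minimal Python sketch follows; the emp tuples are hypothetical sample data chosen to agree with the result shown above.

```python
# Hypothetical emp relation: (E_Name, Salary)
emp = {("Ajay", 30000), ("Vijay", 50000), ("Ram", 20000)}

# emp x rho_k(emp): pair every tuple with every tuple of a renamed copy k.
pairs = {(e, k) for e in emp for k in emp}

# sigma emp.Salary < k.Salary followed by pi emp.Salary:
# these are the salaries that are NOT the largest.
not_largest = {e[1] for (e, k) in pairs if e[1] < k[1]}

# pi Salary (emp) minus the salaries found above.
all_salaries = {e[1] for e in emp}
largest = all_salaries - not_largest
print(largest)
```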

Formal definition of Basic Relational Algebra

A basic expression in relational algebra consists of one of the following:-

(a) A relation in the database
(b) A constant relation

A general expression is constructed out of smaller sub-expressions. Let E1 and E2 be
relational-algebra expressions. Then, the following are also expressions:-

(a) E1 ∪ E2
(b) E1 – E2
(c) E1 x E2
(d) σ P (E1) where P is a predicate on the attributes in E1.
(e) π S (E1) where S is a list consisting of some of the attributes in E1.
(f) ρ x (E1) where x is the new name for the result of E1.

Additional Operations of Relational Algebra

The following are additional operations, which do not add any power to the
relational algebra, but simplify common queries.

7. Set Intersection The ‘Set Intersection’ operation r ∩ s, between two
relations ‘r’ and ‘s’, produces a relation with tuples which are there in ‘r’ as well as
in ‘s’. For the operation r ∩ s to be feasible, the relations r and s must be
compatible as in the case of Set Union and Set Difference.

This operation is called additional, since it can be expressed in terms of the basic
operations, as shown below:-

r ∩ s = r – (r - s) i.e. eliminate those tuples from r, which exist in r but not in s.
OR
r ∩ s = s – (s - r) i.e. eliminate those tuples from s, which exist in s but not in r.


Query 9: Get the names of those customers who have an account as well as a
loan in the bank.

π Cust_Name (Deposit) ∩ π Cust_Name (Loan)

Result:-

Cust_Name

Ram
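The two identities for set intersection can be checked quickly with Python sets. The sketch below uses the customer names from the Deposit and Loan examples; the variable names are chosen only for this illustration.

```python
deposit_names = {"Ajay", "Vijay", "Ram"}   # pi Cust_Name (Deposit)
loan_names = {"Vishal", "Ram"}             # pi Cust_Name (Loan)

via_r = deposit_names - (deposit_names - loan_names)   # r - (r - s)
via_s = loan_names - (loan_names - deposit_names)      # s - (s - r)

print(via_r, via_s)
```

Both expressions give the same relation as Python's built-in intersection operator, deposit_names & loan_names.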

8. Natural Join Natural Join is equivalent to the following sequence of
operations on two argument relations:-

(a) A Cartesian product of the two argument relations.

(b) A Selection forcing equality on common attributes of the two
argument relations.

(c) Removing duplicate attributes from the resultant relation.

It is denoted by the symbol *.

Consider two relation schemas R and S, which can be treated as sets of
attributes. Then R ∩ S denotes the set of attributes, which are common to both
R and S. And R ∪ S denotes the set of attributes, which are there either in R or in
S or in both. R - S denotes the set of attributes, which are there in R but not in S.
Then, natural join of relations ‘r’ and ‘s’, denoted by r * s, is a relation on schema
R ∪ S and is formally defined as:-

r * s = π R ∪ S (σ r.A1 = s.A1 ∧ r.A2 = s.A2 ∧ ……. ∧ r.An = s.An (r x s))

Where R ∩ S = { A1, A2, ………, An }

Query 10: Get Cust_Name, Account_No and Loan_No of the customers having
account as well as loan.

Deposit * Loan

The above query is equivalent to:-

π Deposit.Cust_Name, Account_No, Loan_No (σ Deposit.Cust_Name = Loan.Cust_Name (Deposit x Loan))

Note how much more concise the expression becomes with the use of the
“Natural Join” operation.
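The three steps of the natural join (product, selection on common attributes, removal of duplicate attributes) can be sketched in Python as below. The function natural_join is a hypothetical helper written for this illustration; relations are lists of dictionaries.

```python
deposit = [
    {"Cust_Name": "Ajay", "Account_No": "A-101"},
    {"Cust_Name": "Vijay", "Account_No": "A-103"},
    {"Cust_Name": "Ram", "Account_No": "A-107"},
]
loan = [
    {"Cust_Name": "Vishal", "Loan_No": "L-103"},
    {"Cust_Name": "Ram", "Loan_No": "L-102"},
]

def natural_join(r, s):
    """Return r * s: combine tuples that agree on all common attributes."""
    result = []
    for tr in r:                      # every pair (tr, ts) of the product r x s
        for ts in s:
            common = set(tr) & set(ts)
            if all(tr[a] == ts[a] for a in common):   # equality on R ∩ S
                result.append({**tr, **ts})           # common attributes kept once
    return result

print(natural_join(deposit, loan))
```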


9. Assignment Sometimes it is convenient to write a relational-algebra
expression by dividing it into sub-expressions and assigning the intermediate
results to temporary relation-variables. The assignment operation, denoted by ←,
assigns the result of a relational-algebra sub-expression to a temporary
relation-variable. The relation-variable may be used in subsequent expressions, just like
any permanent relation. Assignment to a permanent relation would cause its
modification.

Ex. The Divide (÷) operation r ÷ s can be divided into parts as follows:-

temp1 ← π R-S (r)

temp2 ← π R-S ((temp1 x s) - π R-S,S (r))

result ← temp1 – temp2

The result of the expression on the right is assigned to the variable on the left.

Division (÷) Let r(R) and s(S) be two relations and let S ⊆ R, that is, every
attribute in schema S is also there in schema R. The relation obtained by dividing
relation ‘r’ by relation ‘s’, i.e. r ÷ s, is a relation on schema R-S (i.e. the schema
containing those attributes of R which are not there in S).

A tuple t will appear in r ÷ s, if and only if the following two conditions are
satisfied.
1. t ∈ π R-S (r)

2. For every tuple ts ∈ s, there must be a tuple tr ∈ r such that:-

(a) tr[S] = ts
(b) tr[R-S] = t

Let r(R) and s(S) be two relations such that schema S ⊆ R. Then, the DIVIDE
operation r ÷ s is defined as:-

r ÷ s = π R-S (r) - π R-S ((π R-S (r) x s) - π R-S,S (r))

Let temp1 ← π R-S (r)
temp2 ← ((temp1 x s) - π R-S,S (r))
r ÷ s = temp1 - π R-S (temp2)

Let r (R) =


A B C D
a1 b1 c1 d1
a2 b2 c2 d2
a1 b2 c1 d2

And s (S) =
B D
b1 d1
b2 d2

temp1 = π R-S (r)

A C
a1 c1
a2 c2

(temp1 x s)
A C B D
a1 c1 b1 d1
a1 c1 b2 d2
a2 c2 b1 d1
a2 c2 b2 d2

π R-S,S (r)
A C B D
a1 c1 b1 d1
a2 c2 b2 d2
a1 c1 b2 d2

temp2 = (temp1 x s) - π R-S,S (r)

A C B D

a2 c2 b1 d1

π R-S (temp2)

A C

a2 c2

r ÷ s = temp1 - π R-S (temp2)

A C

a1 c1
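The worked example above can be reproduced with a short Python sketch. Relations are modelled as sets of tuples with an explicit schema; the helpers project and divide are names assumed for this illustration and follow the temp1/temp2 steps of the definition.

```python
from itertools import product

R = ("A", "B", "C", "D")
S = ("B", "D")
r = {("a1", "b1", "c1", "d1"), ("a2", "b2", "c2", "d2"), ("a1", "b2", "c1", "d2")}
s = {("b1", "d1"), ("b2", "d2")}

def project(rel, schema, attrs):
    """pi attrs (rel): keep the listed attributes, in the listed order."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}

def divide(r, R, s, S):
    """r ÷ s computed as temp1 - pi R-S (temp2)."""
    rest = tuple(a for a in R if a not in S)            # schema R-S
    temp1 = project(r, R, rest)                         # pi R-S (r)
    all_pairs = {t + ts for t, ts in product(temp1, s)} # temp1 x s, schema (R-S, S)
    reordered = project(r, R, rest + S)                 # pi R-S,S (r)
    temp2 = all_pairs - reordered                       # combinations missing from r
    return temp1 - project(temp2, rest + S, rest)

print(divide(r, R, s, S))
```

Only (a1, c1) is paired in r with every tuple of s, so it is the only tuple in the quotient.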

Example: Find the Names of all customers, who have accounts in all
branches of Delhi.

r1 ← π Branch-Name (σ Branch-City = “Delhi” (Branch))
r2 ← π Customer-Name, Branch-Name (Depositor * Account)
result = r2 ÷ r1

EXTENDED RELATIONAL-ALGEBRA EXPRESSIONS

Generalized Projection
It permits arithmetic functions to be used in the projection list.
Ex. π customer-name, limit – credit (credit-info) where credit-info is a relation
on the schema (customer-name, limit, credit)

Outer Join In Natural Join, the output relation contains only those
tuples of the input relations that satisfy the equality criteria on the values of
their common attributes. The information pertaining to the other tuples of the
input relations does not appear in the output relation. The Outer Join operation
retains such tuples as well. There are three types of Outer Join: Left Outer Join,
Right Outer Join and Full Outer Join. A tuple that has no matching tuple in the
other relation is padded with NULL values in the attributes of that relation.
The symbols for the three outer joins are: Left Outer Join: ⟕, Right Outer Join: ⟖
and Full Outer Join: ⟗
Ex.

Relation customer_residence

Name     Sector   City

Ajay     Sec-26   Noida
Vijay    Sec-24   G-Noida
Ram      Sec-24   Faridabad

Relation bank_account

Name     Account  Branch

Ajay     A-100    Noida
Sharma   A-200    G-Noida
Vijay    A-300    Delhi

Natural Join

Name Sector City account branch


Ajay Sec-26 Noida A-100 Noida
Vijay Sec-24 G-Noida A-300 Delhi

Left Outer Join

Name Sector City account branch


Ajay Sec-26 Noida A-100 Noida
Vijay Sec-24 G-Noida A-300 Delhi
Ram Sec-24 Faridabad null null

customer_residence ⟕ bank_account

Right Outer Join

Name sector City account branch


Ajay Sec-26 Noida A-100 Noida
Vijay Sec-24 G-Noida A-300 Delhi
Sharma null null A-200 G-Noida

customer_residence ⟖ bank_account

Full Outer Join

Name sector city account Branch


Ajay Sec-26 Noida A-100 Noida
Vijay Sec-24 G-Noida A-300 Delhi
Ram Sec-24 Faridabad Null Null
Sharma null null A-200 G-Noida

customer_residence ⟗ bank_account
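The four join variants on customer_residence and bank_account can be sketched in Python as below. The function join and its how parameter are assumptions made for this sketch; Python's None stands in for NULL, and both relations are assumed to be non-empty.

```python
customer_residence = [
    {"Name": "Ajay", "Sector": "Sec-26", "City": "Noida"},
    {"Name": "Vijay", "Sector": "Sec-24", "City": "G-Noida"},
    {"Name": "Ram", "Sector": "Sec-24", "City": "Faridabad"},
]
bank_account = [
    {"Name": "Ajay", "Account": "A-100", "Branch": "Noida"},
    {"Name": "Sharma", "Account": "A-200", "Branch": "G-Noida"},
    {"Name": "Vijay", "Account": "A-300", "Branch": "Delhi"},
]

def join(r, s, how="natural"):
    """how is one of 'natural', 'left', 'right', 'full'."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    common = [a for a in r_attrs if a in s_attrs]
    out_attrs = r_attrs + [a for a in s_attrs if a not in common]
    result, matched = [], set()
    for tr in r:
        partners = [j for j, ts in enumerate(s)
                    if all(tr[a] == ts[a] for a in common)]
        matched.update(partners)
        for j in partners:
            result.append({**tr, **s[j]})
        if not partners and how in ("left", "full"):
            # unmatched tuple of r, padded with None (NULL)
            result.append({a: tr.get(a) for a in out_attrs})
    if how in ("right", "full"):
        for j, ts in enumerate(s):
            if j not in matched:
                # unmatched tuple of s, padded with None (NULL)
                result.append({a: ts.get(a) for a in out_attrs})
    return result

for how in ("natural", "left", "right", "full"):
    print(how, len(join(customer_residence, bank_account, how)))
```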

Aggregate Functions

Aggregate Functions are the functions which take a collection of values as input
and return a single value as result; like sum, count, avg, min, max.

Ex. G SUM (amount) (loan) - computes the total of all loan amounts
G MAX (amount) (loan) - determines the max amongst loan amounts
G MIN (amount) (loan) - determines the min amongst loan amounts
G AVG (amount) (loan) - computes the average of all loan amounts
G COUNT (amount) (loan) - determines the number of loans held
G COUNT DISTINCT (amount) (loan) - determines the no. of distinct loan amounts

Grouping

The following query will compute the total and max of loan amounts at
each branch and list the results branch-wise.

branch-name G sum (amount), max (amount) (loan)

The result of this query is a relation on schema (branch-name, sum of
amount, max of amount). G indicates that relation loan must be divided
into Groups, based on the value of branch-name.
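Grouping followed by aggregation can be sketched in Python as below. The loan tuples are hypothetical sample data, reduced to (branch-name, amount) for brevity.

```python
from collections import defaultdict

# Hypothetical loan relation: (branch-name, amount)
loan = [("Noida", 1000), ("Noida", 3000), ("Delhi", 2000)]

# Divide the relation into groups on branch-name.
groups = defaultdict(list)
for branch_name, amount in loan:
    groups[branch_name].append(amount)

# Apply sum and max to each group: branch-name G sum(amount), max(amount) (loan)
result = {branch: (sum(amounts), max(amounts)) for branch, amounts in groups.items()}
print(result)
```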

DATABASE MODIFICATION

Deletion
In relational algebra, it is expressed as r ← r – E where r is a relation and E is
a relational-algebra expression.

Ex. account ← account - σ customer-name = “Ajay” (account)
loan ← loan - σ amount > 0 and amount < 500 (loan)

Delete all accounts at branches located in Delhi.

r1 ← σ branch-city = “Delhi” (account * branch)
r2 ← π branch-name, account-number, balance (r1)
account ← account – r2

Insertion
It is expressed as r ← r ∪ E

Ex. account ← account ∪ {(“Noida”, A-500, 5000)}

Insert a new account for all loan holders of Noida branch, with account-
number same as loan-number and an initial balance of 1000.

r1 ← (σ branch-name = “Noida” (borrower * loan))

r2 ← π branch-name, loan-number (r1)

account ← account ∪ (r2 x {(1000)})
depositor ← depositor ∪ π customer-name, loan-number (r1)

Update It is expressed as r ← π F1,F2,……,Fn (r) where Fi is either the ith
attribute of r, if it is not updated, or an expression involving constants and
attributes of r, which gives the new value of the ith attribute.

Ex. account ← π branch-name, account-number, balance ← balance*1.06 (σ balance > 10000 (account))
∪ π branch-name, account-number, balance ← balance*1.05 (σ balance ≤ 10000 (account))

If we want to update only selected tuples from r, we use the following
operation:-
r ← π F1,F2,……,Fn (σ P (r)) ∪ (r - σ P (r))
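The selective update, in which the new values are computed for the selected tuples and the untouched tuples are carried over unchanged, can be sketched in Python as below. The account tuples and the helper update are hypothetical; integer arithmetic is used for the 6% increase only to keep the illustration free of floating-point rounding.

```python
# Hypothetical account relation: (branch-name, account-number, balance)
account = {("Noida", "A-100", 20000), ("Delhi", "A-200", 5000)}

def update(r, p, f):
    """r <- pi F (sigma P (r)) ∪ (r - sigma P (r))"""
    selected = {t for t in r if p(t)}               # sigma P (r)
    return {f(t) for t in selected} | (r - selected)

account = update(
    account,
    lambda t: t[2] > 10000,                         # predicate P
    lambda t: (t[0], t[1], t[2] * 106 // 100),      # balance <- balance * 1.06
)
print(account)
```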

Referential Integrity Constraints in Database Modification

Let r1(R1) and r2(R2) be two relations, with K as the primary key of R1 and α as a
Foreign Key in R2 referencing K in R1.

Insert If a tuple t2 is inserted in r2, the system must ensure that there exists a
tuple t1 ∈ r1 such that t1[K] = t2[α], that is, t2[α] ∈ π K (r1).

Delete If a tuple t1 is deleted from r1, the system must compute the set of
tuples in r2 that reference t1, that is, the set S = σ α = t1[K] (r2).

If set S is not empty, then either the Delete command should be
rejected as an error or all tuples that reference t1 (directly or indirectly) must also
be deleted. This may result in a cascading delete, since tuples of other
relations may reference tuples in r2 that in turn reference t1 ∈ r1.

Update

Case I If a tuple t2 is updated in r2 such that the update affects the attribute set α
and t2’ is the modified tuple, the system must ensure that there is a tuple t1 ∈ r1
such that t1[K] = t2’[α], i.e. t2’[α] ∈ π K (r1) must be satisfied.

Case II If a tuple t1 is modified in r1 such that the update affects the primary
key attributes K, the system must compute the set of tuples in r2 that reference t1,
that is, the set S = σ α = t1[K] (r2).

If this set S is not empty, then either the update command should be rejected as
an error or all tuples that reference t1 (directly or indirectly) must also be
updated. This may result in a cascading update, since other tuples
may reference tuples that reference t1.
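The insert and delete checks can be sketched in Python as below. This is a simplified illustration of the "reject" policy only (no cascading); the dept and faculty tuples, and the helper names, are assumptions made for the sketch.

```python
# r1 = dept with primary key D_Code; r2 = faculty with foreign key D_Code.
dept = [{"D_Code": "CSE"}, {"D_Code": "ECE"}]
faculty = [{"F_Code": "AKG", "D_Code": "CSE"}]

def insert_allowed(t2, alpha, r1, key):
    """True if t2[alpha] appears in pi K (r1)."""
    return t2[alpha] in {t1[key] for t1 in r1}

def referencing(t1, key, r2, alpha):
    """The set S: tuples of r2 whose foreign key equals t1[K]."""
    return [t2 for t2 in r2 if t2[alpha] == t1[key]]

def delete_restrict(t1, r1, key, r2, alpha):
    """Delete t1 from r1, rejecting the delete if S is not empty."""
    if referencing(t1, key, r2, alpha):
        raise ValueError("delete rejected: tuple is still referenced")
    r1.remove(t1)

print(insert_allowed({"F_Code": "XYZ", "D_Code": "ME"}, "D_Code", dept, "D_Code"))
delete_restrict({"D_Code": "ECE"}, dept, "D_Code", faculty, "D_Code")
print(dept)
```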

VIEWS

The entire logical schema of a database is not visible to each and every user of
the database. Security considerations may dictate that certain data be hidden
from certain users. Besides security reasons, the designers may wish to
create a user-friendly set of relations, customized to the specific requirements of
different categories of users.

Any relation that does not form part of the logical schema, but is made visible to
a set of users as a virtual relation, is called a view. The view is not stored as a
physical table in the database. Only a definition of the view is stored in the data
dictionary.

The syntax for creation of a view is:-

create view v as <query expression>

Ex. create view all-customer as π branch-name, customer-name (depositor *
account) ∪ π branch-name, customer-name (borrower * loan)

Once a view has been defined, it can be referenced in queries just like a
physical table.

create view noida-customer as π customer-name (σ branch-name = “Noida” (all-customer))

Whenever a view is defined, the definition is stored in the data dictionary.
Whenever a reference is made to the view in a query, a table is created on the
view schema and the query is answered. Thereafter, the view table is deleted. Since
a view table needs to be created every time a reference is made to a view, it has
its associated overheads.

Materialized Views To obviate these overheads, some database systems
support storing of view tables like physical relations. In this case, the view tables
need to be updated as and when the parent physical relation is updated. Such
views are called materialized views. The applications that involve frequent use of
a view or the applications requiring fast response to view based queries, would
prefer materialized views. But the materialized views suffer from the update
overheads.


Dependency of Views Consider the following View Dependency Graph.

noida-customer
      |
 all-customer
   /      \
borrower  loan

One View may be used in the expression defining another View. A View relation
v1 is said to depend directly on a View relation v2 if v2 is used directly in the
expression defining v1. As shown in the above View Dependency Graph,
all-customer is directly dependent on borrower and loan. A View relation v1 is said
to depend on another View relation v2, if a path exists in the View Dependency
Graph from v2 to v1. A View relation v is said to be recursive if it depends on
itself.

View Expansion View Expansion is used to define the meaning of Views in
terms of other views. Let View v1 be defined by an expression e1 that may itself
contain uses of View relations. Each View relation in e1 is replaced by the
expression defining it. That definition may itself contain references to View
relations, which are further replaced by their definitions. This process is
repeated till there are no more uses of View relations in e1. This process is
called View Expansion.
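View expansion is a recursive substitution, which can be sketched in Python as below. The representation is an assumption made for this illustration: an expression is either a relation name (a string) or a tuple (operator, operand, ...), and the view definitions are simplified forms of all-customer and noida-customer. The sketch assumes that no view is recursive.

```python
# View definitions, simplified: operator names only, predicates omitted.
views = {
    "all-customer": ("union",
                     ("project", ("natural-join", "depositor", "account")),
                     ("project", ("natural-join", "borrower", "loan"))),
    "noida-customer": ("project", ("select", "all-customer")),
}

def expand(expr):
    """Replace every view name in expr by its (expanded) definition."""
    if isinstance(expr, str):
        return expand(views[expr]) if expr in views else expr
    operator, *operands = expr
    return (operator, *(expand(e) for e in operands))

def relations_used(expr):
    """Collect the relation names that appear in an expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(relations_used(e) for e in expr[1:]))

expanded = expand("noida-customer")
print(expanded)
print(relations_used(expanded))
```

After expansion, only physical relations remain in the expression.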


Assignment: 3

Ex.4.1 Consider the following schema:-

Student (Roll_No, S_Name, S_Address, S_DOB)
Subject (Sub_Code, Title, Credits, Semester)
Dept (D_Code, D_Name, HOD)
Faculty (F_Code, F_Name, D_Code, Designation)
Assigned (F_Code, Sub_Code)
Result (Roll_No, Sub_Code, Semester, Marks)

Write the following queries in Relational Algebra:-

(i) Get the Titles of the subjects offered in “Even” Semester.

(ii) Get the names of the students born after “17-APR-1980”.

(iii) Get the Names of “CSE” department faculty.

(iv) Get the Titles of the subjects assigned to “ECE” department
faculty.

(v) Get Average Marks, Max Marks, Min Marks and Total Marks in
the Result.

(vi) Get Average Marks scored by each student.

(vii) Get Average Marks scored in each subject.

(viii) Get the name(s) of the students scoring highest Average Marks.

(ix) Get the title(s) of the subjects in which students have scored
highest average marks.

(x) Get the number of subjects assigned to each faculty.

(xi) Delete the tuples of result having marks less than 30.

(xii) Delete the tuple of Faculty “Ajay”.

(xiii) Change the Designation of Faculty “Vijay” to “Asst Prof”.

(xiv) Add a new faculty with F_Code = “AKG”, F_Name = “Ajay Kr
Garg”, D_Code = “CSE” and Designation = “Lecturer”.


Ex.4.2 Consider the following schema:-

Emp (E#, E_Name, Salary, D#)
Dept (D#, D_Name, Mgr#)

Where D# is a foreign key in Emp referencing D# of Dept and Mgr# is a
foreign key in Dept referencing E# of Emp.

Write the following queries in Relational Algebra:-

(i) Get the names of the employees working in “Manufacturing”
Department.

(ii) Get the names of the employees drawing salary between 20000 and
50000.

(iii) Get Min Salary, Max Salary and Average Salary of each
department.

(iv) Get the name(s) of the department(s) having highest Average
salary.

(v) Get the name of each employee along with its manager.

(vi) Get the number of employees working in each department.

(vii) Get the name(s) of the department with total salary of employees
more than 50000000.

(viii) Get the name(s) of the department(s) having least number of
employees.

(ix) Get the names of the employees getting salary more than the
average salary of all the employees.

(x) Increase the salary of all employees by 20%.

(xi) Increase the salary of all managers by 50%.

(xii) Department Name “Scrap” has been closed down. Transfer all
employees of this department to “Absorb” department and delete
the information of “Scrap”.


Ex.4.3 Consider the following schema:-

Supplier (S#, S_Name, S_City)
Part (P#, P_Name, P_Weight)
Project (J#, J_Name, J_City, Manager)
Order (S#, P#, J#, Qty)

Write the following queries in Relational Algebra

(i) Get the Names of suppliers located in “Noida” or “Delhi”.

(ii) What is the weight of Part Number “P110”?

(iii) Who are the suppliers, supplying Part “Clutch Assembly” to Project Name
“Vehicle R&D”?

(iv) Get the names of the suppliers supplying parts to all projects in
“Mumbai”.

(v) Get the names of the suppliers, supplying all the parts, listed in table Part.

(vi) What is the total quantity of each part being supplied?

(vii) What are the parts, whose total quantity being supplied is larger than the
average quantity being supplied of part name “Ignition Switch”?

Ex.4.4 Consider the following current state of table Result:-

Roll_No Sub_Code Marks

101 TCS-401 70
102 TCS-401 80
105 TCS-401 64
110 TCS-401 70
102 TCS-402 92
103 TCS-402 70
105 TCS-402 70
110 TCS-402 68
101 TCS-403 82
102 TCS-403 64
103 TCS-403 72
110 TCS-403 80

What will be the results of the following queries:-

(a) G MAX(Marks), MIN(Marks), SUM(Marks), AVG(Marks) (Result)
(b) Roll_No G AVG(Marks), MAX(Marks), MIN(Marks), COUNT(Sub_Code) (Result)
(c) Sub_Code G AVG(Marks), MAX(Marks), MIN(Marks), COUNT(Roll_No) (Result)
(d) Result ÷ π Sub_Code (Result)

Ex.4.5 Consider the following state of tables Depositor and Borrower

Depositor

Cust_Name Account_Number
Ajay A102
Vijay A110
Ram A111
Vikram A112

Borrower

Cust_Name Loan_Number
Vijay L102
Shyam L111
Ram L110
Ajeet L103

Determine the results of the following operations

(a) Depositor X Borrower
(b) Depositor * Borrower
(c) Left Outer Join of Depositor and Borrower
(d) Right Outer Join of Depositor and Borrower
(e) Full Outer Join of Depositor and Borrower


CHAPTER 5
TUPLE RELATIONAL CALCULUS

Tuple Relational Calculus is a non-procedural query language. It describes the
desired information, without giving a specific procedure for obtaining that
information.

A query in the tuple relational calculus is expressed as { t | P(t) }, which means
‘the set of all tuples t such that predicate P is true for t’.

We use t[A] to denote the value of tuple t on attribute A
and t ∈ r to denote that tuple t is in relation r.

Examples

Get information of the loans having loan amount more than 100000.

{ t | t ∈ loan ∧ t[amount] > 100000 }

Find the loan numbers of the loans having amount more than 100000.

{ t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 100000) }

Find names of the customers who have a loan from Noida branch.

{ t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧
∃ u ∈ loan (u[loan-number] = s[loan-number] ∧
u[branch-name] = “Noida”)) }
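Tuple-relational-calculus queries read much like Python comprehensions: the condition t ∈ r becomes iteration over r, and ∃ becomes any(). The sketch below restates the three queries above; the loan and borrower tuples are hypothetical sample data.

```python
loan = [
    {"loan-number": "L-101", "branch-name": "Noida", "amount": 150000},
    {"loan-number": "L-102", "branch-name": "Delhi", "amount": 90000},
    {"loan-number": "L-103", "branch-name": "Noida", "amount": 50000},
]
borrower = [
    {"customer-name": "Ram", "loan-number": "L-101"},
    {"customer-name": "Vishal", "loan-number": "L-102"},
]

# { t | t ∈ loan ∧ t[amount] > 100000 }
big_loans = [t for t in loan if t["amount"] > 100000]

# { t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 100000) }
big_loan_numbers = {s["loan-number"] for s in loan if s["amount"] > 100000}

# Customers with a loan from Noida branch: ∃ u ∈ loan becomes any(...)
noida_borrowers = {
    s["customer-name"]
    for s in borrower
    if any(u["loan-number"] == s["loan-number"] and u["branch-name"] == "Noida"
           for u in loan)
}

print(big_loans, big_loan_numbers, noida_borrowers)
```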

Find the names of customers having loan or account or both.

{ t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name]) }

Find names of the customers having both loan and account.

{ t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name]) }

Find the names of the customers who have account but not loan.

{ t | ∃ u ∈ depositor (t[customer-name] = u[customer-name])
∧ ¬ ∃ s ∈ borrower (t[customer-name] = s[customer-name]) }


Find names of all customers who have accounts at all branches located in Delhi.

{ t | ∃ x ∈ customer (x[customer-name] = t[customer-name]) ∧
(∀ u ∈ branch (u[branch-city] = “Delhi” ⇒
∃ v ∈ account (v[branch-name] = u[branch-name]
∧ ∃ w ∈ depositor (w[account-number] = v[account-number]
∧ t[customer-name] = w[customer-name])))) }

Formal definition of Tuple Relational Calculus

A Tuple-Relational-Calculus expression is of the form { t | P(t) } where P is a
formula. A formula in Tuple-relational-calculus is made out of atoms. An atom
has one of the following forms:-

(a) s ∈ r, where s is a tuple variable and r is a relation.

(b) s[x] Θ u[y], where s and u are tuple variables, x is an attribute on which s
is defined, y is an attribute on which u is defined, and Θ is a comparison
operator (<, ≤, >, ≥, =, ≠). The attributes x and y should have domains that
can be compared by Θ.

(c) s[x] Θ c, where s is a tuple variable, x is an attribute on which s is defined,
Θ is a comparison operator and c is a constant from the domain of
attribute x.

A formula is built from atoms using the following rules:-

(a) An atom is a formula.

(b) If P1 is a formula then ¬ P1 and (P1) are also formulae.

(c) If P1 and P2 are formulae, then P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2 are also
formulae.

(d) If P1(s) is a formula containing free tuple variable s, and r is a relation, then
∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s)) are also formulae.

The following formulae are equivalent:-

(a) P1 ∧ P2 is equivalent to ¬ (¬ P1 ∨ ¬ P2).
(b) ∀ t ∈ r (P1(t)) is equivalent to ¬ ∃ t ∈ r (¬ P1(t)).
(c) P1 ⇒ P2 is equivalent to ¬ P1 ∨ P2.


Safety of Expressions.

There is a possibility that an unsafe tuple-relational-calculus expression may
generate an infinite relation; like { t | ¬ (t ∈ loan) }. There are infinitely many
tuples which are not there in loan. Thus, we define the domain of a tuple relational
formula. The domain of P, i.e. dom(P), is the set of all values referenced by P.
These include the values that appear in P itself and the values that appear in a
tuple of a relation referenced in P.
For example, dom(t ∈ loan ∧ t[amount] > 1200) is the set of values appearing in
loan and the value 1200. And dom(¬ (t ∈ loan)) is the set of values appearing in
loan.
An expression { t | P(t) } is safe if all values that appear in its result are from its
domain dom(P). Thus the expression { t | ¬ (t ∈ loan) } is not safe, since its result
can have values outside the domain of the expression (the domain of the
expression being the values appearing in relation loan).


Exercises

Ex.5.1 Let there be schemas:-

Account (AN, BN, Bal)
Branch (BN, BC)

Write the following queries in Tuple Relational Calculus:-

(a) π AN (Account)

(b) σ Bal >= 50000 (Account)

(c) π AN (σ BC = “Noida” ∧ Bal >= 100000 (Account * Branch))

(d) π BN (Branch) - π BN (Account)

Also, state the above queries in plain English.

Ex.5.2 Consider the following schema:-

Student (Roll_No, S_Name, S_Address, S_DOB)
Subject (Sub_Code, Title, Credits, Semester)
Dept (D_Code, D_Name, HOD)
Faculty (F_Code, F_Name, D_Code, Designation)
Assigned (F_Code, Sub_Code)
Result (Roll_No, Sub_Code, Semester, Marks)

Write the following queries in Tuple Relational Calculus:-

(i) Get the Titles of the subjects offered in “Odd” Semester.

(ii) Get the names of the students born after “10-DEC-1978”.

(iii) Who is the HOD of faculty “Ajay”?

(iv) Get the Names of “ME” department faculty.

(v) Get the Titles of the subjects assigned to “IC” department faculty.

(vi) Get names of the students scoring more than 80 marks in “DBMS”.

Ex.5.3 Consider the following schema:-

Supplier (S#, S_Name, S_City)
Part (P#, P_Name, P_Weight)
Project (J#, J_Name, J_City, Manager)
Order (S#, P#, J#, Qty)

Write the following queries in Tuple Relational Calculus:-

(i) Get the Names of suppliers located in “Noida”.

(ii) What is the weight of Part Number “P333”?

(iii) Who are the suppliers, supplying Part “Ignition Switch” to Project Name
“Small Car”?

(iv) Get the names of the suppliers who are supplying parts to the projects
located in the same city as the city of the supplier.

(v) Get the names of the suppliers supplying parts to all projects in
“Mumbai”.

Ex.5.4 Consider the following schema:-

Emp (E#, E_Name, Salary, D#)
Dept (D#, D_Name, Mgr#)

Where D# is a foreign key in Emp referencing D# of Dept and Mgr# is a
foreign key in Dept referencing E# of Emp.

Write the following queries in Tuple Relational Calculus:-

(i) Get the names of the employees working in “Sales” Department.

(ii) Get the names of the employees drawing salary more than 20000.

(iii) Get the name of each employee along with its manager.


CHAPTER 6
DOMAIN RELATIONAL CALCULUS

An expression in the Domain Relational Calculus is of the form:-

{ <x1,x2,….,xn> | P(x1,x2,….,xn) }
where x1,x2,….,xn represent domain variables.

P represents a formula, which is composed of atoms, as in the case of
tuple-relational-calculus.

An atom in the domain relational calculus has one of the following forms:-

(a) <x1,x2,….,xn> ∈ r, where r is a relation on n attributes; and x1,x2,….,xn are
domain variables or domain constants corresponding to the respective domains
of n attributes of relation r.

(b) x Θ y, where x and y are domain variables and Θ is a comparison operator
(<, ≤, >, ≥, =, ≠). x and y should have domains which can be compared by Θ.

(c) x Θ c, where x is a domain variable, Θ is a comparison operator, and c is a
constant in the domain of x.

A formula is built from atoms using the following rules:-

(a) An atom is a formula.

(b) If P1 is a formula, then so are ¬ P1 and (P1).

(c) If P1 and P2 are formulae, so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.

(d) If P1(t) is a formula in t, where t is a free domain variable, then
∃ t (P1(t)) and ∀ t (P1(t)) are also formulae.

Examples.

Get information of those loans that have amount > 100000.

{ <l, b, a> | <l, b, a> ∈ loan ∧ a > 100000 }

Find the loan-numbers of those loans for which amount is more than
100000.

{ <l> | ∃ b, a (<l, b, a> ∈ loan ∧ a > 100000) }

Find names of the customers who have a loan from Noida branch & find loan
amount.

{ <c, a> | ∃ l (<c, l> ∈ borrower ∧ ∃ b (<l, b, a> ∈ loan ∧ b = “Noida”)) }

Find the names of the customers who have a loan or an account or both at Noida
branch.

{ <c> | ∃ l (<c, l> ∈ borrower ∧ ∃ b, a (<l, b, a> ∈ loan ∧ b = “Noida”))
∨ ∃ a (<c, a> ∈ depositor ∧ ∃ b, n (<a, b, n> ∈ account ∧ b = “Noida”)) }

Find names of all customers who have accounts at all branches located in Delhi.

{ <c> | ∀ x, y, z ((<x, y, z> ∈ branch ∧ y = “Delhi”) ⇒ ∃ a, b (<a, x, b> ∈
account ∧ <c, a> ∈ depositor)) }
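In the domain relational calculus, variables range over attribute values rather than whole tuples, which corresponds to tuple unpacking in Python. The sketch below restates the first two and the last of the queries above; all tuples are hypothetical sample data. For the "for all" query, the candidate values of c are drawn from depositor, an assumption made here so that the result stays finite: ∀ becomes all() and ∃ becomes any().

```python
loan = {("L-101", "Noida", 150000), ("L-102", "Delhi", 90000)}
branch = {("Connaught", "Delhi", 9000000), ("Saket", "Delhi", 4000000),
          ("Sec-18", "Noida", 2000000)}
account = {("A-1", "Connaught", 500), ("A-2", "Saket", 700),
           ("A-3", "Connaught", 900), ("A-4", "Sec-18", 100)}
depositor = {("Ajay", "A-1"), ("Ajay", "A-2"), ("Vijay", "A-3"), ("Vijay", "A-4")}

# { <l, b, a> | <l, b, a> ∈ loan ∧ a > 100000 }
big_loans = {(l, b, a) for (l, b, a) in loan if a > 100000}

# { <l> | ∃ b, a (<l, b, a> ∈ loan ∧ a > 100000) }
big_loan_numbers = {(l,) for (l, b, a) in loan if a > 100000}

# Customers having an account at every branch located in Delhi.
customers = {c for (c, _) in depositor}
all_delhi = {
    c for c in customers
    if all(
        any((c, a) in depositor for (a, x2, b) in account if x2 == x)
        for (x, y, z) in branch if y == "Delhi"
    )
}

print(big_loans, big_loan_numbers, all_delhi)
```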

Safety of Domain-relational-calculus expressions

An expression like { <l, b, a> | ¬ (<l, b, a> ∈ loan) } is unsafe, since it allows
values in the result which are not there in the domain of the expression. An
expression in domain-relational calculus { <x1,x2,….,xn> | P(x1,x2,….,xn) } is safe if all
of the following hold:-
(a) All values that appear in the result are from dom(P).
(b) For every “there exists” sub-formula of the form ∃ x (P1(x)), the sub-
formula is true if and only if there is a value x in dom(P1) such that
P1(x) is true.
(c) For every “for all” sub-formula of the form ∀ x (P1(x)), the sub-formula
is true if and only if P1(x) is true for all values from dom(P1).

The following three languages are equivalent in expressive power:-

(a) Relational Algebra (without the extended operations like Aggregate,
Outer Join etc)
(b) Tuple Relational Calculus restricted to safe expressions.
(c) Domain Relational Calculus restricted to safe expressions.


Some Equivalent Queries in RELATIONAL ALGEBRA (RA), TUPLE
RELATIONAL CALCULUS (TRC) & DOMAIN RELATIONAL CALCULUS
(DRC):-

1. RA: σ A = C (R (A,B,C))
TRC: { t | t ∈ R ∧ t[A] = t[C] }
DRC: { <A, B, C> | <A, B, C> ∈ R ∧ A = C }

2. RA: π A, B (R (A,B,C))
TRC: { t | ∃ u ∈ R (t[A] = u[A] ∧ t[B] = u[B]) }
DRC: { <A, B> | ∃ C (<A, B, C> ∈ R) }

3. RA: R (A,B,C) * S (C,D,E)
TRC: { t | ∃ u ∈ R (t[A]=u[A] ∧ t[B]=u[B] ∧ t[C]=u[C] ∧
∃ v ∈ S (v[C]=u[C] ∧ t[D]=v[D] ∧ t[E]=v[E])) }
DRC: { <A, B, C1, D, E> | <A, B, C1> ∈ R ∧
∃ C2 (<C2, D, E> ∈ S ∧ C2 = C1) }

4. RA: R (A,B,C) ∪ S (A,B,C)
TRC: { t | t ∈ R ∨ t ∈ S }
DRC: { <A, B, C> | <A, B, C> ∈ R ∨ <A, B, C> ∈ S }

5. RA: R (A,B,C) ∩ S (A,B,C)
TRC: { t | t ∈ R ∧ t ∈ S }
DRC: { <A, B, C> | <A, B, C> ∈ R ∧ <A, B, C> ∈ S }

6. RA: R (A,B,C) - S (A,B,C)
TRC: { t | t ∈ R ∧ ¬ (t ∈ S) }
DRC: { <A, B, C> | <A, B, C> ∈ R ∧ ¬ (<A, B, C> ∈ S) }

7. RA: R (A,B,C) x S (D,E,F)
TRC: { t | ∃ u ∈ R (t[A]=u[A] ∧ t[B]=u[B] ∧ t[C]=u[C] ∧
∃ v ∈ S (t[D]=v[D] ∧ t[E]=v[E] ∧ t[F]=v[F])) }
DRC: { <A, B, C, D, E, F> | <A, B, C> ∈ R ∧ <D, E, F> ∈ S }


CHAPTER 7

STRUCTURED QUERY LANGUAGE

Structured Query Language (SQL) is a Language used for interaction with a Relational
Database Management System (RDBMS). It is not only a query language; but also used
for creation, update and maintenance of a database.

Characteristics of SQL

1. It is a non-procedural Language; wherein a Query has to only specify “what”
information is to be retrieved from the database, without specifying “how” the
information is to be retrieved.

2. Its syntax is “English-like”, which makes it very simple and user-friendly.

3. It is highly flexible. A Query in SQL may be written in a number of ways,
without affecting the end results. Also, a Query need not start at a particular
column or finish on a single line.

4. SQL has a very small set of Commands, which makes it easy to learn.

5. Each SQL Query is parsed by the RDBMS to check its syntax.

6. Each SQL Query is optimized, prior to execution.

Advantages of SQL

1. SQL, being a non-procedural language, provides a great degree of abstraction;
the user does not have to specify “how” the required information is to be
extracted from the database; this is taken care of by the RDBMS.

2. Applications written in SQL can be easily ported from one system to another.
Such a need would arise when a system needs upgrade or change.

3. Since a query specifies only “what” information is to be extracted, not “how”
to extract it, a query in SQL would return the same results, irrespective of
whether it was optimized prior to its execution or not.

4. The expected results of a query are unambiguously defined.

5. It is not merely a query language used to retrieve information from a database;
it is also used to define schema, update the database, insert new data, delete
defunct data, define data integrity constraints and define user access
rights etc.

6. The language, while being very simple, flexible and easy to learn, has very
powerful features, which enable it to perform very complex operations in a
DBMS.

SQL Data Types & Literals

SQL supports the following Data Types:-

1. CHARACTER (n) Represents a fixed-length string of size n characters,
where “n” is an integer > 0. CHAR (n) is its abbreviation and CHAR is an
abbreviation for CHAR (1).

2. CHARACTER VARYING (n) Represents a varying-length string of
max size n characters, where “n” is an integer > 0. VARCHAR (n) is its
abbreviation.

3. BIT (n) Represents a fixed-length string of size n bits, where “n” is
an integer > 0.

4. BIT VARYING (n) Represents a varying-length string of max size n
bits, where “n” is an integer > 0.

5. NUMERIC (p, q) Represents a Decimal Number having a total of “p” digits
and a sign, with “q” digits to the right of the decimal point;
“q” must be <= “p”. NUMERIC (p) is an abbreviation for NUMERIC (p, 0).
NUMERIC is an abbreviation for NUMERIC (p) where “p” is
implementation-defined.

6. DECIMAL (p, q) It is similar to NUMERIC (p, q). It represents a
Decimal Number having a total of “p” digits and a sign, with “q” digits
to the right of the decimal point; “q” must be <= “p”. DEC (p, q) is an
abbreviation for DECIMAL (p, q). DEC (p) is an abbreviation for DECIMAL
(p, 0). DEC is an abbreviation for DEC (p) where “p” is
implementation-defined.

7. INTEGER Represents a signed integer. INT is its abbreviation.

8. SMALLINT Represents a signed integer whose precision does not exceed
that of INT.

9. FLOAT (p) Represents a floating point number. FLOAT is an
abbreviation for FLOAT (p) where “p” is implementation-defined. REAL is an
alternative representation for FLOAT (s) where “s” is implementation-
defined. DOUBLE PRECISION is another representation for FLOAT (d)
where “d” is implementation-defined, but “d” > “s”.


10. BOOLEAN It can assume values TRUE or FALSE.

11. DATE This data type has ten positions embedded in single quotes
i.e. ‘DD-MM-YYYY’; for example ‘31-05-1950’ implies 31st May 1950.

12. TIME This data type has at least 8 positions embedded in single quotes
‘HH:MM:SS’; For example ’11:07:05’ implies 11.07.05 AM and ’23:07:05’
implies 11.07.05 PM.

13. TIMESTAMP It includes both DATE and TIME along with a
minimum of 6 digits representing the decimal fraction of seconds, i.e.
‘DD-MM-YYYY HH:MM:SS mmmmmm’; for example ‘31-05-1950 01:02:05 567892’.

14. INTERVAL It specifies a time interval, a relative value that can be used
to increment or decrement an absolute value of DATE, TIME or
TIMESTAMP. The intervals are qualified either as YEAR/MONTH
intervals or DAY/TIME intervals.

The formats of DATE, TIME and TIMESTAMP are considered a
special type of String. So, string operators (say LIKE) can be applied to
these data types.

SQL supports the following Literal Types:-

1. Character String This is written as a sequence of characters enclosed
in single quotes; for example:-

‘DBMS’
‘Structured Query Language’

2. Bit String A bit string is written either as a sequence of 0s and 1s
enclosed in single quotes and preceded by the letter “B”, or as a sequence of
Hexadecimal digits enclosed in single quotes and preceded by the letter “X”;
for example:-
B’1001001’
B’1’
B’0’
X’CA8’

3. Exact Numeric Written as a signed or unsigned decimal number, optionally
with a decimal point embedded in it. For example:-

77.00
+77.77
-69


900
0.9

4. Approximate Numeric Written as an exact numeric, followed by the letter
“E” and a signed or unsigned integer; mEn denotes m × 10^n.
For example:-

77.00E9
-7.7E8
+76.7E-4
+76.8E5

TYPES OF SQL COMMANDS

SQL Commands can be classified into the following categories:-

1. Data Definition Language (DDL)


2. Data Manipulation Language (DML)
3. Data Query Language (DQL)
4. Data Control Language (DCL)
5. Data Administrative Statements (DAS)
6. Transaction Control Statements (TCS)

1. Data Definition Language (DDL) It is used for defining the database schema
i.e. to CREATE, ALTER & DROP Tables, Views and Indexes; like CREATE
TABLE, ALTER TABLE, DROP TABLE, CREATE VIEW, DROP VIEW,
CREATE INDEX, DROP INDEX.

2. Data Manipulation Language (DML) The DML commands are used to


manipulate the information in the tables; like INSERT, UPDATE or DELETE
commands.

3. Data Query Language (DQL) This refers to the SELECT command, which is
used to extract information from Tables.
Syntax:-

SELECT <list of attributes and/or aggregates of attributes of tables listed below>
FROM <list of tables>
WHERE <predicate involving attributes of tables listed above and literals>
GROUP BY <list of attributes>
HAVING <predicate involving aggregated values>
ORDER BY <list of attributes of tables listed above> [ASC | DESC]

4. Data Control Language (DCL) These are security-related commands, which
control user access to the database. The database administrator grants or revokes
user privileges by using the GRANT and REVOKE commands.


5. Data Administrative Statement (DAS) These are basically Audit commands
used to analyze the system performance. There are two commands, START
AUDIT and STOP AUDIT.

6. Transaction Control Statements (TCS) These commands are used to control
transactions; like SET TRANSACTION, SAVEPOINT, COMMIT and
ROLLBACK.

SQL Operators

1. Arithmetic Operators:- Unary operators for positive or negative expressions
(+, -) and binary operators for multiplication ( * ), division ( / ), addition ( + ) and
subtraction ( - ).

2. Comparison Operators:- =, >, <, >=, <=, ( !=, <>, ^= ), IN, NOT IN, IS
NULL, IS NOT NULL, LIKE, ALL, (ANY, SOME), EXISTS, NOT EXISTS,
BETWEEN x AND y.

3. Logical Operators:- AND, OR, NOT.

4. Set Operators:- UNION, UNION ALL, INTERSECT, MINUS.

Operator Precedence

() Enclosing Sub Queries


‘‘ Enclosing Literals
() Overrides normal operator precedence
+, - Unary Operators
*, / Multiplication and division
+, - Addition and Subtraction

NOT |
AND | Logical Operators
OR |

UNION |
INTERSECT | Set Operators
MINUS |

Tables, Views & Indexes

Creating a Table

The following DDL Statement will add a new Table STUDENT to the DBMS Catalog,
with the attributes and data types as stated in the statement. It indicates that
attribute REG_NO is the primary key and attribute ROLL_NO is Unique, which implies that
ROLL_NO is a candidate key of STUDENT.

CREATE TABLE STUDENT
( REG_NO CHAR (10),
ROLL_NO CHAR (10),
S_NAME VARCHAR (25),
FATHERS_NAME VARCHAR (25),
BRANCH VARCHAR (05),
S_ADDRESS VARCHAR (50),
S_TELNO INT,
UNIQUE (ROLL_NO),
PRIMARY KEY (REG_NO));

Similarly, the following DDL Statements will add new Tables RESULT, EMPLOYEE
and DEPT to the DBMS Catalog.

CREATE TABLE RESULT
(ROLL_NO CHAR (10),
SUB_CODE CHAR (06),
MARKS INT,
CHECK (MARKS BETWEEN 0 AND 100),
PRIMARY KEY (ROLL_NO, SUB_CODE),
FOREIGN KEY (ROLL_NO) REFERENCES STUDENT (ROLL_NO));

CREATE TABLE DEPT


( DEPT_NO INT NOT NULL DEFAULT 1,
DEPT_NAME VARCHAR (30) NOT NULL,
DEPT_HEAD CHAR (8),
TOTAL_SAL INT,
PRIMARY KEY (DEPT_NO),
FOREIGN KEY (DEPT_HEAD) REFERENCES EMPLOYEE (EID));

CREATE TABLE EMPLOYEE
(EID CHAR (8) PRIMARY KEY,
ENAME VARCHAR (25),
DNO INT REFERENCES DEPT (DEPT_NO),
SALARY INT,
CHECK (SALARY BETWEEN 10000 AND 300000));
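
The effect of the PRIMARY KEY, UNIQUE, CHECK and FOREIGN KEY clauses can be tried
out with a small script. The following is only a sketch: it assumes Python's built-in
sqlite3 module instead of the DBMS used in these notes, keeps only a few columns of
STUDENT and RESULT, and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite is used here; it enforces foreign keys only when enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE STUDENT (
    REG_NO  CHAR(10) PRIMARY KEY,
    ROLL_NO CHAR(10) UNIQUE,
    S_NAME  VARCHAR(25)
);
CREATE TABLE RESULT (
    ROLL_NO  CHAR(10),
    SUB_CODE CHAR(6),
    MARKS    INT CHECK (MARKS BETWEEN 0 AND 100),
    PRIMARY KEY (ROLL_NO, SUB_CODE),
    FOREIGN KEY (ROLL_NO) REFERENCES STUDENT (ROLL_NO)
);
INSERT INTO STUDENT VALUES ('R001', '101', 'Asha');
INSERT INTO RESULT VALUES ('101', 'TCS501', 85);
""")


def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = {
    # MARKS outside 0..100 violates the CHECK constraint
    "check": try_insert("INSERT INTO RESULT VALUES ('101', 'TCS503', 150)"),
    # ROLL_NO '999' has no matching row in STUDENT
    "fk": try_insert("INSERT INTO RESULT VALUES ('999', 'TCS501', 70)"),
    # (ROLL_NO, SUB_CODE) pair already exists
    "pk": try_insert("INSERT INTO RESULT VALUES ('101', 'TCS501', 90)"),
    # ROLL_NO '101' is already used by another student
    "unique": try_insert("INSERT INTO STUDENT VALUES ('R002', '101', 'Ravi')"),
    # A row that satisfies every constraint
    "ok": try_insert("INSERT INTO RESULT VALUES ('101', 'TCS503', 70)"),
}
print(outcomes)
```

Only the last INSERT satisfies every constraint; each of the other four violates
exactly one of the declared constraints.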

Altering an existing Table

The following DDL statement will add a new attribute STATUS of type INT to the
existing Table EMPLOYEE.

ALTER TABLE EMPLOYEE ADD STATUS INT;


Dropping an Existing Table

The following DDL statement will remove the existing Table RESULT from the DBMS
Catalog.

DROP TABLE RESULT;

Creating a View

The following statement will create a VIEW named DEPT_TOTAL_SAL with two
attributes D_NO and T_SAL by selecting DEPT_NO and TOTAL_SAL of existing Table
DEPT.

CREATE VIEW DEPT_TOTAL_SAL (D_NO, T_SAL)


AS SELECT DEPT_NO, TOTAL_SAL
FROM DEPT;

There will not be any table named DEPT_TOTAL_SAL; only its definition will be stored
in the DBMS Catalog. Whenever a reference is made to DEPT_TOTAL_SAL in any
SQL Query, a table will be created with the help of the definition and the table will be
deleted after answering the query. For Example:-

SELECT D_NO
FROM DEPT_TOTAL_SAL
WHERE T_SAL > 10000000;
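
That a view stores only a definition, and not data, can be observed by changing the
base table and querying the view again. This is a sketch only, assuming Python's
built-in sqlite3 module and two made-up departments; the view is the
DEPT_TOTAL_SAL view defined above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPT (
    DEPT_NO   INT PRIMARY KEY,
    DEPT_NAME VARCHAR(30),
    TOTAL_SAL INT
);
INSERT INTO DEPT VALUES (1, 'Sales', 12000000), (2, 'Stores', 4000000);

CREATE VIEW DEPT_TOTAL_SAL (D_NO, T_SAL)
    AS SELECT DEPT_NO, TOTAL_SAL
       FROM DEPT;
""")

query = "SELECT D_NO FROM DEPT_TOTAL_SAL WHERE T_SAL > 10000000 ORDER BY D_NO"

# The view is evaluated against DEPT each time it is referenced.
before = [row[0] for row in conn.execute(query)]

# Change the base table only; the view itself is never updated.
conn.execute("UPDATE DEPT SET TOTAL_SAL = 15000000 WHERE DEPT_NO = 2")
after = [row[0] for row in conn.execute(query)]

print(before, after)
```

The second result includes department 2 although nothing was written to the view,
because the view is recomputed from DEPT on every reference.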

The following SQL Statement will create VIEW named DEPT_AVG_SAL with
attributes D_NO and AVG_SAL from existing table EMPLOYEE. The attribute
AVG_SAL is computed by taking average of the salary of the employees of each
department.

CREATE VIEW DEPT_AVG_SAL (D_NO, AVG_SAL)
AS SELECT DNO, AVG (SALARY)
FROM EMPLOYEE
GROUP BY DNO;

Dropping an existing View

The following statement will remove the definition of the VIEW named DEPT_TOTAL_SAL
from the DBMS CATALOG.

DROP VIEW DEPT_TOTAL_SAL;


Creating Indexes

Creating a Composite Index

The following statement will create a composite Index on the attributes EID and DNO
of Table EMPLOYEE.

CREATE INDEX E_INDX1
ON EMPLOYEE (EID, DNO);

Creating a Unique Index

The following statement will create a unique Index on the primary key EID of Table
EMPLOYEE.

CREATE UNIQUE INDEX E_INDX2


ON EMPLOYEE (EID);

Dropping an Index

DROP INDEX E_INDX1;

Queries & Sub-queries

A Query refers to a SELECT Statement used to extract information from the Tables.

Query Get Department Number and Average Salary of the employees of Dept Number 3
or more and having more than 10 employees; and order the information in descending
order of Average Salary.

SELECT DNO, AVG (SALARY) AS AVG_SAL
FROM EMPLOYEE
WHERE DNO >= 3
GROUP BY DNO
HAVING COUNT (*) > 10
ORDER BY AVG_SAL DESC;
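
The different roles of WHERE (filters rows before grouping) and HAVING (filters groups
after aggregation) can be seen on a small data set. The sketch below assumes Python's
built-in sqlite3 module and made-up rows; the group-size threshold is lowered from 10
to 1 so that the sample stays small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EID    CHAR(8) PRIMARY KEY,
    DNO    INT,
    SALARY INT
);
INSERT INTO EMPLOYEE VALUES
    ('E1', 2, 10000), ('E2', 2, 20000),                    -- removed by WHERE
    ('E3', 4, 30000), ('E4', 4, 50000),                    -- kept, average 40000
    ('E5', 5, 60000),                                      -- group removed by HAVING
    ('E6', 6, 20000), ('E7', 6, 30000), ('E8', 6, 40000);  -- kept, average 30000
""")

rows = conn.execute("""
    SELECT DNO, AVG(SALARY) AS AVG_SAL
    FROM EMPLOYEE
    WHERE DNO >= 3            -- row-level filter, applied before grouping
    GROUP BY DNO
    HAVING COUNT(*) > 1       -- group-level filter, applied after aggregation
    ORDER BY AVG_SAL DESC
""").fetchall()

print(rows)
```

Department 2 never reaches the grouping step, while department 5 is grouped and then
discarded because it has only one employee.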

Query List Employee Names along with the Names of their respective Dept Heads.

SELECT E.ENAME AS EMP_NAME, K.ENAME AS MGR_NAME
FROM EMPLOYEE E, DEPT D, EMPLOYEE K
WHERE E.DNO = D.DEPT_NO AND D.DEPT_HEAD = K.EID;

A Sub-query refers to a nested SELECT Statement as shown below:-


Query Get the name of Employee having highest salary.

SELECT DISTINCT ENAME


FROM EMPLOYEE
WHERE SALARY = ( SELECT MAX (SALARY)
FROM EMPLOYEE);

In the above query, ( SELECT MAX (SALARY) FROM EMPLOYEE) is a nested sub-query.

Aggregate Functions

Query Find Minimum, Maximum and Average Salary of the Employees.

SELECT MIN (SALARY), MAX (SALARY), AVG (SALARY)


FROM EMPLOYEE;

Query Find Average Salary of the Departments having at least 50 employees.

SELECT DNO, AVG (SALARY)


FROM EMPLOYEE
GROUP BY DNO
HAVING COUNT (*) >= 50;

Query Find Average Marks and Total Marks obtained by each Student

SELECT ROLL_NO, AVG (MARKS), SUM (MARKS)


FROM RESULT
GROUP BY ROLL_NO;

Query Find Minimum, Maximum and Average Marks obtained in each Subject.

SELECT SUB_CODE, MIN (MARKS), MAX (MARKS), AVG (MARKS)
FROM RESULT
GROUP BY SUB_CODE;

Insert, Update & Delete Operations

Insert information of a new employee with EID: ‘0013325K’, ENAME: ‘VIJAY
KUMAR SHARMA’, DNO: 10 and SALARY: 50000 in the EMPLOYEE Table.

INSERT INTO EMPLOYEE
VALUES ('0013325K', 'VIJAY KUMAR SHARMA', 10, 50000);

Update the Salary of employee with EID ‘0012240L’ to 55000

UPDATE EMPLOYEE
SET SALARY = 55000
WHERE EID = ‘0012240L’;

Increase the Salary of each employee by 10%.

UPDATE EMPLOYEE
SET SALARY = SALARY * 1.1;

Delete Employee with EID ‘0022343C’ from the EMPLOYEE Table.

DELETE FROM EMPLOYEE


WHERE EID = ‘0022343C’;

JOINS

Query Get the Names of Students, who have appeared for Subject ‘TCS501’

SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS501’;

This Query involves a Natural Join of STUDENT and RESULT and in Relational
Algebra it can be written as:-

π_{S_NAME} (σ_{SUB_CODE = ‘TCS501’} (STUDENT * RESULT))

UNION, INTERSECT & MINUS

UNION

Query Get the Names of the Students, who have appeared for subject
‘TCS501’ or for ‘TCS503’ or for both.

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS501’)


UNION

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS503’);

This Query is equivalent to Relational Algebra query:-

T1 ← STUDENT * RESULT
π_{S_NAME} (σ_{SUB_CODE = “TCS501”} (T1)) ∪ π_{S_NAME} (σ_{SUB_CODE = “TCS503”} (T1))

INTERSECT

Query Get the Names of the Students, who have appeared both for
‘TCS501’ and ‘TCS503’.

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS501’)

INTERSECT

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS503’);

This Query is equivalent to Relational Algebra query:-

T1 ← STUDENT * RESULT
π_{S_NAME} (σ_{SUB_CODE = “TCS501”} (T1)) ∩ π_{S_NAME} (σ_{SUB_CODE = “TCS503”} (T1))

MINUS (SET DIFFERENCE)

Query Get the Names of the Students, who have appeared for subject
‘TCS501’ but not for ‘TCS503’.

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS501’)


MINUS

(SELECT DISTINCT S_NAME


FROM STUDENT, RESULT
WHERE STUDENT.ROLL_NO = RESULT.ROLL_NO
AND SUB_CODE = ‘TCS503’);

This Query is equivalent to Relational Algebra query:-

T1 ← STUDENT * RESULT
π_{S_NAME} (σ_{SUB_CODE = “TCS501”} (T1)) − π_{S_NAME} (σ_{SUB_CODE = “TCS503”} (T1))

Cursors in SQL

A Cursor is a construct in PL/SQL that gives a name to a private memory area in which
an SQL statement is processed, so that the rows it returns can be fetched and
processed one at a time.

Example

Suppose Total Marks Scored by a Student are to be extracted from RESULT and to be
entered into Table TOTAL_MARKS (ROLL_NO, T_MARKS).

DECLARE
CURSOR C_Student IS
SELECT ROLL_NO, SUM (MARKS)
FROM RESULT
GROUP BY ROLL_NO;
C_NO CHAR (10);
C_TOTAL INT;

BEGIN
OPEN C_Student;
LOOP
FETCH C_Student INTO C_NO, C_TOTAL;
EXIT WHEN C_Student%NOTFOUND;
INSERT INTO TOTAL_MARKS
VALUES (C_NO, C_TOTAL);
END LOOP;
CLOSE C_Student;
COMMIT;
END;
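
The same open / fetch / exit-when-not-found / close pattern exists in most host-language
database interfaces. The sketch below is not PL/SQL; it assumes Python's built-in
sqlite3 module and made-up RESULT rows, and mirrors the steps of the block above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RESULT (ROLL_NO CHAR(10), SUB_CODE CHAR(6), MARKS INT);
CREATE TABLE TOTAL_MARKS (ROLL_NO CHAR(10) PRIMARY KEY, T_MARKS INT);
INSERT INTO RESULT VALUES
    ('101', 'TCS501', 85), ('101', 'TCS503', 70),
    ('102', 'TCS501', 60);
""")

# OPEN: executing the query returns a cursor positioned before the first row.
c_student = conn.execute(
    "SELECT ROLL_NO, SUM(MARKS) FROM RESULT GROUP BY ROLL_NO"
)

while True:
    row = c_student.fetchone()          # FETCH ... INTO
    if row is None:                     # EXIT WHEN ...%NOTFOUND
        break
    c_no, c_total = row
    conn.execute("INSERT INTO TOTAL_MARKS VALUES (?, ?)", (c_no, c_total))

c_student.close()                       # CLOSE
conn.commit()                           # COMMIT

totals = conn.execute(
    "SELECT ROLL_NO, T_MARKS FROM TOTAL_MARKS ORDER BY ROLL_NO"
).fetchall()
print(totals)
```

For this particular task a single INSERT INTO ... SELECT statement would do the same
work; a cursor is needed only when each row requires separate procedural processing.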


Navigating through SQL

DDL

Creation and Alteration of Tables

1. Create Tables to generate the following schema:-

Customer (C_Id, C_Name, C_Street, C_City)


Branch (B_Id, B_Name, B_City)
Account (AN, B_Id, Bal)
Loan (LN, B_Id, Amount)
Depositor (C_Id, AN)
Borrower (C_Id, LN)

CREATE TABLE Customer (C_Id Char(10) PRIMARY KEY,


C_Name Varchar (15) NOT NULL,
C_Street Varchar (15) NOT NULL,
C_City Varchar (15) NOT NULL);

CREATE TABLE Branch ( B_Id Char(10) PRIMARY KEY,


B_Name Varchar (15) NOT NULL,
B_City Varchar (15) NOT NULL);

CREATE TABLE Account ( AN Char(10) PRIMARY KEY,


B_Id Char (10) REFERENCES Branch (B_id),
Bal Number (10,2));

CREATE TABLE Loan ( LN Char(10) PRIMARY KEY,


B_Id Char (10) REFERENCES Branch (B_id),
Amount Number (10,2));

CREATE TABLE Depositor ( C_Id Char(10) REFERENCES Customer (C_Id),


AN Char (10) REFERENCES Account (AN),
Primary Key (C_Id, AN));

CREATE TABLE Borrower ( C_Id Char(10) REFERENCES Customer (C_Id),


LN Char (10) REFERENCES Loan (LN),
Primary Key (C_Id, LN));

Suppose there is a constraint that account balance should not be less than 1000,
it can be added to the table Account as follows:-

ALTER TABLE Account ADD CONSTRAINT Check_Bal CHECK (Bal >= 1000);


2. Create Tables on the following schema:-

Supplier (S#, S_Name, S_City)


Part (P#, P_Name)
Project (J#, J_Name, J_City)
Supply_Order (S#, P#, J#, Qty) (named Supply_Order because ORDER is a reserved word in SQL)

CREATE TABLE Supplier ( S# Char (10) PRIMARY KEY,


S_Name Varchar (30) NOT NULL,
S_City Varchar (30) NOT NULL);

CREATE TABLE Part ( P# Char (12) PRIMARY KEY,


P_Name Varchar (30) NOT NULL);

CREATE TABLE Project ( J# Char (10) PRIMARY KEY,


J_Name Varchar (30) NOT NULL,
J_City Varchar (30) NOT NULL);

CREATE TABLE Supply_Order ( S# Char (10) REFERENCES Supplier (S#),


P# Char (12) REFERENCES Part (P#),
J# Char (10) REFERENCES Project (J#),
Qty INT NOT NULL,
PRIMARY KEY (S#, P#, J#));

3. Create Tables on the following schema:-

Student (Roll_No, S_Name, S_DOB, S_Address)


Course (C_Code, Title, Credits)
Teacher (T_Code, T_Name, Desig, D_Code)
Department (D_Code, D_Name, HOD)
Offers (T_Code, C_Code, Semester)
Result (Roll_No, C_Code, T_Code, Semester, Marks)

CREATE TABLE Student ( Roll_No Char (10) PRIMARY KEY,


S_Name Varchar (20) NOT NULL,
S_DOB DATE,
S_Address Varchar(30));

CREATE TABLE Course ( C_Code Char(7) PRIMARY KEY,


Title Varchar (20) NOT NULL,
Credits INT NOT NULL);

CREATE TABLE Department ( D_Code Char (3) PRIMARY KEY,


D_Name Varchar (20) NOT NULL,


CHECK (D_Code IN (‘CSE’,’IT’,’ECE’,’IC’,’EE’,’ME’,’MT’)),


HOD Varchar (30));

CREATE TABLE Teacher ( T_Code Char(3) PRIMARY KEY,


T_Name Varchar (30) NOT NULL,
Desig Char (10),
D_Code Char (3) REFERENCES Department (D_Code),
CHECK (Desig IN (‘Lect’, ‘Sr Lect’, ‘Asst Prof’, ‘Prof’)));

CREATE TABLE Offers ( T_Code Char(3) REFERENCES Teacher (T_Code),


C_Code Char (7) REFERENCES Course (C_Code),
Semester Char(5),
PRIMARY KEY (T_Code, C_Code, Semester),
CHECK (Semester IN (‘Odd’,’Even’)));

CREATE TABLE Result ( Roll_No Char (10) REFERENCES Student (Roll_No),


C_Code Char(7),
T_Code Char(3),
Semester Char (5),
FOREIGN KEY (C_Code, T_Code, Semester)
REFERENCES Offers (C_Code, T_Code, Semester),
MARKS INT,
CHECK (MARKS BETWEEN 0 AND 100));

4. Create Tables on the following schema:-

Emp (E#, E_Name, Salary, D#)


Dept (D#, D_Name, Total_Sal, Mgr#)

Here D# in Emp is a foreign key referencing D# of Dept, Mgr# in Dept is a foreign key
referencing E# of Emp and Total_Sal in Dept is total salary of all the employees working
in a department. The situation is little tricky here:-

(i) Both the tables are referencing each other. If we create the Emp table first and
declare D# as foreign key referencing Dept (D#), the system will generate the
exception “table or view does not exist” since the table Dept is non-existent.
A similar situation will occur if we attempt to create Dept first and declare
Mgr# as foreign key referencing Emp (E#).

(ii) Insertion of data into the tables will also face a problem. If we attempt to insert
into Emp first, the new tuple will reference a non-existent tuple in table Dept for
D#, and if we first insert a tuple in Dept, it will reference a non-
existent tuple in Emp for Mgr#.


The above two problem situations can be handled as follows:-

Step 1: Create the tables without foreign key constraints:-

CREATE TABLE Emp ( E# Char(12) PRIMARY KEY,


E_Name Varchar (30) NOT NULL,
Salary NUMBER (10,2),
D# Char (12));

CREATE TABLE Dept ( D# Char(12) PRIMARY KEY,


D_Name Varchar (30) NOT NULL,
Total_Sal NUMBER (15,2),
Mgr# Char (12));

Step 2: Add foreign key constraints to the above tables:-

ALTER TABLE Emp ADD CONSTRAINT Emp_FK FOREIGN KEY (D#)


REFERENCES Dept (D#) INITIALLY DEFERRED DEFERRABLE;

ALTER TABLE Dept ADD CONSTRAINT Dept_FK FOREIGN KEY (Mgr#)


REFERENCES Emp (E#) INITIALLY DEFERRED DEFERRABLE;

Note that the foreign key constraints added to the above tables are of type
“Initially deferred deferrable”. This implies that while inserting data, the check for
compliance of foreign key constraints will be deferred till the next COMMIT statement is
executed. This will enable data entry into the two tables in any sequence, as long as the
information in the two tables is compatible at the time of execution of next COMMIT
statement.
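
The behaviour of deferred constraints can be tried out as follows. This is a sketch
under several assumptions: it uses Python's built-in sqlite3 module instead of Oracle,
the column names E_No, D_No and Mgr_No stand in for E#, D# and Mgr# (the character # is
not accepted in unquoted SQLite names), and the rows are made up. Unlike the system
described above, SQLite accepts a REFERENCES clause naming a table that does not exist
yet, so the constraints are declared directly in CREATE TABLE.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN / COMMIT / ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Emp (
    E_No   CHAR(12) PRIMARY KEY,
    E_Name VARCHAR(30) NOT NULL,
    D_No   CHAR(12) REFERENCES Dept (D_No) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE Dept (
    D_No   CHAR(12) PRIMARY KEY,
    D_Name VARCHAR(30) NOT NULL,
    Mgr_No CHAR(12) REFERENCES Emp (E_No) DEFERRABLE INITIALLY DEFERRED
);
""")

# Transaction 1: Emp row first, although department D1 does not exist yet.
conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES ('E1', 'Ajay', 'D1')")
conn.execute("INSERT INTO Dept VALUES ('D1', 'Sales', 'E1')")
conn.execute("COMMIT")  # both references are satisfied by now, so this succeeds
dept_rows = conn.execute("SELECT COUNT(*) FROM Dept").fetchone()[0]

# Transaction 2: department D9 is never inserted, so the check fails at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES ('E2', 'Vijay', 'D9')")
try:
    conn.execute("COMMIT")
    second_commit_ok = True
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    second_commit_ok = False

emp_rows = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
print(dept_rows, second_commit_ok, emp_rows)
```

The first INSERT of each transaction is accepted immediately; the foreign key is
examined only when COMMIT is executed, which is where the second transaction fails.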

5. Create tables on the following schema:-

Class (Year, Branch, Section, Strength)

Subject (Sub_Code, Title, Credits)

Faculty (Fac_Code, Fac_Name, Dept_Code)

Class_Room (Bldg_No, Room_No, Floor, Capacity)

Time_Slot ( Day, Period)

Time_Table (Year, Branch, Section, Day, Period, Sub_Code, Fac_Code, Bldg_No,


Room_No)

The table Time_Table has following candidate keys:-


(Year, Branch, Section, Day, Period) Designated as Primary Key


(Fac_Code, Day, Period)
(Bldg_No, Room_No, Day, Period)

CREATE TABLE Class ( Year INT, Branch Char(3), Section INT, Strength INT,
PRIMARY KEY (Year, Branch, Section),
CHECK (Branch IN
(‘CSE’,’IT’,’ECE’,’IC’,’ME’,’EE’, ’MT’)),
CHECK (Year BETWEEN 1 AND 4),
CHECK (Section BETWEEN 1 AND 2));

CREATE TABLE Subject ( Sub_Code Char (7) PRIMARY KEY,


Title Varchar (20) NOT NULL,
Credits Int);

CREATE TABLE Faculty ( Fac_Code Char(3) PRIMARY KEY,
Fac_Name Varchar (30) NOT NULL,
Dept_Code Char (4));

CREATE TABLE Class_Room ( Bldg_No Char(4), Room_No INT, Capacity INT,


PRIMARY KEY (Bldg_No, Room_No ));

CREATE TABLE Time_Slot ( Day Char (4), Period INT,
PRIMARY KEY (Day, Period),
CHECK (Day IN ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat')));

CREATE TABLE Time_Table


( Year INT, Branch Char(3), Section INT, Day Char(4), Period INT,
PRIMARY KEY (Day, Period, Year, Branch, Section),
Bldg_No Char(4) NOT NULL, Room_No INT NOT NULL,
Sub_Code Char (7) NOT NULL, Fac_Code Char(3) NOT NULL,
UNIQUE (Day, Period, Bldg_No, Room_No),
FOREIGN KEY (Year, Branch, Section)
REFERENCES Class (Year, Branch, Section),
FOREIGN KEY (Sub_Code) REFERENCES Subject (Sub_Code),
FOREIGN KEY (Fac_Code) REFERENCES Faculty (Fac_Code),
FOREIGN KEY (Bldg_No, Room_No)
REFERENCES Class_Room (Bldg_No, Room_No),
FOREIGN KEY (Day, Period) REFERENCES Time_Slot (Day, Period));

Creation of Indices

Since (Day, Period, Fac_Code) is a candidate key of Time_Table, we can create a unique
index on this combination of the columns.

CREATE UNIQUE INDEX Fac_Indx ON Time_Table (Day, Period, Fac_Code);

Dropping of Tables

DROP TABLE Student;

Dropping of Constraints

ALTER TABLE Emp DROP CONSTRAINT Emp_FK;

Dropping of Columns

ALTER TABLE Student DROP Column S_Name;

Dropping of Indices

DROP INDEX Fac_Indx;

How to drop the tables that reference each other?

Take the case of tables Emp and Dept. Suppose table Emp is to be dropped, the system
would not permit this, since the dropping of Emp would violate the foreign key constraint
Dept_FK of table Dept. Dropping of Emp and Dept is achieved as follows:-

Step 1: First drop the constraints Emp_FK and Dept_FK.

ALTER TABLE Emp DROP CONSTRAINT Emp_FK;
ALTER TABLE Dept DROP CONSTRAINT Dept_FK;

Step 2: Now drop the tables.

DROP TABLE Emp;
DROP TABLE Dept;


DML

INSERTING INFORMATION INTO TABLES

1. Add a new customer to table Customer with C_Id = ‘C101’, C_Name = ‘Ajay’,
C_Street =’S-26’ and C_City = ‘Noida’.

INSERT INTO Customer VALUES (‘C101’,’Ajay’, ‘S-26’, ‘Noida’);

Entering a NULL value

2. Add a new student to the Student Table with Roll_No = ‘091010120’, S_Name =
‘Vijay’, S_Address = ‘S-27 Noida’. (Note that information about S_DOB is missing. It is
not a NOT NULL attribute, so it can be assigned a NULL value).

INSERT INTO Student VALUES (‘091010120’, ‘Vijay’, NULL, ‘S-27 Noida’);

The above NULL can also be inserted as follows (the attribute name S_DOB is omitted
from the attribute list specified with the table name):-

INSERT INTO Student (S_Name, Roll_No, S_Address)
VALUES ('Vijay', '091010120', 'S-27 Noida');

Since attribute S_DOB does not appear in the attribute list given with the table
Student, a NULL value will be assigned to it. As shown, the attributes can be
listed in any order, but the values have to be specified in the same order.

RETRIEVING INFORMATION FROM TABLES

3. Get all rows of table Student.

SELECT *
FROM Student;

Restricting rows with a WHERE Clause

4. Get the information of those students who were born in 1995.

This query can be expressed in any of the following three forms:-

SELECT *
FROM Student
WHERE S_DOB >= '01-JAN-1995' AND S_DOB <= '31-DEC-1995';


SELECT *
FROM Student
WHERE S_DOB BETWEEN ’01-JAN-1995’ AND ’31-DEC-1995’;

SELECT *
FROM Student
WHERE S_DOB LIKE ‘%95’;

Restricting Attributes List

5. Get Roll_No and DOB of all students born before 01st Jan 1995.

SELECT Roll_No, S_DOB


FROM Student
WHERE S_DOB < ’01-JAN-1995’;

Use of Substitution Variables

Previous query can be recalled and re-executed by typing /. Suppose a query is to be


reused with different parameters then parameters can be specified by substitution
variables as follows:-

6. SELECT *
FROM Account
WHERE AN = '&numb';

Here numb is a substitution variable, whose value will be accepted by the system by
displaying the prompt ‘Enter Value for numb:’. Each time the query is executed, a different
value for numb can be entered, like A101, A105 etc. The variable is enclosed in single
quotes because AN is a character attribute.

Retrieving information from more than one tables

7. Get the names of students who got more than 90 marks in any subject.

Here, we perform natural join of Result and Student and pick up names of those
students who have scored more than 90 marks in any subject.

SELECT DISTINCT S_Name


FROM Result, Student
WHERE Result.Roll_No = Student.Roll_No AND Marks > 90;

Since attribute name Roll_No appears in both the tables specified in the FROM
clause, we need to qualify by the table name, while using this attribute name in
subsequent clauses. However, this is not the problem with attribute Marks, since it
appears only in table Result.


The qualifier DISTINCT has been used in the SELECT clause to prevent duplicate
names from appearing in the result in the case of those students who have scored
more than 90 marks in more than one subject.

The above query can be expressed more elegantly by declaring a tuple variable
say R on the table Result and another tuple variable S on the table Student, as
shown below:-

SELECT DISTINCT S_Name


FROM Result R, Student S
WHERE R.Roll_No = S.Roll_No AND Marks > 90;

8. Get the customer name and account number of those customers who are living in Noida
but have an account in Delhi with a Balance of more than 100000.

SELECT C_Name, D.AN


FROM Customer C, Depositor D, Account A, Branch B
WHERE C.C_Id = D.C_Id AND D.AN = A.AN AND A.B_Id = B.B_Id AND
C_City = ‘Noida’ AND B_City = ‘Delhi’ AND Bal > 100000;

9. Get the Roll_No of those students whose DOB is not specified in the Student
table.

SELECT Roll_No
FROM Student
WHERE S_DOB IS NULL;

Using Aliases

10. SELECT C_Name AS Customer_Name, D.AN AS Account_Number


FROM Customer C, Depositor D
WHERE C.C_Id = D.C_Id;

Here, the Attributes in the resulting table will be named as Customer_Name and
Account_Number.

Using Concatenation in SELECT clause

11. SELECT C_Name|| ‘ has an account number ’ || D.AN


FROM Customer C, Depositor D
WHERE C.C_Id = D.C_Id;

Here each tuple output will be as follows:-

Ajay has an account number A101


Vijay has an account number A310


Arithmetic Operations

12. Suppose there is a schema Emp (E_Id, E_Name, Basic_Pay, DA, HRA,
Deduction). We can have query to determine gross salary of each employee:-

SELECT E_Id, Basic_Pay + DA + HRA - Deduction AS Gross_Salary


FROM EMP;

Sorting the results

13. Get information in Result, ordered by Marks in Ascending Order.

SELECT *
FROM Result
ORDER BY Marks;

By default Order By clause will order in Ascending Order.

14. Get information in Result, ordered by Marks in Descending Order.

SELECT *
FROM Result
ORDER BY Marks DESC;

Use of special Attribute ROWNUM

15. Get information of top five employees Salary-wise.

SELECT ROWNUM, E#, E_Name, Salary


FROM ( SELECT E#, E_Name, Salary
FROM Emp
ORDER BY Salary DESC)
WHERE ROWNUM <= 5;

16. Get information of three accounts with highest balance.

SELECT ROWNUM, AN, Bal


FROM ( SELECT AN, Bal
FROM Account
ORDER BY Bal DESC)
WHERE ROWNUM <=3;

Performing Aggregate Functions on the data


17. Get Min, Max, Total and Avg Marks in Result.

SELECT MIN (Marks), Max (Marks), SUM (Marks), AVG (Marks)


FROM Result;

Performing Aggregate Functions on grouped rows

18. Get Min Balance, Max Balance and Total Balance at each branch.

SELECT B.B_Id, MIN (Bal), MAX(Bal), SUM(Bal)


FROM Account A, Branch B
WHERE A.B_Id = B.B_Id
GROUP BY B.B_Id;

Restricting the Grouped Information using HAVING clause

19. Get the names and Total Marks of those students who have scored Average Marks
more than 80%.

SELECT S_Name, SUM (Marks)
FROM Student S, Result R
WHERE S.Roll_No = R.Roll_No
GROUP BY S.Roll_No, S_Name
HAVING AVG (Marks) > 80;

This will display total marks of each student having average score > 80%. S_Name is
included in the GROUP BY clause because every non-aggregated attribute in the SELECT
clause must also appear in GROUP BY.

Outer Join of Tables

20. Perform Left Outer join of tables Depositor & Borrower

SELECT D.C_ID, AN, LN


FROM Depositor D, Borrower B
WHERE D.C_Id = B.C_Id (+);

All tuples of Depositor will appear in the result. Wherever, LN is not defined, it
will be indicated by NULL.

21. Perform Right Outer join of tables Depositor & Borrower

SELECT B.C_ID, AN, LN


FROM Depositor D, Borrower B
WHERE D.C_Id (+) = B.C_Id;


All tuples of Borrower will appear in the result. Wherever, AN is not defined, it
will be indicated by NULL.
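
The (+) marker is Oracle-specific notation. The same joins can be written with the
standard LEFT OUTER JOIN syntax, as in the sketch below. It assumes Python's built-in
sqlite3 module and made-up rows; the right outer join is obtained by swapping the two
tables in a left outer join, because older SQLite versions do not support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depositor (C_Id CHAR(10), AN CHAR(10));
CREATE TABLE Borrower  (C_Id CHAR(10), LN CHAR(10));
INSERT INTO Depositor VALUES ('C1', 'A101'), ('C2', 'A102');
INSERT INTO Borrower  VALUES ('C1', 'L201'), ('C3', 'L303');
""")

# Left outer join: every Depositor row is kept; LN is NULL when no loan exists.
left_join = conn.execute("""
    SELECT D.C_Id, AN, LN
    FROM Depositor D LEFT OUTER JOIN Borrower B ON D.C_Id = B.C_Id
    ORDER BY D.C_Id
""").fetchall()

# Right outer join of Depositor and Borrower, written with the tables swapped:
# every Borrower row is kept; AN is NULL when no account exists.
right_join = conn.execute("""
    SELECT B.C_Id, AN, LN
    FROM Borrower B LEFT OUTER JOIN Depositor D ON D.C_Id = B.C_Id
    ORDER BY B.C_Id
""").fetchall()

print(left_join)
print(right_join)
```

SQL NULL values are returned as None by the Python interface.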

Nested Queries with Independent sub-queries

22. Get the Customer Id and name of those customers who have both account and
loan from the bank.

SELECT DISTINCT C.C_Id, C_Name
FROM Customer C, Depositor D
WHERE C.C_Id = D.C_Id AND C.C_Id IN ( SELECT C_Id
FROM Borrower);

Here (SELECT C_Id FROM Borrower) is called the inner sub-query and the main query
SELECT DISTINCT C.C_Id, C_Name FROM Customer C, Depositor D WHERE C.C_Id = D.C_Id
AND C.C_Id IN ( ) is called the outer query. The inner query can be evaluated
independently of the outer query. Such a query is evaluated in two steps:-

(i) First evaluate the inner sub-query and save its output.
(ii) Now evaluate the outer query with respect to the result produced by the inner
sub-query.

Evaluation of the inner sub-query will produce the set of C_Id of those customers who have a
loan from the bank. For each C_Id existing in the Depositor table, the outer query will
examine whether that C_Id exists in the set produced by the inner sub-query. If the answer is
“Yes” then that C_Id belongs to a customer having both account and loan and the
customer’s name appears in the final output table. DISTINCT prevents a customer holding
several accounts from being listed more than once.

23. Get names of the customers having joint account with customer Ajay.

SELECT C.C_Name
FROM Customer C, Depositor D
WHERE C.C_Id = D.C_Id AND C.C_Name <> 'Ajay'
AND D.AN IN ( SELECT P.AN
FROM Customer K, Depositor P
WHERE K.C_Id = P.C_Id AND K.C_Name = 'Ajay');

The inner sub-query will produce the set of Account Numbers held by customer ‘Ajay’. The
outer query will determine the other customers who are holders of an account held by
‘Ajay’.


24. Get Branch Id and Name of the branch having highest average balance amongst
all branches.

SELECT B.B_Id, B_Name
FROM Account A, Branch B
WHERE A.B_Id = B.B_Id
GROUP BY B.B_Id, B_Name
HAVING AVG (Bal) = ( SELECT MAX (AVG (Bal))
FROM Account
GROUP BY B_Id);

Nested Queries with Correlated Sub-queries

Here, the inner sub-query is not independent of the outer sub-query. So, inner sub-query
is evaluated for each tuple of the outer sub-query.

25. Get the names of the customers who have account in each branch located in
Noida.

SELECT C_Name
FROM Customer C
WHERE NOT EXISTS (( SELECT B_Id
FROM Branch
WHERE B_City = ‘Noida’)
MINUS
( SELECT B_Id
FROM Account A, Depositor D

WHERE A.AN = D.AN AND D.C_Id = C.C_Id));


Here the inner query is evaluated for each tuple C of table Customer being processed in
the outer query. The inner sub query produces the set difference of the set of B_Id of
branches in Noida and the set of B_Id where customer C.C_Id has accounts. If this set
produced by the inner sub query is an empty set then the predicate NOT EXISTS of the
outer sub query will be true and the customer name will appear in the result.
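
The query can be traced on a small data set. The following is a sketch only, assuming
Python's built-in sqlite3 module, reduced table definitions and made-up rows. EXCEPT is
used in place of Oracle's MINUS, and the operand queries are written without
parentheses, as SQLite requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer  (C_Id CHAR(10) PRIMARY KEY, C_Name VARCHAR(15));
CREATE TABLE Branch    (B_Id CHAR(10) PRIMARY KEY, B_City VARCHAR(15));
CREATE TABLE Account   (AN CHAR(10) PRIMARY KEY, B_Id CHAR(10));
CREATE TABLE Depositor (C_Id CHAR(10), AN CHAR(10));

INSERT INTO Customer VALUES ('C1', 'Ajay'), ('C2', 'Vijay'), ('C3', 'Meena');
INSERT INTO Branch   VALUES ('B1', 'Noida'), ('B2', 'Noida'), ('B3', 'Delhi');
INSERT INTO Account  VALUES ('A1', 'B1'), ('A2', 'B2'), ('A3', 'B1'), ('A4', 'B3');
INSERT INTO Depositor VALUES
    ('C1', 'A1'), ('C1', 'A2'),   -- Ajay: both Noida branches
    ('C2', 'A3'), ('C2', 'A4');   -- Vijay: one Noida branch and one Delhi branch
                                  -- Meena: no account at all
""")

names = [row[0] for row in conn.execute("""
    SELECT C_Name
    FROM Customer C
    WHERE NOT EXISTS (
        SELECT B_Id FROM Branch WHERE B_City = 'Noida'
        EXCEPT
        SELECT A.B_Id
        FROM Account A, Depositor D
        WHERE A.AN = D.AN AND D.C_Id = C.C_Id   -- correlated with outer tuple C
    )
    ORDER BY C_Name
""")]

print(names)
```

For Ajay the set difference is empty, so NOT EXISTS is true; for Vijay it contains B2
and for Meena it contains both Noida branches, so neither name is reported.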

Updating of tables

26. UPDATE Account
SET Bal = &newbal
WHERE AN = '&an';


Deletion of tuples from Tables

27. DELETE FROM Account WHERE AN = '&an';

Creation of Views

28. CREATE VIEW AN_BN
AS SELECT AN, B.B_Name
FROM Account A, Branch B
WHERE A.B_Id = B.B_Id;

Retrieving information from Views

29. SELECT B_Name FROM AN_BN;

Dropping of Views

30. DROP VIEW AN_BN;


CHAPTER 8
NORMALIZATION OF RELATIONAL SCHEMA

Normalization of a Relation Schema

Normalization of a Relation Schema refers to the process of:-

(a) Identifying those data dependencies in the schema, which would cause
anomalies during Insert, Update and Delete of data.

(b) Decomposing the schema into a set of sub-schemas, on the basis of
the dependencies causing anomalies.

The resulting schemas would permit representation of the intended information, while
keeping data redundancy to a minimum.

The process of normalization can be well understood only after understanding the
concept of various types of data dependencies occurring in a database.

Concept of Functional Dependencies (FDs)

Let there be a Relation Schema R, comprising sets of attributes α, β & γ i.e. α ⊆ R,
β ⊆ R & γ ⊆ R. A Functional Dependency α → β (read as “α determines β”) is said
to be holding on the Schema R, if for every legal relation r(R) and for every tuple-pair
{t1, t2} ⊆ r, t1[α] = t2[α] implies t1[β] = t2[β]. This means that if any two
tuples of relation r(R) agree on the values of α, then the two tuples must also agree on
the values of β.

α, the “Left Side” of FD α → β, is called the “Determinant”; and β, the “Right Side” of
the FD, is called the “Dependent”.

Example: The statement that “Knowing the Registration-number of a Student, we
can determine his/her Name” implies the FD Registration-number → Name. Here, the
Registration-number is the Determinant of the FD and Name is the Dependent.
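
The definition translates directly into a check over all pairs of tuples. The sketch
below is illustrative only: the function name fd_holds and the sample tuples are
assumptions, not part of these notes. Note that such a check can only show that one
particular relation satisfies or violates an FD; an FD holds on a schema only if every
legal relation satisfies it.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies the FD lhs -> rhs.

    rows is a list of dicts (one dict per tuple); lhs and rhs are lists of
    attribute names.
    """
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[a] == t2[a] for a in rhs)
        # A violating pair agrees on the determinant but not on the dependent.
        if agree_on_lhs and not agree_on_rhs:
            return False
    return True


# Hypothetical sample relation
students = [
    {"Reg_No": "R1", "Name": "Ajay", "Branch": "CSE"},
    {"Reg_No": "R2", "Name": "Vijay", "Branch": "CSE"},
    {"Reg_No": "R3", "Name": "Ajay", "Branch": "IT"},
]

reg_determines_name = fd_holds(students, ["Reg_No"], ["Name"])
name_determines_branch = fd_holds(students, ["Name"], ["Branch"])
trivial_fd = fd_holds(students, ["Reg_No", "Name"], ["Name"])

print(reg_determines_name, name_determines_branch, trivial_fd)
```

Here Name → Branch is violated by the two tuples named Ajay, which agree on Name but
differ on Branch.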

FDs holding on a Schema R

A set of FDs F is said to be holding on a schema R, if all FDs in F are satisfied by every
legal relation r(R).

Suppose, there is a Relational Schema “Student” having the following FDs holding on it:-

{Roll_No} → Registration_No, Branch, Section, Name, Father_Name, Address, DOB

{Registration_No} → Roll_No, Branch, Section, Name, Father_Name, Address, DOB

{Name, Father_Name, DOB, Address} → Roll_No, Registration_No, Section, Branch

Trivial Functional Dependency

An FD α → β is called Trivial, if β ⊆ α. It is called trivial, because such an FD would
be satisfied by any relation r(R): if two tuples of a relation agree on the values of
α, the tuples will definitely agree on the values of β, since β is a subset of α.

For Example, {Roll_No, Name} → Name represents a trivial FD.

Extraneous Attributes on the “Left Side” of an FD

Suppose an FD α → β is holding on a Schema R. An attribute A (A ∈ α) is said to be
extraneous, if the FD (α - A) → β also holds on R. This means that attribute A is not
required in α to determine the value of β. The subset (α - A) is sufficient to determine
the value of β.

Left-Irreducible FD

An FD α → β, holding on a schema R, is said to be left-irreducible, if there exists no
proper subset of α which can determine β, i.e. there exists no attribute A (A ∈ α) such
that (α - A) → β holds on R.

Super Key (SK) of a Relation Schema

Suppose the FD K → R holds on schema R, where K ⊆ R; then K forms a Super Key (SK)
of Schema R. Any superset of a SK will also form a SK.

Example: Suppose we have a Schema Student (Roll_No, Registration_No, Branch,
Section, Name, Father_Name, Address, DOB), and suppose the value of {Roll_No} is
distinct and also the value of {Registration_No} is distinct for each student. Thus,
{Roll_No, Branch} will form a Super Key of Schema Student. Similarly, {Roll_No, Branch, Name}
will also form a Super Key of Student. In the same way, there can be many Super Keys of
Student.

Prove that if K → R holds, then K forms a Super Key of R

Proof by Contradiction

Let us assume that two distinct tuples t1 and t2 in a legal relation r(R) agree on the values of K
i.e. t1[K] = t2[K]. ------- (i)
Since K → R holds, it implies that t1[R] = t2[R], and thus t1 = t2.
But no two tuples in a legal relation r(R) can be equal, since a relation is defined as a set
of tuples and in a set no two elements can be the same.

Thus our assumption (i) is wrong.

Thus, it is proved that no two distinct tuples in a legal relation r(R) can agree on the values of K;
thus K is a Super Key of R. So, knowing the value of attribute set K, a tuple can be
uniquely identified in a relation r.

Each Relation Schema R will have at least one default Super Key; that is, the set of all its
attributes i.e. the entire schema R itself.

Candidate Key (CK) of a Relation Schema R

Since every superset of a Super Key of R is also a Super Key of R, a Super Key may
contain some extraneous attributes. Suppose K is a Super Key of R. If attributes are
removed from K one at a time, for as long as the remaining set is still a Super Key, the
result is a minimal Super Key of R, i.e. no proper subset of it forms a Super Key of R.
Such a minimal Super Key is called a Candidate Key of R.

Alternatively, it can be stated that if K → R forms a left-irreducible FD holding on R,
then K is called a Candidate Key of R.

For example, {Roll_No} and {Registration_No} will form Candidate Keys of schema
“Student”. In addition, {Name, Father_Name, Address, DOB} also forms another
Candidate Key.

Primary-Key

A Relation Schema R may have more than one Candidate Key. One of the Candidate
Keys is chosen as the primary means to identify tuples uniquely in a relation r(R). This
designated candidate key is called the Primary Key of R. Out of the three candidate keys in
the above example, we may select {Roll_No} as the Primary Key of the Schema Student.

Prime Attributes or Key Attributes of a Relational Schema An attribute A of
Relational Schema R (A ∈ R) is said to be a Prime Attribute or Key Attribute, if it forms
part of any of its Candidate Keys. Let {K1, K2, …. Kn} be the complete set of candidate
keys of R. Then the set of Prime Attributes of R = K’ = K1 ∪ K2 ∪ …. ∪ Kn

Non-Prime Attributes of a Relational Schema The subset of R, which does not
form part of any of its Candidate Keys, is called the set of its Non-Prime Attributes or
Non-Key Attributes.

Logically implied FDs


Suppose F is the set of FDs holding on a Relation Schema R. There may be some FDs
that can be logically inferred from F. These inferred FDs will also hold on every legal
relation r(R). The set of such FDs, inferred from set F, is said to be logically implied by F.

Rules for the Inference of FDs

Suppose α, β, γ and δ are subsets of attributes of a Relation Schema R i.e. α ⊆ R,
β ⊆ R, γ ⊆ R and δ ⊆ R

Armstrong’s Rules The inference rules 1..3 are Armstrong’s Rules:-

Rule 1 (Reflexivity Rule) If β ⊆ α then α → β holds on R.

Rule 2 (Augmentation Rule) If α → β holds on R, then αγ → βγ holds on R.

Rule 3 (Transitivity Rule) If α → β and β → γ hold on R, then α → γ holds on R.

Additional Rules of Inference

Rule 4 (Union Rule) If α → β and α → γ hold on R, then α → βγ holds on R.

Rule 5 (Decomposition Rule) If α → βγ holds on R, then α → β and α → γ hold on R.

Rule 6 (Pseudo-Transitivity Rule) If α → β and γβ → δ hold on R, then αγ → δ holds
on R.

Proofs of Inference Rules


1 Reflexivity Rule

Suppose β ⊆ α.
Consider a relation r(R) and a tuple pair {t1, t2} ⊆ r such that
t1[α] = t2[α] ----------------- (i)
Since β ⊆ α, t1 and t2 will also satisfy
t1[β] = t2[β] ----------------- (ii)
From (i) and (ii), it is implied that α → β holds on R.
Thus, proved.

2 Augmentation Rule

We prove this rule by contradiction.

Suppose α → β holds on R.
We make an assumption that αγ → βγ does not hold on R. ----(A)
Then there is a legal relation r(R) with a tuple pair {t1, t2} ⊆ r such that
t1[αγ] = t2[αγ] --------------- (i)
but t1[βγ] ≠ t2[βγ] --------------- (ii)
From (i), it is implied that
t1[α] = t2[α] --------------- (iii)
and t1[γ] = t2[γ] --------------- (iv)
Since α → β holds on R, (iii) implies
t1[β] = t2[β] --------------- (v)
From (iv) and (v), it is implied that
t1[βγ] = t2[βγ] --------------- (vi)

Since (vi) contradicts (ii), our assumption (A), that αγ → βγ does not hold on R, is NOT
CORRECT.

Thus, if α → β holds on R then αγ → βγ also holds on R.

3 Transitivity Rule

Suppose α → β and β → γ hold on R.

Since α → β holds on R, for every relation r(R) and a tuple pair {t1, t2} ⊆ r, which
satisfies:-
t1[α] = t2[α] -----------------(i)
we also have:-
t1[β] = t2[β] -----------------(ii)
Since β → γ holds on R, for
t1[β] = t2[β] -----------------(iii)
we also have:-
t1[γ] = t2[γ] ----------------- (iv)

From (i) and (iv), it is implied that α → γ holds on R.

4 Union Rule

Suppose α → β and α → γ hold on R.

Applying Augmentation Rule to α → β {Augmenting by α}, it implies that
α → αβ ------------------------(i)
Applying Augmentation Rule to α → γ {Augmenting by β}, it implies that
αβ → βγ ------------------------(ii)
Applying Transitivity Rule to (i) and (ii), it is implied that
α → βγ

5 Decomposition Rule

Suppose α → βγ holds on R. ------ (i)

By Reflexivity Rule βγ → β --------(ii)
Applying Transitivity Rule between (i) and (ii), it is implied that α → β

Similarly, we can prove that α → γ

6. Pseudo-Transitivity Rule

Suppose α → β and γβ → δ hold on R.

Applying Augmentation Rule to α → β {augmenting by γ on both sides}, αγ → βγ
Applying Transitivity Rule to αγ → βγ and βγ → δ, we get αγ → δ

Trivial FD An FD α → β is said to be trivial, if β ⊆ α i.e. if the determinant is a
superset of the dependent.

Example: The Functional Dependency conveyed by the statement, “Knowing the Name
and Address of a Person, we can determine his Address” i.e. (Name, Address) →
Address is obviously trivial.

Closure of FD Set

Suppose F is a set of FDs that holds on a Schema R; then the Closure of F, denoted by F+,
is the complete set of FDs that includes F and all the FDs that are logically implied
(inferred) by F. Any legal relation r(R) that satisfies F will also satisfy F+.

Algorithm to determine Closure F+ of an FD Set F

F+ = F;
Repeat
    Save-F+ = F+;
    To each FD f1 ∈ F+, apply Armstrong’s Rule of Reflexivity; and add the FDs so
    inferred to F+;
    To each FD f1 ∈ F+, apply Armstrong’s Rule of Augmentation; and add the FDs
    so inferred to F+;
    To each such pair of FDs as {α → β, β → γ} ⊆ F+, apply the Armstrong’s Rule of
    Transitivity, and add FD α → γ to F+;
Until (F+ = Save-F+);

Cover of an FD Set

An FD set G is said to be the cover of another FD set F, if F ⊆ G+ i.e. all the FDs that are
there in F are also there in the Closure of set G.

Equivalent Sets

Two FD sets F and G are said to be equivalent sets, if each forms a cover of the other i.e.
F ⊆ G+ and G ⊆ F+, which implies F+ = G+. This means that two sets F and G are
equivalent, if their closures are equal.

Extraneous FDs in a set

An FD f ∈ F is said to be extraneous, if its exclusion from F does not affect the Closure
of F i.e. {F - f}+ = F+. Such FDs in a set are logically implied by other FDs in the set.

Example:-

Suppose we have an FD set F: {α → β, β → γ and α → γ}; then α → γ is said to be an
extraneous FD, since it can be inferred by applying Armstrong’s Transitivity Rule to the
other two FDs in the set. The extraneous FD can be eliminated from the set, without
affecting the Closure of the set.

Extraneous attributes in the determinant of an FD

Suppose we have an FD α → β that holds on a schema R. An attribute A ∈ α is said to
be extraneous, if (α - A) → β also holds on R.

Left-irreducible FDs

The left side of an FD (i.e. its determinant) is said to be irreducible, if it does not contain
any extraneous attributes. Such FDs are known as left-irreducible FDs.

If α → R is a left-irreducible FD holding on the schema R, then α forms a Candidate Key
of R.

Canonical Cover of an FD Set

An FD set Fc is said to be a Minimal Cover or Canonical Cover of FD set F, iff Fc is
equivalent to F (i.e. Fc+ = F+) and it satisfies the following three conditions:-

(a) Each FD in Fc is in a Canonical Form i.e. has only one attribute on its right
side.

(b) No FD in Fc has any extraneous attributes on its left side i.e. all the FDs in Fc
are left-irreducible.

(c) None of the FDs in Fc is extraneous i.e. no FD in Fc is logically implied by the
other FDs in the set Fc. This implies that no FD can be removed from the set
Fc without changing its closure.

Algorithm to determine Canonical Cover of an FD Set F

Fc = F;
Repeat
    Save-Fc = Fc;
    To each FD in Fc of the form A → BC (where A, B and C are attributes of
    Schema R), apply Decomposition Rule; and replace the FD by the set of
    FDs {A → B, A → C};
    For each FD (α → β) ∈ Fc and for each attribute A ∈ α
        if {{Fc – {α → β}} ∪ {(α - A) → β}}+ = Fc+
        then replace FD α → β by FD (α - A) → β in Fc;
    For each FD (α → β) ∈ Fc
        if {Fc – {α → β}}+ = Fc+
        then eliminate FD α → β from Fc;
Until (Fc = Save-Fc);

How does Canonical Cover of an FD set help to reduce the DBMS overheads?

Suppose F is the set of FDs holding on a schema R; then a relation r(R) would be legal
only if it satisfies all the FDs in set F. Now, to determine whether a relation r(R) is legal or
not, the DBMS has to check for the satisfaction of all the FDs in set F. On the other hand, if
we determine a minimal cover Fc of F, then we have to check for the satisfaction of a
smaller set of FDs, since a relation r(R) that satisfies FD set Fc will also satisfy FD
set F, both being equivalent sets. This will reduce DBMS overheads.

Can an FD Set have more than one Canonical Cover?

Yes, an FD set F can have more than one Canonical Cover, but all of those sets would be
equivalent to each other; and in turn equivalent to F.

Example:

Determine the Canonical Cover of the FD Set F: {A → BCD, B → CDA, C → ABD}

Step 1: Convert the FDs to their canonical form i.e. replace them by an equivalent set of
FDs having only a single attribute on the right side.
Fc: {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

Step 2: Remove extraneous attributes from the left side of all FDs. Here, every FD has
only one attribute on its left side, and thus cannot contain any extraneous attribute.
Fc: {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

Step 3: Remove extraneous FDs from the set.

A → B is implied by A → C and C → B; so A → B can be eliminated from the set.
So, now Fc: {A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

A → D is implied by A → C and C → D; so A → D can now be eliminated.
So, now Fc: {A → C, B → C, B → D, B → A, C → A, C → B, C → D}

B → C is implied by B → A and A → C; so B → C can now be eliminated.
So, now Fc: {A → C, B → D, B → A, C → A, C → B, C → D}

C → D is implied by C → B and B → D; so C → D can now be eliminated.
So, now Fc: {A → C, B → D, B → A, C → A, C → B}

C → A is implied by C → B and B → A; so C → A can now be eliminated.
So, now Fc: {A → C, B → D, B → A, C → B}

No more FDs can be eliminated; so {A → C, B → D, B → A, C → B} forms the
Canonical Cover of F.

Had we, at the beginning of Step 3 above, eliminated A → C instead (since it was implied
by A → B and B → C), the Canonical Cover would have been different. So, the Canonical
Cover of an FD Set need not be unique.
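The three steps can also be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function names closure and canonical_cover are assumptions made here, attributes are assumed to be single letters, and implication is tested through attribute-set closure instead of comparing full FD-set closures. Because the FDs are examined in a fixed order, the cover produced may differ from the one derived by hand, which is consistent with canonical covers not being unique.

```python
def closure(attrs, fds):
    """Closure of an attribute set under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """fds: list of (lhs, rhs) strings, one letter per attribute."""
    # Step 1: split every right side into single attributes.
    cover = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), frozenset(attr))
            if fd not in cover:
                cover.append(fd)
    # Step 2: drop extraneous attributes from the left sides.
    for i in range(len(cover)):
        rhs = cover[i][1]
        for attr in sorted(cover[i][0]):
            reduced = cover[i][0] - {attr}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)
    cover = list(dict.fromkeys(cover))  # remove duplicates created by Step 2
    # Step 3: drop every FD that is implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


F = [("A", "BCD"), ("B", "CDA"), ("C", "ABD")]
Fc = canonical_cover(F)
for lhs, rhs in Fc:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For the FD set of the example, this ordering keeps A → C, B → A, C → B and C → D, which is a different but equivalent four-FD cover.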

Closure of an Attribute Set under F

Suppose α is a sub-set of Schema R i.e. α ⊆ R. And suppose F is the set of FDs holding
on Schema R. Then, the Closure of Attribute Set α, denoted by α+, is the complete set of
attributes that can be determined by α under the FD set F.

Algorithm to determine Closure of an Attribute Set under a set of FDs

Let α ⊆ schema R and F be the set of FDs holding on schema R.

The closure of α i.e. α+ under F can be determined as follows:-

α+ = α;
Repeat
    Save-α+ = α+;
    For each FD (β → γ) ∈ F
        if β ⊆ α+ then α+ = α+ ∪ γ;
Until (Save-α+ == α+);
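The closure algorithm translates almost line by line into Python. In the sketch below, the function name attribute_closure and the representation of FDs as (left, right) string pairs with one letter per attribute are assumptions made for illustration; the FD set used is only an example.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: iterable of attribute names (here: single letters).
    fds:   list of (lhs, rhs) pairs, each an iterable of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Illustrative FD set: AB -> C, A -> D, D -> E
F = [("AB", "C"), ("A", "D"), ("D", "E")]
print(sorted(attribute_closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(attribute_closure("A", F)))   # ['A', 'D', 'E']
```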


The Concept of “Attribute Set Closure” can be used to determine the following:-

(a) Whether an Attribute Set α is a Super Key of Relation Schema R (where α ⊆ R)

Determine α+ under F.
If α+ equals R, then α is a Super-Key of R.

(b) Whether α → β holds on R (where α ⊆ R and β ⊆ R)

Determine α+ under F.
If β ⊆ α+, then α → β holds on R.

(c) To determine Closure of the FD Set F i.e. F+

F+ := F;
For each attribute set α ⊆ R
Begin
    Determine α+ under F;
    For each γ ⊆ α+
        α → γ holds; Thus include it in F+
        i.e. F+ := F+ ∪ {α → γ};
End;
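The three uses listed above can be sketched in Python as follows. The helper names (closure, is_super_key, fd_is_implied, nontrivial_fd_closure) and the sample schema R(A, B, C) with F = {A → B, B → C} are assumptions chosen for illustration. To keep the output small, the last function enumerates only the non-trivial FDs of F+ that have a single attribute on the right side; every other FD in F+ follows from these by the Reflexivity and Union rules.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_super_key(attrs, schema, fds):
    # (a) attrs is a super key iff its closure covers the whole schema.
    return closure(attrs, fds) >= set(schema)


def fd_is_implied(lhs, rhs, fds):
    # (b) lhs -> rhs holds iff rhs is contained in the closure of lhs.
    return set(rhs) <= closure(lhs, fds)


def nontrivial_fd_closure(schema, fds):
    # (c) every non-trivial FD X -> A (single attribute A) in F+.
    result = set()
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            for a in closure(lhs, fds) - set(lhs):
                result.add(("".join(lhs), a))
    return result


F = [("A", "B"), ("B", "C")]
print(is_super_key("A", "ABC", F))   # True
print(is_super_key("B", "ABC", F))   # False
print(fd_is_implied("A", "C", F))    # True
print(sorted(nontrivial_fd_closure("ABC", F)))
```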

Loss-Less-Join Decomposition of a Relation Schema R


Decomposition of a relation r(R) into r1(R1) and r2(R2) (such that R1 ∪ R2 = R) is said to
be a loss-less-join decomposition, if it satisfies r1 * r2 = r i.e. the natural join of r1 and r2
should generate r, with no tuples eliminated and with no new tuples added. Such a
decomposition is also called Non-Additive Decomposition.

Example:-

Case I
Consider the following relation r on schema R (A, B, C) and its decomposition into r1
and r2.

r(R)
A     B     C
A1    B1    C1
A2    B2    C1
A1    B1    C2
A3    B2    C3
A1    B1    C3
A2    B2    C4

r1(R1)
A     B
A1    B1
A2    B2
A3    B2

r2(R2)
A     C
A1    C1
A2    C1
A1    C2
A3    C3
A1    C3
A2    C4

r1 * r2
A     B     C
A1    B1    C1
A1    B1    C2
A1    B1    C3
A2    B2    C1
A2    B2    C4
A3    B2    C3

It is a loss-less-join-decomposition, since r1 * r2 = r

Case II
Now consider the following decomposition of r into r1 and r2.

r(R)
A     B     C
A1    B1    C1
A2    B2    C1
A1    B2    C2
A3    B2    C3
A1    B1    C3
A2    B1    C4

r1(R1)
A     B
A1    B1
A2    B2
A1    B2
A3    B2
A2    B1

r2(R2)
A     C
A1    C1
A2    C1
A1    C2
A3    C3
A1    C3
A2    C4

r1 * r2
A     B     C
A1    B1    C1
A1    B1    C2    (additional)
A1    B1    C3
A2    B2    C1
A2    B2    C4    (additional)
A1    B2    C1    (additional)
A1    B2    C2
A1    B2    C3    (additional)
A3    B2    C3
A2    B1    C1    (additional)
A2    B1    C4

It is NOT a loss-less-join-decomposition, since r1 * r2 ≠ r

Here r1 * r2 contains five additional tuples (marked “additional” above), which
are not found in r.
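Both cases can be reproduced with a few lines of Python. The sketch below is only an illustration: relations are represented as sets of (A, B, C) tuples, and the function name decompose_and_join is an assumption made here. It projects r onto (A, B) and (A, C), then recombines the projections by a natural join on A.

```python
def decompose_and_join(r):
    """Project r(A, B, C) onto (A, B) and (A, C), then natural-join on A."""
    r1 = {(a, b) for a, b, c in r}
    r2 = {(a, c) for a, b, c in r}
    return {(a, b, c) for a, b in r1 for a2, c in r2 if a == a2}


case1 = {
    ("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B1", "C2"),
    ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B2", "C4"),
}
case2 = {
    ("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B2", "C2"),
    ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B1", "C4"),
}

print(decompose_and_join(case1) == case1)    # True: loss-less
extra = decompose_and_join(case2) - case2    # tuples added by the join
print(len(extra))                            # 5
print(sorted(extra))
```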

Sufficient Condition for a Decomposition to be a loss-less-join decomposition

A decomposition of Relation Schema R into R1 and R2 (such that R1 ∪ R2 = R) will be a
Loss-Less-Join (Non-Additive) Decomposition, if the common attributes of R1 and R2
(i.e. R1 ∩ R2) form a key of either R1 or R2 or both,
i.e. R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2

In the above example, in Case I, FD A → B holds on the Schema R, which is
consistent with the data represented in relation r of Case I. Thus, the common attribute
of r1 and r2 i.e. {A} forms a key of R1 (A, B). That is why the
decomposition is a loss-less-join decomposition; whereas in Case II, no such condition
holds (the data satisfies neither A → B nor A → C); and that is why the decomposition is a
Lossy (Additive) decomposition.

Heath’s Theorem If a relation schema R(α, β, γ) has an FD α → β holding on it,
then it can have a loss-less-join decomposition:-

R1 (α, β) and R2 (α, γ).

Since R1 ∩ R2 = {α} → R1
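The condition on the common attributes can be tested mechanically with attribute-set closure. The Python sketch below is an illustration under stated assumptions: the function names closure and is_lossless_binary are hypothetical, and attributes are single letters so that a schema can be written as a string.

```python
def closure(attrs, fds):
    """Attribute-set closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Heath's Theorem on R(A, B, C) with A -> B: R1(A, B), R2(A, C)
print(is_lossless_binary("AB", "AC", [("A", "B")]))  # True
# Same FD, but splitting on B instead: R1(A, B), R2(B, C)
print(is_lossless_binary("AB", "BC", [("A", "B")]))  # False
```

In the second call the common attribute B determines neither A nor C, so the condition is not met.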

N-ary Loss-less-join Decomposition

An N-ary decomposition of R into R1, R2, R3, ……, Rn (such that R1 ∪ R2 ∪ R3 ∪ …… ∪ Rn
= R) is said to be Loss-less-join (Non-Additive) Decomposition, if each of the
decompositions Ri satisfies the following:-

Ri ∩ Rk → Ri
OR
Ri ∩ Rk → Rk

where Rk is the union of all decompositions R1, R2, R3, ……, Rn, except Ri

Restriction of FDs to a Decomposition

Let F be the set of FDs holding on a relation schema R and Ri ⊆ R be a decomposition of R.
Then the restriction of F to decomposition Ri (denoted by Fi) is defined as:-
Fi = {α → β | (α → β) ∈ F+, α ⊆ Ri, β ⊆ Ri}
i.e. Fi is the set of FDs that belong to F+ AND (α ∪ β) ⊆ Ri.

Dependency-Preserving Decomposition

Let F be the set of FDs holding on a schema R, having a decomposition (R1, R2, …, Rn)
such that R1 ∪ R2 ∪ R3 ∪ …… ∪ Rn = R.
Let {F1, F2, ……, Fn} be the restrictions of F to R1, R2, ……, Rn respectively.
Let F’ = F1 ∪ F2 ∪ …… ∪ Fn
The decomposition (R1, R2, …, Rn) is said to be “Dependency-Preserving” if F’+ = F+ i.e.
each FD of F must either be preserved in at least one of the decompositions or be logically
implied by the FDs so preserved; else the decomposition is called
non-dependency-preserving.

Example:-
Suppose a relation schema R (A, B, C) has FD set F holding on it:-

F: {A → B, B → C}

And let us consider the decomposition R1 (B, C) and R2 (A, B).

Restriction of F to R1 = F1 = {B → C}
Restriction of F to R2 = F2 = {A → B}
F’ = F1 ∪ F2 = {A → B, B → C}
Obviously, F’+ = F+; thus it is a Dependency Preserving Decomposition.
Also R1 ∩ R2 = {B} is the primary key of R1, thus making it a loss-less-join
decomposition.
Thus, this decomposition is a “Dependency-Preserving” and “Loss-less-join”
Decomposition.

Now consider the Decomposition R1 (A, C) and R2 (A, B):-

Here, R1 ∩ R2 = {A} forms a primary key of both R1 and R2, making it a loss-less-join
decomposition.
Restriction of F to R1 = F1 = {A → C}
Restriction of F to R2 = F2 = {A → B}
F’ = F1 ∪ F2 = {A → B, A → C}
Obviously, F’+ ≠ F+; since one of the FDs (i.e. B → C) is lost in the decomposition, and it
is not logically implied by the other FDs in the set F’. Thus, this decomposition is a
loss-less-join decomposition but not a dependency-preserving decomposition.

Why is it desirable that a decomposition should be dependency-preserving:-

For any decomposition, it is mandatory that it should be a loss-less-join decomposition
to ensure consistency of the database; and also desirable (not mandatory) that it should be
dependency-preserving, for the following reason:-

If each FD in F+ is preserved in at least one of the decompositions, then the satisfaction of
each FD can be verified in a single relation itself; else it would require a natural join of
more than one relation to verify those FDs which are not preserved in the
decomposition. A natural join operation to ascertain compliance of FDs would be too
costly in terms of execution time and memory requirements. But, sometimes while
normalizing a schema, it may not be possible to preserve all the FDs. Ensuring that a
decomposition is a loss-less-join decomposition is, however, mandatory, since no
compromise is possible on consistency of a database.

So, dependency-preservation is a desirable criterion, NOT A MANDATORY ONE.
Whereas, ensuring Loss-Less-Join (Non-Additive Join) Decomposition is a mandatory
criterion.

Algorithm to determine whether a Decomposition (R1, R2, R3, ....., Rn) of R is a
Dependency-Preserving Decomposition or not.

Let F be the set of FDs holding on R.

Compute Minimal Cover Fc of F.

If, for each FD α → β in Fc, (α ∪ β) ⊆ Ri for some i (1 ≤ i ≤ n)
Then it is a Dependency-Preserving Decomposition
Else it may be a Non-Dependency-Preserving Decomposition: an FD that does not fit
within any single Ri is preserved only if it is logically implied by
F’ = F1 ∪ F2 ∪ …… ∪ Fn, which has to be checked separately.
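A more general check, which is not taken from these notes, is the commonly taught procedure that tests each FD of F against the decomposition without computing F+ explicitly: starting from the left side of the FD, it repeatedly adds whatever can be derived inside each Ri. The Python sketch below illustrates that procedure; the function names and the single-letter attribute representation are assumptions.

```python
def closure(attrs, fds):
    """Attribute-set closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_dependency_preserving(decomposition, fds):
    """decomposition: list of schemas (strings); fds: list of (lhs, rhs)."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes derivable from result using only FDs
                # that can be checked inside Ri.
                gained = closure(result & set(ri), fds) & set(ri)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # lhs -> rhs is not implied by F'
    return True


F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["BC", "AB"], F))  # True
print(is_dependency_preserving(["AC", "AB"], F))  # False (B -> C is lost)
```

The two calls correspond to the two decompositions of R (A, B, C) discussed in the example above.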


NORMALIZATION

First Normal Form (1 NF) A Schema R is said to be in first normal form (1 NF) if all
its attributes have only atomic domains i.e. domains of all attributes have only indivisible
values. Alternatively, it can be stated that for a Relation Schema R to be in 1 NF, all its
attributes should be “simple” and “single-valued”. Each field in each tuple of that relation
must have only one value from the respective domain or a “NULL” value.

Let there be a relation schema EMP (E#, E_Name, Salary, Tel_No)


An employee in EMP may have none/one/more-than-one Tel_No.

Then a Table (Relation) may be represented as :-

UN-NORMALIZED TABLE

E#     E_Name    Salary    Tel_No
001    Ajay      200000    {9810222777, 2449227, 9422230230}
002    Vijay     50000     {NULL}
003    Ram       100000    {9810345567}

The Field Tel_No in a tuple has a set of values. Such a Table is said to be Un-normalized
and it is not in First Normal Form.

The above table can be transformed to First Normal Form as follows:-

NORMALIZED TABLE

E#     E_Name    Salary    Tel_No
001    Ajay      200000    9810222777
001    Ajay      200000    2449227
001    Ajay      200000    9422230230
002    Vijay     50000     NULL
003    Ram       100000    9810345567
A tuple in the Un-Normalized Table is replaced by as many tuples in the Normalized
Table as the number of telephones owned by the respective employee. Such a table is
called Normalized Table or Flat Table. It is in First Normal Form and has a lot of data
redundancy, which will be eliminated by further normalization of the Table.
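The flattening step can be illustrated with a short Python sketch. The function name to_first_normal_form and the list-of-dicts representation are assumptions made for illustration; an employee without a telephone is represented by an empty list and becomes a single row with None (standing in for NULL).

```python
def to_first_normal_form(rows, multi_valued_attr):
    """Replace each row by one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        values = row[multi_valued_attr] or [None]  # no value -> one NULL row
        for value in values:
            new_row = dict(row)
            new_row[multi_valued_attr] = value
            flat.append(new_row)
    return flat


emp = [
    {"E#": "001", "E_Name": "Ajay", "Salary": 200000,
     "Tel_No": ["9810222777", "2449227", "9422230230"]},
    {"E#": "002", "E_Name": "Vijay", "Salary": 50000, "Tel_No": []},
    {"E#": "003", "E_Name": "Ram", "Salary": 100000, "Tel_No": ["9810345567"]},
]

for row in to_first_normal_form(emp, "Tel_No"):
    print(row["E#"], row["E_Name"], row["Salary"], row["Tel_No"])
```

The three un-normalized rows become five flat rows, with E_Name and Salary repeated for every telephone number of employee 001.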

Full Functional Dependency Let there be a Relational Schema R with a Candidate Key
K and a non-prime attribute A (K ⊆ R, A ∈ R). The Functional Dependency K → A is said
to be a “Full Functional Dependency”, if attribute A cannot be determined by any proper
subset of K i.e. there does not exist any K1 ⊂ K for which K1 → A holds.

Partial Functional Dependency Let there be a Relational Schema R with a Candidate
Key K and a non-prime attribute A (K ⊆ R, A ∈ R). The Functional Dependency K → A
is said to be a “Partial Functional Dependency”, if attribute A can be determined by a
proper subset of K i.e. there exists K1 ⊂ K for which K1 → A holds.

Let R1 (A, B, C, D, E) be a Relation Schema with all its attributes (A..E) having only
atomic domains; and let F: {AB → C, A → D, D → E} be the set of FDs holding on it.

By the Armstrong’s Rule of Transitivity, {A → D, D → E} implies A → E.
So, A → E also holds on R1.

Since all attributes of R1 have only atomic domains, it is in 1 NF.

Since {A, B}+ = ABCDE, {A, B} forms a candidate key of this schema R1; and this is
the only candidate key of R1.
So, A and B are the prime attributes of R1 and all other attributes i.e. C, D and E are
non-prime attributes.
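The candidate key and the prime attributes of R1 can be confirmed by brute force. The Python sketch below is an illustration only: the function names are assumptions, attributes are single letters, and the search tries attribute subsets in increasing size, which is practical only for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


F = [("AB", "C"), ("A", "D"), ("D", "E")]
keys = candidate_keys("ABCDE", F)
prime = {a for k in keys for a in k}
print(keys)                          # [('A', 'B')]
print(sorted(prime))                 # ['A', 'B']
print(sorted(set("ABCDE") - prime))  # ['C', 'D', 'E']
```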

The non-prime attribute C is determined only by the full candidate key, since AB → C
holds on R1. So, C is said to be fully functionally dependent on the candidate key and the
FD AB → C is called a Full FD or Complete FD of R1.

However, the non-prime attributes D and E are determined by A alone, which is a proper
subset of the candidate key {A, B}. Such a dependency is called a partial functional
dependency; and such an FD causes certain Insert/Delete/Update anomalies, as
demonstrated in the following example:-

Example:-

Consider a Schema SP1 (S#, P#, Sname, Scity, Status, Pname, Qty)

Where S#     : Supplier Id number (Unique)
      P#     : Part Id Number (Unique)
      Sname  : Supplier Name
      Scity  : Supplier City
      Status : Supplier Status, which depends on Scity
      Pname  : Part Name
      Qty    : Quantity of a Part (P#) to be supplied by a Supplier (S#)

Suppose the following set of FDs holds on the Schema SP1:-

S# → Sname, Scity
Scity → Status
P# → Pname
{S#, P#} → Qty

An instance of a relation, defined on Schema SP1:-

S#    P#    Sname       Scity     Status    Pname          Qty
S1    P1    Avia        Mumbai    10        Aero-engine    5
S1    P2    Avia        Mumbai    10        Generator      5
S2    P1    Aero        Delhi     20        Aero-engine    2
S2    P3    Aero        Delhi     20        Altimeter      5
S3    P2    Air-supp    Mumbai    10        Generator      10
S3    P3    Air-supp    Mumbai    10        Altimeter      20

This relation, being only in 1 NF, has the following Insert/Delete/Update anomalies:-

(a) Insertion Anomalies:-

(i) Information about a supplier, like Sname and Scity, can be inserted
only when the supplier is supplying at least one part.

(ii) Information about a Part, like its Pname, can be inserted only when
the part is being supplied by at least one supplier.

(iii) Information about a City, like its Status, can be inserted only when
there is at least one supplier from that city and the supplier is supplying at
least one part.

(b) Deletion Anomalies:-

(i) If a supplier is supplying only one part and that supply is
concluded, then the tuple relating to that supply will be deleted. With the
deletion of that tuple, we would lose complete information about the
supplier i.e. its Name and City.

(ii) Suppose a part is being supplied by only one supplier. On completion
of that supply, when the related tuple is deleted, we would lose the
information about the Name of that part.

(iii) Suppose a city has only one supplier and that supplier is supplying
only one part. On deletion of the tuple of that particular supply, we would
lose the information about the Status of that City.

(c) Update Anomalies There is a lot of unwanted data redundancy like:-

(i) Information about the Sname and Scity of a particular supplier will
be appearing as many times as the number of parts being supplied
by that supplier.

(ii) Information about the name of a particular part will be appearing
as many times as the number of supplies relating to that part.

(iii) Information about the Status of a City will be appearing as many
times as the number of supplies from the suppliers of that City.

The unwanted data redundancy requires the redundant information to be
updated at multiple places. For instance, when a Supplier moves from one city to
another, the city has to be changed in multiple tuples. An inconsistent update
or a partial update will cause database inconsistency.

Second Normal Form (2 NF)

A Relation Schema R is said to be in Second Normal Form (2 NF), iff:-

(a) It is in 1 NF and

(b) Each non-prime attribute of R is fully functionally determined by the
candidate keys of R, that is, R does not involve any partial functional
dependencies.

The above relation schema R1 is not in 2 NF, since it involves a partial functional
dependency A → DE.
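Condition (b) can be checked mechanically by looking, for every candidate key, at what each proper subset of the key determines. The Python sketch below is an illustration under assumptions: the function names are hypothetical, attributes are single letters, and candidate keys are found by brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


def partial_dependencies(schema, fds):
    """List (key_part, non_prime_attrs) pairs that violate 2 NF."""
    keys = candidate_keys(schema, fds)
    non_prime = set(schema) - {a for k in keys for a in k}
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets
            for part in combinations(key, size):
                dependents = (closure(part, fds) - set(part)) & non_prime
                if dependents:
                    found.append(("".join(part), "".join(sorted(dependents))))
    return found


F = [("AB", "C"), ("A", "D"), ("D", "E")]
print(partial_dependencies("ABCDE", F))  # [('A', 'DE')]
```

An empty result means the schema has no partial dependencies and, provided it is in 1 NF, is in 2 NF.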

Decomposing a 1 NF Schema into 2 NF Schemas

Using Heath’s Theorem, R1 can be loss-less decomposed into R21 and R22.

R21 (A, D, E) Primary Key {A}
A → D, D → E
By transitivity A → E also holds on R21.

R22 (A, B, C) Primary Key {A, B}
Foreign Key {A} references R21
AB → C

The decomposition of R1 into R21 and R22 is a Loss-less Join Decomposition since:-

R21 ∩ R22 = {A} → R21

Since R21 and R22 involve no partial functional dependencies, both are in 2 NF.

Now, Let us consider Schema SP1

{S#, P#} is a candidate key of SP1 and this is the only candidate key of SP1.
So, S# and P# are prime attributes and all other attributes are non-prime.
Since S# → Scity and Scity → Status, S# → Status also holds.

SP1 involves the following partial FDs:-

S# → Sname, Scity, Status
AND
P# → Pname

Since SP1 has partial FDs, it is not in 2 NF.

SP1 can be decomposed into 2 NF Schemas S, P and SP2 in two steps as shown
below:-

Step 1 Decompose SP1 into S and SP11,
on the basis of partial FD S# → Sname, Scity, Status

S (S#, Sname, Scity, Status) Primary Key {S#}
S# → Sname, S# → Scity, Scity → Status
By Transitivity, S# → Status

SP11 (S#, P#, Pname, Qty) Primary Key {S#, P#}
P# → Pname
{S#, P#} → Qty

The above decomposition is a loss-less-join decomposition, since:-

S ∩ SP11 = {S#} → S since S# → Sname, Scity, Status

The Schema S has no partial FD and is thus in 2NF.
However, SP11 has a partial FD P# → Pname and is still not in 2NF.

Step 2 Decompose SP11 into P and SP2,
on the basis of partial FD P# → Pname

P (P#, Pname) Primary Key {P#}
P# → Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}
{S#, P#} → Qty Foreign Key {S#} references S
Foreign Key {P#} references P

The above decomposition is a loss-less-join decomposition, since:-
P ∩ SP2 = {P#} → P since P# → Pname
Both P and SP2 have no partial FDs and are in 2NF.

So, the 2NF decomposition of SP1 is:-

S (S#, Sname, Scity, Status) Primary Key {S#}
S# → Sname, S# → Scity, Scity → Status
By Transitivity, S# → Status

P (P#, Pname) Primary Key {P#}
P# → Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}
{S#, P#} → Qty Foreign Key {S#} references S
Foreign Key {P#} references P

The projections of SP1 over S, P and SP2 are:-

S
S#    Sname       Scity     Status
S1    Avia        Mumbai    10
S2    Aero        Delhi     20
S3    Air-supp    Mumbai    10

P
P#    Pname
P1    Aero-engine
P2    Generator
P3    Altimeter

SP2
S#    P#    Qty
S1    P1    5
S1    P2    5
S2    P1    2
S2    P3    5
S3    P2    10
S3    P3    20
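As a quick check, the projections can be recomputed from the SP1 instance and joined back. The Python sketch below is only an illustration: relations are represented as sets of tuples, and the variable names (sp1, s, p, sp2, rejoined) are chosen here for readability.

```python
# SP1 (S#, P#, Sname, Scity, Status, Pname, Qty), as in the instance above.
sp1 = {
    ("S1", "P1", "Avia", "Mumbai", 10, "Aero-engine", 5),
    ("S1", "P2", "Avia", "Mumbai", 10, "Generator", 5),
    ("S2", "P1", "Aero", "Delhi", 20, "Aero-engine", 2),
    ("S2", "P3", "Aero", "Delhi", 20, "Altimeter", 5),
    ("S3", "P2", "Air-supp", "Mumbai", 10, "Generator", 10),
    ("S3", "P3", "Air-supp", "Mumbai", 10, "Altimeter", 20),
}

# Projections; duplicates disappear because the results are sets.
s = {(sno, sname, scity, status)
     for sno, pno, sname, scity, status, pname, qty in sp1}
p = {(pno, pname) for sno, pno, sname, scity, status, pname, qty in sp1}
sp2 = {(sno, pno, qty) for sno, pno, sname, scity, status, pname, qty in sp1}

# Natural join of SP2 with S (on S#) and with P (on P#).
rejoined = {
    (sno, pno, sname, scity, status, pname, qty)
    for sno, pno, qty in sp2
    for s_sno, sname, scity, status in s if s_sno == sno
    for p_pno, pname in p if p_pno == pno
}

print(len(s), len(p), len(sp2))  # 3 3 6
print(rejoined == sp1)           # True: the decomposition is loss-less
```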

The partial dependency related problems have been resolved, as explained
below:-

Insert:
Sname and Scity of a Supplier can now be inserted into table S, even when it is
not supplying any part.

Pname of a part can now be inserted into table P, even when it is not being
supplied by any supplier.

Delete:
When information about a supply is deleted from table SP2, we do not lose any
information about Sname, Scity or Pname.

Update:
Information about Sname & Scity of a supplier now appears in only one tuple in
table S.

Information about Pname of a particular part now appears in only one tuple in
table P.

However, some anomalies still remain in the 2 NF Schema. For example, the
Status of a city can be inserted in table S only when at least one supplier is
available in that city. Also, if the only supplier from a city ceases to exist, we
lose the information about the status of that city. And if multiple suppliers
exist in a city, its status would appear in multiple tuples of table S. These
anomalies will be eliminated in the next normal form.

In R21, the non-prime attribute E is dependent on the candidate key A, through
another non-prime attribute D
i.e. A → D, D → E ⇒ A → E.

Such a Functional Dependency is called a Transitive Dependency.

Transitive Functional Dependency This refers to a situation wherein a non-prime
attribute of a Relation Schema R is dependent on its candidate key, through
another non-prime attribute of R. Such an FD also causes some
Insert/Delete/Update anomalies.

Third Normal Form (3NF)

A Relation Schema R is said to be in 3 NF, iff:-

(a) It is in 2 NF and

(b) Each non-prime attribute of R is non-transitively dependent on its
candidate keys i.e. R does not involve any Transitive Dependencies.

Thus, R22 is in 3 NF but R21 is not, since it involves a Transitive Dependency i.e.
A → D, D → E ⇒ A → E.

Decomposition of a 2 NF Schema into 3 NF Schemas

R21 can be loss-less decomposed into R31 and R32.

R31 (D, E) Primary Key {D}
D → E

R32 (A, D) Primary Key {A}
A → D Foreign Key {D} references R31.

The decomposition of R21 into R31 and R32 is a Loss-less Join Decomposition
since:-
R31 ∩ R32 = {D} → R31

R31 and R32 do not involve any Transitive Dependency; and are thus in 3 NF.

3 NF decomposition of R1:-

R31 (D, E) Primary Key {D}
D → E

R32 (A, D) Primary Key {A}
A → D Foreign Key {D} references R31.

R22 (A, B, C) Primary Key {A, B}
Foreign Key {A} references R32
AB → C

Let us now consider decompositions of SP1

The Schemas P and SP2 do not involve any Transitive Dependencies; and are thus
already in 3NF. But, S has a Transitive Dependency i.e. S# Scity, Scity
Status  S# Status

This Transitive Dependency of Schema S can be eliminated by decomposing it on


the basis of FD Scity Status

STS (Scity, Status) Primary Key {Scity}


Scity Status

SUPP (S#, Sname, Scity) Primary Key {S#}


Foreign Key {Scity} references STS

P S Gill
24
S# Sname, Scity

Now, STS and SUPP do not have any Transitive Dependency; so both are in 3 NF.

The decomposition is loss-less, since:-

STS ∩ SUPP = {Scity} → STS, since Scity → Status
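The loss-less test used above (the common attributes of the two projections must functionally determine one of them) can be sketched in Python. This is only an illustrative sketch: the helper names `closure` and `is_lossless_binary` are assumptions made here, and the second split is a hypothetical counter-example that is not part of the notes.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from `fds`."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


# FDs holding on S (S#, Sname, Scity, Status), as given in the notes.
fds_s = [
    ({"S#"}, {"Sname", "Scity"}),
    ({"Scity"}, {"Status"}),
]
sts = {"Scity", "Status"}
supp = {"S#", "Sname", "Scity"}

good_split = is_lossless_binary(sts, supp, fds_s)
# Hypothetical split with no common attribute, shown only for contrast.
bad_split = is_lossless_binary({"S#", "Sname"}, {"Scity", "Status"}, fds_s)
print(good_split, bad_split)
```

The test is valid for a decomposition into exactly two projections; for more than two, the matrix-based algorithm described later in these notes is needed.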

Thus, the 3NF Decomposition of SP1 is:-

STS (Scity, Status)         Primary Key {Scity}
                            Scity → Status

SUPP (S#, Sname, Scity)     Primary Key {S#}
                            Foreign Key {Scity} references STS
                            S# → Sname, Scity

P (P#, Pname)               Primary Key {P#}
                            P# → Pname

SP2 (S#, P#, Qty)           Primary Key {S#, P#}
                            Foreign Key {S#} references SUPP
                            Foreign Key {P#} references P
                            {S#, P#} → Qty

The projections of S over STS and SUPP are:-

SUPP

S# Sname Scity
S1 Avia Mumbai
S2 Aero Delhi
S3 Air-supp Mumbai

STS

Scity Status
Mumbai 10
Delhi 20

Now, all update anomalies have been resolved. Status of a particular city now
appears in only one tuple in table STS. Also, information about status of a city can
now be inserted irrespective of whether any supplier exists in that city or not.


BOYCE CODD NORMAL FORM (BCNF)

An alternate definition of a Relation Schema R to be in 3NF is as follows:-

3 NF A Relation Schema R is said to be in 3 NF, if each FD α→β holding on R satisfies
one of the following three conditions:-
(a) It is a trivial FD
OR (b) α is a Super Key of R
OR (c) Each attribute in the set (β – α) is a prime attribute.

A Relation Schema in Third Normal Form (3 NF) may still be riddled with some
anomalies, under the situations when a schema has multiple candidate keys, which may
be composite and overlapping. Any relation, under such schema, may have some data
redundancies that would cause some update anomalies. For example, the schema:-

SP (S#, Sname, P#, Qty)


with FDs:   {S#, P#} → Qty
            {Sname, P#} → Qty
            S# → Sname
            Sname → S#
holding on it.

The relation schema SP has two candidate keys i.e {S#, P#} and {Sname, P#}. Both the
candidate keys are composite and have one common attribute i.e. P#.

Set of Prime Attributes: {S#, Sname , P# }


Set of Non-Prime Attributes: { Qty}

The only non-prime attribute i.e. Qty is non-transitively and fully dependent on both the
candidate keys. Thus, the schema SP is free of any partial dependencies or transitive
dependencies and is thus in Third Normal Form (3NF).

This can also be verified from the fact that each FD satisfies one of the three conditions
of the alternate definition of 3 NF.

Despite being in 3NF, any legal relation under the schema will have some data
redundancies, for example, the name of a particular supplier i.e. Sname will be repeated
as many times as the number of supplies being made by that supplier.

Thus, there is need to have a normal form, stronger than 3NF. The necessary solution is
provided by Boyce Codd Normal Form (BCNF).


BCNF A Relation Schema R is said to be in BCNF, if all non-trivial left-irreducible
FDs that hold on R have only candidate keys as determinants. Alternately, we
can state that a Relation Schema R would be in BCNF if each FD α→β holding on R
satisfies one of the following two conditions:-
(a) It is a trivial FD
OR (b) α is a Super Key of R

The above two conditions, for BCNF, are same as the first two conditions of 3 NF. Thus,
if a schema is in BCNF, it must also be in 3NF. However, the third condition of 3 NF is
missing against the BCNF criteria, indicating that BCNF is more restrictive as compared
to 3NF. So, it is possible that a schema may be in 3NF but not in BCNF. Thus, BCNF is a
stronger normal form than 3NF.

Going by the definition of BCNF, SP is in 3 NF but not in BCNF, since the two FDs i.e.
S# → Sname and Sname → S# are neither trivial nor have Super Keys on their left side.

Alternatively, it can be stated that a Relation Schema R will be in BCNF if each non-
trivial left-irreducible FD α→β, holding on R, has only a Candidate Key on its left side
i.e. α must be a Candidate Key of R.
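The three 3 NF conditions and the two BCNF conditions can be checked mechanically for the schema SP (S#, Sname, P#, Qty). The following Python sketch is illustrative only: the function names are assumptions, the candidate keys are found by brute force, and only the FDs listed for SP are tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is `schema`."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


def normal_form_flags(schema, fds):
    """Return (in_3nf, in_bcnf), testing only the FDs listed in `fds`."""
    prime = set().union(*candidate_keys(schema, fds))
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue  # condition (a) trivial FD, or (b) Super Key on the left
        in_bcnf = False
        if not (rhs - lhs) <= prime:
            in_3nf = False  # condition (c) fails as well
    return in_3nf, in_bcnf


sp = {"S#", "Sname", "P#", "Qty"}
fds_sp = [
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
]
keys_sp = candidate_keys(sp, fds_sp)
flags_sp = normal_form_flags(sp, fds_sp)
print(keys_sp, flags_sp)
```

For SP the sketch reports the two candidate keys {S#, P#} and {Sname, P#}, and finds the schema to be in 3 NF but not in BCNF, in agreement with the discussion above.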

Decomposition of SP into a BCNF schema

SP can be decomposed into BCNF Schemas, on the basis of the FDs that violate BCNF
i.e. S# → Sname and Sname → S#. The resulting BCNF decompositions of SP will be:-

S (S#, Sname)           Primary Key {S#} or {Sname}
                        S# → Sname, Sname → S#

SP1 (S#, P#, Qty)       Primary Key {S#, P#}
                        Foreign Key {S#} references S
                        {S#, P#} → Qty

OR

S (S#, Sname)           Primary Key {S#} or {Sname}
                        S# → Sname, Sname → S#

SP2 (Sname, P#, Qty)    Primary Key {Sname, P#}
                        Foreign Key {Sname} references S
                        {Sname, P#} → Qty

It can be verified that the decompositions are loss-less-join decompositions.

Algorithm to decompose a Non-3NF Schema into a set of 3NF Schemas

Let R be a relation schema that is not in 3NF.
Let F be the set of FDs holding on R.
Determine Canonical Cover Fc of F.
Determine the set of Candidate Keys {K1, K2, ......, Kn} of R.

K’ = R − {K1 ∪ K2 ∪ ...... ∪ Kn};     /* Set of Non-Prime Attributes of R */
S = {R};                             /* S is a set of Relation Schemas */

WHILE (There exists a Non-3NF Schema Ri ∈ S) DO
    FOR (Each Non-Trivial Left-Irreducible FD α → β holding on Ri) DO
        IF ((α is not a candidate key of Ri) AND ({β − α} ∩ K’ ≠ ∅))
        THEN S = {S − Ri} ∪ (Ri − β) ∪ (α, β);
        /* Replace the Schema Ri by two schemas (Ri − β) and (α, β) */

At the end, S would comprise a set of 3NF Schemas, equivalent to R.

Algorithm to decompose a non-BCNF Schema into a set of BCNF Schemas

Let R be a relation schema that is not in BCNF.
Let F be the set of FDs holding on R.
Determine Canonical Cover Fc of F.

S = {R};                             /* S is a set of Relation Schemas */

WHILE (There exists a Non-BCNF Schema Ri ∈ S) DO
    FOR (Each Non-Trivial Left-Irreducible FD α → β holding on Ri) DO
        IF (α is not a candidate key of Ri)
        THEN S = {S − Ri} ∪ (Ri − β) ∪ (α, β);
        /* Replace the Schema Ri by two schemas (Ri − β) and (α, β) */

At the end, S would comprise a set of BCNF Schemas, equivalent to R.
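The BCNF algorithm above can be sketched in Python as follows. This is a simplified illustration, not a definitive implementation: the names `closure` and `bcnf_decompose` are assumptions, and a violating FD is detected only among the left-hand sides of the FDs supplied, with everything that the left-hand side determines inside Ri being split off at once. The schema and FDs are those of the SP example in these notes.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(schema, fds):
    """Split `schema` until no listed left-hand side violates BCNF."""
    todo = [frozenset(schema)]
    done = []
    while todo:
        ri = todo.pop()
        for lhs, _ in fds:
            if not lhs <= ri:
                continue
            lhs_closure = closure(lhs, fds)
            determined = (lhs_closure & ri) - lhs
            # Violation: lhs determines something in Ri but is not a Super Key of Ri.
            if determined and not ri <= lhs_closure:
                todo.append(frozenset(lhs | determined))  # schema (α, β)
                todo.append(frozenset(ri - determined))   # schema (Ri − β)
                break
        else:
            done.append(ri)  # no violating FD found, Ri is kept
    return set(done)


sp = {"S#", "Sname", "P#", "Pname", "Scity", "Status", "Qty"}
fds_sp = [
    ({"S#"}, {"Sname", "Scity"}),
    ({"Sname"}, {"S#", "Scity"}),
    ({"Scity"}, {"Status"}),
    ({"P#"}, {"Pname"}),
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
]
result = bcnf_decompose(sp, fds_sp)
for schema in sorted(sorted(s) for s in result):
    print(schema)
```

The sketch picks violating FDs in a different order from a hand-worked solution, so the intermediate schemas may differ, but for this SP schema it ends with the four schemas (Scity, Status), (P#, Pname), (S#, Sname, Scity) and (S#, P#, Qty).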

Example:- Decompose the following Schema into a set of BCNF Schemas.

SP ( S#, Sname, P#, Pname, Scity, Status, Qty )

The set of FDs holding on SP:-

S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname
{S#, P#} → Qty
{Sname, P#} → Qty

The candidate keys of SP are :- {S#, P#}, {Sname, P#}

It can be verified that all the FDs indicated above are non-trivial and left-irreducible; and
the following FDs do not have Candidate Key on left side:-

S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname
Thus SP is not in BCNF.

Now, applying the above algorithm to convert SP into a BCNF Schema:-

Let S’ := { SP };

SP has an FD Scity → Status | ((Scity is not a candidate key of SP)
                               && (Scity ∩ Status = ∅))
S’ := (S’ – SP) ∪ SP1 ∪ STS
    where SP1 = ( S#, Sname, P#, Pname, Scity, Qty )
    and   STS = (Scity, Status)

Now, S’ = {SP1, STS}

Again, SP1 has an FD P# → Pname | ((P# is not a candidate key of SP1)
                                   && (P# ∩ Pname = ∅))
S’ := (S’ – SP1) ∪ SP2 ∪ P
    where SP2 = ( S#, Sname, P#, Scity, Qty )
    and   P = (P#, Pname)

Now, S’ = {SP2, STS, P}

Still, SP2 has an FD S# → {Sname, Scity} | ((S# is not a candidate key of SP2)
                                           && (S# ∩ {Sname, Scity} = ∅))
S’ := (S’ – SP2) ∪ SP3 ∪ SUPP
    where SP3 = ( S#, P#, Qty )
    and   SUPP = (S#, Sname, Scity)

Now, S’ = {SP3, STS, P, SUPP}

Finally, all the schemas in S’ are now in BCNF.

Now, the BCNF equivalent schema of SP is:-

STS (Scity, Status)         Primary Key {Scity}
                            Scity → Status

P (P#, Pname)               Primary Key {P#}
                            P# → Pname

SUPP (S#, Sname, Scity)     Primary Key {S#}
                            Foreign Key {Scity} references STS
                            S# → {Sname, Scity}

SP3 (S#, P#, Qty)           Primary Key {S#, P#}
                            Foreign Key {S#} references SUPP
                            Foreign Key {P#} references P
                            {S#, P#} → Qty

All the relation schemas in the above decomposition are free of any partial dependencies
and transitive dependencies. Also, all the FDs have only candidate keys (of the respective
schemas) as their determinants. Thus, all the relation schemas are in BCNF.

Also, the decomposition is a loss-less join decomposition, since:-

SP3 ∩ P → P
SP3 ∩ SUPP → SUPP
SUPP ∩ STS → STS

Is BCNF a stronger normal form than 3NF ?

Yes, a relation in 3NF may not be in BCNF. But, a relation in BCNF will definitely be in
3NF also; since BCNF is more restrictive as compared to 3 NF. For example, the schema
SP (S#, Sname, P#, Qty) discussed earlier is in 3NF but not in BCNF. However, the
decompositions SP3, P, STS and SUPP obtained above are all in BCNF; and are therefore
also in 3NF. Thus, BCNF is a stronger normal form than 3NF.

In fact, we can state that a relation schema in BCNF will be free of all those data
anomalies that can be eliminated on the basis of functional dependencies (FDs).

ABU’s Algorithm to determine whether a given Decomposition of a Relation Schema R is a
Loss-less-join Decomposition or not.

ABU’s Algorithm can be used to determine whether a Decomposition *(R1, R2,….., Rn)
of Schema R is a loss-less-join decomposition or not.

Let the Relation Schema R be of degree m i.e. R (A1, A2, …….Am)
Let F be the set of FDs holding on the Schema R.

The ABU’s Algorithm operates as follows:-

Step 1 Make a matrix M of size n x m with column “j” corresponding to Attribute Aj (1 ≤
j ≤ m) and row “i” corresponding to a projection Ri (1 ≤ i ≤ n).

Step 2 Initialize the Matrix M as follows:-

for i := 1 to n do
    for j := 1 to m do
        if Aj ∈ Ri
        then M [i, j] := aj;
        else M [i, j] := bij;

Step 3
Repeat
    Save-M = M;
    For each FD (α → β) ∈ F
        if any two Rows of Matrix M match on the values of α
        then force those two rows to match on the values of β, by
             replacing “b” values by corresponding “a” values.
             (if corresponding “a” value does not exist for a pair of cells
             to be matched, then replace both the cells by one of the
             corresponding “b” values.)
Until (M = Save-M);

Step 4 Test of Loss-less-join Decomposition:

if (any of the rows of M contains only “a” values)
then the decomposition is a loss-less-join decomposition
else it is a lossy decomposition.

Example:-

Using ABU’s Algorithm, determine whether the following decomposition of
SP (S#, Sname, Scity, Status, P#, Pname, Price, Qty) is a loss-less-join decomposition?

Decomposition:-

CS (Scity, Status)
SUPP (S#, Sname, Scity)
PART (P#, Pname, Price)
SPN (S#, P#, Qty)

FDs Holding on SP:-

S# → Sname, Scity
Scity → Status
P# → Pname, Price
{S#, P#} → Qty

Since, S# → Scity and Scity → Status, thus S# → Status
Therefore, S# → Sname, Scity, Status
ABU’s Algorithm
Step 1 and Step 2: Make a matrix of size 4 x 8 and initialize it.

            S#    Sname  Scity  Status  P#    Pname  Price  Qty
            0     1      2      3       4     5      6      7
CS     0    b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1    a0    a1     a2     b13     b14   b15    b16    b17
PART   2    b20   b21    b22    b23     a4    a5     a6     b27
SPN    3    a0    b31    b32    b33     a4    b35    b36    a7

Step 3:

Applying the FD Scity → Status, rows 0 and 1 match on the value of Scity, so force these
two rows to match on the value of Status. Thus replace b13 in row 1 by a3.

            S#    Sname  Scity  Status  P#    Pname  Price  Qty
            0     1      2      3       4     5      6      7
CS     0    b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1    a0    a1     a2     a3      b14   b15    b16    b17
PART   2    b20   b21    b22    b23     a4    a5     a6     b27
SPN    3    a0    b31    b32    b33     a4    b35    b36    a7

Now, applying the FD P# → Pname, Price, rows 2 and 3 match on the value of P#, so
force these two rows to match on the value of Pname and Price. Thus replace b35 in row 3
by a5 and replace b36 in row 3 by a6.

            S#    Sname  Scity  Status  P#    Pname  Price  Qty
            0     1      2      3       4     5      6      7
CS     0    b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1    a0    a1     a2     a3      b14   b15    b16    b17
PART   2    b20   b21    b22    b23     a4    a5     a6     b27
SPN    3    a0    b31    b32    b33     a4    a5     a6     a7

Now, applying the FD S# → Sname, Scity, Status, rows 1 and 3 match on the value of
S#, so force these two rows to match on the value of Sname, Scity and Status. Thus
replace b31 in row 3 by a1; replace b32 in row 3 by a2; and replace b33 in row 3 by a3.

            S#    Sname  Scity  Status  P#    Pname  Price  Qty
            0     1      2      3       4     5      6      7
CS     0    b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1    a0    a1     a2     a3      b14   b15    b16    b17
PART   2    b20   b21    b22    b23     a4    a5     a6     b27
SPN    3    a0    a1     a2     a3      a4    a5     a6     a7

Step 4 The row 3 contains only “a” values; therefore the above decomposition is a loss-
less-join decomposition of SP.
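The four steps of the matrix test can be sketched in Python. The function name `is_lossless_join` and the tuple encoding of the “a” and “b” symbols are assumptions made for this illustration; the schema, decomposition and FDs are those of the example above, and the last decomposition is a hypothetical lossy one added for contrast.

```python
def is_lossless_join(attributes, decomposition, fds):
    """Matrix test for a loss-less-join decomposition.

    attributes    -- list of attribute names of R
    decomposition -- list of sets of attribute names (the projections Ri)
    fds           -- list of (lhs, rhs) pairs of sets
    """
    # Steps 1 and 2: one row per projection; ("a", attr) where the attribute
    # belongs to Ri, otherwise a row-specific ("b", i, attr) symbol.
    rows = [
        {attr: ("a", attr) if attr in ri else ("b", i, attr) for attr in attributes}
        for i, ri in enumerate(decomposition)
    ]

    # Step 3: apply the FDs until the matrix stops changing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][a] != rows[j][a] for a in lhs):
                        continue
                    for attr in rhs:
                        first, second = rows[i][attr], rows[j][attr]
                        if first == second:
                            continue
                        # An "a" symbol sorts before any "b" symbol, so the
                        # smaller symbol is kept and the other one is renamed
                        # everywhere in that column.
                        keep, drop = sorted((first, second))
                        for row in rows:
                            if row[attr] == drop:
                                row[attr] = keep
                        changed = True

    # Step 4: loss-less if some row consists of "a" symbols only.
    return any(all(row[a][0] == "a" for a in attributes) for row in rows)


attributes = ["S#", "Sname", "Scity", "Status", "P#", "Pname", "Price", "Qty"]
fds_sp = [
    ({"S#"}, {"Sname", "Scity"}),
    ({"Scity"}, {"Status"}),
    ({"P#"}, {"Pname", "Price"}),
    ({"S#", "P#"}, {"Qty"}),
]
notes_decomposition = [
    {"Scity", "Status"},          # CS
    {"S#", "Sname", "Scity"},     # SUPP
    {"P#", "Pname", "Price"},     # PART
    {"S#", "P#", "Qty"},          # SPN
]
# Hypothetical lossy decomposition: the two projections share no attribute.
lossy_decomposition = [
    {"S#", "Sname", "Scity"},
    {"Status", "P#", "Pname", "Price", "Qty"},
]
lossless_result = is_lossless_join(attributes, notes_decomposition, fds_sp)
lossy_result = is_lossless_join(attributes, lossy_decomposition, fds_sp)
print(lossless_result, lossy_result)
```

The sketch applies the FDs in the order listed rather than the order used in the worked example; the final matrix does not depend on that order.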


Multi-Valued Dependencies (MVDs and 4 NF)


Definition of MVD

A relation schema R (α, β, γ) is said to have multi-valued dependencies α →→ β (α multi-
determines β) and α →→ γ (α multi-determines γ), if and only if for every legal relation
r(R) and for each tuple-pair {t1, t2} ∈ r that satisfies t1[α] = t2[α], t1[β] ≠ t2[β] and t1[γ]
≠ t2[γ], there exists a tuple-pair {t3, t4} ∈ r that satisfies t3[α] = t4[α] = t1[α] = t2[α] and
t3[β] = t1[β] & t4[β] = t2[β] and t3[γ] = t2[γ] & t4[γ] = t1[γ]. Non-Trivial MVDs always
occur in pairs, like α →→ β and α →→ γ; both can be jointly denoted as α →→ β | γ.

Trivial MVD

An MVD α →→ β holding on a schema R is said to be trivial, iff:-
(a) β ⊆ α
or
(b) α ∪ β = R

Non-Trivial MVDs occur only in pairs.

MVDs and a Loss-less Join Decomposition

Fagin’s Theorem

If a relation schema R (α, β, γ) has an MVD α →→ β | γ holding on it, then it can be loss-
less decomposed into R1 (α, β) and R2 (α, γ).

Inference Rules for MVDs

1. Complementation Rule If α →→ β holds on a Schema R where (α ⊆ R,
β ⊆ R) then α →→ R − (α ∪ β) will also hold on R.

This rule implies that all non-trivial MVDs will occur only in pairs.

2. Transitivity Rule If α →→ β and β →→ γ then α →→ (γ − β)

3. Union Rule If α →→ β, α →→ γ, then α →→ β ∪ γ,
                                       α →→ β − γ,
                                       α →→ γ − β,
                                       α →→ β ∩ γ

4. Augmentation Rule If α →→ β, then αγ →→ βδ where δ ⊆ γ

5. Replication Rule If α → β, then α →→ β.

6. Coalescence Rule If α →→ β
and γ → δ where δ ⊆ β and β ∩ γ = ∅
Then α → δ.

7. Mixed Transitivity Rule

If α →→ β, β → γ, then α → (γ − β)

Problem Let R = (A, B, C, G, H, I) be a relation schema, with the
following set of dependencies holding on it:-

D = { A →→ B, B →→ HI, CG → H }

Find whether the following dependencies are members of D+ :-

A →→ CGHI
A →→ HI
B → H
A →→ CG
A → H

Sol:

Since A →→ B, so by complementation A →→ (R – A – B)
i.e. A →→ CGHI

Since A →→ B and B →→ HI, so by transitivity A →→ HI – B
i.e. A →→ HI

Since B →→ HI
and CG → H where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, B → H

Since A →→ CGHI and A →→ HI, therefore by Union Rule A →→ CGHI – HI
i.e. A →→ CG

Since A →→ HI
and CG → H where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, A → H

Problem Consider a Relational Schema R (A, B, C, D, E). Let the set of MVDs
holding on R be { A →→ BC, B →→ CD and E →→ AD }. Determine its loss-less 4NF
decomposition.

Solution
R (A, B, C, D, E)
M = { A →→ BC, B →→ CD, E →→ AD }

Since the MVD A →→ BC holds on R, it will also satisfy A →→ (R – BC) – A
i.e. A →→ DE
Thus, by Fagin’s Theorem, R can be loss-less decomposed into:-
R1 (A, B, C)
R2 (A, D, E).

Similarly, since MVDs B →→ CD and E →→ AD hold on R, it can be proved that
the following are also loss-less & 4 NF decompositions of R :-
R1 (B, C, D)
R2 (B, A, E)

OR
R1 (E, A, D)
R2 (E, B, C)


BCNF to 5 NF

A Relation Schema is said to be in Boyce Codd Normal Form (BCNF), if all non-trivial
left-irreducible Functional Dependencies (FDs), holding on the schema, have only its
Candidate Keys as their Determinants. Any relation defined on such a schema will be
free of all those data anomalies that can be eliminated on the basis of FDs. However,
there may still be some residual data redundancies persisting in BCNF relations, causing
insert/delete/update anomalies. So, we have to look beyond FDs, for the elimination of
such anomalies in BCNF Schemas.

A SCHEMA IN BCNF AND STILL HAVING SOME DATA REDUNDANCIES,
CAUSING ANOMALIES:-

Let us define a Schema CTX (Course, Teacher, Text) with the following
constraints:-

(a) A Course can be taught by more than one Teacher.

(b) A number of Text Books can be followed for teaching a Course.

(c) The set of Text Books followed for teaching a Course is determined only
by the Course taught and is completely independent of the Teacher
teaching it. Thus, the attributes Teacher and Text are completely
independent of each other.

There exists a one-to-many cardinality from Course to Teacher and also from Course to
Text, but there is absolutely no relationship between Teacher & Text. This situation
represents an MVD Course →→ Teacher | Text.

A Relation ctx, satisfying the above constraints:-

ctx:
COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Vivek     Dietel
OS       Ravi      Dietel
OS       Vivek     Galvin
CO       Ram       Hamacher
CO       Shyam     M-mano
CO       Ram       M-mano
CO       Shyam     Hamacher

As indicated in ctx, the schema CTX does not have any non-trivial FDs. Thus, it is an
“All-Key” schema and all legal relations under this schema will be in BCNF. But, this
relation still has the following data anomalies:-

(a) The information that a particular Teacher is teaching a particular course is
represented as many times as the number of Text books followed for that
particular Course.

(b) The information that a particular Text Book is followed for a particular
Course is represented as many times as the number of Teachers teaching the
particular Course.

How to eliminate these data anomalies?

These anomalies are due to the non-trivial Multi Valued Dependencies (MVDs)
holding on the schema CTX.

Multi Valued Dependency (MVD)

Let there be a relation schema R (α, β, γ). It is said to have MVD from α to β (denoted as
α →→ β) and from α to γ (denoted as α →→ γ), if and only if for every legal relation r(R)
it satisfies the following:-

(a) The set of β-values, matching a given {α-value, γ-value} pair, is
dependent only on the α-value and is completely independent of the γ-value.
And

(b) The set of γ-values, matching a given {α-value, β-value} pair, is
dependent only on the α-value and is independent of the β-value.

Alternately, we can state that a relation schema R (α, β, γ) is said to have multi-valued
dependencies α →→ β and α →→ γ (both denoted by α →→ β | γ), if and only if for every
legal relation r(R), and for a tuple pair {t1, t2} ∈ r such that t1[α] = t2[α], there exists a
tuple-pair {t3, t4} ∈ r, which satisfies:-

t1[α] = t2[α] = t3[α] = t4[α]
and t1[β] = t3[β], t2[β] = t4[β],
and t1[γ] = t4[γ], t2[γ] = t3[γ]

As per this definition, the relation ctx satisfies the MVD Course →→ Teacher | Text

Since, {OS, Galvin} → {Ravi, Vivek} and {OS, Dietel} → {Ravi, Vivek}
So, the set of Teachers, teaching a particular Course, is dependent only on the Course
taught and is completely independent of the Texts followed for the particular Course.

Similarly, {OS, Ravi} → {Galvin, Dietel} and {OS, Vivek} → {Galvin, Dietel}
So, the set of Texts, followed for a Course, depends only on the Course taught and is
independent of the Teacher teaching it.
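The tuple-pair definition can be checked directly on a relation instance. The Python sketch below is illustrative: `satisfies_mvd` is a name assumed here, and `ctx_missing` is a hypothetical variant of ctx with one tuple removed, included only to show a violation.

```python
def satisfies_mvd(rows, attributes, alpha, beta):
    """Check alpha ->-> beta on a relation given as a list of tuples."""
    index = {name: pos for pos, name in enumerate(attributes)}
    gamma = [a for a in attributes if a not in alpha and a not in beta]

    def pick(row, names):
        return tuple(row[index[n]] for n in names)

    present = {(pick(r, alpha), pick(r, beta), pick(r, gamma)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, alpha) != pick(t2, alpha):
                continue
            # A tuple with t1's beta-values and t2's gamma-values must exist.
            if (pick(t1, alpha), pick(t1, beta), pick(t2, gamma)) not in present:
                return False
    return True


attributes = ["COURSE", "TEACHER", "TEXT"]
ctx = [
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"),
    ("OS", "Ravi", "Dietel"), ("OS", "Vivek", "Galvin"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"),
    ("CO", "Ram", "M-mano"), ("CO", "Shyam", "Hamacher"),
]
# Hypothetical variant: the tuple (OS, Vivek, Galvin) has been removed.
ctx_missing = [t for t in ctx if t != ("OS", "Vivek", "Galvin")]

ctx_ok = satisfies_mvd(ctx, attributes, ["COURSE"], ["TEACHER"])
ctx_missing_ok = satisfies_mvd(ctx_missing, attributes, ["COURSE"], ["TEACHER"])
print(ctx_ok, ctx_missing_ok)
```

Checking Course →→ Teacher is sufficient, since by the complementation rule Course →→ Text holds exactly when Course →→ Teacher does.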

Trivial MVD An MVD α →→ β, holding on a relation schema R, is said to be trivial, if:-

(a) β ⊆ α or

(b) α ∪ β = R

Such MVDs are termed to be trivial, since these are satisfied by every relation on a
schema R.

MVD is a generalization of FD An FD α → β implies an MVD α →→ β, wherein the
set of β values, matching a given α value, will be a singleton set.

Fagin’s Theorem If a relation schema R (α, β, γ) satisfies an MVD α →→ β | γ then R
can be loss-less decomposed into schemas R1 (α, β) and R2 (α, γ). This implies that any
legal relation r, on the schema R (α, β, γ), will be equal to equi-join of its projections on
(α, β) and (α, γ); that is r = π_{α,β} (r) * π_{α,γ} (r)

Fourth Normal Form (4 NF)

A relation schema R is said to be in 4 NF, if and only if every MVD α →→ β holding on
R satisfies one of the following two conditions:-
(a) It is a trivial MVD or
(b) α is a Super Key of R

Now, the schema CTX has the MVDs Course →→ Teacher and Course →→ Text.
These are not trivial MVDs. Also, Course is not a Super Key of CTX, since CTX is
an “All Key” Relation Schema. Thus, CTX is in BCNF; but not in 4 NF.

Non-loss Decomposition, based on MVDs

As per Fagin’s Theorem, a relation schema R (α, β, γ) satisfying MVDs α →→ β | γ can be
loss-less decomposed into schemas R1 (α, β) and R2 (α, γ). So, CTX can be decomposed
into CT and CX.
ct
COURSE   TEACHER
OS       Ravi
OS       Vivek
CO       Ram
CO       Shyam

cx
COURSE   TEXT
OS       Galvin
OS       Dietel
CO       Hamacher
CO       M-mano

As evident, ct * cx = ctx

The relations ct and cx are not satisfying any non-trivial MVDs; thus both CT & CX are
in 4 NF.
The relations are free of the data redundancies indicated above, which existed in ctx. The
information of a Teacher teaching a particular Course is now represented only in one
tuple in ct and the information regarding a Text being followed for a particular Course is
represented only at one place in cx.

A relation schema R in BCNF, not having any non-trivial MVDs holding on it, will be in
4 NF. But, it may still have some data anomalies, for example consider a schema CTX4,
with the following constraints:-

(a) A Course may be taught by a number of Teachers.

(b) A number of Text books may be followed for a Course.

(c) The set of Texts, followed for a Course, depend not only on the Course
but also on the Teacher teaching it. It means that each teacher teaching a
particular course may follow different sets of text books; the sets may be
overlapping.

(d) If a Teacher T1, teaching a Course C1, does not follow a Text X1, which is
being followed by another Teacher T2 to teach the course C1, then T1 must not
follow X1 for any other Course, which he may be teaching.

ctx4

COURSE TEACHER TEXT


OS Ravi Galvin
OS Vivek Dietel
OS Ravi Dietel
CO Ram Hamacher
CO Shyam M-mano
CO Shyam Hamacher


So, the set of TEXT-values that occur matching a given {COURSE-value, TEACHER-
value} pair in ctx4 depends not only on COURSE but also on TEACHER. So, it does not
satisfy the MVD COURSE →→ TEACHER | TEXT. So, the schema CTX4 does not have
any non-trivial MVDs holding on it. So, it is in 4 NF. But ctx4 still has some data
redundancies, like the information about a teacher teaching a course appears as many
times, as the number of text books followed by that teacher for that Course.

Since CTX4 does not satisfy the MVD COURSE →→ TEACHER | TEXT, it can be verified
that ct * cx ≠ ctx4

ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Ram
CO Shyam

cx
COURSE TEXT
OS Galvin
OS Dietel
CO Hamacher
CO M-mano

ct * cx
COURSE TEACHER TEXT
OS Ravi Galvin
OS Ravi Dietel
OS Vivek Galvin
OS Vivek Dietel
CO Ram Hamacher
CO Ram M-mano
CO Shyam M-mano
CO Shyam Hamacher

As verified above, ct * cx ≠ ctx4. ct * cx has two spurious tuples, which do not exist in
ctx4. So, the decomposition of CTX4 into CT and CX is not a loss-less (non-additive)
decomposition.
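The two spurious tuples can be found by projecting ctx4 on (COURSE, TEACHER) and (COURSE, TEXT) and joining the projections back. A minimal Python sketch, in which the helper name `project` and the positional encoding of attributes are assumptions of this illustration:

```python
def project(rows, positions):
    """Projection of a set of tuples on the given column positions."""
    return {tuple(row[p] for p in positions) for row in rows}


# The relation ctx4 from the notes, as (COURSE, TEACHER, TEXT) tuples.
ctx4 = {
    ("OS", "Ravi", "Galvin"),
    ("OS", "Vivek", "Dietel"),
    ("OS", "Ravi", "Dietel"),
    ("CO", "Ram", "Hamacher"),
    ("CO", "Shyam", "M-mano"),
    ("CO", "Shyam", "Hamacher"),
}
ct = project(ctx4, (0, 1))   # (COURSE, TEACHER)
cx = project(ctx4, (0, 2))   # (COURSE, TEXT)

# Natural join of ct and cx on the common attribute COURSE.
joined = {
    (course, teacher, text)
    for (course, teacher) in ct
    for (course2, text) in cx
    if course == course2
}
spurious = joined - ctx4
print(sorted(spurious))
```

Every tuple of ctx4 reappears in the join; the decomposition is lossy because the join also produces tuples that were never in ctx4.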


It may be feasible to eliminate these data redundancies of ctx4, on the basis of another
type of dependency, called Join Dependency (JD).
Example (Employee-Project-Department)

Suppose an organization has a set of Employees, a set of Departments and a set of
Projects which are being progressed at various departments with the following constraints
holding on the system:-

(a) Each Department could be working on many Projects.
(b) Each Project could be getting progressed at many Departments.
(c) Each employee could be working on many Projects in many Departments.
(d) The set of Departments in which an Employee is working is determined by
the Employee alone and is completely independent of the set of Projects on
which that Employee is working.
(e) The set of Projects on which an Employee is working is determined by the
Employee alone and is completely independent of the set of Departments in
which the Employee is working.

A Sample Database, created on a schema with the above constraints, will be:-

EPD
E# P# D#
E1 P1 D3
E1 P2 D1
E1 P1 D1
E1 P2 D3
E4 P3 D2
E4 P1 D3
E4 P3 D3
E4 P1 D2

The above table does not have any non-trivial FD; and is thus in BCNF. However, it
has a lot of data redundancies; like the fact that Employee E1 is working on project P1
is reflected in two tuples. Similarly, there are many redundancies.

In the above table, we have:-
{E1, P1} → {D3, D1}
{E1, P2} → {D3, D1}

This implies that the set {D3, D1} is determined by E1 alone and does not change when
P# is changed from P1 to P2.

However when E# is changed from E1 to E4, the set of D#s changes as indicated
below:-
{E4, P3} → {D2, D3}
{E4, P1} → {D2, D3}

Thus, the schema for this table satisfies E# →→ P# and E# →→ D#. This pair of
MVDs is non-trivial. Thus, EPD is not in 4NF. It can be loss-less decomposed into
EP (E#, P#) and ED (E#, D#) as shown below:-

EP
E# P#
E1 P1
E1 P2
E4 P3
E4 P1

ED
E# D#
E1 D3
E1 D1
E4 D2
E4 D3

It can be verified that EP*ED = EPD. Also, both EP and ED are free of the data
redundancies. Both EP and ED do not have any non-trivial MVD and are thus in 4NF.

Join Dependency (JD) A Relation Schema R is said to have a Join Dependency
*(R1, R2,…., Rn), if and only if any legal relation r(R) is equal to equi-join of its
projections on R1, R2,…., Rn.

i.e. r = π_R1 (r) * π_R2 (r) * ……..* π_Rn (r)

Trivial Join Dependency

A JD *(R1, R2,…., Rn) of a relation schema R is said to be trivial, if one of the
projections (R1…Rn) is equal to R itself.

An MVD is also a JD

An MVD α →→ β | γ on a relation schema R (α, β, γ) is also a JD *(αβ, αγ). This implies
that a legal relation r(R) can be loss-less decomposed into its projections on αβ and αγ i.e.
r = π_αβ (r) * π_αγ (r)

The relation schema CTX4 has a Join Dependency * (CT, TX, XC) where C: Course, T:
Teacher and X: Text , which can be verified as follows:-


ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Ram
CO Shyam

tx
TEACHER   TEXT
Ravi      Galvin
Ravi      Dietel
Vivek     Dietel
Ram       Hamacher
Shyam     M-mano
Shyam     Hamacher

xc
TEXT       COURSE
Galvin     OS
Dietel     OS
Hamacher   CO
M-mano     CO

ct * tx * xc

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Ravi      Dietel
OS       Vivek     Dietel
CO       Ram       Hamacher
CO       Shyam     M-mano
CO       Shyam     Hamacher

Thus, ct * tx * xc = ctx4, so CTX4 has a Join Dependency *(CT, TX, XC).
So, CTX4 can be loss-less decomposed into its projections on CT, TX and XC, which are
free of any data redundancies that existed in the relation ctx4.

Trivial and Non-Trivial JD
A JD *(R1, R2, ….Rn) of relation schema R is said to be trivial iff one of the projections
in the JD is equal to R itself. Such a JD holds on every schema. A JD that is not trivial is
called a non-trivial JD.


Fifth Normal Form (5 NF)

A relation schema R is said to be in 5 NF, if and only if any non-trivial Join Dependency
holding on R, is implied by its Candidate Keys.

The relation CTX4 has a JD *(CT,TX,XC) which is not implied by its Candidate Key
{C,T,X}. Thus, CTX4 is in 4 NF but not in 5 NF.

The relations CT, TX and XC do not have any non-trivial Join Dependencies, and are
thus in 5 NF.

Assuming that the Text Books followed for a Course are dependent not only on the
Course but also on the Teacher teaching it, the Join Dependency will hold on CTX4
only if the following is satisfied:-

“If a Teacher T1 teaching a Course C1, does not follow a Text Book X1, which is
being followed by another Teacher teaching Course C1, then T1 must not follow X1
for any other Course that he may be teaching.” Only then will the JD *(CT,TX,XC)
hold on CTX4.

A relation schema in 5 NF but still having some data redundancies

Example: A schema CTX5, with the following constraints:-

1. A Course may be taught by any number of Teachers and a Teacher
may teach any number of Courses.

2. A number of Text books may be followed for a Course and a Text
book may be followed for any number of courses.

3. A Teacher T1, teaching a Course C1, may not follow a Text Book
X1, which is being followed by another teacher teaching the
Course C1, but T1 may follow X1 while teaching another Course
say C2.

A relation under schema CTX5:-

ctx5
COURSE TEACHER TEXT
OS Ravi Galvin
OS Vivek Milan
OS Ravi Milan
CO Vivek Hamacher
CO Ram Hamacher
CO Vivek Galvin


If we have its projections on ct, tx and xc :-

ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Vivek
CO Ram

tx
TEACHER TEXT
Ravi Galvin
Vivek Milan
Ravi Milan
Vivek Hamacher
Ram Hamacher
Vivek Galvin

xc
TEXT COURSE
Galvin OS
Milan OS
Hamacher CO
Galvin CO

ct * tx * xc

COURSE TEACHER TEXT


OS Ravi Galvin
OS Ravi Milan
OS Vivek Galvin
OS Vivek Milan
CO Vivek Hamacher
CO Vivek Galvin
CO Ram Hamacher


ct * tx * xc has an additional tuple i.e. (OS, Vivek, Galvin), which does not exist in ctx5.
Thus, decomposition of CTX5 into its projections on CT, TX and XC is not a loss-less
(NON-ADDITIVE) decomposition. The natural join of CT, TX, XC contains a
spurious tuple. Thus, CTX5 does not have any non-trivial JD and, thus, it is in 5 NF.
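Whether a relation satisfies a Join Dependency *(R1, ..., Rn) can be tested by joining its projections and comparing the result with the original relation. The Python sketch below is illustrative: the helper names and the encoding of a tuple as a frozenset of (attribute, value) pairs are assumptions. It is applied to ctx4 and ctx5 from these notes.

```python
from functools import reduce


def relation(attributes, tuples):
    """Represent each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attributes, t)) for t in tuples}


def project(rel, attributes):
    """Projection of a relation on a set of attribute names."""
    return {frozenset((a, v) for a, v in t if a in attributes) for t in rel}


def natural_join(r1, r2):
    """Join tuples that agree on every attribute they have in common."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


def join_of_projections(rel, components):
    """Natural join of the projections of `rel` on each component."""
    return reduce(natural_join, (project(rel, c) for c in components))


ATTRS = ("COURSE", "TEACHER", "TEXT")
JD = [{"COURSE", "TEACHER"}, {"TEACHER", "TEXT"}, {"TEXT", "COURSE"}]

ctx4 = relation(ATTRS, [
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"), ("OS", "Ravi", "Dietel"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"), ("CO", "Shyam", "Hamacher"),
])
ctx5 = relation(ATTRS, [
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Milan"), ("OS", "Ravi", "Milan"),
    ("CO", "Vivek", "Hamacher"), ("CO", "Ram", "Hamacher"), ("CO", "Vivek", "Galvin"),
])

ctx4_satisfies_jd = join_of_projections(ctx4, JD) == ctx4
spurious_in_ctx5 = join_of_projections(ctx5, JD) - ctx5
print(ctx4_satisfies_jd, [dict(t) for t in spurious_in_ctx5])
```

A check on one relation instance can only refute a JD on the schema; showing that a JD holds on the schema requires it to hold for every legal relation.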

CTX5, though in 5 NF, still has some data anomalies; like information that ‘Milan is the
text-book for OS’ is represented twice. These data anomalies cannot be eliminated on the
basis of Functional Dependencies, Multi-Valued Dependencies or Join Dependencies.

What Next?

We reach a dead end, till a new type of dependency is discovered.

SIXTH NORMAL FORM (6 NF)

A Relation Schema R will be in 6NF if the only Join Dependencies holding on R are
trivial Join Dependencies.

Example: ACCOUNT (AN, BN, BAL)
         AN → BN, BAL
         Primary Key: {AN}

The only left-irreducible, non-trivial FD holding on ACCOUNT has the
Primary Key on LHS; thus the schema is at least in BCNF.

The FD AN → BN, BAL implies MVD AN →→ BN | BAL.
Since the MVD has Primary Key on LHS; ACCOUNT is at least in 4NF.

The MVD AN →→ BN | BAL implies JD * ({AN,BN}, {AN,BAL})
Since all the components of this JD are Super Keys of ACCOUNT;
thus ACCOUNT is at least in 5NF.

But it is not in 6NF, since it has a non-trivial JD * ({AN,BN}, {AN,BAL})

For transformation to 6NF, ACCOUNT has to be decomposed on the basis
of the non-trivial JD holding on it.

So, ACCOUNT is decomposed into:-

ACCOUNT-BN (AN, BN)     Primary Key (AN)
                        AN → BN

ACCOUNT-BAL (AN, BAL)   Primary Key (AN)
                        AN → BAL

ACCOUNT-BN and ACCOUNT-BAL do not support any non-trivial JDs;
and are thus in 6 NF.

We can conclude that a Schema in 6NF can comprise only its Primary Key and at
most one non-key attribute.

How is 6NF superior to 5NF?

6NF is most suitable for Temporal Databases, which contain time-element. For example
if we want to introduce time element in ACCOUNT, to indicate the Time and Date when
BAL is valid, then the schema will be:-

ACCOUNT (AN, DATE, TIME, BN, BAL)
        {AN, TIME, DATE} → BAL
        AN → BN
        Primary Key : {AN, TIME, DATE}

Since it has a partial FD AN → BN, thus it is not even in 2NF; it is rather in 1NF.

To transform it into higher normal forms, it has to be decomposed wrt AN → BN

i.e. ACCOUNT-BN (AN, BN)                  Primary Key (AN)
                                          AN → BN

     ACCOUNT-BAL (AN, DATE, TIME, BAL)    Primary Key (AN, DATE, TIME)
                                          {AN, DATE, TIME} → BAL

Since each of these schemas contains Primary Key plus only one non-key
attribute, thus both are in 6NF.

Taking another example:-

EMP (E#, E_NAME, SALARY, PROJ_NO)         Primary Key (E#)
                                          E# → E_NAME, SALARY, PROJ_NO
This schema is also in 5NF but not in 6NF.

The SALARY and PROJ_NO (Project on which he works) will keep on changing.
Suppose, we want to record the information of durations during which different
values of SALARY were valid and the durations during which different values of
PROJ_NO were valid for each employee, then the schema would need to be
decomposed as follows:-

EMP_NAME (E#, E_NAME)                              Primary Key (E#)

EMP_SALARY (E#, FROM_DATE, TO_DATE, SALARY)        Primary Key (E#, FROM_DATE, TO_DATE)

EMP_PROJ (E#, FROM_DATE, TO_DATE, PROJ_NO)         Primary Key (E#, FROM_DATE, TO_DATE)

All these schemas are in 6NF.
Concept of Inclusion Dependencies

A Foreign Key constraint cannot be represented by an FD, MVD or JD. It can be
represented using an Inclusion Dependency (ID).

Inclusion Dependency Suppose attribute set Y in Schema S is a Foreign Key (FK)
referencing Primary Key X of Schema R, then this foreign key constraint can be
represented by an Inclusion Dependency S.Y < R.X
This constraint specifies that for a given relation r(R), a relation s(S) would be valid only
if it satisfies π_Y (s (S)) ⊆ π_X (r (R))

Suppose there are schemas ACCOUNT (AN, BN, BAL)
and BRANCH (BN, BC, ASSETS)
where BN is Foreign Key in ACCOUNT referencing BN in BRANCH.

This Foreign Key Constraint can be represented by Inclusion Dependency
ACCOUNT.BN < BRANCH.BN

Inference Rules of Inclusion Dependencies.

(i) Reflexivity: R.X < R.X

(ii) Attribute Correspondence: If R.X < S.Y and X = {A1, A2, ….., An} and
Y = {B1, B2, ….., Bn} and Ai corresponds to Bi for 1 ≤ i ≤ n, then it will have
R.Ai < S.Bi for all i.

(iii) Transitivity: If R.X < S.Y and S.Y < T.Z then it will have R.X < T.Z.
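An Inclusion Dependency S.Y < R.X can be checked on relation instances by comparing projections. In the Python sketch below the function name `satisfies_inclusion` is an assumption and the ACCOUNT and BRANCH rows are hypothetical sample data, not taken from the notes.

```python
def satisfies_inclusion(s_rows, y_attrs, r_rows, x_attrs):
    """True if the projection of s on Y is a subset of the projection of r on X."""
    referenced = {tuple(row[a] for a in x_attrs) for row in r_rows}
    return all(tuple(row[a] for a in y_attrs) in referenced for row in s_rows)


# Hypothetical sample data for BRANCH (BN, BC, ASSETS) and ACCOUNT (AN, BN, BAL).
branch = [
    {"BN": "B1", "BC": "Delhi", "ASSETS": 500},
    {"BN": "B2", "BC": "Mumbai", "ASSETS": 900},
]
account = [
    {"AN": "A1", "BN": "B1", "BAL": 100},
    {"AN": "A2", "BN": "B2", "BAL": 250},
]
# Adding an account that refers to a branch that does not exist.
account_dangling = account + [{"AN": "A3", "BN": "B9", "BAL": 75}]

valid = satisfies_inclusion(account, ["BN"], branch, ["BN"])
invalid = satisfies_inclusion(account_dangling, ["BN"], branch, ["BN"])
print(valid, invalid)
```

The attribute lists are ordered, so Y and X are matched position by position, as in the attribute correspondence rule.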


Some Solved Examples


1. Use FDs to indicate the following mapping cardinalities:-

(a) Many-to-one relationship exists between account-no and
customer-id.

Sol: account_no → customer_id

(b) One-to-one relationship exists between account-no and customer-id.

Sol: account_no → customer_id, customer_id → account_no

(c) Many-to-many relationship exists between account-no and customer-id.

Sol: {customer_id, account_no} → customer_id, account_no
(Trivial FD)

2. Suppose we have the following three tuples in S(A,B,C)

A B C
1 2 3
4 2 3
5 3 3

Which of the following is true? (a) A → B (b) BC → A (c) B → C

Sol:
(a) A → B holds (since the value of A does not match in any pair of
tuples, no two tuples are required to match on the value of B).

(b) BC → A does not hold (since in tuples 1 & 2 the value of BC
matches but that of A is different).

(c) B → C holds (since the value of B matches in tuples 1 & 2 and
the value of C also matches in these two tuples).
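
The three checks above can be reproduced mechanically on the given tuples. This is a small sketch, assuming each tuple is held as a Python dictionary; the helper name `fd_holds` is hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

S = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 2, "C": 3},
    {"A": 5, "B": 3, "C": 3},
]

print(fd_holds(S, "A", "B"))   # True
print(fd_holds(S, "BC", "A"))  # False (tuples 1 and 2 agree on BC, differ on A)
print(fd_holds(S, "B", "C"))   # True
```

Note that such a check only shows whether an FD is satisfied by one particular instance; it cannot prove that the FD holds on the schema in general.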

3. Given the following set of FDs on R(A, B, C, D, E, F)
{A → BC, E → CF, B → E and C → EF}, compute the closure of attribute
set {A}.
Sol:
{A}+ = ABC since A → BC
= ABCEF since C → EF

4. Given the following set of FDs on Schema R (A,B,C,D,E,F,G)
A → B, ABCD → E and EF → G, determine if ACDF → G holds on R.

Sol: {ACDF}+ = ACDFB since A → B
= ACDFBE since ABCD → E
= ACDFBEG since EF → G

Since G ∈ {ACDF}+, ACDF → G holds.
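
The attribute-closure computation used in Examples 3 and 4 can be sketched as a simple fixed-point loop. The representation of FDs as (determinant, dependent) string pairs and the function name `closure` are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its determinant is already inside the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds_ex3 = [("A", "BC"), ("E", "CF"), ("B", "E"), ("C", "EF")]
print("".join(sorted(closure("A", fds_ex3))))   # ABCEF

fds_ex4 = [("A", "B"), ("ABCD", "E"), ("EF", "G")]
print("G" in closure("ACDF", fds_ex4))          # True, so ACDF -> G holds
```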

5. Find the Closures (excluding trivial FDs) and Minimal Covers of the
following FD Sets. Are the sets equivalent?

F: A → B, AB → C, D → AC, D → E
G: A → BC, D → AE
Sol:
Minimal Cover of F

Step 1 Apply the decomposition rule to each FD in F, so that FDs have only
singletons as dependents.
Fc : {A → B, AB → C, D → A, D → C, D → E}.

Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs.
Since A → B, ∴ A → AB
Since A → AB and AB → C, ∴ A → C;
Thus B is extraneous in FD AB → C, and may be eliminated.
So, Fc : {A → B, A → C, D → A, D → C, D → E}.

Step 3 Eliminate FDs in Fc which are logically implied by other FDs in Fc.
D → C is logically implied by D → A and A → C.
Thus, D → C is to be eliminated.
Fc : {A → B, A → C, D → A, D → E}.
This is equivalent to : {A → BC, D → AE}.

Minimal Cover of G

Step 1 Apply the decomposition rule to each FD in G, so that FDs have only
singletons as dependents.
Gc : {A → B, A → C, D → A, D → E}.

Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs.
Since all determinants are singletons, Gc remains unchanged.
Gc : {A → B, A → C, D → A, D → E}.

Step 3 Eliminate FDs in Gc which are logically implied by other FDs in Gc.
There is no such FD.
Gc : {A → B, A → C, D → A, D → E}.
This is equivalent to : {A → BC, D → AE}.
Since Minimal Cover of F = Minimal Cover of G, these are equivalent sets.

{F}+ = {G}+ (excluding trivial FDs) =
{ A → BC, D → AE, A → B, A → C, D → A,
D → E, D → C, D → B, D → AB, D → AC,
D → ABE, D → ACE, D → ADE, D → BCE,
D → AECB }
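
The three steps of the minimal-cover procedure can be sketched in Python as follows. The function names and the string-pair encoding of FDs are assumptions for illustration. Since a set of FDs can have more than one minimal cover, the sketch also includes a direct equivalence test based on closures, which does not depend on the order in which FDs are processed.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split dependents into singletons.
    cover = [(lhs, a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop extraneous attributes from each determinant.
    for i, (lhs, a) in enumerate(cover):
        for x in lhs:
            reduced = lhs.replace(x, "")
            if reduced and a in closure(reduced, cover):
                lhs = reduced
        cover[i] = (lhs, a)
    # Step 3: drop FDs that are implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return set(cover)

def equivalent(f, g):
    """F and G are equivalent if each implies every FD of the other."""
    return (all(set(r) <= closure(l, g) for l, r in f)
            and all(set(r) <= closure(l, f) for l, r in g))

F = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
G = [("A", "BC"), ("D", "AE")]

print(sorted(minimal_cover(F)))  # [('A', 'B'), ('A', 'C'), ('D', 'A'), ('D', 'E')]
print(equivalent(F, G))          # True
```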

6. Following FDs hold on R (A,B,C):- A → BC, B → CA, C → AB
Determine its (a) Candidate Keys (b) Closure F+ (c) Minimal Cover Fc

Sol:

(a) Candidate Keys: {A}, {B}, {C}

(b) F+ : { A → BC, B → CA, C → AB, A → B, A → C, B → C, B → A,
C → A, C → B, A → ABC, B → ABC, C → ABC }

(c) Minimal Cover:

Step 1 Apply the decomposition rule to each FD in F, so that FDs have only
singletons as dependents.
Fc : {A → B, A → C, B → C, B → A, C → A, C → B}

Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs.
Since all determinants are singletons, Fc remains unchanged.
Fc : {A → B, A → C, B → C, B → A, C → A, C → B}

Step 3 Eliminate those FDs in Fc which are logically implied by other FDs in Fc.

A → B is logically implied by A → C, C → B, so A → B may be eliminated.
Fc now is : {A → C, B → C, B → A, C → A, C → B}

B → C is logically implied by B → A, A → C, so B → C may be eliminated.
Fc now is : {A → C, B → A, C → A, C → B}

C → A is logically implied by C → B, B → A, so C → A may be eliminated.
Fc now is : {A → C, B → A, C → B}

So, Fc = {A → C, B → A, C → B}

Other Canonical Covers are:-
Fc = {A → B, B → C, C → A}
Fc = {A → B, A → C, B → A, C → A}

All these Canonical Covers are equivalent sets, since their closure is the same.

7. Following FDs hold on the relation schema R (A,B,C,D,E)
A → BC, CD → E, B → D, E → A
Determine (a) whether E → D holds (b) Closure F+ (excluding trivial FDs)
(c) its Candidate Keys

Sol:
(a) {E}+ = EA since E → A holds
= EABC since A → BC holds
= EABCD since B → D holds

Since D ∈ {E}+, E → D holds

(b) Closure F+ (excluding trivial FDs)

{ A → BC, CD → E, B → D, E → A, A → B, A → C, E → B,
E → C, CD → A, CD → B, ……. }

(c) Candidate Keys {C,D}, {E}, {A}, {B,C}

8. Given R (ABCDE) and the set of FDs on R given by:-
F = {AB → CD, ABC → E, C → A}.
What are the candidate keys of R ?

Sol:
Candidate keys: {A,B}, {B,C}
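
For small schemas, the candidate keys in Examples 7 and 8 can be confirmed by brute force: try attribute sets in order of increasing size and keep those whose closure is the whole schema and which contain no smaller key. This is only a sketch under that assumption (it is exponential in the number of attributes); the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            # Skip supersets of keys already found: they are super keys, not candidate keys.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(attrs):
                keys.append("".join(combo))
    return keys

fds_ex7 = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", fds_ex7))  # ['A', 'E', 'BC', 'CD']

fds_ex8 = [("AB", "CD"), ("ABC", "E"), ("C", "A")]
print(candidate_keys("ABCDE", fds_ex8))  # ['AB', 'BC']
```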

9. Suppose a relational schema R = (A,B,C,D,E) is decomposed into:-
(A, B, C)
(A, D, E)
Prove that this decomposition is a loss-less join decomposition, if the
following FDs hold on R:-
(A → BC, CD → E, B → D, E → A)
Sol:
{A}+ = ABCDE
{B,C}+ = BCDEA
{C,D}+ = CDEAB
{E}+ = EABCD

So, the candidate keys are:- {E}, {A}, {B,C}, {C,D}
All attributes are prime attributes.
The relation is in 3 NF

The decomposition is a loss-less join decomposition, since the intersection
of the two sub-schemas = {A}, which is a candidate key of both
sub-schemas.
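
For a decomposition into exactly two sub-schemas, the loss-less join condition used above reduces to a closure test: the common attributes must functionally determine all of R1 or all of R2. A minimal sketch, with `is_lossless_binary` as a hypothetical helper name and the second split invented purely as a counter-example:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless_binary("ABC", "ADE", fds))  # True: {A}+ = ABCDE
print(is_lossless_binary("ABC", "CDE", fds))  # False: {C}+ = C (hypothetical split)
```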

10. Define 1 NF, 2 NF, 3 NF and BCNF. Out of 3 NF & BCNF, which one is
the stronger Normal Form? Justify.

Sol:

1 NF: A relation schema R is in 1 NF, if all its attributes have atomic domains.

2 NF: A relation schema R is in 2 NF, if it is in 1NF and all of its non-prime
attributes are fully functionally dependent on its candidate keys.

3 NF: A relation schema R is in 3 NF, if each FD α → β holding on R satisfies one
of the following conditions:-
Either (a) It is a trivial FD
or (b) α is a super key of R
or (c) each attribute in (β − α) is a prime attribute
BCNF: A relation schema R is in BCNF, if each FD α → β holding on R
satisfies one of the following conditions:-
Either (a) It is a trivial FD
or (b) α is a super key of R

As indicated above, BCNF is more restrictive than 3 NF. A relation
schema R in BCNF will also be in 3 NF, but the reverse may not be true. So, BCNF is
stronger than 3 NF.

11. For a given R (ABCDEFGH) with the FDs-
{ A → BCDEFGH, BCD → AEFGH, BCE → ADFGH, CE → H, CD → H}
Find a BCNF decomposition of R. Is it dependency preserving?
Sol:

Candidate Keys: {A}, {B,C,D}, {B,C,E}

It has two partial FDs i.e. CE → H, CD → H

BCNF Decomposition
R1 (C, E, H)
CE → H

R2 (A, B, C, D, E, F, G)
A → BCDEFG
BCD → AEFG
BCE → ADFG
All FDs have only candidate keys on their left side. So, both R1 and R2 are in BCNF.

The decomposition is not dependency preserving, since the FD CD → H
has been lost in the decomposition: its attributes do not appear together in
R1 or in R2, and it cannot be derived from the FDs that hold on R1 and R2.
The FDs A → H, BCD → H and BCE → H are not lost: in each case the
determinant gives C and E within R2, and CE → H holds on R1.
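
Whether an individual FD survives a decomposition can be tested without computing F+ in full, by repeatedly taking closures restricted to each sub-schema. The sketch below applies that standard test to the FD set and decomposition of Example 11; the function name `preserved` and the string encoding are assumptions.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, parts, fds):
    """True if fd = (lhs, rhs) follows from the FDs that hold on the individual sub-schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for p in parts:
            # Attributes derivable using only FDs that live entirely inside sub-schema p.
            gained = closure(result & set(p), fds) & set(p)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

F = [("A", "BCDEFGH"), ("BCD", "AEFGH"), ("BCE", "ADFGH"), ("CE", "H"), ("CD", "H")]
parts = ["CEH", "ABCDEFG"]

for fd in [("CD", "H"), ("A", "H"), ("BCD", "H"), ("BCE", "H")]:
    print(fd, preserved(fd, parts, F))
# ('CD', 'H') False; the other three print True
```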

12. Consider a Relation Schema R (A,B,C,D,E) with the following set of FDs
holding on it:-
A → B, C → D, D → E
Determine:-
(a) Its Candidate Keys.
(b) What Normal Form is it in?
(c) Decompose it into a BCNF Schema, if it is not already in BCNF.

Sol: (a) Candidate Key : {A,C}
(b) It is in 1 NF, since there are partial FDs A → B, C → DE holding on it.
(c) BCNF Decomposition:-

Decomposing on the basis of D → E

R1 (D, E) R1 is in BCNF
D → E

R2 (A, B, C, D) PK (A, C)
A → B, C → D
R2 has partial FDs A → B, C → D
It is in 1NF only

Decomposing R2 on the basis of A → B

R21 (A, B) PK (A) R21 is in BCNF
A → B

R22 (A, C, D) PK (A, C)
C → D Has partial FD C → D

Decomposing R22 on the basis of C → D

R221 (C, D) PK (C) BCNF
C → D
R222 (A, C) PK (A, C) BCNF

Therefore, the BCNF decomposition of R is:-

R1 (D, E) PK (D)
D → E

R21 (A, B) PK (A)

R221 (C, D) PK (C)
C → D FK (D) references R1

R222 (A, C) PK (A, C)
FK (A) references R21
FK (C) references R221

13. Given a relation schema R (A,B,C,D,E) with the set of FDs
A → B, BC → D, D → BC and DE → ∅.
Does it have any redundant (extraneous) FDs ?
If yes, eliminate such FDs.
What normal form is it in?
If not in 3NF, then decompose it into 3 NF schemas.

Sol: DE → ∅ is a redundant FD (it is trivial).

{A,C,E}+ = ACBDE
{A,D,E}+ = ADEBC
So, {A,C,E} and {A,D,E} are candidate keys.
B is a non-prime attribute.
A → B and D → B are partial FDs.
Therefore, R is in 1 NF.
Decomposing R into 3 NF on the basis of A → B

R1 (A, B) PK (A)
A → B
R2 (A, C, D, E) {A,C,E} and {A,D,E} are candidate keys
FK (A) references R1
All attributes of R2 are prime attributes, so R2 is in 3 NF.

14. Following FDs hold on R(A,B,C,D,E): A → B, BC → E, ED → A

(a) Is it in 3NF?
(b) Is it in BCNF?

Sol:

{A,C,D}+ = ACDBE
{E,D,C}+ = EDCAB
{B,C,D}+ = BCDEA

Therefore, {A,C,D}, {E,D,C} and {B,C,D} are candidate keys of R.
All are prime attributes.
R is at least in 3 NF.

R has FDs A → B, BC → E, ED → A that do not have candidate keys on their
left side. Therefore, R is not in BCNF.
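
The 3 NF and BCNF conditions can be checked FD by FD once the prime attributes are known. The sketch below does this for the FDs of Example 14; the brute-force key search and the function names are assumptions for illustration, and only the given FDs are examined.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(attrs):
                keys.append("".join(combo))
    return keys

def nf_violations(attrs, fds):
    """Return (FDs violating BCNF, FDs violating 3 NF)."""
    prime = set("".join(candidate_keys(attrs, fds)))
    not_bcnf, not_3nf = [], []
    for lhs, rhs in fds:
        # Trivial FDs and FDs with a super key as determinant satisfy both forms.
        if set(rhs) <= set(lhs) or closure(lhs, fds) == set(attrs):
            continue
        not_bcnf.append((lhs, rhs))
        # 3 NF additionally tolerates the FD if every new attribute is prime.
        if not set(rhs) - set(lhs) <= prime:
            not_3nf.append((lhs, rhs))
    return not_bcnf, not_3nf

fds = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(candidate_keys("ABCDE", fds))  # ['ACD', 'BCD', 'CDE']
print(nf_violations("ABCDE", fds))   # all three FDs violate BCNF, none violates 3 NF
```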

15. Following Schemas are the sub-schemas of a Global Schema R
(A,B,C,D,E,F,G,H,I).

For the FDs as indicated with each sub-schema, determine the Normal
Form of each sub-schema. If it is not in BCNF, decompose it into BCNF.

(a) R1 (A, B, C, D, E) A → B, C → D
(b) R5 (A, I, C, E) No FD

Sol: (a) R1 (A,B,C,D,E) A → B, C → D

{A,C,E} is the candidate key of R1
It has partial FDs i.e. A → B, C → D
Therefore, R1 is in 1 NF

BCNF Decomposition of R1:-

R11 (A, B) PK (A)
A → B
R12 (C, D) PK (C)
C → D
R13 (A, C, E) PK (A,C,E)
FK (A) references R11
FK (C) references R12
(b) R5 (A,I,C,E) No FD
Since R5 has no non-trivial FD, it is in BCNF.

16. For R (A,B,C,D) with the following FDs :-

(i) Determine its Candidate Keys
(ii) What best NF is it in?
(iii) Decompose R into BCNF Schema, if it is not in BCNF.

(a) C → D, C → A, B → C
(b) B → C, D → A
(c) A → B, BC → D, A → C
(d) AB → C, AB → D, C → A, D → B
(e) ABC → D, D → A
Sol:
(a) C → D, C → A, B → C

Candidate Key : {B}

There is a transitive dependency B → C, C → DA, but no partial FD.
So, R is in 2 NF but not in 3NF.

BCNF Decomposition of R

R1 (C, D, A) PK (C)
C → DA
R2 (B, C) PK (B)
B → C FK (C) references R1

(b) B → C, D → A

Candidate Key : {B,D}
B → C, D → A are partial FDs. So, R is in 1 NF.
BCNF Decomposition of R

R1 (B, C) PK (B)
B → C
R2 (D, A) PK (D)
D → A
R3 (B, D) PK (B, D)
FK (B) references R1
FK (D) references R2

(c) A → B, BC → D, A → C

Candidate Key : {A}
A → BC, BC → D: there is a transitive FD. So, R is in 2 NF.
BCNF Decomposition of R

R1 (B, C, D) PK (B, C)
BC → D
R2 (A, B, C) PK (A)
A → BC FK (B,C) references R1

(d) AB → C, AB → D, C → A, D → B

Candidate Keys : {A, B}, {C, D}, {A, D}, {B, C}
All are prime attributes; so R is at least in 3 NF
There are two FDs C → A, D → B which do not have a candidate key as
determinant; therefore it is not in BCNF.

BCNF Decomposition of R

R1 (C, A) PK (C)
C → A
R2 (D, B) PK (D)
D → B
R3 (C, D) PK (C, D)
FK (C) references R1
FK (D) references R2

(e) ABC → D, D → A

Candidate Keys : {A, B, C}, {D, B, C}
All being prime attributes, R is at least in 3 NF
There is one FD D → A, which does not have a candidate key as its
determinant; so R is not in BCNF.

Possible BCNF Decomposition of R is:-

R1 (D, A) PK (D)
D → A

R2 (B, C, D) PK (B, C, D)
No non-trivial FD FK (D) references R1

17. For R (A,B,C,D,E,G,H) consider FD Set { AB → C, AC → B, AD → E, B → D,
BC → A, E → G}

For the following sub-schemas of R, find:-
(i) Restriction of FDs holding on the sub-schema
(ii) Minimal Cover of the FDs holding on the sub-schema
(iii) Strongest NF that the sub-schema is in
(iv) Equivalent BCNF Schema of the sub-schema

(a) R1 (A,B,C)
(b) R2 (A,B,C,D)
(c) R3 (A,B,C,E,G)
(d) R4 (D,C,E,G,H)
(e) R5 (A,C,E,H)

Sol:
(a) R1 (A,B,C)
FDs holding on the schema: AB → C, AC → B, BC → A
Minimal Cover: AB → C, AC → B, BC → A
Candidate Keys: {A,B}, {B,C}, {C,A}
Strongest Normal Form: Since all FDs have only candidate keys as
determinants, R1 is already in BCNF

(b) R2 (A,B,C,D)
FDs holding on the schema: AB → C, AC → B, BC → A, B → D
Minimal Cover: AB → C, AC → B, BC → A, B → D
Candidate Keys: {A,B}, {B,C}, {C,A}
Strongest Normal Form: It has a partial FD B → D, so R2 is in 1 NF
BCNF Decomposition:-

R21 (B, D) PK (B)
B → D

R22 (A,B,C) PK {A,B} or {B,C} or {C,A}
AB → C, AC → B, BC → A FK (B) references R21

(c) R3 (A,B,C,E,G)
FDs holding on the schema: AB → C, AC → B, BC → A, E → G, and also
AB → E, AC → E, BC → E (since B → D and AD → E hold on R)
Minimal Cover: AB → C, AC → B, BC → A, AB → E, E → G
Candidate Keys: {A,B}, {B,C}, {C,A}
Strongest Normal Form: The non-prime attribute G depends on the keys only
transitively, through the non-prime attribute E (E → G), and there is no
partial FD; so R3 is in 2 NF but not in 3 NF

BCNF Decomposition:-
R31 (E, G) PK (E)
E → G
R32 (A,B,C,E) PK {A,B} or {B,C} or {C,A}
AB → C, AC → B, BC → A, AB → E FK (E) references R31
All FDs of R32 have candidate keys on their left side, so it is in BCNF.
So, BCNF decomposition: R31 (E,G), R32 (A,B,C,E)

(d) R4 (D,C,E,G,H)

FDs holding on the schema: E → G

Minimal Cover: E → G

Candidate Keys: {D, C, E, H}
Strongest Normal Form: It has a partial FD E → G, so R4 is in 1 NF

BCNF Decomposition:-
R41 (E, G) PK (E)
E → G

R42 (D, C, E, H) PK {D,C,E,H}
No non-trivial FD FK (E) references R41

(e) R5 (A, C, E, H)
FDs holding on the schema:

AC → E, since AC → B and B → D, ∴ AC → ACBD, and AD → E

Minimal Cover: { AC → E }
Candidate Keys: {A, C, H}
Strongest Normal Form: 1 NF, since it has the partial FD AC → E

BCNF Decomposition:-
R51 (A, C, E) PK (A, C)
AC → E

R52 (A, C, H) PK {A, C, H}
No non-trivial FD FK (A, C) references R51
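
The restriction of an FD set to a sub-schema can be computed by taking the closure (under the full FD set of R) of every subset of the sub-schema and keeping only the attributes that fall inside the sub-schema. The following sketch does this by brute force; `project_fds` is a hypothetical name, and the result is not reduced to a minimal cover.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(sub, fds):
    """Map each determinant within sub to the non-trivial attributes of sub it determines."""
    out = {}
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            gained = (closure(combo, fds) & set(sub)) - set(combo)
            if gained:
                out["".join(combo)] = "".join(sorted(gained))
    return out

F = [("AB", "C"), ("AC", "B"), ("AD", "E"), ("B", "D"), ("BC", "A"), ("E", "G")]

print(project_fds("ACEH", F))   # {'AC': 'E', 'ACH': 'E'}
print(project_fds("CDEGH", F))  # every determinant containing E yields G, nothing else
```

The entry for ACH is implied by AC → E, which is why the minimal cover on R5 contains only AC → E.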

18. Consider the following decompositions of R (A,B,C,D,E,G) with FDs: { AB → C,
AC → B, AD → E, B → D, BC → A, E → G}. Determine which decomposition is (i) a
Loss-less Join Decomposition (ii) a Dependency Preserving Decomposition.

(a) {AB, BC, ABDE, EG}
(b) {ABC, ACDE, ADG}

Sol: R (A,B,C,D,E,G)
AB → C, AC → B, AD → E, B → D, BC → A, E → G

{A,C}+ = ACBDEG
{B,C}+ = BCADEG
{A,B}+ = ABCDEG
Candidate Keys : {A, B}, {B,C}, {C,A}

Use ABU’s Algorithm to determine whether the decompositions are
non-loss or not.
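
The tableau (chase) test can be sketched as follows for the two decompositions of this example. The representation is an assumption made for illustration: one row per sub-schema, the string "a" for a distinguished symbol, and a (row, column) pair for each non-distinguished symbol. The decomposition is loss-less when some row ends up with distinguished symbols in every column.

```python
def chase_lossless(attrs, parts, fds):
    # Row i carries 'a' in the columns of sub-schema i and a unique symbol elsewhere.
    rows = [{c: "a" if c in p else (i, c) for c in attrs} for i, p in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if r is s or any(r[c] != s[c] for c in lhs):
                        continue
                    # r and s agree on lhs, so their rhs symbols must be equated.
                    for c in rhs:
                        x, y = r[c], s[c]
                        if x == y:
                            continue
                        keep = "a" if "a" in (x, y) else min(x, y)
                        drop = y if keep == x else x
                        for t in rows:
                            if t[c] == drop:
                                t[c] = keep
                        changed = True
    return any(all(r[c] == "a" for c in attrs) for r in rows)

F = [("AB", "C"), ("AC", "B"), ("AD", "E"), ("B", "D"), ("BC", "A"), ("E", "G")]

print(chase_lossless("ABCDEG", ["AB", "BC", "ABDE", "EG"], F))  # False
print(chase_lossless("ABCDEG", ["ABC", "ACDE", "ADG"], F))      # True
```

Under these FDs the sketch reports decomposition (a) as lossy and decomposition (b) as loss-less.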

19. Suppose R(A,B,C) has an FD B → C holding on it and A is its Candidate
Key. Can it be in BCNF? If yes, under what conditions?

Sol: Yes, R can be in BCNF, if B → A holds on it.
Then, it will have FDs B → AC and A → BC holding on it. Both FDs have
only candidate keys on their left side. Thus, R would be in BCNF.

20. For R (A,B,C,D) with Primary Key {A,B}, state conditions for R to be in 2NF but
not in 3NF.

Sol:
R will be in 2 NF, but not in 3NF iff
AB → C & C → D holds
or AB → D & D → C holds

21. Define Fourth Normal Form. Consider a Relational Schema R =
(A,B,C,D,E). Let M be the following set of Multi-Valued Dependencies:-

M = (A →→ BC, B →→ CD, E →→ AD)

Give a loss-less join decomposition of R into Fourth Normal Form. Justify your
answer.

Sol: Since A →→ BC holds, by the Complementation Rule A →→ (R − A − BC),
i.e. A →→ DE
So, by Fagin’s Theorem R1 (A,B,C) and R2 (A,D,E) will be a loss-less
decomposition of R.

Considering B →→ CD, we can prove that R1 (B,C,D) and R2 (B,A,E) will
also be a loss-less decomposition of R.

Similarly, considering E →→ AD, we can prove that R1 (E,A,D) and R2
(E,B,C) will also be a loss-less decomposition of R.

22. Given a relation schema R (A,B,C,D,E) with the following FDs
A → BCDE, B → ACDE, C → ABDE

(a) What are the Candidate Keys of R ?


(b) What are the Join Dependencies of R?
(c ) Give a loss-less join decomposition of R.
Sol: Candidate Keys: {A}, {B}, {C}
Join Dependencies of R
*(AB, AC, AD, AE)
*(BC, BA, BD, BE)
*(CA, CB, CD, CE)
*(ABC, ADE)

*(BAC, BDE)
*(CAB,CDE)
*(ABCD, AE)
*(BCDA, BE)
*(CABD, CE)
and so on ……….

All the above JDs show a loss-less join decomposition of R


For example R1 (A,B,C) and R2 (A,D,E) is a loss-less decomposition of R.

23. Let R = (A, B, C, G, H, I) with the following set of dependencies :-

D = { A →→ B, B →→ HI, CG → H }
Find whether the following are members of D+ :-
A →→ CGHI
A →→ HI
B → H
A →→ CG
A → H
Sol:

Since A →→ B, so by complementation A →→ (R – A – B),
i.e. A →→ CGHI

Since A →→ B and B →→ HI, so by transitivity A →→ (HI – B),
i.e. A →→ HI

Since B →→ HI
and CG → H, where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, B → H

Since A →→ CGHI and A →→ HI, therefore by Difference Rule A →→ (CGHI – HI),
i.e. A →→ CG
Since A →→ HI
and CG → H, where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, A → H

SOME SOLVED UPTU EXAM QUESTIONS

B TECH (UPTU 2002-2003)

Q.1. Given the following set of FDs on the Schema R (V, W, X, Y, Z)
{Z → V, W → Y, XY → Z and V → WX}, state whether the following
decompositions are loss-less-join decompositions or not.

(i) R1 = (V, W, X)
R2 = (V, Y, Z)

(ii) R1 = (V, W, X)
R2 = (X, Y, Z)

Sol:
(i) Since V → WX, V is a Candidate Key of R1
Now, R1 ∩ R2 = {V}, which is a candidate key of R1
∴ It is a loss-less-join decomposition of R.

(ii) R1 ∩ R2 = {X}, which is NOT a candidate key of R1 or of R2
∴ It is NOT a loss-less-join decomposition of R.

Q. 2.
Given Schema R = (A, B, C, D, E, F, G, H, I, J) and FDs
F: {AB → C, A → DE, B → F, F → GH and D → IJ }

(I) Determine Candidate Key of R

{A, B}+ = ABC since AB → C
= ABCDE since A → DE
= ABCDEF since B → F
= ABCDEFGH since F → GH
= ABCDEFGHIJ since D → IJ
= R
Since {A, B}+ = R,
{A,B} is a Candidate Key of R

(II) Decompose R into 2NF Schemas

R has the following FDs, wherein some of the Non-Prime
Attributes are determined by a proper sub-set of the Candidate
Key:-

A → DEIJ
B → FGH

So, R can be decomposed on the basis of the above FDs:-

R1 (A, D, E, I, J)
A → DE, D → IJ

R2 (B, F, G, H)
B → F, F → GH

R3 (A, B, C)
AB → C

The above schemas are in 2 NF.

R1 is in 2NF but not in 3NF, since it has a transitive
dependency A → D, D → IJ. It can be decomposed into 3NF
on the basis of this transitive dependency:-

R11 (D, I, J)
D → IJ
R12 (A, D, E)
A → DE

B Tech – UPTU 2003 (Carry Over Paper)

Q.3. Consider the Schema with FDs:-
R (A, B, C, D, E)
A → BC, CD → E, B → D and E → A

(a) How is Candidate Key found for a given FD Set?

The Candidate Key is found by determining an attribute set α (where
α ⊆ R) such that α+ = R and α has no extraneous attributes in the
FD α → R.

Find Candidate Keys of R.

{A}+ = ABC since A → BC
= ABCD since B → D
= ABCDE since CD → E
= R

Since {A}+ = R, {A} is a Candidate Key of R.

Similarly, we can determine that {E}, {C,D} and {B,C} are other
Candidate Keys of R.

(b) How is BCNF more desirable than 3NF?

3NF and BCNF are defined as follows:-

3 NF: A relation schema R is in 3 NF, if each FD α → β holding on R
satisfies one of the following conditions:-
(a) It is a trivial FD
or (b) α is a super key of R
or (c) each attribute in (β − α) is a prime attribute

BCNF: A relation schema R is in BCNF, if each FD α → β
holding on R satisfies one of the following conditions:-
(a) It is a trivial FD
or (b) α is a super key of R

So, by definition, BCNF is more restrictive than 3NF. Relations satisfying
BCNF will have less data redundancy as compared to 3NF relations.
That is why BCNF is more desirable than 3NF.

B Tech (UPTU 2003-2004)

Q.4. For Schema R (A,B,C,D,E), is the following decomposition a loss-less join
decomposition :-
(A, B, C)
(A, D, E) with FDs :- { A → BC, CD → E, B → D, E → A } holding on R.

Sol:
Since A → BC, A is a Candidate Key of (A, B, C)

∵ (A, B, C) ∩ (A, D, E) = {A} → (A, B, C)

Thus, it is a loss-less-join decomposition of R.

Q.5 For Schema R (A,B,C,D,E)
with MVDs M = { A →→ BC, B →→ CD, E →→ AD }
Give a loss-less-join decomposition of R into 4NF. Justify your answer.

Sol: Since A →→ BC holds on R, by Fagin’s Theorem R can be loss-less
decomposed into:-

R1 (A, B, C)
R2 (A, D, E)
The only MVD that holds on R1 is A →→ BC.
Since A ∪ BC = R1, it is a trivial MVD and thus R1 is in 4 NF.

Similarly, the only MVD that holds on R2 is E →→ AD.
Since E ∪ AD = R2, it is a trivial MVD and thus R2 is in 4 NF.

Similarly, the following decompositions are also loss-less-join decompositions:-

R1 (B, C, D)
R2 (B, A, E) on the basis of MVD B →→ CD

and
R1 (E, A, D)
R2 (E, B, C) on the basis of MVD E →→ AD


Case Studies

8.1. Consider the following Bank_Schema:-

Bank_Schema ( Cust_Id, Cust_Name, Cust_Address,


Acct_Number, Acct_Type, Date_of_Opening,
Branch_Code, Branch_Street, Branch_City,
Branch_Status, Rate_of_Interest, Balance,
Date, Time)

Make following assumptions:-


(i) Each Customer has unique Cust_Id
(ii) Each Account has unique Acct_Number
(iii) Each Branch has unique Branch_Code
(iv) The bank does not open more than one branch in any street of a city.
(v) Rate_of_Interest is determined by Acct_Type and Date_of_Opening
(vi) Balance is determined by Acct_Number, Date & Time.
(vii) Branch_Status is determined by Branch_City.

Now determine the following for schema Bank_Schema:-

(a) The set of FDs holding on the schema.


(b) Its Candidate Keys.
(c) Normal Form of the schema. Justify.
(d) Decompose it into BCNF Sub_Schemas.
(e) Using ABU’s Algorithm, verify that the decomposition is a loss-
less-join decomposition.
(f) Is the decomposition dependency-preserving?

8.2. Consider the following Student_Result schema:-

Student_Result (Univ_Roll_No, Class_Roll_No, Semester, Branch,


Section, S_Address, S_DOB, Fathers_Name,
Sub_Code, Sub_Title, Sub_Credits, Marks)

Make Following Assumptions:-

(i) Each Student has unique Univ_Roll_No


(ii) A student can also be uniquely identified by Class_Roll_No,
Semester, Branch and Section.
(iii) Each Subject is uniquely identified by Sub_Code
(iv) Marks are determined by student identity and subject identity.

Now determine the following for schema Student_Result:-

(a) The set of FDs holding on the schema.


(b) Its Candidate Keys.
(c) Normal Form of the schema. Justify.
(d) Decompose it into BCNF Sub_Schemas.
(e) Using ABU’s Algorithm, verify that the decomposition is a loss-
less-join decomposition.
(f) Is the decomposition dependency-preserving?
