Database Management System
LECTURE NOTES
UNIT - 1
P S Gill
CHAPTER 1
INTRODUCTION
[Figure: DATABASE SYSTEM. User queries are handled by the DBMS (Query Processing Software and
Storage Management Software), which accesses the stored Schema Definition and the DATA.]
DATABASE
(a) It must represent some real-world aspect; like a college or a company etc.
The aspect represented by the database is called its “Mini-world”.
(c) The repository of data must be designed, developed and implemented for a
specific purpose. There must exist an intended group of users, who must have
some pre-conceived applications of the data.
For example, in the college database, sources of information will be students, faculty,
labs etc. The real-world events affecting the information in the database will be
admissions, exams, results & placements etc. The set of intended users will be faculty,
students, admin staff etc.
(i) Data Redundancy and Inconsistency:- Since the same information is stored
at multiple places, it causes data-inconsistency problems during updates.
(ii) Difficulty in Accessing of Data:- Suppose there exists some information in the
files, but the existing set of application programs does not support extraction of that
information. In such situations, the application programs need to be updated, which is a
very inconvenient, time-consuming and costly solution.
(iii) Data Isolation:- The information is scattered over a large number of files,
on a number of stand-alone (not networked) machines, making it very difficult to process
certain queries, which need information to be extracted from multiple locations.
(v) Atomicity Problems:- Since the information needed to roll back a transaction may
not be readily available in a file-processing system, ensuring atomicity of transactions
will be difficult.
(vii) Security Problems:- Since the information is scattered and has no centralized
access path, user access rights cannot be enforced in a fail-proof manner.
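The atomicity problem in (v) can be contrasted with what a DBMS offers. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS; the table "account", the account numbers and the helper names "transfer" and "balances" are illustrative assumptions, not part of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 10000), ("A-203", 30000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: both happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                         (amount, dst))
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well.
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

ok_small = transfer(conn, "A-101", "A-203", 4000)    # succeeds
ok_large = transfer(conn, "A-101", "A-203", 50000)   # fails and is undone
```

After the failed transfer, neither balance has changed, because the DBMS keeps the information needed for the roll back.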
(a) Data Dictionary A Database System will support a Data Dictionary (or Data
Directory or DBMS Catalog), which contains information like Data Types,
(e) Concurrency control. DBMS will support concurrency control tools for
permitting multiple users or application programs to access the database
concurrently, while preserving the consistency of the database.
DATA MODELS
The Object Based Logical Models view the universe as a collection of objects.
(i) Entity-Relationship Model.
- An Entity will have a set of properties, known as Attributes; for example, the
Entity “Account” may have attributes like “Account-Number”, “Current-Balance” etc.
- Each attribute will have a set of permitted values, called its Domain; for
example, the domain of Balance of an account can be the set of positive real
numbers.
- A set of Relationships of the same kind, having the same set of attributes is
called a Relationship Set.
- E-R Model also specifies certain constraints, like Mapping Cardinalities i.e.
whether the relationship is one-to-one, one-to-many, many-to-one or many-to-
many.
- The E-R Diagram below depicts two Entity Sets “STUDENT”, “COURSE”
and a relationship set “RESULT” indicating the marks obtained by students in
different Courses.
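[Figure: E-R Diagram. Entity Set STUDENT (Roll_No, S_Name, S-Address) and Entity Set COURSE
(Sub_Code, Sub_Title), linked by Relationship Set RESULT with attribute Marks.]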
(ii) Object-Oriented Model. Like the E-R Model, this model also
models a database as a Collection of Objects. An Object Body
encapsulates Data (Variables) as well as Methods (Functions) to
manipulate the Data (Variables). The Objects that contain same Type of
Data Variables and same Type of Functions are grouped together as a
Class. Thus, a Class may be viewed as a Type Definition of the Objects.
The only way an Object “A” can access the Data Items of another Object
“B” is by invoking the Methods of “B”. “A” can accomplish this by
making calls to the methods of “B”, through B’s Interface. The methods
defined within an object are made visible to the external world, through its
Interface.
[Figure: OBJECT. An object encapsulating Variables and Functions, exposed through its Interface.]
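The idea that object “A” reaches the data of object “B” only through the methods of “B” may be sketched as follows. This is an illustration only; the class names "Account" and "Customer" and their methods are assumptions made for the example, and Python marks private data by convention (leading underscore) rather than enforcing it.

```python
class Account:
    """An object: data (variables) together with the methods that manipulate it."""

    def __init__(self, account_number, balance=0):
        self._account_number = account_number  # data, private by convention
        self._balance = balance

    # The methods below form the interface visible to other objects.
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


class Customer:
    """Object "A": it changes an Account only by calling the Account's methods."""

    def __init__(self, name, account):
        self._name = name
        self._account = account

    def pay_in(self, amount):
        self._account.deposit(amount)  # call through B's interface


acc = Account("A-101", 10000)
cust = Customer("Ajay", acc)
cust.pay_in(500)
```

All objects built from the class Account share the same type of variables and functions, which is the sense in which a Class is a type definition of its objects.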
(b) Record Based Logical Models. These models describe data at the Logical
level, as a collection of fixed-format Records of different types. Each Record Type can
have a fixed number of Fields (or Attributes) and each Field is usually of fixed length.
Use of fixed-length Records simplifies the Physical Level implementation of a database.
The most widely used Record Based Logical Models are:-
(i) Hierarchical Model. This is one of the oldest models, dating back to the
1960s. An early and influential commercial DBMS based on this model was IBM's
“Information Management System” (IMS), developed from 1966 onwards. At one time, it
was the most used DBMS. In the Hierarchical Model, the Data is represented as Records;
and the Records are organized as a collection of Trees. The relationships among
the data are represented by Links, which can be viewed as pointers. In the tree
structure, each record can have only one parent record. Thus, the model permits
only one-to-many relationships (not many-to-many relationships) amongst the
Records.
[Figure: HIERARCHICAL MODEL. A tree with Course as the root record, linked to Teacher
(Offered By) and to Student (Attended By).]
The tree does not indicate the relationships “what are the courses being offered
by a faculty”, “what are the courses being attended by a student”, “who are the
students being taught by a faculty” and “who are the faculty teaching a student”.
This is due to the limitation of the tree structure that a node can have only one
parent node; thus we can represent only one-to-many relationships but not
many-to-many relationships.
(ii) Network Model. Like the Hierarchical Model, this Model also
models a database as a collection of Records; but the Records are organized as a
collection of arbitrary graphs (or Networks). A Record can thus have any number
of parent records, which supports many-to-many relationships amongst records.
The relationships among the records are represented as links (pointers). Since this
Model supports many-to-many relationships amongst the records, it is considered
more versatile than the Hierarchical Model.
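[Figure: NETWORK MODEL. Course, Teacher and Student records linked in both directions:
Course and Teacher (Offered By / Offers), Course and Student (Attended By / Attends),
Teacher and Student (Teaches / Taught By).]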
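The practical difference between the two models can be sketched with plain Python data structures. This is an illustration only, not how IMS or any network DBMS stores records; the course and student names are made up.

```python
# Hierarchical model: every Student record hangs under exactly one Course parent,
# so a student attending two courses has to be stored twice.
hierarchy = {
    "DBMS": ["Asha", "Ravi"],
    "OS": ["Ravi"],
}

def courses_of_student_hier(student):
    # There is no link from a child to "its other parents": every tree is scanned.
    return sorted(c for c, students in hierarchy.items() if student in students)

copies_of_ravi = sum(students.count("Ravi") for students in hierarchy.values())

# Network model: links exist in both directions and a record may have many parents,
# so each student is stored once.
attends = {"Asha": {"DBMS"}, "Ravi": {"DBMS", "OS"}}
attended_by = {"DBMS": {"Asha", "Ravi"}, "OS": {"Ravi"}}

def courses_of_student_net(student):
    return sorted(attends[student])  # follow the student's own links
```

Both functions return the same answer for "Ravi", but the hierarchical version needs two stored copies of the student record and a scan of every tree.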
(iii) Relational Model. This is the most modern and the most commonly used of
the Record Based Models. The Relational Model represents a database as a collection
of Tables, which hold both the data and the relationships amongst the data. Each
Table is called a Relation and is assigned a unique name. Each Relation has a number
of Columns, representing the Fields (or Attributes) of the Relation. Each Field is
also uniquely named. A Relation (or Table) can have an unlimited number of Rows and
each Row represents an Instance of the Relation. A Row is also termed a Tuple.
Each Tuple is unique within a Relation, so a Relation can be viewed as a set of
Tuples of the same type. The relationships amongst the Tables are modeled as
Foreign Key-Primary Key relationships.
STUDENT
Roll_No S_Name Branch Semester Section S_Address
COURSE
Sub_Code Sub_Title Semester Branch Contact_Hrs
TEACHER
Fac_Code Fac_Name Desig Dept Fac_Address
COURSE-TEACHER
Sub_Code Fac_Code
COURSE-STUDENT
Sub_Code Roll_No
TEACHER-STUDENT
Fac_Code Roll_No
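The Foreign Key-Primary Key relationship between the tables above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module; only a few of the columns are kept, the table name COURSE_STUDENT uses an underscore because a hyphen is not valid in an unquoted SQL identifier, and the sample rows and the helper "enrol" are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE STUDENT (Roll_No INTEGER PRIMARY KEY, S_Name TEXT NOT NULL);
CREATE TABLE COURSE  (Sub_Code TEXT PRIMARY KEY, Sub_Title TEXT NOT NULL);
CREATE TABLE COURSE_STUDENT (
    Sub_Code TEXT    REFERENCES COURSE(Sub_Code),
    Roll_No  INTEGER REFERENCES STUDENT(Roll_No),
    PRIMARY KEY (Sub_Code, Roll_No)
);
INSERT INTO STUDENT VALUES (1, 'Asha');
INSERT INTO COURSE  VALUES ('CS-301', 'DBMS');
INSERT INTO COURSE_STUDENT VALUES ('CS-301', 1);
""")

def enrol(sub_code, roll_no):
    """Insert a (course, student) tuple; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO COURSE_STUDENT VALUES (?, ?)", (sub_code, roll_no))
        return True
    except sqlite3.IntegrityError:
        return False

dangling = enrol("CS-301", 99)   # no STUDENT row with Roll_No 99
duplicate = enrol("CS-301", 1)   # the same tuple already exists

attendance = conn.execute("""
    SELECT s.S_Name, c.Sub_Title
    FROM COURSE_STUDENT cs
    JOIN STUDENT s ON s.Roll_No = cs.Roll_No
    JOIN COURSE  c ON c.Sub_Code = cs.Sub_Code
""").fetchall()
```

The join at the end follows the foreign keys back to the primary keys to answer “which student attends which course”.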
Instance. It refers to the actual collection of data (a Snapshot of data) existing in the
database at a particular moment of time. Since a database continuously experiences
insertion of new data, deletion of defunct data and update of changed data, the Instance
is under continuous change.
There are three levels of data abstraction in a database; and each level is described by a
schema as explained below:-
(a) Physical Level. This is the lowest level of abstraction. At this level, a
Physical Schema describes “how data is physically stored”. The Physical Schema may
describe complex structures, used to store the data, with the sole aim of achieving an
efficient access of the data.
(b) Logical Level. This is the intermediate level of abstraction. At this level, a
Logical Schema (or Conceptual Schema) would describe “what data is stored in the
database” and “what are the relationships amongst the data”. This Schema is used by
Database Administrators, who decide what information is to be kept in the Database. It
would describe the logical structure of database, data types and integrity constraints. As
compared to Physical Level, Database at Logical Level is described by relatively smaller
number of simpler structures. But, the implementation of these simple structures may be
quite complex at the Physical Level. The user operating at Logical Level need not be
aware of the complexities at the Physical Level.
(c) View Level. This is the highest level of abstraction. At this level, there will be
many Views, defined for different categories of users and tailored to their specific
needs. A View for a certain group of users describes “what subset of the database is to
be made visible” to that group, i.e. only the subset of the underlying database which
that group needs to access. At the View Level, the main goal is to provide efficient and
user-friendly human interaction with the system. So, the interface at this level is
made as simple and user-friendly as possible. A user does not have to be aware of the
complexities at the conceptual level and physical level.
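A View that exposes only a subset of the database can be sketched as below, assuming Python's built-in sqlite3 module. The STUDENT columns follow the relational example earlier in these notes; the view name CSE_STUDENTS and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical level: the full table.
CREATE TABLE STUDENT (
    Roll_No   INTEGER PRIMARY KEY,
    S_Name    TEXT,
    Branch    TEXT,
    S_Address TEXT
);
INSERT INTO STUDENT VALUES (1, 'Asha', 'CSE', 'Noida'),
                           (2, 'Ravi', 'ECE', 'Delhi');

-- View level: one group of users sees only CSE students, without addresses.
CREATE VIEW CSE_STUDENTS AS
    SELECT Roll_No, S_Name FROM STUDENT WHERE Branch = 'CSE';
""")

cursor = conn.execute("SELECT * FROM CSE_STUDENTS")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
```

Users of the view see two columns and one row; the remaining rows and columns of STUDENT stay hidden from them.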
DATA INDEPENDENCE
The ability of a DBMS to modify its Schema definition at one level, without affecting a
Schema definition at the next higher level, is called Data Independence. There are two
levels of Data Independence:-
(b) Logical Data Independence. This refers to the ability of DBMS to modify the
Logical Schema without causing any changes in the application programs at the view
level. Modifications at Logical Level are necessitated by need to alter the Logical
Structure of the database. The Logical Data Independence is much more difficult to
achieve than the Physical Data independence, since the application programs are heavily
dependent on the logical structure of the database.
DATABASE LANGUAGES
A DBMS will support two kinds of languages; one called Data Definition Language
(DDL) to specify the Database Schema and the other called Data Manipulation Language
(DML) to enable accessing and manipulation of the data stored in the database.
The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called Data Storage and
Definition Language. The result of interpretation of these definitions will be a set of
physical schema structures and a set of access methods supported by the system. These
details are usually hidden from the database-users.
(b) DML. A DML is a language that enables users to access and manipulate the data
stored in the database. A DML query is a statement specifying information to be accessed
Non-procedural DMLs are easier to learn and to use than the procedural DMLs.
However, since non-Procedural DMLs do not specify “how to get the data”, the
queries in Non-Procedural DMLs may not generate as efficient code as the
equivalent queries in Procedural DMLs. This limitation of Non-Procedural DMLs
is overcome by performing query optimization at the System Level.
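The contrast between the two kinds of DML can be sketched as follows. The first function spells out “how to get the data” step by step; the second only states “what data is needed” in SQL and leaves the access strategy to the system. Python's built-in sqlite3 module stands in for the DBMS; the table "account" and both function names are assumptions made for the example.

```python
import sqlite3

rows = [("A-101", 10000), ("A-203", 30000), ("A-305", 50000)]

# Procedural style: the program states HOW to find the data, record by record.
def accounts_procedural(records, minimum):
    result = []
    for acc_no, balance in records:
        if balance >= minimum:
            result.append(acc_no)
    return result

# Non-procedural style: the query states WHAT is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", rows)

def accounts_declarative(connection, minimum):
    cursor = connection.execute(
        "SELECT acc_no FROM account WHERE balance >= ? ORDER BY acc_no",
        (minimum,),
    )
    return [row[0] for row in cursor]
```

Both return the same accounts; in the second case the choice of scan or index lookup is made by the system's query optimizer.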
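[Figure: Database system structure. Users (Unskilled Users, Application Programmers, DML Users
and DBA) interact with the system; the Storage Manager comprises the Buffer Manager, the
Authorization & Integrity Manager, the Transaction Manager and the File Manager.]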
DBA is the custodian of the Database System placed under his control and is responsible
for the following functions:-
1. Creation of Conceptual Schema and its periodic update to adapt to the changed
requirements.
3. Liaise with the Users to ensure that the information required by the Users is made
available.
4. Ensure system security, through Grant and Revoke of Access Rights to the Users.
A user must have only as many rights as required by his role in the organization:
nothing more, nothing less.
9. Ensure sufficient Disk Space is always available. If needed, upgrade the Disk
Drives to meet the increased requirements.
10. Liaise with the DBMS vendor to obtain necessary technical support and to
obtain the necessary tools & software upgrades, whenever made available by the vendor.
In a traditional file system, each user defines & implements the files needed for a specific
application, as a part of programming the application itself. Multiple users of the same set
of data will create replicated sets of files, specific to their respective applications. This
redundancy in defining & storage of data results in higher storage costs and database
inconsistencies during updates. On the other hand, in a database approach, a single
repository of data is maintained, which is defined once and then accessed by various
users of the data.
(ii) Data Abstraction In a traditional file processing system, the structure of the
data files is hard coded in the application programs; thus any changes in
structure would need the related application programs to be modified
accordingly. Whereas in a Database System, the application programs are
insulated from the data stored in the database. The application programs are
only concerned with ‘what data’ is stored in the database and not concerned
with ‘how the data is stored’. As long as the contents of data remain
unchanged, the database structure can be changed, without affecting the
existing application programs. This feature is called Data Abstraction.
(v) Effective System Protection through grant of Access Rights. Access Rights
are granted to the users, to the extent required for their roles in the
organization. These rights are stored in the data dictionary itself. When a
query is to be processed, the DBMS will first ensure that the user submitting
the query has sufficient rights for the processing of that query; only then is
the query processed.
(vi) Support for efficient Recovery. When a system is restarted after a failure,
log-based recovery recovers the database efficiently.
(b) Restricting unauthorized access. The user access rights are stored in
the data dictionary. Whenever a query is received from a user, it is
checked for valid access rights. If access rights exist, the query is processed;
else it is rejected as ‘Invalid Query’. This prevents unauthorized access of
data.
(f) Providing backup & recovery. A DBMS supports data backup & recovery
in case of failures.
Exercises
Ex.1.1 Explain the three levels of data abstraction. Distinguish between Physical Data
Independence and Logical Data Independence. Which is more difficult to achieve and
why?
Ex.1.3 Compare the three data models: Hierarchical, Network and Relational.
What are the distinguishing features of Relational Model that make it so popular?
Ex.1.5 What is the role of a Data Dictionary in DBMS? How does this feature
make the DBMS independent of the underlying database?
CHAPTER 2
ENTITY-RELATIONSHIP MODELING
The Entity Relationship Model (ER Model) models the real world situations as a
collection of entities and relationships amongst the entities.
Entity Set An Entity-Set refers to a collection of entities of the same kind. Each
entity in an Entity-Set will have the same set of attributes and the set of attributes will
distinguish it from other Entity Sets. No other entity set will have exactly the same set of
attributes. Some of the attributes of an entity set may overlap with other entity sets.
Domain of an Attribute
Each attribute has a set of permitted values called its domain or value set; for example,
the attribute ‘NAME’ may have as its domain the set of character strings of a specified
maximum length.
Attribute Types:-
(e) If value is applicable, but not specified; like TEL#- an employee may not
own a Telephone.
(f) If value is applicable and specified but not known to the agency entering
the information; like an employee may own a Telephone but the
number may not be known to the organization.
Null value can only be assigned to an Attribute, if assigning value to that attribute
is optional (not mandatory). The Mandatory attributes cannot be assigned a
“Null” value.
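The rule that only optional attributes may be Null can be sketched with Python's built-in sqlite3 module. The table "employee" and its columns are assumptions made for illustration; "name" is declared mandatory with NOT NULL while "tel_no" is optional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id TEXT PRIMARY KEY,
        name   TEXT NOT NULL,   -- mandatory attribute
        tel_no TEXT             -- optional attribute, may be Null
    )
""")

# An optional attribute may be left Null (None in Python).
conn.execute("INSERT INTO employee VALUES ('E-1', 'Asha', NULL)")

# Assigning Null to a mandatory attribute is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee VALUES ('E-2', NULL, '5550101')")
    mandatory_null_rejected = False
except sqlite3.IntegrityError:
    mandatory_null_rejected = True

tel_of_e1 = conn.execute(
    "SELECT tel_no FROM employee WHERE emp_id = 'E-1'"
).fetchone()[0]
```

Note that the stored Null does not say which case applies: the employee may have no telephone, or the number may simply be unknown.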
RELATIONSHIP constraints
- Mapping Cardinalities
- Participation Constraint
Mapping Cardinalities. For a binary relationship set R between entity sets A and B,
the mapping cardinalities can be one of the following:-
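[Figures: relationship set R between entity sets A and B, drawn once for each mapping
cardinality: one-to-one, one-to-many, many-to-one and many-to-many.]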
Example:-
[Figures: relationship set DEPOSITOR between entity sets CUSTOMER and ACCOUNT, drawn in turn
as a One-to-One Relationship, a One-to-Many Relationship, a Many-to-One Relationship and a
Many-to-Many Relationship.]
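The four cardinalities can be told apart by counting how many partners each entity has. The sketch below classifies a set of (customer, account) pairs; the function name "observed_cardinality" is an assumption, and the pairs are taken from the DEPOSITOR tables later in this chapter. A mapping cardinality is a constraint on all possible instances, so one sample of data can only show which cardinality it is consistent with.

```python
from collections import Counter

def observed_cardinality(pairs):
    """Classify (a, b) pairs of a binary relationship between sets A and B."""
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    a_has_many = any(n > 1 for n in left.values())   # some A entity has several B partners
    b_has_many = any(n > 1 for n in right.values())  # some B entity has several A partners
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

one_to_one = {("C-001", "A-310"), ("C-220", "A-101"),
              ("C-310", "A-203"), ("C-505", "A-305")}
one_to_many = one_to_one | {("C-101", "A-550"), ("C-310", "A-670")}
many_to_one = {("C-001", "A-101"), ("C-220", "A-203"),
               ("C-310", "A-101"), ("C-505", "A-203")}
many_to_many = many_to_one | {("C-101", "A-305"), ("C-505", "A-310")}
```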
- Total Participation
- Partial Participation
Total Participation
An Entity Set E is said to have total participation in relationship set R if each entity in E
is participating at least in one relationship through R. In E-R Diagram, the Total
Participation is represented by a “Double Line” drawn between the Entity Set symbol and
the Relationship Set symbol.
Partial Participation
An Entity Set E is said to have partial participation in relationship set R if some of the
entities in E are not participating in any relationship through R. In E-R Diagram, the
Partial Participation is represented by a “Single Line” drawn between the Entity Set
symbol and the Relationship Set symbol.
[Figure: E-R Diagram illustrating Total Participation, with Relationship Set BORROWER and
Entity Set LOAN.]
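The two definitions can be checked mechanically on a small instance. In this sketch the function name "participation" and the customer and loan identifiers are assumptions made for illustration.

```python
def participation(entity_set, relationship_set, position):
    """Return 'total' if every entity occurs in some relationship, else 'partial'.

    position is the index of this entity set within each relationship tuple.
    """
    participating = {rel[position] for rel in relationship_set}
    return "total" if set(entity_set) <= participating else "partial"

customers = {"C-001", "C-220", "C-310"}
loans = {"L-11", "L-12"}
borrower = {("C-001", "L-11"), ("C-220", "L-12")}  # (customer, loan) pairs

loan_side = participation(loans, borrower, 1)          # every loan has a borrower
customer_side = participation(customers, borrower, 0)  # C-310 has taken no loan
```

Here LOAN participates totally in BORROWER, while CUSTOMER participates only partially.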
Concept of Key
Super Key. A Super Key of an Entity Set or Relationship Set refers to a set of
attributes which, when taken collectively, uniquely determine an entity within the
Entity Set or a Relationship within the Relationship Set. If K forms a Super Key (SK) of
an Entity Set E, then any superset of K will also be a Super Key of E. So, a Super Key
may have some extraneous (unnecessary) attributes; even if these are removed, the
remaining set may still form a Super Key of E.
Example:- Suppose each student in the Entity Set STUDENT (ROLL-NO, NAME,
BRANCH, FATHERS-NAME, ADDRESS, DOB, TEL-NO) has a unique value of
ROLL-NO. This implies that no two students can have the same ROLL-NO. Then
{ROLL-NO, NAME} forms a Super Key of Entity Set STUDENT. In this, the attribute NAME is
extraneous; if it is removed, the remaining set i.e. {ROLL-NO} still forms a Super Key of
STUDENT.
Candidate Key. A Super Key, no proper subset of which forms a Super Key, is called
a Candidate Key. Thus, a Candidate Key is a minimal Super Key (i.e. a Super Key having
no extraneous attributes). An Entity Set may have more than one Candidate Key.
Example:- The Entity Set STUDENT will have at least two Candidate Keys i.e.
{ROLL-NO} and {NAME, FATHERS-NAME, DOB, ADDRESS}, assuming that no two students
agree on all four of the latter attributes.
Primary Key. Primary Key is one of the Candidate Keys that is designated by the
database designers as primary means of identifying entities within an entity set. In the E-
R Diagram, the Primary Key Attributes are underlined with a firm line.
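The definitions of Super Key and Candidate Key can be tested on a sample of rows. The sketch below is an illustration under stated assumptions: the function names and the three sample students are made up, and a key is a constraint on every possible instance, so sample data can only refute a proposed key, never prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys of the sample, found by trying small attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains an already found (smaller) key.
            contains_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_key and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"ROLL-NO": 1, "NAME": "Asha", "BRANCH": "CSE"},
    {"ROLL-NO": 2, "NAME": "Ravi", "BRANCH": "CSE"},
    {"ROLL-NO": 3, "NAME": "Asha", "BRANCH": "ECE"},
]
attributes = ("ROLL-NO", "NAME", "BRANCH")
found = candidate_keys(students, attributes)
```

On this sample {ROLL-NO, NAME} is a Super Key but not a Candidate Key, because its proper subset {ROLL-NO} is already a Super Key.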
Let R be a binary relationship set between Entity Sets E1 and E2. Let K1 and K2 be the
respective Primary Keys of E1 and E2. Then the Primary Key of Relationship Set R will
depend upon the cardinality mapping of the relationship set, as explained below:-
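(i)
[Figure: Relationship Set DEPOSITOR between Entity Sets CUSTOMER (key CN) and ACCOUNT (key AN).]
(One-to-One Relationship)
PK (DEPOSITOR) = CN or AN
(ii)
[Figure: Relationship Set DEPOSITOR between Entity Sets CUSTOMER (key CN) and ACCOUNT (key AN).]
(One-to-One Relationship)
(iii)
[Figure: Relationship Set DEPOSITOR between Entity Sets CUSTOMER (key CN) and ACCOUNT (key AN).]
(One-to-One Relationship)
(iv)
[Figure: Relationship Set DEPOSITOR between Entity Sets CUSTOMER (key CN) and ACCOUNT (key AN).]
(One-to-One Relationship)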
An Entity Set is said to be a Weak Entity Set if it does not have sufficient attributes to
form its Primary Key. On the other hand, an entity set having a primary key of its own is
called a Strong Entity Set. A Weak Entity Set (say E2) depends on a Strong Entity Set
(say E1) both for its existence and to form its Candidate Key. Entity Set E2 is then said
to be “Existence-Dependent” on E1 and E1 is said to be the “Owner Entity Set” of E2. The
relationship R between E2 and E1 is called the “Identifying Relationship”. The Weak Entity
Set E2 will have a set of attributes called its “Discriminator”, which together with the
Primary Key of E1 will form the Primary Key of E2.
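[Figure: Weak Entity Set E2 linked to its Owner Entity Set E1.]
[Figure: Strong Entity Set EMPLOYEE and Weak Entity Set DEPENDENT; attributes shown in the
diagram: EMP-ID, DOB, SALARY.]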
- The Weak Entity Set DEPENDENT is Existence Dependent on the Strong Entity
Set EMPLOYEE.
(i) The Identifying Relationship will be one-to-many from Owner Entity Set to Weak
Entity Set.
(ii) The Participation of Owner Entity Set in the Identifying Relationship will be
partial and the participation of the Weak Entity Set in the Identifying Relationship
will be Total.
In the above example, the Weak Entity Set DEPENDENT can also be modeled as a
multi-valued attribute of Entity Set EMPLOYEE. The multi-valued attribute can be used
to indicate the names of the dependents of employees. But suppose we want to indicate
other parameters of dependents like dependent’s relationship with the employee then the
multi-valued approach will not be suitable. In this case, the Weak Entity approach will be
the ideal choice, since then the weak entity set DEPENDENT can have any number of
attributes.
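[Figure: Entity Set E with attributes A1 and A2, specialized through an ISA relationship into
Entity Sets E1, E2, ..., En; B1, C1 and C2 appear as further attributes in the diagram.]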
In the above example, an Entity Set E has been specialized into sub-groups designated as
E1, E2, ..., En. E is called the “Super Class” or “Higher Level Entity Set” and the entity
sets E1, E2, ..., En are called “Sub Classes” or “Lower Level Entity Sets” of E. The
attributes common to all the sub entity sets are represented with the super entity set,
and the distinct attributes of each sub entity set are represented with that sub entity set.
The relationship of Higher Level Entity Set with its Lower Level Entity Sets is called
ISA relationship. It is read as “is a”.
Each Sub Class will inherit the Attributes of its Super Class; plus it will have its own
distinct Attributes. Like in the above case, each lower entity set will inherit attributes A1
and A2 of the Super Class E.
[Figure: Entity Set ACCOUNT (Account-Number, Balance) specialized through ISA into
SAVINGS-ACCOUNT (Interest-Rate), CURRENT-ACCOUNT (Over-Draft), FD and RD; the attributes
Int-Rate, Mat-Date and Installment are shown with FD and RD.]
Specialization Constraints
Disjoint. It implies that an entity does not belong to more than one lower-
level entity set i.e. an account is either savings-account or current-account but not
both.
Total. Each higher-level entity must belong to a lower-level entity set.
Partial. Some higher-level entities may not belong to any lower-level entity set.
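[Figure: Aggregation. Relationship Set R1 between Entity Set E1 (attributes A1, A2) and Entity
Set E2 (attributes B1, B2) is enclosed as a Higher Level Entity Set, which participates in
Relationship Set R2 with Entity Set E3 (attributes C1, C2, C3).]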
Here, the Relationship Set R1 between Entity Set E1 and Entity Set E2 has been
aggregated as Higher Level Entity Set “R1”. This Higher Level Entity Set is participating
in a Relationship R2 with Entity Set E3. Thus, through aggregation, we are able to
represent a Relationship between Relationship Set R1 and Entity Set E3.
If we represent this scenario without use of aggregation, then the E-R Diagram will be as
follows:-
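[Figure: E-R Diagram without aggregation. Relationship Set EBJ links Entity Sets EMPLOYEE,
BRANCH and JOB; Relationship Sets EM, BM and JM link them to Entity Set MANAGER.]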
The above scenario can be better modeled by aggregating the Relationship Set “EBJ” as a
higher-level Entity Set and then creating a relationship between this higher-level entity
set and the Entity Set “MANAGER”, as indicated below:-
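[Figure: E-R Diagram with aggregation. Relationship Set EBJ (linking EMPLOYEE, BRANCH and JOB)
is aggregated and linked to Entity Set MANAGER through Relationship Set EBJM.]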
This modeling represents the situation more realistically, wherein the Relationship Set
“EBJM” indicates “which combinations of employee-branch-job” are being managed by
each manager.
(a) Tabular representation of a Strong Entity Set. A Strong Entity Set E will be
represented by a Table named “E”. The Table will have columns as follows:-
Let E be a Strong Entity Set with simple single-valued attributes a1, a2, ..., an. This
Entity Set will be represented by a Table called E with n distinct columns, each of which
will correspond to one of the attributes. Let D1, D2, ..., Dn be the domains of attributes
a1, a2, ..., an respectively. The Table E will consist of a set of rows, which will be a
subset of the Cartesian Product D1 × D2 × ... × Dn.
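The statement that a table is a subset of the Cartesian Product of its attribute domains can be sketched with two very small domains. The domain contents and the variable names below are assumptions made for illustration; real domains are usually far too large to enumerate.

```python
from itertools import product

# Two small domains: D1 for a roll number attribute, D2 for a name attribute.
D1 = {1, 2}
D2 = {"Asha", "Ravi"}

# The Cartesian Product D1 x D2 holds every combination that could ever be a row.
all_possible_rows = set(product(D1, D2))

# The table E holds only the rows that actually describe entities.
table_E = {(1, "Asha"), (2, "Ravi")}

is_subset = table_E <= all_possible_rows
```

With two values in each domain there are four possible rows, of which the table holds two.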
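Example
[Figure: Entity Set STUDENT with attributes Univ_Roll_No, Name, DOB, derived attribute Age,
multi-valued attribute Tel_No and composite attribute Address (H-No, Street, City, Pin).]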
The derived attribute Age will not be represented in the STUDENT table. When required,
its value will be derived from DOB.
STUDENT
Univ_Roll_No Name DOB H-No Street City Pin
STUDENT-TEL-NO
Univ_Roll_No Tel_No
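How the derived attribute Age and the separate STUDENT-TEL-NO table are used can be sketched as follows. The function names, the sample student and the telephone numbers are assumptions made for illustration.

```python
from datetime import date

def age_on(dob, today):
    """Derive Age from DOB instead of storing it."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

student = {"Univ_Roll_No": 101, "Name": "Asha", "DOB": date(2000, 6, 15)}

# STUDENT-TEL-NO: one row per telephone number of a student.
student_tel_no = [(101, "011-5550101"), (101, "011-5550102")]

def tel_nos(roll_no):
    return [tel for r, tel in student_tel_no if r == roll_no]
```

A stored Age would become wrong on every birthday, whereas the derived value is always consistent with DOB; a student with several telephone numbers simply has several rows in STUDENT-TEL-NO.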
Example
[Figure: Entity Sets CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) linked by Relationship Set DEPOSITOR.]
CUSTOMER
C-Id C-Name C-address
ACCOUNT
Account-Number Balance Branch-Name
DEPOSITOR
C-Id Account-Number Date-of-Operation
Example:-
[Figure: Entity Sets CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) linked by Relationship Set DEPOSITOR with attribute Date-of-Operation.]
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007
As is evident, the rows in the DEPOSITOR table have a one-to-one mapping with the rows
in the CUSTOMER Table and also with the rows in the ACCOUNT Table. That is, the
first row of DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of
DEPOSITOR maps onto the first row of ACCOUNT, the third row of DEPOSITOR
maps onto the second row of ACCOUNT and the last row of DEPOSITOR maps onto the
third row of ACCOUNT. Thus, the descriptive attribute Date-of-Operation of the
Relationship Set DEPOSITOR can be shifted to either CUSTOMER or ACCOUNT.
Also, the DEPOSITOR Table can be combined either with the CUSTOMER Table or
with the ACCOUNT Table, without losing any information. The combined table will
have the union of the columns of the two merged tables. Suppose the DEPOSITOR Table is
merged with the CUSTOMER Table; then the CUSTOMER Table will also include the
attributes Account-Number and Date-of-Operation. The resulting set of tables will then
be:-
CUSTOMER
C-Id C-Name C-address Account-Number Date-of-Operation
C-001 Ajay 320, Sector-26, Noida A-310 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-101 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-203 03-Feb-2007
C-505 Shyam 303,Sector-22,RKP A-305 27-Dec-2007
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
The combined CUSTOMER Table now includes the Primary Key (AN) of ACCOUNT
and descriptive attribute Date_Of_Operation of DEPOSITOR.
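That the merge loses no information can be checked with a short sketch: DEPOSITOR is folded into CUSTOMER and then recovered from the combined table. The rows are those of the example above, with the address column left out for brevity; the variable names are assumptions made for illustration.

```python
customer = [("C-001", "Ajay"), ("C-220", "Vijay"), ("C-310", "Ram"), ("C-505", "Shyam")]
depositor = [
    ("C-001", "A-310", "10-Jan-2007"),
    ("C-220", "A-101", "23-Dec-2006"),
    ("C-310", "A-203", "03-Feb-2007"),
    ("C-505", "A-305", "27-Dec-2007"),
]

# One-to-one: every customer has exactly one DEPOSITOR row, so C-Id identifies it.
by_customer = {c_id: (acc_no, doo) for c_id, acc_no, doo in depositor}

# Combined CUSTOMER table: its own columns plus Account-Number and Date-of-Operation.
merged_customer = [(c_id, name) + by_customer[c_id] for c_id, name in customer]

# Projecting the combined table gives back the original DEPOSITOR rows.
recovered_depositor = [(c_id, acc_no, doo) for c_id, _, acc_no, doo in merged_customer]
```

The same construction would drop rows under a many-to-many mapping, because a dictionary keyed on C-Id can hold only one account per customer.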
Example
[Figure: Entity Sets CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) linked by Relationship Set DEPOSITOR with attribute Date-of-Operation.]
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
A-550 35000 CP
A-670 60000 Sec-18
DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007
C-101 A-550 22-Dec-2006
C-310 A-670 01-Jan-2007
The rows in the DEPOSITOR table have one-to-one mapping onto the rows in
ACCOUNT Table i.e. with the “Many-Side Entity Set” Table. That is, the first row of
DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of DEPOSITOR
maps onto the first row of ACCOUNT, the third row of DEPOSITOR maps onto the
second row of ACCOUNT, the fourth row of DEPOSITOR maps onto the third row of
ACCOUNT, the fifth row of DEPOSITOR maps onto the fifth row of ACCOUNT and
the last row of DEPOSITOR maps onto the last row of ACCOUNT table. Thus, the
descriptive attributes of DEPOSITOR can be shifted to the “Many-Side” Entity Set ACCOUNT
and the DEPOSITOR Table can be merged with the ACCOUNT Table, without losing any
information. The resulting set of tables will then be:-
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP
ACCOUNT
Account-Number Balance Branch-Name Customer_Id Date_of_Operation
A-101 10000 Sec-18 C-220 23-Dec-2006
A-203 30000 Sec-26 C-310 03-Feb-2007
A-305 50000 CP C-505 27-Dec-2007
A-310 25000 RKP C-001 10-Jan-2007
A-550 35000 CP C-101 22-Dec-2006
A-670 60000 Sec-18 C-310 01-Jan-2007
Example
[Figure: Entity Sets CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) linked by Relationship Set DEPOSITOR with attribute Date-of-Operation.]
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
The rows in the DEPOSITOR table have one-to-one mapping onto the rows in
CUSTOMER Table i.e. with the “Many-Side Entity Set” Table. Thus, the descriptive
attributes of DEPOSITOR can be shifted to the “Many-Side” Entity Set CUSTOMER and
the DEPOSITOR Table can be merged with the CUSTOMER Table, without losing any
information. The resultant CUSTOMER table will also include the Primary Key
Account_Number of the ACCOUNT table and the descriptive attribute Date-of-Operation
(DOO) of the DEPOSITOR table. The resulting set of tables will then be:-
CUSTOMER
C-Id C-Name C-address Account_Number DOO
C-001 Ajay 320, Sector-26, Noida A-101 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-203 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-101 03-Feb-2007
C-505 Shyam 303,Sector-22, RKP A-203 27-Dec-2007
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
Example
[Figure: Entity Sets CUSTOMER (C-Id, C-Name, C-Address) and ACCOUNT (Account-No, Balance,
Branch-Name) linked by Relationship Set DEPOSITOR with attribute Date-of-Operation.]
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP
ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
C-101 A-305 30-Dec-2007
C-505 A-310 02-Jan-2007
Now, the rows in the DEPOSITOR table do not have a one-to-one mapping with either the
CUSTOMER table or the ACCOUNT table. So, the DEPOSITOR table can be merged
neither with the CUSTOMER table nor with the ACCOUNT table. Thus, there has to
be a separate table for DEPOSITOR as indicated above. Also, the descriptive attributes of
the Relationship Set cannot be shifted to the participating Entity Sets; the descriptive
attributes have to remain with the relationship set itself.
(c) Tabular representation of Weak Entity Sets. Let A be a Weak Entity Set
with descriptive Attributes a1,a2,……,am. Let B be the Strong Entity Set on which A is
existence dependent. Let the primary key of B consist of attributes b1,b2,….bn. The
Entity Set A is represented by a Table called A with (m+n) columns, each column
representing one of the attributes from the set {a1,a2,……am} U {b1,b2,….bn}.
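Example:
[Figure: Strong Entity Set LOAN and Weak Entity Set PAYMENT; attributes shown in the diagram:
Loan-No, Amount, Payment-No, Payment-Date, Installment.]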
There will be Tables LOAN and PAYMENT; the PAYMENT table will also include the
Primary Key of Loan i.e. Loan-No. The Primary Key of table PAYMENT will be
{Loan-No, Payment-No} where the attribute Payment-No is called a “Discriminator” or
“Partial Key” of the table PAYMENT.
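The LOAN and PAYMENT tables can be sketched with Python's built-in sqlite3 module. Column names use underscores because a hyphen is not valid in an unquoted SQL identifier; the sample rows are made up, and the ON DELETE CASCADE clause is one possible way (an assumption here) to express existence dependence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE LOAN (Loan_No TEXT PRIMARY KEY, Amount INTEGER);
CREATE TABLE PAYMENT (
    Loan_No      TEXT    NOT NULL REFERENCES LOAN(Loan_No) ON DELETE CASCADE,
    Payment_No   INTEGER NOT NULL,   -- discriminator: unique only within one loan
    Payment_Date TEXT,
    Installment  INTEGER,
    PRIMARY KEY (Loan_No, Payment_No)
);
INSERT INTO LOAN VALUES ('L-1', 50000), ('L-2', 80000);
INSERT INTO PAYMENT VALUES ('L-1', 1, '2007-01-10', 5000),
                           ('L-1', 2, '2007-02-10', 5000),
                           ('L-2', 1, '2007-01-15', 8000);
""")

# Payment_No 1 occurs under both loans, so it cannot be a key on its own.
payments_numbered_1 = conn.execute(
    "SELECT COUNT(*) FROM PAYMENT WHERE Payment_No = 1"
).fetchone()[0]

# Existence dependence: deleting a loan removes its payments as well.
conn.execute("DELETE FROM LOAN WHERE Loan_No = 'L-1'")
remaining = conn.execute("SELECT Loan_No, Payment_No FROM PAYMENT").fetchall()
```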
Create a Table each for the higher-level entity set and for each lower-level entity
set. The table for lower-level entity set will include its own attributes plus all the
Primary-Key attributes of its higher-level entity set.
[Figure: Entity Set ACCOUNT (Account-Number, Balance) specialized through ISA into
SAVINGS-ACCOUNT (Interest-Rate), CURRENT-ACCOUNT (Over-Draft), FD and RD; the attributes
Int-Rate, Mat-Date and Installment are shown with FD and RD.]
For example, in the above case there will be five tables i.e. ACCOUNT,
SAVINGS-ACCOUNT, CURRENT-ACCOUNT, FD and RD. The table ACCOUNT will have
columns Account-Number and Balance; the table SAVINGS-ACCOUNT will have
columns Account-Number and Interest-Rate; and the table CURRENT-ACCOUNT will have
columns Account-Number and Over-Draft. The same is applicable to the tables FD and RD.
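[Figure: Entity Sets EMPLOYEE (E#, E-Name), BRANCH (B#, B-Name) and JOB (J#) linked by
Relationship Set EBJ; the aggregated EBJ is linked to Entity Set MANAGER (Mgr-Id) through
Relationship Set EBJM.]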
In the above scenario, there will be tables for Entity Sets EMPLOYEE, BRANCH, JOB
and MANAGER. There will be one table for Relationship Set EBJM having Attributes
E#, B#, J# and Mgr-Id. No table is required for the Relationship Set EBJ because this
table would be a subset of table EBJM.
EBJM
E# B# J# Mgr-Id
[E-R diagram; surviving labels: AIRCRAFT, CREW, FLT_CREW, RESERVATION, CANCELLATION, TICKET, REFUND, FARE, PASSENGER, FLT_NO, TO_PLACE, ETA, ATA, DESIGNATION, C_DATE, CONFIRMED, SEAT_NO, TICKET_NO, ISSUE_DATE, AMOUNT, VOUCHER_NO, P_NAME, P_ADDRESS, P_TEL_NO]
The above E-R Diagram can be reduced to the following set of tables:-
The SEAT_NO will get defined only after a passenger checks in for a flight.
[E-R diagram; surviving labels: OWNER, VEHICLE, INSURANCE_POLICY, ACCIDENT, CLAIM_PAYMENT, SURVEYOR REPORT, REPAIRS, O-NAME, O_ADDRESS, O_TEL_NO, REG_NO, MAKE, MODEL, COLOR, POLICY_NO, PREMIUM, EXPIRY_DATE, BONUS, A_REPORT_NO, A_DATE, PLACE, PAYMENT_VOUCHER_NO, P_DATE, P_AMOUNT, S_REPORT_NO, ASSESSED_DAMAGE, REF_NO, REPAIR_ITEM, COST]
Exercises
Ex.2.3 Explain the concept of Super Keys, Candidate Keys and Primary Keys of
an Entity Set. Explain the determination of Primary Key of a binary Relationship set.
How is it influenced by the Cardinality Mapping of the Entity Sets participating in the
Relationship Set?
Ex.2.4 Explain the concept of Weak Entity Sets and Identifying Relationships.
Explain how a multi-valued attribute can be better modeled as a Weak Entity Set.
Ex.2.5 Explain the concepts of Specialization and Generalization. What are the
different types of constraints involved in specialization? Distinguish between Total &
Partial Specialization and between Disjoint & Overlapping Specialization.
Ex.2.7 Draw E-R diagrams to indicate the following relationships between entity set
Operator and entity set Machine:-
(a) Each Machine can be operated by many Operators but each Operator can
operate only one machine.
(b) An Operator can operate many Machines and each Machine can be operated
by many Operators.
Ex.2.7 Make E-R Diagrams for the following real-world situations (Indicate clearly
entity sets, relationship sets, cardinalities, attributes and candidate Keys. Also indicate
any weak entity sets, specialization, generalization and aggregation etc.) Also reduce the
E-R diagrams to Tables:-
LECTURE NOTES
UNIT - 2
CHAPTER 3
Relation Schema
A Relation Schema refers to the structure of a Table. It indicates the Name of the
Relation Schema and the Names of its Attributes (represented by Columns in the
table). For example, R (A1, A2, …, An) represents a Relation Schema named
“R” having n columns, representing the Attributes A1, A2, …, An.
- One value that is member of all domains is null, which implies that the
value is either unknown or non-existent; for example suppose an entity has
attribute telephone-number. It may have null value if an entity does not have a
telephone number or if the entity has telephone number, but number is not
known.
Let r(R) = {t1, t2 , ……, tm} where t1, t2 , ……, tm are the tuples in the relation.
The n-values in a tuple will represent an entity or a relationship. Thus, all the n
values in the tuple will be related to each other. That is why the resulting
database is called a “relational database”.
Note that a tuple is an Ordered List (not a Set) of n values. That is why the
order of the values in the tuple matters. A value vi (1 ≤ i ≤ n) in the tuple t will be
from the Domain of attribute Ai.
A Relation can also be defined as a subset of the Cartesian Product of all the
Domains of its Attributes,
i.e. r(R) ⊆ Domain(A1) × Domain(A2) × … × Domain(An)
There can be more than one Relation (Table) defined on the same Schema.
For example, r1(R) and r2(R) are two independent relations defined on the same
Schema R. Both will have the same Degree but may have different Cardinalities.
The Cardinality of a relation will vary from moment to moment, but its Degree will
change very rarely.
Some Notations
Since a relation is defined as a set of tuples, the notation t ∈ r implies that tuple t
is in relation r.
The Super Key of a Relation Schema R refers to a set of attributes K (K ⊆ R),
which, when taken collectively, will uniquely identify each tuple in a relation r(R).
Any superset of a super-key will also be a super-key. Thus, a super-key may
have some extraneous attributes, without which the remaining set is still a
super-key. Such extraneous attributes, which need not be there in a super-key,
can be eliminated from the key.
In the above schemas, the relation schema “STUDENT” has the following
Candidate Keys:-
{Univ_Roll_No}
{Class_Roll_No, Year, Branch, Section}
{S_Name, DOB, Father_Name, Address} (assuming that twins will not
have the same name)
And Relation Schema “SUBJECT” has {Sub_Code} as its Candidate Key and
Relation Schema “RESULT” has {Roll_No, Sub_Code} as its candidate key.
Since the other two relation schemas have only one candidate key each,
the same will be designated as their respective Primary Keys.
Composite Key A Key that includes more than one attribute is called a
Composite Key.
Foreign Key A relation schema R may include among its attributes some primary
key of another schema S. This is called a Foreign Key in schema R.
The following three types of Integrity Constraints are applicable to all Relational
Databases:-
value clause or by “NOT NULL” clause. Various data types are NUMBER,
CHAR, VARCHAR, DATE etc.
For each attribute, the permitted domain is specified in the schema definition;
for example the domain of “Year” is the set of Integers.
Suppose a relation r(R) with Key {A, B} has the following state:-
r(R)
A B C
a1 b1 c3
a2 b1 c1
a3 b2 c5
a1 b2 c7
a3 b2 c1
This relation state is invalid, since there are two tuples with the value of {A, B}
equal to {a3, b2}. It becomes a valid relation only after one of these two tuples
is deleted.
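The key check described above can be sketched in Python. This is only an illustration: the function name key_violations and the encoding of tuples as Python tuples are assumptions, not part of the notes.

```python
from collections import Counter

# Relation state from the example above; each tuple is (A, B, C).
r = [
    ("a1", "b1", "c3"),
    ("a2", "b1", "c1"),
    ("a3", "b2", "c5"),
    ("a1", "b2", "c7"),
    ("a3", "b2", "c1"),
]

def key_violations(relation, key_positions):
    """Return the key values that occur in more than one tuple."""
    counts = Counter(tuple(t[i] for i in key_positions) for t in relation)
    return [key for key, n in counts.items() if n > 1]

# Key {A, B} corresponds to positions 0 and 1 of each tuple.
violations = key_violations(r, (0, 1))
print(violations)
```

Dropping either of the two offending tuples leaves a state in which key_violations returns an empty list.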
Let r(R) and s(S) be two relations and let α (α ⊆ S) be a Foreign Key
(FK) in S referencing Primary Key “K” of R. The relation s(S) would be
legal (valid) only if for each tuple ts ∈ s, there exists a tuple tr ∈ r such
that tr[K] = ts[α]. A tuple ts ∈ s that does not satisfy this condition is
called a “Dangling Tuple”, in the sense that such a tuple does not
have the necessary support from relation r. Such Dangling Tuples are
invalid, and the DBMS has to ensure that such tuples do not exist in the
database.
subject
Sub_Code Title
TCS-401 Comp Org
TCS-402 DBMS
TCS-301 Data Structure
result
Roll_No Sub_Code Semester Marks
0209130010 TCS-301 3 66
0209130010 TCS-401 4 57
0209130010 TCS-403 4 78
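Dangling tuples in the two tables above can be found with a short Python sketch. It assumes that Sub_Code in result is a foreign key referencing Sub_Code of subject; the helper name dangling_tuples is hypothetical.

```python
# Referenced relation "subject": Sub_Code -> Title.
subject = {
    "TCS-401": "Comp Org",
    "TCS-402": "DBMS",
    "TCS-301": "Data Structure",
}

# Referencing relation "result": (Roll_No, Sub_Code, Semester, Marks).
result = [
    ("0209130010", "TCS-301", 3, 66),
    ("0209130010", "TCS-401", 4, 57),
    ("0209130010", "TCS-403", 4, 78),
]

def dangling_tuples(referencing, fk_position, referenced_keys):
    """Return tuples whose foreign-key value has no match in the referenced keys."""
    return [t for t in referencing if t[fk_position] not in referenced_keys]

# Sub_Code is at position 1 of each result tuple.
dangling = dangling_tuples(result, 1, subject.keys())
print(dangling)
```

Under that assumption, the tuple for TCS-403 is the only one without support in subject.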
1. Strong Entity Set. The Primary key of the entity set forms the primary
key of the relation.
2. Weak Entity Set. Primary key of the relation comprises the union of the
primary key of the strong entity set on which the weak entity set is existence-
dependent and the discriminator of the weak entity set.
3. Relationship Set. The union of the primary keys of the related entity
sets becomes a super key of the relation. If the relationship is many to many,
then this super key is also the primary key of the relation. If the relationship is
many-to-one, then primary key of the “many-side” entity set is the primary key of
the relation. If the relationship is one-to-one, then primary key of any of the
related entity sets can be the primary key of the relation.
“Many-Side” Entity Set. For example, the table for ACCOUNT also contains the
key of Entity Set BRANCH.
For a one-to-one relationship set, the relationship table can be combined with the
table of any of the participating entity sets.
Entity Sets:-
Relationship Sets:-
[E-R diagram; surviving labels: STUDENT, SUBJECT, RESULT, SUB-OFFERED, FACULTY, DEPT, FAC-DEPT, Fac_Code, Fac_Name, Dept_Name, HOD]
Relational Database:-
Fac_Code
Semester S_Addr
Marks
Dept_Name HOD
Fac_Code
Assignment: 2
CHAPTER 4
RELATIONAL ALGEBRA
Basic Operations
1. Select (σ)
2. Project (π)
3. Set Union (∪)
4. Set Difference (-)
5. Cartesian Product (X)
6. Rename (ρ)
Additional Operations
Extended RA Operations
1. Select (σ) The Select operation σ_P (r) selects those tuples from relation
r which satisfy predicate P.
Emp
E# E_Name E_City E_Street Salary D#
Dept
D# D_Name D_City Total_Sal
E# E_City E_Street
Result:-
D# D_Name
03 Finance
(a) Both r and s must be of same degree i.e. they must have same
number of attributes.
(b) For all i, the domain of ith attribute of r must be same as the domain of
the ith attribute of s.
Deposit
Cust_Name Account_No
Ajay A-101
Vijay A-103
Ram A-107
Loan
Cust_Name Loan_No
Vishal L-103
Ram L-102
Query 4: Get the names of those customers, who have either account
or loan in the bank.
Result:-
Cust_Name
Ajay
Vijay
Ram
Vishal
Query 5: Get the names of those customers who have account in the bank,
but do not have a loan.
Result:-
Cust_Name
Ajay
Vijay
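Queries 4 and 5 can be mirrored with Python sets. This is a sketch under assumptions: relations are modelled as sets of tuples, and the helper project stands in for the Project operation.

```python
# Relations from the tables above, modelled as sets of tuples.
deposit = {("Ajay", "A-101"), ("Vijay", "A-103"), ("Ram", "A-107")}
loan = {("Vishal", "L-103"), ("Ram", "L-102")}

def project(relation, *positions):
    """Projection: keep the chosen positions; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

depositors = project(deposit, 0)  # Cust_Name of Deposit
borrowers = project(loan, 0)      # Cust_Name of Loan

query4 = depositors | borrowers   # union: account or loan
query5 = depositors - borrowers   # difference: account but no loan
print(sorted(query4), sorted(query5))
```

Both operands have the same degree and the same attribute domain, so the two conditions stated for union and difference are met.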
The resultant relation will be on a schema that is the concatenation of the two
schemas R and S, expressed as (R, S).
For each tuple tr ∈ r and each tuple ts ∈ s, there will be a tuple t in r x s, such that
t [R] = tr and t [S] = ts.
Query 6:
Deposit X Loan
If the names of the argument relations are not distinct (which is the case when
Cartesian Product of a relation with itself is specified), rename operation, as
explained below, is used to rename one of the arguments.
Query 7:
Result:-
Cust_Name Account_No CN LN
Result:-
Salary
50000
(a) E1 ∪ E2
(b) E1 – E2
(c) E1 x E2
(d) σ_P (E1) where P is a predicate on the attributes in E1.
(e) π_S (E1) where S is a list consisting of some of the attributes in E1.
(f) ρ_x (E1) where x is the new name for the result of E1.
The following are additional operations, which do not add any power to the
relational algebra, but simplify common queries.
This operation is called additional, since it can be expressed in terms of the basic
operations, as shown below:-
r ∩ s = r – (r – s) i.e. eliminate from r those tuples which exist in r but not in s.
OR
r ∩ s = s – (s – r) i.e. eliminate from s those tuples which exist in s but not in r.
Query 9: Get the names of those customers who have account as well as
loan in the bank.
Result:-
Cust_Name
Ram
Query 10: Get Cust_Name, Account_No and Loan_No of the customers having
account as well as loan.
Deposit * Loan
Just note, the expression has become extremely user friendly with the use of
“Natural Join” operation.
The result, of the expression on the right, is assigned to the variable on the left.
Division (÷) Let r(R) and s(S) be two relations and let S ⊆ R, that is, every
attribute in schema S is also there in schema R. The relation obtained by dividing
relation ‘r’ by relation ‘s’, i.e. r ÷ s, is a relation on schema R-S (i.e. the schema
containing those attributes of R which are not there in S).
A tuple t will appear in r ÷ s if and only if the following two conditions are
satisfied.
1. t ∈ π_R-S (r)
2. For every tuple ts ∈ s, there is a tuple tr ∈ r satisfying both of the following:
(a) tr[S] = ts
(b) tr[R-S] = t
Let r(R) and s(S) be two relations such that schema S ⊆ R. Then, the DIVIDE
operation r ÷ s is defined as:-
r ÷ s = π_R-S (r) – π_R-S ((π_R-S (r) x s) – π_R-S,S (r))
Let r (R) =
A B C D
a1 b1 c1 d1
a2 b2 c2 d2
a1 b2 c1 d2
And s (S) =
B D
b1 d1
b2 d2
temp1 ← π_R-S (r)
A C
a1 c1
a2 c2
(temp1 x s)
A C B D
a1 c1 b1 d1
a1 c1 b2 d2
a2 c2 b1 d1
a2 c2 b2 d2
π_R-S,S (r)
A C B D
a1 c1 b1 d1
a2 c2 b2 d2
a1 c1 b2 d2
temp2 ← (temp1 x s) – π_R-S,S (r)
A C B D
a2 c2 b1 d1
π_R-S (temp2)
A C
a2 c2
r ÷ s = temp1 – π_R-S (temp2)
A C
a1 c1
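The worked division above can be retraced step by step in Python. This is a minimal sketch: relations are modelled as sets of tuples, and the helper project and the variable names temp1, temp2 and quotient are chosen for illustration.

```python
from itertools import product

# r is on schema (A, B, C, D); s is on schema (B, D).
r = {("a1", "b1", "c1", "d1"), ("a2", "b2", "c2", "d2"), ("a1", "b2", "c1", "d2")}
s = {("b1", "d1"), ("b2", "d2")}

def project(relation, positions):
    """Projection onto the given tuple positions, in the given order."""
    return {tuple(t[i] for i in positions) for t in relation}

temp1 = project(r, (0, 2))                               # attributes (A, C), i.e. R-S
candidates = {ac + bd for ac, bd in product(temp1, s)}   # temp1 x s on (A, C, B, D)
reordered = project(r, (0, 2, 1, 3))                     # r rearranged as (A, C, B, D)
temp2 = candidates - reordered                           # pairings that r does not contain
quotient = temp1 - project(temp2, (0, 1))                # r divided by s
print(quotient)
```

Only (a1, c1) is paired in r with every tuple of s, so it is the only tuple that survives.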
Example: Find the Names of all customers, who have accounts in all
branches of Delhi.
Generalized Projection
It permits arithmetic functions to be used in the projection list.
Ex. π customer-name, limit – credit (credit-info), where credit-info is a relation
on the schema (customer-name, limit, credit)
Outer Join In Natural Join, the resultant output relation contains tuples
corresponding to only those tuples of input relations which satisfy the equality
criteria on the values of their common attributes. The information pertaining to
the other tuples of the input relations does not appear in the output relation. The Outer
Join operation makes it possible to include such tuples also. There are three types of Outer
Join: Left Outer Join, Right Outer Join and Full Outer Join. Tuples that have no
matching tuple in the other relation are padded with NULL values in the missing attributes.
The symbols for the three outer joins are: Left Outer Join ⟕, Right Outer Join ⟖
and Full Outer Join ⟗.
Ex.
Relation customer_residence
Relation bank_account
Natural Join
customer_residence bank_account
Customer_residence bank_account
customer_residence bank_account
Aggregate Functions
Aggregate Functions are the functions which take a collection of values as input
and return a single value as result; like sum, count, avg, min, max.
Ex. G SUM (amount) (loan) – computes the total of all loan amounts
G MAX (amount) (loan) - determines the max amongst loan amounts
Grouping
The following query will compute the total and max of loan amounts at
each branch and list the results branch-wise.
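Grouping with aggregate functions can be sketched in Python as follows. The loan tuples and the helper name group_aggregate are hypothetical; only the idea of computing the total and the maximum per branch comes from the text.

```python
from collections import defaultdict

# Hypothetical loan relation: (loan-number, branch-name, amount).
loan = [
    ("L-11", "Noida", 9000),
    ("L-14", "Noida", 15000),
    ("L-15", "Delhi", 15000),
    ("L-16", "Delhi", 13000),
    ("L-17", "Agra", 20000),
]

def group_aggregate(relation, group_pos, value_pos):
    """Group tuples on one attribute and return (sum, max) of another per group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_pos]].append(t[value_pos])
    return {g: (sum(values), max(values)) for g, values in groups.items()}

per_branch = group_aggregate(loan, 1, 2)
for branch, (total, largest) in sorted(per_branch.items()):
    print(branch, total, largest)
```

Without the grouping attribute, the same functions applied to all amounts would give the single-valued results of the SUM and MAX examples.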
DATABASE MODIFICATION
Deletion
In relational algebra, it is expressed as r ← r – E where r is a relation and E is
a relational-algebra expression.
Insertion
It is expressed as r ← r ∪ E
Insert a new account for all loan holders of Noida branch, with account-number
same as loan-number and an initial balance of 1000.
Ex. account ← π branch-name, account-number, balance ← balance*1.06 (σ balance > 10000 (account))
∪ π branch-name, account-number, balance ← balance*1.05 (σ balance ≤ 10000 (account))
Let r1(R1) and r2(R2) be two relations, with K as the primary key of R1 and α as a
Foreign Key in R2 referencing K in R1.
Insert If a tuple t2 is inserted in r2, the system must ensure that there exists a
tuple t1 ∈ r1 such that t1[K] = t2[α], that is, t2[α] ∈ π_K (r1).
Delete If a tuple t1 is deleted from r1, the system must compute the set of
tuples in r2 that reference t1, that is, the set S = σ_α=t1[K] (r2).
If the set S is not empty, then either the Delete Command should be
rejected as an error or all tuples that reference t1 (directly or indirectly) must also
be deleted. This may result in a cascading delete, since tuples in other
relations may reference tuples in r2 that in turn reference t1 ∈ r1.
Update
Case I If a tuple t2 is updated in r2 such that the update affects the attribute set
α, and t2’ is the modified tuple, the system must ensure that there is a tuple t1 ∈ r1
such that t1[K] = t2’[α], i.e. t2’[α] ∈ π_K (r1) must be satisfied.
Case II If a tuple t1 is modified in r1 such that the update affects the primary
key attributes K, the system must compute the set of tuples in r2 that reference t1,
that is, the set S = σ_α=t1[K] (r2).
If this set S is not empty, then either the update command should be rejected as
an error or all tuples that reference t1 (directly or indirectly) must also be
updated. This may result in a cascading update, since tuples in other relations
may reference tuples that reference t1.
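The three cases can be observed with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the DBMS, the table and column names r1, r2, k and alpha are illustrative, and the cascading alternative (rather than rejection) is chosen for Delete and Update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE r1 (k TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE r2 (id INTEGER PRIMARY KEY, "
    "alpha TEXT REFERENCES r1 (k) ON DELETE CASCADE ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO r1 VALUES ('k1'), ('k2')")
conn.execute("INSERT INTO r2 VALUES (10, 'k1'), (11, 'k1'), (12, 'k2')")

# Insert: a tuple of r2 whose alpha value is absent from r1 is rejected.
try:
    conn.execute("INSERT INTO r2 VALUES (13, 'k99')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete: removing 'k1' from r1 also removes the r2 tuples that reference it.
conn.execute("DELETE FROM r1 WHERE k = 'k1'")
after_delete = conn.execute("SELECT id, alpha FROM r2 ORDER BY id").fetchall()

# Update (Case II): changing a key value in r1 propagates to referencing tuples.
conn.execute("UPDATE r1 SET k = 'k20' WHERE k = 'k2'")
after_update = conn.execute("SELECT id, alpha FROM r2 ORDER BY id").fetchall()
print(insert_rejected, after_delete, after_update)
```

Omitting the two CASCADE clauses selects the other alternative described above: the delete and the update are then rejected as errors while referencing tuples exist.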
VIEWS
The entire logical schema of a database is not visible to each and every user of
the database. Security considerations may dictate that certain data be hidden
from certain users. Besides security reasons, the designers may wish to
create a user-friendly set of relations, customized to the specific requirements of
different categories of users.
Any relation, that does not form part of the logical schema, but is made visible to
a set of users, as a virtual relation, is called a view. The view is not stored as a
physical table in the database. Only a definition of the view is stored in the data
dictionary.
Once a view has been defined, it can be referenced in queries just like a
physical table.
[View dependency graph with nodes noida-customer, all-customer, borrower and loan]
One View may be used in the expression defining another View. A View relation
v1 is said to depend directly on a View relation v2 if v2 is used directly in the
expression defining v1. As shown in the above View Dependency graph, all-
customer is directly dependent on borrower and loan. A View relation v1 is said
to depend on other View relation v2, if a path exists in the View Dependency
Graph from v2 to v1. A View relation v is said to be recursive if it depends on
itself.
Assignment: 3
(v) Get Average Marks, Max Marks, Min Marks and Total Marks in
the Result.
(viii) Get the name(s) of the students scoring highest Average Marks.
(ix) Get the title(s) of the subjects in which students have scored
highest average marks.
(xi) Delete the tuples of result having marks less than 30.
(ii) Get the names of the employees drawing salary between 20000 and
50000.
(iii) Get Min Salary, Max Salary and Average Salary of each
department.
(v) Get the name of each employee along with its manager.
(vii) Get the name(s) of the department with total salary of employees
more than 50000000.
(ix) Get the names of the employees getting salary more than the
average salary of all the employees.
(xii) Department Name “Scrap” has been closed down. Transfer all
employees of this department to “Absorb” department and delete
the information of “Scrap”.
(iii) Who are the suppliers, supplying Part “Clutch Assembly” to Project Name
“Vehicle R&D”?
(iv) Get the names of the suppliers supplying parts to all projects in
“Mumbai”.
(v) Get the names of the suppliers, supplying all the parts, listed in table Part.
(vii) What are the parts, whose total quantity being supplied is larger than the
average quantity being supplied of part name “Ignition Switch”?
101 TCS-401 70
102 TCS-401 80
105 TCS-401 64
110 TCS-401 70
102 TCS-402 92
103 TCS-402 70
105 TCS-402 70
110 TCS-402 68
101 TCS-403 82
102 TCS-403 64
103 TCS-403 72
110 TCS-403 80
Depositor
Cust_Name Account_Number
Ajay A102
Vijay A110
Ram A111
Vikram A112
Borrower
Cust_Name Loan_Number
Vijay L102
Shyam L111
Ram L110
Ajeet L103
CHAPTER 5
TUPLE RELATIONAL CALCULUS
Examples
Get information of the loans having loan amount more than 100000.
Find the loan numbers of the loans having amount more than 100000.
Find names of the customers who have a loan from Noida branch.
Find the names of the customers who have account but not loan.
Find names of all customers who have accounts at all branches located in Delhi.
(b) s[x] Θ u[y], where s and u are tuple variables, x is an attribute on which s
is defined, y is an attribute on which u is defined, and Θ is a comparison
operator (<, ≤, >, ≥, =, ≠). The attributes x and y should have domains that
can be compared by Θ.
(d) If P1(s) is a formula containing free tuple variable s, and r is a relation, then
∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s)) are also formulae.
Safety of Expressions.
Exercises
(iv) Get the Titles of the subjects assigned to “IC” department faculty.
(v) Get names of the students scoring more than 80 marks in “DBMS”.
(iii) Who are the suppliers, supplying Part “Ignition Switch” to Project Name
“Small Car”?
(iv) Get the names of the suppliers who are supplying parts to the projects
located in the same city as the city of the supplier.
(v) Get the names of the suppliers supplying parts to all projects in
“Mumbai”.
(ii) Get the names of the employees drawing salary more than 20000.
(iii) Get the name of each employee along with its manager.
CHAPTER 6
DOMAIN RELATIONAL CALCULUS
An atom in the domain relational calculus has one of the following forms:-
Examples.
Find the loan-numbers of those loans for which the amount is more than
100000.
Find names of the customers who have a loan from Noida branch & find loan
amount.
{<c, a> | ∃ l (<c, l> ∈ borrower ∧ ∃ b (<l, b, a> ∈ loan ∧ b = “Noida”))}
Find the names of the customers who have a loan or an account or both at Noida
branch.
Find names of all customers who have accounts at all branches located in Delhi.
An expression like {<l, b, a> | ¬(<l, b, a> ∈ loan)} is unsafe, since it allows
values in the result which are not there in the domain of the expression. An
expression in domain-relational calculus {<x1, x2, …, xn> | P(x1, x2, …, xn)} is safe if all
of the following hold:-
(a) All values that appear in the result are from dom(P).
(b) For every “there exists” sub-formula of the form ∃ x (P1(x)), the sub-
formula is true if and only if there is a value x in dom(P1) such that
P1(x) is true.
(c) For every “for all” sub-formula of the form ∀ x (P1(x)), the sub-formula
is true if and only if P1(x) is true for all values from dom(P1).
1. RA: σ_A=C (R (A, B, C))
TRC: { t | t ∈ R ∧ t[A] = t[C] }
DRC: { <A, B, C> | <A, B, C> ∈ R ∧ A = C }
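The three formulations of the selection A = C can be compared in Python. This is an illustrative sketch: the contents of R are hypothetical, and a filter, a comprehension over whole tuples and a comprehension over unpacked values stand in for RA, TRC and DRC respectively.

```python
# Hypothetical relation R on schema (A, B, C).
R = {(1, "x", 1), (2, "y", 3), (4, "z", 4)}

# RA style: apply a selection predicate to the relation as a whole.
ra = set(filter(lambda t: t[0] == t[2], R))

# TRC style: the variable t ranges over tuples of R.
trc = {t for t in R if t[0] == t[2]}

# DRC style: the variables a, b, c range over attribute values.
drc = {(a, b, c) for (a, b, c) in R if a == c}

print(ra == trc == drc, sorted(ra))
```

All three produce the same relation, which is the sense in which the expressions above are equivalent.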
CHAPTER 7
Structured Query Language (SQL) is a Language used for interaction with a Relational
Database Management System (RDBMS). It is not only a query language; but also used
for creation, update and maintenance of a database.
Characteristics of SQL
4. SQL has a very small set of Commands, which makes it easy to learn.
Advantages of SQL
2. Applications written in SQL can be easily ported from one system to another.
Such a need would arise when a system needs upgrade or change.
6. The language, while being very simple, flexible and easy to learn, has very
powerful features, which enable it to perform very complex operations in a
DBMS.
11. DATE This data type has ten positions embedded in single quotes
i.e. ‘DD-MM-YYYY’; for example ‘31-05-1950’ implies 31st May 1950.
12. TIME This data type has at least 8 positions embedded in single quotes
‘HH:MM:SS’; For example ’11:07:05’ implies 11.07.05 AM and ’23:07:05’
implies 11.07.05 PM.
14. INTERVAL It specifies a time interval, a relative value that can be used
to increment or decrement an absolute value of DATE, TIME or
TIMESTAMP. The intervals are qualified either as YEAR/MONTH
intervals or DAY/TIME intervals.
‘DBMS’
‘Structured Query Language’
77.00
+77.77
-69
900
0.9
77.00E9
-7.7E8
+76.7E-4
+76.8E5
1. Data Definition Language (DDL) It is used for defining the database schema
i.e. to CREATE, ALTER & DROP Tables, Views and Indexes; like CREATE
TABLE, ALTER TABLE, DROP TABLE, CREATE VIEW, DROP VIEW,
CREATE INDEX, DROP INDEX.
SQL Operators
2. Comparison Operators:- =, > , < , >=, <=, ( != , <> , = ) , IN, NOT IN, IS
NULL, IS NOT NULL, LIKE, ALL, (ANY , SOME), EXISTS, NOT EXISTS,
BETWEEN x AND y .
Operator Precedence
NOT |
AND | Logical Operators
OR |
UNION |
INTERSECT | Set Operators
MINUS |
Creating a Table
The following DDL Statement will add a new Table STUDENT to the DBMS Catalog,
with the attributes and data types as explicitly clear from the statement. It indicates that
attribute REG_NO is primary key and attribute ROLL_NO is Unique, which implies that
ROLL_NO is a candidate key of STUDENT.
Similarly, the following DDL Statements will add new Tables RESULT, EMPLOYEE
and DEPT to the DBMS Catalog.
The following DDL statement will add a new attribute STATUS of type INT to the
existing Table EMPLOYEE.
The following DDL statement will remove the existing Table RESULT from the DBMS
Catalog.
Creating a View
The following statement will create a VIEW named DEPT_TOTAL_SAL with two
attributes D_NO and T_SAL by selecting DEPT_NO and TOTAL_SAL of existing Table
DEPT.
There will not be any table named DEPT_TOTAL_SAL; only its definition will be stored
in the DBMS Catalog. Whenever, a reference is made to DEPT_TOTAL_SAL in any
SQL Query, a table will be created with the help of the definition and the table will be
deleted after answering the query. For Example:-
SELECT D_NO
FROM DEPT_TOTAL_SAL
WHERE T_SAL > 10000000;
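The behaviour of such a view can be tried out with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the DBMS, the D_NAME column and the sample rows are hypothetical, and the view is written with column aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DEPT (DEPT_NO INTEGER PRIMARY KEY, D_NAME TEXT, TOTAL_SAL INTEGER)"
)
conn.executemany(
    "INSERT INTO DEPT VALUES (?, ?, ?)",
    [(1, "Finance", 12000000), (2, "Stores", 8000000)],
)

# Only the definition of the view is stored; no second copy of the data exists.
conn.execute(
    "CREATE VIEW DEPT_TOTAL_SAL AS "
    "SELECT DEPT_NO AS D_NO, TOTAL_SAL AS T_SAL FROM DEPT"
)

query = "SELECT D_NO FROM DEPT_TOTAL_SAL WHERE T_SAL > 10000000 ORDER BY D_NO"
before = conn.execute(query).fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("INSERT INTO DEPT VALUES (3, 'Research', 15000000)")
after = conn.execute(query).fetchall()
print(before, after)
```

The second result differs from the first although the view itself was never modified, because the view is re-evaluated from its definition on each reference.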
The following SQL Statement will create VIEW named DEPT_AVG_SAL with
attributes D_NO and AVG_SAL from existing table EMPLOYEE. The attribute
AVG_SAL is computed by taking average of the salary of the employees of each
department.
Creating Indexes
The following statement will create a unique Index on the primary key EID of Table
EMPLOYEE.
Dropping an Index
A Query refers to a SELECT Statement used to extract information from the Tables.
Query Get Department Number and Average Salary of the employees of Dept Number 3
or more and having more than 10 employees; and order the information in descending
order of Average Salary.
Query List Employee Names along with the Names of their respective Dept Heads.
Aggregate Functions
Query Find Average Marks and Total Marks obtained by each Student
Query Find Minimum, Maximum and Average Marks obtained in each Subject.
UPDATE EMPLOYEE
SET SALARY = 55000
WHERE EID = '0012240L';
UPDATE EMPLOYEE
SET SALARY = SALARY * 1.1;
JOINS
Query Get the Names of Students, who have appeared for Subject ‘TCS501’
This Query involves a Natural Join of STUDENT and RESULT and in Relational
Algebra it can be written as:-
UNION
Query Get the Names of the Students, who have appeared for subject
‘TCS501’ or for ‘TCS503’ or for both.
UNION
T1 ← STUDENT * RESULT
π S_NAME (σ SUB_CODE = “TCS501” (T1)) ∪ π S_NAME (σ SUB_CODE = “TCS503” (T1))
INTERSECT
Query Get the Names of the Students, who have appeared both for
‘TCS501’ and ‘TCS503’.
INTERSECT
T1 ← STUDENT * RESULT
π S_NAME (σ SUB_CODE = “TCS501” (T1)) ∩ π S_NAME (σ SUB_CODE = “TCS503” (T1))
Query Get the Names of the Students, who have appeared for subject
‘TCS501’ but not for ‘TCS503’.
MINUS
T1 ← STUDENT * RESULT
π S_NAME (σ SUB_CODE = “TCS501” (T1)) – π S_NAME (σ SUB_CODE = “TCS503” (T1))
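The three set operators can be compared side by side through Python's sqlite3 module. This is a sketch under assumptions: the table layouts and rows are hypothetical, the helper names appeared_for and combine are illustrative, and SQLite spells the difference operator EXCEPT where the notes use MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (ROLL_NO TEXT PRIMARY KEY, S_NAME TEXT);
CREATE TABLE RESULT (ROLL_NO TEXT, SUB_CODE TEXT, MARKS INTEGER);
INSERT INTO STUDENT VALUES ('101', 'Ajay'), ('102', 'Vijay'), ('103', 'Ram');
INSERT INTO RESULT VALUES ('101', 'TCS501', 70), ('102', 'TCS501', 80),
                          ('102', 'TCS503', 64), ('103', 'TCS503', 72);
""")

def appeared_for(sub_code):
    """SELECT statement for the names of students who appeared for sub_code."""
    return (
        "SELECT S_NAME FROM STUDENT S, RESULT R "
        f"WHERE S.ROLL_NO = R.ROLL_NO AND R.SUB_CODE = '{sub_code}'"
    )

def combine(set_operator):
    """Join the two SELECT statements with the given set operator and run them."""
    sql = (
        f"{appeared_for('TCS501')} {set_operator} "
        f"{appeared_for('TCS503')} ORDER BY S_NAME"
    )
    return [row[0] for row in conn.execute(sql)]

either = combine("UNION")       # TCS501 or TCS503 or both
both = combine("INTERSECT")     # TCS501 and TCS503
only_501 = combine("EXCEPT")    # TCS501 but not TCS503
print(either, both, only_501)
```

Each set operator removes duplicates from its result, so a student appears at most once in each list.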
Cursors in SQL
Cursor is a Construct in PL/SQL that enables a user to earmark a private memory area to
hold an SQL Statement for accessing later on.
Example
Suppose Total Marks Scored by a Student are to be extracted from RESULT and to be
entered into Table TOTAL_MARKS (ROLL_NO, T_MARKS).
DECLARE
CURSOR C_Student IS
SELECT ROLL_NO, SUM (MARKS)
FROM RESULT
GROUP BY ROLL_NO;
C_NO CHAR (10);
C_TOTAL INT;
BEGIN
OPEN C_Student;
LOOP
FETCH C_Student INTO C_NO, C_TOTAL;
EXIT WHEN C_Student%NOTFOUND;
INSERT INTO TOTAL_MARKS
VALUES (C_NO, C_TOTAL);
END LOOP;
CLOSE C_Student;
COMMIT;
END;
DDL
Suppose there is a constraint that account balance should not be less than 1000,
it can be added to the table Account as follows:-
ALTER TABLE Account ADD CONSTRAINT Check_Bal CHECK (Bal >= 1000);
Here D# in Emp is a foreign key referencing D# of Dept, Mgr# in Dept is a foreign key
referencing E# of Emp and Total_Sal in Dept is total salary of all the employees working
in a department. The situation is little tricky here:-
(i) Both the tables are referencing each other. If we create the Emp table first and
declare D# as foreign key referencing Dept (D#), the system will generate
exception “table or view does not exist” since the table Dept is non-existent.
A similar situation will occur if we attempt to create Dept first and declare
Mgr# as foreign key referencing Emp (E#).
(ii) Insertion of data into the tables will also face a problem. If we attempt to insert into
Emp first, it will attempt to reference a non-existent tuple in table Dept for
D#, and if we first insert a tuple in Dept, it will attempt to reference a
non-existent tuple in Emp for Mgr#.
Note that the foreign key constraints added to the above tables are of type
“Initially deferred deferrable”. This implies that while inserting data, the check for
compliance of foreign key constraints will be deferred till the next COMMIT statement is
executed. This will enable data entry into the two tables in any sequence, as long as the
information in the two tables is compatible at the time of execution of next COMMIT
statement.
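Deferred checking can be demonstrated with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the DBMS, the column names E_No, D_No and Mgr_No replace E#, D# and Mgr# (which are not plain identifiers in SQLite), and the rows are hypothetical. Unlike the system described above, SQLite does not object when a table references one that has not been created yet, so only the insertion problem is shown.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Emp (
    E_No   INTEGER PRIMARY KEY,
    E_Name TEXT,
    D_No   INTEGER REFERENCES Dept (D_No) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE Dept (
    D_No   INTEGER PRIMARY KEY,
    D_Name TEXT,
    Mgr_No INTEGER REFERENCES Emp (E_No) DEFERRABLE INITIALLY DEFERRED
);
""")

conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES (1, 'Ajay', 10)")      # Dept 10 does not exist yet
conn.execute("INSERT INTO Dept VALUES (10, 'Finance', 1)")  # now both references resolve
conn.execute("COMMIT")                                      # constraints are checked here
committed = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]

conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES (2, 'Vijay', 99)")     # Dept 99 is never inserted
try:
    conn.execute("COMMIT")
    commit_rejected = False
except sqlite3.IntegrityError:
    commit_rejected = True
    conn.execute("ROLLBACK")
remaining = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
print(committed, commit_rejected, remaining)
```

The first transaction succeeds in either insertion order; the second shows that deferral postpones the check to COMMIT but does not waive it.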
CREATE TABLE Class ( Year INT, Branch Char(3), Section INT, Strength INT,
PRIMARY KEY (Year, Branch, Section),
CHECK (Branch IN
('CSE', 'IT', 'ECE', 'IC', 'ME', 'EE', 'MT')),
CHECK (Year BETWEEN 1 AND 4),
CHECK (Section BETWEEN 1 AND 2));
Creation of Indices
Since (Day, Period, Fac_Code) is a candidate key of Time_Table, we can create a unique index on this
combination of the columns.
Dropping of Tables
Dropping of Constraints
Dropping of Columns
Dropping of Indices
Take the case of tables Emp and Dept. Suppose table Emp is to be dropped, the system
would not permit this, since the dropping of Emp would violate the foreign key constraint
Dept_FK of table Dept. Dropping of Emp and Dept is achieved as follows:-
DML
1. Add a new customer to table Customer with C_Id = ‘C101’, C_Name = ‘Ajay’,
C_Street =’S-26’ and C_City = ‘Noida’.
2. Add a new student to the Student Table with Roll_No = ‘091010120’, S_Name =
‘Vijay’, S_Address = ‘S-27 Noida’. (Note that information about S_DOB is missing. It is
not a NOT NULL attribute, so it can be assigned a NULL value).
The above NULL can also be inserted as follows (the attribute name S_DOB is omitted
from the attribute list specified with the table name):-
Since the attribute name S_DOB does not appear in the list of attributes given with the table
Student, a NULL value will be assigned to this attribute. As indicated, the attributes can be
listed in any order; the values then have to be specified in the same order.
SELECT *
FROM Student;
SELECT *
FROM Student
WHERE S_DOB >= '01-JAN-1995' AND S_DOB <= '31-DEC-1995';
SELECT *
FROM Student
WHERE S_DOB BETWEEN '01-JAN-1995' AND '31-DEC-1995';
SELECT *
FROM Student
WHERE S_DOB LIKE '%95';
5. Get Roll_No and DOB of all students born before 01st Jan 1995.
6. SELECT *
FROM Account
WHERE AN = &numb;
Here numb is a substitution variable, whose value will be accepted by the system by
displaying prompt ‘Enter Value for numb:’. Each time the query is executed, a different
value for numb can be entered like A101, A105 etc.
7. Get the names of students who got more than 90 marks in any subject.
Here, we perform natural join of Result and Student and pick up names of those
students who have scored more than 90 marks in any subject.
Since attribute name Roll_No appears in both the tables specified in the FROM
clause, we need to qualify by the table name, while using this attribute name in
subsequent clauses. However, this is not the problem with attribute Marks, since it
appears only in table Result.
The qualifier DISTINCT has been used in the SELECT clause to avoid duplicate
names appearing in the result in the case of those students who have scored
more than 90 marks in more than one subject.
The above query can be expressed more elegantly by declaring a tuple variable
say R on the table Result and another tuple variable S on the table Student, as
shown below:-
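A minimal sketch of such a query with tuple variables S and R, run through Python's sqlite3 module, is given here. The table layouts and the sample rows are assumptions made for illustration; a count without DISTINCT is included to show the duplicates it removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Roll_No TEXT PRIMARY KEY, S_Name TEXT);
CREATE TABLE Result (Roll_No TEXT, Sub_Code TEXT, Marks INTEGER);
INSERT INTO Student VALUES ('101', 'Ajay'), ('102', 'Vijay');
INSERT INTO Result VALUES ('101', 'TCS-401', 95), ('101', 'TCS-402', 92),
                          ('102', 'TCS-401', 80);
""")

# S and R are tuple variables (aliases) declared in the FROM clause.
with_distinct = """
SELECT DISTINCT S.S_Name
FROM Student S, Result R
WHERE S.Roll_No = R.Roll_No AND R.Marks > 90
"""
names = [row[0] for row in conn.execute(with_distinct)]

# The same query without DISTINCT returns one row per qualifying subject.
without_distinct = with_distinct.replace("DISTINCT ", "")
row_count = len(conn.execute(without_distinct).fetchall())
print(names, row_count)
```

Roll_No has to be qualified with S or R because it occurs in both tables, whereas Marks could be written unqualified.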
8. Get the customer and account number of those customers, who are living in Noida
but having account in Delhi and have Balance more than 100000.
9. Get the Roll_No of those students whose DOB is not specified in the Student
table.
SELECT Roll_No
FROM Student
WHERE S_DOB IS NULL;
Using Aliases
Here, the Attributes in the resulting table will be named as Customer_Name and
Account_Number.
Arithmetic Operations
12. Suppose there is a schema Emp (E_Id, E_Name, Basic_Pay, DA, HRA,
Deduction). We can have query to determine gross salary of each employee:-
SELECT *
FROM Result
ORDER BY Marks;
SELECT *
FROM Result
ORDER BY Marks DESC;
18. Get Min Balance, Max Balance and Total Balance at each branch.
19. Get the names and Total Marks of those students who have scored Average Marks
more than 80%.
This will display total marks of each student having average score > 80%.
All tuples of Depositor will appear in the result. Wherever, LN is not defined, it
will be indicated by NULL.
All tuples of Borrower will appear in the result. Wherever, AN is not defined, it
will be indicated by NULL.
22. Get the Customer Id and name of those customers who have both account and
loan from the bank.
Here (SELECT C_Id FROM Borrower) is called the inner sub-query and the main query
SELECT C.C_Id, C_Name FROM Customer C, Depositor D WHERE C.C_Id = D.C_Id
AND C_Id IN ( ) is called the outer query. The inner query can be evaluated independently of
the outer query. Such a query is evaluated in two steps:-
(i) First evaluate the inner sub-query and save its output.
(ii) Now evaluate the outer query with respect to the result produced by the inner
sub-query.
Evaluation of the inner sub-query will produce the set of C_Id of those customers who have a
loan from the bank. For each C_Id existing in the Depositor table, the outer query will
examine whether that C_Id exists in the set produced by the inner sub-query. If the answer is
“Yes”, then that C_Id belongs to a customer having both account and loan, and the
customer’s name appears in the final output table.
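The two-step evaluation can be reproduced with Python's sqlite3 module and compared with the single nested statement. This is a sketch under assumptions: the table layouts and rows are hypothetical, and the nested statement qualifies C_Id with C to avoid ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (C_Id TEXT PRIMARY KEY, C_Name TEXT);
CREATE TABLE Depositor (C_Id TEXT, AN TEXT);
CREATE TABLE Borrower (C_Id TEXT, LN TEXT);
INSERT INTO Customer VALUES ('C101', 'Ajay'), ('C102', 'Vijay'),
                            ('C103', 'Ram'), ('C104', 'Shyam');
INSERT INTO Depositor VALUES ('C101', 'A102'), ('C102', 'A110'), ('C103', 'A111');
INSERT INTO Borrower VALUES ('C102', 'L102'), ('C103', 'L110'), ('C104', 'L111');
""")

# Step (i): evaluate the inner sub-query once and save its output.
inner = {row[0] for row in conn.execute("SELECT C_Id FROM Borrower")}

# Step (ii): evaluate the outer query against the saved set.
outer = conn.execute(
    "SELECT C.C_Id, C.C_Name FROM Customer C, Depositor D "
    "WHERE C.C_Id = D.C_Id ORDER BY C.C_Id"
).fetchall()
two_step = [row for row in outer if row[0] in inner]

# The same result from the nested statement.
nested = conn.execute(
    "SELECT C.C_Id, C.C_Name FROM Customer C, Depositor D "
    "WHERE C.C_Id = D.C_Id AND C.C_Id IN (SELECT C_Id FROM Borrower) "
    "ORDER BY C.C_Id"
).fetchall()
print(two_step, nested)
```

The inner result is computed only once because it does not depend on any tuple of the outer query.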
23. Get names of the customers having joint account with customer Ajay.
SELECT CN
FROM Customer C, Depositor D
WHERE C.C_Id = D.C_Id AND CN <> 'Ajay'
AND AN IN ( SELECT AN
FROM Customer K, Depositor P
WHERE K.C_Id = P.C_Id AND CN = 'Ajay');
The inner sub-query will produce the set of Account Numbers held by customer 'Ajay'. The
outer query will determine the other customers who hold any of the accounts held by
'Ajay'.
24. Get Branch Id and Name of the branch having highest average balance amongst
all branches.
Here, the inner sub-query is not independent of the outer sub-query. So, inner sub-query
is evaluated for each tuple of the outer sub-query.
25. Get the names of the customers who have account in each branch located in
Noida.
SELECT C_Name
FROM Customer C
WHERE NOT EXISTS (( SELECT B_Id
FROM Branch
WHERE B_City = 'Noida')
MINUS
( SELECT B_Id
FROM Account A, Depositor D
Updating of tables
Creation of Views
Dropping of Views
CHAPTER 8
NORMALIZATION OF RELATIONAL SCHEMA
(a) Identifying those data dependencies in the schema, which would cause
anomalies during Insert, Update and Delete of data.
(b) And decomposing the schema into a set of sub-schemas, on the basis of
dependencies, causing anomalies.
The resulting schemas would permit representation of the intended information, while
keeping data redundancy to a minimum.
The process of normalization can be well understood only after understanding the
concept of various types of data dependencies occurring in a database.
Let there be a Relation Schema R, comprising sets of attributes α, β and γ, i.e. α ⊆ R,
β ⊆ R and γ ⊆ R. A Functional Dependency α → β (read as “α determines β”) is said
to hold on the Schema R, if for every legal relation r(R) and for every tuple-pair
{t1, t2} ∈ r, if t1[α] = t2[α] then it must satisfy t1[β] = t2[β]. This means that if any two
tuples of relation r(R) agree on the values of α, then the two tuples must also agree on the
values of β.
A set of FDs F is said to be holding on a schema R, if all FDs in F are satisfied by every
legal relation r(R).
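The definition can be tested mechanically against one relation instance. The sketch below is a minimal illustration; the function name fd_holds and the three sample rows are assumptions made for this example. Note that an instance can only disprove an FD: satisfying it in one relation does not prove that the FD holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs-value seen for this lhs-value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative sample relation on schema (A, B, C).
r = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 2, "C": 3},
    {"A": 5, "B": 3, "C": 3},
]

print(fd_holds(r, ["A"], ["B"]))       # A -> B
print(fd_holds(r, ["B", "C"], ["A"]))  # BC -> A
print(fd_holds(r, ["B"], ["C"]))       # B -> C
```

Here A → B and B → C are satisfied, while BC → A is violated by the first two rows, which agree on (B, C) but differ on A.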
Suppose, there is a Relational Schema “Student” having the following FDs holding on it:-
{Name, Father_Name, DOB, Address} → Roll_No, Registration_No, Section, Branch
Left-Irreducible FD
Proof by Contradiction
Let us assume that two tuples {t1 and t2} in a legal relation r(R) agree on the values of K
i.e. t1[K] = t2[K]. ------- (i)
Since K → R, it implies that t1[R] = t2[R], and thus t1 = t2.
But no two tuples in a legal relation r(R) can be equal, since a relation is defined as a set
of tuples and in a set no two elements can be same.
Each Relation Schema R will have at least one default Super Key; that is, the set of all its
attributes i.e. the entire schema R itself.
Since every superset of a Super Key of R is also a Super Key of R, a Super
Key may contain some extraneous attributes. Suppose K is a Super Key of R and E
is the complete set of extraneous attributes contained in K; then (K − E) will form a
minimal Super Key of R, i.e. no proper subset of (K − E) will form a Super Key of R. This
minimal Super Key is called a Candidate Key of R.
For example, {Roll_No} and {Registration_No} will form Candidate Keys of schema
“Student”. In addition, {Name, Father_Name, Address, DOB} forms another
Candidate Key.
Primary-Key
A Relation Schema R may have more than one Candidate Key. One of the Candidate
Keys is chosen as the primary means to identify tuples uniquely in a relation r(R). This
designated candidate key is called the Primary Key of R. Out of the three candidate keys in
the above example, we may select {Roll_No} as the Primary Key of the Schema Student.
Suppose β ⊆ α.
Consider a relation r(R) and a tuple pair {t1, t2} ∈ r such that
t1[α] = t2[α] ----------------- (i)
Since β ⊆ α, t1 and t2 will also satisfy
t1[β] = t2[β] ------------------- (ii)
From (i) and (ii), it is implied that α → β holds on R.
Thus, proved.
2 Augmentation Rule
Since (vi) contradicts (iv), our assumption (A), that αγ → βγ does not hold on R, is NOT
CORRECT.
3 Transitivity Rule
4 Union Rule
Applying Transitivity Rules to (i) and (ii), it is implied that
5 Decomposition Rule
6. Pseudo-Transitivity Rule
Example: The Functional Dependency conveyed by the statement, “Knowing the Name
and Address of a Person, we can determine his Address”, i.e. (Name, Address) →
Address, is obviously trivial.
Closure of FD Set
Suppose F is a set of FDs that holds on a Schema R, then the Closure of F, denoted by F+,
is the complete set of FDs, that includes F and all the FDs that are logically implied
(inferred) by F. Any legal relation r(R) that satisfies F will also satisfy F +.
F+ = F;
Repeat
Save-F+ = F+;
To each FD f1 ∈ F+, apply Armstrong’s Rule of Reflexivity; and add the FDs so
inferred to F+;
To each FD f1 ∈ F+, apply Armstrong’s Rule of Augmentation; and add the FDs
so inferred to F+;
To each such pair of FDs as {α → β, β → γ} ⊆ F+, apply Armstrong’s Rule of
Transitivity, and add FD α → γ to F+;
Until (Save-F+ == F+);
Cover of an FD Set
An FD set G is said to be the cover of another FD set F, if F ⊆ G+ i.e. all the FDs that are
there in F are also there in the Closure of set G.
Equivalent Sets
Two FD sets F and G are said to be equivalent sets, if each forms a cover of the other i.e.
F ⊆ G+ and G ⊆ F+, which implies F+ = G+. This means that two sets F and G are
equivalent, if their closures are equal.
An FD f ∈ F is said to be extraneous, if its exclusion from F does not affect the Closure
of F i.e. {F − f}+ = F+. Such FDs in a set are logically implied by other FDs in the set.
Example:-
Left-irreducible FDs
The left side of an FD (i.e. its determinant) is said to be irreducible, if it does not contain
any extraneous attributes. Such FDs are known as left-irreducible FDs.
An FD set Fc is said to be a Minimal Cover or Canonical Cover of FD set F, iff F ⊆ Fc+ and
it satisfies the following three conditions:-
(a) Each FD in Fc is in a Canonical Form i.e. has only one attribute on its right
side.
(b) No FD in Fc has any extraneous attributes on its left side i.e. all the FDs in Fc
are left-irreducible.
Fc = F;
Repeat
Save-Fc = Fc;
To each FD in Fc of the form A → BC (where A, B and C are attributes of
Schema R), apply Decomposition Rule; and replace the FD by the set of
FDs {A → B, A → C};
For each FD (α → β) ∈ Fc and for each attribute A ∈ α
if {{Fc – {α → β}} ∪ {(α – A) → β}}+ = Fc+
then replace FD α → β by FD (α – A) → β in Fc;
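The steps of this procedure can be sketched in Python. This is an illustrative sketch, not the notes' own implementation: it assumes single-letter attribute names (so "AB" means {A, B}), it tests extraneous attributes through attribute closure rather than by comparing whole FD closures, and it adds the final step of dropping FDs implied by the remaining ones. The function names are chosen for this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: decomposition rule, one attribute on every right side.
    cover = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]

    # Step 2: drop extraneous attributes from every left side.
    reduced = []
    for lhs, rhs in cover:
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                lhs = lhs - {a}
        reduced.append((lhs, rhs))
    cover = list(dict.fromkeys(reduced))  # remove duplicate FDs

    # Step 3: drop every FD that is implied by the remaining FDs.
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest

    return sorted("".join(sorted(l)) + "->" + "".join(r) for l, r in cover)


# Sample FD set: A -> B, AB -> C, D -> AC, D -> E
F = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
print(minimal_cover(F))
```

For this sample set the attribute B is extraneous in AB → C, and D → C becomes redundant because it follows from D → A and A → C. Since a set of FDs can have more than one canonical cover, a different processing order may give a different but equivalent result in other cases.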
How does Canonical Cover of an FD set help to reduce the DBMS overheads?
Suppose F is the set of FDs holding on a schema R; then a relation r(R) would be legal
only if it satisfies all the FDs in set F. Now, to determine whether a relation r(R) is legal or
not, the DBMS has to check for the satisfaction of all the FDs in set F. On the other hand, if
we determine a minimal Cover Fc of F, then we have to check for the satisfaction of a
smaller set of FDs, since a relation r(R) that satisfies FD set Fc will also satisfy FD
set F, both being equivalent sets. This will reduce DBMS overheads.
Yes, an FD set F can have more than one Canonical Cover, but all of those sets would be
equivalent to each other; and in turn equivalent to F.
Example:
Step 1: Convert the FDs to their canonical form i.e. replace them by equivalent sets of FDs, having only
single attributes on their right side
Fc: {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}
Step 2: Remove extraneous attributes from the left side of all FDs. Here, all FDs have
only one attribute on their left side, and thus cannot contain any extraneous attribute.
Fc: {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}
Suppose α is a sub-set of Schema R i.e. α ⊆ R. And suppose F is the set of FDs holding
on Schema R. Then, the Closure of Attribute Set α, denoted by α+, is the complete set of
attributes that can be determined by α under the FD set F.
α+ = α;
Repeat
Save-α+ = α+;
For each FD (β → γ) ∈ F
if β ⊆ α+ then α+ = α+ ∪ γ;
Until (Save-α+ == α+);
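The attribute-closure loop above translates almost line for line into Python. The sketch below assumes single-letter attribute names, so that the string "AB" stands for the attribute set {A, B}; the function name and the sample FD set are chosen for illustration.

```python
def attribute_closure(alpha, fds):
    """Return the closure of attribute set alpha under the FDs in fds.

    fds is a list of (beta, gamma) pairs, each an iterable of attributes.
    """
    closure = set(alpha)
    while True:
        saved = set(closure)              # Save-closure = closure
        for beta, gamma in fds:
            if set(beta) <= closure:      # beta is contained in the closure
                closure |= set(gamma)     # closure = closure U gamma
        if closure == saved:              # Until nothing changes
            return closure


# Sample FD set on R(A, B, C, D, E): AB -> C, A -> D, D -> E
F = [("AB", "C"), ("A", "D"), ("D", "E")]

print(sorted(attribute_closure("AB", F)))
print(sorted(attribute_closure("A", F)))
```

With this sample set, the closure of {A, B} is the whole schema, so {A, B} is a Super Key of R, while the closure of {A} is only {A, D, E}.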
The Concept of “Attribute Set Closure” can be used to determine the following:-
Determine α+ under F.
If α+ equals R, then α is a Super-Key of R.
Determine α+ under F.
If β ⊆ α+, then α → β holds on R.
F+ := F;
For each FD α → β in F+
Begin
Determine α+ under F+;
For each γ ⊆ α+
α → γ holds; Thus include it in F+
i.e. F+ := F+ ∪ {α → γ};
End;
Example:-
Case I
Consider the following relation r on schema R (A,B,C) and its decomposition into r1
and r2 .
r(R)
A B C
A1 B1 C1
A2 B2 C1
A1 B1 C2
A3 B2 C3
A1 B1 C3
A2 B2 C4
r1(R1)
A B
A1 B1
A2 B2
A3 B2
r2(R2)
A C
A1 C1
A2 C1
A1 C2
A3 C3
A1 C3
A2 C4
r1 * r2
A B C
A1 B1 C1
A1 B1 C2
A1 B1 C3
A2 B2 C1
A2 B2 C4
A3 B2 C3
It is a loss-less-join-decomposition, since r1 * r2 = r
Case II
Now consider the following decomposition of r into r1 and r2 .
r(R)
A B C
A1 B1 C1
A2 B2 C1
A1 B2 C2
A3 B2 C3
A1 B1 C3
A2 B1 C4
r1 (R1)
A B
A1 B1
A2 B2
A1 B2
A3 B2
A2 B1
r2 (R2)
A C
A1 C1
A2 C1
A1 C2
A3 C3
A1 C3
A2 C4
r1 * r2
A B C
A1 B1 C1
A1 B1 C2
A1 B1 C3
A2 B2 C1
A2 B2 C4
A1 B2 C1
A1 B2 C2
A1 B2 C3
A3 B2 C3
A2 B1 C1
A2 B1 C4
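The two cases can be checked mechanically by projecting r on (A, B) and (A, C), joining the projections on A, and comparing the result with r. The following is a minimal sketch; the function name check_decomposition is chosen for this illustration, and the tuples are the ones listed in Case I and Case II.

```python
def check_decomposition(r):
    """Project r(A, B, C) on (A, B) and (A, C), join on A, compare with r.

    Returns (is_lossless, spurious_tuples).
    """
    r1 = {(a, b) for (a, b, c) in r}
    r2 = {(a, c) for (a, b, c) in r}
    joined = {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}
    return joined == r, joined - r


case1 = {
    ("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B1", "C2"),
    ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B2", "C4"),
}
case2 = {
    ("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B2", "C2"),
    ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B1", "C4"),
}

ok1, spurious1 = check_decomposition(case1)
ok2, spurious2 = check_decomposition(case2)
print(ok1, len(spurious1))
print(ok2, sorted(spurious2))
```

In Case I the join reproduces r exactly. In Case II the join has 11 tuples against the 6 of r, so it contains 5 spurious tuples and the decomposition is not loss-less. The difference is that A → B is satisfied by the first relation but not by the second.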
A decomposition of Relation Schema R into R1 and R2 (such that R1 ∪ R2 = R) will be a
Loss-Less-Join (Non-Additive) Decomposition, if the common attributes of R1 and R2
(i.e. R1 ∩ R2) form a Super Key of either R1 or R2 or both,
i.e. R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
Since R1 ∩ R2 = {α} → R1
Ri ∩ Rk → Ri
Ri ∩ Rk → Rk
Dependency-Preserving Decomposition
Let F be the set of FDs holding on a schema R, having a decomposition (R1, R2, …, Rn)
such that R1 ∪ R2 ∪ R3 ∪ … ∪ Rn = R.
Let {F1, F2, …, Fn} be the restrictions of F to R1, R2, …, Rn respectively.
Let F’ = F1 ∪ F2 ∪ … ∪ Fn
The decomposition (R1, R2, …, Rn) is said to be “Dependency-Preserving” if F’+ = F+ i.e.
each FD of F+ must be preserved in, or implied by the FDs preserved in, the decomposed
schemas; else the decomposition is called non-dependency-preserving.
Example:-
Suppose a relation schema R (A, B, C) has FD set F holding on it:-
F: {A → B, B → C}
Let F be the set of FDs holding on R.
NORMALIZATION
First Normal Form (1 NF) A Schema R is said to be in first normal form (1 NF) if all
its attributes have only atomic domains i.e. domains of all attributes have only indivisible
values. Alternatively, it can be stated that for a Relation Schema R to be in 1 NF, all its
attributes should be “simple” and “single-valued”. Each field in each tuple of that relation
must have only one value from the respective domain or a “NULL” value.
UN-NORMALIZED TABLE
The Field Tel_No in a tuple has a set of values. Such a Table is said to be Un-normalized
and it is not in First Normal Form.
NORMALIZED TABLE
A tuple in the Un-Normalized Table is replaced by as many tuples in the Normalized
Table as the number of telephones owned by the respective employee. Such a table is
called Normalized Table or Flat Table. It is in First Normal Form and has a lot of data
redundancy, which will be eliminated by further normalization of the Table.
Full Functional Dependency Let there be a Relational Schema R with a Candidate Key
K and a non-prime attribute A (K ⊆ R, A ∈ R). The Functional Dependency K → A is said
to be a “Full Functional Dependency”, if attribute A cannot be determined by any proper
subset of K i.e. there does not exist any K1 ⊂ K for which K1 → A holds.
Let R1 (A, B, C, D, E) be a Relation Schema with all its attributes (A..E) having only
atomic domains; and let F: {AB → C, A → D, D → E} be the set of FDs holding on it.
Since {A, B}+ = ABCDE, {A, B} forms a candidate key of this schema R1; and this is
the only candidate key of R1.
So, A and B are the prime attributes of R1 and all other attributes i.e. C, D and E are
non-prime attributes.
The non-prime attribute C is determined only by the full candidate key, since AB → C
holds on R1. So, C is said to be fully functionally dependent on the candidate key and the
FD AB → C is called a Full FD or Complete FD of R1.
However, the non-prime attributes D and E are determined by A alone, which is a proper
subset of the candidate key {A, B}. Such a dependency is called a partial functional
dependency; and such an FD causes certain Insert/Delete/Update anomalies, as
demonstrated in the following example:-
Example:-
Consider a Schema SP1 (S#, P#, Sname, Scity, Status, Pname, Qty)
Scity :Supplier City
Status :Supplier Status, which depends on Scity
Pname: Part Name
Qty :Quantity of a Part (P#) to be supplied by a Supplier (S#)
(ii) Information about a Part like its Pname can be inserted only when
the part is being supplied by at least one supplier.
(iii) Information about a City like its Status can be inserted only when
there is at least one supplier from that city and the supplier is supplying at
least one part.
(iii) Suppose, a city has only one supplier and that supplier is supplying
only one part. On deletion of the tuple of that particular supply, we would
lose the information about the Status of that City.
(i) Information about the Sname and Scity of a particular supplier will
be appearing as many times as the number of parts being supplied
by that supplier.
(a) It is in 1 NF and
The above relation schema R1 is not in 2 NF, since it involves a partial functional
dependency A → DE.
Using Heath’s Theorem, R1 can be loss-less decomposed into R21 and R22.
The decomposition of R1 into R21 and R22 is Loss-less Join Decomposition since:-
Since, R21 and R22 involve no partial functional dependencies, both are in 2 NF.
{S#, P#} is a candidate key of SP1 and this is the only candidate key of SP1.
So, S# and P# are prime attributes and all other attributes are non-prime.
Since S# → Scity and Scity → Status, S# → Status also holds.
P
P# Pname
P1 Aero-engine
P2 Generator
P3 Altimeter
SP2
S# P# Qty
S1 P1 5
S1 P2 5
S2 P1 2
S2 P3 5
S3 P2 10
S3 P3 20
Insert:
Sname and Scity of a Supplier can now be inserted into table S, even when it is
not supplying even one part.
Pname of a part can now be inserted into table P, even when it is not being
supplied by any supplier.
Delete:
When information about a supply is deleted from table SP2, we do not lose any
information about Sname, Scity or Pname.
Update
Information about Sname & Scity of a supplier now appears only in one tuple in
table S.
Information about Pname of a particular part now appears only in one tuple in
table P.
(a) It is in 2 NF and
Thus, R22 is in 3 NF but R21 is not, since it involves a Transitive Dependency i.e.
A → D, D → E ⇒ A → E.
The decomposition of R21 into R31 and R32 is Loss-less Join Decomposition
since:-
R31 ∩ R32 → R31
R31 and R32 do not involve any Transitive Dependency; and are thus in 3 NF.
3 NF decomposition of R1:-
The Schemas P and SP2 do not involve any Transitive Dependencies; and are thus
already in 3NF. But, S has a Transitive Dependency i.e. S# → Scity, Scity →
Status ⇒ S# → Status
S# → Sname, Scity
Now, STS and SUPP do not have any Transitive Dependency; so both are in 3 NF.
SUPP
S# Sname Scity
S1 Avia Mumbai
S2 Aero Delhi
S3 Air-supp Mumbai
STS
Scity Status
Mumbai 10
Delhi 20
Now, all update anomalies have been resolved. Status of a particular city now
appears in only one tuple in table STS. Also, information about status of a city can
now be inserted irrespective of whether any supplier exists in that city or not.
A Relation Schema in Third Normal Form (3 NF) may still be riddled with some
anomalies, under the situations when a schema has multiple candidate keys, which may
be composite and overlapping. Any relation, under such schema, may have some data
redundancies that would cause some update anomalies. For example, the schema:-
The relation schema SP has two candidate keys i.e {S#, P#} and {Sname, P#}. Both the
candidate keys are composite and have one common attribute i.e. P#.
The only non-key attribute i.e. Qty is non-transitively and fully dependent on both the
candidate keys. Thus, the schema SP is free of any partial dependencies or transitive
dependencies and is thus in Third Normal Form (3NF).
This can also be verified from the fact that each FD satisfies one of the necessary conditions
for SP to be in 3 NF.
Despite being in 3NF, any legal relation under the schema will have some data
redundancies, for example, the name of a particular supplier i.e. Sname will be repeated
as many times as the number of supplies being made by that supplier.
Thus, there is need to have a normal form, stronger than 3NF. The necessary solution is
provided by Boyce Codd Normal Form (BCNF).
The above two conditions, for BCNF, are same as the first two conditions of 3 NF. Thus,
if a schema is in BCNF, it must also be in 3NF. However, the third condition of 3 NF is
missing against the BCNF criteria, indicating that BCNF is more restrictive as compared
to 3NF. So, it is possible that a schema may be in 3NF but not in BCNF. Thus, BCNF is a
stronger normal form than 3NF.
Going by the definition of BCNF, SP is in 3 NF but not in BCNF, since the two FDs i.e.
S# → Sname and Sname → S# are neither trivial nor have Super Keys on their left side.
Alternatively, it can be stated that a Relational Schema R will be in BCNF if each non-trivial
left-irreducible FD α → β, holding on R, has only a Candidate Key on its left side
i.e. α must be a Candidate Key of R.
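The BCNF test can be sketched in Python by computing the closure of each determinant: a non-trivial FD violates BCNF when its left side does not determine the whole schema. In this sketch the attribute list of SP (S#, Sname, P#, Qty) and its FD set are assumptions inferred from the surrounding discussion, and the function names are chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a Super Key."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                      # trivial FD, never a violation
        if closure(lhs, fds) != set(schema):
            violations.append((lhs, rhs))
    return violations


# Assumed schema and FDs for SP, inferred from the text.
SP = ("S#", "Sname", "P#", "Qty")
fds = [
    (("S#",), ("Sname",)),
    (("Sname",), ("S#",)),
    (("S#", "P#"), ("Qty",)),
    (("Sname", "P#"), ("Qty",)),
]

print(bcnf_violations(SP, fds))
```

Under these assumptions only S# → Sname and Sname → S# are reported, which agrees with the statement that these two FDs keep SP out of BCNF.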
SP can be decomposed into BCNF Schemas, on the basis of the FDs that violate BCNF
i.e. S# → Sname and Sname → S#. The resulting BCNF decompositions of SP will be:-
OR
It can be verified that the decompositions are loss-less-join decompositions.
S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname
{S#, P#} → Qty
{Sname, P#} → Qty
It can be verified that all the FDs indicated above are non-trivial and left-irreducible; and
the following FDs do not have a Candidate Key on the left side:-
S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname
Thus SP is not in BCNF.
Let S’ := { SP };
where SP3 = ( S#, P#, Qty )
and SUPP = (S#, Sname, Scity)
S# → {Sname, Scity}
SP3 (S#, P#, Qty) Primary Key {S#, P#}
Foreign Key {S#} references SUPP
Foreign Key {P#} references P
{S#, P#} → Qty
All the relation schemas in the above decomposition are free of any partial dependencies
and transitive dependencies. Also, all the FDs have only candidate keys (of the respective
schemas) as their determinants. Thus, all the relation schemas are in BCNF.
SP3 ∩ P → P
SP3 ∩ SUPP → SUPP
SUPP ∩ STS → STS
Yes, a relation in 3NF may not be in BCNF. But, a relation in BCNF will definitely be in
3NF also, since BCNF is more restrictive as compared to 3 NF. Like in the above
example, SP is in 3NF but not in BCNF. However, its decompositions SP3, P, STS and
SUPP are all in BCNF; and are also in 3NF. Thus, BCNF is a stronger normal form than
3NF.
In fact, we can state that a relation schema in BCNF will be free of all those data
anomalies that can be eliminated on the basis of functional dependencies (FDs).
ABU’s Algorithm to determine whether a given Decomposition of a Relational Schema R is a
Loss-less-join Decomposition or not.
ABU’s Algorithm can be used to determine whether a Decomposition *(R1, R2, ….., Rn)
of Schema R is a loss-less-join decomposition or not.
Step 1 Make a matrix M of size n × m with column “j” corresponding to Attribute Aj (1 ≤
j ≤ m) and row “i” corresponding to a projection Ri (1 ≤ i ≤ n).
Step 2
for i := 1 to n do
for j := 1 to m do
if Aj ∈ Ri
then M [i, j] := aj;
else M [i, j] := bij;
Step 3
Repeat
Save-M = M;
For each FD (α → β) ∈ F
if any two Rows of Matrix M match on the values of α
then force those two rows to match on the values of β, by
replacing “b” values by corresponding “a” values.
(if corresponding “a” value does not exist for a pair of cells
to be matched, then replace both the cells by one of the
corresponding “b” values.)
Until (M = Save-M);
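The matrix procedure above can be sketched in Python as follows. This is an illustrative sketch under a few assumptions: cells are represented as tuples ("a", j) and ("b", i, j), rows and columns are numbered from 0, and the final test (the decomposition is loss-less if some row ends up with only "a" values) is taken from the worked example's last step. The schema, decomposition and FDs used here are the supplier-part example's.

```python
def is_lossless(attrs, decomposition, fds):
    """Matrix (tableau) test for a loss-less-join decomposition."""
    col = {a: j for j, a in enumerate(attrs)}
    # Build the matrix: a_j where the attribute is in Ri, else b_ij.
    m = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attrs)]
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:                        # Repeat ... Until M is unchanged
        changed = False
        for lhs, rhs in fds:
            for i in range(len(m)):
                for k in range(i + 1, len(m)):
                    if all(m[i][col[a]] == m[k][col[a]] for a in lhs):
                        for a in rhs:
                            c = col[a]
                            x, y = m[i][c], m[k][c]
                            if x != y:
                                keep = min(x, y)   # "a" sorts before "b"
                                drop = y if keep == x else x
                                for row in m:      # rename in whole column
                                    if row[c] == drop:
                                        row[c] = keep
                                changed = True
    return any(all(cell[0] == "a" for cell in row) for row in m)


attrs = ["S#", "Sname", "Scity", "Status", "P#", "Pname", "Price", "Qty"]
decomposition = [
    {"Scity", "Status"},          # CS
    {"S#", "Sname", "Scity"},     # SUPP
    {"P#", "Pname", "Price"},     # PART
    {"S#", "P#", "Qty"},          # SPN
]
fds = [
    (["S#"], ["Sname", "Scity"]),
    (["Scity"], ["Status"]),
    (["P#"], ["Pname", "Price"]),
    (["S#", "P#"], ["Qty"]),
]

print(is_lossless(attrs, decomposition, fds))
```

For this example the row for SPN ends up containing only "a" values, so the function reports a loss-less-join decomposition. Without any FDs the same decomposition would fail the test.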
Example:-
Decomposition:-
CS (Scity, Status)
SUPP(S#, Sname, Scity)
PART (P#, Pname, Price)
SPN (S#, P#, Qty)
S# → Sname, Scity
Scity → Status
P# → Pname, Price
{S#, P#} → Qty
Step 3:
Applying the FD Scity → Status, rows 0 and 1 match on the value of Scity, so force these
two rows to match on the value of Status. Thus replace b13 in row 1 by a3.
Now, applying the FD P# → Pname, Price, rows 2 and 3 match on the value of P#, so
force these two rows to match on the value of Pname and Price. Thus replace b35 in row 3
by a5 and replace b36 in row 3 by a6.
SPN 3 a0 b31 b32 b33 a4 a5 a6 a7
Now, applying the FD S# → Sname, Scity, Status, rows 1 and 3 match on the value of
S#, so force these two rows to match on the value of Sname, Scity and Status. Thus
replace b31 in row 3 by a1; replace b32 in row 3 by a2; and replace b33 in row 3 by a3.
Step 4 Row 3 now contains only “a” values; therefore the above decomposition is a
loss-less-join decomposition of SP.
Trivial MVD
Fagin’s Theorem
If a relation schema R (α, β, γ) has an MVD α →→ β | γ holding on it, then it can be
loss-less decomposed into R1 (α, β) and R2 (α, γ).
This rule implies that all non-trivial MVDs will occur only in pairs.
4. Augmentation Rule If α →→ β, then αγ →→ βδ where δ ⊆ γ
6. Coalescence Rule If α →→ β
and γ → δ, where δ ⊆ β and β ∩ γ = ∅,
then α → δ.
If α →→ β, α →→ γ, then α →→ (β − γ)
D = {A →→ B, B →→ HI, CG → H}
Since A →→ B, so by complementation A →→ (R – A – B)
i.e. A →→ CGHI
Since B →→ HI
and CG → H, where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, B → H
Problem Consider a Relational Schema R (A, B, C, D, E). Let the set of MVDs
holding on R be {A →→ BC, B →→ CD and E →→ AD}. Determine its loss-less 4NF
decomposition.
Solution
R (A, B, C, D, E)
M = {A →→ BC, B →→ CD, E →→ AD}
OR
R1 (E, A, D)
R2 (E, B, C)
BCNF to 5 NF
A Relation Schema is said to be in Boyce Codd Normal Form (BCNF), if all non-trivial
left-irreducible Functional Dependencies (FDs), holding on the schema, have only its
Candidate Keys as their Determinants. Any relation defined on such a schema will be
free of all those data anomalies that can be eliminated on the basis of FDs. However,
there may still be some residual data redundancies persisting in BCNF relations, causing
insert/delete/update anomalies. So, we have to look beyond FDs, for the elimination of
such anomalies in BCNF Schemas.
Let us define a Schema CTX (Course, Teacher, Text) with the following
constraints:-
(c) The set of Text Books followed for teaching a Course is determined only
by the Course taught and is completely independent of the Teacher
teaching it. Thus, the attributes Teacher and Text are completely
independent of each other.
There exists a one-to-many cardinality from Course to Teacher and also from Course to
Text, but there is absolutely no relationship between Teacher & Text. This situation
represents an MVD Course →→ Teacher | Text.
ctx:
COURSE TEACHER TEXT
OS Ravi Galvin
OS Vivek Dietel
OS Ravi Dietel
OS Vivek Galvin
CO Ram Hamacher
CO Shyam M-mano
CO Ram M-mano
CO Shyam Hamacher
As indicated in ctx, the schema CTX does not have any non-trivial FDs. Thus, it is an
“All-Key” schema and all legal relations under this schema will be in BCNF. But, this
relation still has the following data anomalies:-
(b) The information that a particular Text Book is followed for a particular
Course is represented as many times as the number of Teachers teaching the
particular Course.
These anomalies are due to the non-trivial Multi Valued Dependencies (MVDs)
holding on the schema CTX.
Let there be a relation schema R (α, β, γ). It is said to have an MVD from α to β (denoted as
α →→ β) and from α to γ (denoted as α →→ γ), if and only if for every legal relation r(R)
it satisfies the following:-
(a) The set of β-values, matching a given {α-value, γ-value} pair, is
dependent only on the α-value and is completely independent of the γ-value.
And
(b) The set of γ-values, matching a given {α-value, β-value} pair, is
dependent only on the α-value and is independent of the β-value.
Alternately, we can state that a relation schema R (α, β, γ) is said to have multi-valued
dependencies α →→ β and α →→ γ (both denoted by α →→ β | γ), if and only if for every
legal relation r(R), and for a tuple pair {t1, t2} ∈ r with t1[α] = t2[α], there exists a tuple-pair
{t3, t4} ∈ r, which satisfy:-
As per this definition, the relation ctx satisfies the MVD Course →→ Teacher | Text
Since, {OS, Galvin} → {Ravi, Vivek} and {OS, Dietel} → {Ravi, Vivek}
So, the set of Teachers, teaching a particular Course, is dependent only on the Course
taught and is completely independent of the Texts followed for the particular Course.
Similarly, {OS, Ravi} → {Galvin, Dietel} and {OS, Vivek} → {Galvin, Dietel}
So, the set of Texts, followed for a Course, depends only on the Course taught and is
independent of the Teacher teaching it.
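The independence condition can be checked on a relation instance: within each Course, the (Teacher, Text) pairs must be exactly the cross product of the Teachers and the Texts of that Course. The sketch below is an illustration for a three-attribute relation; the function name mvd_holds is an assumption, and the rows are those of ctx.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows):
    """Check alpha ->> beta | gamma on a set of (alpha, beta, gamma) rows."""
    groups = defaultdict(set)
    for alpha, beta, gamma in rows:
        groups[alpha].add((beta, gamma))
    for pairs in groups.values():
        betas = {b for b, _ in pairs}
        gammas = {g for _, g in pairs}
        # Every beta must be combined with every gamma of the same alpha.
        if pairs != set(product(betas, gammas)):
            return False
    return True


ctx = {
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"),
    ("OS", "Ravi", "Dietel"), ("OS", "Vivek", "Galvin"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"),
    ("CO", "Ram", "M-mano"), ("CO", "Shyam", "Hamacher"),
}

print(mvd_holds(ctx))
print(mvd_holds(ctx - {("OS", "Ravi", "Dietel")}))
```

The full relation satisfies the MVD. Removing a single tuple breaks it, because Ravi would then no longer be paired with every Text followed for OS.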
Trivial MVD An MVD α →→ β, holding on a relation schema R, is said to be trivial, if:-
(a) β ⊆ α or
(b) α ∪ β = R
Such MVDs are termed trivial, since these are satisfied by every relation on a
schema R.
A relation schema R is said to be in 4 NF, if and only if every MVD α →→ β holding on
R satisfies at least one of the following two conditions:-
(a) It is a trivial MVD or
(b) α is a Super Key of R
Now, the relation CTX has the MVDs Course →→ Teacher and Course →→
Text. These are not trivial MVDs. Also, Course is not a Super Key of CTX, since CTX is
an “All Key” Relation Schema. Thus, CTX is in BCNF; but not in 4 NF.
As per Fagin’s Theorem, a relation schema R (α, β, γ) satisfying MVDs α →→ β | γ can be
loss-less decomposed into schemas R1 (α, β) and R2 (α, γ). So, CTX can be decomposed
into CT and CX.
ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Ram
CO Shyam
cx
COURSE TEXT
OS Galvin
OS Dietel
CO Hamacher
CO M-mano
As evident, ct * cx = ctx
The relations ct and cx are not satisfying any non-trivial MVDs; thus both CT & CX are
in 4 NF.
The relations are free of the data redundancies indicated above, which existed in ctx. The
information of a Teacher teaching a particular Course is now represented only in one
tuple in ct and the information regarding a Text being followed for a particular Course is
represented only at one place in cx.
A relation schema R in BCNF, not having any non-trivial MVDs holding on it, will be in
4 NF. But, it may still have some data anomalies, for example consider a schema CTX4,
with the following constraints:-
(c) The set of Texts, followed for a Course, depend not only on the Course
but also on the Teacher teaching it. It means that each teacher teaching a
particular course may follow different sets of text books; the sets may be
overlapping.
(d) If a Teacher T1, teaching a Course C1, does not follow a Text X1, which is
being followed by another Teacher T2 to teach the course C1, then T1 must not
follow X1 for any other Course, which he may be teaching.
ctx4
So, the set of TEXT-values that occur matching a given {COURSE-value, TEACHER-value}
pair in ctx4 depends not only on COURSE but also on TEACHER. So, it does not
satisfy the MVD COURSE →→ TEACHER | TEXT. So, the schema CTX4 does not have
any non-trivial MVDs holding on it. So, it is in 4 NF. But ctx4 still has some data
redundancies, like the information about a teacher teaching a course appears as many
times as the number of text books followed by that teacher for that Course.
Since CTX4 does not satisfy the MVD COURSE →→ TEACHER | TEXT, it can be verified
that ct * cx ≠ ctx4
ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Ram
CO Shyam
cx
COURSE TEXT
OS Galvin
OS Dietel
CO Hamacher
CO M-mano
ct * cx
COURSE TEACHER TEXT
OS Ravi Galvin
OS Ravi Dietel
OS Vivek Galvin
OS Vivek Dietel
CO Ram Hamacher
CO Ram M-mano
CO Shyam M-mano
CO Shyam Hamacher
As verified above, ct * cx ≠ ctx4. ct * cx has two spurious tuples, which do not exist in
ctx4. So, the decomposition of CTX4 into CT and CX is not a loss-less (non-additive)
decomposition.
It may be feasible to eliminate these data redundancies of ctx4, on the basis of another
type of dependency, called Join Dependency (JD).
Example (Employee-Project-Department)
A Sample Database, created on a schema with the above constraints, will be:-
EPD
E# P# D#
E1 P1 D3
E1 P2 D1
E1 P1 D1
E1 P2 D3
E4 P3 D2
E4 P1 D3
E4 P3 D3
E4 P1 D2
The above table does not have any non-trivial FD; and is thus in BCNF. However, it
has a lot of data redundancies; like the fact that Employee E1 is working on project P1
is reflected in two tuples. Similarly, there are many redundancies.
This implies that the set {D3, D1}is determined by E1 alone and does not change when
P# is changed from P1 to P2.
However when E# is changed from E1 to E4, the set of D#s changes as indicated
below:-
{E4, P3} → {D2, D3}
{E4, P1} → {D2, D3}
Thus, the schema for this table satisfies E# →→ P# and E# →→ D#. This pair of
MVDs is non-trivial. Thus, EPD is not in 4NF. It can be loss-less decomposed into
EP (E#, P#) and ED (E#, D#) as shown below:-
EP
E# P#
E1 P1
E1 P2
E4 P3
E4 P1
ED
E# D#
E1 D3
E1 D1
E4 D2
E4 D3
It can be verified that EP*ED = EPD. Also, both EP and ED are free of the data
redundancies. Both EP and ED do not have any non-trivial MVD and are thus in 4NF.
An MVD is also a JD
An MVD α →→ β | γ on a relation schema R (α, β, γ) is also a JD *(αβ, αγ). This implies that a
legal relation r(R) can be loss-less decomposed into its projections on αβ and αγ i.e.
r = π_αβ(r) * π_αγ(r)
The relation schema CTX4 has a Join Dependency * (CT, TX, XC) where C: Course, T:
Teacher and X: Text , which can be verified as follows:-
ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Ram
CO Shyam
tx
TEACHER TEXT
Ravi Galvin
Vivek Dietel
Vivek Galvin
Ram Hamacher
Shyam M-mano
Shyam Hamacher
xc
TEXT COURSE
Galvin OS
Dietel OS
Hamacher CO
M-mano CO
ct * tx * xc
Thus, ct * tx * xc = ctx4, and so CTX4 has a Join Dependency *(CT, TX, XC).
So, CTX4 can be loss-less decomposed into its projections on CT, TX and XC, which are
free of any data redundancies that existed in the relation ctx4.
Non-Trivial JD
A JD *(R1, R2, …, Rn) of relation schema R is said to be trivial iff one of the projections
in the JD is equal to R itself. Such a JD holds on every schema. Any other JD is non-trivial.
A relation schema R is said to be in 5 NF, if and only if any non-trivial Join Dependency
holding on R, is implied by its Candidate Keys.
The relation CTX4 has a JD *(CT,TX,XC) which is not implied by its Candidate Key
{C,T,X}. Thus, CTX4 is in 4 NF but not in 5 NF.
The relations CT, TX and XC do not have any non-trivial Join Dependencies, and are
thus in 5 NF.
Assuming that the Text Books followed for a Course are dependent not only on the
Course but also on the Teacher teaching it, Join Dependency will hold on CTX4,
only if the following is satisfied:-
“If a Teacher T1, teaching a Course C1, does not follow a Text Book X1, which is
being followed by another Teacher teaching Course C1, then T1 must not follow X1
for any other Course that he may be teaching.” Only then will the JD *(CT, TX, XC)
hold on CTX4.
3. A Teacher T1, teaching a Course C1, may not follow a Text Book
X1, which is being followed by another teacher teaching the
Course C1, but T1 may follow X1 while teaching another Course
say C2.
ctx5
COURSE TEACHER TEXT
OS Ravi Galvin
OS Vivek Milan
OS Ravi Milan
CO Vivek Hamacher
CO Ram Hamacher
CO Vivek Galvin
ct
COURSE TEACHER
OS Ravi
OS Vivek
CO Vivek
CO Ram
tx
TEACHER TEXT
Ravi Galvin
Vivek Milan
Ravi Milan
Vivek Hamacher
Ram Hamacher
Vivek Galvin
xc
TEXT COURSE
Galvin OS
Milan OS
Hamacher CO
Galvin CO
ct * tx * xc
ct * tx * xc has an additional tuple i.e. (OS, Vivek, Galvin), which does not exist in ctx5.
Thus, decomposition of CTX5 into its projections on CT, TX and XC is not a loss-less
(NON-ADDITIVE) decomposition: the natural join of ct, tx and xc contains a
spurious tuple. Thus, CTX5 does not have any non-trivial JD and, thus, it is in 5 NF.
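The three-way join can be recomputed in a few lines of Python to confirm the spurious tuple. This is a small sketch using the rows of ctx5 as listed above; the variable names are chosen for illustration.

```python
# Rows of ctx5 as (COURSE, TEACHER, TEXT).
ctx5 = {
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Milan"),
    ("OS", "Ravi", "Milan"), ("CO", "Vivek", "Hamacher"),
    ("CO", "Ram", "Hamacher"), ("CO", "Vivek", "Galvin"),
}

# The three binary projections.
ct = {(c, t) for c, t, x in ctx5}
tx = {(t, x) for c, t, x in ctx5}
xc = {(x, c) for c, t, x in ctx5}

# Natural join ct * tx * xc: a tuple (c, t, x) qualifies when each of
# its three pairs appears in the corresponding projection.
joined = {
    (c, t, x)
    for (c, t) in ct
    for (t2, x) in tx
    if t == t2 and (x, c) in xc
}

spurious = joined - ctx5
print(len(joined), spurious)
```

The join yields 7 tuples against the 6 of ctx5; the extra one is (OS, Vivek, Galvin), so the JD *(CT, TX, XC) is not satisfied by ctx5.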
CTX5, though in 5 NF, still has some data anomalies; like information that ‘Galvin is the
text-book for OS’ is represented twice. These data anomalies cannot be eliminated on the
basis of Functional Dependencies, Multi-Valued Dependencies or Join Dependencies.
What Next?
A Relation Schema R will be in 6NF if the only Join Dependencies holding on R are
trivial Join Dependencies.
AN → BN
We can conclude that a Schema in 6NF can comprise only its Primary Key and at
most one non-key attribute.
Since each of these schemas contains Primary Key plus only one non-key
attribute, thus both are in 6NF.
The SALARY and PROJ_NO (Project on which he works) will keep on changing.
Suppose, we want to record the information of durations during which different
values of SALARY were valid and the durations during which different values of
PROJ_NO were valid for each employee, then the schema would need to be
decomposed as follows:-
6NF is most suitable for Temporal Databases, which contain time-element. For example
if we want to introduce time element in ACCOUNT, to indicate the Time and Date when
BAL is valid, then the schema will be:-
ACCOUNT (AN, DATE, TIME, BN, BAL)
{AN, TIME, DATE} → BAL
AN → BN
(ii) Attribute Correspondence: If R.X < S.Y, where X = {A1, A2, ….., An} and
Y = {B1, B2, ….., Bn} and Ai corresponds to Bi for 1 ≤ i ≤ n, then
R.Ai < S.Bi holds for all i.
(iii) Transitivity: If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.
A B C
1 2 3
4 2 3
5 3 3
Which of the following is true? (a) A → B (b) BC → A (c) B → C
Sol:
(a) A → B holds (since no two tuples match on the value of A, there is no
pair of tuples that is required to match on the value of B).
(b) BC → A does not hold (since in tuples 1 & 2, the value of BC is
matching but that of A is different).
(c) B → C holds (since the value of B is matching only in tuples 1 & 2, and
the value of C is also matching in these two tuples).
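The same check can be run mechanically on the relation instance. This is a minimal Python sketch; the helper name `fd_holds` and the list-of-dicts representation are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on the given rows.

    rows is a list of dicts; lhs and rhs are strings of
    single-letter attribute names, e.g. "BC".
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows agreeing on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 2, "C": 3},
    {"A": 5, "B": 3, "C": 3},
]

for lhs, rhs in [("A", "B"), ("BC", "A"), ("B", "C")]:
    print(lhs, "->", rhs, fd_holds(rows, lhs, rhs))
```

Note that such a check only shows that an FD is not violated by this particular instance; it cannot prove that the FD holds on the schema.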
= ACDFBEG since EF → G
5. Find the Closures (excluding trivial FDs) and Minimal Covers of the
following FD Sets. Are the sets equivalent?
F: A → B, AB → C, D → AC, D → E
G: A → BC, D → AE
Sol:
Minimal Cover of F
Step 1 Apply decomposition rule to each FD in F, such that FDs have only
singletons as dependents.
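Fc : {A → B, AB → C, D → A, D → C, D → E}.
Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs
Since A → B, by augmentation A → AB.
Since A → AB and AB → C, by transitivity A → C.
Thus B is extraneous in FD AB → C, and may be eliminated.
So, Fc : {A → B, A → C, D → A, D → C, D → E}.
Step 3 Eliminate FDs in Fc which are logically implied by other FDs in Fc
D → C is implied by D → A and A → C, so it may be eliminated.
So, Fc : {A → B, A → C, D → A, D → E}.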
Minimal Cover of G
Step 1 Apply decomposition rule to each FD in G, such that FDs have only
singletons as dependents.
Gc : {A → B, A → C, D → A, D → E}.
Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs
Since all determinants are singletons, Gc remains unchanged.
Gc : {A → B, A → C, D → A, D → E}.
Step 3 Eliminate FDs in Gc which are logically implied by other FDs in Gc
There is no such FD.
Gc : {A → B, A → C, D → A, D → E}.
This is equivalent to : {A → BC, D → AE}.
Since Minimal Cover of F = Minimal Cover of G, these are equivalent sets.
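Equivalence of the two FD sets can also be confirmed directly with attribute closures, without computing minimal covers. The sketch below is in Python; the function names (`closure`, `covers`, `equivalent`) and the encoding of each FD as a pair of attribute strings are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g is logically implied by f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
G = [("A", "BC"), ("D", "AE")]

print(equivalent(F, G))
```

F covers G because {A}+ and {D}+ under F contain BC and AE respectively; the converse check works the same way.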
Sol:
Step 1 Apply decomposition rule to each FD in F, such that FDs have only
singletons as dependents.
Fc : {A → B, A → C, B → C, B → A, C → A, C → B}
Step 2 Eliminate extraneous attributes, if any, from the determinants of all FDs
Since all determinants are singletons, Fc remains unchanged.
Fc : {A → B, A → C, B → C, B → A, C → A, C → B}
Step 3 Eliminate those FDs in Fc which are logically implied by other FDs in Fc
So, Fc = {A → C, B → A, C → B}
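The three steps can be sketched in Python as follows. This is an illustrative implementation, not a definitive one: the names `closure` and `minimal_cover` are assumptions, and the cover returned depends on the order in which FDs are examined, so a different order may give a different but equivalent cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: decompose so that every dependent is a single attribute.
    cover = [(frozenset(lhs), r) for lhs, rhs in fds for r in rhs]

    # Step 2: drop extraneous attributes from each determinant.
    reduced = []
    for lhs, r in cover:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and r in closure(lhs - {a}, cover):
                lhs.discard(a)
        reduced.append((frozenset(lhs), r))
    cover = list(dict.fromkeys(reduced))  # remove duplicates, keep order

    # Step 3: drop FDs implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


F = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"), ("C", "A"), ("C", "B")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", rhs)
```

With the FDs examined in the order listed, this sketch arrives at the cover shown above.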
All these Canonical Covers are equivalent sets, since their closure is same.
Sol:
(a) {E}+ = EA Since E → A holds
= EABC Since A → BC holds
= EABCD Since B → D holds
{ A → BC, CD → E, B → D, E → A, A → B, A → C, E → B,
E → C, CD → A, CD → B, …….}
Sol:
Candidate keys: {A,B}, {B,C}
So, the candidate keys are:- {E}, {A}, {B,C}, {C,D}
All attributes are prime attributes.
The relation is in 3 NF
10. Define 1 NF, 2 NF, 3 NF and BCNF. Out of 3 NF & BCNF, which one is
stronger Normal Form? Justify.
Sol:
1 NF: A relation schema R is in 1 NF, if all its attributes have atomic domains
BCNF Decomposition
R1 (C, E, H)
CE → H
R2(A, B,C,D,E,F,G) All FDs have only candidate keys on left side
12. Consider a Relation Schema R (A,B,C,D,E) with the following set of FDs
holding on it:-
A → B, C → D, D → E
Determine:-
(a) Its Candidate Keys.
(b) What Normal Form is it in?
(c) Decompose it into a BCNF Schema, if it is not already in BCNF.
R2 (A, B, C, D) PK (A, C)
A → B, C → D
R2 has partial FDs A → B, C → D
It is in 1NF only
C → D
R222 (A,C) PK (A, C) BCNF
R1 (D, E) PK (D)
D → E
R1 (A, B) PK (A)
A → B
R2 (A, C, D, E) {A,C,E} and {A,D,E} are candidate keys
FK (A) references R1
(a) Is it in 3NF?
(b) Is it in BCNF?
Sol:
{A,C,D}+ = ACDBE
{E,D,C}+ = EDCAB
{B,C,D}+ = BCDEA
R has FDs A → B, BC → E, ED → A that do not have candidate keys on their
left side. Therefore, R is not in BCNF.
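A BCNF check of this kind can be automated: an FD violates BCNF if it is non-trivial and the closure of its determinant is not the whole schema. The Python sketch below assumes R (A, B, C, D, E) and reads the FDs of this problem as A → B, BC → E, ED → A (an interpretation consistent with the three closures listed above); the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # skip trivial FDs
        and closure(lhs, fds) != set(attrs)    # determinant is not a superkey
    ]


R = "ABCDE"
FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]

print(bcnf_violations(R, FDS))
```

An empty result would mean that every given FD has a superkey as its determinant, i.e. the schema is in BCNF with respect to these FDs.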
For the FDs as indicated with each sub-schema, determine the Normal
Form of each sub-schema. If it is not in BCNF, decompose it into BCNF.
BCNF Decomposition of R
R1 (C,D,A ) PK (C)
C → DA
R2 (B, C) PK (B)
B → C FK(C) references R1
R1 (B, C ) PK (B)
B → C
R2 (D, A) PK (D)
D → A
R3 (B, D) PK (B, D)
FK (B) references R1
FK (D) references R2
R1 (B, C, D ) PK (B, C)
BC → D
R2 (A, B, C) PK (A)
A → BC FK (B,C) references R1
Candidate Keys : {A, B} , {C,D}
All are prime attributes; So R is at least in 3 NF
There are two FDs C → A, D → B which do not have candidate keys as
determinants; therefore it is not in BCNF.
BCNF Decomposition of R
R1 (C, A ) PK (C)
C → A
R2 (D, B) PK (D)
D → B
R3 (C, D) PK (C, D)
FK (C) references R1
FK (D) references R2
R1 (D, A ) PK (D)
D → A
R2 (B, C, D) PK (B, C, D)
No non-trivial FD FK (D) references R1
(a) R1 (A,B,C)
(b) R2( A,B,C,D)
(c) R3(A,B,C,E,G)
(d) R4(D,C,E,G,H)
(e) R5(A,C,E,H)
Sol:
(a) R1 (A,B,C)
FDs holding on the schema: AB → C, AC → B, BC → A
Minimal Cover: AB → C, AC → B, BC → A
Candidate Keys: {A,B} , {B,C}, {C,A}
Strongest Normal Form: Since all FDs have only candidate keys as
determinants, R1 is already in BCNF
(b) R2 (A,B,C,D)
FDs holding on the schema: AB → C, AC → B, BC → A, B → D
Minimal Cover: AB → C, AC → B, BC → A, B → D
Candidate Keys: {A,B} , {B,C}, {C,A}
Strongest Normal Form: It has a partial FD B → D, so R2 is in 1 NF
BCNF Decomposition:-
(c) R3 (A,B,C,E, G)
FDs holding on the schema: AB → C, AC → B, BC → A, E → G
Minimal Cover: AB → C, AC → B, BC → A, E → G
Candidate Keys: {A,B,E} , {B,C,E}, {C,A,E}
Strongest Normal Form: It has a partial FD E → G, so R3 is in 1 NF
BCNF Decomposition:-
R31 (E, G) PK (E)
E → G
R32 (A,B,C,E) PK {A,B,E} or {B,C,E} or {C,A,E}
AB → C, AC → B, BC → A FK (E) references R31
Since R32 has FDs which do not have a candidate key on the left side,
it is not in BCNF
Decompose R32 as follows:-
R321 (A,B,C) PK {A,B} or {B,C} or {C,A}
AB → C, AC → B, BC → A
R322 (A,B,E) PK {A,B,E}
FK {A,B} references R321
FK (E) references R31
So, BCNF decomposition: R31(E,G), R321 (A,B,C), R322 (A,B,E)
(d) R4 (D,C,E,G,H)
Candidate Keys: {D, C, E, H}
Strongest Normal Form: It has a partial FD E → G, so R4 is in 1 NF
BCNF Decomposition:-
R41 (E, G) PK (E)
E → G
(e) R5 (A, C, E, H)
FDs holding on the schema:
Minimal Cover: { AC → E }
Candidate Keys: {A, C, H}
Strongest Normal Form: 1 NF, since it has partial FD AC → E
BCNF Decomposition:-
R51 (A, C, E) PK (A, C )
AC → E
Sol: R (A,B,C,D,E,G)
AB → C, AC → B, AD → E, B → D, BC → A, E → G
{A,C}+ = ACBDEG
{B,C}+ = BCADEG
{A,B}+ = ABCDEG
Candidate Keys : {A, B} , {B,C}, {C,A}
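The candidate keys can be cross-checked by brute force: try attribute subsets in increasing size and keep those whose closure is the whole schema and that contain no smaller key. The Python sketch below assumes the FDs of this problem are AB → C, AC → B, AD → E, B → D, BC → A, E → G (consistent with the closures computed above); the function names are illustrative, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            # Skip supersets of a key already found (not minimal).
            if any(key <= subset for key in keys):
                continue
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys


R = "ABCDEG"
FDS = [("AB", "C"), ("AC", "B"), ("AD", "E"),
       ("B", "D"), ("BC", "A"), ("E", "G")]

print([sorted(k) for k in candidate_keys(R, FDS)])
```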
19. Suppose R(A,B,C) has an FD B → C holding on it and A is its Candidate
Key. Can it be in BCNF? If yes, under what conditions?
20. For R (A,B,C,D) with Primary Key {A,B}, state conditions for R to be in 2NF but
not in 3NF.
Sol:
R will be in 2 NF, but not in 3NF iff
AB → C & C → D hold
or AB → D & D → C hold
*(BAC, BDE)
*(CAB,CDE)
*(ABCD, AE)
*(BCDA, BE)
*(CABD, CE)
and so on ……….
D = { A →→ B, B →→ HI, CG → H}
Find whether the following are members of D+ :-
A →→ CGHI
A →→ HI
B → H
A →→ CG
A → H
Sol:
Since A →→ B, so by complementation A →→ (R – A – B)
i.e. A →→ CGHI
Since B →→ HI
and CG → H where H ⊆ HI and HI ∩ CG = ∅
Therefore, by Coalescence Rule, B → H
(i) R1 = (V, W, X)
R2 = (V, Y, Z)
(ii) R1 = (V, W, X)
R2 = (X, Y, Z)
Sol:
(i) Since V → WX, V is a Candidate Key of R1
Now, R1 ∩ R2 = {V}, which is a candidate key of R1
So it is a loss-less-join decomposition of R.
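The test used here (the common attributes of R1 and R2 must functionally determine all of R1 or all of R2) can be sketched in Python. Only the FD V → WX quoted in the solution is assumed below; the full FD set of the problem is not reproduced in this sketch, so the result printed for decomposition (ii) holds only under that assumption. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary loss-less-join test: (R1 n R2) -> R1 or (R1 n R2) -> R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


# Assumption: only the FD V -> WX from the solution text is supplied.
FDS = [("V", "WX")]

print(lossless_binary("VWX", "VYZ", FDS))  # decomposition (i)
print(lossless_binary("VWX", "XYZ", FDS))  # decomposition (ii)
```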
Q. 2.
Given Schema R = (A, B, C, D, E, F, G, H, I, J) and FDs
F: {AB → C, A → DE, B → F, F → GH and D → IJ}
A → DEIJ
B → FGH
R1 (A, D, E, I, J)
A → DE, D → IJ
R2 (B, F, G, H)
B → F, F → GH
R3 (A, B, C)
AB → C
R11 (D, I, J)
D → IJ
R12 (A, D, E)
A → DE
Similarly, we can determine that {E}, {C,D} and {B,C} are other
Candidate Keys of R.
sol:-
Since A → BC, A is a Candidate Key of (A, B, C)
R1 (A, B, C)
R2 (A, D, E)
The only MVD that holds on R1 is A →→ BC.
Since A ∪ BC = R1, it is a trivial MVD and thus R1 is in 4 NF.
R1 (B, C, D)
R2 (B, A, E) on the basis of MVD B →→ CD
and
R1 (E, A, D)
R2 (E, B, C) on the basis of MVD E →→ AD
Case Studies
Now determine the following for schema Bank_Account:-