DBMS Notes BE sem 5wbeb

M Tech (Birla Institute of Technology, Mesra)



Downloaded by Sangam Srivastav ([email protected])

Semester : V

Academic Year : 2020 (MO-20)

Course : DATABASE MANAGEMENT SYSTEM

Attendance : Minimum 75%, failing which students may not be allowed to appear in the semester examinations


Disclaimer Statement

"This course material booklet contains information compiled from a variety of sources, including standard textbooks and electronic resources, for the academic benefit of students, to be used by them only as a complement to classroom lectures. Citations of references to the text are made wherever possible. These notes are not meant for any commercial purpose and are solely meant for internal circulation."

Syllabus : DATABASE MANAGEMENT SYSTEM


Module – I
Introduction: Purpose of Database System; View of Data, Data Models, Database Languages, Transaction Management, Database Architecture, Database Users and Administrators
Database Design and Entity-Relationship Model: Overview of Design Process, E-R Model, Constraints, E-R Diagram, Weak Entity Sets, Extended E-R Features

Module – II
Relational Model: Structure of Relational Databases, Fundamental Relational-Algebra Operations, Additional Operations, Tuple Relational Calculus

Module – III
SQL & Advanced SQL: Data Definition, Basic Structure of SQL Queries, Set Operations, Aggregate Functions, Null Values, Nested Subqueries, Complex Queries, Views, Modification of the Database, SQL Data Types & Schemas, Integrity Constraints, Authorization, Embedded SQL

Module – IV
Relational Database Design: Atomic Domains and First Normal Form, Decomposition Using Functional Dependencies, Decomposition Using Multivalued Dependencies, More Normal Forms

Module – V
Indexing and Hashing: Basic Concepts, Ordered Indices, B+ Tree Index Files, B Tree
Index Files, Multiple Key Access, Hashing, Comparison of Ordered Indexing and
Hashing

Module – VI

Query Processing: Overview, Measures of Query Cost, Selection Operation, Sorting, Join Operations

Module – VII
Transaction & Concurrency Control: Transaction Concepts & ACID Properties, Transaction States, Concurrent Executions, Serializability & Its Testing, Recoverability, Introduction to Concurrency Control, Lock-Based Protocols & Deadlock Handling, Timestamp-Based Protocols, Validation-Based Protocols, Multiple Granularity.
Text Book:
1. A. Silberschatz et al. - Database System Concepts, 5th Edn, Tata McGraw-Hill, New Delhi – 2000.
Reference Books:
1. C. J. Date - An Introduction to Database Systems, Pearson Education, New Delhi – 2005.
2. R. Elmasri - Fundamentals of Database Systems, Pearson Education, New Delhi,


COURSE PLAN

Department : Computer Science
Subject : DATABASE MANAGEMENT SYSTEM
Semester & branch : III
No. of period hours/week : 3    Theory: yes    Labs:
Total No. of lectures : 40

Recommended Course Books

Text Book:
1. A. Silberschatz et al. - Database System Concepts, 5th Edn, Tata McGraw-Hill, New Delhi – 2000.
Reference Books:
1. C. J. Date - An Introduction to Database Systems, Pearson Education, New Delhi
2. R. Elmasri - Fundamentals of Database Systems, Pearson Education, New Delhi,

Lecture No.  Topic(s) to be covered
1   Introduction: Purpose of Database System
2   View of Data
3   Data Models
4   Data Models
5   Database Languages
6   Transaction Management
7   Storage Management


8   Database Users and Administrators
9   History of Database Systems
10  Relational Model: Basic Concepts
11  Design Issues, Mapping Constraints
12  Keys, E-R Diagram
13  Keys, E-R Diagram
14  Weak Entity Sets, Extended E-R Features
15  Weak Entity Sets, Extended E-R Features
16  Design of an E-R Database Schema
17  Reduction of an E-R Schema to Tables
18  Reduction of an E-R Schema to Tables
19  Integrity Constraints: Domain Constraints, Referential Integrity, Assertions, Triggers & Functional Dependencies
    Relational Database Design: Pitfalls in Relational-Database Design, Functional Dependencies, Decomposition, Desirable Properties of Decomposition, Normalization (1NF to DKNF), BCNF & Its Comparison with 3NF
20  Assertions, Triggers
21  Functional Dependencies
22  Pitfalls in Relational-Database Design
23  Decomposition
24  Desirable Properties of Decomposition
25  Normalization
26  Normalization
27  BCNF & Its Comparison with 3NF


28  BCNF & Its Comparison with 3NF
29  Query Processing: Measures of Query Cost
30  Evaluation of Expressions
31  Evaluation of Expressions
32  Selection Operation
33  Transaction Concepts & ACID Properties
34  Transaction States, Concurrent Executions
35  Serializability & Its Testing, Guaranteeing Serializability
36  Recoverability, Introduction to Concurrency Control
37  Lock-Based Protocols & Deadlock Handling
38  SQL & Other Relational Languages
39  Relational Model
40  Relational Model

Test Coverage :
Test1:
Introduction: Purpose of Database System; View of Data, Data Models, Database Languages, Transaction Management, Storage Management, Database Users and Administrators, History of Database Systems.
Database Design and Entity-Relationship Model: Basic Concepts, Design Issues, Mapping Constraints, Keys, E-R Diagram, Weak Entity Sets, Extended E-R Features, Design of an E-R Database Schema, Reduction of an E-R Schema to Tables.

Test2:
Integrity Constraints: Domain Constraints, Referential Integrity, Assertions, Triggers & Functional Dependencies.
Relational Database Design: Pitfalls in Relational-Database Design, Functional Dependencies, Decomposition, Desirable Properties of Decomposition, Normalization (1NF to DKNF), BCNF & Its Comparison with 3NF.


Outcome/Benefits of the course:

Students will understand the internal details of relational algebra and will be able to write efficient queries based on the underlying evaluation algorithms. The course helps students in developing software and shows them how to avoid deadlock in real-life problems such as banking systems and airline management systems. It covers the concepts of transaction processing and the operations relevant to it, the types of failures that may occur during transaction execution, concurrency control, and distributed and centralized databases. Students will be able to design and build a relational database for a small business application from given user requirements.
This course presents problem-solving logic and skills for analyzing and developing a database. Topics covered include database design, administration and application development. The course is designed to give students the understanding necessary to work efficiently within the database environment and to optimize a database for commercial operation.


Unit I
Introduction: Purpose of Database System; View of Data, Data Models, Database Languages, Transaction Management, Storage Management, Database Users and Administrators, History of Database Systems.


DATABASE SYSTEM CONCEPTS AND ARCHITECTURE

Database Management System (DBMS):

A DBMS (Database Management System) consists of a collection of interrelated data and a collection of programs to access that data.
Database:
A collection of interrelated data, stored in files and accessed through programs.
Why a Database (Needs and Benefits) – An enterprise chooses to store its operational data in an integrated database because, broadly, a DBS provides the enterprise with centralized control of its operational data, and data is one of its most valuable assets.
Benefits of the database approach:
• Redundancy can be reduced: If the DBA is aware of the data requirements of the applications, redundancy can be controlled to some extent, though not completely eliminated. Redundancy results in wasted storage space.
• Inconsistency can be avoided: For example, suppose the fact that employee EMP3 works in department DEP9 is represented by two distinct entries in the DB, and the system is not aware of this duplication; in other words, the redundancy is not controlled. Then there will be occasions on which the two entries do not agree, namely when one and only one of them has been updated. At such times the DB is said to be inconsistent. A DB that is in an inconsistent state is capable of supplying incorrect or conflicting information.


• The data can be shared: It may be possible to satisfy the data requirements of new applications without having to create any additional stored data.
• Standards can be enforced: With central control of the DB, the DBA can ensure that all applicable standards are observed in the representation of data. Applicable standards might include any or all of the following: corporate, installation, departmental, industry, national and international standards. Standardizing data representation is particularly desirable as an aid to data interchange, or migration of data between systems.
• Security restrictions can be applied: The DBA can 1) define security rules to be checked whenever access to sensitive data is attempted, and 2) ensure that the only means of access to the DB is through the proper channels.
• Integrity can be maintained: Integrity is the problem of ensuring that the data in the database is correct and accurate. Uncontrolled redundancy in the stored data can lead to inconsistency and a lack of integrity, and the DB may then become incorrect.
• Conflicting requirements can be balanced: Knowing the overall requirements of the enterprise, the DBA can structure the system so as to provide an overall service that is ‘best for the enterprise’.
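The EMP3/DEP9 inconsistency scenario can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the table names payroll_emp and project_emp are hypothetical, chosen only so that the same fact is stored twice without the system knowing the two copies are related.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the fact "EMP3 works in DEP9" is stored twice
# (uncontrolled redundancy), once per application.
conn.executescript("""
    CREATE TABLE payroll_emp (emp_id TEXT, dept_id TEXT);
    CREATE TABLE project_emp (emp_id TEXT, dept_id TEXT);
    INSERT INTO payroll_emp VALUES ('EMP3', 'DEP9');
    INSERT INTO project_emp VALUES ('EMP3', 'DEP9');
""")

# EMP3 moves to DEP5, but only one of the two copies is updated.
conn.execute("UPDATE payroll_emp SET dept_id = 'DEP5' WHERE emp_id = 'EMP3'")

def departments_of(conn, emp_id):
    """Collect every distinct department recorded for emp_id in either copy."""
    rows = conn.execute(
        "SELECT dept_id FROM payroll_emp WHERE emp_id = ? "
        "UNION SELECT dept_id FROM project_emp WHERE emp_id = ?",
        (emp_id, emp_id),
    ).fetchall()
    return sorted(r[0] for r in rows)

print(departments_of(conn, "EMP3"))  # ['DEP5', 'DEP9']: conflicting answers
```

Storing the fact once, and having every application read that single copy, makes this partial update impossible; that is what controlling redundancy buys.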


Data Independence:
Data independence can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
For example, the following may change at a lower level:
• the method of representation of alphanumeric data (e.g., changing the date format to avoid the Y2000 problem)
• the method of representation of numeric data (e.g., integer vs. long integer or floating-point)
• units (e.g., metric vs. furlongs)
We can define two types of data independence:
Logical Data Independence:
The capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), or to reduce the database (by removing a record type or data items).
Physical Data Independence:
The capacity to change the internal schema without having to change the conceptual or external schemas; for example, by creating additional access structures to improve the performance of retrieval or update.
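Both kinds of data independence can be observed with a small experiment. In this sketch (Python's sqlite3 as a stand-in DBMS; the employee table, the dept_d1 view and the sample rows are hypothetical), an application reads only through a view, which plays the role of its external schema. Adding an index is a physical (internal-level) change, and adding a column is a logical (conceptual-level) change; the application's query text and its result stay the same in both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no TEXT PRIMARY KEY, dept_no TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("E1", "D1", 500), ("E2", "D2", 700), ("E3", "D1", 650)])

# External schema: the application only ever sees this view.
conn.execute("CREATE VIEW dept_d1 AS "
             "SELECT emp_no, salary FROM employee WHERE dept_no = 'D1'")
APP_QUERY = "SELECT emp_no, salary FROM dept_d1 ORDER BY emp_no"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: a new access structure (index) on the stored table.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_no)")
after_index = conn.execute(APP_QUERY).fetchall()

# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
after_alter = conn.execute(APP_QUERY).fetchall()

print(before)                                # [('E1', 500), ('E3', 650)]
print(before == after_index == after_alter)  # True
```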

Advantages of using a DBMS


a) Reduced data redundancy.
b) Reduced updating errors and increased consistency.
c) Greater data integrity and independence from application programs.
d) Improved data access to users through the use of host and query languages.
e) Improved data security.
f) Reduced data entry, storage, and retrieval costs.
g) Facilitated development of new application programs.

Disadvantages of using a DBMS

a) Database systems are complex, difficult, and time-consuming to design.
b) Substantial hardware and software start-up costs.
c) Damage to the database affects virtually all application programs.
d) Extensive conversion costs in moving from a file-based system to a database system.
e) Initial training required for all programmers and users.

Sr. No. | File Processing System | Database Management System
1 | A file-processing system only coordinates physical access to the data. | A DBMS coordinates both physical and logical access to the data.
2 | A file-processing system introduces data duplication. | A DBMS reduces the amount of data duplication.
3 | A file-processing system allows only predetermined access to data (through specific compiled programs). | A DBMS is designed to allow flexibility in the queries used to access the data.
4 | A file-processing system is much more restrictive in simultaneous data access. | A DBMS is designed to coordinate and permit multiple users to access data at the same time.
5 | A file-processing system does not support searching and the implementation of rights management. | A DBMS supports improved searching and the implementation of rights management.
6 | There is no way to restrict unauthorized access in a file-processing system. | A DBMS restricts unauthorized access.
7 | There is no built-in way to recover a lost file in a file-processing system. | A DBMS supports backup and recovery from system crashes.
8 | There is no way to enforce integrity constraints in a file-processing system. | A DBMS enforces integrity constraints.

View of Data:
A major purpose of a DBS is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.
Data Abstraction:
Developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system:


o Physical Level: The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.
o Logical Level: The next-higher level of abstraction describes what data are stored in the DB, and what relationships exist among those data. DBAs, who must decide what information to keep in the database, use the logical level of abstraction.
o View Level: The highest level of abstraction describes only part of the entire database.

Ex. A banking enterprise may have record types
• Account, with fields account-number and balance
• Employee, with fields employee-name and salary
At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (words and bytes). The language compiler hides this level of detail from programmers. DBAs may be aware of certain details of the physical organization of the data.
At the logical level, each record is described by a type definition, and the interrelationship of these record types is defined as well.


Programmers using a programming language work at this level of abstraction. Similarly, DBAs usually work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of the data types. Similarly, at the view level, several views of the database are defined, and db users see these views. In addition to hiding details of the logical level of the db, the views also provide a security mechanism to prevent users from accessing certain parts of the db. E.g., tellers in a bank see only the part of the database that has information on customer accounts; they cannot access information about salaries of employees.
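The teller example can be sketched as an SQL view. The sketch below assumes Python's sqlite3 module; the table names, the view name teller_view and the sample rows are hypothetical. SQLite itself has no GRANT statement, so here the view only shapes what the teller's program sees; in a DBMS that supports GRANT/REVOKE, the teller would additionally be granted access to the view and not to the employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account  (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE employee (employee_name TEXT, salary INTEGER);
    INSERT INTO account  VALUES ('A-101', 500), ('A-215', 700);
    INSERT INTO employee VALUES ('Hayes', 5000);
    -- View level for tellers: account data only, no salaries.
    CREATE VIEW teller_view AS SELECT account_number, balance FROM account;
""")

cur = conn.execute("SELECT * FROM teller_view ORDER BY account_number")
cols = [d[0] for d in cur.description]  # column names visible through the view
rows = cur.fetchall()
print(cols)  # ['account_number', 'balance']
print(rows)  # [('A-101', 500), ('A-215', 700)]
```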

An Architecture for a Database System / Phases of Database Design
Three levels: Internal, Conceptual and External
• The Internal Level is the closest to physical storage, i.e., it is the one concerned with the way the data is physically stored. It gives the storage view (representing the total database as physically stored).
• The External Level is the one closest to the users, i.e., it is the one concerned with the way the data is viewed by individual users. It gives the individual user view.
• The Conceptual Level is a “level of indirection” between the other two. It depicts the community user view.
An Example of three levels

EXTERNAL (PL/I)
    DCL 1 EMPP,
          2 EMP# CHAR(6),
          2 SAL FIXED BIN(31);

EXTERNAL (COBOL)
    01 EMPC.
       02 EMPNO  PIC X(6).
       02 DEPTNO PIC X(4).

CONCEPTUAL
    EMPLOYEE
       EMPLOYEE_NUMBER    CHARACTER (6)
       DEPARTMENT_NUMBER  CHARACTER (4)
       SALARY             NUMERIC (5)

INTERNAL
    STORED_EMP LENGTH=20
       PREFIX TYPE=BYTE(6), OFFSET=0
       EMP#   TYPE=BYTE(6), OFFSET=6, INDEX=EMPX
       DEPT#  TYPE=BYTE(4), OFFSET=12
       PAY    TYPE=FULLWORD, OFFSET=16
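The two mappings in this example can be made concrete in a short sketch. It assumes Python's struct module, reads the STORED_EMP layout off the internal level above (a FULLWORD is taken to be a 4-byte integer, byte order little-endian, with no padding), and fills PREFIX with zeros because its contents are not described. The function names and the sample employee are hypothetical.

```python
import struct

# Assumed layout of STORED_EMP: PREFIX BYTE(6) @0, EMP# BYTE(6) @6,
# DEPT# BYTE(4) @12, PAY FULLWORD @16; "<" = little-endian, no padding.
STORED_EMP = struct.Struct("<6s6s4si")
assert STORED_EMP.size == 20  # matches LENGTH=20

def to_stored(emp_no: str, dept_no: str, salary: int) -> bytes:
    """Conceptual/internal mapping: EMPLOYEE record -> stored bytes."""
    prefix = bytes(6)  # PREFIX contents unspecified; zeros as a placeholder
    return STORED_EMP.pack(prefix, emp_no.encode("ascii"),
                           dept_no.encode("ascii"), salary)

def to_conceptual(stored: bytes) -> dict:
    """Internal -> conceptual: stored bytes -> EMPLOYEE record."""
    _prefix, emp, dept, pay = STORED_EMP.unpack(stored)
    return {"EMPLOYEE_NUMBER": emp.decode("ascii"),
            "DEPARTMENT_NUMBER": dept.decode("ascii"),
            "SALARY": pay}

def pli_view(emp: dict) -> dict:
    """External/conceptual mapping for the PL/I user (EMP#, SAL)."""
    return {"EMP#": emp["EMPLOYEE_NUMBER"], "SAL": emp["SALARY"]}

def cobol_view(emp: dict) -> dict:
    """External/conceptual mapping for the COBOL user (EMPNO, DEPTNO)."""
    return {"EMPNO": emp["EMPLOYEE_NUMBER"], "DEPTNO": emp["DEPARTMENT_NUMBER"]}

raw = to_stored("E00042", "D007", 55000)  # hypothetical employee
emp = to_conceptual(raw)
print(len(raw))         # 20
print(pli_view(emp))    # {'EMP#': 'E00042', 'SAL': 55000}
print(cobol_view(emp))  # {'EMPNO': 'E00042', 'DEPTNO': 'D007'}
```

Neither external view knows the byte offsets: if the stored layout changed, only to_stored and to_conceptual would need rewriting.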

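[Figure: The three-level architecture. End users (User 1, User 2, ...) work with External View 1, External View 2, ... at the external level; an external/conceptual mapping links the views to the conceptual schema at the conceptual level; a conceptual/internal mapping links the conceptual schema to the internal schema at the internal level, which describes the stored database.]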


Instances and Schemas – Databases change over time as information is inserted and deleted.
Instance:
The collection of information stored in the database at a particular moment is called an instance of the database.
Schema:
The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.
Example: By analogy with a program, the schema corresponds to the variable declarations, and the values of the variables at a point in time correspond to an instance of the database schema.
A DBS has several schemas, partitioned according to the levels of abstraction. The physical schema describes the db design at the physical level, while the logical schema describes the db design at the logical level. A db may also have several schemas at the view level, sometimes called subschemas, that describe different views of the db.
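The schema/instance distinction can be checked directly. In this sketch (Python's sqlite3; the account table and its row are hypothetical), the schema is read from SQLite's PRAGMA table_info, and the instance is simply the set of rows present at a given moment. Inserting a row produces a new instance while the schema stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

def schema(conn):
    """The schema: column names and declared types (changes rarely)."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

def instance(conn):
    """The instance: the rows stored at this moment (changes constantly)."""
    return conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()

s0, i0 = schema(conn), instance(conn)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
s1, i1 = schema(conn), instance(conn)

print(s0 == s1)  # True: same schema
print(i0, i1)    # [] [('A-101', 500)]: two different instances
```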
Database Languages – A DBS provides a data definition language (DDL) to specify the database schema, a data manipulation language (DML) to express database queries and updates, and a data control language (DCL).


a) Data Definition Language – This is the means by which the content and format of the data to be stored are described and the structure of the db is defined, including relationships between records and indexing strategies. The resulting definition is often known as a schema. DDL is essentially the link between the logical and physical views of the db.
Example: in SQL
create table account (account_number char(6), balance integer);
Primary Functions of DDL
• Describe the schema and subschemas
• Describe the fields in each record and the record's logical name
• Describe the data type and name of each field
• Indicate the keys of the record
• Provide the data security restrictions
• Provide for logical and physical data independence
• Provide a means of associating related data
b) Data Manipulation Language – Data manipulation is:
• The retrieval of information stored in the db
• The insertion of new information into the db
• The deletion of information from the db
• The modification of information stored in the db
A DML is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data. Ex. record-at-a-time data-access commands written in languages such as C, C++, Java, BASIC, FORTRAN, COBOL and Pascal.
• Non-procedural DMLs require a user to specify what data are needed without specifying how to get those data. Examples: SQL, Prolog.
Typical DML verbs, procedural or non-procedural, include DELETE, SORT, INSERT, DISPLAY, ADD, SELECT, etc.

c) Data Control Language: Used for controlling data and access to the databases.
Primary Functions of DCL
• Aid the physical administration of the db, such as dumping, logging, recovery, reorganization, db initialization, and export and import of data.
• Help the DBA and system designer to coordinate and keep track of the data in the db, for example through the data dictionary (DD).
• DCL commands: grant, revoke. Transaction-control commands such as commit and rollback are often grouped with them; alter is a DDL command.
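The three kinds of language can be seen side by side in one small session. This is a sketch using Python's sqlite3 module with a hypothetical account table and sample rows: a DDL statement whose definition lands in SQLite's catalog (sqlite_master, SQLite's data dictionary), the same query written in a procedural and a non-procedural style, and a rollback. SQLite has no GRANT/REVOKE, so only the transaction-control side of the control commands is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the record type, its fields, data types and key.
conn.execute("""
    CREATE TABLE account (
        account_number CHAR(6) PRIMARY KEY,
        balance        INTEGER NOT NULL
    )
""")
# The definition is recorded in the catalog (SQLite's data dictionary).
catalog = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-102", 400), ("A-201", 900)])
conn.commit()

# Procedural style: say HOW (scan every row, test it, collect it).
def rich_accounts_procedural(threshold):
    result = []
    for number, balance in conn.execute("SELECT account_number, balance FROM account"):
        if balance > threshold:
            result.append(number)
    return sorted(result)

# Non-procedural style: say only WHAT; the DBMS chooses the method.
def rich_accounts_declarative(threshold):
    rows = conn.execute("SELECT account_number FROM account "
                        "WHERE balance > ? ORDER BY account_number", (threshold,))
    return [number for (number,) in rows]

# Transaction control: undo an uncommitted update.
conn.execute("UPDATE account SET balance = 0 WHERE account_number = 'A-101'")
conn.rollback()
balance_after_rollback = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'").fetchone()[0]

print(catalog)                         # [('account',)]
print(rich_accounts_procedural(450))   # ['A-101', 'A-201']
print(rich_accounts_declarative(450))  # ['A-101', 'A-201']
print(balance_after_rollback)          # 500
```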

Database Administrator – A person who has central control over both the data and the programs that access those data (the system) is called a database administrator (DBA).
Functions of DBA
• Schema definition: The DBA creates the original db schema by executing a set of data definition statements in the DDL.
• Storage structure and access-method definition and strategy.
• Schema and physical-organization modification: The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance.
• Granting of authorization for data access: To prevent unauthorized access to the data in the db, the DBA grants different types of authorization to db users.
• Routine maintenance
  - Periodic backup, either on tapes or onto remote servers, to prevent loss of data in case of disasters.
  - Database performance tuning and optimization, including ensuring that enough free disk space is available for normal operations and upgrading disk space as required.
  - Monitoring jobs running on the db and ensuring that performance is not degraded by very expensive tasks submitted by some users.
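Periodic backup, the first routine-maintenance task above, can be sketched with the Connection.backup method of Python's sqlite3 module (available since Python 3.7). Both databases are in memory here purely for illustration; a real backup target would be a file on tape or a remote server, and the table contents are hypothetical.

```python
import sqlite3

src = sqlite3.connect(":memory:")  # stands in for the production database
src.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
src.execute("INSERT INTO account VALUES ('A-101', 500)")
src.commit()

dst = sqlite3.connect(":memory:")  # stands in for the backup copy
src.backup(dst)                    # copy every page of the source database

print(dst.execute("SELECT * FROM account").fetchall())  # [('A-101', 500)]
```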

Database Users
• Naive users interact with the system by invoking one of the application programs that have been written previously. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer.
• Application programmers are computer professionals who write application programs.
• Sophisticated users interact with the system without writing programs; instead, they form their requests in a database query language. For example, an analyst can see total sales by region (north, south, east, west), by product, or by both product and region.
• Specialized users are users who write specialized database applications, such as computer-aided design systems, knowledge-base and expert systems.
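The transfer program from the naive-user example might look like the sketch below. It is hypothetical (written with Python's sqlite3, with invented balances): the point is that the teller only calls transfer, while the application programmer has made both updates succeed together or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Hypothetical 'transfer' application program: move amount from src to dst."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE account_number = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                     (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

transfer(conn, "A", "B", 50)        # the teller's whole interaction
try:
    transfer(conn, "B", "A", 1000)  # refused: nothing is changed
except ValueError as exc:
    print("transfer refused:", exc)

print(conn.execute("SELECT * FROM account ORDER BY account_number").fetchall())
# [('A', 50), ('B', 70)]
```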
DATA MODELS

1) Entity-Relationship model
2) Relational model
3) Network model
4) Hierarchical model

Entity-Relationship Model
• Primarily a database design tool
• Complements the relational data model concepts
• Represented in an entity-relationship diagram (ERD)
• Based on entities, attributes, and relationships

Advantages of the E-R Model
• Exceptional conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational data model

Disadvantages of the E-R Model
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content
The Relational Model
• Consists of tables; links among entities are maintained
with foreign keys
• Advantages of relational databases
– Same advantages of a network database without
the complications
– Easier to conceptualize and maintain
– Virtually all DBMSs offered for microcomputers
accommodate the relational model
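"Links among entities are maintained with foreign keys" means a link is nothing more than a matching column value, followed with a join and checked by the DBMS. A sketch with Python's sqlite3 (branch/account tables and rows are hypothetical; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE branch  (branch_name TEXT PRIMARY KEY, branch_city TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY,
                          balance INTEGER,
                          branch_name TEXT REFERENCES branch(branch_name));
    INSERT INTO branch  VALUES ('Perryridge', 'Horseneck');
    INSERT INTO account VALUES ('A-102', 400, 'Perryridge');
""")

# Following the link: a join on the matching key value.
joined = conn.execute("""
    SELECT a.account_number, b.branch_city
    FROM account AS a JOIN branch AS b ON a.branch_name = b.branch_name
""").fetchall()
print(joined)  # [('A-102', 'Horseneck')]

# A link to a non-existent branch is rejected.
rejected = False
try:
    conn.execute("INSERT INTO account VALUES ('A-999', 10, 'Nowhere')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```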
The Network Model
• Allows a record to be linked to more than one parent
• Supports many-to-many relationships
• Advantage of the network model
– Reduced data redundancy
• Disadvantages of the network model
– Complicated to build
– Difficult to maintain and navigate

The Hierarchical Model
• Records are related hierarchically: each category is a subcategory of the next level up.
• Advantages of hierarchical databases
– It promotes data security.
– It promotes data independence.
– It promotes data integrity (parent/child relationship).
– Useful for large databases.
– Useful when users require many transactions that are fixed over time.
– Suitable for large storage media.
• Disadvantages of hierarchical databases
– To retrieve a record, a user must start at the root and navigate the hierarchy.
– If a link is broken, the entire branch is lost.
– Requires considerable data redundancy.
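The first two disadvantages can be illustrated with a toy hierarchy. This sketch is plain Python with a hypothetical bank, branch, customer, account tree: every lookup starts at the root and walks down parent-to-child links, so removing one link leaves the records beneath it unreachable.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Record:
    """A record in a hierarchy; each child has exactly one parent."""
    key: str
    children: list[Record] = field(default_factory=list)

def find_path(node: Record, target: str) -> list[str] | None:
    """Start at the root and navigate downwards until target is found."""
    if node.key == target:
        return [node.key]
    for child in node.children:
        path = find_path(child, target)
        if path is not None:
            return [node.key] + path
    return None

# Hypothetical hierarchy: bank -> branch -> customer -> account
hayes = Record("Hayes", [Record("A-101")])
branch = Record("Perryridge", [hayes, Record("Jones", [Record("A-215")])])
root = Record("Bank", [branch])

path_before = find_path(root, "A-101")
print(path_before)  # ['Bank', 'Perryridge', 'Hayes', 'A-101']

# Breaking the Perryridge -> Hayes link loses the whole Hayes branch.
branch.children.remove(hayes)
path_after = find_path(root, "A-101")
print(path_after)   # None
```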
Six major steps need to be taken in setting up a database for a particular enterprise:
1) Define the high-level requirements of the enterprise (System Requirements Specification).
2) Define a model containing all appropriate types of data and data relationships.
3) Define the integrity constraints on the data.
4) Define the physical level.
5) For each known problem to be solved on a regular basis
6) Create/initialize the database.


Responsibilities of a Database Manager / Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager: it translates the various DML statements into low-level file-system commands. Thus the storage manager is responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager
• Transaction manager (which ensures that the database remains in a consistent state)
• File manager (allocation of space on disk)
• Buffer manager (fetching data from disk into main memory, and deciding what data to cache in main memory)

The storage manager implements several data structures:
• Data files
• Data dictionary
• Indices
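The buffer manager's two jobs, fetching blocks from disk and deciding what to keep cached, can be sketched as a small class. This is a toy model, not any particular DBMS's design: the "disk" is a Python dict, and the replacement policy is least-recently-used (LRU), chosen here only for illustration; real buffer managers use a variety of policies and also handle writing modified blocks back.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager keeping at most `capacity` disk blocks in memory (LRU)."""

    def __init__(self, disk: dict, capacity: int):
        self.disk = disk           # block_id -> contents; stands in for the disk
        self.capacity = capacity
        self.pool = OrderedDict()  # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:            # hit: no disk access needed
            self.pool.move_to_end(block_id)  # now the most recently used
            return self.pool[block_id]
        self.disk_reads += 1                 # miss: fetch from disk
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)    # evict the least recently used block
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {i: f"block-{i}" for i in range(5)}
bm = BufferManager(disk, capacity=2)
for block_id in [0, 1, 0, 2, 0, 1]:
    bm.get_block(block_id)
print(bm.disk_reads)  # 4
print(list(bm.pool))  # [0, 1]
```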


History of Database Systems

1950s and early 1960s:
Data processing using magnetic tapes for storage
  • Tapes provide only sequential access
Punched cards for input
Late 1960s and 1970s:
Hard disks allow direct access to data
Network and hierarchical data models in widespread use
Ted Codd defines the relational data model
  • Would win the ACM Turing Award for this work
  • IBM Research begins System R prototype
  • UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing

1980s:
Research relational prototypes evolve into commercial systems
  • SQL becomes an industrial standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision-support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
2000s:
XML and XQuery standards
Automated database administration


Unit II

Database Design and Entity-Relationship Model: Basic Concepts, Design Issues, Mapping Constraints, Keys, E-R Diagram, Weak Entity Sets, Extended E-R Features, Design of an E-R Database Schema, Reduction of an E-R Schema to Tables.


DATA MODEL:

A collection of concepts that can be used to describe the structure of a database.

Use of High-Level Conceptual Data Models

ER Data model:

The ER data model is a high-level data model. It perceives the real world as consisting of basic objects, called entities, and relationships among them.

Entities and Entity Sets

• An entity is an object that exists and is distinguishable from other objects. For instance, John Harris with S.I.N. 890-12-3456 is an entity, as he can be uniquely identified as one particular person in the universe.
• An entity may be concrete (a person or a book, for example) or abstract (like a holiday or a concept).
• An entity set is a set of entities of the same type (e.g., all persons having an account at a bank).
• Entity sets need not be disjoint. For example, the entity set employee (all employees of a bank) and the entity set customer (all customers of the bank) may have members in common.
• An entity is represented by a set of attributes.
  o E.g. name, S.I.N., street, city for the "customer" entity.
  o The domain of an attribute is the set of permitted values (e.g., a telephone number must be a string of seven digits).
• Formally, an attribute is a function which maps an entity set into a domain.
  o Every entity is described by a set of (attribute, data value) pairs.
  o There is one pair for each attribute of the entity set.
  o E.g. a particular customer entity is described by the set {(name, Harris), (S.I.N., 890-123-456), (street, North), (city, Georgetown)}.

An analogy can be made with the programming-language notion of a type definition.

• The concept of an entity set corresponds to the programming-language type definition.
• A variable of a given type has a particular value at a point in time.
• Thus, a programming-language variable corresponds to an entity in the E-R model.
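The type-definition analogy, and the view of an entity as a set of (attribute, value) pairs, can be written out directly. In this Python sketch the class name Customer and the field name sin (for S.I.N.) are assumptions; the values are the Harris entity from the text.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Customer:
    """Entity set 'customer' ~ a type definition (attribute names and domains)."""
    name: str
    sin: str
    street: str
    city: str

# One entity ~ one value of that type.
harris = Customer(name="Harris", sin="890-123-456", street="North", city="Georgetown")

# The entity as a set of (attribute, value) pairs, one pair per attribute.
pairs = set(asdict(harris).items())
print(sorted(pairs))
# [('city', 'Georgetown'), ('name', 'Harris'), ('sin', '890-123-456'), ('street', 'North')]

# An attribute as a function from the entity set into its domain.
def city(entity: Customer) -> str:
    return entity.city

print(city(harris))  # Georgetown
```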

We will be dealing with five entity sets in this section:

• branch, the set of all branches of a particular bank. Each branch is described by the attributes branch-name, branch-city and assets.
• customer, the set of all people having an account at the bank. Attributes are customer-name, S.I.N., street and customer-city.
• employee, with attributes employee-name and phone-number.
• account, the set of all accounts created and maintained in the bank. Attributes are account-number and balance.
• transaction, the set of all account transactions executed in the bank. Attributes are transaction-number, date and amount.

Relationships & Relationship Sets

A relationship is an association between several entities.

A relationship set is a set of relationships of the same type.

Formally, it is a mathematical relation on (possibly non-distinct) sets.

A relationship set is a mathematical relation among $n \ge 2$ entities, each taken from an entity set:

$$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$$

where $(e_1, e_2, \ldots, e_n)$ is a relationship.

Entity Sets customer and loan

Customer

Cust-id       Name    Street   City
321-12-3123   Jones   Main     Harrison
019-28-3746   Smith   North    Rye
677-89-9011   Hayes   Main     Harrison

Loan

Loan-no   Amount
L-17      1000
L-23      2000
L-15      1500
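The set-builder definition of a relationship set says that every relationship must draw its i-th entity from the i-th entity set, i.e. the relationship set is a subset of the Cartesian product E1 × E2 × ... × En. The Python sketch below checks this for the customer and loan entity sets above (identified by Cust-id and Loan-no). The relationship set name borrower and its pairs are hypothetical, invented only for illustration.

```python
from itertools import product

# Entity sets, identified by their key attribute (values from the tables above).
customer = {"321-12-3123", "019-28-3746", "677-89-9011"}
loan = {"L-17", "L-23", "L-15"}

# Hypothetical relationship set "borrower" (the pairing is invented).
borrower = {("321-12-3123", "L-17"),
            ("677-89-9011", "L-23"),
            ("677-89-9011", "L-15")}

def is_relationship_set(r, *entity_sets):
    """True if r is a subset of E1 x E2 x ... x En."""
    return set(r) <= set(product(*entity_sets))

print(is_relationship_set(borrower, customer, loan))                   # True
print(is_relationship_set({("321-12-3123", "L-99")}, customer, loan))  # False

# Loans related to customer 677-89-9011 (Hayes) in this hypothetical set:
print(sorted(l for c, l in borrower if c == "677-89-9011"))  # ['L-15', 'L-23']
```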

Attributes

Consider the entity set employee with attributes employee-name and phone-number.

• We could argue that the phone should be treated as an entity itself, with attributes phone-number and location.
• Then we have two entity sets, and the relationship set EmpPhn defining the association between employees and their phones.
• This new definition allows employees to have several (or zero) phones.
• The new definition may more accurately reflect the real world.
• We cannot easily extend this argument to making employee-name an entity.

The question of what constitutes an entity and what constitutes an attribute depends mainly on the structure of the real-world situation being modeled, and on the semantics associated with the attribute in question.


Mapping Constraints

An E-R scheme may define certain constraints to which the contents of a database must conform.

• Mapping cardinalities express the number of entities to which another entity can be associated via a relationship. For binary relationship sets between entity sets A and B, the mapping cardinality must be one of:
1. One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
2. One-to-many: An entity in A is associated with any number of entities in B. An entity in B is associated with at most one entity in A.
3. Many-to-one: An entity in A is associated with at most one entity in B. An entity in B is associated with any number of entities in A.
4. Many-to-many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
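The four cases can be checked mechanically against a sample of data. The sketch below is a hypothetical helper, mapping_cardinality, that counts how many partners each A entity and each B entity has in a set of (a, b) pairs. A mapping cardinality is really a constraint on every legal state of the database, so a single instance can only show which cardinality it is consistent with; this sketch reports the most restrictive one.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Return the most restrictive mapping cardinality consistent with a
    binary relationship instance given as a set of (a, b) pairs."""
    a_fanout = Counter(a for a, _ in pairs)   # number of B partners of each A entity
    b_fanout = Counter(b for _, b in pairs)   # number of A partners of each B entity
    a_many = any(n > 1 for n in a_fanout.values())
    b_many = any(n > 1 for n in b_fanout.values())
    if a_many and b_many:
        return "many-to-many"
    if a_many:
        return "one-to-many"    # an A has several B's, each B has at most one A
    if b_many:
        return "many-to-one"    # each A has at most one B, a B has several A's
    return "one-to-one"

print(mapping_cardinality({("a1", "b1"), ("a2", "b2")}))  # one-to-one
print(mapping_cardinality({("a1", "b1"), ("a1", "b2")}))  # one-to-many
print(mapping_cardinality({("a1", "b1"), ("a2", "b1")}))  # many-to-one
```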

The appropriate mapping cardinality for a particular relationship set depends on the real world being modeled. (Think about the CustAcct relationship...)

[Figure: mapping cardinalities: one to one, one to many, many to one, many to many]

• Existence dependencies: if the existence of entity X depends on the existence of entity Y, then X is said to be existence dependent on Y. (Or we say that Y is the dominant entity and X is the subordinate entity.)


For example,

o Consider account and transaction entity sets, and a relationship log between them.
o This is one-to-many from account to transaction.
o If an account entity is deleted, its associated transaction entities must also
be deleted.
o Thus account is dominant and transaction is subordinate.

Keys

Differences between entities must be expressed in terms of attributes.

• A superkey is a set of one or more attributes which, taken collectively, allow us to identify uniquely an entity in the entity set.
• For example, in the entity set customer, the combination of customer-name and S.I.N. is a superkey.
• Note that customer-name alone is not, as two customers could have the same name.
• A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no proper subset is a superkey is called a candidate key.
• In the example above, S.I.N. is a candidate key, as it is minimal and uniquely identifies a customer entity.
• A primary key is the candidate key chosen by the DB designer to identify entities in an entity set (there may be more than one candidate key to choose from).
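A small Python sketch of the superkey and candidate-key definitions. The customer rows and S.I.N. values are made up, and is_superkey and candidate_keys are hypothetical helpers. Because they inspect one instance of the data, they can only show that a set of attributes fails to be a key; a real key is a design decision about all possible instances.

```python
from itertools import combinations

# Hypothetical customer instance: two different customers share a name and city.
customers = [
    {"sin": "111-11-1111", "customer_name": "Jones", "city": "Harrison"},
    {"sin": "222-22-2222", "customer_name": "Smith", "city": "Rye"},
    {"sin": "333-33-3333", "customer_name": "Jones", "city": "Harrison"},
]

def is_superkey(rows, attrs):
    """No two rows agree on every attribute in attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows, all_attrs):
    """Superkeys none of whose proper subsets is also a superkey."""
    supers = [set(c) for n in range(1, len(all_attrs) + 1)
              for c in combinations(all_attrs, n) if is_superkey(rows, c)]
    return [k for k in supers if not any(other < k for other in supers)]

print(is_superkey(customers, ["customer_name", "sin"]))             # True
print(is_superkey(customers, ["customer_name"]))                    # False
print(candidate_keys(customers, ["sin", "customer_name", "city"]))  # [{'sin'}]
```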

An entity set that does not possess sufficient attributes to form a primary key is called a
weak entity set. One that does have a primary key is called a strong entity set.

For example,

• The entity set transaction has attributes transaction-number, date and amount.
• Different transactions on different accounts could share the same number.
• These attributes are not sufficient to form a primary key (uniquely identify a transaction).
• Thus transaction is a weak entity set.

Relationship Sets

A relationship is an association among several entities.

Example: the customer entity Hayes and the account entity A-102 are associated through the relationship set depositor.

A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets:

$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$

where $(e_1, e_2, \ldots, e_n)$ is a relationship.

Example: $(\text{Hayes}, \text{A-102}) \in depositor$

Relationship Set borrower

An attribute can also be a property of a relationship set.

For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date.

Degree of a Relationship Set


The degree of a relationship set is the number of entity sets that participate in it.
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
Relationship sets may involve more than two entity sets.
E.g. suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch.
Relationships among more than two entity sets are rare.

ER DIAGRAMS

The Entity Relationship Diagram

We can express the overall logical structure of a database graphically with an E-R
diagram.


Its components are:

• rectangles, representing entity sets.
• ellipses, representing attributes.
• diamonds, representing relationship sets.
• lines, linking attributes to entity sets and entity sets to relationship sets.
• double ellipses, representing multivalued attributes.
• double lines, indicating total participation of an entity in a relationship set.
• double rectangles, representing weak entity sets.


E-R Diagram With Composite, Multivalued, and Derived Attributes

a) Simple attributes: An attribute that cannot be divided into smaller parts, and that represents a basic meaning, is called a simple attribute. e.g. the First-name and Last-name attributes of an EMPLOYEE entity are simple attributes.

Composite attributes: Attributes that can be divided into smaller units, each with its own specific meaning, are called composite attributes. For example, the attribute name of the entity set EMPLOYEE can be sub-divided into First-name, Middle-initial and Last-name.
b) Single-valued attributes: An attribute that has a single value for a particular entity is called a single-valued attribute. e.g. age is a single-valued attribute of an EMPLOYEE entity.
Multivalued attributes: An attribute that can have more than one value for a particular entity is called a multivalued attribute. Different entities may have different numbers of values for such an attribute. For a multivalued attribute we can also specify the minimum and maximum number of values that can be attached. e.g. phone-number of an EMPLOYEE entity is a multivalued attribute.
c) Derived attributes: Attributes that are not stored directly but can be derived from stored attributes are called derived attributes. e.g. total-salary of an EMPLOYEE entity can be calculated from the stored basic-salary attribute.
d) Null value: An attribute takes a null value when an entity does not have a value for it. The null value may indicate "not applicable", that is, that the value does not exist for the entity; for example, a person may have no middle name. Null can also designate that an attribute value is unknown. An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists).
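The attribute kinds above can be mimicked with a Python dataclass. This is only an analogy, not an E-R construct: the names, the values and the allowance attribute are hypothetical, and the sketch assumes total-salary = basic-salary + allowance purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Name:
    """Composite attribute: made of simple component attributes."""
    first_name: str
    last_name: str
    middle_initial: Optional[str] = None   # None plays the role of a null value

@dataclass
class Employee:
    name: Name                  # composite
    age: int                    # simple, single-valued
    basic_salary: float         # stored
    allowance: float            # stored (hypothetical attribute)
    phone_numbers: List[str] = field(default_factory=list)  # multivalued: zero or more values

    @property
    def total_salary(self) -> float:
        """Derived attribute: computed from stored attributes, never stored itself."""
        return self.basic_salary + self.allowance

e = Employee(Name("Asha", "Rao"), 34, 50000.0, 5000.0, ["555-0101", "555-0102"])
print(e.total_salary)          # 55000.0
print(e.name.middle_initial)   # None
```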


Relationship Sets with Attributes


One to One


One to many


Many to one


Many to many

Summary of Symbols Used in E-R Notation

• Extended E-R diagrams allow more details and constraints of the real world to be recorded:
o Composite attributes.
o Derived attributes.
o Subclasses and superclasses.
o Generalization and specialization.

Roles in E-R Diagrams

The function that an entity plays in a relationship is called its role. Roles are normally implicit and not specified.


They are useful when the meaning of a relationship set needs clarification.

For example, the entity sets of a relationship may not be distinct. The relationship works-
for might be ordered pairs of employees (first is manager, second is worker).

In the E-R diagram, this can be shown by labelling the lines connecting entities
(rectangles) to relationships (diamonds).


E-R diagram with role indicators

Participation of an Entity Set in a Relationship Set

Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set.

E.g. participation of loan in borrower is total: every loan must have a customer associated with it via borrower.

Partial participation: some entities may not participate in any relationship in the relationship set.

E.g. participation of customer in borrower is partial.
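Total versus partial participation can be tested on an instance by asking whether every entity shows up in some relationship. In this sketch the entity sets and borrower pairs are hypothetical (the customer Adams is invented to show partial participation), and participation is a helper name chosen for illustration.

```python
def participation(entities, relationship, position):
    """'total' if every entity appears at the given tuple position of some
    relationship, otherwise 'partial'."""
    participating = {r[position] for r in relationship}
    return "total" if entities <= participating else "partial"

customer = {"Jones", "Smith", "Hayes", "Adams"}
loan = {"L-15", "L-17", "L-23"}
borrower = {("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15")}  # (customer, loan)

print(participation(loan, borrower, 1))      # total: every loan has a borrower
print(participation(customer, borrower, 0))  # partial: Adams has no loan
```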


Weak Entity Sets in E-R Diagrams

An entity set that does not have a primary key is referred to as a weak entity set.

The existence of a weak entity set depends on the existence of an identifying entity set.

It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set.

The identifying relationship is depicted using a double diamond.

The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of the weak entity set that depend on the same strong entity.

The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.

We depict a weak entity set by double rectangles.

We underline the discriminator of a weak entity set with a dashed line.

For example, payment-number is the discriminator of the payment entity set, and the primary key for payment is (loan-number, payment-number).
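A minimal Python sketch of how the primary key of the weak entity set payment is formed. Payments are nested under the loan they belong to, so loan-number is supplied by the identifying relationship rather than stored inside each payment. All values are hypothetical.

```python
# Hypothetical payments grouped under their owning (identifying) loan.
payments_by_loan = {
    "L-17": [{"payment_number": 1, "amount": 100}, {"payment_number": 2, "amount": 100}],
    "L-23": [{"payment_number": 1, "amount": 250}],
}

# Primary key of a weak entity = owner's primary key + the weak entity's discriminator.
keys = [(loan_number, p["payment_number"])
        for loan_number, payments in payments_by_loan.items()
        for p in payments]
discriminators = [p["payment_number"]
                  for payments in payments_by_loan.values()
                  for p in payments]

print(len(set(discriminators)) == len(discriminators))  # False: payment-number alone repeats
print(len(set(keys)) == len(keys))                      # True: (loan-number, payment-number) is unique
print(sorted(keys))  # [('L-17', 1), ('L-17', 2), ('L-23', 1)]
```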


E-R diagram with a weak entity set

Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.

If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number common to payment and loan.

In a university, a course is a strong entity and a course-offering can be modeled as a weak entity.

The discriminator of course-offering would be semester (including year) and section-number (if there is more than one section).

If we model course-offering as a strong entity, we would model course-number as an attribute.

Then the relationship with course would be implicit in the course-number attribute.

Nonbinary Relationships

This E-R diagram says that a customer may have several accounts, each located in a
specific bank branch, and that an account may belong to several different customers.


We allow at most one arrow out of a ternary (or higher-degree) relationship to indicate a cardinality constraint.

E.g. an arrow from works-on to job indicates that each employee works on at most one job at any branch.

If there is more than one arrow, there are two ways of defining the meaning.

E.g. a ternary relationship R between A, B and C with arrows to B and C could mean

1. each A entity is associated with a unique entity from B and C, or

2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B entity.

Each alternative has been used in different formalisms.

To avoid confusion, we outlaw more than one arrow.

Converting Non-Binary Relationships to Binary Form


In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set. Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. $R_A$, relating E and A
2. $R_B$, relating E and B
3. $R_C$, relating E and C

Create a special identifying attribute for E.

Add any attributes of R to E.

For each relationship $(a_i, b_i, c_i)$ in R:
1. create a new entity $e_i$ in the entity set E
2. add $(e_i, a_i)$ to $R_A$
3. add $(e_i, b_i)$ to $R_B$
4. add $(e_i, c_i)$ to $R_C$
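The conversion steps can be written as a short Python function. The sketch assumes R is a set of (a, b, c) triples and generates values of E's identifying attribute as the strings e0, e1, ...; the function names and the works-on data are hypothetical, and descriptive attributes of R (which would be attached to E) are omitted. The inverse function checks that no information is lost.

```python
def ternary_to_binary(R):
    """Replace ternary relationship set R (a set of (a, b, c) triples) by an
    artificial entity set E plus binary relationship sets RA, RB and RC."""
    E, RA, RB, RC = set(), set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(R)):
        e = f"e{i}"              # value of the special identifying attribute of E
        E.add(e)
        RA.add((e, a))
        RB.add((e, b))
        RC.add((e, c))
    return E, RA, RB, RC

def binary_to_ternary(E, RA, RB, RC):
    """Rebuild R by following each artificial entity to its A, B and C partners."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return {(a_of[e], b_of[e], c_of[e]) for e in E}

# Hypothetical works-on triples: (employee, job, branch).
works_on = {("Hayes", "teller", "Downtown"), ("Jones", "auditor", "Perryridge"),
            ("Hayes", "auditor", "Perryridge")}
E, RA, RB, RC = ternary_to_binary(works_on)
print(len(E), len(RA))                                # 3 3
print(binary_to_ternary(E, RA, RB, RC) == works_on)   # True
```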


Design of an E-R Database Scheme

The E-R data model provides a wide range of choice in designing a database scheme to
accurately model some real-world situation.

Some of the decisions to be made are

• Using a ternary relationship versus two binary relationships.
• Whether an entity set or a relationship set best fits a real-world concept.
• Whether to use an attribute or an entity set.
• Use of a strong or weak entity set.
• Appropriateness of generalization.
• Appropriateness of aggregation.

Use of Extended E-R Features

We have seen weak entity sets, generalization and aggregation. Designers must decide when these features are appropriate.

• Strong entity sets and their dependent weak entity sets may be regarded as a single "object" in the database, as weak entities are existence-dependent on a strong entity.
• It is possible to treat an aggregated entity set as a single unit without concern for the details of its inner structure.
• Generalization contributes to modularity by allowing common attributes of similar entity sets to be represented in one place in an E-R diagram.

Specialization

A top-down design process: we designate subgroupings within an entity set that are distinctive from other entities in the set.

These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.

Depicted by a triangle component labeled ISA (e.g. customer "is a" person).

Attribute inheritance: a lower-level entity set inherits all the attributes and relationship participations of the higher-level entity set to which it is linked.


Generalization

A bottom-up design process: combine a number of entity sets that share the same features into a higher-level entity set.

Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.

The terms specialization and generalization are used interchangeably.

An entity set can have multiple specializations based on different features.


E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller.

Each particular employee would be

a member of one of permanent-employee or temporary-employee,

and also a member of one of officer, secretary, or teller.

The ISA relationship is also referred to as a superclass-subclass relationship.

Design Constraints on a Specialization/Generalization


Constraint on which entities can be members of a given lower-level entity set:
• condition-defined
  o E.g. all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.
• user-defined

Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization:
• Disjoint
  o an entity can belong to only one lower-level entity set
  o noted in the E-R diagram by writing disjoint next to the ISA triangle
• Overlapping
  o an entity can belong to more than one lower-level entity set

Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
• total: an entity must belong to one of the lower-level entity sets
• partial: an entity need not belong to one of the lower-level entity sets
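These constraints can be checked against a given instance with a few set operations. This is a hypothetical sketch: the people, ages and role assignments are invented, and checking one instance only tells us whether that instance satisfies a disjointness or completeness constraint, not what the designer declared.

```python
def specialization_constraints(higher, lower_sets):
    """Classify one specialization of `higher` into the named lower-level sets."""
    members = list(lower_sets.values())
    disjoint = all(members[i].isdisjoint(members[j])
                   for i in range(len(members))
                   for j in range(i + 1, len(members)))
    total = higher <= set().union(*members)
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")

ages = {"Ann": 70, "Bob": 40, "Cy": 66, "Dee": 25}   # hypothetical data
person = set(ages)

# Condition-defined membership: senior-citizen ISA person when age > 65.
senior_citizen = {p for p, age in ages.items() if age > 65}

# User-defined membership: the user simply assigns entities.
by_role = {"customer": {"Ann", "Bob"}, "employee": {"Bob", "Cy"}}

print(sorted(senior_citizen))                       # ['Ann', 'Cy']
print(specialization_constraints(person, by_role))  # ('overlapping', 'partial')
print(specialization_constraints(person, {"permanent": {"Ann", "Bob"},
                                          "temporary": {"Cy", "Dee"}}))  # ('disjoint', 'total')
```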

Aggregation

Consider the ternary relationship works-on, which we saw earlier.

Suppose we want to record managers for tasks performed by an employee at a branch.


Relationship sets works-on and manages represent overlapping information:
• Every manages relationship corresponds to a works-on relationship.
• However, some works-on relationships may not correspond to any manages relationships.
• So we can't discard the works-on relationship.
Eliminate this redundancy via aggregation:
• Treat the relationship as an abstract entity.
• This allows relationships between relationships.
• It abstracts a relationship into a new entity.
Without introducing redundancy, the following diagram represents:
• An employee works on a particular job at a particular branch.
• An employee, branch, job combination may have an associated manager.

E-R Diagram With Aggregation


E-R Design Decisions

The use of an attribute or entity set to represent an object.
Whether a real-world concept is best expressed by an entity set or a relationship set.
The use of a ternary relationship versus a pair of binary relationships.
The use of a strong or weak entity set.
The use of specialization/generalization, which contributes to modularity in the design.
The use of aggregation, which lets us treat the aggregate entity set as a single unit without concern for the details of its internal structure.

E-R Diagram for a Banking Enterprise


Representing Entity Sets as Tables


We use a table with one column for each attribute of the set. Each row in the table corresponds to one entity of the entity set. For the entity set account, we can add, delete and modify rows (to reflect changes in the real world).

A row of a table consists of an n-tuple, where n is the number of attributes.

The table contains only a subset of the set of all possible rows. We refer to the set of all possible rows as the Cartesian product of the sets of all attribute values.

We may denote this as

$D_1 \times D_2$

for the account table, where $D_1$ and $D_2$ denote the set of all account numbers and the set of all account balances, respectively.

In general, for a table of n columns, we may denote the Cartesian product of $D_1, D_2, \ldots, D_n$ by

$D_1 \times D_2 \times \cdots \times D_{n-1} \times D_n$
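The Cartesian-product view of a table can be shown with itertools.product from the Python standard library. The two domains below are assumed to be tiny for illustration; real domains such as "all account numbers" are far larger, which is exactly why a table stores only a subset of them.

```python
from itertools import product

# Hypothetical, deliberately tiny domains for the two account attributes.
D1 = {"A-101", "A-102"}   # all account numbers
D2 = {500, 700}           # all account balances

all_rows = set(product(D1, D2))              # D1 x D2: every possible row
account = {("A-101", 500), ("A-102", 700)}   # the rows actually stored

print(len(all_rows))        # 4 == |D1| * |D2|
print(account <= all_rows)  # True: the table is a subset of D1 x D2
```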

A strong entity set reduces to a table with the same attributes


Customer Table

Loan-number amount
L-11 900
L-14 1500
L-15 1500
L-16 1300
L-17 1000
L-23 2000
L-93 500

Loan table
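As a concrete sketch of reducing a strong entity set to a table, the snippet below uses SQLite through Python's built-in sqlite3 module and loads the loan rows listed above. The column names (with underscores) and the column types are assumptions; the notes do not fix an SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Strong entity set loan -> one table, one column per attribute;
# the entity set's primary key becomes the table's PRIMARY KEY.
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [
    ("L-11", 900), ("L-14", 1500), ("L-15", 1500), ("L-16", 1300),
    ("L-17", 1000), ("L-23", 2000), ("L-93", 500),
])

count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(count)  # 7: one row per loan entity

# A second row with an existing primary-key value is rejected.
rejected = False
try:
    conn.execute("INSERT INTO loan VALUES ('L-17', 50)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```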

Composite and Multivalued Attributes

• Composite attributes are flattened out by creating a separate attribute for each component attribute.
  o E.g. given entity set customer with composite attribute name with component attributes first-name and last-name, the table corresponding to the entity set has two attributes: name.first-name and name.last-name.
• A multivalued attribute M of an entity E is represented by a separate table EM.
  o Table EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M.
  o E.g. multivalued attribute dependent-names of employee is represented by a table employee-dependent-names(employee-id, dname).
  o Each value of the multivalued attribute maps to a separate row of the table EM.
  o E.g. an employee entity with primary key John and dependents Johnson and Johndotir maps to two rows: (John, Johnson) and (John, Johndotir).
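A minimal SQLite sketch of both rules, using Python's sqlite3 module. It assumes an employee table with a flattened composite name (underscores replace the dots of name.first-name, since dots are awkward in SQL column names) and stores the dependent-names example in a separate table. The first and last name values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id     TEXT PRIMARY KEY,
    name_first_name TEXT,   -- composite attribute name, one column per component
    name_last_name  TEXT
);
CREATE TABLE employee_dependent_names (   -- table EM for multivalued attribute M
    employee_id TEXT REFERENCES employee(employee_id),
    dname       TEXT,
    PRIMARY KEY (employee_id, dname)
);
INSERT INTO employee VALUES ('John', 'John', 'Smith');
INSERT INTO employee_dependent_names VALUES ('John', 'Johnson'), ('John', 'Johndotir');
""")

rows = conn.execute(
    "SELECT employee_id, dname FROM employee_dependent_names ORDER BY dname").fetchall()
print(rows)  # [('John', 'Johndotir'), ('John', 'Johnson')]
```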

Representing Weak Entity Sets

For a weak entity set, we add columns to the table corresponding to the primary key
of the strong entity set on which the weak set is dependent


A weak entity set becomes a table that includes a column for the primary key of the
identifying strong entity set
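A sketch of the payment table in SQLite (via Python's sqlite3). The composite primary key combines the owner's key with the discriminator. Using ON DELETE CASCADE to enforce existence dependency is one possible design choice, not something these notes prescribe; the column payment_amount and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);
CREATE TABLE payment (
    loan_number    TEXT REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER,              -- discriminator
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)
);
INSERT INTO loan VALUES ('L-17', 1000), ('L-23', 2000);
INSERT INTO payment VALUES ('L-17', 1, 100), ('L-17', 2, 100), ('L-23', 1, 250);
""")

def payment_count():
    return conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]

before = payment_count()
conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")  # removes its payments too
after = payment_count()
print(before, after)  # 3 1
```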

Representing Relationship Sets as Tables


A many-to-many relationship set is represented as a table with columns for the
primary keys of the two participating entity sets, and any descriptive attributes of the
relationship set.
E.g.: table for relationship set borrower


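Let R be a relationship set involving entity sets $E_1, E_2, \ldots, E_m$, and let primary-key($E_i$) denote the set of attributes forming the primary key of $E_i$.

The table corresponding to the relationship set R has the following attributes:

$\text{primary-key}(E_1) \cup \text{primary-key}(E_2) \cup \cdots \cup \text{primary-key}(E_m)$

If the relationship has k descriptive attributes $a_1, a_2, \ldots, a_k$, we add them too:

$\text{primary-key}(E_1) \cup \cdots \cup \text{primary-key}(E_m) \cup \{a_1, a_2, \ldots, a_k\}$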

Redundancy of Tables

Many-to-one and one-to-many relationship sets that are total on the many side can be represented by adding an extra attribute to the many side, containing the primary key of the one side.
E.g. instead of creating a table for relationship account-branch, add an attribute branch to the entity set account.


For one-to-one relationship sets, either side can be chosen to act as the "many" side.
That is, the extra attribute can be added to either of the tables corresponding to the two entity sets.
If participation is partial on the many side, replacing a table by an extra attribute in the relation corresponding to the "many" side could result in null values.
The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
E.g. the payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and payment-number).

Representing Specialization as Tables

Method 1:
• Form a table for the higher-level entity set.
• Form a table for each lower-level entity set, including the primary key of the higher-level entity set and the local attributes.

table       table attributes
person      name, street, city
customer    name, credit-rating
employee    name, salary


Drawback: getting information about, e.g., an employee requires accessing two tables.

Method 2:
• Form a table for each entity set with all local and inherited attributes.

table       table attributes
person      name, street, city
customer    name, street, city, credit-rating
employee    name, street, city, salary

• If the specialization is total, the table for the generalized entity set (person) is not required to store information.
  o It can be defined as a "view" relation containing the union of the specialization tables.
  o But an explicit table may still be needed for foreign key constraints.
• Drawback: street and city may be stored redundantly for persons who are both customers and employees.
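Method 2, with person defined as a view over the union of the specialization tables, can be sketched in SQLite through Python's sqlite3 module. The column types and the data are hypothetical; the example also shows that the duplicated street and city of a person who is both a customer and an employee collapse into one row of the view, because UNION removes duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Method 2: one table per lower-level entity set, with inherited attributes
CREATE TABLE customer (name TEXT PRIMARY KEY, street TEXT, city TEXT, credit_rating TEXT);
CREATE TABLE employee (name TEXT PRIMARY KEY, street TEXT, city TEXT, salary INTEGER);
-- total specialization: person can be a view over the union
CREATE VIEW person AS
    SELECT name, street, city FROM customer
    UNION
    SELECT name, street, city FROM employee;
INSERT INTO customer VALUES ('Hayes', 'Main', 'Harrison', 'A');
INSERT INTO employee VALUES ('Hayes', 'Main', 'Harrison', 60000);
INSERT INTO employee VALUES ('Smith', 'North', 'Rye', 55000);
""")

names = conn.execute("SELECT name FROM person ORDER BY name").fetchall()
print(names)  # [('Hayes',), ('Smith',)]: Hayes is stored twice but listed once
```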
Relations Corresponding to Aggregation

To represent aggregation, create a table containing
• the primary key of the aggregated relationship,
• the primary key of the associated entity set,
• any descriptive attributes.
E.g. to represent aggregation manages between relationship works-on and entity set manager, create a table
manages(employee-id, branch-name, title, manager-name)
Table works-on is redundant provided we are willing to store null values for attribute manager-name in table manages.


UNIT III

Relational Model: Structure of Relational Database, Relational Algebra, Operations, Additional Operations, Calculus, Domain Relational Calculus, Tuple Relational Calculus, Query by Example.


Relational Algebra

Six basic operators

select: $\sigma$
project: $\Pi$
union: $\cup$
set difference: $-$
Cartesian product: $\times$
rename: $\rho$

Relation r:

A B C  D
α α 1  7
α β 5  7
β β 12 3
β β 23 10

$\sigma_{A=B \wedge D>5}(r)$:

A B C  D
α α 1  7
β β 23 10

Select Operation
Notation: $\sigma_p(r)$
p is called the selection predicate.
Defined as:

$\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$

where p is a formula in propositional calculus consisting of terms connected by $\wedge$ (and), $\vee$ (or), $\neg$ (not).
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: $=, \neq, >, \geq, <, \leq$.
Example of selection:

$\sigma_{branch\_name=\text{"Perryridge"}}(account)$

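The selection operator can be sketched in Python by treating a relation as a set of rows and the predicate as a function. This sketch, which uses namedtuple rows and a hypothetical select helper, reproduces the $\sigma_{A=B \wedge D>5}(r)$ example above.

```python
from collections import namedtuple

# Relation r from the example above, one namedtuple per row.
Row = namedtuple("Row", ["A", "B", "C", "D"])
r = {Row("α", "α", 1, 7), Row("α", "β", 5, 7), Row("β", "β", 12, 3), Row("β", "β", 23, 10)}

def select(relation, predicate):
    """sigma_p(relation) = {t | t in relation and p(t)}."""
    return {t for t in relation if predicate(t)}

result = select(r, lambda t: t.A == t.B and t.D > 5)
for t in sorted(result):
    print(t)
# Row(A='α', B='α', C=1, D=7)
# Row(A='β', B='β', C=23, D=10)
```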

Project Operation – Example

Relation r:

A B  C
α 10 1
α 20 1
β 30 1
β 40 2

$\Pi_{A,C}(r)$:

A C        A C
α 1        α 1
α 1   =    β 1
β 1        β 2
β 2

Project Operation
Notation: $\Pi_{A_1, A_2, \ldots, A_k}(r)$
where $A_1, A_2, \ldots, A_k$ are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
Duplicate rows are removed from the result, since relations are sets.
Example: to eliminate the branch_name attribute of account:

$\Pi_{account\_number, balance}(account)$

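A matching Python sketch of projection, again using namedtuple rows and a hypothetical project helper. Building the result as a Python set removes duplicate rows automatically, just as the definition requires; the relation is the one from the projection example above.

```python
from collections import namedtuple

Row = namedtuple("Row", ["A", "B", "C"])
r = {Row("α", 10, 1), Row("α", 20, 1), Row("β", 30, 1), Row("β", 40, 2)}

def project(relation, attrs):
    """Pi_attrs(relation): keep only the listed columns.
    Building a set removes duplicate rows, as the definition requires."""
    Out = namedtuple("Out", attrs)
    return {Out(*(getattr(t, a) for a in attrs)) for t in relation}

result = project(r, ["A", "C"])
print(len(result))     # 3: the two (α, 1) rows collapse into one
print(sorted(result))  # [Out(A='α', C=1), Out(A='β', C=1), Out(A='β', C=2)]
```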

Union Operation – Example

Relation r:
A B
α 1
α 2
β 1

Relation s:
A B
α 2
β 3

$r \cup s$:
A B
α 1
α 2
β 1
β 3

Notation: $r \cup s$
Defined as:
$r \cup s = \{t \mid t \in r \text{ or } t \in s\}$
For $r \cup s$ to be valid:
1. r, s must have the same arity (same number of attributes).
2. The attribute domains must be compatible (example: the 2nd column of r deals with the same type of values as does the 2nd column of s).
Example: to find all customers with either an account or a loan:
$\Pi_{customer\_name}(depositor) \cup \Pi_{customer\_name}(borrower)$

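The union rule and its arity check can be sketched in Python. Here a relation is assumed to be a pair (attribute names, set of tuples); the union helper is hypothetical, it checks only arity (domain compatibility is left to the caller), and the result keeps r's attribute names.

```python
def union(r, s):
    """r ∪ s = {t | t ∈ r or t ∈ s}; relations are (attribute_names, set_of_tuples)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        raise ValueError("union requires relations of the same arity")
    return r_attrs, r_rows | s_rows

r = (("A", "B"), {("α", 1), ("α", 2), ("β", 1)})
s = (("A", "B"), {("α", 2), ("β", 3)})
attrs, rows = union(r, s)
print(sorted(rows))  # [('α', 1), ('α', 2), ('β', 1), ('β', 3)]
```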
Set Difference Operation – Example


Relation r:
A B
α 1
α 2
β 1

Relation s:
A B
α 2
β 3

$r - s$:
A B
α 1
β 1

Notation: $r - s$
Defined as:
$r - s = \{t \mid t \in r \text{ and } t \notin s\}$

Set differences must be taken between compatible relations:
r and s must have the same arity
attribute domains of r and s must be compatible
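The same (attribute names, set of tuples) representation gives a short sketch of set difference. Unlike union, the operation is not symmetric. The difference helper is hypothetical and, like the union sketch, checks only arity.

```python
def difference(r, s):
    """r − s = {t | t ∈ r and t ∉ s}; relations are (attribute_names, set_of_tuples)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        raise ValueError("set difference requires relations of the same arity")
    return r_attrs, r_rows - s_rows

r = (("A", "B"), {("α", 1), ("α", 2), ("β", 1)})
s = (("A", "B"), {("α", 2), ("β", 3)})
print(sorted(difference(r, s)[1]))  # [('α', 1), ('β', 1)]
print(sorted(difference(s, r)[1]))  # [('β', 3)]: the order of operands matters
```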


Cartesian-Product Operation – Example

Relation r:
A B
α 1
β 2

Relation s:
C D  E
α 10 a
β 10 a
β 20 b
γ 10 b

$r \times s$:
A B C D  E
α 1 α 10 a
α 1 β 10 a
α 1 β 20 b
α 1 γ 10 b
β 2 α 10 a
β 2 β 10 a
β 2 β 20 b
β 2 γ 10 b

Cartesian-Product Operation

Notation: $r \times s$
Defined as:
$r \times s = \{t\,q \mid t \in r \text{ and } q \in s\}$
Assume that the attributes of r(R) and s(S) are disjoint (that is, $R \cap S = \emptyset$).
If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.
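A Python sketch of the Cartesian product, using the same (attribute names, set of tuples) representation and a hypothetical cartesian_product helper. It refuses overlapping attribute names, which is the situation where the rename operation is needed, and ends with the composition $\sigma_{A=C}(r \times s)$ used in the next example.

```python
def cartesian_product(r, s):
    """r × s = {t q | t ∈ r and q ∈ s}: concatenate every pair of tuples.
    Relations are (attribute_names, set_of_tuples); names must be disjoint."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if set(r_attrs) & set(s_attrs):
        raise ValueError("attribute names overlap; rename one relation first")
    return r_attrs + s_attrs, {t + q for t in r_rows for q in s_rows}

r = (("A", "B"), {("α", 1), ("β", 2)})
s = (("C", "D", "E"), {("α", 10, "a"), ("β", 10, "a"), ("β", 20, "b"), ("γ", 10, "b")})
attrs, rows = cartesian_product(r, s)
print(attrs)      # ('A', 'B', 'C', 'D', 'E')
print(len(rows))  # 8 = 2 * 4

# Composition: sigma_{A=C}(r × s) keeps the rows whose A and C values agree.
print(sorted(t for t in rows if t[0] == t[2]))
# [('α', 1, 'α', 10, 'a'), ('β', 2, 'β', 10, 'a'), ('β', 2, 'β', 20, 'b')]
```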


Composition of Operations
We can build expressions using multiple operations.
Example: $\sigma_{A=C}(r \times s)$

$r \times s$:
A B C D  E
α 1 α 10 a
α 1 β 10 a
α 1 β 20 b
α 1 γ 10 b
β 2 α 10 a
β 2 β 10 a
β 2 β 20 b
β 2 γ 10 b

$\sigma_{A=C}(r \times s)$:
A B C D  E
α 1 α 10 a
β 2 β 10 a
β 2 β 20 b

Rename Operation
Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
Allows us to refer to a relation by more than one name.
Example:

$\rho_X(E)$

returns the expression E under the name X.
If a relational-algebra expression E has arity n, then

$\rho_{X(A_1, A_2, \ldots, A_n)}(E)$

returns the result of expression E under the name X, and with the attributes renamed to $A_1, A_2, \ldots, A_n$.

Banking Example
branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Find all loans of over $1200

$\sigma_{amount > 1200}(loan)$

Find the loan number for each loan of an amount greater than $1200

$\Pi_{loan\_number}(\sigma_{amount > 1200}(loan))$

Find the names of all customers who have a loan, an account, or both, from the bank

$\Pi_{customer\_name}(borrower) \cup \Pi_{customer\_name}(depositor)$


Find the names of all customers who have a loan at the Perryridge branch.

$\Pi_{customer\_name}(\sigma_{branch\_name=\text{"Perryridge"}}(\sigma_{borrower.loan\_number=loan.loan\_number}(borrower \times loan)))$

Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.

$\Pi_{customer\_name}(\sigma_{branch\_name=\text{"Perryridge"}}(\sigma_{borrower.loan\_number=loan.loan\_number}(borrower \times loan))) - \Pi_{customer\_name}(depositor)$

Find the names of all customers who have a loan at the Perryridge branch.

Query 1
$\Pi_{customer\_name}(\sigma_{branch\_name=\text{"Perryridge"}}(\sigma_{borrower.loan\_number=loan.loan\_number}(borrower \times loan)))$

Query 2
$\Pi_{customer\_name}(\sigma_{loan.loan\_number=borrower.loan\_number}((\sigma_{branch\_name=\text{"Perryridge"}}(loan)) \times borrower))$


Find the largest account balance

Strategy:
Find those balances that are not the largest.
Rename the account relation as d so that we can compare each account balance with all others.
Use set difference to find those account balances that were not found in the earlier step.
The query is:

$\Pi_{balance}(account) - \Pi_{account.balance}(\sigma_{account.balance < d.balance}(account \times \rho_d(account)))$
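The largest-balance query can be run end to end with small Python versions of the basic operators. Everything here is a sketch: the account rows are hypothetical, a relation is a pair (attribute names, set of tuples), and rename is simplified to prefixing each attribute with the new relation name, which is how account.balance and d.balance become distinct attributes.

```python
# Hypothetical account instance: (account_number, branch_name, balance).
account = (("account_number", "branch_name", "balance"),
           {("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
            ("A-201", "Brighton", 900), ("A-215", "Mianus", 700)})

def rename(rel, name):
    """rho_name(rel): prefix every attribute with the new relation name."""
    attrs, rows = rel
    return tuple(f"{name}.{a}" for a in attrs), rows

def product(r, s):
    (ra, rr), (sa, sr) = r, s
    if set(ra) & set(sa):
        raise ValueError("attribute names must be disjoint")
    return ra + sa, {x + y for x in rr for y in sr}

def select(rel, pred):
    attrs, rows = rel
    return attrs, {t for t in rows if pred(dict(zip(attrs, t)))}

def project(rel, keep):
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in rows}

acct = rename(account, "account")   # only qualifies names, e.g. account.balance
d = rename(account, "d")            # second copy of account, called d
not_largest = project(
    select(product(acct, d), lambda t: t["account.balance"] < t["d.balance"]),
    ["account.balance"])
largest = project(acct, ["account.balance"])[1] - not_largest[1]
print(largest)  # {(900,)}
```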

Set-Intersection Operation
Notation: $r \cap s$
Defined as:
$r \cap s = \{t \mid t \in r \text{ and } t \in s\}$
Assume:
r, s have the same arity
attributes of r and s are compatible
Note: $r \cap s = r - (r - s)$
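The identity $r \cap s = r - (r - s)$ is easy to confirm with Python sets, here on the r and s used in the union and difference examples (tuples only, attribute names omitted for brevity).

```python
r = {("α", 1), ("α", 2), ("β", 1)}
s = {("α", 2), ("β", 3)}

direct = {t for t in r if t in s}   # {t | t ∈ r and t ∈ s}
via_difference = r - (r - s)        # r ∩ s = r − (r − s)

print(direct)                                  # {('α', 2)}
print(direct == via_difference == (r & s))     # True
```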


Set-Intersection Operation – Example

Relation r:
A B
α 1
α 2
β 1

Relation s:
A B
α 2
β 3

$r \cap s$:
A B
α 2

Natural-Join Operation
Notation: $r \bowtie s$
Let r and s be relations on schemas R and S respectively.

Then, $r \bowtie s$ is a relation on schema $R \cup S$ obtained as follows:

Consider each pair of tuples $t_r$ from r and $t_s$ from s.

If $t_r$ and $t_s$ have the same value on each of the attributes in $R \cap S$, add a tuple t to the result, where
t has the same value as $t_r$ on R
t has the same value as $t_s$ on S
Example:
R = (A, B, C, D)
S = (E, B, D)
Result schema = (A, B, C, D, E)

then $r \bowtie s$ is defined as

$\Pi_{r.A, r.B, r.C, r.D, s.E}(\sigma_{r.B = s.B \wedge r.D = s.D}(r \times s))$

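A direct Python translation of this definition, matching tuples on the common attributes $R \cap S$. The (attribute names, set of tuples) representation and the helper natural_join are assumptions of this sketch; the data is the r and s of the natural-join example that follows, so the result should be that example's five-row table.

```python
def natural_join(r, s):
    """r ⋈ s: combine each pair of tuples that agree on every attribute in R ∩ S.
    The result schema is R ∪ S: R's attributes, then S's remaining ones."""
    (ra, rr), (sa, sr) = r, s
    common = [a for a in ra if a in sa]
    s_only = [a for a in sa if a not in ra]
    rows = set()
    for tr in rr:
        rd = dict(zip(ra, tr))
        for ts in sr:
            sd = dict(zip(sa, ts))
            if all(rd[a] == sd[a] for a in common):
                rows.add(tr + tuple(sd[a] for a in s_only))
    return tuple(ra) + tuple(s_only), rows

r = (("A", "B", "C", "D"),
     {("α", 1, "α", "a"), ("β", 2, "γ", "a"), ("γ", 4, "β", "b"),
      ("α", 1, "γ", "a"), ("δ", 2, "β", "b")})
s = (("B", "D", "E"),
     {(1, "a", "α"), (3, "a", "β"), (1, "a", "γ"), (2, "b", "δ"), (3, "b", "ε")})
attrs, rows = natural_join(r, s)
print(attrs)  # ('A', 'B', 'C', 'D', 'E')
for t in sorted(rows):
    print(t)
```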

Natural Join Operation – Example

Relation r:
A B C D
α 1 α a
β 2 γ a
γ 4 β b
α 1 γ a
δ 2 β b

Relation s:
B D E
1 a α
3 a β
1 a γ
2 b δ
3 b ε

$r \bowtie s$:
A B C D E
α 1 α a α
α 1 α a γ
α 1 γ a α
α 1 γ a γ
δ 2 β b δ

Division Operation $r \div s$

Suited to queries that include the phrase "for all".

Let r and s be relations on schemas R and S respectively, where
$R = (A_1, \ldots, A_m, B_1, \ldots, B_n)$
$S = (B_1, \ldots, B_n)$
The result of $r \div s$ is a relation on schema
$R - S = (A_1, \ldots, A_m)$

$r \div s = \{t \mid t \in \Pi_{R-S}(r) \wedge \forall u \in s\,(tu \in r)\}$

where tu means the concatenation of tuples t and u to produce a single tuple.


Division Operation – Example

Relation r:
A B
α 1
α 2
α 3
β 1
γ 1
δ 1
δ 3
δ 4
ε 6
ε 1
β 2

Relation s:
B
1
2

$r \div s$:
A
α
β

Another Division Example

Relation r:
A B C D E
α a α a 1
α a γ a 1
α a γ b 1
β a γ a 1
β a γ b 3
γ a γ a 1
γ a γ b 1
γ a β b 1

Relation s:
D E
a 1
b 1

$r \div s$:
A B C
α a γ
γ a γ


Property
Let $q = r \div s$.
Then q is the largest relation satisfying $q \times s \subseteq r$.
Definition in terms of the basic algebra operations:
Let r(R) and s(S) be relations, and let $S \subseteq R$.

$r \div s = \Pi_{R-S}(r) - \Pi_{R-S}((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r))$

To see why:
$\Pi_{R-S,S}(r)$ simply reorders the attributes of r.
$\Pi_{R-S}((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r))$ gives those tuples t in $\Pi_{R-S}(r)$ such that for some tuple $u \in s$, $tu \notin r$.

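Both characterizations of division can be coded and compared. In this sketch (hypothetical helpers, same (attribute names, set of tuples) representation), divide follows the "for all" definition directly, while divide_basic uses only projection, Cartesian product and set difference, as in the formula above. The data is the first division example.

```python
def _parts(r, s):
    """Split each tuple of r into (t, u): t over R − S, u over S in s's column order."""
    (ra, rr), (sa, sr) = r, s
    keep = tuple(a for a in ra if a not in sa)              # schema R − S
    ki = [ra.index(a) for a in keep]
    si = [ra.index(a) for a in sa]
    cand = {tuple(t[i] for i in ki) for t in rr}            # Π_{R−S}(r)
    pairs = {(tuple(t[i] for i in ki), tuple(t[i] for i in si)) for t in rr}  # Π_{R−S,S}(r)
    return keep, cand, pairs, sr

def divide(r, s):
    """Definition: t ∈ Π_{R−S}(r) such that tu ∈ r for every u ∈ s."""
    keep, cand, pairs, sr = _parts(r, s)
    return keep, {t for t in cand if all((t, u) in pairs for u in sr)}

def divide_basic(r, s):
    """Π_{R−S}(r) − Π_{R−S}((Π_{R−S}(r) × s) − Π_{R−S,S}(r))."""
    keep, cand, pairs, sr = _parts(r, s)
    all_pairs = {(t, u) for t in cand for u in sr}          # Π_{R−S}(r) × s
    missing = {t for (t, _u) in all_pairs - pairs}          # t with some tu not in r
    return keep, cand - missing

r1 = (("A", "B"), {("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1), ("δ", 1),
                   ("δ", 3), ("δ", 4), ("ε", 6), ("ε", 1), ("β", 2)})
s1 = (("B",), {(1,), (2,)})
print(sorted(divide(r1, s1)[1]))               # [('α',), ('β',)]
print(divide(r1, s1) == divide_basic(r1, s1))  # True
```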
Bank Example Queries

Find the names of all customers who have a loan and an account at the bank.

$\Pi_{customer\_name}(borrower) \cap \Pi_{customer\_name}(depositor)$

Find the names of all customers who have a loan at the bank and the loan amount.

$\Pi_{customer\_name, loan\_number, amount}(borrower \bowtie loan)$

Find all customers who have an account at both the “Downtown” and the “Uptown” branches (and possibly at others).

Query 1
Π_{customer_name}(σ_{branch_name = “Downtown”}(depositor ⋈ account)) ∩
Π_{customer_name}(σ_{branch_name = “Uptown”}(depositor ⋈ account))


Query 2
Π_{customer_name, branch_name}(depositor ⋈ account) ÷ ρ_{temp(branch_name)}({(“Downtown”), (“Uptown”)})
Note that Query 2 uses a constant relation.

Find all customers who have an account at all branches located in Brooklyn city.
Π_{customer_name, branch_name}(depositor ⋈ account) ÷ Π_{branch_name}(σ_{branch_city = “Brooklyn”}(branch))

Extended Relational-Algebra-Operations

Generalized Projection
Aggregate Functions
Outer Join

Generalized Projection

Extends the projection operation by allowing arithmetic functions to be used in the projection list.

Π_{F1, F2, …, Fn}(E)
E is any relational-algebra expression.
Each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E.
Given relation credit_info(customer_name, limit, credit_balance), find how much more each person can spend:

Π_{customer_name, limit − credit_balance}(credit_info)
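Generalized projection maps directly onto a set comprehension. A minimal Python sketch, with hypothetical credit_info tuples laid out as (customer_name, limit, credit_balance):

```python
credit_info = {
    ("Curry", 2000, 1750),   # hypothetical tuples
    ("Hayes", 1500, 1500),
    ("Jones", 6000, 700),
}

# Π_{customer_name, limit − credit_balance}(credit_info)
remaining = {(name, limit - balance) for (name, limit, balance) in credit_info}
print(sorted(remaining))  # [('Curry', 250), ('Hayes', 0), ('Jones', 5300)]
```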

Aggregate Functions and Operations

An aggregation function takes a collection of values and returns a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Aggregate operation in relational algebra:

G1, G2, …, Gn 𝒢 F1(A1), F2(A2), …, Fn(An) (E)


E is any relational-algebra expression
G1, G2, …, Gn is a list of attributes on which to group (can be empty)
Each Fi is an aggregate function
Each Ai is an attribute name

Relation r:
A  B  C
α  α  7
α  β  7
β  β  3
β  β  10

𝒢 sum(c)(r):
sum(c)
27

Relation account grouped by branch_name:

branch_name  account_number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700

branch_name 𝒢 sum(balance)(account):

branch_name  sum(balance)
Perryridge   1300
Brighton     1500
Redwood      700
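A Python sketch of grouping with sum, assuming relations are sets of tuples and the grouping and aggregated attributes are given by position (`group_sum` is a hypothetical helper). It also shows why an aggregate must see a multiset of values: projecting the values into a set first would merge equal values.

```python
from collections import defaultdict

account = {
    ("Perryridge", "A-102", 400), ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750), ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
}

def group_sum(rel, group_idx, value_idx):
    """branch_name 𝒢 sum(balance): one result tuple per distinct grouping value."""
    totals = defaultdict(int)
    for t in rel:                  # iterate over tuples, not over projected values
        totals[t[group_idx]] += t[value_idx]
    return set(totals.items())

print(sorted(group_sum(account, 0, 2)))
# [('Brighton', 1500), ('Perryridge', 1300), ('Redwood', 700)]

# With an empty grouping list, the whole relation forms a single group.
r = {("α", "α", 7), ("α", "β", 7), ("β", "β", 3), ("β", "β", 10)}
print(sum(t[2] for t in r))    # 27
print(sum({t[2] for t in r}))  # 20: projecting first loses one of the 7s
```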


The result of aggregation does not have a name.
We can use the rename operation to give it a name.
For convenience, we permit renaming as part of the aggregate operation:
branch_name 𝒢 sum(balance) as sum_balance (account)

Outer Join

An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join.
Uses null values:
null signifies that the value is unknown or does not exist
All comparisons involving null are (roughly speaking) false by definition.
We shall study the precise meaning of comparisons with nulls later.

Relation loan

loan_number  branch_name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700

Relation borrower

customer_name  loan_number
Jones          L-170
Smith          L-230
Hayes          L-155


Join

loan ⋈ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

Left Outer Join

loan ⟕ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null

Right Outer Join

loan ⟖ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes

Full Outer Join

loan ⟗ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
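The four joins on these relations can be reproduced with a small Python sketch. It uses Python's None as the null placeholder; note that None == None is true in Python, so the sketch models only the padding behavior of outer joins, not the comparison rules for null. The function `join_loan_borrower` is hypothetical and joins on loan_number only.

```python
loan = [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000), ("L-260", "Perryridge", 1700)]
borrower = [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")]

def join_loan_borrower(loan, borrower, left=False, right=False):
    """loan ⋈ borrower on loan_number; left/right pad unmatched tuples with None (null)."""
    result = []
    matched = set()                       # borrower tuples that found a partner
    for loan_number, branch_name, amount in loan:
        partners = [b for b in borrower if b[1] == loan_number]
        for customer_name, _ in partners:
            result.append((loan_number, branch_name, amount, customer_name))
        matched.update(partners)
        if not partners and left:         # dangling loan tuple
            result.append((loan_number, branch_name, amount, None))
    if right:
        for customer_name, loan_number in borrower:
            if (customer_name, loan_number) not in matched:  # dangling borrower tuple
                result.append((loan_number, None, None, customer_name))
    return result

inner = join_loan_borrower(loan, borrower)
left_outer = join_loan_borrower(loan, borrower, left=True)
right_outer = join_loan_borrower(loan, borrower, right=True)
full_outer = join_loan_borrower(loan, borrower, left=True, right=True)
print(full_outer[-2:])  # [('L-260', 'Perryridge', 1700, None), ('L-155', None, None, 'Hayes')]
```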


Null Values
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
Aggregate functions simply ignore null values (as in SQL)
For duplicate elimination and grouping, null is treated like any other value, and two
nulls are assumed to be the same (as in SQL)
Comparisons with null values return the special truth value: unknown
If false was used instead of unknown, then not (A < 5)
would not be equivalent to A >= 5
Three-valued logic using the truth value unknown:
OR: (unknown or true) = true,
(unknown or false) = unknown
(unknown or unknown) = unknown
AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
NOT: (not unknown) = unknown
In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown
Result of select predicate is treated as false if it evaluates to unknown
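The three-valued rules can be sketched in Python with None standing for unknown (a convention of this sketch, not a library feature). The last two lines show the point about not (A < 5): a tuple whose A is null satisfies neither A < 5 nor not (A < 5).

```python
UNKNOWN = None  # stands for the truth value unknown

def and3(p, q):
    if p is False or q is False:
        return False
    if p is True and q is True:
        return True
    return UNKNOWN

def or3(p, q):
    if p is True or q is True:
        return True
    if p is False and q is False:
        return False
    return UNKNOWN

def not3(p):
    return UNKNOWN if p is UNKNOWN else not p

def lt(a, b):
    """a < b, where a null operand (None) makes the comparison unknown."""
    if a is None or b is None:
        return UNKNOWN
    return a < b

def select(rows, predicate):
    """σ keeps a tuple only if the predicate is true; unknown is treated as false."""
    return [t for t in rows if predicate(t) is True]

rows = [{"A": 3}, {"A": None}, {"A": 7}]
print(select(rows, lambda t: lt(t["A"], 5)))        # [{'A': 3}]
print(select(rows, lambda t: not3(lt(t["A"], 5))))  # [{'A': 7}]
```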

Modification of the Database

The content of the database may be modified using the following operations:
Deletion
Insertion
Updating
All these operations are expressed using the assignment operator.

Deletion
A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.
Can delete only whole tuples; cannot delete values on only particular attributes.
A deletion is expressed in relational algebra by:
r ← r – E
where r is a relation and E is a relational algebra query.


Deletion Examples

Delete all account records in the Perryridge branch.
account ← account – σ_{branch_name = “Perryridge”}(account)

Delete all loan records with amount in the range of 0 to 50.
loan ← loan – σ_{amount ≥ 0 ∧ amount ≤ 50}(loan)

Delete all accounts at branches located in Needham.
r1 ← σ_{branch_city = “Needham”}(account ⋈ branch)
r2 ← Π_{account_number, branch_name, balance}(r1)
r3 ← Π_{customer_name, account_number}(r2 ⋈ depositor)
account ← account – r2
depositor ← depositor – r3
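A Python sketch of the Needham deletion, treating relations as sets of tuples so that each assignment with – becomes a set difference. The schemas follow the banking example; all tuples are hypothetical.

```python
account = {("A-101", "Downtown", 500), ("A-215", "Needham", 700), ("A-305", "Perryridge", 350)}
branch = {("Downtown", "Brooklyn", 9000000), ("Needham", "Needham", 4000000),
          ("Perryridge", "Horseneck", 1700000)}
depositor = {("Johnson", "A-101"), ("Smith", "A-215"), ("Turner", "A-305")}

# r1 ← σ_{branch_city = "Needham"}(account ⋈ branch), joined on branch_name
r1 = {(a, b, bal, city, assets)
      for (a, b, bal) in account
      for (b2, city, assets) in branch
      if b == b2 and city == "Needham"}
# r2 ← Π_{account_number, branch_name, balance}(r1)
r2 = {(a, b, bal) for (a, b, bal, _city, _assets) in r1}
# r3 ← Π_{customer_name, account_number}(r2 ⋈ depositor), joined on account_number
r3 = {(c, a) for (a, _b, _bal) in r2 for (c, a2) in depositor if a == a2}
# Whole tuples are removed by set difference.
account = account - r2
depositor = depositor - r3
print(sorted(account))    # [('A-101', 'Downtown', 500), ('A-305', 'Perryridge', 350)]
print(sorted(depositor))  # [('Johnson', 'A-101'), ('Turner', 'A-305')]
```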

Insertion
To insert data into a relation, we either:
specify a tuple to be inserted
write a query whose result is a set of tuples to be inserted
In relational algebra, an insertion is expressed by:
r ← r ∪ E
where r is a relation and E is a relational algebra expression.
The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.

Example
Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.

account ← account ∪ {(“A-973”, “Perryridge”, 1200)}
depositor ← depositor ∪ {(“Smith”, “A-973”)}


Example
Provide as a gift for all loan customers in the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account.

r1 ← σ_{branch_name = “Perryridge”}(borrower ⋈ loan)
account ← account ∪ Π_{loan_number, branch_name, 200}(r1)
depositor ← depositor ∪ Π_{customer_name, loan_number}(r1)
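The gift example can be sketched the same way, with ∪ as set union. Schemas follow the banking example; the tuples are hypothetical.

```python
borrower = {("Adams", "L-16"), ("Hayes", "L-15"), ("Jones", "L-17")}
loan = {("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300), ("L-17", "Downtown", 1000)}
account = {("A-101", "Downtown", 500)}
depositor = {("Johnson", "A-101")}

# r1 ← σ_{branch_name = "Perryridge"}(borrower ⋈ loan), joined on loan_number
r1 = {(c, ln, b, amt)
      for (c, ln) in borrower
      for (ln2, b, amt) in loan
      if ln == ln2 and b == "Perryridge"}
# account ← account ∪ Π_{loan_number, branch_name, 200}(r1)
account = account | {(ln, b, 200) for (_c, ln, b, _amt) in r1}
# depositor ← depositor ∪ Π_{customer_name, loan_number}(r1)
depositor = depositor | {(c, ln) for (c, ln, _b, _amt) in r1}
print(sorted(account))
# [('A-101', 'Downtown', 500), ('L-15', 'Perryridge', 200), ('L-16', 'Perryridge', 200)]
```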

Updating
A mechanism to change a value in a tuple without changing all values in the tuple.

Use the generalized projection operator to do this task:

r ← Π_{F1, F2, …, Fn}(r)

Each Fi is either
the i-th attribute of r, if the i-th attribute is not updated, or,
if the attribute is to be updated, an expression, involving only constants and the attributes of r, which gives the new value for the attribute.

Example

Make interest payments by increasing all balances by 5 percent.

account ← Π_{account_number, branch_name, balance * 1.05}(account)

Pay all accounts with balances over $10,000 6 percent interest and pay all others 5 percent.

account ← Π_{account_number, branch_name, balance * 1.06}(σ_{balance > 10000}(account))
          ∪ Π_{account_number, branch_name, balance * 1.05}(σ_{balance ≤ 10000}(account))
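A Python sketch of the two-rate update, using Decimal so that the balances are not affected by binary floating-point rounding (a choice of this sketch). The account tuples are hypothetical.

```python
from decimal import Decimal

account = {("A-101", "Downtown", Decimal("500")),
           ("A-201", "Perryridge", Decimal("12000")),
           ("A-305", "Brighton", Decimal("10000"))}

def pay_interest(rel):
    # Both projections read the old relation, so a balance pushed over
    # $10,000 by the 5 percent payment is not also paid 6 percent.
    high = {(a, b, bal * Decimal("1.06")) for (a, b, bal) in rel if bal > 10000}
    low = {(a, b, bal * Decimal("1.05")) for (a, b, bal) in rel if bal <= 10000}
    return high | low

account = pay_interest(account)   # account ← ... ∪ ...
balances = {a: bal for (a, _b, bal) in account}
print(balances["A-201"], balances["A-305"])  # 12720.00 10500.00
```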


Views
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database).
Consider a person who needs to know a customer’s name, loan number and branch name, but has no need to see the loan amount. This person should see the relation
Π_{customer-name, loan-number, branch-name}(borrower ⋈ loan)

A view provides a mechanism to hide certain data from the view of certain users.
Any relation that is not part of the conceptual model but is made visible to a user as a “virtual relation” is called a view.

View Definition

create view v as <query expression>

where <query expression> is any legal relational algebra query expression. The view name is represented by v.

Example

Consider the view (named all-customer) consisting of branches and their customers.

create view all-customer as
Π_{branch-name, customer-name}(depositor ⋈ account)
∪ Π_{branch-name, customer-name}(borrower ⋈ loan)

We can find all customers of the Perryridge branch by

Π_{customer-name}(σ_{branch-name = “Perryridge”}(all-customer))
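For comparison, a minimal sketch of the same view in SQL, run through Python's built-in sqlite3 module. The hyphenated names become underscores (all_customer, branch_name) because hyphens are not allowed in unquoted SQL identifiers; the tables and rows are hypothetical. The second query shows that the view is virtual: a new borrower tuple appears without redefining the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account   (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    create table loan      (loan_number text, branch_name text, amount integer);
    create table borrower  (customer_name text, loan_number text);

    insert into account   values ('A-102', 'Perryridge', 400), ('A-101', 'Downtown', 500);
    insert into depositor values ('Hayes', 'A-102'), ('Johnson', 'A-101');
    insert into loan      values ('L-15', 'Perryridge', 1500);
    insert into borrower  values ('Adams', 'L-15');

    -- all_customer is stored only as a definition, not as data
    create view all_customer as
        select branch_name, customer_name from depositor natural join account
        union
        select branch_name, customer_name from borrower natural join loan;
""")

def perryridge_customers():
    # Π_{customer-name}(σ_{branch-name = "Perryridge"}(all-customer))
    return conn.execute(
        "select customer_name from all_customer "
        "where branch_name = 'Perryridge' order by customer_name").fetchall()

print(perryridge_customers())  # [('Adams',), ('Hayes',)]
conn.execute("insert into borrower values ('Jones', 'L-15')")
print(perryridge_customers())  # [('Adams',), ('Hayes',), ('Jones',)]
```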


Updates Through View

• Database modifications expressed in terms of views must be translated to modifications of the actual relations in the database.

• Consider the person who needs to see all loan data in the loan relation except amount. The view given to the person, branch-loan, is defined as:

create view branch-loan as
Π_{branch-name, loan-number}(loan)

• Since we allow a view name to appear wherever a relation name is allowed, the person may write:

branch-loan ← branch-loan ∪ {(“Perryridge”, “L-37”)}

• This insertion must be translated into an insertion into loan, which also requires a value for amount. It can be handled either by rejecting the insertion or by inserting (“L-37”, “Perryridge”, null) into loan.

Views Defined Using Other Views


• One view may be used in the expression defining another view.
• A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1.
• A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2.
• A view relation v is said to be recursive if it depends on itself.
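Dependency between views is a reachability question, so recursion can be detected with a graph walk. A Python sketch, assuming each view's direct dependencies are already known (extracting them from view definitions is not shown); the view names are hypothetical.

```python
def depends_on(direct):
    """Map each view to every view it depends on (directly or via a path).

    direct maps a view name to the set of view names used in its definition.
    """
    closure = {}
    for view in direct:
        seen, stack = set(), list(direct[view])
        while stack:                      # depth-first walk along dependency edges
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(direct.get(v, ()))
        closure[view] = seen
    return closure

def recursive_views(direct):
    """A view is recursive if it depends on itself."""
    return {view for view, deps in depends_on(direct).items() if view in deps}

direct = {"v1": {"v2"}, "v2": {"v3"}, "v3": set(), "v4": {"v5"}, "v5": {"v4"}}
print(sorted(depends_on(direct)["v1"]))  # ['v2', 'v3']
print(sorted(recursive_views(direct)))   # ['v4', 'v5']
```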

Why do we use Relational Algebra?


Because:
• It is mathematically defined (where relations are sets).
• We can prove that two relational algebra expressions are equivalent.
For example:

σ_{cond1}(σ_{cond2}(R)) ≡ σ_{cond2}(σ_{cond1}(R)) ≡ σ_{cond1 ∧ cond2}(R)

R1 ⋈_{cond} R2 ≡ σ_{cond}(R1 × R2)
R1 ÷ R2 ≡ Π_{X}(R1) − Π_{X}((Π_{X}(R1) × R2) − R1), where X is the set of attributes of R1 that are not in R2

Uses of Relational Algebra Equivalences


• To help query writers: they can write queries in several different ways.
• To help query optimizers: they can choose among different ways to execute the query.
In both cases we know for sure that the two queries (the original and the replacement) are equivalent, that is, they will produce the same answer.


Tuple Relational Calculus

A nonprocedural query language, where each query is of the form

{t | P(t)}
It is the set of all tuples t such that predicate P is true for t.
t is a tuple variable; t[A] denotes the value of tuple t on attribute A.
t ∈ r denotes that tuple t is in relation r.
P is a formula similar to that of the predicate calculus.

Predicate Calculus Formula


1. Set of attributes and constants
2. Set of comparison operators: (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨), not (¬)
4. Implication (⇒): x ⇒ y, if x is true, then y is true
   x ⇒ y ≡ ¬x ∨ y
5. Set of quantifiers:
   o ∃ t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r such that predicate Q(t) is true
   o ∀ t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r

Banking Example
branch (branch_name, branch_city, assets )
customer (customer_name, customer_street, customer_city )
account (account_number, branch_name, balance )
loan (loan_number, branch_name, amount )
depositor (customer_name, account_number )
borrower (customer_name, loan_number )

Find the loan_number, branch_name, and amount for loans of over $1200
Answer

{t | t ∈ loan ∧ t[amount] > 1200}

Find the loan number for each loan of an amount greater than $1200

{t | ∃ s ∈ loan (t[loan_number] = s[loan_number] ∧ s[amount] > 1200)}

Notice that a relation on schema [loan_number] is implicitly defined by the query.
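Tuple relational calculus expressions read almost directly as Python set comprehensions. A sketch with hypothetical loan tuples; the comprehension generates each t from a witness s rather than testing every possible tuple, which gives the same answer for safe expressions such as these.

```python
loan = {("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500), ("L-16", "Perryridge", 1300)}
# Attribute positions in loan(loan_number, branch_name, amount)
LOAN_NUMBER, BRANCH_NAME, AMOUNT = 0, 1, 2

# {t | t ∈ loan ∧ t[amount] > 1200}
big_loans = {t for t in loan if t[AMOUNT] > 1200}

# {t | ∃ s ∈ loan (t[loan_number] = s[loan_number] ∧ s[amount] > 1200)}
# Each result t is built from a witness s, so t has the single attribute loan_number.
big_loan_numbers = {(s[LOAN_NUMBER],) for s in loan if s[AMOUNT] > 1200}

print(sorted(big_loan_numbers))  # [('L-14',), ('L-16',)]
```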


Find the names of all customers having a loan, an account, or both at the bank

{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
     ∨ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

Find the names of all customers who have a loan and an account at the bank
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
     ∧ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

Find the names of all customers having a loan at the Perryridge branch

{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]
     ∧ ∃ u ∈ loan (u[branch_name] = “Perryridge”
     ∧ u[loan_number] = s[loan_number]))}
Find the names of all customers who have a loan at the
Perryridge branch, but no account at any branch of the bank
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]
     ∧ ∃ u ∈ loan (u[branch_name] = “Perryridge”
     ∧ u[loan_number] = s[loan_number]))
     ∧ ¬∃ v ∈ depositor (v[customer_name] = t[customer_name])}
Find the names of all customers having a loan from the Perryridge branch, and the
cities in which they live
{t | ∃ s ∈ loan (s[branch_name] = “Perryridge”
     ∧ ∃ u ∈ borrower (u[loan_number] = s[loan_number]
     ∧ t[customer_name] = u[customer_name]
     ∧ ∃ v ∈ customer (u[customer_name] = v[customer_name]
     ∧ t[customer_city] = v[customer_city])))}
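The ∃ and ¬∃ subformulas correspond to Python's any() and not any(). A sketch of the "loan at Perryridge but no account at any branch" query over hypothetical data:

```python
borrower = {("Adams", "L-16"), ("Hayes", "L-15"), ("Jones", "L-17")}
loan = {("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300), ("L-17", "Downtown", 1000)}
depositor = {("Hayes", "A-102"), ("Johnson", "A-101")}

result = {
    (s[0],)                                                       # t[customer_name] = s[customer_name]
    for s in borrower                                             # ∃ s ∈ borrower
    if any(u[1] == "Perryridge" and u[0] == s[1] for u in loan)  # ∃ u ∈ loan (...)
    and not any(v[0] == s[0] for v in depositor)                 # ¬∃ v ∈ depositor (...)
}
print(result)  # {('Adams',)}
```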


Domain Relational Calculus


A nonprocedural query language equivalent in power to the tuple relational calculus
Each query is an expression of the form:

{⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)}

x1, x2, …, xn represent domain variables
P represents a formula similar to that of the predicate calculus

Find the loan_number, branch_name, and amount for loans of over $1200

{⟨l, b, a⟩ | ⟨l, b, a⟩ ∈ loan ∧ a > 1200}

Find the names of all customers who have a loan of over $1200
{⟨c⟩ | ∃ l, b, a (⟨c, l⟩ ∈ borrower ∧ ⟨l, b, a⟩ ∈ loan ∧ a > 1200)}

Find the names of all customers who have a loan from the Perryridge branch
and the loan amount:
o {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ∃ b (⟨l, b, a⟩ ∈ loan ∧ b = “Perryridge”))}
o {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ⟨l, “Perryridge”, a⟩ ∈ loan)}
Find the names of all customers having a loan, an account, or both at the
Perryridge branch:
{⟨c⟩ | ∃ l (⟨c, l⟩ ∈ borrower
     ∧ ∃ b, a (⟨l, b, a⟩ ∈ loan ∧ b = “Perryridge”))
     ∨ ∃ a (⟨c, a⟩ ∈ depositor
     ∧ ∃ b, n (⟨a, b, n⟩ ∈ account ∧ b = “Perryridge”))}

Find the names of all customers who have an account at all branches located in
Brooklyn:
{⟨c⟩ | ∃ s, n (⟨c, s, n⟩ ∈ customer) ∧
     ∀ x, y, z ((⟨x, y, z⟩ ∈ branch ∧ y = “Brooklyn”) ⇒
     ∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor))}
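The ∀ … ⇒ … pattern of the Brooklyn query can be sketched in Python with all() over the Brooklyn branches and any() for the inner ∃. The tuples are hypothetical; `has_account_at` is an illustrative helper.

```python
customer = {("Hayes", "Main", "Harrison"), ("Johnson", "Alma", "Palo Alto"), ("Smith", "North", "Rye")}
branch = {("Brighton", "Brooklyn", 7100000), ("Downtown", "Brooklyn", 9000000),
          ("Redwood", "Palo Alto", 2100000)}
account = {("A-201", "Brighton", 900), ("A-101", "Downtown", 500),
           ("A-215", "Brighton", 700), ("A-222", "Redwood", 700)}
depositor = {("Johnson", "A-201"), ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-222")}

def has_account_at(c, x):
    # ∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor)
    return any(branch_name == x and (c, a) in depositor for (a, branch_name, _b) in account)

result = {
    (c,)
    for (c, _s, _n) in customer                  # ∃ s, n (⟨c, s, n⟩ ∈ customer)
    # ∀ x, y, z ((⟨x, y, z⟩ ∈ branch ∧ y = "Brooklyn") ⇒ ...): an implication whose
    # premise is false holds vacuously, so only Brooklyn branches need checking.
    if all(has_account_at(c, x) for (x, y, _z) in branch if y == "Brooklyn")
}
print(result)  # {('Johnson',)}
```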
Safety of Expressions
The expression:
{⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)}
is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either in P or in a tuple of a relation mentioned in P).
2. For every “there exists” subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
3. For every “for all” subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).


Query-by-Example (QBE)

QBE — Basic Structure

A graphical query language which is based (roughly) on the domain relational calculus
Two-dimensional syntax – system creates templates of relations that are requested by users
Queries are expressed “by example”

QBE Skeleton Tables for the Bank Example


Queries on One Relation


Find all loan numbers at the Perryridge branch.


_x is a variable (optional; can be omitted in above query)


P. means print (display)
duplicates are removed by default
To retain duplicates use P.ALL


Display full details of all loans

Method 1:

loan | loan_number | branch_name | amount
     | P._x        | P._y        | P._z

Method 2: Shorthand notation

Find the loan number of all loans with a loan amount of more than $700

Find names of all branches that are not located in Brooklyn


Find the loan numbers of all loans made jointly to Smith and Jones.

Find all customers who live in the same city as Jones

Find the names of all customers who have a loan from the Perryridge branch.


Find the names of all customers who have both an account and a loan at the bank.


Negation in QBE

Find the names of all customers who have an account at the bank, but do not have a loan from the …

¬ means “there does not exist”

Find all customers who have at least two accounts.

¬ means “not equal to”


The Condition Box

Allows the expression of constraints on domain variables that are either inconvenient or impossible to express within the skeleton tables.

Complex conditions can be used in condition boxes
Example: Find the loan numbers of all loans made to Smith, to Jones, or to both jointly


QBE supports an interesting syntax for expressing alternative values

Find all account numbers with a balance greater than $1,300 and less than $1,500

Find all account numbers with a balance greater than $1,300 and less than $2,000 but not exactly …


Find all branches that have assets greater than those of at least one branch located in …

Find the customer_name, account_number, and balance for all customers who have an
account at the Perryridge branch.


Ordering the Display of Tuples

AO = ascending order; DO = descending order.
Example: list in ascending alphabetical order all customers who have an account at the bank

When sorting on multiple attributes, the sorting order is specified by including with each sort operator …
Example: List all account numbers at the Perryridge branch in ascending alphabetic order with …

Aggregate Operations

The aggregate operators are AVG, MAX, MIN, SUM, and CNT

The above operators must be postfixed with “ALL” (e.g., SUM.ALL. or AVG.ALL._x) to ensure that duplicates are not eliminated.
Example: Find the total balance of all the accounts maintained at the Perryridge branch.


UNQ is used to specify that we want to eliminate duplicates


Find the total number of customers having an account at the bank.


Query Examples

Find the average balance at each branch.

The “G” in “P.G” is analogous to SQL’s group by construct

The “ALL” in the “P.AVG.ALL” entry in the balance column ensures that all balances are considered

To find the average account balance at only those branches where the average account balance …

Find all customers who have an account at all branches located in Brooklyn


Modification of the Database – Deletion

Deletion of tuples from a relation is expressed by use of a D. command. In the case where …

Delete customer Smith

Delete the branch_city value of the branch whose name is “Perryridge”.


Delete all loans with a loan amount greater than $1300 and less than $1500.
For consistency, we have to delete information from loan and borrower tables

Delete all accounts at branches located in Brooklyn.


Modification of the Database – Insertion

Insertion is done by placing the I. operator in the query expression.
Insert the fact that account A-9732 at the Perryridge branch has a balance of $700.

Modification of the Database …

Provide as a gift for all loan customers of the Perryridge branch, a new $200 savings account for …


Modification of the Database – Update

Use the U. operator to change a value in a tuple without changing all values in the tuple. QBE do…

Update the asset value of the Perryridge branch to $10,000,000.

Increase all balances by 5 percent.


UNIT IV

SQL & Other Relational Languages: Structures, Set Operations, Aggregate Functions,
Null Values, Nested Sub – Queries, Derived Relations, Joined Relations, DDL, Other
SQL features.


Domain Types in SQL

char(n). Fixed length character string, with user-specified length n.
varchar(n). Variable length character strings, with user-specified maximum length n.
int. Integer (a finite subset of the integers that is machine-dependent).
smallint. Small integer (a machine-dependent subset of the integer domain type).
numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
float(n). Floating point number, with user-specified precision of at least n digits.

Create Table Construct


An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk))
r is the name of the relation
each Ai is an attribute name in the schema of relation r
Di is the data type of values in the domain of attribute Ai

Example:
create table branch
(branch_name char(15) not null,
branch_city char(30),
assets integer)

Integrity Constraints in Create Table


not null
primary key (A1, ..., An )
Example: Declare branch_name as the primary key for branch.
create table branch
(branch_name char(15),
branch_city char(30),
assets integer,
primary key (branch_name))
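A sketch of these constraints in action, using Python's built-in sqlite3 module. Two SQLite-specific caveats apply: SQLite accepts char(15) but does not enforce the length, and according to the SQLite documentation a non-integer primary key column does not reject nulls by itself, which is one reason to keep not null explicit. The inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table branch
       (branch_name char(15) not null,
        branch_city char(30),
        assets integer,
        primary key (branch_name))
""")

def try_insert(row):
    """Insert one branch tuple; report whether an integrity constraint rejected it."""
    try:
        conn.execute("insert into branch values (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return "rejected: " + str(exc)

print(try_insert(("Perryridge", "Horseneck", 1700000)))  # ok
print(try_insert(("Perryridge", "Brooklyn", 100)))        # rejected (duplicate primary key)
print(try_insert((None, "Brooklyn", 100)))                # rejected (not null)
```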

Drop and Alter Table Constructs


The drop table command deletes all information about the dropped relation from the
database.
The alter table command is used to add attributes to an existing relation:
alter table r add A D


where A is the name of the attribute to be added to relation r and D is the domain of A.
All tuples in the relation are assigned null as the value for the new attribute.
The alter table command can also be used to drop attributes of a relation:
alter table r drop A
where A is the name of an attribute of relation r.
Dropping of attributes is not supported by many databases.
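A sketch of alter table add and drop table in SQLite via Python's sqlite3 module; existing tuples receive null (None in Python) for the new column. As an instance of the limited support for dropping attributes, SQLite added alter table ... drop column only in version 3.35, so the sketch does not use it. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text primary key, branch_city text)")
conn.execute("insert into branch values ('Perryridge', 'Horseneck'), ('Downtown', 'Brooklyn')")

# alter table r add A D: existing tuples get null for the new attribute
conn.execute("alter table branch add assets integer")
rows = conn.execute("select branch_name, assets from branch order by branch_name").fetchall()
print(rows)  # [('Downtown', None), ('Perryridge', None)]

# drop table removes the relation and all of its tuples
conn.execute("drop table branch")
tables = conn.execute("select name from sqlite_master where type = 'table'").fetchall()
print(tables)  # []
```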

Basic Query Structure


SQL is based on set and relational operations with certain modifications and enhancements.
A typical SQL query has the form:

select A1, A2, ..., An
from r1, r2, ..., rm
where P

Ai represents an attribute
ri represents a relation
P is a predicate.
This query is equivalent to the relational algebra expression:

Π_{A1, A2, …, An}(σ_P(r1 × r2 × … × rm))

The result of an SQL query is a relation.

The select Clause


The select clause lists the attributes desired in the result of a query.
It corresponds to the projection operation of the relational algebra.
Example: find the names of all branches in the loan relation:
select branch_name
from loan
In the relational algebra, the query would be:
Π_{branch_name}(loan)
NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)
E.g. Branch_Name ≡ BRANCH_NAME ≡ branch_name
Some people use upper case wherever we use bold font
SQL allows duplicates in relations as well as in query results.
To force the elimination of duplicates, insert the keyword distinct after select.
Find the names of all branches in the loan relation, and remove duplicates
select distinct branch_name
from loan
The keyword all specifies that duplicates not be removed.
select all branch_name
from loan
An asterisk in the select clause denotes “all attributes”
select *
from loan
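The effect of distinct and all, and the case insensitivity of names, can be checked with a small sqlite3 sketch over hypothetical loan rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300), ("L-17", "Downtown", 1000)])

with_dups = sorted(conn.execute("select all branch_name from loan").fetchall())
no_dups = sorted(conn.execute("select distinct branch_name from loan").fetchall())
# SQL names are case insensitive: this is the same query as the previous one
same = sorted(conn.execute("SELECT DISTINCT Branch_Name FROM LOAN").fetchall())
print(with_dups)  # [('Downtown',), ('Perryridge',), ('Perryridge',)]
print(no_dups)    # [('Downtown',), ('Perryridge',)]
```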


The select clause can contain arithmetic expressions involving the operations +, –, *, and /, and operating on constants or attributes of tuples.
The query:
select loan_number, branch_name, amount * 100
from loan
would return a relation that is the same as the loan relation, except that the value of the attribute amount is multiplied by 100.

The where Clause


The where clause specifies conditions that the result must satisfy
Corresponds to the selection predicate of the relational algebra.
To find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200.
select loan_number
from loan
where branch_name = 'Perryridge' and amount > 1200
Comparison results can be combined using the logical connectives and, or, and not.
Comparisons can be applied to results of arithmetic expressions.

SQL includes a between comparison operator


Example: Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000)

select loan_number
from loan
where amount between 90000 and 100000
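A quick sqlite3 check, with hypothetical amounts chosen at the boundaries, that between includes both endpoints and matches the explicit >= / <= form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-1", "Downtown", 89999), ("L-2", "Downtown", 90000),
    ("L-3", "Redwood", 100000), ("L-4", "Redwood", 100001)])

between = conn.execute(
    "select loan_number from loan where amount between 90000 and 100000 "
    "order by loan_number").fetchall()
explicit = conn.execute(
    "select loan_number from loan where amount >= 90000 and amount <= 100000 "
    "order by loan_number").fetchall()
print(between)  # [('L-2',), ('L-3',)]
```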

The from Clause


The from clause lists the relations involved in the query
Corresponds to the Cartesian product operation of the relational algebra.
Find the Cartesian product borrower × loan
select *
from borrower, loan
Find the name, loan number and loan amount of all customers
having a loan at the Perryridge branch.
select customer_name, borrower.loan_number, amount
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge'


The Rename Operation


SQL allows renaming relations and attributes using the as clause:
old-name as new-name
Find the name, loan number and loan amount of all customers; rename the column
name loan_number as loan_id.
select customer_name, borrower.loan_number as loan_id, amount
from borrower, loan
where borrower.loan_number = loan.loan_number

Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Find the customer names and their loan numbers for all customers having a loan at
some branch.
select customer_name, T.loan_number, S.amount
from borrower as T, loan as S
where T.loan_number = S.loan_number

String Operations
SQL includes a string-matching operator for comparisons on character strings. The
operator “like” uses patterns that are described using two special characters:
percent (%). The % character matches any substring.
underscore (_). The _ character matches any character.
Find the names of all customers whose street includes the substring “Main”.
select customer_name
from customer
where customer_street like '%Main%'
Match the name “Main%”
like 'Main\%' escape '\'
SQL supports a variety of string operations such as
concatenation (using “||”)
converting from upper to lower case (and vice versa)
finding string length, extracting substrings, etc.
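A sqlite3 sketch of like, escape, and a few string functions. In SQLite, like is case-insensitive for ASCII letters by default, which differs from the SQL standard; the customer rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_name text, customer_street text, customer_city text)")
conn.executemany("insert into customer values (?, ?, ?)", [
    ("Jones", "Main", "Harrison"), ("Curry", "North Main", "Rye"),
    ("Lindsay", "Park", "Pittsfield"), ("Glenn", "Sand Hill", "Woodside")])

main_street = conn.execute(
    "select customer_name from customer where customer_street like '%Main%' "
    "order by customer_name").fetchall()
print(main_street)  # [('Curry',), ('Jones',)]

# escape makes '\%' match a literal percent sign
literal = conn.execute(r"select 'Main%' like 'Main\%' escape '\', "
                       r"'Mainland' like 'Main\%' escape '\'").fetchone()
print(literal)  # (1, 0)

# concatenation, case conversion and string length
combined = conn.execute("select upper('main') || '-' || length('Main')").fetchone()
print(combined)  # ('MAIN-4',)
```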

Ordering the Display of Tuples


List in alphabetic order the names of all customers having a loan in Perryridge branch
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge'
order by customer_name
We may specify desc for descending order or asc for ascending order, for each
attribute; ascending order is the default.
Example: order by customer_name desc


Set Operations
Find all customers who have a loan, an account, or both:

(select customer_name from depositor)


union
(select customer_name from borrower)

Find all customers who have both a loan and an account.


(select customer_name from depositor)
intersect
(select customer_name from borrower)

Find all customers who have an account but no loan.

(select customer_name from depositor)


except
(select customer_name from borrower)
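A sqlite3 sketch of the three set operations over hypothetical rows. SQLite does not accept parentheses around the operands of a compound select, so they are omitted here. union, intersect and except remove duplicates; union all keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower  (customer_name text, loan_number text);
    insert into depositor values ('Hayes', 'A-102'), ('Johnson', 'A-101'), ('Johnson', 'A-201');
    insert into borrower  values ('Hayes', 'L-15'), ('Adams', 'L-16');
""")

def names(op):
    sql = f"select customer_name from depositor {op} select customer_name from borrower"
    return sorted(conn.execute(sql).fetchall())

print(names("union"))           # [('Adams',), ('Hayes',), ('Johnson',)]
print(names("intersect"))       # [('Hayes',)]
print(names("except"))          # [('Johnson',)]
print(len(names("union all")))  # 5
```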

Aggregate Functions

These functions operate on the multiset of values of a column of a relation, and return
a value
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values

Find the average account balance at the Perryridge branch.


select avg (balance)
from account
where branch_name = 'Perryridge'

Find the number of tuples in the customer relation.


select count (*)
from customer
Find the number of depositors in the bank.
select count (distinct customer_name)
from depositor


Aggregate Functions – Group By


Find the number of depositors for each branch.
select branch_name, count (distinct customer_name)
from depositor, account
where depositor.account_number = account.account_number
group by branch_name
Note: Attributes in select clause outside of aggregate functions must
appear in group by list

Aggregate Functions – Having Clause


Find the names of all branches where the average account balance is more than
$1,200.
select branch_name, avg (balance)
from account
group by branch_name
having avg (balance) > 1200
Note: predicates in the having clause are applied after the
formation of groups whereas predicates in the where
clause are applied before forming groups
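A sqlite3 sketch of group by with having over hypothetical balances, contrasting a where predicate (applied to tuples before grouping) with a having predicate (applied to groups):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")
conn.executemany("insert into account values (?, ?, ?)", [
    ("A-1", "Brighton", 1500), ("A-2", "Brighton", 1100), ("A-3", "Downtown", 500),
    ("A-4", "Redwood", 1300), ("A-5", "Redwood", 900)])

# having filters groups after they are formed
having_only = conn.execute("""
    select branch_name, avg(balance) from account
    group by branch_name
    having avg(balance) > 1200""").fetchall()
print(having_only)  # [('Brighton', 1300.0)]

# where filters tuples before grouping, which changes the Redwood average
where_then_having = conn.execute("""
    select branch_name, avg(balance) from account
    where balance > 1000
    group by branch_name
    having avg(balance) > 1200
    order by branch_name""").fetchall()
print(where_then_having)  # [('Brighton', 1300.0), ('Redwood', 1300.0)]
```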

Null Values
It is possible for tuples to have a null value, denoted by null, for some of their
attributes
null signifies an unknown value or that a value does not exist.
The predicate is null can be used to check for null values.
Example: Find all loan numbers which appear in the loan relation with null values for amount.
select loan_number
from loan
where amount is null
The result of any arithmetic expression involving null is null
Example: 5 + null returns null
Any comparison with null returns unknown
Example: 5 < null or null <> null or null = null
Three-valued logic using the truth value unknown:
OR: (unknown or true) = true,
(unknown or false) = unknown
(unknown or unknown) = unknown
AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
NOT: (not unknown) = unknown
“P is unknown” evaluates to true if predicate P evaluates to unknown
Result of where clause predicate is treated as false if it evaluates to unknown


Null Values and Aggregates


Total all loan amounts
select sum (amount )
from loan
Above statement ignores null amounts
Result is null if there is no non-null amount
All aggregate operations except count(*) ignore tuples with null values on the
aggregated attributes.
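A sqlite3 sketch of these null rules over hypothetical loan rows; Python receives SQL null as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", None), ("L-17", "Downtown", 1000)])

# only count(*) counts the tuple whose amount is null
totals = conn.execute(
    "select sum(amount), avg(amount), count(amount), count(*) from loan").fetchone()
print(totals)  # (2500, 1250.0, 2, 3)

is_null = conn.execute("select loan_number from loan where amount is null").fetchall()
eq_null = conn.execute("select loan_number from loan where amount = null").fetchall()
print(is_null, eq_null)  # [('L-16',)] []: "= null" is unknown, so no tuple qualifies

print(conn.execute("select null = null, 5 + null").fetchone())  # (None, None)
no_values = conn.execute("select sum(amount) from loan where amount is null").fetchone()
print(no_values)  # (None,): sum over no non-null amounts is null
```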

Nested Subqueries
A subquery is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons,
and set cardinality
Find all customers who have both an account and a loan at the bank.
select distinct customer_name
from borrower
where customer_name in (select customer_name
from depositor )
Find all customers who have a loan at the bank but do not have
an account at the bank
select distinct customer_name
from borrower
where customer_name not in (select customer_name
from depositor )
Find all customers who have both an account and a loan at the Perryridge branch
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge' and
(branch_name, customer_name ) in
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number )
Set Comparison
Find all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and
S.branch_city = 'Brooklyn'
Find the names of all branches that have greater assets than all branches located in
Brooklyn.
select branch_name
from branch
where assets > all
(select assets
from branch
where branch_city = 'Brooklyn')
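The meaning of "> some" and "> all" can be mimicked with Python's any() and all(). The sketch below assumes made-up branch data and ignores nulls. The self-join query above corresponds to "> some". Note the empty-set case: "> all" over an empty subquery result is true for every branch, while "> some" is false.

```python
# Hypothetical branch data: name -> (branch_city, assets). Nulls are ignored here.
branch = {
    "Downtown": ("Brooklyn", 9_000_000),
    "Brighton": ("Brooklyn", 7_100_000),
    "Round Hill": ("Horseneck", 9_500_000),
    "Perryridge": ("Horseneck", 1_700_000),
}

def greater_than_some(value, others):
    """'> some': true if value exceeds at least one subquery value."""
    return any(value > o for o in others)

def greater_than_all(value, others):
    """'> all': true if value exceeds every subquery value (vacuously true if empty)."""
    return all(value > o for o in others)

brooklyn_assets = [a for city, a in branch.values() if city == "Brooklyn"]
some_result = sorted(n for n, (_, a) in branch.items()
                     if greater_than_some(a, brooklyn_assets))
all_result = sorted(n for n, (_, a) in branch.items()
                    if greater_than_all(a, brooklyn_assets))
print(some_result)  # ['Downtown', 'Round Hill']
print(all_result)   # ['Round Hill']
```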

Derived Relations
Find the average account balance of those branches where the average account
balance is greater than $1200.
select branch_name, avg_balance
from (select branch_name, avg (balance)
from account
group by branch_name )
as branch_avg ( branch_name, avg_balance )
where avg_balance > 1200
Note that we do not need to use the having clause, since we compute the
temporary (view) relation branch_avg in the from clause, and the attributes of
branch_avg can be used directly in the where clause.
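The derived-relation query can be run in SQLite with the same hypothetical account rows used for the having example. In this sketch the column name avg_balance is given inside the subquery (avg(balance) as avg_balance) rather than through the column list after as branch_avg, which keeps the query portable. The result matches the having version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance real)")
conn.executemany("insert into account values (?, ?, ?)",
                 [("A-1", "Downtown", 500), ("A-2", "Downtown", 2500),
                  ("A-3", "Perryridge", 400), ("A-4", "Perryridge", 900),
                  ("A-5", "Redwood", 2000)])

# The derived relation branch_avg is computed in the from clause; its avg_balance
# column is then an ordinary attribute that the outer where clause can test.
rows = conn.execute("""
    select branch_name, avg_balance
    from (select branch_name, avg(balance) as avg_balance
          from account
          group by branch_name) as branch_avg
    where avg_balance > 1200
    order by branch_name
""").fetchall()
print(rows)  # [('Downtown', 1500.0), ('Redwood', 2000.0)]
```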

Views
In some cases, it is not desirable for all users to see the entire logical model (that is,
all the actual relations stored in the database).
Consider a person who needs to know a customer’s name, loan number and branch
name, but has no need to see the loan amount. This person should see a relation
described, in SQL, by

(select customer_name, borrower.loan_number, branch_name
from borrower, loan
where borrower.loan_number = loan.loan_number )

A view provides a mechanism to hide certain data from the view of certain users.
Any relation that is not of the conceptual model but is made visible to a user as a
“virtual relation” is called a view.
A view is defined using the create view statement which has the form

create view v as < query expression >

where <query expression> is any legal SQL expression. The view name is
represented by v.
Once a view is defined, the view name can be used to refer to the virtual relation that
the view generates.

When a view is created, the query expression is stored in the database; the expression
is substituted into queries using the view.

A view consisting of branches and their customers


create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number )
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number )

Find all customers of the Perryridge branch


select customer_name
from all_customer
where branch_name = 'Perryridge'
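A runnable version of the all_customer view in SQLite. The rows are hypothetical. Hayes is both a depositor and a borrower at Perryridge, so the union removes the duplicate (Perryridge, Hayes) tuple. SQLite's compound-select grammar does not accept the parentheses around each union operand, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account (account_number text, branch_name text, balance real);
create table depositor (customer_name text, account_number text);
create table loan (loan_number text, branch_name text, amount real);
create table borrower (customer_name text, loan_number text);

-- Hypothetical sample rows.
insert into account values ('A-102', 'Perryridge', 400), ('A-101', 'Downtown', 500);
insert into depositor values ('Hayes', 'A-102'), ('Johnson', 'A-101');
insert into loan values ('L-15', 'Perryridge', 1500), ('L-16', 'Perryridge', 1300);
insert into borrower values ('Hayes', 'L-15'), ('Adams', 'L-16');

-- Union operands written without surrounding parentheses for SQLite.
create view all_customer as
    select branch_name, customer_name
    from depositor, account
    where depositor.account_number = account.account_number
    union
    select branch_name, customer_name
    from borrower, loan
    where borrower.loan_number = loan.loan_number;
""")

rows = conn.execute(
    "select customer_name from all_customer "
    "where branch_name = 'Perryridge' order by customer_name").fetchall()
print(rows)  # [('Adams',), ('Hayes',)]
```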

Views Defined Using Other Views


One view may be used in the expression defining another view
A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the
expression defining v1
A view relation v1 is said to depend on view relation v2 if either v1 depends directly
on v2 or there is a path of dependencies from v1 to v2
A view relation v is said to be recursive if it depends on itself.
View Expansion
A way to define the meaning of views defined in terms of other views.
Let view v1 be defined by an expression e1 that may itself contain uses of view
relations.
View expansion of an expression repeats the following replacement step:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate
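The expansion loop can be sketched in Python. The representation is an assumption made for illustration: an expression is a tuple (operator, operand, ...), and a string operand names either a base relation or a view. The view names acct_customer and loan_customer are hypothetical. Each pass replaces every view name by its definition, as in the repeat/until loop above. If the views are not recursive, a dependency chain can visit each view at most once, so len(views) + 1 passes are enough. Needing more passes signals a recursive definition.

```python
# Hypothetical view definitions in a simple tuple-based expression form.
views = {
    "all_customer": ("union", "acct_customer", "loan_customer"),
    "acct_customer": ("join", "depositor", "account"),
    "loan_customer": ("join", "borrower", "loan"),
}

def replace_views_once(expr, views):
    """One replacement step: substitute each view name by its defining expression."""
    if isinstance(expr, str):
        return (views[expr], True) if expr in views else (expr, False)
    op, *operands = expr
    new_operands, found = [], False
    for operand in operands:
        new_operand, replaced = replace_views_once(operand, views)
        new_operands.append(new_operand)
        found = found or replaced
    return (op, *new_operands), found

def expand(expr, views):
    """Repeat the replacement step until no view names remain."""
    for _ in range(len(views) + 1):
        expr, found = replace_views_once(expr, views)
        if not found:
            return expr
    raise ValueError("view definitions are recursive")

print(expand(("select", "all_customer"), views))
# ('select', ('union', ('join', 'depositor', 'account'), ('join', 'borrower', 'loan')))
```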

Modification of the Database – Deletion


Delete all account tuples at the Perryridge branch
delete from account
where branch_name = 'Perryridge'

Delete all accounts at every branch located in the city 'Needham'.
delete from account
where branch_name in (select branch_name
from branch
where branch_city = 'Needham')

Delete the record of all accounts with balances below the average at the bank.
delete from account
where balance < (select avg (balance )
from account )
Problem: as we delete tuples from account, the average balance changes
Solution used in SQL:
First, compute avg balance and find all tuples to delete
Next, delete all tuples found above (without recomputing avg or retesting the
tuples)
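The effect of computing the average once can be seen in SQLite. The balances below are hypothetical and chosen so that the two strategies disagree. The average is 400, so only A-1 lies below it. A naive tuple-at-a-time evaluation that recomputed the average after deleting A-1 would see 550 and also delete A-2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance real)")
conn.executemany("insert into account values (?, ?, ?)",
                 [("A-1", "Downtown", 100), ("A-2", "Perryridge", 500),
                  ("A-3", "Redwood", 600)])

# The average (400) is evaluated once, before any tuple is deleted.
conn.execute("delete from account where balance < (select avg(balance) from account)")
remaining = [r[0] for r in conn.execute("select balance from account order by balance")]
print(remaining)  # [500.0, 600.0]
```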

Modification of the Database – Insertion


Add a new tuple to account
insert into account values ('A-9732', 'Perryridge', 1200)
or equivalently
insert into account (branch_name, balance, account_number)
values ('Perryridge', 1200, 'A-9732')

Add a new tuple to account with balance set to null
insert into account values ('A-777', 'Perryridge', null)

Modification of the Database – Updates


Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.
Write two update statements:
update account
set balance = balance * 1.06
where balance > 10000

update account
set balance = balance * 1.05
where balance <= 10000

The order is important: if the 5% update ran first, an account just below $10,000 could be
raised above $10,000 and then also receive the 6% increase.

Same query as before: Increase all accounts with balances over $10,000 by 6%, all
other accounts receive 5%.
update account
set balance = case
when balance <= 10000 then balance * 1.05
else balance * 1.06
end
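Running the statements in SQLite on two hypothetical accounts shows why the order of the two updates matters and why the case form avoids the problem. A-1 starts at 9,600. If the 5% statement runs first, A-1 rises to 10,080, which is over the limit, so the 6% statement raises it again. The results are floating-point, so the comments give approximate values.

```python
import sqlite3

FIVE = "update account set balance = balance * 1.05 where balance <= 10000"
SIX = "update account set balance = balance * 1.06 where balance > 10000"
CASE = """update account set balance = case
              when balance <= 10000 then balance * 1.05
              else balance * 1.06
          end"""

def run(statements):
    """Apply the statements to fresh hypothetical data; return {account: balance}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table account (account_number text, balance real)")
    conn.executemany("insert into account values (?, ?)",
                     [("A-1", 9600), ("A-2", 20000)])
    for sql in statements:
        conn.execute(sql)
    return dict(conn.execute("select account_number, balance from account"))

right_order = run([SIX, FIVE])  # A-1 ≈ 10080, A-2 ≈ 21200
wrong_order = run([FIVE, SIX])  # A-1 raised twice: 9600 -> 10080 -> ≈ 10684.8
with_case = run([CASE])         # same as right_order
print(right_order, wrong_order, with_case)
```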

Update of a View
Create a view of all loan data in the loan relation, hiding the amount attribute
create view loan_branch as
select loan_number, branch_name
from loan
Add a new tuple to loan_branch
insert into loan_branch
values ('L-37', 'Perryridge')
This insertion must be represented by the insertion of the tuple
('L-37', 'Perryridge', null)
into the loan relation
Some updates through views are impossible to translate into updates on the database
relations
create view v as
select loan_number, branch_name, amount
from loan
where branch_name = 'Perryridge'
insert into v values ('L-99', 'Downtown', '23')
(even if this tuple were inserted into loan, it does not satisfy the view's where clause,
so it would not appear in v)

Others cannot be translated uniquely
insert into all_customer values ('Perryridge', 'John')
– Have to choose loan or account, and create a new loan/account number!
Most SQL implementations allow updates only on simple views (without aggregates)
defined on a single relation.
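SQLite is an example of a system that does not translate view updates automatically: its views are read-only. An instead of trigger can spell out the translation described above, inserting into loan with a null amount. The sketch below assumes a hypothetical trigger name, loan_branch_insert, and uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount real)")
conn.execute("create view loan_branch as select loan_number, branch_name from loan")

# SQLite views are read-only by default: a plain insert into the view fails.
try:
    conn.execute("insert into loan_branch values ('L-37', 'Perryridge')")
    direct_insert_ok = True
except sqlite3.Error:
    direct_insert_ok = False

# Hypothetical trigger that maps the view insert onto the base relation.
conn.execute("""
    create trigger loan_branch_insert instead of insert on loan_branch
    begin
        insert into loan values (new.loan_number, new.branch_name, null);
    end
""")
conn.execute("insert into loan_branch values ('L-37', 'Perryridge')")
rows = conn.execute("select * from loan").fetchall()
print(direct_insert_ok, rows)  # False [('L-37', 'Perryridge', None)]
```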
Joined Relations
Join operations take two relations and return as a result another relation.
These additional operations are typically used as subquery expressions in the from
clause
Join condition – defines which tuples in the two relations match, and what attributes
are present in the result of the join.
Join type – defines how tuples in each relation that do not match any tuple in the
other relation (based on the join condition) are treated.

Join types            Join conditions
inner join            natural
left outer join       on <predicate>
right outer join      using (A1, A2, ..., An)
full outer join

loan inner join borrower on
loan.loan_number = borrower.loan_number

loan left outer join borrower on
loan.loan_number = borrower.loan_number

loan natural inner join borrower

loan natural right outer join borrower

loan full outer join borrower using (loan_number)
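The inner, left outer and natural joins can be tried in SQLite. The loan and borrower rows below are hypothetical: L-260 has no borrower, and Hayes's L-155 has no matching loan tuple. Right and full outer joins are left out because SQLite only supports them from version 3.39.0 onward, which an older Python build may not bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table loan (loan_number text, branch_name text, amount real);
create table borrower (customer_name text, loan_number text);
-- Hypothetical rows: L-260 has no borrower; L-155 has no loan tuple.
insert into loan values ('L-170', 'Downtown', 3000), ('L-230', 'Redwood', 4000),
                        ('L-260', 'Perryridge', 1700);
insert into borrower values ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

def q(sql):
    return conn.execute(sql).fetchall()

# Only matching tuples survive the inner join.
inner = q("""select loan.loan_number, branch_name, amount, customer_name
             from loan inner join borrower on loan.loan_number = borrower.loan_number
             order by loan.loan_number""")
# Unmatched loan tuples are kept and padded with null on the borrower side.
left_outer = q("""select loan.loan_number, branch_name, amount, customer_name
                  from loan left outer join borrower
                       on loan.loan_number = borrower.loan_number
                  order by loan.loan_number""")
# natural join matches on the common attribute loan_number.
natural = q("""select loan.loan_number, branch_name, amount, customer_name
               from loan natural inner join borrower
               order by loan.loan_number""")
print(inner)
print(left_outer)  # adds ('L-260', 'Perryridge', 1700.0, None)
```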

Find all customers who have either an account or a loan (but not both) at the bank.
select customer_name
from (depositor natural full outer join borrower )
where account_number is null or loan_number is null

Database Schema

branch (branch_name, branch_city, assets)


customer (customer_name, customer_street, customer_city)
loan (loan_number, branch_name, amount)
borrower (customer_name, loan_number)
account (account_number, branch_name, balance)
depositor (customer_name, account_number)

UNIT V
Relational Database Design: Pitfalls in Relational Database Design, Functional
Dependencies, Decomposition, Desirable Properties of Decomposition, Normalization
(1NF to DKNF), BCNF & Its Comparison with 3NF.

Normalization
This process of reducing a relation into simpler structures is the process of
Normalisation.
Normalisation may be defined as a step by step reversible process of transforming an
unnormalised relation into relations with progressively simpler structures. Since the
process is reversible, no information is lost in the transformation.
Normalisation removes (or more accurately, minimises) the undesirable properties by
working through a series of stages called Normal Forms. Originally, Codd defined three
types of undesirable properties:

• Data aggregates
• Partial key dependency
• Indirect key dependency

Stages of Normalisation
Unnormalised form (UNF)
   ↓ remove repeating groups
First normal form (1NF)
   ↓ remove partial dependencies
Second normal form (2NF)
   ↓ remove transitive dependencies
Third normal form (3NF)
   ↓ remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
   ↓ remove multivalued dependencies
Fourth normal form (4NF)
   ↓ remove remaining anomalies
Fifth normal form (5NF)

Goals of Normalisation
• Eliminate certain kinds of redundancy
• avoid certain update anomalies
• good representation of real world
• simplify enforcement of DB integrity

Bad Design
Alternatively, why don't we re-structure our relation such that we do not restrict the
number of transactions per customer. We can do this with the following structure:

This way, a customer can have any number of part transactions without worrying
about an upper limit or about space wasted on null values (as with the previous
structure).

Constructing a query to "Find which customer(s) bought P# 2" is not as cumbersome as
before, as one can now simply state: P# = 2.

But again, this structure is not without its faults:

• It seems a waste of storage to keep repeated values of Cname, Ccity and Cphone.
• If C# 1 were to change his telephone number, we would have to ensure that we
update ALL occurrences of C# 1's Cphone values. This means updating tuple 1,
tuple 2 and all other tuples where there is an occurrence of C# 1. Otherwise, our
database would be left in an inconsistent state.
• Suppose we now have a new customer with C# 4. However, there is no part
transaction yet with the customer as he has not ordered anything yet. We may find
that we cannot insert this new information because we do not have a P# which
serves as part of the 'primary key' of a tuple. (A primary key cannot have null
values).

Suppose the third transaction has been canceled, i.e. we no longer need information about
25 of P# 1 being ordered on 26 Jan. We thus delete the third tuple. We are then left with
the following relation:

But then, suppose we need information about the customer "Martin", say the city he is
located in. Unfortunately, information about Martin was held only in that tuple, so
deleting the entire tuple because of its P# transaction also deleted all information
about Martin from the relation.

As illustrated in the above instances, badly designed, unnormalised relations
waste storage space. Worse, they give rise to the following storage irregularities:

• Update anomaly: Data inconsistency or loss of data integrity can arise from data
redundancy/repetition and partial update.
• Insertion anomaly: Data cannot be added because some other data is absent.
• Deletion anomaly: Data may be unintentionally lost through the deletion of other
data.

The Need for Normalisation


Intuitively, it would seem that these undesirable features can be removed by breaking a
relation into other relations with desirable structures. We shall attempt this by splitting the
above Transaction relation into the following two relations, Customer and Transaction,
which can be viewed as entities with a one-to-many relationship.

Figure 4-2: 1:M data relationships

Let us see if this new design will alleviate the above storage anomalies:

Update anomaly

If C# 1 were to change his telephone number, as there is only one occurrence of the tuple
in the Customer relation, we need to update only that one tuple as there are no
redundant/duplicate tuples.

Addition anomaly

Adding a new customer with C# 4 can be easily done in the Customer relation of which
C# serves as the primary key. With no P# yet, a tuple in Transaction need not be created.

Deletion anomaly

Canceling the third transaction about 25 of P# 1 being ordered on 26 Jan would now
mean deleting only the third tuple of the new Transaction relation above. This leaves
information about Martin still intact in the new Customer relation.

We shall now show a more formal process on how we can decompose relations into
multiple relations by using the Normal Form rules for structuring.

First Normal Form (1NF)


The purpose of the First Normal Form (1NF) is to simplify the structure of a relation by
ensuring that it does not contain data aggregates or repeating groups. By this we mean
that no attribute value can have a set of values. In the example below, any one customer
has a group of several telephone entries:

Figure 4-3: Presence of repeating groups

This is thus not in 1NF. It must be "flattened". This can be achieved by ensuring that
every tuple defines a single entity by containing only atomic values. One can either re-
organise into one relation as in:

Figure 4-4a: Atomic values in tuples

or split into multiple relations as in:

Figure 4-4b: Reduction to 1NF

Note that earlier we defined 1NF as one of the characteristics of a relation. Thus we
consider every relation to be at least in first normal form (so the table in Figure 4-3 is not
even a relation). The Transaction relation of Figure 4-2 is, however, a 1NF relation.

We may thus generalise by saying that "A relation is in the 1NF if the values in the
relation are atomic for every single attribute of the relation".
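Flattening a repeating group into one tuple per value, as in Figure 4-4a, can be sketched in Python. The customer data and the helper name flatten are hypothetical. One caveat of this single-relation approach: a customer whose phone list is empty produces no tuple at all. Splitting into separate relations, as in Figure 4-4b, keeps such a customer.

```python
def flatten(rows, group_attr, atomic_attr):
    """Turn a repeating group (a list-valued attribute) into one tuple per value."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != group_attr}
        for value in row[group_attr]:
            flat.append({**base, atomic_attr: value})
    return flat

# Hypothetical unnormalised customer data with a repeating group of phones.
customers = [
    {"C#": 1, "Cname": "Codd", "Cphones": ["2263035", "3381907"]},
    {"C#": 2, "Cname": "Martin", "Cphones": ["2234391"]},
]
flat = flatten(customers, "Cphones", "Cphone")
for t in flat:
    print(t)
# first tuple printed: {'C#': 1, 'Cname': 'Codd', 'Cphone': '2263035'}
```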

Before we can look into the next two normal forms, 2NF and 3NF, we need to first
explain the notion of "functional dependency" as these two forms are constrained by
functional dependencies.

Functional Dependencies
Determinant
The value of an attribute can uniquely determine the value in another attribute.

C# uniquely determines Cname

C# also uniquely determines Ccity as well as Cphone

C# → Cname
C# → Ccity
C# → Cphone
(C#, P#, Date) → Qnt
The value of the attribute on the left-hand side of the arrow is the determinant because its
value uniquely determines the value of the attribute on the right.

Note also that:

(Ccity, Cphone) → Cname
(Ccity, Cphone) → C#

Figure 4-5: Functional dependencies in the Transaction relation

Similarly, saying "(C#, P#, Date) is a determinant of Qnt" is the same as saying "Qnt is
functionally dependent on the set of attributes (C#, P#, Date)". Such a set of attributes is
also known as a composite attribute.

Figure 4-6: Functional dependency on a composite attribute

Full Functional Dependence


"If an attribute (or a set of attributes) A is a determinant of an attribute (or a set of
attributes) B, then B is said to be fully functionally dependent on A"

and likewise

"Given a relation R, attribute B of R is fully functionally dependent on attribute A of R if
it is functionally dependent on A and not functionally dependent on any subset of A (A
must be composite)".

Figure 4-7: Functional dependencies in the Transaction relation

For the Transaction relation, we may now say that:

Cname is fully functionally dependent on C#
Ccity is fully functionally dependent on C#
Cphone is fully functionally dependent on C#
Qnt is fully functionally dependent on (C#, P#, Date)
Cname is not fully functionally dependent on (C#, P#, Date), it is only
partially dependent on it (and similarly for Ccity and Cphone).
Having understood about determinants and functional dependencies, we are now in a
position to explain the rules of the second and third normal forms.

Second Normal Form (2NF)


Consider again the Transaction relation which was in 1NF.

1. Update
What happens if Codd's telephone number changes and we update only the first tuple (but
not the second)?

2. Insertion
If we wish to introduce a new customer, we cannot do so unless an appropriate
transaction is effected.

3. Deletion
If the data about a transaction is deleted, the information about the customer is also
deleted. If this happens to the last transaction for that customer the information about the
customer will be lost.

Clearly, the Transaction relation, although normalised to 1NF, still has storage
anomalies. These violations of the database's integrity and consistency rules arise
because of the partial dependency on the primary key.

The determinant (C#, P#, Date) is the composite key of the Transaction relation: its
value will uniquely determine the value of every other non-key attribute in a tuple of the
relation. Note that whilst Qnt is fully functionally dependent on all of (C#, P#, Date),
Cname, Ccity and Cphone are only partially functionally dependent on the composite key
(as each of them depends only on the C# part of the key, not on P# or Date).

The problems are avoided by eliminating partial key dependence in favour of full
functional dependence, and we can do so by separating the dependencies as follows:

The source relation is thus split into two (or more) relations whereby each resultant
relation no longer has any partial key dependencies:

Figure 4-8: Relations in 2NF

We now have two relations, both of which are in the second normal form.

Update anomaly
There are no redundant/duplicate tuples in the relation, thus updates are done just at one
place without any worry for database inconsistencies.

Addition anomaly
Adding a new customer can be done in the Customer relation without concern whether
there is a transaction for a part or not.

Deletion anomaly
Deleting a tuple in Transaction does not cause loss of information about Customer details.

More generally, we shall summarise by stating the following:

Suppose, there is a relation R

where the composite attribute (K1, K2) is the Primary Key. Suppose also that there exist
the following functional dependencies:
(K1, K2) → I1
i.e. a full functional dependency on the composite key (K1, K2).

K2 → I2
i.e. a partial functional dependency on the composite key (K1, K2).

The partial dependencies on the primary key must be eliminated. The reduction of 1NF
into 2NF consists of replacing the 1NF relation by appropriate "projections" such that
every non-key attribute in each relation is fully functionally dependent on the primary
key of that relation. The steps are:

If a relation has the same determinant as another relation, move the dependent attributes
of the first relation into the other relation, where that determinant is the key, as
non-key attributes.

Figure 4-9: Reduction of 1NF into 2NF

Thus, "A relation R is in 2NF if it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key".
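Partial dependencies can be found mechanically: for every proper subset of the key, compute which non-key attributes it determines. The sketch below uses the attribute-closure algorithm (given under "Closure of Attribute Sets") and the Transaction dependencies listed in the notes. The function names are hypothetical. A non-empty result means the relation is not in 2NF.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & set(non_key)
            if determined:
                found[part] = determined
    return found

# Dependencies of the Transaction relation as listed in the notes.
fds = [({"C#"}, {"Cname", "Ccity", "Cphone"}),
       ({"C#", "P#", "Date"}, {"Qnt"})]
key = {"C#", "P#", "Date"}
non_key = {"Cname", "Ccity", "Cphone", "Qnt"}
partial = partial_dependencies(key, non_key, fds)
print(sorted(set().union(*partial.values())))  # ['Ccity', 'Cname', 'Cphone']
```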

Closure of Attribute Sets

Given a set of attributes α, define the closure of α under F (denoted by α+) as the set
of attributes that are functionally determined by α under F

Algorithm to compute α+, the closure of α under F

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
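The closure algorithm translates almost line for line into Python. The sketch below also adds a brute-force candidate-key search, which is an illustration rather than an efficient method. It is applied to the F of the exercise that follows, with each attribute written as a single letter.

```python
from itertools import combinations

def closure(attrs, fds):
    """Compute attrs+ by the algorithm above: add γ whenever β ⊆ result."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for beta, gamma in fds:         # "for each β → γ in F"
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)    # "result := result ∪ γ"
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(k <= set(combo) for k in keys):
                continue                # a superset of a key is not minimal
            if closure(combo, fds) >= set(schema):
                keys.append(set(combo))
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(sorted(closure("B", F)))                                         # ['B', 'D']
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCDE", F)))  # ['A', 'BC', 'CD', 'E']
```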

Q Compute the closure of the following set F of functional dependencies for relation
schema R = (A,B,C,D,E)
A →BC
CD →E
B →D
E →A
List the candidate keys for R.
(b) Using the functional dependencies above, compute B+.

1) First we check about A

Result ={A}+

A →BC Result = {A,B,C}


CD →E Result = {A,B,C}
B →D Result = {A,B,C,D}
E →A Result = {A,B,C,D}

Repeat again

A →BC Result = {A,B,C,D}


CD →E Result = {A,B,C,D,E}

Therefore A is a candidate key

2) second we check about CD

Result ={CD}+

A →BC Result = {C,D }


CD →E Result = { C,D,E}

B →D Result = { C,D,E }
E →A Result = { A,C,D,E }

Repeat again

A →BC Result = {A,B,C,D,E}

Therefore CD is a superkey; since neither C+ = {C} nor D+ = {D} contains all of R, CD is a candidate key

3) Third we check about B

Result ={B}+

A →BC Result = { B }
CD →E Result = { B }
B →D Result = { B,D}
E →A Result = { B,D}

Repeat again

A →BC Result = {B,D}


CD →E Result = {B,D}
B →D Result = { B,D}
E →A Result = { B,D}

Therefore B is not a candidate key

4) We check about E
Result ={E}+

A →BC Result = { E }
CD →E Result = { E }
B →D Result = { E}
E →A Result = { E,A}

Repeat again

A →BC Result = { E,A,B,C}


CD →E Result = { E,A,B,C}
B →D Result = { E,A,B,C,D}

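Therefore E is a candidate key

In the same way, {B,C}+ = {A,B,C,D,E} (B →D adds D, CD →E adds E, E →A adds A), while
neither B+ = {B,D} nor C+ = {C} contains all of R, so BC is also a candidate key.
The candidate keys of R are therefore A, BC, CD and E.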

(b) Using the functional dependencies above, compute B+.

Result ={B}+

A →BC Result = { B }
CD →E Result = { B }
B →D Result = { B,D}
E →A Result = { B,D}

Repeat again

A →BC Result = {B,D}


CD →E Result = {B,D}
B →D Result = { B,D}
E →A Result = { B,D}

Therefore B+ = {B,D} (so B is not a candidate key)

Q Suppose that we decompose the scheme R=(A,B,C,D,E) into


(A,B,C)
(A,D,E)
Show that this decomposition is a lossless-join decomposition if the following set
F of functional dependencies holds:
A →BC
CD →E
B →D
E →A
A decomposition {R1, R2} is a lossless-join decomposition
if R1 ∩ R2 → R1
or R1 ∩ R2 → R2.
Let R1 = (A, B, C),
R2 = (A, D, E),
and R1 ∩ R2 = A.

We have to find out whether A → R1 or A → R2 holds; it is enough to check whether A
is a candidate key of R.
If it is, then the decomposition is lossless.
Result ={A}+

A →BC Result = {A,B,C}


CD →E Result = {A,B,C}
B →D Result = {A,B,C,D}
E →A Result = {A,B,C,D}

Repeat again

A →BC Result = {A,B,C,D}

CD →E Result = {A,B,C,D,E}

Therefore A is a candidate key and decomposition is lossless
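The lossless-join test for a decomposition into two schemas reduces to one closure computation: compute (R1 ∩ R2)+ and check whether it covers R1 or R2. The sketch below applies it to the decomposition above. A second, hypothetical split is included for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 → R1 or R1 ∩ R2 → R2 must hold."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless("ABC", "ADE", F))   # True: A+ = ABCDE
print(is_lossless("ABC", "CDE", F))   # False: C+ = C (hypothetical contrast)
```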


Q List all functional dependencies satisfied by the relation of the figure below.

A    B    C
a1   b1   c1
a1   b1   c2
a2   b1   c1
a2   b1   c2
Ans
A->B is satisfied since t1(a1,b1) and t2(a1,b1) agree, as do t3(a2,b1) and t4(a2,b1)
A->C is not satisfied since t1(a1,c1) and t2(a1,c2)
B->A is not satisfied since t1(b1,a1) and t3(b1,a2)
B->C is not satisfied since t1(b1,c1) and t2(b1,c2)
C->A is not satisfied since t1(c1,a1) and t3(c1,a2)
C->B is satisfied since t1(c1,b1) and t3(c1,b1) agree, as do t2(c2,b1) and t4(c2,b1)
AB->C is not satisfied since t1(a1,b1,c1) and t2(a1,b1,c2)
BC->A is not satisfied since t1(b1,c1,a1) and t3(b1,c1,a2)
AC->B is satisfied: it follows from A->B (and no two tuples agree on both A and C)
Trivial functional dependencies, such as AB->A or BC->C, are satisfied by every relation.
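Checking a dependency against a relation instance is mechanical. If two tuples agree on the left side but differ on the right side, the dependency is violated. The sketch below enumerates every non-trivial dependency with a single attribute on the right for the four tuples above. Keep in mind that an instance can only show that a dependency is satisfied by this data, not that it holds on the schema in general.

```python
from itertools import combinations

header = ("A", "B", "C")
rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
        ("a2", "b1", "c1"), ("a2", "b1", "c2")]

def fd_holds(rows, header, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    pos = {name: i for i, name in enumerate(header)}
    seen = {}
    for row in rows:
        left = tuple(row[pos[a]] for a in lhs)
        right = tuple(row[pos[a]] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

satisfied = []
for size in (1, 2):
    for lhs in combinations(header, size):
        for a in header:
            if a not in lhs and fd_holds(rows, header, lhs, (a,)):  # skip trivial FDs
                satisfied.append("".join(lhs) + "->" + a)
print(satisfied)  # ['A->B', 'C->B', 'AC->B']
```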

Boyce-Codd Normal Form (BCNF)


A relation schema R is in BCNF with respect to a set F of functional dependencies if for
all functional dependencies in F+ of the form

α → β

where α ⊆ R and β ⊆ R, at least one of the following holds:

a) α → β is trivial (i.e., β ⊆ α)
b) α is a superkey for R
Example schema not in BCNF:

bor_loan = ( customer_id, loan_number, amount )

because loan_number → amount holds on bor_loan but loan_number is not a superkey

Example BCNF
• R = (A, B, C)
• F = (A==> B,
B==> C)
• R is not in BCNF (B==> C holds, but B is not a superkey)
• Decomposition R1 = (A, B), R2 = (B, C)
– R1 and R2 are in BCNF
– Lossless-join decomposition
– Dependency preserving
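A BCNF test can be sketched using attribute closure: a non-trivial dependency α → β violates BCNF when α+ does not contain the whole schema. This sketch relies on the standard result that, when testing the original schema R, checking only the dependencies in F (not all of F+) is sufficient. That shortcut is not valid for the schemas of a decomposition. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Dependencies in F that are non-trivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                                  # trivial
        if not set(schema) <= closure(lhs, fds):
            violations.append((lhs, rhs))             # lhs is not a superkey
    return violations

print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C')]
bor_loan = ["customer_id", "loan_number", "amount"]
print(bcnf_violations(bor_loan, [(["loan_number"], ["amount"])]))
# [(['loan_number'], ['amount'])]
```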
BCNF and Dependency Preservation
Constraints, including functional dependencies, are costly to check in practice unless
they pertain to only one relation
If it is sufficient to test only those dependencies on each individual relation of a
decomposition in order to ensure that all functional dependencies hold, then that
decomposition is dependency preserving.
Because it is not always possible to achieve both BCNF and dependency
preservation, we consider a weaker normal form, known as third normal form.

Third Normal Form


A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions
above must hold).
The third condition is a minimal relaxation of BCNF to ensure dependency preservation.

Example
• R = (J, K, L)
• F = (JK==> L,
L==> K)
• Two candidate keys: JK and JL
• R is in 3NF
– JK==>L JK is a superkey
– L==>K K is contained in a candidate key
• BCNF decomposition has R1 = (J, L), R2 = (L, K)
– testing for JK==>L requires a join
• There is some redundancy in this schema
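A 3NF test needs one ingredient more than the BCNF test: the set of attributes that appear in some candidate key. The sketch below finds candidate keys by brute force, then checks each dependency in F against the three conditions. It uses the common shortcut of checking only F rather than all of F+. Applied to the example above, it confirms that R = (J, K, L) is in 3NF. A second schema is included for contrast.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if not any(k <= set(combo) for k in keys) and closure(combo, fds) >= set(schema):
                keys.append(set(combo))
    return keys

def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra:
            continue                          # trivial
        if set(schema) <= closure(lhs, fds):
            continue                          # lhs is a superkey
        if extra <= prime:
            continue                          # each attribute lies in some candidate key
        return False
    return True

F = [("JK", "L"), ("L", "K")]
print(sorted("".join(sorted(k)) for k in candidate_keys("JKL", F)))  # ['JK', 'JL']
print(is_3nf("JKL", F))                          # True
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))   # False
```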

Comparison of BCNF and 3NF

• It is always possible to decompose a relation into relations in 3NF such that:


– the decomposition is lossless
– the dependencies are preserved

• It is always possible to decompose a relation into relations in BCNF such that:


– the decomposition is lossless
– but it may not be possible to preserve dependencies

Example of problems due to redundancy in 3NF


– R = (J, K, L), F = (JK==> L, L==> K)

J     L    K
j1    l1   k1
j2    l1   k1
j3    l1   k1
null  l2   k2
A schema that is in 3NF but not BCNF has the problems of:
– repetition of information (e.g., the relationship between l1 and k1)
– need to use null values (e.g., to represent the relationship between l2 and
k2 when there is no corresponding value for attribute J)

Closure of a Set of Functional Dependencies


Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F.
For example: If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β, and β → γ, then α → γ (transitivity)
We can further simplify manual computation of F+ by using the following additional
rules.
If α → β holds and α → γ holds, then α → βγ holds (union)
If α → βγ holds, then α → β holds and α → γ holds (decomposition)
If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
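For small schemas, F+ can be listed by computer. The sketch below swaps in attribute closure instead of applying Armstrong's axioms step by step. It relies on the fact that α → β is in F+ exactly when β ⊆ α+. The resulting set also contains trivial and degenerate dependencies (an empty left or right side), which is why it is large even for R = (A, B, C).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """All subsets of attrs (including the empty set) as frozensets."""
    attrs = sorted(attrs)
    for size in range(len(attrs) + 1):
        yield from (frozenset(c) for c in combinations(attrs, size))

def fd_closure(schema, fds):
    """All pairs (α, β) with β ⊆ α+, including trivial ones and the empty set."""
    return {(alpha, beta)
            for alpha in subsets(schema)
            for beta in subsets(closure(alpha, fds))}

F = [("A", "B"), ("B", "C")]
F_plus = fd_closure("ABC", F)
print(len(F_plus))                                  # 43
print((frozenset("A"), frozenset("C")) in F_plus)   # True: transitivity
print((frozenset("C"), frozenset("A")) in F_plus)   # False
```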

Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency α → β in F.
Attribute A is extraneous in α if A ∈ α
and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
Attribute A is extraneous in β if A ∈ β
and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a
“stronger” functional dependency always implies a weaker one
Example: Given F = {A → C, AB → C}
B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the
result of dropping B from AB → C).
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F.
To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is extraneous in α
To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
F' = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous in β
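Both tests are single closure computations, so they are easy to sketch in Python. The function names are hypothetical. The examples are the two given above: B is extraneous on the left of AB → C, and C is extraneous on the right of AB → CD.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """A ∈ α is extraneous if (α − A)+, computed with all of F, contains β."""
    lhs, rhs = set(fd[0]), set(fd[1])
    return rhs <= closure(lhs - {attr}, fds)

def extraneous_in_rhs(attr, fd, fds):
    """A ∈ β is extraneous if α+ under F' = (F − {α → β}) ∪ {α → (β − A)} contains A."""
    lhs, rhs = set(fd[0]), set(fd[1])
    f_prime = [f for f in fds if f != fd] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, f_prime)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))   # True
print(extraneous_in_lhs("A", ("AB", "C"), F1))   # False
F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", ("AB", "CD"), F2))  # True
print(extraneous_in_rhs("D", ("AB", "CD"), F2))  # False
```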

Canonical Cover
A canonical cover for F is a set of dependencies Fc such that F logically implies all
dependencies in Fc, and Fc logically implies all dependencies in F, and
No functional dependency in Fc contains an extraneous attribute, and
Each left side of functional dependency in Fc is unique.
To compute a canonical cover for F:
repeat
    Use the union rule to replace any dependencies in F
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
Note: Union rule may become applicable after some extraneous attributes have been
deleted, so it has to be re-applied
Computing a Canonical Cover
R = (A, B, C)
F = {A → BC
B → C
A → B
AB → C}
Combine A → BC and A → B into A → BC
Set is now {A → BC, B → C, AB → C}
A is extraneous in AB → C
Check if the result of deleting A from AB → C is implied by the other dependencies
Yes: in fact, B → C is already present!
Set is now {A → BC, B → C}
C is extraneous in A → BC
Check if A → C is logically implied by A → B and the other dependencies
Yes: using transitivity on A → B and B → C.
Can use attribute closure of A in more complex cases
The canonical cover is: A → B
B → C
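The repeat/until loop can be sketched in Python by combining the union rule with the two extraneous-attribute tests. The helper names are hypothetical. This sketch removes one extraneous attribute per pass and happens to find C in A → BC before A in AB → C, the reverse of the worked example, but it reaches the same cover here. In general a canonical cover need not be unique, and the result can depend on the order in which attributes are tested.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def union_rule(fc):
    """Merge dependencies with the same left side; drop empty right sides."""
    merged = {}
    for lhs, rhs in fc:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return [(lhs, rhs) for lhs, rhs in merged.items() if rhs]

def drop_one_extraneous(fc):
    """Return fc with one extraneous attribute deleted, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, fc):           # a is extraneous in α
                return fc[:i] + [(lhs - {a}, rhs)] + fc[i + 1:]
        for a in sorted(rhs):
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, reduced):              # a is extraneous in β
                return reduced
    return None

def canonical_cover(fds):
    fc = union_rule([(frozenset(l), frozenset(r)) for l, r in fds])
    while True:                                         # "repeat ... until F does not change"
        reduced = drop_one_extraneous(fc)
        if reduced is None:
            return fc
        fc = union_rule(reduced)

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```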
Fourth Normal Form

• An entity type is in 4NF if it is in 3NF and there are no multivalued
dependencies between its attribute types
• Any entity type in 3NF is transformed to 4NF as follows:
– Detect any multivalued dependencies
– Decompose entity type

Example (anomaly: the same set of SUBJECT values is associated with each AUTHOR of a
book, because BOOK_NO multidetermines AUTHOR_NO and BOOK_NO multidetermines SUBJECT):

AUTHOR_NO   BOOK_NO   SUBJECT
A1          B1        Comp. Sc.
A1          B1        Maths
A2          B1        Comp. Sc.
A2          B1        Maths
A3          B2        Maths

IN 4th NORMAL FORM:

AUTHOR_NO   BOOK_NO
A1          B1
A2          B1
A3          B2

BOOK_NO   SUBJECT
B1        Comp. Sc.
B1        Maths
B2        Maths

AUTHOR (Author_no, Author_name)
BOOK (Book_no, Book_Title)
AUTHOR-BOOK (Author_no, Book_no)
BOOK-SUBJECT (Book_no, Subject)
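A quick way to confirm that the 4NF split loses nothing is to project the original table onto (AUTHOR_NO, BOOK_NO) and (BOOK_NO, SUBJECT) and join the two projections back on BOOK_NO. This relies on a standard characterization: BOOK_NO →→ AUTHOR_NO holds exactly when the rejoined relation equals the original. The sketch uses the five tuples from the table above. A hypothetical extra tuple is added afterwards to show a case where the dependency fails.

```python
original = {("A1", "B1", "Comp. Sc."), ("A1", "B1", "Maths"),
            ("A2", "B1", "Comp. Sc."), ("A2", "B1", "Maths"),
            ("A3", "B2", "Maths")}

def decompose_and_rejoin(rows):
    """Project onto (AUTHOR_NO, BOOK_NO) and (BOOK_NO, SUBJECT), then natural-join back."""
    author_book = {(a, b) for a, b, _ in rows}
    book_subject = {(b, s) for _, b, s in rows}
    rejoined = {(a, b, s) for a, b in author_book
                for b2, s in book_subject if b == b2}
    return author_book, book_subject, rejoined

author_book, book_subject, rejoined = decompose_and_rejoin(original)
print(sorted(author_book))   # [('A1', 'B1'), ('A2', 'B1'), ('A3', 'B2')]
print(sorted(book_subject))  # [('B1', 'Comp. Sc.'), ('B1', 'Maths'), ('B2', 'Maths')]
print(rejoined == original)  # True: BOOK_NO ->-> AUTHOR_NO holds, so the split is lossless

# A hypothetical extra tuple breaks the multivalued dependency: the join now
# produces tuples that were never in the relation.
broken = original | {("A4", "B2", "Physics")}
print(decompose_and_rejoin(broken)[2] == broken)  # False
```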

UNIT VI
Integrity Constraints: Domain Constraints, Referential Integrity, Assertions, Triggers &
Functional Dependencies.

Integrity Constraints
Integrity constraints guard against accidental damage to the database, by ensuring that
authorized changes to the database do not result in a loss of data consistency.
• checking account must have a balance greater than $10,000.00
• salary of a bank employee must be at least $4.00 an hour
• customer must have a (non-null) phone number

Constraints on a Single Relation

• not null
• primary key
• unique
• check (P), where P is a predicate

Not Null Constraint

• Declare branch_name for branch is not null
branch_name char(15) not null

• Declare the domain Dollars to be not null
create domain Dollars numeric(12,2) not null

The Unique Constraint

• unique (A1, A2, …, Am)
• The unique specification states that the attributes
A1, A2, …, Am
form a candidate key.
• Candidate keys are permitted to be null (in contrast to primary keys).

The check clause


check (P), where P is a predicate

Example: Declare branch_name as the primary key for branch and ensure that the values of assets are non-negative.
create table branch
(branch_name char(15),
branch_city char(30),
assets integer,
primary key (branch_name),
check (assets >= 0))
• Use a check clause to ensure that an hourly_wage domain allows only values of at least a specified value.
create domain hourly_wage numeric(5,2)
constraint value_test check (value >= 4.00)
• The domain has a constraint that ensures that the hourly_wage is at least 4.00
• The clause constraint value_test is optional; it is useful to indicate which constraint an update violated
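The constraints above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: SQLite has no create domain, so the hourly_wage rule is rewritten as a named column check on a hypothetical employee table, and not null is spelled out on branch_name because SQLite (as a legacy quirk) lets a non-integer primary key hold NULL otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mirrors the branch example above; "not null" is explicit because of SQLite's PK quirk.
conn.execute("""
    create table branch (
        branch_name char(15) not null,
        branch_city char(30),
        assets      integer,
        primary key (branch_name),
        check (assets >= 0))
""")

# Hypothetical table: SQLite lacks "create domain", so the rule becomes a named column check.
conn.execute("""
    create table employee (
        employee_id integer primary key,
        hourly_wage numeric(5,2) constraint value_test check (hourly_wage >= 4.00))
""")

def try_insert(sql, params):
    """Run one insert; return None if accepted, else the constraint error message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

print(try_insert("insert into branch values (?, ?, ?)", ("Perryridge", "Horseneck", 1700000)))  # None
print(try_insert("insert into branch values (?, ?, ?)", ("Downtown", "Brooklyn", -5)))          # check violated
print(try_insert("insert into branch values (?, ?, ?)", (None, "Brooklyn", 10)))                # not null violated
print(try_insert("insert into employee values (?, ?)", (1, 3.50)))                              # value_test violated
```

The exact wording of the error messages depends on the SQLite version, which is why the comments only name the constraint that is violated.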


Assertions
An assertion is a predicate expressing a condition that we wish the database always to
satisfy.
An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>

Example: Every loan has at least one borrower who maintains an account with a minimum balance of $1000.00

create assertion balance_constraint check
(not exists (
select *
from loan
where not exists (
select *
from borrower, depositor, account
where loan.loan_number = borrower.loan_number
and borrower.customer_name = depositor.customer_name
and depositor.account_number = account.account_number
and account.balance >= 1000)))
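SQLite, used here through Python's sqlite3 module, does not implement create assertion, but the predicate can still be evaluated as an ordinary query: the inner "loans without a qualifying borrower" subquery must return no rows for the assertion to hold. The sketch below assumes minimal versions of the bank tables and hypothetical sample rows; it is an illustration of what the assertion checks, not how a DBMS enforces it.

```python
import sqlite3

# The query inside the assertion's outer "not exists": loans that violate the rule.
VIOLATING_LOANS = """
    select loan.loan_number
    from loan
    where not exists (
        select *
        from borrower, depositor, account
        where loan.loan_number = borrower.loan_number
          and borrower.customer_name = depositor.customer_name
          and depositor.account_number = account.account_number
          and account.balance >= 1000)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan      (loan_number text, branch_name text, amount real);
    create table borrower  (customer_name text, loan_number text);
    create table depositor (customer_name text, account_number text);
    create table account   (account_number text, branch_name text, balance real);
    -- hypothetical sample data
    insert into loan      values ('L-15', 'Perryridge', 1500), ('L-16', 'Perryridge', 1300);
    insert into borrower  values ('Adams', 'L-15'), ('Jones', 'L-16');
    insert into depositor values ('Adams', 'A-101'), ('Jones', 'A-102');
    insert into account   values ('A-101', 'Perryridge', 2000), ('A-102', 'Perryridge', 400);
""")

def assertion_holds(conn):
    """balance_constraint holds exactly when no loan violates it."""
    return conn.execute(VIOLATING_LOANS).fetchall() == []

print(conn.execute(VIOLATING_LOANS).fetchall())  # [('L-16',)]  Jones's only account has 400
print(assertion_holds(conn))                     # False
```

A DBMS that supports assertions would have to re-evaluate such a predicate on every relevant update, which is one reason assertions are expensive and rarely implemented.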

Referential Integrity
The foreign key clause lists the attributes that comprise the foreign key and
the name of the relation referenced by the foreign key. By default, a foreign
key references the primary key attributes of the referenced table.

create table customer
(customer_name char(20),
customer_street char(30),
customer_city char(30),
primary key (customer_name ))

create table branch
(branch_name char(15),
branch_city char(30),
assets numeric(12,2),
primary key (branch_name ))

create table account
(account_number char(10),
branch_name char(15),
balance integer,
primary key (account_number),
foreign key (branch_name) references branch )

create table depositor
(customer_name char(20),
account_number char(10),
primary key (customer_name, account_number),
foreign key (account_number ) references account,
foreign key (customer_name ) references customer )
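The effect of a foreign key clause can be observed with Python's sqlite3 module. This is a minimal sketch assuming SQLite: foreign keys are enforced there only after `pragma foreign_keys = on`, and, as in the notes, omitting the column list after references branch makes the key refer to branch's primary key. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite enforces foreign keys only when this is on
conn.executescript("""
    create table branch (
        branch_name char(15) primary key,
        branch_city char(30),
        assets numeric(12,2));
    create table account (
        account_number char(10) primary key,
        branch_name char(15),
        balance integer,
        foreign key (branch_name) references branch);   -- refers to branch's primary key
""")

conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")
conn.execute("insert into account values ('A-102', 'Perryridge', 400)")   # parent row exists: accepted

try:
    conn.execute("insert into account values ('A-999', 'Nowhere', 100)")  # no such branch
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

With the default (no action) behaviour, deleting the Perryridge branch while an account still references it is rejected as well; cascading behaviour would have to be requested explicitly with on delete cascade.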

Q1) Create the table described below:
Table Name: client_master
Description: Used to store information about clients.

Column Name   Data Type   Size   Attribute
client_no     Varchar2    6      Primary key / First letter must start with 'C'
Name          Varchar2    20
Address1      Varchar2    30
Address2      Varchar2    30
City          Varchar2    15
Pincode       number      8
State         Varchar2    15
bal_due       number      10,2

Solution
create table client_master(
client_no varchar2(6) primary key check(client_no like 'C%'),
name varchar2(20),
address1 varchar2(30),
address2 varchar2(30),
city varchar2(15),
pincode number(8),
state varchar2(15),
bal_due number(10,2)
);
Q2)
Table Name: product_master
Description: Used to store information about products.

Column Name      Data Type   Size   Attribute
product_no       Varchar2    6      Primary key / First letter must start with 'P'
description      Varchar2    15
profit_percent   number      4,2
Unit_measure     Varchar2    10
Qty_on_hand      number      8
reorder_lvl      number      8
Sale_price       number      8,2    Cannot be 0
cost_price       number      8,2    Cannot be 0

Solution
create table product_master(
product_no varchar2(6) primary key check(product_no like 'P%'),
description varchar2(15),
profit_percent number(4,2),
unit_measure varchar2(10),
qty_on_hand number(8),
reorder_lvl number(8),
sale_price number(8,2) check(sale_price > 0),
cost_price number(8,2) check(cost_price > 0)
);
Q3) Table Name: salesman_master
Description: Used to store information about salesmen working in the company.

Column Name     Data Type   Size   Attribute
salesman_no     Varchar2    6      Primary key / First letter must start with 'S'
salesman_name   Varchar2    20     Not null
address1        Varchar2    30
address2        Varchar2    30
city            Varchar2    20
pincode         Varchar2    8
state           Varchar2    20
sal_amt         number      8,2    Not null, Cannot be 0
tgt_to_get      number      6,2    Not null, Cannot be 0
ytd_sales       number      6,2    Not null
remarks         Varchar2    60

Solution
create table salesman_master(
salesman_no varchar2(6) primary key check(salesman_no like 'S%'),
salesman_name varchar2(20) not null,
address1 varchar2(30),
address2 varchar2(30),
city varchar2(20),
pincode varchar2(8),
state varchar2(20),
sal_amt number(8,2) not null check (sal_amt <> 0),
tgt_to_get number(6,2) not null check (tgt_to_get <> 0),
ytd_sales number(6,2) not null,
remarks varchar2(60)
);
Q4) Table Name: salesman_order
Description: Used to store information about clients' orders.

Column Name    Data Type   Size   Attribute
Order_no       Varchar2    6      Primary key / First letter must start with 'O'
Order_date     date
Client_no      Varchar2    6      Foreign key references client_no of client_master table
Dely_addr      Varchar2    25
Salesman_no    Varchar2    6      Foreign key references salesman_no of salesman_master table
Dely_type      char        1      Default 'F'
Billed_yn      char        1
Dely_date      date
Order_status   Varchar2    10     Values ('In process', 'Fulfilled', 'Back order', 'Cancelled')

Solution
create table salesman_order (
order_no varchar2(6) primary key check(order_no like 'O%'),
order_date date,
client_no varchar2(6) references client_master(client_no),
dely_addr varchar2(25),
salesman_no varchar2(6) references salesman_master(salesman_no),
dely_type char(1) default 'F',
billed_yn char(1),
dely_date date,
order_status varchar2(10) check(order_status in ('In process', 'Fulfilled', 'Back order',
'Cancelled'))
);
Q5)
Table Name: salesman_order_detail
Description: Used to store information about clients' orders with details of each product ordered.

Column Name    Data Type   Size   Attribute
Order_no       Varchar2    6      Primary key / Foreign key references order_no of the salesman_order table
Product_no     Varchar2    6      Primary key / Foreign key references product_no of the product_master table
Qty_ordered    Number      8
Qty_disp       Number      8
Product_rate   Number      10,2

Solution
create table salesman_order_detail(
order_no varchar2(6) references salesman_order(order_no),
product_no varchar2(6) references product_master(product_no),
qty_ordered number(8),
qty_disp number(8),
product_rate number(10,2),
primary key (order_no, product_no)
);

Triggers
A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database.
To design a trigger mechanism, we must:
Specify the conditions under which the trigger is to be executed.
Specify the actions to be taken when the trigger executes.
Trigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by:
o setting the account balance to zero
o creating a loan in the amount of the overdraft
o giving this loan a loan number identical to the account number of the overdrawn account
The condition for executing the trigger is an update to the account relation that results
in a negative balance value.

create trigger overdraft_trigger after update on account
referencing new row as nrow
for each row
when (nrow.balance < 0)
begin atomic
insert into borrower
(select customer_name, account_number
from depositor
where nrow.account_number = depositor.account_number);
insert into loan values
(nrow.account_number, nrow.branch_name, -nrow.balance);
update account set balance = 0
where account.account_number = nrow.account_number;
end
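The overdraft trigger can be run in SQLite through Python's sqlite3 module. This sketch is a translation into SQLite's trigger dialect, which is an assumption of the example rather than the SQL:1999 syntax of the notes: SQLite refers to the new row as NEW instead of using referencing new row as nrow, and its trigger body is a BEGIN ... END block rather than begin atomic. Table layouts and sample rows are hypothetical and minimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account   (account_number text primary key, branch_name text, balance real);
    create table depositor (customer_name text, account_number text);
    create table borrower  (customer_name text, loan_number text);
    create table loan      (loan_number text, branch_name text, amount real);

    -- SQLite dialect: NEW plays the role of nrow; the body is BEGIN ... END.
    create trigger overdraft_trigger after update of balance on account
    for each row
    when new.balance < 0
    begin
        insert into borrower
            select customer_name, account_number
            from depositor
            where depositor.account_number = new.account_number;
        insert into loan values (new.account_number, new.branch_name, -new.balance);
        update account set balance = 0
            where account.account_number = new.account_number;
    end;

    insert into account   values ('A-101', 'Downtown', 500);
    insert into depositor values ('Johnson', 'A-101');
""")

# Withdraw 700 from an account holding 500: the trigger turns the overdraft into a loan.
conn.execute("update account set balance = balance - 700 where account_number = 'A-101'")
print(conn.execute("select balance from account").fetchall())   # [(0.0,)]
print(conn.execute("select * from loan").fetchall())             # [('A-101', 'Downtown', 200.0)]
print(conn.execute("select * from borrower").fetchall())         # [('Johnson', 'A-101')]
```

The trigger's own update sets the balance to 0, so even if it were re-fired the when condition would be false; SQLite also does not fire triggers recursively unless recursive triggers are switched on.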
Triggering Events and Actions in SQL
Triggering event can be insert, delete or update
Triggers on update can be restricted to specific attributes
E.g. create trigger overdraft_trigger after update of balance on account
Values of attributes before and after an update can be referenced
referencing old row as: for deletes and updates
referencing new row as: for inserts and updates
Triggers can be activated before an event, which can serve as extra constraints. E.g. convert blanks to null.
create trigger setnull_trigger before update on r
referencing new row as nrow
for each row
when (nrow.phone_number = ' ')
set nrow.phone_number = null


Statement Level Triggers


Instead of executing a separate action for each affected row, a single action can be
executed for all rows affected by a single SQL statement
Use for each statement instead of for each row
Use referencing old table or referencing new table to refer to temporary
tables (called transition tables) containing the affected rows
Can be more efficient when dealing with SQL statements that update a large
number of rows

When Not To Use Triggers


Triggers were used earlier for tasks such as:
o maintaining summary data (e.g. total salary of each department)
o replicating databases by recording changes to special relations (called change or delta relations) and having a separate process that applies the changes over to a replica
There are better ways of doing these now:
Databases today provide built-in materialized view facilities to maintain
summary data
Databases provide built-in support for replication
Encapsulation facilities can be used instead of triggers in many cases
Define methods to update fields
Carry out actions as part of the update methods instead of
through a trigger

UNIT VII

Query Processing: Measure of Query Cost, Evaluation of Expressions, Selection Operation.

Basic Steps in Query Processing

1. Parsing and translation
2. Optimization
3. Evaluation

• Parsing and translation
o Translate the query into its internal form. This is then translated into relational algebra.
o Parser checks syntax, verifies relations
• Evaluation
o The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

Basic Steps in Query Processing: Optimization
• A relational algebra expression may have many equivalent expressions
E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to $\Pi_{balance}(\sigma_{balance<2500}(account))$
• Each relational algebra operation can be evaluated using one of several different algorithms
• Correspondingly, a relational-algebra expression can be evaluated in many ways.
• Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
E.g., can use an index on balance to find accounts with balance < 2500, or can perform complete relation scan and discard accounts with balance ≥ 2500
• Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.
o Cost is estimated using statistical information from the database catalog, e.g. number of tuples in each relation, size of tuples, etc.

Measures of Query Cost


• Cost is generally measured as total elapsed time for answering query
o Many factors contribute to time cost: disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into account
o Number of seeks * average-seek-cost
o Number of blocks read * average-block-read-cost
o Number of blocks written * average-block-write-cost
o Cost to write a block is greater than cost to read a block, because data is read back after being written to ensure that the write was successful
• For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
o $t_T$: time to transfer one block
o $t_S$: time for one seek
o Cost for b block transfers plus S seeks: $b \cdot t_T + S \cdot t_S$
• We ignore CPU costs for simplicity
o Real systems do take CPU cost into account
• We do not include the cost of writing output to disk in our cost formulae
• Several algorithms can reduce disk I/O by using extra buffer space
o Amount of real memory available to buffer depends on other concurrent queries and OS processes, known only during execution
o We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available
• Required data may be buffer resident already, avoiding disk I/O
o But hard to take into account for cost estimation
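The cost formula $b \cdot t_T + S \cdot t_S$ is simple enough to evaluate directly. The sketch below assumes hypothetical device timings (0.1 ms per block transfer, 4 ms per seek, expressed in microseconds so the arithmetic stays exact); they are illustrative values, not figures from the notes. It contrasts reading 500 contiguous blocks with reading the same blocks scattered over the disk.

```python
def io_cost(b, S, t_T, t_S):
    """Disk-I/O cost of b block transfers plus S seeks: b * t_T + S * t_S (CPU ignored)."""
    return b * t_T + S * t_S

# Hypothetical timings in microseconds (assumption): 0.1 ms per block, 4 ms per seek.
T_T_US = 100
T_S_US = 4000

# 500 blocks stored contiguously: 500 transfers and a single initial seek.
sequential = io_cost(500, 1, T_T_US, T_S_US)
# The same 500 blocks scattered over the disk: one seek per block.
random_access = io_cost(500, 500, T_T_US, T_S_US)

print(sequential)      # 54000   (54 ms)
print(random_access)   # 2050000 (2.05 s)
```

The large gap between the two numbers is why the number of seeks is tracked separately from the number of block transfers.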
Statistical Information for Cost Estimation
1) Catalog Information
• $n_r$: number of tuples in a relation r.
• $b_r$: number of blocks containing tuples of r.
• $s_r$: size of a tuple of r.
• $f_r$: blocking factor of r, i.e., the number of tuples of r that fit into one block.
• V(A, r): number of distinct values that appear in r for attribute A; same as the size of $\Pi_A(r)$.
• SC(A, r): selection cardinality of attribute A of relation r; average number of records that satisfy equality on A.
• If tuples of r are stored together physically in a file, then:
$b_r = \left\lceil \frac{n_r}{f_r} \right\rceil$
• $f_i$: average fan-out of internal nodes of index i, for tree-structured indices such as B+-trees.
• $HT_i$: number of levels in index i, i.e., the height of i.
o For a balanced tree index (such as a B+-tree) on attribute A of relation r, $HT_i = \lceil \log_{f_i}(V(A,r)) \rceil$.
o For a hash index, $HT_i$ is 1.
• $LB_i$: number of lowest-level index blocks in i, i.e., the number of blocks at the leaf level of the index.
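The two catalog-derived quantities $b_r = \lceil n_r / f_r \rceil$ and $HT_i = \lceil \log_{f_i}(V(A,r)) \rceil$ can be computed as follows. This is a sketch assuming integer inputs; the fan-out of 20 used in the usage lines is a hypothetical value. The height is computed with integer multiplication instead of a floating-point logarithm, because a float log can land slightly above an exact power (for example log base 20 of 400) and round up to a wrong height.

```python
def blocks(n_r, f_r):
    """b_r = ceil(n_r / f_r) for a relation whose tuples are stored together in a file."""
    return -(-n_r // f_r)   # integer ceiling division

def tree_index_height(distinct_values, fan_out):
    """HT_i = ceil(log_{f_i}(V(A, r))): the smallest h with fan_out ** h >= V(A, r)."""
    height, reach = 0, 1
    while reach < distinct_values:
        reach *= fan_out
        height += 1
    return height

print(blocks(10000, 20))            # 500
print(blocks(10001, 20))            # 501 (one partly filled block)
print(tree_index_height(500, 20))   # 3   (hypothetical fan-out 20)
print(tree_index_height(400, 20))   # 2   (400 = 20 ** 2 exactly)
```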

Measures of Query Cost
Recall that
• Typically disk access is the predominant cost, and is also relatively easy to estimate.
• The number of block transfers from disk is used as a measure of the actual cost of evaluation.
• It is assumed that all transfers of blocks have the same cost.
o Real life optimizers do not make this assumption, and distinguish between sequential and random disk access
• We do not include the cost of writing output to disk.
We refer to the cost estimate of algorithm A as $E_A$.
2) Selection Size Estimation
Equality selection $\sigma_{A=v}(r)$
• SC(A, r): number of records that will satisfy the selection
• $\lceil SC(A,r)/f_r \rceil$: number of blocks that these records will occupy
E.g. Binary search cost estimate becomes
$E_{a2} = \lceil \log_2(b_r) \rceil + \left\lceil \frac{SC(A,r)}{f_r} \right\rceil - 1$
• Equality condition on a key attribute: SC(A, r) = 1

Statistical Information for Examples
• $f_{account} = 20$ (20 tuples of account fit in one block)
• V(branch-name, account) = 50 (50 branches)
• V(balance, account) = 500 (500 different balance values)
• $n_{account} = 10000$ (account has 10,000 tuples)
• Assume the following indices exist on account:
o A primary, B+-tree index for attribute branch-name
o A secondary, B+-tree index for attribute balance
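Plugging the example statistics into the binary-search estimate $E_{a2}$ gives a concrete number. This sketch assumes, as is usual for such estimates but not stated in the notes, that values are uniformly distributed, so $SC(A,r) = n_r / V(A,r)$ for a non-key attribute. For branch-name that gives 200 matching tuples, i.e. 10 blocks, on top of $\lceil \log_2 500 \rceil = 9$ probes.

```python
def ceil_div(a, b):
    return -(-a // b)

def ceil_log2(x):
    """ceil(log2(x)) for a positive integer x, without floating point."""
    return (x - 1).bit_length()

def binary_search_cost(n_r, f_r, sc):
    """E_a2 = ceil(log2(b_r)) + ceil(SC(A, r) / f_r) - 1, with b_r = ceil(n_r / f_r)."""
    b_r = ceil_div(n_r, f_r)
    return ceil_log2(b_r) + ceil_div(sc, f_r) - 1

# Statistics for account from the notes.
n_account, f_account, v_branch = 10000, 20, 50

# Assumption: uniform distribution, so SC(A, r) = n_r / V(A, r) for non-key A.
sc_branch = n_account // v_branch                             # 200 tuples per branch name
print(binary_search_cost(n_account, f_account, sc_branch))    # 18 = 9 + 10 - 1
print(binary_search_cost(n_account, f_account, 1))            # 9  (equality on a key, SC = 1)
```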

Selections Involving Comparisons
• Selections of the form $\sigma_{A \le v}(r)$ (case of $\sigma_{A \ge v}(r)$ is symmetric)
• Let c denote the estimated number of tuples satisfying the condition.
• If min(A, r) and max(A, r) are available in the catalog:
o c = 0 if v < min(A, r)
o c = $n_r$ if v ≥ max(A, r)
o $c = n_r \cdot \frac{v - \min(A,r)}{\max(A,r) - \min(A,r)}$ otherwise
• In absence of statistical information, c is assumed to be $n_r / 2$.
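The three-case estimate for $\sigma_{A \le v}(r)$ translates directly into a small function. The catalog values in the usage lines (balances ranging from 0 to 10000) are hypothetical; the notes do not give min and max for account.

```python
def estimate_le(n_r, v, min_a=None, max_a=None):
    """Estimated number c of tuples satisfying A <= v."""
    if min_a is None or max_a is None:
        return n_r / 2                                  # no statistics: assume half the tuples
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)          # linear interpolation between min and max

# Hypothetical catalog values for account.balance: min 0, max 10000.
print(estimate_le(10000, 2500, 0, 10000))    # 2500.0
print(estimate_le(10000, -1, 0, 10000))      # 0
print(estimate_le(10000, 2500))              # 5000.0
```

The interpolation assumes values are spread evenly between min and max; skewed data is the usual reason real systems keep histograms instead.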

Implementation of Complex Selections
• The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation r satisfies $\theta_i$. If $s_i$ is the number of satisfying tuples in r, the selectivity of $\theta_i$ is given by $s_i / n_r$.
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n}(r)$. The estimate for number of tuples in the result is:
$n_r \cdot \frac{s_1 \cdot s_2 \cdots s_n}{n_r^{\,n}}$
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \cdots \vee \theta_n}(r)$. Estimated number of tuples:
$n_r \cdot \left(1 - \left(1 - \frac{s_1}{n_r}\right)\left(1 - \frac{s_2}{n_r}\right)\cdots\left(1 - \frac{s_n}{n_r}\right)\right)$
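Both estimates treat the individual conditions as independent. The sketch below evaluates them for two conditions on account, assuming (hypothetically) uniform distributions so that an equality on branch-name matches $10000/50 = 200$ tuples and an equality on balance matches $10000/500 = 20$ tuples.

```python
from math import prod

def conjunction_size(n_r, s):
    """n_r * (s_1 * ... * s_n) / n_r**n, assuming the conditions are independent."""
    return n_r * prod(s) / n_r ** len(s)

def disjunction_size(n_r, s):
    """n_r * (1 - (1 - s_1/n_r) * ... * (1 - s_n/n_r))."""
    return n_r * (1 - prod(1 - s_i / n_r for s_i in s))

n_account = 10000
# Uniform-distribution assumption: s_1 for branch-name = ..., s_2 for balance = ...
s = [10000 // 50, 10000 // 500]          # [200, 20]

print(conjunction_size(n_account, s))    # about 0.4
print(disjunction_size(n_account, s))    # about 219.6
```

The disjunction result is slightly less than 200 + 20 because tuples satisfying both conditions are counted once.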

Join Operation: Running Example
Running example: $depositor \bowtie customer$
Catalog information for join examples:
• $n_{customer} = 10000$.
• $f_{customer} = 25$, which implies that $b_{customer} = 10000/25 = 400$.
• $n_{depositor} = 5000$.
• $f_{depositor} = 50$, which implies that $b_{depositor} = 5000/50 = 100$.
• V(customer-name, depositor) = 2500, which implies that, on average, each customer has two accounts.
Also assume that customer-name in depositor is a foreign key on customer.

Estimation of the Size of Joins
• The Cartesian product $r \times s$ contains $n_r \cdot n_s$ tuples; each tuple occupies $s_r + s_s$ bytes.
• If $R \cap S = \emptyset$, then $r \bowtie s$ is the same as $r \times s$.
• If $R \cap S$ is a key for R, then a tuple of s will join with at most one tuple from r; therefore, the number of tuples in $r \bowtie s$ is no greater than the number of tuples in s.
• If $R \cap S$ in S is a foreign key in S referencing R, then the number of tuples in $r \bowtie s$ is exactly the same as the number of tuples in s.
o The case for $R \cap S$ being a foreign key referencing S is symmetric.
• In the example query $depositor \bowtie customer$, customer-name in depositor is a foreign key of customer; hence, the result has exactly $n_{depositor}$ tuples, which is 5000.

fR S={
I A}i
snotakeyforRorS.
I
fweassumethatev
eryt
upleti
nRproducest
upl
esi
nR S,t
henumberof
t
upl
esi
nR Si
ses
ti
mat
edt
obe:

nr∗n s
V ( A ,s)
I
fther
ever sei str
ue,t heesti
mat
eobt
ainedwi
l
lbe:
nr∗ns
V ( A ,r )

Thel oweroft hesetwoest i


matesisprobabl
ythemoreaccur
ateone.
Comput et hesizeestimatesfordeposit
or cusomerwi
t t
houtusi
ng
i
nformat i
onaboutf orei
gnk eys:
V(cus t
omer -
name,deposi t )=2500,and
or
V(cus t
omer -
name,cus t
omer)=10000
Thetwoes ti
matesar e
5000*10000/ 2500=20, 000and5000*10000/ 10000=5000
Wechooset hel owerestimate,whichinthi
scase,i
sthesameasourearl
ier
comput ati
onusingf or
eignk eys.
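The arithmetic of the example can be reproduced with a short function that takes the lower of the two estimates. The function name is hypothetical; the statistics are the ones given for depositor and customer.

```python
def join_size_estimate(n_r, n_s, v_a_r, v_a_s):
    """R ∩ S = {A}, A not a key of R or S: the lower of n_r*n_s/V(A,s) and n_r*n_s/V(A,r)."""
    return min(n_r * n_s / v_a_s, n_r * n_s / v_a_r)

# r = depositor, s = customer, A = customer-name (statistics from the notes)
n_depositor, n_customer = 5000, 10000
v_dep, v_cust = 2500, 10000

print(n_depositor * n_customer / v_dep)    # 20000.0
print(n_depositor * n_customer / v_cust)   # 5000.0
print(join_size_estimate(n_depositor, n_customer, v_dep, v_cust))   # 5000.0
```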

Size Estimation for Other Operations
• Projection: estimated size of $\Pi_A(r) = V(A, r)$
• Aggregation: estimated size of ${}_A\mathcal{G}_F(r) = V(A, r)$
• Set operations
o For unions/intersections of selections on the same relation: rewrite and use size estimate for selections. E.g. $\sigma_{\theta_1}(r) \cup \sigma_{\theta_2}(r)$ can be rewritten as $\sigma_{\theta_1 \vee \theta_2}(r)$
o For operations on different relations:
- estimated size of $r \cup s$ = size of r + size of s.
- estimated size of $r \cap s$ = minimum of size of r and size of s.
- estimated size of $r - s$ = r.
- All the three estimates may be quite inaccurate, but provide upper bounds on the sizes.

Outer join:
• Estimated size of r ⟕ s (left outer join) = size of $r \bowtie s$ + size of r
o Case of right outer join is symmetric
• Estimated size of r ⟗ s (full outer join) = size of $r \bowtie s$ + size of r + size of s
Estimation of Number of Distinct Values
Selections: $\sigma_\theta(r)$
• If θ forces A to take a specified value: $V(A, \sigma_\theta(r)) = 1$, e.g., A = 3
• If θ forces A to take on one of a specified set of values: $V(A, \sigma_\theta(r))$ = number of specified values, e.g., (A = 1 ∨ A = 3 ∨ A = 4)
• If the selection condition θ is of the form A op v (op a comparison operator): estimated $V(A, \sigma_\theta(r)) = V(A, r) \cdot s$, where s is the selectivity of the selection.
• In all the other cases: use approximate estimate of $\min(V(A, r), n_{\sigma_\theta(r)})$
o More accurate estimate can be got using probability theory, but this one works fine generally
Joins: $r \bowtie s$
• If all attributes in A are from r: estimated $V(A, r \bowtie s) = \min(V(A, r), n_{r \bowtie s})$
• If A contains attributes A1 from r and A2 from s, then estimated
$V(A, r \bowtie s) = \min(V(A_1, r) \cdot V(A_2 - A_1, s),\ V(A_1 - A_2, r) \cdot V(A_2, s),\ n_{r \bowtie s})$
o More accurate estimate can be got using probability theory, but this one works fine generally

Estimation of distinct values is straightforward for projections.
They are the same in $\Pi_A(r)$ as in r.
The same holds for grouping attributes of aggregation.
For aggregated values
 For min(A) and max(A), the number of distinct values can be
estimated as min(V(A,r), V(G,r)) where G denotes grouping attributes
 For other aggregates, assume all values are distinct, and use V(G,r)

Transformation of Relational Expressions


Two relational algebra expressions are said to be equivalent if on every legal database
instance the two expressions generate the same set of tuples
o Note: order of tuples is irrelevant
In SQL, inputs and outputs are multisets of tuples
o Two expressions in the multiset version of the relational algebra are said to
be equivalent if on every legal database instance the two expressions
generate the same multiset of tuples
An equivalence rule says that expressions of two forms are equivalent
o Can replace expression of first form by second, or vice versa

(Algebraic Manipulation) Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{L_1}(\Pi_{L_2}(\cdots(\Pi_{L_n}(E))\cdots)) = \Pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
(a) $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
(b) $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
5. Theta-join operations (and natural joins) are commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. (a) Natural join operations are associative:
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
(b) Theta joins are associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.
7. The selection operation distributes over the theta join operation under the following two conditions:
(a) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined.
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
(b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$.
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
8. The projection operation distributes over the theta join operation as follows:
(a) Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively. If θ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
(b) Consider a join $E_1 \bowtie_{\theta} E_2$.
o Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
o Let $L_3$ be attributes of $E_1$ that are involved in join condition θ, but are not in $L_1 \cup L_2$, and
o let $L_4$ be attributes of $E_2$ that are involved in join condition θ, but are not in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$
9. The set operations union and intersection are commutative
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
(set difference is not commutative).
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over ∪, ∩ and −.
$\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
and similarly for ∪ and ∩ in place of −
Also: $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$
and similarly for ∩ in place of −, but not for ∪
12. The projection operation distributes over union
$\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$
Transformation Example

Query: Find the names of all customers who have an account at some branch located in Brooklyn.
$\Pi_{\text{customer-name}}(\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch \bowtie (account \bowtie depositor)))$
Transformation using rule 7a.
$\Pi_{\text{customer-name}}((\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch)) \bowtie (account \bowtie depositor))$
Performing the selection as early as possible reduces the size of the relation to be joined.
Query: Find the names of all customers with an account at a Brooklyn branch whose account balance is over 1000 dollars.
$\Pi_{\text{customer-name}}(\sigma_{\text{branch-city} = \text{"Brooklyn"} \wedge \text{balance} > 1000}(branch \bowtie (account \bowtie depositor)))$
Transformation using join associativity (Rule 6a):
$\Pi_{\text{customer-name}}((\sigma_{\text{branch-city} = \text{"Brooklyn"} \wedge \text{balance} > 1000}(branch \bowtie account)) \bowtie depositor)$
Second form provides an opportunity to apply the "perform selections early" rule, resulting in the subexpression
$\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch) \bowtie \sigma_{\text{balance} > 1000}(account)$
Thus a sequence of transformations can be useful

Projection Operation Example

customer-name((branch-city = “Brooklyn” (branch) account) depositor)

When we compute
(branch-city = “Brooklyn” (branch) account )
we obtain a relation whose schema is:
(branch-name, branch-city, assets, account-number, balance)
Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes
from intermediate results to get:
 customer-name ((
 account-number ( (branch-city = “Brooklyn” (branch) account ))
depositor)

Join Ordering Example

For all relations r1, r2, and r3,
$(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we choose
$(r_1 \bowtie r_2) \bowtie r_3$
so that we compute and store a smaller temporary relation.

Consider the expression
$\Pi_{\text{customer-name}}((\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch)) \bowtie account \bowtie depositor)$
Could compute $account \bowtie depositor$ first, and join result with
$\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch)$
but $account \bowtie depositor$ is likely to be a large relation.
Since it is more likely that only a small fraction of the bank's customers have accounts in branches located in Brooklyn, it is better to compute
$\sigma_{\text{branch-city} = \text{"Brooklyn"}}(branch) \bowtie account$
first.

Evaluation Plan
An evaluation plan defines exactly what algorithm is used for each operation, and
how the execution of the operations is coordinated.

Choice of Evaluation Plans


Must consider the interaction of evaluation techniques when choosing evaluation
plans: choosing the cheapest algorithm for each operation independently may not
yield the best overall algorithm. E.g.
o merge-join may be costlier than hash-join, but may provide a sorted output
which reduces the cost for an outer level aggregation.
o nested-loop join may provide opportunity for pipelining
Practical query optimizers incorporate elements of the following two broad
approaches:
1. Search all the plans and choose the best plan in a cost-based fashion.
2. Use heuristics to choose a plan.

Cost-Based Optimization
Consider finding the best join-order for $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$.
There are $(2(n-1))!/(n-1)!$ different join orders for the above expression. With n = 7, the number is 665280; with n = 10, the number is greater than 17.6 billion!
No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of
{r1, r2, . . . rn} is computed only once and stored for future use.
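The count $(2(n-1))!/(n-1)!$ grows very quickly, which is what motivates dynamic programming. The formula can be evaluated exactly with integer arithmetic; the function name below is illustrative.

```python
from math import factorial

def join_orders(n):
    """Number of join orders for r1 ⋈ ... ⋈ rn: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 7, 10):
    print(n, join_orders(n))
# 3 12
# 7 665280
# 10 17643225600
```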

Dynamic Programming in Optimization


• To find best join tree for a set of n relations:
o To find best plan for a set S of n relations, consider all possible plans of the form: $S_1 \bowtie (S - S_1)$ where $S_1$ is any non-empty proper subset of S.
o Recursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the $2^n - 2$ alternatives.
o When plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it
Dynamic programming
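The recursion described above can be sketched in a few lines of Python. This is a minimal illustration under a hypothetical cost model that the notes do not specify: a plan's cost is the sum of the sizes of all join results it builds, and the size of a joined set is the product of the relation sizes times the selectivities of the join predicates inside the set (a missing pair is a Cartesian product). Memoising `best` means each subset's best plan is computed once and reused. Exact fractions keep the size arithmetic free of rounding.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

def best_join_order(card, sel):
    """Dynamic-programming join ordering over bushy join trees.

    card: {relation: estimated number of tuples}
    sel:  {frozenset({r, s}): join selectivity}; a missing pair is a Cartesian product.
    Hypothetical cost model: sum of the sizes of all join results built by the plan.
    """
    def size(subset):
        result = 1
        for r in subset:
            result *= card[r]
        for pair, s in sel.items():
            if pair <= subset:
                result *= s
        return result

    @lru_cache(maxsize=None)          # each subset's best plan is computed once and reused
    def best(subset):
        if len(subset) == 1:
            (only,) = subset
            return 0, only
        members = sorted(subset)
        best_cost, best_plan = None, None
        for k in range(1, len(members)):               # S1 ranges over non-empty proper subsets
            for left in combinations(members, k):
                s1 = frozenset(left)
                left_cost, left_plan = best(s1)
                right_cost, right_plan = best(subset - s1)
                cost = left_cost + right_cost + size(subset)
                if best_cost is None or cost < best_cost:
                    best_cost, best_plan = cost, (left_plan, right_plan)
        return best_cost, best_plan

    return best(frozenset(card))

# Hypothetical sizes: 5 Brooklyn branches, 10000 accounts, 5000 depositor tuples.
card = {"account": 10000, "depositor": 5000, "brooklyn_branch": 5}
sel = {
    frozenset({"account", "brooklyn_branch"}): Fraction(1, 50),   # join on branch-name
    frozenset({"account", "depositor"}): Fraction(1, 10000),      # join on account-number
}
cost, plan = best_join_order(card, sel)
print(cost, plan)   # 1500 ('depositor', ('account', 'brooklyn_branch'))
```

Under this toy model the optimizer joins the selected branches with account first, matching the join-ordering argument in the Brooklyn example.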


Left Deep Join Trees


In left-deep join trees, the right-hand-side input for each join is a relation, not the
result of an intermediate join.


Cost of Optimization
With dynamic programming time complexity of optimization with bushy trees is $O(3^n)$.
o With n = 10, this number is 59049 instead of 17.6 billion!
Space complexity is $O(2^n)$
To find best left-deep join tree for a set of n relations:
o Consider n alternatives with one relation as right-hand side input and the other relations as left-hand side input.
o Using (recursively computed and stored) least-cost join order for each alternative on left-hand-side, choose the cheapest of the n alternatives.
If only left-deep trees are considered, time complexity of finding best join order is $O(n \cdot 2^n)$
o Space complexity remains at $O(2^n)$
Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10)
Consider the expression $(r_1 \bowtie r_2 \bowtie r_3) \bowtie r_4 \bowtie r_5$
An interesting sort order is a particular sort order of tuples that could be useful for a later operation.
o Generating the result of $r_1 \bowtie r_2 \bowtie r_3$ sorted on the attributes common with r4 or r5 may be useful, but generating it sorted on the attributes common only to r1 and r2 is not useful.
o Using merge-join to compute $r_1 \bowtie r_2 \bowtie r_3$ may be costlier, but may provide an output sorted in an interesting order.
Not sufficient to find the best join order for each subset of the set of n given relations; must find the best join order for each subset, for each interesting sort order
o Simple extension of earlier dynamic programming algorithms
o Usually, number of interesting orders is quite small and doesn't affect time/space complexity significantly
Heuristic Optimization
Cost-based optimization is expensive, even with dynamic programming.
Systems may use heuristics to reduce the number of choices that must be made in a
cost-based fashion.
Heuristic optimization transforms the query-tree by using a set of rules that typically
(but not in all cases) improve execution performance:
o Perform selection early (reduces the number of tuples)
o Perform projection early (reduces the number of attributes)
o Perform most restrictive selection and join operations before other similar
operations.
o Some systems use only heuristics, others combine heuristics with partial
cost-based optimization.

Steps in Typical Heuristic Optimization

1. Deconstruct conjunctive selections into a sequence of single selection operations (Equiv. rule 1).
2. Move selection operations down the query tree for the earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
3. Execute first those selection and join operations that will produce the smallest relations (Equiv. rule 6).
4. Replace Cartesian product operations that are followed by a selection condition by join operations (Equiv. rule 4a).
5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed (Equiv. rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be pipelined, and execute them using pipelining.

UNIT VIII
Transaction & Concurrency Control: Transaction Concepts & ACID Properties,
Transaction States, Concurrent Executions, Serializability & Its Testing, Guarantee
Serializability, Recoverability, Introduction to Concurrency Control, Lock-Based
Protocols & Deadlock Handling.


Transaction Concept

A transaction is a unit of program execution that accesses and possibly updates various data items.
A transaction must see a consistent database.
During transaction execution the database may be
inconsistent.
When the transaction is committed, the database
must be consistent.
Two main issues to deal with:
o Failures of various kinds, such as hardware
failures and system crashes
o Concurrent execution of multiple
transactions

ACID Properties
To preserve integrity of data, the database system must ensure:
• Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database.
• Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.
That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement: the sum of A and B is unchanged by the execution of the transaction.
Atomicity requirement: if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.
Durability requirement: once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.
Isolation requirement: if, between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.
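The atomicity requirement can be tried out with Python's standard sqlite3 module. This is a minimal sketch, not a DBMS implementation: the table name `account`, the helper functions and the `fail_midway` flag are hypothetical. Using the connection as a context manager makes sqlite3 commit when the block finishes normally and roll back when an exception escapes, so a failure between write(A) and write(B) leaves neither update in the database.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical schema: one row per account with its balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, amount: int, fail_midway: bool = False) -> None:
    """Move `amount` from A to B as one transaction."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            # Simulated failure after write(A) and before write(B).
            raise RuntimeError("failure between write(A) and write(B)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, 50)
    print(balances(conn))  # {'A': 950, 'B': 2050}
    try:
        transfer(conn, 50, fail_midway=True)
    except RuntimeError:
        pass
    print(balances(conn))  # {'A': 950, 'B': 2050}: the half-done transfer was rolled back
```

The second call fails after A has been debited, yet the sum A + B stays 3000 because the whole transaction is undone.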

Transaction State

Active, the initial state; the transaction stays in this state while it is executing.


Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
o restart the transaction (only if there was no internal logical error)
o kill the transaction
Committed, after successful completion.
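The five states form a small state machine. A minimal sketch, assuming the usual transitions of the transaction state diagram (active to partially committed or failed, partially committed to committed or failed, failed to aborted); the class and method names are hypothetical. Committed and aborted are treated as terminal: a restarted transaction is modelled as a new transaction that begins again in the active state.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Allowed moves of the transaction state diagram.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),      # terminal; a restart creates a new transaction
    State.COMMITTED: set(),    # terminal
}


class Transaction:
    def __init__(self) -> None:
        self.state = State.ACTIVE           # the initial state

    def move_to(self, new: State) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {new.value}")
        self.state = new


if __name__ == "__main__":
    t = Transaction()
    t.move_to(State.PARTIALLY_COMMITTED)   # final statement executed
    t.move_to(State.COMMITTED)             # successful completion
    print(t.state.value)                   # committed
```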

Figure: Transaction state diagram


Implementation of Atomicity and Durability

The recovery-management component of a database system implements the support for atomicity and durability.
The shadow-database scheme:
o assume that only one transaction is active at a time.
o a pointer called db_pointer always points to the current consistent copy of
the database.
o all updates are made on a shadow copy of the database, and db_pointer is
made to point to the updated shadow copy only after the transaction
reaches partial commit and all updated pages have been flushed to disk.
o in case transaction fails, old consistent copy pointed to by db_pointer can
be used, and the shadow copy can be deleted.

Figure: Implementation of atomicity and durability with the shadow-database scheme.
The shadow-database scheme assumes that disks do not fail. It is useful for text editors, but extremely inefficient for large databases: executing a single transaction requires copying the entire database.
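The pointer switch at the heart of the shadow-database scheme can be imitated at file level. In this sketch (an analogue, not how a DBMS manages pages) the whole database is one JSON file, the file name plays the role of db_pointer, and `os.replace` performs the switch. It assumes one active transaction at a time and a filesystem on which `os.replace` within one directory is atomic, as it is on POSIX systems; the function names are hypothetical.

```python
import json
import os
import tempfile


def read_db(path: str) -> dict:
    """Readers always open the file that the 'pointer' (the path) names."""
    with open(path) as f:
        return json.load(f)


def run_transaction(path: str, update) -> None:
    """Apply update() to a shadow copy, flush it, then switch the pointer."""
    shadow = read_db(path)            # copy of the entire database (the scheme's main cost)
    update(shadow)                    # if this raises, the current copy is untouched
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".shadow")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(shadow, f)
            f.flush()
            os.fsync(f.fileno())      # "all updated pages flushed to disk"
        os.replace(tmp, path)         # atomically make the shadow copy the current copy
    except BaseException:
        os.remove(tmp)                # the switch never happened: delete the shadow copy
        raise
```

A failing `update` (or a crash before `os.replace`) leaves readers on the old consistent copy, which is exactly the recovery argument of the scheme; the price is rewriting the whole file for every transaction.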


Concurrent Executions
Multiple transactions are allowed to run concurrently in the system. Advantages are:
o increased processor and disk utilization, leading to better transaction
throughput: one transaction can be using the CPU while another is reading
from or writing to the disk
o reduced average response time for transactions: short transactions need
not wait behind long ones.
Concurrency control schemes: mechanisms to achieve isolation, i.e., to control the
interaction among the concurrent transactions in order to prevent them from
destroying the consistency of the database.
Schedules
Schedules – sequences that indicate the chronological order in which instructions of
concurrent transactions are executed
o a schedule for a set of transactions must consist of all instructions of those
transactions
o must preserve the order in which the instructions appear in each individual
transaction.

Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without violating the data integrity of the database. Transactions should be executed safely and follow the ACID rules described above. The DBMS must guarantee that only serializable (unless serializability is relaxed) and recoverable schedules are generated, and that no committed actions are lost while undoing aborted transactions.
The main categories of concurrency control mechanisms are:
Optimistic - Delay the synchronization for a transaction until its end without blocking
(read, write) operations, and then abort transactions that violate desired synchronization
rules.
Pessimistic - Block operations of transaction that would cause violation of
synchronization rules.
There are several methods for concurrency control. Among them:
Two-phase locking
Strict two-phase locking
Conservative two-phase locking
Index locking
Multiple granularity locking


Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The
following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.

T1                T2
Read(A)
A=A-50
Write(A)
Read(B)
B=B+50
Write(B)
                  Read(A)
                  temp=A*0.1
                  A=A-temp
                  Write(A)
                  Read(B)
                  B=B+temp
                  Write(B)

Schedule 1: a serial schedule in which T1 is followed by T2

T1                T2
                  Read(A)
                  temp=A*0.1
                  A=A-temp
                  Write(A)
                  Read(B)
                  B=B+temp
                  Write(B)
Read(A)
A=A-50
Write(A)
Read(B)
B=B+50
Write(B)

Schedule 2: a serial schedule in which T2 is followed by T1

Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3) is not a serial schedule, but it is equivalent to Schedule 1.

T1                T2
Read(A)
A=A-50
Write(A)
                  Read(A)
                  temp=A*0.1
                  A=A-temp
                  Write(A)
Read(B)
B=B+50
Write(B)
                  Read(B)
                  B=B+temp
                  Write(B)

Schedule 3: a concurrent schedule equivalent to Schedule 1.

In both Schedule 1 and Schedule 3, the sum A + B is preserved.

The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.

Schedule 4
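Whether an interleaving preserves A + B can be checked by replaying it. The sketch below is a toy interpreter, not part of any DBMS: each transaction keeps a private buffer, read copies a database item into it, write copies it back, and compute steps touch only the buffer. T2's 10% is computed with integer division to keep the arithmetic exact, and `lost_update` is an assumed interleaving in which T1 writes A after T2 has already read and written it (compare it with the text's Schedule 4).

```python
def run(schedule, db):
    """Replay (transaction, op, arg) steps on a copy of db and return the result.
    op is "read" or "write" (arg = item name) or "compute" (arg = function
    applied to the transaction's local variables)."""
    db = dict(db)
    local = {}                               # transaction -> its local buffer
    for txn, op, arg in schedule:
        env = local.setdefault(txn, {})
        if op == "read":
            env[arg] = db[arg]
        elif op == "write":
            db[arg] = env[arg]
        else:
            arg(env)
    return db


def compute(txn, fn):
    return (txn, "compute", fn)


T1 = [("T1", "read", "A"), compute("T1", lambda e: e.update(A=e["A"] - 50)), ("T1", "write", "A"),
      ("T1", "read", "B"), compute("T1", lambda e: e.update(B=e["B"] + 50)), ("T1", "write", "B")]
T2 = [("T2", "read", "A"), compute("T2", lambda e: e.update(temp=e["A"] // 10)),
      compute("T2", lambda e: e.update(A=e["A"] - e["temp"])), ("T2", "write", "A"),
      ("T2", "read", "B"), compute("T2", lambda e: e.update(B=e["B"] + e["temp"])),
      ("T2", "write", "B")]

schedule1 = T1 + T2                                # serial: T1 then T2
schedule3 = T1[:3] + T2[:4] + T1[3:] + T2[4:]      # A-part of each, then B-part of each
lost_update = T1[:2] + T2[:5] + T1[2:] + T2[5:]    # assumed interleaving

if __name__ == "__main__":
    init = {"A": 1000, "B": 2000}
    for name, s in [("Schedule 1", schedule1), ("Schedule 3", schedule3), ("lost update", lost_update)]:
        final = run(s, init)
        print(name, final, "A + B =", final["A"] + final["B"])
```

With A = 1000 and B = 2000, Schedules 1 and 3 both end with A = 855 and B = 2145 (sum 3000), while the lost-update interleaving ends with A = 950 and B = 2100 (sum 3050): T2's write of A is overwritten by T1.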

Serializability

Basic Assumption: each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
o conflict serializability
o view serializability


We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers in between
reads and writes. Our simplified schedules consist of only read and write
instructions.

Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions
wrote Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between
them. If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the schedule.

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
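Performing the swaps by hand is error-prone. An equivalent test, sketched below under the simplified read/write model above: two schedules containing the same operations are conflict equivalent exactly when every pair of conflicting operations appears in the same order in both, since swapping non-conflicting neighbours never reorders a conflicting pair. Operations are written as (transaction, 'r' or 'w', item) triples; the function names are hypothetical.

```python
from itertools import combinations


def conflicts(a, b) -> bool:
    """Different transactions, same item, and at least one of them is a write."""
    return a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1])


def tagged(schedule):
    """Name each operation (transaction, k): the k-th operation of that transaction."""
    count = {}
    out = []
    for op in schedule:
        k = count.get(op[0], 0)
        count[op[0]] = k + 1
        out.append(((op[0], k), op))
    return out


def conflict_order(schedule) -> set:
    """All (earlier, later) pairs of conflicting operations, by operation name."""
    return {(x[0], y[0]) for x, y in combinations(tagged(schedule), 2) if conflicts(x[1], y[1])}


def conflict_equivalent(s1, s2) -> bool:
    same_ops = sorted(tagged(s1)) == sorted(tagged(s2))
    return same_ops and conflict_order(s1) == conflict_order(s2)


# Schedules 1 and 3 reduced to their read/write instructions.
s1 = [("T1", "r", "A"), ("T1", "w", "A"), ("T1", "r", "B"), ("T1", "w", "B"),
      ("T2", "r", "A"), ("T2", "w", "A"), ("T2", "r", "B"), ("T2", "w", "B")]
s3 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]

if __name__ == "__main__":
    print(conflict_equivalent(s3, s1))  # True
    s = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
    print(conflict_equivalent(s, [("T3", "r", "Q"), ("T3", "w", "Q"), ("T4", "w", "Q")]))  # False
```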
Example of a schedule that is not conflict serializable:

T3                T4
Read(Q)
                  Write(Q)
Write(Q)

We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.

T1                T2
Read(A)
A=A-50
Write(A)
                  Read(A)
                  temp=A*0.1
                  A=A-temp
                  Write(A)
Read(B)
B=B+50
Write(B)
                  Read(B)
                  B=B+temp
                  Write(B)

Schedule 3
Schedule 3 above can be transformed into Schedule 1, a serial schedule where T2
follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 3
is conflict serializable.

T1                T2
Read(A)
A=A-50
Write(A)
                  Read(A)
                  temp=A*0.1
                  A=A-temp
Read(B)
                  Write(A)
B=B+50
Write(B)
                  Read(B)
                  B=B+temp
                  Write(B)
Schedule 3-a (after swapping T1's Read(B) with T2's Write(A))

T1                T2
Read(A)
A=A-50
Write(A)
Read(B)
                  Read(A)
                  temp=A*0.1
                  A=A-temp
                  Write(A)
B=B+50
Write(B)
                  Read(B)
                  B=B+temp
                  Write(B)
Schedule 3-b (after also swapping T1's Read(B) with T2's Read(A))

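Continuing to swap non-conflicting instructions in this way (for example, moving T1's B=B+50 and Write(B) ahead of all of T2's instructions, which touch only A), we finally obtain the serial Schedule 1.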

View Serializability


Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S,
then transaction Ti must, in schedule S´, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that
value was produced by transaction Tj (if any), then transaction Ti must in schedule S´
also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes.

A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Schedule 9 (from the text) is a schedule that is view serializable but not conflict serializable.

Every view serializable schedule that is not conflict serializable has blind writes.
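The three conditions of view equivalence translate into a brute-force test: record, for every read, which transaction wrote the value it reads (None for the initial value), and which transaction performs the final write of each item, then compare against every serial order. This sketch uses the same (transaction, 'r'/'w', item) notation and is exponential in the number of transactions, which is acceptable only for textbook-sized examples. The example extends the T3/T4 schedule above with a hypothetical T6 that only writes Q (a blind write); the result is view serializable even though T3 and T4 form a conflict cycle.

```python
from itertools import permutations


def view_facts(schedule):
    """Conditions 1 and 2: who each read reads from (None = initial value).
    Condition 3: the transaction performing the final write of each item."""
    last_writer = {}
    reads_from = set()
    nth_read = {}
    for txn, op, item in schedule:
        if op == "r":
            k = nth_read.get((txn, item), 0)      # distinguishes repeated reads
            nth_read[(txn, item)] = k + 1
            reads_from.add((txn, item, k, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def serial(schedule, order):
    """The serial schedule running whole transactions in the given order."""
    return [op for t in order for op in schedule if op[0] == t]


def view_serial_order(schedule):
    """A serial order view equivalent to schedule, or None if there is none."""
    txns = list(dict.fromkeys(op[0] for op in schedule))
    target = view_facts(schedule)
    for order in permutations(txns):
        if view_facts(serial(schedule, order)) == target:
            return order
    return None


if __name__ == "__main__":
    s = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q"), ("T6", "w", "Q")]
    print(view_serial_order(s))       # ('T3', 'T4', 'T6')
    print(view_serial_order(s[:3]))   # None: without T6's blind write
```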

Schedule 8 (from the text) produces the same outcome as the serial schedule < T1, T5 >, yet it is neither conflict equivalent nor view equivalent to it.

Determining such equivalence requires analysis of operations other than read and write.

Recoverability

Need to address the effect of transaction failures on concurrently running transactions.

Recoverable schedule: if a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj.
The following schedule (Schedule 11 in the text) is not recoverable if T9 commits immediately after the read.

If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent
database state. Hence the database must ensure that schedules are recoverable.
Recoverable schedules are desirable because failure of a transaction might otherwise
bring the system into an irreversibly inconsistent state. Nonrecoverable schedules may
sometimes be needed when updates must be made visible early due to time constraints,
even if they have not yet been committed, which may be required for very long duration
transactions.

Cascading rollback: a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable).

If T10 fails, T11 and T12 must also be rolled back. This can lead to the undoing of a significant amount of work.

Cascadeless schedules: cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable.
It is desirable to restrict the schedules to those where cascading rollbacks cannot occur. Such schedules are called cascadeless.
Cascadeless schedules are desirable because the failure of a transaction does not lead to the aborting of any other transaction.
If failures occur rarely, so that we can pay the price of cascading aborts for the increased concurrency, noncascadeless schedules might be desirable.


Strict
A schedule is strict if for any two transactions T1, T2, if a write operation of T1 precedes
a conflicting operation of T2 (either read or write), then the commit event of T1 also
precedes that conflicting operation of T2.

Any strict schedule is cascadeless, but not the converse.
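The three definitions can be checked mechanically on a schedule that also records commits. The sketch uses (transaction, op, item) steps with op 'r', 'w' or 'c' (commit, item None); it reads "Tj reads a data item previously written by Ti" as Tj reading the value of the most recent write to that item, treats a transaction with no commit step as not (yet) committed, and does not model aborts. The function name and the example schedules are hypothetical, built to match the descriptions above.

```python
INF = float("inf")


def classify(schedule):
    """Return (recoverable, cascadeless, strict) for (txn, op, item) steps,
    op being 'r', 'w' or 'c' (commit, item None)."""
    commit = {t: i for i, (t, op, _) in enumerate(schedule) if op == "c"}
    recoverable = cascadeless = strict = True
    last_writer = {}                  # item -> transaction that wrote it most recently
    writers = {}                      # item -> every transaction that has written it
    for j, (tj, op, item) in enumerate(schedule):
        if op == "c":
            continue
        if op == "r":
            ti = last_writer.get(item)
            if ti is not None and ti != tj:                   # Tj reads a value written by Ti
                if commit.get(ti, INF) > commit.get(tj, INF):
                    recoverable = False                       # Tj commits before Ti does
                if commit.get(ti, INF) > j:
                    cascadeless = False                       # Ti not committed before the read
        if any(t != tj and commit.get(t, INF) > j for t in writers.get(item, ())):
            strict = False                                    # touches an uncommitted write
        if op == "w":
            last_writer[item] = tj
            writers.setdefault(item, set()).add(tj)
    return recoverable, cascadeless, strict


if __name__ == "__main__":
    examples = {
        "T9 reads from T8 and commits first": [("T8", "r", "A"), ("T8", "w", "A"), ("T9", "r", "A"),
                                               ("T9", "c", None), ("T8", "r", "B")],
        "chain T10 -> T11 -> T12": [("T10", "w", "A"), ("T11", "r", "A"), ("T11", "w", "A"),
                                    ("T12", "r", "A")],
        "overwrite before commit": [("T1", "w", "A"), ("T2", "w", "A"),
                                    ("T1", "c", None), ("T2", "c", None)],
        "commit before access": [("T1", "w", "A"), ("T1", "c", None),
                                 ("T2", "r", "A"), ("T2", "c", None)],
    }
    for name, s in examples.items():
        print(name, classify(s))
```

The four examples give (False, False, False), (True, False, False), (True, True, False) and (True, True, True), illustrating that strict implies cascadeless and cascadeless implies recoverable, but not the converse.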

Implementation of Isolation
Schedules must be conflict or view serializable, and recoverable, for the sake of
database consistency, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial
schedules, but provides a poor degree of concurrency.
Concurrency-control schemes trade off between the amount of concurrency they allow
and the amount of overhead that they incur.
Some schemes allow only conflict-serializable schedules to be generated, while
others allow view-serializable schedules that are not conflict-serializable.

Transaction Definition in SQL

The data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
o Commit work: commits the current transaction and begins a new one.
o Rollback work: causes the current transaction to abort.
Levels of Consistency in SQL-92
o Serializable: the default.
o Repeatable read: only committed records can be read, and repeated reads of the same record must return the same value. However, a transaction may not be serializable: it may find some records inserted by a transaction but not find others.
o Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.
o Read uncommitted: even uncommitted records may be read.


Serial Schedule

Schedule A (serial: T1 followed by T2)
T1                T2
R(X)
X=X-N
W(X)
R(Y)
Y=Y+N
W(Y)
                  R(X)
                  X=X+M
                  W(X)

Schedule B (serial: T2 followed by T1)
T1                T2
                  R(X)
                  X=X+M
                  W(X)
R(X)
X=X-N
W(X)
R(Y)
Y=Y+N
W(Y)

Schedule C (nonserial)
T1                T2
R(X)
X=X-N
                  R(X)
                  X=X+M
W(X)
R(Y)
                  W(X)
Y=Y+N
W(Y)

Schedule D (nonserial)
T1                T2
R(X)
X=X-N
W(X)
                  R(X)
                  X=X+M
                  W(X)
R(Y)
Y=Y+N
W(Y)

Testing for Conflict Serializability of a Schedule

1. For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
2. For each case in S where Tj executes a read(X) after Ti executes a write(X), create an edge (Ti → Tj) in the precedence graph.
3. For each case in S where Tj executes a write(X) after Ti executes a read(X), create an edge (Ti → Tj) in the precedence graph.
4. For each case in S where Tj executes a write(X) after Ti executes a write(X), create an edge (Ti → Tj) in the precedence graph.
5. The schedule S is conflict serializable if and only if the precedence graph has no cycles.
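The five steps are a graph construction followed by cycle detection. A minimal sketch with hypothetical function names: it builds the edges of steps 2 to 4 from (transaction, 'r'/'w', item) operations, then runs Kahn's topological sort, which repeatedly removes nodes with no incoming edges. If some node is never removed the graph has a cycle (step 5); otherwise the removal order is an equivalent serial order. Schedules A, C and E of this section are written in that notation with the local computations dropped.

```python
def precedence_graph(schedule):
    nodes = list(dict.fromkeys(t for t, _, _ in schedule))      # step 1
    edges = set()
    for i, (ti, oi, qi) in enumerate(schedule):
        for tj, oj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "w" in (oi, oj):       # steps 2-4
                edges.add((ti, tj))
    return nodes, edges


def serial_order(schedule):
    """An equivalent serial order, or None if the graph has a cycle (step 5)."""
    nodes, edges = precedence_graph(schedule)
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for u, v in sorted(edges):
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return order if len(order) == len(nodes) else None


schedule_a = [("T1", "r", "X"), ("T1", "w", "X"), ("T1", "r", "Y"), ("T1", "w", "Y"),
              ("T2", "r", "X"), ("T2", "w", "X")]
schedule_c = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T1", "r", "Y"),
              ("T2", "w", "X"), ("T1", "w", "Y")]
schedule_e = [("T2", "r", "Z"), ("T2", "r", "Y"), ("T2", "w", "Y"), ("T3", "r", "Y"),
              ("T3", "r", "Z"), ("T1", "r", "X"), ("T1", "w", "X"), ("T3", "w", "Y"),
              ("T3", "w", "Z"), ("T2", "r", "X"), ("T1", "r", "Y"), ("T1", "w", "Y")]

if __name__ == "__main__":
    print(serial_order(schedule_a))   # ['T1', 'T2']
    print(serial_order(schedule_c))   # None (cycle T1 -> T2 -> T1)
    print(serial_order(schedule_e))   # None
```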
Precedence graph for Schedule A: T1 → T2 (on X: T1 reads and writes X before T2 reads and writes it). The graph has no cycle, so Schedule A is conflict serializable.

Precedence graph for Schedule B: T2 → T1 (on X). The graph has no cycle.

Precedence graph for Schedule C: T1 → T2 (T1 reads X before T2 writes it, and T1 writes X before T2 writes it) and T2 → T1 (T2 reads X before T1 writes it). A cycle exists between T1 and T2, so Schedule C is not conflict serializable.

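T1                T2                T3
                  Read(Z)
                  Read(Y)
                  Write(Y)
                                    Read(Y)
                                    Read(Z)
Read(X)
Write(X)
                                    Write(Y)
                                    Write(Z)
                  Read(X)
Read(Y)
Write(Y)

Schedule E

Precedence graph for Schedule E: edges T1 → T2 (on X), T2 → T1 (on Y), T2 → T3 (on Y and Z) and T3 → T1 (on Y).
Cycles: T1 → T2 → T1 (on X, then Y) and T1 → T2 → T3 → T1 (on X, then Y and Z, then Y).
Schedule E is therefore not conflict serializable.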


Lock-Based Protocols

One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item.
A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
• exclusive (X) mode. The data item can be both read and written. An X-lock is requested using the lock-X instruction.
• shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.

Lock-Compatibility Matrix

      S      X
S     True   False
X     False  False

• A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
• Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
• If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.

Example of a transaction performing locking:

T1: Lock-X(B)
    Read(B)
    B=B-50
    Write(B)
    Unlock(B)
    Lock-X(A)
    Read(A)
    A=A+50
    Write(A)
    Unlock(A)

Figure 16.2 Transaction T1

T2: Lock-S(A)
    Read(A)
    Unlock(A)
    Lock-S(B)
    Read(B)
    Unlock(B)
    Display(A+B)

Figure 16.3 Transaction T2

Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.

T1                T2                Concurrency-Control Manager
Lock-X(B)
                                    Grant-X(B,T1)
Read(B)
B=B-50
Write(B)
Unlock(B)
                  Lock-S(A)
                                    Grant-S(A,T2)
                  Read(A)
                  Unlock(A)
                  Lock-S(B)
                                    Grant-S(B,T2)
                  Read(B)
                  Unlock(B)
                  Display(A+B)
Lock-X(A)
                                    Grant-X(A,T1)
Read(A)
A=A+50
Write(A)
Unlock(A)

Figure 16.4 Schedule 1

T3: Lock-X(B)
    Read(B)
    B=B-50
    Write(B)
    Lock-X(A)
    Read(A)
    A=A+50
    Write(A)
    Unlock(B)
    Unlock(A)

Figure 16.5 Transaction T3

T4: Lock-S(A)
    Read(A)
    Lock-S(B)
    Read(B)
    Display(A+B)
    Unlock(A)
    Unlock(B)

Figure 16.6 Transaction T4

Pitfall of Lock-Based Protocols

T3                T4
Lock-X(B)
Read(B)
B=B-50
Write(B)
                  Lock-S(A)
                  Read(A)
                  Lock-S(B)
Lock-X(A)

Figure 16.7 Schedule 2 (deadlock)

Neither T3 nor T4 can make progress: executing Lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing Lock-X(A) causes T3 to wait for T4 to release its lock on A.
Such a situation is called a deadlock.
o To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.

The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.

Starvation is also possible if the concurrency-control manager is badly designed. For example:
o A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
o The same transaction is repeatedly rolled back due to deadlocks.
The concurrency-control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol

This is a protocol which ensures conflict-serializable schedules.

Phase 1: Growing Phase
o transaction may obtain locks
o transaction may not release locks
Phase 2: Shrinking Phase
o transaction may release locks
o transaction may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock).
T1 and T2 (Figures 16.2 and 16.3) are not two-phase.
T3 and T4 (Figures 16.5 and 16.6) are two-phase.
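Whether a single transaction obeys two-phase locking depends only on the order of its own lock instructions. A sketch with hypothetical function names: steps are ('lock', item), ('upgrade', item), ('downgrade', item) or ('unlock', item); lock and upgrade are treated as growing-phase steps and unlock and downgrade as shrinking-phase steps, following the lock-conversion variant of the protocol, and the lock point is the position of the last growing-phase step.

```python
GROWING = {"lock", "upgrade"}
SHRINKING = {"unlock", "downgrade"}


def is_two_phase(steps) -> bool:
    """True if no growing-phase step follows a shrinking-phase step."""
    shrinking = False
    for kind, _item in steps:
        if kind in SHRINKING:
            shrinking = True
        elif kind in GROWING and shrinking:
            return False
    return True


def lock_point(steps) -> int:
    """Index of the transaction's last growing-phase step (its lock point)."""
    return max(i for i, (kind, _item) in enumerate(steps) if kind in GROWING)


# Lock/unlock skeletons of transactions T1 (Figure 16.2) and T3 (Figure 16.5).
t1 = [("lock", "B"), ("unlock", "B"), ("lock", "A"), ("unlock", "A")]
t3 = [("lock", "B"), ("lock", "A"), ("unlock", "B"), ("unlock", "A")]

if __name__ == "__main__":
    print(is_two_phase(t1), is_two_phase(t3))  # False True
    print(lock_point(t3))                       # 1
```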

T5                T6                T7
Lock-X(A)
Read(A)
Lock-S(B)
Read(B)
Write(A)
Unlock(A)
                  Lock-X(A)
                  Read(A)
                  Write(A)
                  Unlock(A)
                                    Lock-S(A)
                                    Read(A)

Figure 16.8 Partial schedule under two-phase locking

Two-phase locking does not ensure freedom from deadlocks.

Cascading rollback is possible under two-phase locking (in Figure 16.8, if T5 fails after T7 reads A, both T6 and T7 must be rolled back). To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.

Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.

There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
Lock Conversions
Two-phase locking with lock conversions:
– First Phase:
o can acquire a lock-S on item
o can acquire a lock-X on item
o can convert a lock-S to a lock-X (upgrade)
– Second Phase:
o can release a lock-S
o can release a lock-X
o can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability, but it still relies on the programmer to insert the
various locking instructions.

T8: Read(a1)
    Read(a2)
    Read(a3)
    ….
    ….
    Read(an)
    Write(a1)

T9: Read(a1)
    Read(a2)
    Display(a1+a2)

T8                T9
Lock-S(a1)
                  Lock-S(a1)
Lock-S(a2)
                  Lock-S(a2)
Lock-S(a3)
Lock-S(a4)
                  Unlock(a1)
                  Unlock(a2)
Lock-S(an)
Upgrade(a1)

Figure 16.9 Incomplete schedule with a lock conversion

Implementation of Locking

A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.
The lock manager replies to a lock request by sending a lock-grant message (or a message asking the transaction to roll back, in case of a deadlock).
The requesting transaction waits until its request is answered.
The lock manager maintains a data structure called a lock table to record granted locks and pending requests.
The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked.

Figure: Lock table
Black rectangles indicate granted locks, white ones indicate waiting requests.
The lock table also records the type of lock granted or requested.
A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks.
Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted.
If a transaction aborts, all waiting or granted requests of the transaction are deleted.
o The lock manager may keep a list of locks held by each transaction, to implement this efficiently.
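The lock-table rules can be sketched as a Python dictionary (a hash table) from data item to a FIFO queue of requests. This is an illustration under simplifying assumptions: only S and X modes, no deadlock detection, no messages between processes, and "compatible with all earlier locks" is read as "compatible with every earlier request in the queue, all of which have been granted", so a new request cannot overtake a waiting one. Class and method names are hypothetical.

```python
from collections import defaultdict

# Lock-compatibility matrix for S and X (True = may be held together).
COMPATIBLE = {("S", "S"): True, ("S", "X"): False, ("X", "S"): False, ("X", "X"): False}


class LockTable:
    """Item -> FIFO queue of [transaction, mode, granted] requests."""

    def __init__(self) -> None:
        self.queues = defaultdict(list)

    def request(self, txn, item, mode) -> bool:
        """Add the request at the end of the queue; True if granted at once."""
        entry = [txn, mode, False]
        self.queues[item].append(entry)
        self._grant_waiting(item)
        return entry[2]

    def unlock(self, txn, item) -> None:
        """Delete txn's requests on item, then re-check the waiting ones."""
        self.queues[item] = [e for e in self.queues[item] if e[0] != txn]
        self._grant_waiting(item)

    def abort(self, txn) -> None:
        """Delete all granted or waiting requests of an aborting transaction."""
        for item in list(self.queues):
            self.unlock(txn, item)

    def granted(self, item):
        return [(t, m) for t, m, g in self.queues.get(item, []) if g]

    def _grant_waiting(self, item) -> None:
        queue = self.queues[item]
        for k, entry in enumerate(queue):
            if entry[2]:
                continue
            # Grant only if every earlier request is granted and compatible;
            # otherwise stop, so later requests cannot overtake this one.
            if all(e[2] and COMPATIBLE[(e[1], entry[1])] for e in queue[:k]):
                entry[2] = True
            else:
                break


if __name__ == "__main__":
    lt = LockTable()
    print(lt.request("T1", "A", "S"), lt.request("T2", "A", "S"))  # True True
    print(lt.request("T3", "A", "X"))   # False: waits for both S locks
    print(lt.request("T4", "A", "S"))   # False: queued behind T3's X request
    lt.unlock("T1", "A")
    lt.unlock("T2", "A")
    print(lt.granted("A"))              # [('T3', 'X')]
```

Queuing T4 behind T3 even though T4's S lock is compatible with the S locks already held is what prevents the starvation scenario described earlier, at the cost of some concurrency.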


Graph-Based Protocols (Tree Protocols)

Graph-based protocols are an alternative to two-phase locking.
Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.
o If di → dj, then any transaction accessing both di and dj must access di before accessing dj.
o This implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
The tree protocol is a simple kind of graph protocol.

Tree Protocol

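Only exclusive locks are allowed.
The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.
Data items may be unlocked at any time.
A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.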
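The tree-protocol rules can be checked one transaction at a time. In this sketch (hypothetical names) the database tree is a child-to-parent dictionary; the fragment used, with B the parent of D and E and D the parent of G and H, is an assumption chosen to be consistent with the schedule of Figure 16.12.

```python
def follows_tree_protocol(steps, parent) -> bool:
    """steps: ('lock', item) / ('unlock', item) of ONE transaction (exclusive locks).
    parent: child -> parent map of the database tree."""
    held, ever_locked = set(), set()
    for kind, item in steps:
        if kind == "lock":
            if item in ever_locked:
                return False          # an unlocked item may not be relocked
            if ever_locked and parent.get(item) not in held:
                return False          # after the first lock, the parent must be held
            held.add(item)
            ever_locked.add(item)
        else:
            held.discard(item)        # unlocking is allowed at any time
    return True


# Assumed fragment of the database tree: B -> D, B -> E, D -> G, D -> H.
PARENT = {"D": "B", "E": "B", "G": "D", "H": "D"}

T10 = [("lock", "B"), ("lock", "E"), ("lock", "D"), ("unlock", "B"),
       ("unlock", "E"), ("lock", "G"), ("unlock", "D"), ("unlock", "G")]
T11 = [("lock", "D"), ("lock", "H"), ("unlock", "D"), ("unlock", "H")]

if __name__ == "__main__":
    print(follows_tree_protocol(T10, PARENT), follows_tree_protocol(T11, PARENT))  # True True
    print(follows_tree_protocol([("lock", "B"), ("unlock", "B"), ("lock", "D")], PARENT))  # False
```

Note that T10 is not two-phase (it unlocks B before locking G), yet it is legal under the tree protocol, which illustrates that each protocol admits schedules the other does not.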

The tree protocol ensures conflict serializability as well as freedom from deadlock.
Unlocking may occur earlier in the tree-locking protocol than in the two-phase
locking protocol.
o shorter waiting times, and increase in concurrency
o protocol is deadlock-free, no rollbacks are required
o the abort of a transaction can still lead to cascading rollbacks.
(this correction has to be made in the book also.)
However, in the tree-locking protocol, a transaction may have to lock data items that
it does not access.
o increased locking overhead, and additional waiting time
o potential decrease in concurrency
Schedules not possible under two-phase locking are possible under tree protocol, and
vice versa.


T10          T11          T12          T13
Lock-X(B)
             Lock-X(D)
             Lock-X(H)
             Unlock(D)
Lock-X(E)
Lock-X(D)
Unlock(B)
Unlock(E)
                          Lock-X(B)
                          Lock-X(E)
             Unlock(H)
Lock-X(G)
Unlock(D)
                                       Lock-X(D)
                                       Lock-X(H)
                                       Unlock(D)
                                       Unlock(H)
                          Unlock(E)
                          Unlock(B)
Unlock(G)

Figure 16.12 Serializable schedule under the tree protocol

Timestamp Based Protocols

Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the timestamps determine the serializability order.

In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
o W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully.
o R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.

The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
1. Suppose a transaction Ti issues a read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).

2. Suppose that transaction Ti issues write(Q).

a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.

b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.

c. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).

T14: Read(B)
     Read(A)
     Display(A+B)

T15: Read(B)
     B=B-50
     Write(B)
     Read(A)
     A=A+50
     Write(A)
     Display(A+B)

T14               T15
Read(B)
                  Read(B)
                  B=B-50
                  Write(B)
Read(A)
                  Read(A)
Display(A+B)
                  A=A+50
                  Write(A)
                  Display(A+B)

Figure 16.13 Schedule 3
In Schedule 3 of Figure 16.13, TS(T14) < TS(T15), and the schedule is possible under the timestamp protocol.

Correctness of Timestamp-Ordering Protocol

The timestamp-ordering protocol guarantees serializability.
The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
But the schedule may not be cascade-free, and may not even be recoverable.

Recoverability and Cascade Freedom

Problem with the timestamp-ordering protocol:
o Suppose Ti aborts, but Tj has read a data item written by Ti.
o Then Tj must abort; if Tj had been allowed to commit earlier, the schedule is not recoverable.
o Further, any transaction that has read a data item written by Tj must abort.
o This can lead to cascading rollback, that is, a chain of rollbacks.
Solution:
o A transaction is structured such that its writes are all performed at the end of its processing.
o All writes of a transaction form an atomic action; no transaction may execute while a transaction is being written.
o A transaction that aborts is restarted with a new timestamp.

Example

Consider the following schedule consisting of three transactions T1, T2, and T3:
S: r3(Z), w3(Z), r1(X), r2(Y), w2(Y), w1(X), r1(Y), r3(X), c1, c2, c3
Assume that the transactions have timestamp values:
TS(T1) = 2, TS(T2) = 1, TS(T3) = 3

Question:
1. Rewrite S using the timestamp-ordering algorithm.
2. Rewrite S using the timestamp-ordering algorithm with the timestamps TS(T1) = 3, TS(T2) = 2, TS(T3) = 1. Show the point where some transaction gets aborted.
3. Rewrite 2) with the multiversion timestamp-ordering algorithm. The initial values of the items are X0, Y0, Z0.

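1. S under the timestamp-ordering algorithm (equivalent serial schedule T2, T1, T3)
TS(T1) = 2, TS(T2) = 1, TS(T3) = 3

Step   Operation      Effect
1      T3: read(Z)    R-timestamp(Z) = 3
2      T3: write(Z)   W-timestamp(Z) = 3
3      T1: read(X)    R-timestamp(X) = 2
4      T2: read(Y)    R-timestamp(Y) = 1
5      T2: write(Y)   W-timestamp(Y) = 1
6      T1: write(X)   W-timestamp(X) = 2
7      T1: read(Y)    R-timestamp(Y) = 2
8      T3: read(X)    R-timestamp(X) = 3
No operation is rejected.

2. S with timestamps 3, 2, 1 for T1, T2 and T3
TS(T1) = 3, TS(T2) = 2, TS(T3) = 1

Step   Operation      Effect
1      T3: read(Z)    R-timestamp(Z) = 1
2      T3: write(Z)   W-timestamp(Z) = 1
3      T1: read(X)    R-timestamp(X) = 3
4      T2: read(Y)    R-timestamp(Y) = 2
5      T2: write(Y)   W-timestamp(Y) = 2
6      T1: write(X)   W-timestamp(X) = 3
7      T1: read(Y)    R-timestamp(Y) = 3
8      T3: read(X)    rejected: TS(T3) = 1 < W-timestamp(X) = 3, so T3 tries to read a value written by a later transaction; T3 is rolled back (failure)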
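The read and write rules can be replayed mechanically. A sketch, assuming all R- and W-timestamps start at 0 and all transaction timestamps are positive; class and function names are hypothetical. Replaying S (commits omitted) reproduces both answers above: with TS(T1) = 2, TS(T2) = 1, TS(T3) = 3 nothing is rejected, and with TS(T1) = 3, TS(T2) = 2, TS(T3) = 1 the eighth operation, T3's read(X), is rejected.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation; its transaction is rolled back."""


class TimestampOrdering:
    def __init__(self, ts):
        self.ts = ts        # transaction -> timestamp
        self.r_ts = {}      # item -> R-timestamp(Q), 0 if never read
        self.w_ts = {}      # item -> W-timestamp(Q), 0 if never written

    def read(self, txn, q):
        t = self.ts[txn]
        if t < self.w_ts.get(q, 0):                                   # rule 1a
            raise Rollback(f"{txn}: read({q}) rejected")
        self.r_ts[q] = max(self.r_ts.get(q, 0), t)                    # rule 1b

    def write(self, txn, q):
        t = self.ts[txn]
        if t < self.r_ts.get(q, 0) or t < self.w_ts.get(q, 0):        # rules 2a, 2b
            raise Rollback(f"{txn}: write({q}) rejected")
        self.w_ts[q] = t                                              # rule 2c


def first_rejection(schedule, ts):
    """Replay (txn, 'r'/'w', item) steps; index of the first rejected step, or None."""
    proto = TimestampOrdering(ts)
    for i, (txn, op, q) in enumerate(schedule):
        try:
            if op == "r":
                proto.read(txn, q)
            else:
                proto.write(txn, q)
        except Rollback:
            return i
    return None


S = [("T3", "r", "Z"), ("T3", "w", "Z"), ("T1", "r", "X"), ("T2", "r", "Y"),
     ("T2", "w", "Y"), ("T1", "w", "X"), ("T1", "r", "Y"), ("T3", "r", "X")]

if __name__ == "__main__":
    print(first_rejection(S, {"T1": 2, "T2": 1, "T3": 3}))  # None
    print(first_rejection(S, {"T1": 3, "T2": 2, "T3": 1}))  # 7, i.e. T3's read(X)
```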

Validation-Based Protocol

Execution of transaction Ti is done in three phases.
1. Read and execution phase: Transaction Ti writes only to temporary local variables.
2. Validation phase: Transaction Ti performs a "validation test" to determine if local variables can be written without violating serializability.
3. Write phase: If Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in that order.
This is also called optimistic concurrency control, since a transaction executes fully in the hope that all will go well during validation.
Each transaction Ti has 3 timestamps:
• Start(Ti): the time when Ti started its execution
• Validation(Ti): the time when Ti entered its validation phase
• Finish(Ti): the time when Ti finished its write phase

The serializability order is determined by the timestamp given at validation time, to increase concurrency. Thus TS(Ti) is given the value of Validation(Ti).
This protocol is useful and gives a greater degree of concurrency if the probability of conflicts is low. That is because the serializability order is not pre-decided, and relatively few transactions will have to be rolled back.

Validation Test for Transaction Tj

If for all Ti with TS(Ti) < TS(Tj) either one of the following conditions holds:
o finish(Ti) < start(Tj)
o start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti does not intersect with the set of data items read by Tj,
then validation succeeds and Tj can be committed. Otherwise, validation fails and Tj is aborted.
Justification: Either the first condition is satisfied, and there is no overlapped execution, or the second condition is satisfied and
1. the writes of Tj do not affect reads of Ti since they occur after Ti has finished its reads.
2. the writes of Ti do not affect reads of Tj since Tj does not read any item written by Ti.
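The validation test can be written directly from the two conditions. A sketch with hypothetical names: each transaction records its start, validation and finish times and its read and write sets, and TS is taken to be the validation time, as stated above. The usage lines number the rows of Schedule 5 (Figure 16.15) from 1 to 11 and use those row numbers as times; that numbering, and T14 finishing at row 8, are assumptions for illustration.

```python
from dataclasses import dataclass, field

INF = float("inf")


@dataclass
class Txn:
    name: str
    start: float
    validation: float = INF          # INF: has not reached validation yet
    finish: float = INF              # INF: has not finished its write phase
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(tj: Txn, others) -> bool:
    """Validation test for Tj against every Ti with TS(Ti) < TS(Tj)."""
    for ti in others:
        if ti is tj or ti.validation >= tj.validation:
            continue                                   # not older than Tj
        if ti.finish < tj.start:
            continue                                   # condition 1
        if tj.start < ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue                                   # condition 2
        return False
    return True


if __name__ == "__main__":
    t14 = Txn("T14", start=1, validation=7, finish=8, read_set={"A", "B"})
    t15 = Txn("T15", start=2, validation=9, finish=11,
              read_set={"A", "B"}, write_set={"A", "B"})
    print(validate(t15, [t14]))   # True: T14 wrote nothing that T15 read
```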

T14               T15
Read(B)
                  Read(B)
                  B=B-50
                  Read(A)
                  A=A+50
Read(A)
<Validate>
Display(A+B)
                  <Validate>
                  Write(B)
                  Write(A)

Figure 16.15 Schedule 5, a schedule produced by using validation

Multiple Granularity
If a transaction Ti needs to access the entire database and a locking protocol is used, then Ti must lock each item in the database. This is time-consuming. On the other hand, if transaction Tj needs to access only a few data items, it should not be required to lock the entire database, since otherwise concurrency is lost.

Solution

Multiple Granularity

• Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones.
• This can be represented graphically as a tree (but don't confuse it with the tree-locking protocol).
• When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
• Granularity of locking (level in tree where locking is done):
o fine granularity (lower in tree): high concurrency, high locking overhead
o coarse granularity (higher in tree): low locking overhead, low concurrency


Figure 16.16
The highest level in the example hierarchy is the entire database.
The levels below are of type area, file and record in that order.
Intention Lock Modes
• In addition to the S and X lock modes, there are three additional lock modes with multiple granularity:
o intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.
o intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
o shared and intention-exclusive (SIX): the subtree rooted at that node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
• Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendant nodes.

Suppose that transaction Tj wishes to lock record rb6 of file Fb. Since Ti has locked Fb explicitly, it follows that rb6 is also locked (implicitly). But when Tj issues a lock request for rb6, rb6 is not explicitly locked! How does the system determine whether Tj can lock rb6? Tj must traverse the tree from the root to record rb6. If any node on that path is locked in an incompatible mode, then Tj must be delayed.
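A minimal sketch of this root-to-node check, using only the S and X modes. The node names on the path (for example an area A1 above Fb) and the layout of the explicit_locks table are hypothetical.

```python
# With only S and X modes, the one compatible pair is (S, S).
COMPATIBLE = {("S", "S")}


def can_lock(path, mode, txn, explicit_locks):
    """Return True if txn may lock the last node of path in mode.

    path: node names from the root down to the requested node.
    explicit_locks: dict mapping node -> list of (holder, mode) pairs.
    An explicit lock on any ancestor implicitly covers the requested
    node, so every node on the root-to-node path is examined.
    """
    for node in path:
        for holder, held_mode in explicit_locks.get(node, []):
            if holder != txn and (held_mode, mode) not in COMPATIBLE:
                return False  # an incompatible lock covers the node
    return True
```

The check only looks upward. Discovering locks held below the requested node (for example, before locking Fb itself) would require searching the whole subtree, which is the cost that intention locks avoid.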


The compatibility matrix for all lock modes is:

        IS      IX      S       SIX     X
IS      True    True    True    True    False
IX      True    True    False   False   False
S       True    False   True    False   False
SIX     True    False   False   False   False
X       False   False   False   False   False
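The matrix can be encoded directly as a lookup table. This is a minimal sketch; the names MODES, MATRIX, compatible and can_grant are illustrative, not part of any particular DBMS.

```python
MODES = ("IS", "IX", "S", "SIX", "X")

# One row per requested mode; columns follow MODES, as in the matrix above.
MATRIX = {
    "IS":  (True,  True,  True,  True,  False),
    "IX":  (True,  True,  False, False, False),
    "S":   (True,  False, True,  False, False),
    "SIX": (True,  False, False, False, False),
    "X":   (False, False, False, False, False),
}


def compatible(requested, held):
    """True if a lock in mode `requested` can coexist with one in mode `held`."""
    return MATRIX[requested][MODES.index(held)]


def can_grant(requested, modes_held_by_others):
    """A request is granted only if it is compatible with every other holder."""
    return all(compatible(requested, held) for held in modes_held_by_others)
```

Because the matrix is symmetric, the order of the two arguments to compatible does not matter.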
Multiple Granularity Locking Scheme
Transaction Ti can lock a node Q using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.

Consider the tree of Figure 16.16.

• Suppose that transaction T18 reads record ra2 in file Fa. Then T18 needs to lock the database, area A1, and Fa in IS mode, and finally to lock ra2 in S mode.
• Suppose that transaction T19 modifies record ra9 in file Fa. Then T19 needs to lock the database, area A1, and Fa in IX mode, and finally to lock ra9 in X mode.
• Suppose that transaction T20 reads all the records in file Fa. Then T20 needs to lock the database and area A1 in IS mode, and finally to lock Fa in S mode.
• Suppose that transaction T21 reads the entire database. It can do so after locking the database in S mode.
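The four examples follow mechanically from rules 3, 4 and 6. The sketch below assumes each ancestor gets the weakest intention mode the rules accept (IS under a shared request, IX under an exclusive one); lock_plan and unlock_plan are illustrative names.

```python
def lock_plan(path, mode):
    """(node, mode) locks to request, root first, so path[-1] ends up in `mode`.

    Rule 3: S or IS on a node needs IS (or IX) on its parent.
    Rule 4: X, SIX or IX on a node needs IX (or SIX) on its parent.
    The weakest acceptable intention mode is chosen for the ancestors.
    """
    if mode in ("S", "IS"):
        intention = "IS"
    elif mode in ("X", "SIX", "IX"):
        intention = "IX"
    else:
        raise ValueError(f"unknown lock mode: {mode}")
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


def unlock_plan(path, mode):
    """Rule 6: locks are released in leaf-to-root order."""
    return [node for node, _ in reversed(lock_plan(path, mode))]
```

For T18, lock_plan(["DB", "A1", "Fa", "ra2"], "S") yields IS locks on the database, A1 and Fa, followed by an S lock on ra2.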

Multiversion Timestamp Ordering


Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
o Content -- the value of version Qk.
o W-timestamp(Qk) -- timestamp of the transaction that created (wrote)
version Qk
o R-timestamp(Qk) -- largest timestamp of a transaction that successfully
read version Qk

When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti).
The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).

The multiversion timestamp scheme presented next ensures serializability.


Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).

1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
2. If transaction Ti issues a write(Q), and if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; otherwise, a new version of Q is created.
Reads always succeed. A write by Ti is rejected if some other transaction Tj that (in the serialization order defined by the timestamp values) should read Ti's write has already read a version created by a transaction older than Ti.
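The two rules can be sketched for a single data item as follows. This is a minimal illustration under stated assumptions: the class name MVTOItem is hypothetical, each version is stored as [content, W-timestamp, R-timestamp], and the initial version is assumed to have been written by a transaction with timestamp 0.

```python
class MVTOItem:
    """All versions of one data item Q under multiversion timestamp ordering."""

    def __init__(self, initial_value):
        # Each version is [content, w_ts, r_ts]; the initial version is
        # assumed to have been written by a transaction with timestamp 0.
        self.versions = [[initial_value, 0, 0]]

    def _qk(self, ts):
        """The version with the largest W-timestamp <= ts."""
        eligible = [v for v in self.versions if v[1] <= ts]
        return max(eligible, key=lambda v: v[1])

    def read(self, ts):
        """Rule 1: a read always succeeds and returns the content of Qk."""
        qk = self._qk(ts)
        qk[2] = max(qk[2], ts)  # raise R-timestamp(Qk) if TS(Ti) is larger
        return qk[0]

    def write(self, ts, value):
        """Rule 2: return False if Ti must be rolled back."""
        qk = self._qk(ts)
        if ts < qk[2]:
            return False  # a younger transaction has already read Qk
        if ts == qk[1]:
            qk[0] = value  # Ti overwrites the version it created
        else:
            self.versions.append([value, ts, ts])  # create a new version
        return True
```

Replaying the X part of the example that follows: T1 (TS=3) reads X0 and writes X1, T3 (TS=1) still reads X0, but a later write(X) by T3 is rejected because R-timestamp(X0) is 3.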


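Multiversion Timestamp Ordering: Example

Transactions: T1 (TS=3), T2 (TS=2), T3 (TS=1). Operations are listed in execution order, with the version timestamps after each step.

T3: Read(Z0)     ReadTS(Z0)=1, WriteTS(Z0)=0
T3: Write(Z1)    ReadTS(Z1)=1, WriteTS(Z1)=1
T1: Read(X0)     ReadTS(X0)=3, WriteTS(X0)=0
T2: Read(Y0)     ReadTS(Y0)=2, WriteTS(Y0)=0
T2: Write(Y1)    ReadTS(Y1)=2, WriteTS(Y1)=2
T1: Write(X1)    ReadTS(X1)=3, WriteTS(X1)=3
T1: Read(Y1)     ReadTS(Y1)=3, WriteTS(Y1)=2
T3: Read(X0)     ReadTS(X0)=3 (unchanged, since TS(T3)=1 < 3), WriteTS(X0)=0

Reads never fail. However, if T3 now tries to write(X), the write fails and T3 is rolled back: X0 is the version with the largest WriteTS not exceeding TS(T3)=1, and ReadTS(X0)=3 > 1.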

Deadlock Handling


T3               T4
Lock-X(B)
Read(B)
B=B-50
Write(B)
                 Lock-S(A)
                 Read(A)
                 Lock-S(B)
Lock-X(A)
Figure 16.7 Schedule 2

A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set. In Schedule 2, for example, T4 waits for T3 to release its exclusive lock on B, while T3 waits for T4 to release its shared lock on A, so neither can proceed.
Deadlock prevention protocols ensure that the system will never enter a deadlock state. Some prevention strategies:
o Require that each transaction lock all its data items before it begins execution (predeclaration).
o Impose a partial ordering on all data items and require that a transaction lock data items only in the order specified by the partial order (graph-based protocol).

The following schemes use transaction timestamps solely for the sake of deadlock prevention.
wait-die scheme (non-preemptive)
o An older transaction may wait for a younger one to release a data item. Younger transactions never wait for older ones; they are rolled back instead.
o A transaction may die several times before acquiring a needed data item.
wound-wait scheme (preemptive)
o An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it. Younger transactions may wait for older ones.
o There may be fewer rollbacks than in the wait-die scheme.

In both the wait-die and wound-wait schemes, a rolled-back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
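The decision each scheme makes on a lock conflict fits in a single function. This sketch assumes a smaller timestamp means an older transaction; the function name and the returned strings are illustrative.

```python
def resolve_conflict(scheme, requester_ts, holder_ts):
    """Action when the requester wants a data item that the holder has locked.

    A smaller timestamp means an older transaction.
    """
    requester_is_older = requester_ts < holder_ts
    if scheme == "wait-die":
        # Non-preemptive: an older requester waits, a younger one dies.
        return "requester waits" if requester_is_older else "requester rolled back"
    if scheme == "wound-wait":
        # Preemptive: an older requester wounds the younger holder,
        # a younger requester waits.
        return "holder rolled back" if requester_is_older else "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")
```

In both schemes, only the younger transaction of the pair is ever rolled back. Combined with restarting it under its original timestamp, this is why starvation is avoided.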
Timeout-Based Schemes:
o A transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
o Thus deadlocks cannot persist: a deadlock, if one forms, is broken when one of the waits times out.
o Simple to implement, but starvation is possible. It is also difficult to determine a good value for the timeout interval.
Deadlock Detection


Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E), where:
o V is a set of vertices (all the transactions in the system);
o E is a set of edges; each element is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. The system must invoke a deadlock-detection algorithm periodically to look for cycles.

[Figures: wait-for graph without a cycle; wait-for graph with a cycle]
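A deadlock-detection pass can look for a cycle with a depth-first search. The sketch below is one standard way to do it, not necessarily the algorithm a given DBMS uses. It assumes the wait-for graph is a dict mapping each transaction to the set of transactions it waits for.

```python
def has_cycle(wait_for):
    """Return True if the wait-for graph contains a cycle (a deadlock).

    wait_for maps each transaction Ti to the set of transactions Tj
    it is waiting for, i.e. the edges Ti -> Tj.
    """
    nodes = set(wait_for)
    for targets in wait_for.values():
        nodes |= set(targets)
    state = {n: "unvisited" for n in nodes}

    def visit(n):
        state[n] = "on_path"  # n is on the current DFS path
        for m in wait_for.get(n, ()):
            if state[m] == "on_path":
                return True  # an edge back into the current path closes a cycle
            if state[m] == "unvisited" and visit(m):
                return True
        state[n] = "done"
        return False

    return any(state[n] == "unvisited" and visit(n) for n in nodes)
```

For Schedule 2, the graph {"T3": {"T4"}, "T4": {"T3"}} has the cycle T3 → T4 → T3, so the system is deadlocked.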

Deadlock Recovery
When a deadlock is detected:
o Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as the victim the transaction whose rollback will incur the minimum cost.
o Rollback: determine how far to roll back the transaction.
  - Total rollback: abort the transaction and then restart it.
  - It is more effective to roll back the transaction only as far as necessary to break the deadlock.
o Starvation happens if the same transaction is always chosen as the victim. Include the number of rollbacks in the cost factor to avoid starvation.
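Victim selection with a starvation guard might look like the sketch below. The cost figures and the penalty weight per previous rollback are hypothetical; real systems estimate cost from factors such as work done and locks held.

```python
def choose_victim(cycle, cost, rollbacks, penalty=10):
    """Pick the deadlock victim among the transactions in a cycle.

    cost[t]: estimated cost of rolling back t (a hypothetical metric).
    rollbacks[t]: how many times t has already been chosen as a victim.
    penalty: hypothetical extra cost per previous rollback, so that the
    same transaction is not picked forever (starvation).
    """
    return min(cycle, key=lambda t: cost[t] + penalty * rollbacks.get(t, 0))
```

With costs T3=5 and T4=8, T3 is chosen first. After T3 has been rolled back once, its effective cost becomes 15, so T4 is chosen next.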
Dirty read


A transaction T1 may read the update of a transaction T2 that has not yet committed. If T2 fails and is aborted, then T1 would have read a value that never existed in the committed database and is incorrect.
Non-repeatable read
A transaction T1 may read a given value from a table. If another transaction T2 later updates that value and T1 reads that value again, T1 will see a different value.
Phantoms
A transaction T1 may read a set of rows from a table based on some condition. Now suppose that a transaction T2 inserts into that table a new row that also satisfies the condition used by T1. If T1 is repeated, then T1 will see a phantom, a row that previously did not exist.

(b) What do you mean by commit and rollback of a transaction?


A transaction in SQL ends with one of:
a. Commit work: commits the current transaction and begins a new one.
b. Rollback work: causes the current transaction to abort.
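The effect of the two statements can be tried out with SQLite through Python's standard sqlite3 module. Here, Connection.commit() and Connection.rollback() end the current transaction. SQLite is used only as a convenient, self-contained engine; the account table and the balances are hypothetical.

```python
import sqlite3


def transfer_demo():
    """Roll back one transfer of 50 from B to A, then commit a second one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
    conn.commit()  # the initial balances are now committed

    def transfer():
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'B'")
        conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'A'")

    def balances():
        return dict(conn.execute("SELECT name, balance FROM account"))

    transfer()
    conn.rollback()  # rollback: both updates are discarded
    after_rollback = balances()

    transfer()
    conn.commit()  # commit: both updates become permanent
    after_commit = balances()

    conn.close()
    return after_rollback, after_commit
```

Calling transfer_demo() returns the unchanged balances after the rollback ({"A": 100, "B": 200}) and the transferred balances after the commit ({"A": 150, "B": 150}).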
Diagram
T1: Lock-X(B)
Read(B)
B=B-50
Write(B)
Unlock(B)
Lock-X(A)
Read(A)
A=A+50
Write(A)
Unlock(A)

Figure 16.2 Transaction T1


T2: Lock-S(A)
Read(A)
Unlock(A)
Lock-S(B)
Read(B)
Unlock(B)
Display(A+B)
Figure 16.3 Transaction T2
T1                  T2                  Concurrency-Control Manager
Lock-X(B)
                                        Grant-X(B,T1)
Read(B)
B=B-50
Write(B)
Unlock(B)
                    Lock-S(A)
                                        Grant-S(A,T2)
                    Read(A)
                    Unlock(A)
                    Lock-S(B)
                                        Grant-S(B,T2)
                    Read(B)
                    Unlock(B)
                    Display(A+B)
Lock-X(A)
                                        Grant-X(A,T1)
Read(A)
A=A+50
Write(A)
Unlock(A)
Figure 16.4 Schedule 1
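Replaying Schedule 1 on concrete values shows why T1's early Unlock(B) is harmful. The starting balances A=100 and B=200 are assumptions chosen for illustration.

```python
def replay_schedule_1(a=100, b=200):
    """Replay Schedule 1; return (value T2 displays, final A+B)."""
    db = {"A": a, "B": b}
    # T1: Read(B); B=B-50; Write(B); Unlock(B), released before A is updated
    db["B"] = db["B"] - 50
    # T2: reads A and B between T1's two writes, then Display(A+B)
    shown = db["A"] + db["B"]
    # T1: Lock-X(A); Read(A); A=A+50; Write(A)
    db["A"] = db["A"] + 50
    return shown, db["A"] + db["B"]
```

T2 displays 250, although every consistent state has A+B = 300. T2 has seen the database between T1's two updates.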

T3: Lock-X(B)
Read(B)
B=B-50
Write(B)
Lock-X(A)
Read(A)
A=A+50
Write(A)
Unlock(B)
Unlock(A)

Figure 16.5 Transaction T3

T4: Lock-S(A)
Read(A)
Lock-S(B)
Read(B)
Display(A+B)
Unlock(A)
Unlock(B)
Figure 16.6 Transaction T4
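T3 and T4 differ from T1 and T2 only in where the unlocks are placed. A small checker makes the two-phase condition concrete (no lock may be requested after the first unlock). It assumes a transaction is given as a list of operation strings in the notation of the figures.

```python
def is_two_phase(ops):
    """True if the transaction requests no lock after its first unlock."""
    shrinking = False  # becomes True at the first unlock (shrinking phase)
    for op in ops:
        name = op.lower()
        if name.startswith("unlock"):
            shrinking = True
        elif name.startswith("lock") and shrinking:
            return False  # a lock request in the shrinking phase
    return True
```

T1 fails the check because it requests Lock-X(A) after Unlock(B). T3 passes because it acquires both locks before releasing either.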

T3               T4
Lock-X(B)
Read(B)
B=B-50
Write(B)
                 Lock-S(A)
                 Read(A)
                 Lock-S(B)
Lock-X(A)
Figure 16.7 Schedule 2

T5               T6               T7
Lock-X(A)
Read(A)
Lock-S(B)
Read(B)
Write(A)
Unlock(A)
                 Lock-X(A)
                 Read(A)
                 Write(A)
                 Unlock(A)
                                  Lock-S(A)
                                  Read(A)
Figure 16.8 Partial schedule under two-phase locking

T8: Read(a1)
Read(a2)
Read(a3)
….
….
Read(an)
Write(a1)

T9: Read(a1)
Read(a2)
Display(a1+a2)


T8               T9
Lock-S(a1)
                 Lock-S(a1)
Lock-S(a2)
                 Lock-S(a2)
Lock-S(a3)
Lock-S(a4)
                 Unlock(a1)
                 Unlock(a2)
Lock-S(an)
Upgrade(a1)

Figure 16.9 Incomplete schedule with a lock conversion

T10              T11              T12              T13
Lock-X(B)
                 Lock-X(D)
                 Lock-X(H)
                 Unlock(D)
Lock-X(E)
Lock-X(D)
Unlock(B)
Unlock(E)
                                  Lock-X(B)
                                  Lock-X(E)
                 Unlock(H)
Lock-X(G)
Unlock(D)
                                                   Lock-X(D)
                                                   Lock-X(H)
                                                   Unlock(D)
                                                   Unlock(H)
                                  Unlock(E)
                                  Unlock(B)
Unlock(G)
Figure 16.12 Serializable schedule under the tree protocol

T14 : Read(B)
Read(A)
Display(A+B)


T15: Read(B)
B=B-50
Write(B)
Read(A)
A=A+50
Write(A)
Display(A+B)

T14              T15
Read(B)
                 Read(B)
                 B=B-50
                 Write(B)
Read(A)
                 Read(A)
Display(A+B)
                 A=A+50
                 Write(A)
                 Display(A+B)

Figure 16.13 Schedule 3

T16              T17
Read(Q)
                 Write(Q)
Write(Q)
Figure 16.14 Schedule 4

T14              T15
Read(B)
                 Read(B)
                 B=B-50
                 Read(A)
                 A=A+50
Read(A)
<Validate>
Display(A+B)
                 <Validate>
                 Write(B)
                 Write(A)

Figure 16.15 Schedule 5, a schedule produced by using validation

Q Consider the following two transactions:

T31:
Read(A)
Read(B)
If A=0 then B=B+1
Write(B)

T32:
Read(B)
Read(A)
If B=0 then A=A+1
Write(A)
Add lock and unlock instructions to transactions T31 and T32 so that they observe the two-phase locking protocol. Can the execution of these transactions result in a deadlock?
Solution
T31:
Lock-S(A)
Read(A)
Lock-X(B)
Read(B)
If A=0 then B=B+1
Write(B)
Unlock(A)
Unlock(B)
T32:
Lock-S(B)
Read(B)
Lock-X(A)
Read(A)
If B=0 then A=A+1
Write(A)
Unlock(B)
Unlock(A)

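Deadlock

Yes. Two-phase locking does not prevent deadlock, and the following execution of T31 and T32 deadlocks:

T31              T32
Lock-S(A)
                 Lock-S(B)
                 Read(B)
Read(A)
Lock-X(B)
                 Lock-X(A)

T31 waits for T32 to release its shared lock on B, while T32 waits for T31 to release its shared lock on A.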
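The deadlock can be reproduced by replaying the lock requests through a toy lock manager. The manager below is hypothetical and deliberately simplified: a request is granted only if it is compatible (S with S) with every lock that other transactions hold on the item; otherwise the requester blocks, and it issues no further requests.

```python
def replay_lock_requests(requests):
    """Replay (txn, mode, item) lock requests; return who waits for whom."""
    held = {}       # item -> {txn: mode}
    waits_for = {}  # blocked txn -> set of txns it waits for
    for txn, mode, item in requests:
        if txn in waits_for:
            continue  # a blocked transaction cannot issue requests
        others = {t: m for t, m in held.get(item, {}).items() if t != txn}
        if all(mode == "S" and m == "S" for m in others.values()):
            held.setdefault(item, {})[txn] = mode  # grant (or upgrade own lock)
        else:
            waits_for[txn] = set(others)  # block behind every other holder
    return waits_for
```

For the interleaving above, the result is {"T31": {"T32"}, "T32": {"T31"}}. Every transaction in the set waits for another one in the set, which is exactly a deadlock.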

Protecting against Crashes


In practice, several things might happen to prevent a transaction from completing.
1. The system could fail from a variety of hardware or software causes. In this case, all active transactions are prevented from completing, and it is even possible that some completed transactions must be cancelled because they read values written by transactions that have not yet completed.
2. A single transaction could be forced to stop before completion for a variety of reasons. If deadlock detection is done by the system, the transaction could be found to contribute to a deadlock and be selected for cancellation by the system. The transaction could also contain a bug, e.g. a division by zero.
3. In a concurrency-control system where transactions are allowed to run without locking items, a violation of the serializability condition may be detected and the offending transaction cancelled.
Backup copies
Data in the machine's memory cannot be presumed to survive, for example, a power outage. It is therefore essential that backup copies of the database be made periodically, at least once a day if possible, on magnetic devices such as tapes and disks.
The Journal
Journal (log) entries consist of:
o a unique identifier for the transaction causing the change
o the old value of the item
o the new value of the item
The need for both old and new values becomes evident when we consider that it may be necessary not only to redo transactions but also to undo them.
Committed Transaction
When dealing with transactions that may have to be redone or undone, it helps to think in terms of committed and uncommitted transactions.
We define the two-phase commit policy as follows:
1. a transaction cannot write into the database until it has committed;
2. a transaction cannot commit until it has recorded all its changes to items in the journal.
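Recovery from the journal can be sketched as one undo pass followed by one redo pass. This is a minimal illustration, assuming journal entries are (transaction, item, old value, new value) tuples in write order and that the set of committed transactions is known. Under the two-phase commit policy above, an uncommitted transaction never writes to the database, so only the redo pass does real work. The undo pass is included to show why old values are logged as well.

```python
def recover(database, journal, committed):
    """Rebuild a consistent database state from the journal after a crash.

    database: dict item -> value, as found after the crash.
    journal: list of (txn_id, item, old_value, new_value), oldest first.
    committed: set of txn_ids whose commit was recorded.
    """
    db = dict(database)
    # Undo uncommitted transactions, newest entry first, using old values.
    for txn, item, old, new in reversed(journal):
        if txn not in committed:
            db[item] = old
    # Redo committed transactions, oldest entry first, using new values.
    for txn, item, old, new in journal:
        if txn in committed:
            db[item] = new
    return db
```

For example, if committed T1 moved 50 from B to A but uncommitted T2's later write to A reached the disk, recovery removes T2's value and reinstates T1's, giving A=150 and B=150 from hypothetical starting balances A=100 and B=200.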


References

1. Korth, H. F. et al., Database System Concepts (McGraw-Hill: New Delhi)
2. http://codex.cs.yale.edu/avi/db-book/
3. Elmasri and Navathe, Fundamentals of Database Systems (Pearson: New Delhi)
4. Date, C. J., An Introduction to Database Systems


BIRLA INSTITUTE OF TECHNOLOGY

QUESTION BANK

Cycle Test I (Questions 1 to 20)

Q1 What are the main differences between a file-processing system and a database management system?
Q2 What do you mean by data independence? Explain the differences between physical and logical data independence.
Q3 What are the advantages and disadvantages of using a DBMS?
Q4 Briefly explain the different views of a database.
Q5 Explain the three-level schema architecture.
Q6 Define database management system, data model, schema, DDL, DML.
Q7 Explain the different DBMS languages.
Q8 Explain the component modules of a DBMS.
Q9 Explain the different types of data models.
Q10 Explain the main phases of database design.
Q11 What are the main functions of a database administrator (DBA)?
Q12 List the responsibilities of a database manager.
Q13 Briefly explain the different types of users of a database management system.
Q14 Explain the following terms:
a. Composite and simple attributes
b. Single-valued and multi-valued attributes
c. Derived attributes
Q15 Explain the following terms:
a. Primary key, candidate key and super key
b. Instance
c. Null value
Q16 Explain the different symbols used in an E-R diagram.
Q17 Explain the following:
a. Mapping constraints
b. Entity, attribute and relationship set
Q18 Explain the difference between weak and strong entities with the help of a diagram.

Q19 Explain the concepts of generalization and specialization with the help of an example.

Q20 Explain the concept of aggregation with the help of an example.


Cycle Test II (Questions 21 to 40)

Q21 How can database design be performed from an E-R diagram containing many-to-many relationships, weak-strong relationships, generalization and specialization?
Q22 Explain the design constraints on specialization and generalization.
Q23 Construct an ER diagram for a car insurance company with a set of customers, each of whom owns a number of cars. Each car has a number of recorded accidents associated with it.
Q24 Construct an ER diagram for a hospital with a set of patients and a set of medical doctors. With each patient, a log of the various tests conducted is also associated.
Q25 Explain what is meant by repetition of information and inability to represent information. Explain why each of these properties may indicate a bad relational database design.
Q26 Define relation. What are the five basic relational algebra operators?

Q27 What is relational algebra? Explain the different types of join operations with examples.
Q28 How can database design be performed from an E-R diagram containing many-to-many relationships, weak-strong relationships, generalization and specialization?
Q29 Explain the UNION, SET DIFFERENCE and SET INTERSECTION operators with suitable examples.
Q30 Consider the following relations and write the relational-algebra expressions equivalent to the following queries:
BRANCH (Branch_Name, Branch_City, Assets)
CUSTOMER (Customer_Name, Customer_Street, Customer_City)
ACCOUNT (Account_No., Branch_Name, Balance)
LOAN (Loan_No., Branch_Name, Amount)
BORROWER (Customer_Name, Loan_No.)
DEPOSITOR (Customer_Name, Account_No.)
a) Find the names of all customers who live in “MUSCAT”.
b) Find the names of all customers who have a loan at the “MUSCAT” branch.
c) Find the names of all customers, along with their loan numbers, who have a loan at the bank.
d) Find the names of all customers who have either a loan or an account at the bank.
e) Find the names of all branches with customers who have an account in the bank and who live in “MUSCAT”.

Q31. Suppose that we decompose the schema R = (A, B, C, D, E) into

(A, B, C)
(A, D, E)

Show that this decomposition is a lossless-join decomposition if the following set F of functional dependencies holds:
A → BC
CD → E
B → D
E → A

Q32 List all functional dependencies satisfied by the following relation:

A    B    C
a1   b1   c1
a1   b1   c2
a2   b1   c1
a2   b1   c2
Q33 Compute the closure of the following set F of functional dependencies for relation schema R = (A, B, C, D, E):
A → BC
CD → E
B → D
E → A
List the candidate keys for R.

Q34 How do you find the closure of an attribute set? Explain with the help of an example.
Q35 Explain functional dependencies with the help of an example.
Q36 Explain canonical cover with the help of an example.
Q37 Explain the concept of normalization briefly with the help of an example.
Q38. Discuss Armstrong's axioms. Give examples for each rule.
Q39 Explain Codd's rules.
Q40. Show that every BCNF schema is in 3NF, but a 3NF schema is not necessarily in BCNF.

Q41. Explain the following:
a) Levels of data security
b) Authorization and views
c) Transaction log
c) Transaction log
Q42 (a) Define integrity constraints. Explain the different types of integrity constraints with examples.

(b) Create the following relational database:

1) Sales_order

Column Name    Data type   Size   Attribute
Order_no       Varchar2    6      Primary key / first letter must start with ‘O’
Order_date     Date
Client_no      Varchar2    6
Dely_addr      Varchar2    25
Salesman_no    Varchar2    6
Dely_type      Char        1
Billed_yn      Char        1
Dely_date      date
Order_status   Varchar2    10     Values in (‘In Process’, ‘Fulfilled’, ‘Back order’, ‘Cancelled’)

2) Employees
Column Name    Data type   Size   Attribute
Emp_name       Varchar2    32     Primary key
Street         Varchar2    15
City           Varchar2    15

3) Works
Column Name    Data type   Size   Attribute
Emp_name       Varchar2    32     Primary key
Company_name   Varchar2    32
Salary         number

4) company
Column Name    Data type   Size   Attribute
Company_name   Varchar2    32     Primary key
City           Varchar2    15

5) manages
Column Name    Data type   Size   Attribute
Emp_name       Varchar2    32     Primary key
manager_name   Varchar2    32

Q43. What do you mean by application security? Explain data encryption and data decryption techniques in databases.
Q44. Explain the following:
a) Digital certificates
b) Triggers
c) Privileges with grant option

Q45. What is a statistical database? What are the techniques used to prevent users from using some methods for accessing private information?

Q46. List the ACID properties. Explain the usefulness of each.

Q47. Explain the following:
a) Transaction failure
b) Storage structure
c) Log-based recovery
d) Deferred database modification
e) Recovery with concurrent transactions
Q48. Explain shadow paging with the help of an example.
Q49. What do you mean by log-record buffering and database buffering? What is the role of the operating system in buffer management?
Q50. What do you mean by failure with loss of nonvolatile storage? Explain it.
Q51. Explain the following terms:
a) Logical undo logging
b) Transaction rollback
c) Checkpoints
d) Restart recovery
e) Recovery algorithm
Q52. Briefly describe the advanced SQL features, with examples.
Q53. Explain the need for decomposition in a bad relational database design.
Q54. What are the objectives of database design? How can database design be performed from an E-R diagram containing many-to-many relationships, weak-strong relationships, generalization and specialization?
Q55. Draw an E-R diagram of an online bookstore with suitable entity sets and relationships.
Q56. Consider the following employee database:
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
a. Find the names of all employees who work for First Bank Corporation.
b. Find the names and cities of residence of all employees who work for First Bank Corporation.
c. Find the names, streets, and cities of residence of all employees who work for First Bank Corporation and earn more than $10,000.
d. Find all employees who live in the city where the company they work for is located.
e. Find all employees who live in the same city and on the same street as their managers.
f. Find all employees in the database who do not work for First Bank Corporation.
g. Find all employees in the database who earn more than every employee of Small Bank Corporation.
h. Assume that the companies can be located in several cities. Find all companies located in every city in which Small Bank Corporation is located.
i. Find all employees who earn more than the average salary of employees who work in their companies.
j. Find the company that employs the most people.
k. Find the company that has the smallest payroll.
l. Find those companies that pay higher salaries, on average, than the average salary at First Bank Corporation.
m. Modify the database such that Jones now lives in Newtown.
n. Give all employees of First Bank Corporation a 10 percent raise.
o. Give all managers in the database a 10 percent raise.
p. Give all managers in the database a 10 percent raise, unless the resulting salary would be greater than $100,000; if it would be, give only a 3 percent raise.
q. Delete all employees of Small Bank Corporation.

Q57 What is a query execution plan? Explain it.

Q58 What is meant by heuristic optimization? Discuss the main heuristics that are applied during query optimization.
Q59 How does a query tree represent a relational algebra expression? What is meant by an execution of a query tree?
Q60 Discuss the rules for transformation of query trees.

Q61 List the cost functions for the select and join methods.

Q62 Let relations r1(A,B,C) and r2(C,D,E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block accesses required, using each of the following join strategies for r1 ⋈ r2:
a. Nested-loop join
b. Block nested-loop join
c. Merge join
d. Hash join
Q63 Consider the relations r1(A,B,C), r2(C,D,E), and r3(E,F), with primary keys A, C and E respectively. Assume that r1 has 1000 tuples, r2 has 1500 tuples, and r3 has 750 tuples. Estimate the size of r1 ⋈ r2 ⋈ r3.
Q64 Consider the relations r1(A,B,C), r2(C,D,E), and r3(E,F). Assume that there are no primary keys, except the entire schema. Let V(C,r1) be 900, V(C,r2) be 1100, V(E,r2) be 50, and V(E,r3) be 100. Assume that r1 has 1000 tuples, r2 has 1500 tuples, and r3 has 750 tuples. Estimate the size of r1 ⋈ r2 ⋈ r3.

Q65 Discuss the different algorithms for implementing each of the following relational operators, and the circumstances under which each algorithm can be used: project, union.
Q66 Discuss the different algorithms for implementing each of the following relational operators, and the circumstances under which each algorithm can be used: intersection, set difference, Cartesian product.


Files--Introduction and Terminology


Programs depend on data input in order to properly express the solution
to a problem. In the examples seen thus far in this text, programs have taken
their data from:

• within themselves, perhaps as constants (batch style), or
• dialogue with the user (interactive style).

These two styles, used alone, have in common that both the data
employed and the results produced do not survive the run of the program. When
the program is next run, the data must be obtained again (even if it is the same
data, or only slightly changed), and the results cannot be fed forward to become
the input for some other program. In order to work around these obstacles, a
means of storing data outside programs is required. Files serve not only this
purpose, but also provide a way of storing very large data collections, for which
individual entry to every program would be impractical. Indeed, it is often the
case in such instances that the data is the central theme to an entire symphony
of programs operating on it, and that no one of the programs in the collection is
nearly as important as the data itself.
A file resembles a book. Its structure (plot) is created in the mind of an
author and it must be written (encoded) on some medium. Once this has been
done, others can read it. However, in order to read it intelligibly, they must
follow the structure created by the author.
A file is a source or a sink for a collection of data.
Just as data must be structured or arranged in such a way as to represent
some real life problem, so also files must be structured (the plot, again) so as to
represent the data they are intended to store. There are as many ways to do this
as there are programmers, computers, operating systems, programs, and
problems. The definition of a file has been expressed in a broad and general
form for this very reason--the meaning must cover a lot of ground. In fact, by
this definition, the batched data within a program and the data input
interactively at a keyboard by the user of a program are both files--at least
conceptually.
At the highest and most abstract level, a book can be thought of in terms
of its plot, character and events. At a middle level, the book is perhaps
structured by named chapters. On a more detailed level, the book is a collection
of words on a page. That is, there are degrees of abstraction to the
concept book as there are for many other commonly used ideas. This
observation leads to the first classification of files, by the degree of their
abstraction:


A logical file is an abstract structuring of data storage as viewed by the
programmer and/or user of the program. This is the high-level, or conceptual,
view of a file.
A program file is a specific data collection as seen and manipulated by a
program. It is often (but not always) represented by a variable, perhaps of
type "file." At this middle level of abstraction a file can be regarded as residing
within the machine's memory, as existing only as long as the program that
employs it is active.
A physical file is a recording of a logical file. This usually takes the form of a
magnetic image on a disk or tape surface, in which form it exists independently
of any particular program. The details of this recording provide the lowest
level view of a file.
The distinction between the latter two levels of abstraction is not always
maintained, and the term physical file is used by some for both the memory
image and the external storage.
In order to write a useful program that employs files, some attention
needs to be paid to all three levels of abstraction. The contents of the file must
be decided on abstractly and conceptually; a program must be written to render
the abstractions into a solution; and the resulting data must actually be recorded
externally to the program so that it survives when the program terminates.
Alternatively, an existing file may have to be read by a program, and this
operation in turn can only be expressed if the original logical structure of the
physical file is known.
The relationships among the three views of a file are expressed in figure
8.6 below. Observe that if the broadest possible view of a file is used, all input
to and output from a program uses files.

Fortunately for programmers, the troublesome details of how files are to
be physically stored can be left to the operating system, whose function it is to
make such recordings, name them, keep track of them, ensure their integrity,
and deliver them back to a program upon demand. This observation might seem
to indicate that a simple and universal file handling module could serve as the
interface between programs and operating systems.
Unfortunately there are many different operating systems, and widely
differing views of what constitutes a simple program interface with any one of
them. Thus, there have been many attempts to provide universal file handling
interfaces, and these differ widely. Indeed, perhaps the most troublesome area
for both the designers of a computing notation and for those who program
using it is how to deal with input and output, as they are on the one hand
essential to any substantial programming activity, and on the other closely tied
to a particular system.
The problem for a language designer is the necessity to find common
ground ahead of time for all possible external devices and operating systems so
that I/O routines can be universally applicable. In Modula-2, this problem has
been partially avoided by placing such matters outside the purview of the
language proper, and by assuming instead that any device needing
communications links with a program will have these facilities supplied in a
particular implementation by appropriate library modules.
This results in the Modula-2 notation itself being small and versatile, but
causes somewhat more work for the programmer, who often had a large
number of library routines to keep straight--especially if using more than one
version--for then there was no guarantee that such libraries would correspond.
In spite of this deliberate lack of pre-specification by Wirth (he required
no particular I/O routines or modules, and only made a few suggestions of
modules he had found generally useful), much can still be said about such
functions. Although operating systems differ widely, there are many things that
they do have in common. Moreover, there are not many kinds of logical file in
common use, even though the recordings of such files may be very different. As
a result, many vendors of Modula-2 products produced very similar libraries for
I/O and to some extent, this tended to create a de facto or marketplace standard.
One section of this chapter is devoted to outlining the most common I/O and
file handling routines used by commercial vendors in the years when no official
standards existed.
Even the ubiquitous classical high level
modules InOut and RealInOut have many variations however, and lower level
modules often have more differences than similarities between
implementations. This was one of the major reasons for convening a working
group (WG13) of the International Organization for Standardization (ISO) in April of
1987 to produce a standard definition of both Modula-2 and its libraries. This
standard will be the focus for most of the rest of this chapter and will be used
subsequently when a sample solution happens to call for the use of files.
Before looking at the specifics of handling the program/logical file
communication, however, some additional attention needs to be paid to the
logical view of data storage.
