
Database Management System (EG2201CT)


Unit 1. Introduction [8 marks]

History
1950s and early 1960s:
• Data processing using magnetic tapes for storage.
• Tapes provided only sequential access.
• Punched cards for input.
Late 1960s and 1970s:
• Hard disks allowed direct access to data.
• Hierarchical and network data models in widespread use
✓ IBM’s DL/I (Data Language One)
✓ CODASYL’s DBTG (Data Base Task Group) model
• Ted Codd defines the relational data model.
✓ IBM Research develops System R prototype.
✓ UC Berkeley develops Ingres prototype.
• Entity Relationship Model for database design
1980s:
• Research relational prototypes evolve into commercial systems.
✓ IBM’s DB2 is one of the earliest commercial DBMS products based on the relational model.
✓ Oracle and Microsoft SQL Server are the most prominent commercial
DBMS products based on the relational model.
• SQL becomes an industry standard.
• Parallel and distributed database systems
• Object-oriented database systems (OODBMS)
• Object-relational database systems allow both relational and object views of
data in the same database.
Late 1990s:
• Large decision support and data mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce
Early 2000s:
• XML and XQuery standards
• Automated database administration

DBMS [2022] DCOM, CTEVT Compiled by Er Rupesh Shrestha


Later 2000s:
• Web databases (semi-structured data, XML, complex data types)
• Cloud computing
• Giant data storage systems (Google BigTable, Yahoo PNUTS, Amazon Web Services, …)
• Advanced databases (mainly non-relational (e.g., graph-based, text-based) but also
advanced relational)

What is a Database?
A database is an organized collection of data, stored and accessed electronically.
Databases are used to store and manage large amounts of structured and unstructured
data, and they can be used to support a wide range of activities, including data storage,
data analysis, and data management.
There are many different types of databases, including relational databases, object-
oriented databases, and NoSQL databases, and they can be used in a variety of settings,
including business, scientific, and government organizations.

Database Applications
1. Universities: student information, teacher information, non-teaching staff information,
course information, section information, grade report information, and many more.
2. Banking: customer details, asset details, banking transactions, balance sheets, credit
card and debit card details, loans, fixed deposits, and much more.
3. Railway Reservation System: passenger name, mobile number, booking status,
reservation details, train schedule, employee information, account details, seating
arrangement, route & alternate route details, etc.
4. Social Media Sites: millions of accounts, which means they have a huge amount of data
that needs to be stored and maintained. Social media sites use databases to store
information about users, images, videos, chats, etc.
5. Library Management System: stores information like book name, issue date, author
name, book availability, book issuer name, book return details, etc.
6. E-commerce Websites: customer details, product details, dealer details, purchase
details, bank & card details, transactions details, invoice details, etc.
7. Medical: patient details, medicine details, practitioner details, surgeon details,
appointment details, doctor schedule, patient discharge details, payment detail, invoices,
and other medical records.
8. Accounting and Finance: accounting details, bank details, purchases of stocks, invoice
details, sales records, asset details, etc.
9. Industries: customer details, sales records, product lists, transactions, etc. All the
information is kept secure and maintained by the database.
10. Airline Reservation System: passenger name, passenger check-in, passenger
departure, flight schedule, number of flights, distance from source to destination,
reservation information, pilot details, accounting detail, route detail, etc.
11. Telecommunication: customer names, phone numbers, calling details, prepaid &
post-paid connection records, network usage, bill details, balance details, etc.
12. Manufacturing: product details, customer information, order details, purchase details,
payment info, worker's details, invoice, etc.
13. Human Resource Management: employee name, joining details, designation, salary
details, tax information, benefits & goodies details, etc.
14. Broadcasting: subscriber information, event recordings, event schedules, etc., all of
which need to be stored in a database.
15. Insurance: policy details, user details, buyer details, payment details, nominee details,
address details, etc.

Characteristics
Self-describing nature of a database system:
A database system is referred to as self-describing because it not only contains the
database itself, but also metadata which defines and describes the data and

relationships between tables in the database. This information is used by the DBMS
software or database users if needed. This separation of data and information about
the data makes a database system totally different from the traditional file-based
system in which the data definition is part of the application programs.
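A minimal sketch of the self-describing property, assuming SQLite accessed through Python's built-in sqlite3 module (the notes do not tie the idea to any particular DBMS). The student table is hypothetical. SQLite keeps every table definition in its own catalog table, sqlite_master, so the metadata is stored in the database together with the data instead of inside the application program.

```python
import sqlite3

# Hypothetical example table; any DBMS with a system catalog behaves similarly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The schema (metadata) lives inside the database, in SQLite's catalog table.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'student'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key flag)
columns = conn.execute("PRAGMA table_info(student)").fetchall()
column_names = [col[1] for col in columns]

print(catalog[0][0], catalog[0][1])  # table student
print(column_names)                  # ['id', 'name']
```

Any program (or user) can read this metadata from the catalog at run time, which is what allows the DBMS to describe its own data.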
Insulation between program and data
In the file-based system, the structure of the data files is defined in the application
programs so if a user wants to change the structure of a file, all the programs that
access that file might need to be changed as well.
On the other hand, in the database approach, the data structure is stored in the
system catalogue and not in the programs. Therefore, one change is all that is
needed to change the structure of a file. This insulation between the programs and
data is also called program-data independence.
Support for multiple views of data
A database supports multiple views of data. A view is a subset of the database,
which is defined and dedicated for users of the system. Multiple users in the system
might have different views of the system. Each view might contain only the data of
interest to a user or group of users.
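As a sketch of multiple views, again assuming SQLite through Python's sqlite3 module and a hypothetical employee table, two views give two user groups different subsets of the same stored data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Asha', 'Sales', 40000),
                                (2, 'Bikash', 'IT', 55000);

    -- View for a group of users who may see names and departments only.
    CREATE VIEW employee_directory AS SELECT name, dept FROM employee;

    -- View for the payroll section, restricted to salary data.
    CREATE VIEW payroll AS SELECT id, salary FROM employee;
""")

cur = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [d[0] for d in cur.description]
directory = cur.fetchall()
payroll_total = conn.execute("SELECT SUM(salary) FROM payroll").fetchone()[0]

print(view_columns)   # ['name', 'dept']
print(directory)      # [('Asha', 'Sales'), ('Bikash', 'IT')]
print(payroll_total)  # 95000
```

Both views read the same underlying rows; nothing is copied, so an update to employee is immediately visible through each view.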
Sharing of data and multiuser system
Current database systems are designed for multiple users. That is, they allow many
users to access the same database at the same time. This access is achieved
through features called concurrency control strategies. These strategies ensure that
the data accessed is always correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement from
those in the past which restricted usage to one person at a time.
Control of data redundancy
In the database approach, ideally, each data item is stored in only one place in the
database. In some cases, data redundancy still exists to improve system
performance, but such redundancy is controlled by the DBMS and
kept to a minimum by introducing as little redundancy as possible when designing the
database.
Data sharing
The integration of all the data, for an organization, within a database system has
many advantages. First, it allows for data sharing among employees and others who
have access to the system. Second, it gives users the ability to generate more
information from a given amount of data than would be possible without the
integration.
Enforcement of integrity constraints
Database management systems must provide the ability to define and enforce
certain constraints to ensure that users enter valid information and maintain data
integrity. A database constraint is a restriction or rule that dictates what can be
entered or edited in a table, such as a postal code using a certain format or a
valid city in the City field.
There are many types of database constraints. A data type, for example, determines
the kind of data permitted in a field, such as numbers only. Uniqueness constraints,
such as the primary key, ensure that no duplicates are entered. Constraints can be
simple (field based) or complex (programming).
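The constraint types mentioned above can be tried out in a small sketch, assuming SQLite via Python's sqlite3 module. The customer table and the five-digit postal-code rule are hypothetical; the point is that the DBMS itself rejects the invalid rows, not the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id          INTEGER PRIMARY KEY,  -- uniqueness: no duplicate ids
        name        TEXT NOT NULL,        -- a value is required
        postal_code TEXT CHECK (postal_code GLOB '[0-9][0-9][0-9][0-9][0-9]')  -- format rule
    )
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "Ram", "44600")),   # valid row
    try_insert((1, "Sita", "44700")),  # duplicate primary key
    try_insert((2, None, "44700")),    # missing name
    try_insert((3, "Hari", "4A6")),    # postal code in the wrong format
]
print(results)  # [True, False, False, False]
```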
Restriction of unauthorized access
Not all users of a database system will have the same accessing privileges. For
example, one user might have read-only access (i.e., the ability to read a file but not
make changes), while another might have read and write privileges, which is the
ability to both read and modify a file. For this reason, a database management
system should provide a security subsystem to create and control different types of
user accounts and restrict unauthorized access.
Data independence
Another advantage of a database management system is how it allows for data
independence. In other words, the system data descriptions or data describing data
(metadata) are separated from the application programs. This is possible because
changes to the data structure are handled by the database management system
and are not embedded in the program itself.
Transaction processing
A database management system must include concurrency control subsystems.
This feature ensures that data remains consistent and valid during transaction
processing even if several users update the same information.
Provision for multiple views of data
By its very nature, a DBMS permits many users to have access to its database either
individually or simultaneously. It is not important for users to be aware of how and
where the data they access is stored.
Backup and recovery facilities
Backup and recovery are methods that allow you to protect your data from loss. The
database system provides its own backup and recovery process, separate from a
network backup. If a hard drive fails and the database stored on it is not
accessible, the only way to recover the database is from a backup.
If a computer system fails in the middle of a complex update process, the recovery
subsystem is responsible for making sure that the database is restored to the
consistent state it was in before the update began. These are two more benefits of a
database management system.
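A simplified sketch of the recovery idea, assuming SQLite via Python's sqlite3 module and a hypothetical account table. Real crash recovery relies on logs maintained by the DBMS; here the failure is simulated by a CHECK constraint that stops the second half of a transfer, and the rollback restores the state that existed before the transfer began.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    """Move `amount` from A to B as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))
            # This second step fails if A would go negative (the simulated failure).
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(30)   # succeeds: A=70, B=80
ok_large = transfer(500)  # fails on step 2; step 1 is undone as well
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(ok_small, ok_large, balances)  # True False {'A': 70, 'B': 80}
```

Note that B's credit in the failed transfer never becomes visible: either both updates happen or neither does.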

Architecture
• The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with many PCs, web servers, database servers
and other components that are connected with networks.
• The client/server architecture consists of many PCs and a workstation which
are connected via the network.
• DBMS architecture depends upon how users are connected to the database
to get their request done.
Types:
1. 1-Tier
• The database is directly available to the user. It means the user can work directly
on the DBMS and use it.
• Any changes done here will directly be done on the database itself. It doesn't
provide a handy tool for end users.
• The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick response.
2. 2-Tier
• The 2-Tier architecture is the same as basic client-server. In the two-tier
architecture, applications on the client end can directly communicate with the
database at the server side. For this interaction, APIs such as ODBC and JDBC are used.
• The user interfaces and application programs are run on the client-side.
• The server side is responsible for providing the functionalities like query processing
and transaction management.
• To communicate with the DBMS, client-side application establishes a connection
with the server side.
3. 3-Tier
• The 3-Tier architecture contains another layer between the client and server. In this
architecture, the client can't directly communicate with the server.
• The application on the client-end interacts with an application server which further
communicates with the database system.
• The end user has no idea about the existence of the database beyond the
application server, and the database has no idea about any user beyond the
application.
• The 3-Tier architecture is used in the case of large web applications.
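The 3-tier separation can be sketched in a few lines. This is a simplification under stated assumptions: all three tiers run in one Python process, an in-memory SQLite database stands in for the database server, and the names ProductService and client_request are hypothetical. In a real deployment the tiers run on different machines and communicate over a network.

```python
import sqlite3

# Tier 3 (database server): an in-memory SQLite database stands in for it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
db.execute("INSERT INTO product VALUES (1, 'Pen', 20), (2, 'Notebook', 120)")

# Tier 2 (application server): the only layer that talks to the database.
class ProductService:
    def __init__(self, connection):
        self._conn = connection

    def price_of(self, product_name):
        row = self._conn.execute(
            "SELECT price FROM product WHERE name = ?", (product_name,)
        ).fetchone()
        return None if row is None else row[0]

# Tier 1 (client): knows only the service interface, never the database.
def client_request(service, product_name):
    price = service.price_of(product_name)
    if price is None:
        return f"{product_name}: not found"
    return f"{product_name} costs {price}"

service = ProductService(db)
print(client_request(service, "Notebook"))  # Notebook costs 120
print(client_request(service, "Eraser"))    # Eraser: not found
```

Because the client never holds a database connection, the database could be replaced or moved without changing any client code, which is the main attraction of the extra tier.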

Data abstraction and Independence: Three schema Architecture
• It is also called the ANSI/SPARC architecture or three-level architecture.
• This framework is used to describe the structure of a specific database system.
• It is used to separate the user applications from the physical database.
• It contains three levels: external, conceptual, and internal.

Objectives of Three schema Architecture
• Different users need different views of the same data.
• The way in which a particular user needs to see the data may change over
time.
• The users of the database should not worry about the physical implementation
and internal workings of the database such as data compression and encryption
techniques, hashing, optimization of the internal structures etc.
• All users should be able to access the same data according to their requirements.
• The DBA should be able to change the conceptual structure of the database without
affecting the users' views.
• The internal structure of the database should be unaffected by changes to
physical aspects of the storage.
Data Abstraction:
Data abstraction refers to hiding the details of how data is stored and maintained
from users who do not need them, which simplifies their work and also supports
security. Any DBMS architecture mainly consists of three levels: Conceptual level,
Internal level, and External level.
Conceptual level
• It describes the logical structure of the database.
• It specifies what type of data can be stored in the database by defining the
data types and constraints (such as primary key and foreign key).
• It also specifies the relationships between tables.
• Example: CREATE TABLE emp (id NUMBER(5) PRIMARY KEY, name VARCHAR(10));
External level
• The external level describes the users' view of the database.
• It provides a security mechanism, i.e., some users can access only a certain
portion of the data; the database administrator decides which users can
access the conceptual level and to what extent.
Internal level
• This is the lowest level of abstraction; it describes how data is physically stored.
• It provides details about the complex data structures that are used for
storage of data.
• The internal level provides indexes and clusters to control and manage the
data physically stored on the hard disk.

Data Independence
These levels of abstraction provide data independence, i.e., changes made to the schema
at one level do not require changes to the schema at the next higher level.

DBMS architecture provides two types of data independence:
• Logical data independence
• Physical data independence
Logical data independence
Logical data independence states that the external level is unaffected by (free from)
any changes that are made at the conceptual level.
Example: Adding a new entity or attribute at the conceptual level should not affect the
external level.
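A small sketch of logical data independence, assuming SQLite via Python's sqlite3 module; the emp table and emp_names view are hypothetical. A new attribute is added at the conceptual level, and the external view returns exactly the same result afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the logical structure of the data.
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO emp VALUES (1, 'Gita'), (2, 'Ramesh');

    -- External level: one user group's view of that data.
    CREATE VIEW emp_names AS SELECT id, name FROM emp;
""")

before = conn.execute("SELECT * FROM emp_names ORDER BY id").fetchall()

# Change the conceptual schema by adding a new attribute.
conn.execute("ALTER TABLE emp ADD COLUMN email TEXT")

after = conn.execute("SELECT * FROM emp_names ORDER BY id").fetchall()
emp_columns = [col[1] for col in conn.execute("PRAGMA table_info(emp)")]

print(emp_columns)      # ['id', 'name', 'email']
print(before == after)  # True: the external view did not have to change
```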
Physical data independence
Physical data independence states that the conceptual level is unaffected by (free from)
any changes that are made at the internal level.
Example: Adding a new index, or moving the data files to a different storage device, at the
internal level should not affect the conceptual level.
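A matching sketch for physical data independence, under the same assumptions (SQLite via Python's sqlite3 module, a hypothetical emp table and idx_emp_name index). Creating an index changes only how the data is accessed; the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO emp (name) VALUES (?)", [("Gita",), ("Ramesh",), ("Sita",)])

query = "SELECT id, name FROM emp WHERE name = ?"
before = conn.execute(query, ("Sita",)).fetchall()

# Internal-level change: add an index, i.e., a new physical access path.
conn.execute("CREATE INDEX idx_emp_name ON emp(name)")
after = conn.execute(query, ("Sita",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step;
# its exact wording differs between SQLite versions.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Sita",))]

print(before == after)                               # True: same query, same result
print(any("idx_emp_name" in step for step in plan))  # True: the index is now used
```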
What are Instances?
An instance is the collection of all the information and data stored in the database at a
given moment. Instances can easily be changed using CRUD operations such as the
insertion, updating, and deletion of data.
Note that search (read-only) queries do not change the instance.
What is a Schema?
It refers to an overall description that we get for any given database. In simpler words,
schema refers to the basic structure of how one needs to store data in any database. There
are basically two types of Schemas: Physical Schema and Logical Schema.
Physical Schema – This schema describes the DB designed at a physical level.
Logical Schema – This schema describes the DB designed at a logical level.
Difference Between Schema and Instance in DBMS

| Parameter           | Schema                                              | Instance                                                    |
|---------------------|-----------------------------------------------------|-------------------------------------------------------------|
| Meaning             | The overall description of a given database.       | The collection of data and information that the database stores at a particular moment. |
| Alteration          | Remains the same for the entire database.           | Can be changed by updating, deleting, and adding data.      |
| Frequency of change | Does not change very frequently.                    | Changes very frequently.                                    |
| Uses                | Defines the basic structure of the database, i.e., how the data needs to be stored. | Refers to the set of information at a given time. |
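The difference can be observed directly in a short sketch, assuming SQLite via Python's sqlite3 module and a hypothetical book table: inserting a row changes the instance but not the schema, and a read-only query changes neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")

def schema():
    """The stored definition of the table (changes rarely)."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'book'").fetchone()[0]

def instance():
    """The data in the table at this moment (changes often)."""
    return conn.execute("SELECT isbn, title FROM book ORDER BY isbn").fetchall()

s0, i0 = schema(), instance()

conn.execute("INSERT INTO book VALUES ('111', 'Database Concepts')")
s1, i1 = schema(), instance()

conn.execute("SELECT title FROM book WHERE isbn = '111'").fetchall()  # read-only query
s2, i2 = schema(), instance()

print(s0 == s1 == s2)  # True: the schema stayed the same
print(i0, i1)          # [] [('111', 'Database Concepts')]
print(i1 == i2)        # True: the search query did not change the instance
```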

Classifications of DBMS
1) Centralized Database
It is the type of database that stores data in a centralized database system. It allows the
users to access the stored data from different locations through several applications.
These applications contain the authentication process to let users access data securely.
An example of a Centralized database can be the Central Library that carries a central
database of each library in a college/university.
Advantages

• It has decreased the risk of data management, i.e., manipulation of data will not
affect the core data.
• Data consistency is maintained as it manages data in a central repository.
• It provides better data quality, which enables organizations to establish data
standards.
• It is less costly because fewer vendors are required to handle the data sets.
Disadvantages

• The size of the centralized database is large, which increases the response time for
fetching the data.
• It is not easy to update such an extensive database system.
• If the central server fails, the entire data set becomes unavailable, and it can be lost
completely if no backup exists, which could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among
different database systems of an organization. These database systems are connected via
communication links. Such links help the end-users to access the data easily. Examples of
the Distributed database are Apache Cassandra, HBase, Ignite, etc.
Homogeneous DDB: Database systems in which all sites run the same DBMS software,
typically on the same operating system and hardware.
Heterogeneous DDB: Database systems in which sites may run different DBMS software,
operating systems, and hardware.

Advantages
• Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed
system.
• One server failure will not affect the entire data set.
3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows (tuples) and columns (attributes) that together form a table (relation). A relational
database uses SQL for storing, manipulating, and maintaining the data. E. F. Codd
proposed the relational model in 1970. Each table in the database has a key that uniquely
identifies each row. Examples of relational databases are MySQL, Microsoft SQL
Server, Oracle, etc.
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data
sets. It is not a relational database as it stores data not only in tabular form but in several
different ways. It came into existence when the demand for building modern applications
increased. Thus, NoSQL presented a wide variety of database technologies in response to
the demands. We can further divide a NoSQL database into the following four types:
Key-value storage: It is the simplest type of database storage where it stores every single
item as a key (or attribute name) holding its value, together.
Document-oriented Database: A type of database used to store data as JSON-like
document. It helps developers in storing data by using the same document-model format
as used in the application code.
Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most
commonly, social networking websites use the graph database.
Wide-column stores: It is similar to the data represented in relational databases. Here,
data is stored in large columns together, instead of storing in rows.
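The key-value and document models can be illustrated with plain Python dictionaries and the json module. This only sketches the data models; it is not a NoSQL database, and the keys and documents are hypothetical.

```python
import json

# Key-value model: an opaque value looked up by a unique key.
kv_store = {}
kv_store["session:42"] = "user=asha;theme=dark"

# Document model: each value is a JSON document; documents in the same
# collection do not need to share the same fields.
users = {
    "u1": json.dumps({"name": "Asha", "emails": ["asha@example.com"]}),
    "u2": json.dumps({"name": "Bikash", "city": "Pokhara", "skills": ["SQL", "Go"]}),
}

doc = json.loads(users["u2"])
print(kv_store["session:42"])             # user=asha;theme=dark
print(doc["skills"])                      # ['SQL', 'Go']
print("city" in json.loads(users["u1"]))  # False
```

The absence of a fixed table structure (u1 has no city, u2 has no emails) is what the notes mean by data not needing to be stored in a structured format.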
Advantages
• It enables good productivity in application development, as data does not need to be
stored in a structured format.
• It is a better option for managing and handling large data sets.
• It provides high scalability.
• Users can quickly access data from the database through a key.
5) Cloud Database
A type of database where data is stored in a virtual environment and executed over a
cloud computing platform. It provides users with various cloud computing services (SaaS,
PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms; popular
options include:
• Amazon Web Services (AWS)
• Microsoft Azure
• Kamatera
• phoenixNAP
• ScienceSoft
• Google Cloud SQL, etc.
6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data in
the database system. The data is represented and stored as objects, which are like the
objects used in object-oriented programming languages.
Other popular types of databases are: Hierarchical database, Network database, Personal
database, Operational database, Enterprise database etc.

SQL Commands
• SQL commands are instructions used to communicate with the database. They are
also used to perform specific tasks, functions, and queries on data.
• SQL can perform various tasks like creating a table, adding data to tables, dropping
a table, modifying a table, and setting permissions for users.
1. Data Definition Language (DDL)
• DDL changes the structure of the table like creating a table, deleting a table, altering
a table, etc.
• In many DBMSs (e.g., Oracle and MySQL), DDL commands are auto-committed, which
means the changes are saved permanently in the database as soon as they run.
• Here are some commands that come under DDL:
➢ CREATE
➢ ALTER
➢ DROP
➢ TRUNCATE
2. Data Manipulation Language (DML)
• DML commands are used to modify the data in the database. They are responsible for all
forms of changes to the data.
• DML commands are not auto-committed in the same way, which means their changes are
not saved permanently until they are committed; until then, they can be rolled back.
• Here are some commands that come under DML:
➢ INSERT
➢ UPDATE
➢ DELETE
3. Data Control Language (DCL)
• DCL commands are used to grant and take back authority from any database user.
• Here are some commands that come under DCL:
➢ GRANT
➢ REVOKE
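A sketch of DDL and DML commands, assuming SQLite via Python's sqlite3 module and a hypothetical course table. Two caveats: SQLite has no user accounts, so GRANT and REVOKE (DCL) cannot be shown here, and SQLite's DDL is transactional rather than auto-committed, so the sketch focuses on rolling back uncommitted DML. In its default transaction mode, Python's sqlite3 opens a transaction implicitly before INSERT, UPDATE, and DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure.
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# DML: add data, then make it permanent with COMMIT.
conn.execute("INSERT INTO course VALUES ('C101', 'Databases', 3)")
conn.commit()

# DML that has not been committed yet can be undone with ROLLBACK.
conn.execute("UPDATE course SET credits = 4 WHERE code = 'C101'")
conn.execute("DELETE FROM course WHERE code = 'C101'")
conn.rollback()

rows = conn.execute("SELECT code, title, credits FROM course").fetchall()
print(rows)  # [('C101', 'Databases', 3)]
```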

Unit 2. Data Models [14 Marks]


Introduction to Entity Relationship Model
Peter Chen developed the ER diagram in 1976. The ER model was created to provide a
simple and understandable model for representing the structure and logic of databases. It
has since evolved into variations such as the Enhanced ER Model and the Object
Relationship Model.
The Entity Relationship Model is a model for identifying the entities to be represented in the
database and representing how those entities are related. The ER data model specifies an
enterprise schema that represents the overall logical structure of a database graphically.
The Entity Relationship Diagram explains the relationships among the entities present in the
database. ER models are used to model real-world objects like a person, a car, or a
company and the relations between these real-world objects. In short, the ER Diagram is the
structural format of the database.
Entity
An entity is a “thing” or “object” in the real world. An entity contains attributes, which
describe that entity. So, anything about which we store information is called an entity.
Entities are recorded in the database and must be distinguishable, i.e., each entity can be
told apart from the other entities in its set.
For example: A student, an employee, or a bank account are all entities.
Entity Set
An entity set is a collection of similar types of entities that share the same attributes.
For example: All students at a school form an entity set of Student entities.
Key Terminologies used in Entity Set:
Attributes: Attributes are the properties or traits of an entity. They describe the data that
relates to an entity.
Entity Type: A category or class of entities that share the same attributes is referred to as
an entity type.
Entity Instance: An entity instance is a particular occurrence of an individual entity within an
entity type. Each entity instance has a unique identity, often given by its primary key.
Primary Key: A primary key is a unique identifier for every entity instance within an entity
type.
Entity sets can be classified into two types:
Strong Entity Set
Strong entity sets exist independently and each instance of a strong entity set has a unique
primary key.
Example of a Strong Entity: a Car, with attributes such as:
• Registration Number (its unique primary key)
• Model
• Name etc.
Weak Entity Set
A weak entity cannot exist on its own; it is dependent on a strong entity to identify it. A weak
entity does not have a single primary key that uniquely identifies it; instead, it has a partial
key.
Example of Weak Entity Set includes:

• Laptop Color
• RAM, etc.
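A common way to store a weak entity in tables is sketched below, assuming SQLite via Python's sqlite3 module. The employee and dependent tables are hypothetical: the weak entity's primary key combines the owner's key with its own partial key (dep_name), and a foreign key ties each dependent to its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Strong (owner) entity: has its own primary key.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Weak entity: identified by the owner's key plus its partial key, dep_name.
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT    NOT NULL,
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Bikash');
    -- The same partial key 'Rita' is fine under two different owners.
    INSERT INTO dependent VALUES (1, 'Rita', 'daughter'), (2, 'Rita', 'spouse');
""")

# A dependent cannot exist without its owner: deleting the owner removes it.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT emp_id, dep_name FROM dependent ORDER BY emp_id").fetchall()

# Nor can a dependent be added for an owner that does not exist.
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Nobody', 'son')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

print(remaining)       # [(2, 'Rita')]
print(orphan_allowed)  # False
```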

Kinds of Entities
There are two types of Entities:
Tangible Entity

• A tangible entity is a physical object or a physical thing that can be physically
touched, seen, or measured.
• It has a physical existence or can be seen directly.
• Examples of tangible entities are physical goods or physical products (for example,
“inventory items” in an inventory database) or people (for example, customers or
employees).
Intangible Entity

• Intangible entities are abstract or conceptual objects that are not physically present
but have meaning in the database.
• They are typically defined by attributes or properties that are not directly visible.
• Examples of intangible entities include concepts or categories (such as “Product
Categories” or “Service Types”) and events or occurrences (such as appointments
or transactions).
Entity Types in DBMS
Strong Entity Types: These are entities that exist independently and have a unique
identifier.
Weak Entity Types: These entities depend on another entity for their existence and do
not have a unique identifier of their own.
Associative Entity Types: These represent relationships between two or more entities and
may have attributes of their own.
Derived Entity Types: These entities are derived from other entities through a process or
calculation.
Multi-Valued Entity Types: These entities can have more than one value for a
characteristic.
Attributes
In DBMS, there are various types of attributes available:

• Simple Attributes
• Composite Attributes
• Single Valued Attributes
• Multi-Valued Attributes
• Derived Attributes
• Complex Attributes (Rarely used attributes)
• Key Attributes
• Stored Attributes

Simple Attributes
Simple attributes in an ER model diagram are independent attributes that can't be
classified further and can't be subdivided into any other component. These attributes are
also known as atomic attributes.
Composite Attributes
Composite attributes are the opposite of simple attributes: they can be further
subdivided into different components or sub-parts that form simple attributes. In simple
terms, composite attributes are composed of one or more simple attributes. For example,
a Name attribute can be divided into First Name and Last Name.
Single-Valued Attributes
Single-valued attributes are those attributes that hold a single value for each entity
instance and can't store more than one value. For example, a person's date of birth
(DOB) is a single-valued attribute.
Multi-Valued Attributes
Multi-valued attributes are the opposite of single-valued attributes: as the name
suggests, they can store more than one value at a time for an entity instance, drawn
from a set of possible values (for example, a person's phone numbers). These attributes
are represented by a double (concentric) ellipse, and curly braces { } can also be used
around the attribute name to represent a multi-valued attribute.
Derived Attributes
Derived attributes in DBMS are those attributes whose values can be derived from the
values of other attributes. They are always dependent upon other attributes for their value.
For example, DOB (date of birth) is a single-valued attribute whose value remains
constant for an entity instance. From DOB, we can derive the Age attribute, which changes
every year; the age of a person can easily be calculated from his/her date of birth.
Hence, Age is a derived attribute obtained from the single-valued attribute DOB.
Key Attributes
Key attributes are special types of attributes that act as the primary key for an entity and
they can uniquely identify an entity from an entity set. The values that key attributes store
must be unique and non-repeating.
Stored Attributes
Values of stored attributes remain constant and fixed for an entity instance, and they
help in deriving the derived attributes. For example, the Age attribute can be derived from
the Date of Birth attribute, and the Date of Birth attribute has a fixed and constant value
throughout the life of an entity. Hence, the Date of Birth attribute is a stored attribute.
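The relationship between a stored and a derived attribute can be written as a small function. This Python sketch assumes age is counted in completed years; the function name age_on and the sample date are hypothetical.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derived attribute: compute Age (completed years) from the stored attribute DOB."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not been reached yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

dob = date(2004, 9, 15)                # stored attribute (hypothetical value)
print(age_on(dob, date(2024, 9, 14)))  # 19
print(age_on(dob, date(2024, 9, 15)))  # 20
```

Because Age can always be recomputed from DOB, storing only DOB avoids keeping a value that would go out of date every year.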

Relationship Types
A relationship in DBMS is the way in which two or more data sets are linked, i.e., any
association between two entity types is called a relationship. Entities take part in a
relationship, which is represented by a diamond shape. Three specific types of
relationships can exist between tables: One-to-One, One-to-Many (viewed from the
other side as Many-to-One), and Many-to-Many.
One-to-One Relationship
A One-to-one relationship means a single record in Table A is related to a single record in
Table B and vice-versa.
For example, consider two entities, 'Person' (Name, Age, Address, Contact No.) and
'Citizenship Card' (Name, Citizenship Number). Each person can have only one citizenship
card, and a single citizenship card belongs to only one person.
This type of relationship can be used for security purposes. In the above example, we could
store the citizenship number in the 'Person' entity itself, but we create a separate table for it
because the citizenship number may be sensitive data that should be hidden from others.
It is also represented as a 1:1 relationship.
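One way to map this 1:1 relationship to tables, sketched with SQLite via Python's sqlite3 module (table names and card numbers are hypothetical): the card table holds a foreign key to person, and a UNIQUE constraint on that key stops a person from having a second card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        address   TEXT
    );
    -- Sensitive data kept in its own table; UNIQUE(person_id) makes the link 1:1.
    CREATE TABLE citizenship_card (
        card_no   TEXT PRIMARY KEY,
        person_id INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
    );
    INSERT INTO person VALUES (1, 'Asha', 'Kathmandu');
    INSERT INTO citizenship_card VALUES ('C-0001', 1);
""")

# Trying to give the same person a second card violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO citizenship_card VALUES ('C-0002', 1)")
    second_card_allowed = True
except sqlite3.IntegrityError:
    second_card_allowed = False

card_count = conn.execute("SELECT COUNT(*) FROM citizenship_card").fetchone()[0]
print(second_card_allowed, card_count)  # False 1
```

Access to citizenship_card can then be restricted separately from person, which is the security motivation described above.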

One-to-Many Relationship
Such a relationship exists when each record of table A can be related to one or more records of another table, table B, while each record in table B is linked to only one record in table A.
For example, if there are two entities, 'Customer' and 'Account', each customer can have more than one account, but each account is owned by only one customer.
It is also represented as a 1:N relationship.
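A 1:N relationship is usually stored by placing a foreign key on the "many" side. The following is a minimal sketch that assumes SQLite via Python's sqlite3 module. The column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Each account row points to exactly one customer (the "1" side);
-- many accounts may point to the same customer (the "N" side).
CREATE TABLE Account (
    account_no  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(customer_id)
);
INSERT INTO Customer VALUES (1, 'Hari'), (2, 'Gita');
INSERT INTO Account VALUES (101, 1), (102, 1), (103, 2);
""")

# Number of accounts held by each customer
rows = conn.execute("""
    SELECT c.name, COUNT(a.account_no)
    FROM Customer c JOIN Account a ON a.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('Hari', 2), ('Gita', 1)]
```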

Many-to-Many Relationship

A many-to-many relationship exists between the tables if a single record of the first table is
related to one or more records of the second table and a single record in the second table
is related to one or more records of the first table.
For example, consider two tables, a student table and a course table. A particular student may enroll in one or more courses, while a course may have one or more students. It is also represented as an M:N relationship.
A many-to-many relationship from the perspective of table A.

A many-to-many relationship from the perspective of table B.
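Relational tables cannot hold an M:N link directly. A common approach is a third (junction) table that pairs the keys of both sides. The sketch below assumes SQLite via Python's sqlite3. The Enrollment table and all sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (stu_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Course  (course_id TEXT PRIMARY KEY, title TEXT);
-- Junction table: one row per (student, course) enrollment.
-- The composite primary key prevents enrolling twice in the same course.
CREATE TABLE Enrollment (
    stu_id    INTEGER REFERENCES Student(stu_id),
    course_id TEXT    REFERENCES Course(course_id),
    PRIMARY KEY (stu_id, course_id)
);
INSERT INTO Student VALUES (1, 'Ram'), (2, 'Sita');
INSERT INTO Course  VALUES ('DB', 'Databases'), ('OS', 'Operating Systems');
INSERT INTO Enrollment VALUES (1, 'DB'), (1, 'OS'), (2, 'DB');
""")

# One student -> many courses
ram_courses = [r[0] for r in conn.execute(
    "SELECT course_id FROM Enrollment WHERE stu_id = 1 ORDER BY course_id")]
# One course -> many students
db_students = [r[0] for r in conn.execute(
    "SELECT s.name FROM Student s JOIN Enrollment e ON e.stu_id = s.stu_id "
    "WHERE e.course_id = 'DB' ORDER BY s.name")]
print(ram_courses, db_students)  # ['DB', 'OS'] ['Ram', 'Sita']
```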

E-R diagrams
An Entity Relationship Diagram in DBMS is a blueprint of the database that can be later
implemented as an actual database in the form of tables. It is a "diagrammatic
representation of the database."
Why Use ER Diagrams?
The main reasons for using the ER diagram before constructing an actual database are as
follows:

• An Entity Relationship Diagram is used for modeling the data that will be stored in a
database.
• The database designers get a better understanding of the information that will be contained in the database using the Entity Relationship Diagram.
• An ER diagram is used as a blueprint by the database designers to implement the data in a certain application.
• They define what data will be stored in the databases, that is, the entities and their attributes.
• They also specify the relationships between the data.
• It provides a preview of how the tables should be connected and which entities map to which tables.
• ER diagrams can easily be converted into relational tables, which helps in designing the software quickly.
ER Diagrams Symbols & Notations
An entity-relationship diagram contains entities, attributes, and the relationships between them, and each of these components is drawn with a specific symbol.
Rectangle: It is used to represent the entities in an entity-relationship diagram.
Ellipses/Oval: This symbol is used to represent the attributes in an entity-relationship diagram.
Diamond: This symbol is used to represent a relationship between entities, such as one-to-one, many-to-one, or many-to-many.
Lines: They link entities to relationships, and attributes to entities.
Double Ellipses: It is used to represent a multivalued attribute.
Double rectangle: It is used to represent a weak entity.
Double diamond: It is used to represent a weak (identifying) relationship, which links a weak entity to its owner entity.

ER diagram of Library Management System

Create ER diagrams

• Hotel Management System
• Airlines Ticket Reservation System
• Hospital Management System
• Online Shopping Platform

Unit 3. Normalization [11 Marks]


Keys
The different types of keys in DBMS are:
Candidate Key - A candidate key is a minimal set of attributes that can uniquely identify any data row in the table. No attribute can be removed from it without losing this property, and a table may have several candidate keys.
Primary Key - The primary key is selected from one of the candidate keys and becomes the identifying key of a table. It can uniquely identify any data row of the table.
Super Key - A super key is any set of attributes that can uniquely identify any data row in the table. Every candidate key, and therefore the primary key, is a super key, and adding more attributes to a super key still gives a super key.
Composite Key - If no single attribute of a table can serve as the key, i.e., no single attribute identifies a row uniquely, then we combine two or more attributes to form a key. This is known as a composite key.
Secondary Key - Only one of the candidate keys is selected as the primary key. The rest of them are known as secondary (or alternate) keys.
Foreign Key - A foreign key is an attribute (or set of attributes) in one table that refers to the primary key of another table. Hence, the foreign key links the two tables together. Every value entered in a foreign key column must match an existing primary key value in the referenced table (or be null). A wrongly entered value breaks the relationship between the two tables.
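The key types above can be declared in SQL as shown in this sketch, which uses SQLite via Python's sqlite3. The tables Student, Course and Result and their data are hypothetical. A super key needs no declaration of its own: in Student, any set of columns that contains roll_no is one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Student (
    roll_no INTEGER PRIMARY KEY,   -- primary key: the chosen candidate key
    email   TEXT NOT NULL UNIQUE,  -- another candidate key: a secondary (alternate) key
    name    TEXT NOT NULL
);
CREATE TABLE Course (course_id TEXT PRIMARY KEY);
CREATE TABLE Result (
    roll_no   INTEGER REFERENCES Student(roll_no),  -- foreign key
    course_id TEXT REFERENCES Course(course_id),    -- foreign key
    marks     INTEGER,
    PRIMARY KEY (roll_no, course_id)                -- composite key
);
INSERT INTO Student VALUES (1, 'ram@example.com', 'Ram');
INSERT INTO Course VALUES ('DBMS');
INSERT INTO Result VALUES (1, 'DBMS', 80);
""")

def rejected(sql):
    """Return True if the statement is refused because it breaks a key rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected("INSERT INTO Student VALUES (2, 'ram@example.com', 'Hari')"))  # True: email already used
print(rejected("INSERT INTO Result VALUES (1, 'DBMS', 90)"))  # True: same (roll_no, course_id)
print(rejected("INSERT INTO Result VALUES (9, 'DBMS', 70)"))  # True: no student 9
```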

Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation. A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.
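A functional dependency can be tested against sample data by grouping rows on the X values and checking that each group has a single Y value. This is a minimal Python sketch over a hypothetical relation. Sample data can only show that an FD is violated, because an FD is a rule about every possible instance of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on a list of dict rows: equal X values must give equal Y values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X seen before with a different Y
            return False
    return True

# Hypothetical sample relation
students = [
    {"Stu_id": 1, "Stu_name": "Ram",  "Zip": "44600", "City": "Kathmandu"},
    {"Stu_id": 2, "Stu_name": "Sita", "Zip": "44600", "City": "Kathmandu"},
    {"Stu_id": 3, "Stu_name": "Ram",  "Zip": "33700", "City": "Pokhara"},
]

print(fd_holds(students, ["Zip"], ["City"]))      # True
print(fd_holds(students, ["Stu_name"], ["Zip"]))  # False: two Rams with different zips
```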

Armstrong's Axioms
Reflexive Rule − If α is a set of attributes and β is a subset of α, then α → β holds.
Augmentation Rule − If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is, adding the same attributes to both sides of a dependency does not change the basic dependency.
Transitivity Rule − Same as the transitive rule in algebra: if α → β holds and β → γ holds, then α → γ also holds. α → β is read as "α functionally determines β".
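The usual way to apply Armstrong's axioms mechanically is to compute the attribute closure X+, which is the set of all attributes that X determines. X → Y is implied by a set of FDs exactly when Y is a subset of X+. The following is a minimal Python sketch that uses the Stu_id → Zip → City dependencies from the 3NF example later in this unit.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) set pairs.

    The result starts with X itself (reflexivity); whenever a left-hand side
    is already contained in it, the right-hand side is added (augmentation
    and transitivity), until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Stu_id"}, {"Zip"}), ({"Zip"}, {"City"})]
print(sorted(closure({"Stu_id"}, fds)))  # ['City', 'Stu_id', 'Zip'], so Stu_id -> City is implied
```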

Trivial Functional Dependency


Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
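The three definitions translate directly into set tests. Below is a small Python sketch with hypothetical attribute names. Every completely non-trivial FD is also non-trivial, so the function returns the most specific label.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y as trivial, non-trivial, or completely non-trivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:               # Y is a subset of X
        return "trivial"
    if lhs.isdisjoint(rhs):      # X and Y share no attribute
        return "completely non-trivial"
    return "non-trivial"

print(classify_fd({"Stu_id", "Stu_name"}, {"Stu_name"}))         # trivial
print(classify_fd({"Stu_id", "Stu_name"}, {"Stu_name", "Zip"}))  # non-trivial
print(classify_fd({"Stu_id"}, {"Zip"}))                          # completely non-trivial
```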

Anomalies on Database
Update Anomalies − If data items are scattered and are not linked to each other properly, strange situations can arise. For example, when we update a data item whose copies are scattered over several places, a few copies get updated properly while others are left with old values. Such instances leave the database in an inconsistent state.
Deletion Anomalies − We try to delete a record, but parts of it are left undeleted because, without our knowing it, the same data is also saved somewhere else.
Insert Anomalies − We try to insert data into a record that does not exist at all, so the new data cannot be stored until that record is created.

Normalization
Normalization is a method to remove all the above anomalies and bring the database to a
consistent state.

First Normal Form (1NF)


First Normal Form is part of the definition of a relation (table) itself. This rule states that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

Course        Content
Programming   Java, C++
Web           HTML, PHP, ASP

We re-arrange the relation (table) as below to convert it to First Normal Form.

Course        Content
Programming   Java
Programming   C++
Web           HTML
Web           PHP
Web           ASP

Each attribute must contain only a single value from its pre-defined domain.
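Converting the Course table above to 1NF amounts to splitting each multi-valued cell into one row per value. Here is a minimal Python sketch, assuming the multiple values are stored as comma-separated text.

```python
# Non-1NF rows: the Content cell holds several values
courses = [
    ("Programming", "Java, C++"),
    ("Web", "HTML, PHP, ASP"),
]

def to_1nf(rows):
    """Split each comma-separated cell into one row per atomic value."""
    return [(course, item.strip())
            for course, content in rows
            for item in content.split(",")]

for row in to_1nf(courses):
    print(row)
```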

Second Normal Form (2NF)


Before we learn about the second normal form, we need to understand the following −
Prime attribute − An attribute that is part of a candidate key is known as a prime attribute.
Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime attribute.
If we follow the second normal form, then every non-prime attribute should be fully functionally dependent on the candidate key. That is, if X → A holds, where X is a candidate key and A is a non-prime attribute, then there should not be any proper subset Y of X for which Y → A also holds true.
Student_Project

Stu_id Proj_id Stu_name Proj_name


We see here in the Student_Project relation that the candidate key is {Stu_ID, Proj_ID}, so both are prime attributes. According to the rule, the non-prime attributes, i.e., Stu_Name and Proj_Name, must be dependent upon both and not on either prime attribute individually. But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second Normal Form.
Student

Stu_id Stu_name

Project

Proj_id Proj_name

Student_Project

Stu_id Proj_id
We break the relation into three as shown above. Stu_name now depends on the whole key of Student, and Proj_name depends on the whole key of Project. Student_Project keeps the link between students and projects. So there exists no partial dependency.
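Partial dependencies can be found mechanically by computing the closure of every proper subset of the candidate key and checking whether it reaches a non-prime attribute. Below is a Python sketch for the Student_Project example. It assumes the only FDs are Stu_id → Stu_name and Proj_id → Proj_name, and that {Stu_id, Proj_id} is the only candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """Map each proper subset of key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            hit = closure(set(subset), fds) & non_prime
            if hit:
                found[subset] = hit
    return found

fds = [({"Stu_id"}, {"Stu_name"}), ({"Proj_id"}, {"Proj_name"})]
result = partial_dependencies({"Stu_id", "Proj_id"}, {"Stu_name", "Proj_name"}, fds)
print(result)  # {('Proj_id',): {'Proj_name'}, ('Stu_id',): {'Stu_name'}}
```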

Third Normal Form (3NF)


For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must be satisfied −
No non-prime attribute is transitively dependent on a candidate key.
For any non-trivial functional dependency X → A, either X is a super key or A is a prime attribute.
Student_Detail

Stu_id Stu_name Zip City


We find that in the above Student_Detail relation, Stu_ID is the key and the only prime attribute. We also find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a super key, and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows.
Student_Detail

Stu_id Stu_name Zip


Zip_Code

Zip City
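Decomposing on Zip is lossless here because Zip, the column the two tables share, is the key of Zip_Code, so joining them rebuilds the original relation exactly. Here is a sketch using SQLite via Python's sqlite3, with hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student_Detail (Stu_id INTEGER PRIMARY KEY, Stu_name TEXT, Zip TEXT);
CREATE TABLE Zip_Code (Zip TEXT PRIMARY KEY, City TEXT);
INSERT INTO Student_Detail VALUES (1, 'Ram', '44600'), (2, 'Sita', '44600'), (3, 'Hari', '33700');
-- Each Zip -> City fact is now stored exactly once
INSERT INTO Zip_Code VALUES ('44600', 'Kathmandu'), ('33700', 'Pokhara');
""")

# Joining on Zip rebuilds the original (Stu_id, Stu_name, Zip, City) rows
original = conn.execute("""
    SELECT s.Stu_id, s.Stu_name, s.Zip, z.City
    FROM Student_Detail s JOIN Zip_Code z ON s.Zip = z.Zip
    ORDER BY s.Stu_id
""").fetchall()
for row in original:
    print(row)
```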
Boyce-Codd Normal Form (BCNF)
❖ BCNF is the advanced version of 3NF. It is stricter than 3NF.
❖ A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
❖ For BCNF, the table should be in 3NF, and for every FD, the left-hand side (LHS) must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table

Id Country Department Depart_Type Depart_No


100 Nepal Designing D123 02
100 Nepal Testing D123 03
200 India Stores D234 04
200 India Developing D234 05
In the above table Functional dependencies are as follows:
Id → Country
Department → {Depart_Type, Depart_No}
Candidate key: {Id, Department}
The table is not in BCNF because the left-hand side of each dependency (Id alone, or Department alone) is not a super key. To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:

Id Country
100 Nepal
200 India
EMP_DEPT table:

Department Depart_Type Depart_No
Designing D123 02
Testing D123 03
Stores D234 04
Developing D234 05
EMP_DEPT_MAPPING table:

Id Department
100 Designing
100 Testing
200 Stores
200 Developing
Functional dependencies:
Id → Country
Department → {Depart_Type, Depart_No}
Candidate keys:
For the first table: Id
For the second table: Department
For the third table: {Id, Department}
Now, this is in BCNF because the left-hand side of both functional dependencies is a key of its table.
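The BCNF test can be checked with attribute closures. The test requires that every non-trivial FD X → Y has a super key X, and X is a super key when X+ contains all attributes of the table. Below is a Python sketch for the EMPLOYEE example. It assumes the two FDs listed above are the only ones, and that each decomposed table keeps the FDs whose attributes it contains.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the FDs (attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left-hand side is not a super key of the table."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not attributes <= closure(lhs, fds)]

employee = {"Id", "Country", "Department", "Depart_Type", "Depart_No"}
id_fd = ({"Id"}, {"Country"})
dept_fd = ({"Department"}, {"Depart_Type", "Depart_No"})

print(len(bcnf_violations(employee, [id_fd, dept_fd])))                      # 2
print(bcnf_violations({"Id", "Country"}, [id_fd]))                            # []
print(bcnf_violations({"Department", "Depart_Type", "Depart_No"}, [dept_fd]))  # []
```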

Integrity Constraints
Integrity constraints are a set of rules used to maintain the quality of information. They ensure that data insertion, updating, and other processes are performed in such a way that data integrity is not affected. Thus, integrity constraints guard against accidental damage to the database.
Types of Integrity Constraint

1. Domain constraints
Domain constraints define the valid set of values for an attribute. The data type of a domain can be string, character, integer, time, date, currency, etc. The value of each attribute must belong to its corresponding domain.
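In SQL, a domain is commonly enforced with a data type plus a CHECK constraint. The following sketch uses SQLite via Python's sqlite3. The Age range and Gender codes are hypothetical domains chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraints restrict each column to its domain of valid values
conn.execute("""
CREATE TABLE Student (
    Roll_No INTEGER PRIMARY KEY,
    Age     INTEGER CHECK (Age BETWEEN 15 AND 60),
    Gender  TEXT    CHECK (Gender IN ('M', 'F', 'O'))
)
""")
conn.execute("INSERT INTO Student VALUES (1, 20, 'F')")  # inside the domain

try:
    conn.execute("INSERT INTO Student VALUES (2, 200, 'M')")  # Age outside the domain
    accepted = True
except sqlite3.IntegrityError:
    accepted = False
print(accepted)  # False
```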

2. Entity integrity constraints


The entity integrity constraint states that a primary key value can't be null.
This is because the primary key value is used to identify individual rows in a relation, and if the primary key has a null value, then we can't identify those rows. Fields other than the primary key may still contain null values.

3. Referential Integrity Constraints


A referential integrity constraint is specified between two tables.
In referential integrity constraints, if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
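This rule can be seen in action with SQLite via Python's sqlite3. The Department and Employee tables and their values are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Department (Dept_id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (
    Emp_id  INTEGER PRIMARY KEY,
    Dept_id INTEGER REFERENCES Department(Dept_id)  -- foreign key, may be NULL
);
INSERT INTO Department VALUES (10, 'Sales');
""")

def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert("INSERT INTO Employee VALUES (1, 10)"))    # ok: 10 exists in Department
print(try_insert("INSERT INTO Employee VALUES (2, NULL)"))  # ok: NULL is allowed
print(try_insert("INSERT INTO Employee VALUES (3, 99)"))    # rejected: no department 99
```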

4. Key constraints
A key is a set of attributes used to identify an entity uniquely within its entity set.
An entity set can have multiple keys, but one of them is chosen as the primary key. A primary key must contain unique, non-null values in the relational table.

Unit 4. Relational Language [14 Marks]


Introduction to SQL
IBM developed the original version of SQL, originally called Sequel, as part of the System R
project in the early 1970s. The Sequel language has evolved since then, and its name has
changed to SQL (Structured Query Language). Many products now support the SQL
language. SQL has clearly established itself as the standard relational database language.
In 1986, the American National Standards Institute (ANSI) and the International
Organization for Standardization (ISO) published an SQL standard, called SQL-86. ANSI
published an extended standard for SQL, SQL-89, in 1989. The next version of the standard
was SQL-92 standard, followed by SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011,
and most recently SQL:2016.
SQL uses the terms table, row, and column for the formal relational model terms relation,
tuple, and attribute, respectively.

Features of SQL
Data-definition language (DDL): The SQL DDL provides commands for defining relation
schemas, deleting relations, and modifying relation schemas.
Data-manipulation language (DML): The SQL DML provides the ability to query
information from the database and to insert tuples into, delete tuples from, and
modify tuples in the database.
Integrity: The SQL DDL includes commands for specifying integrity constraints that the
data stored in the database must satisfy. Updates that violate integrity constraints are
disallowed.
View definition: The SQL DDL includes commands for defining views.
Transaction control: SQL includes commands for specifying the beginning and end points
of transactions.
Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL
statements can be embedded within general-purpose programming languages, such as C,
C++, and Java.
Authorization: The SQL DDL includes commands for specifying access rights to relations
and views.

Basic Types
The SQL standard supports a variety of built-in types, including:
char(n): A fixed-length character string with user-specified length n. The full form,
character, can be used instead.
varchar(n): A variable-length character string with user-specified maximum length n. The
full form, character varying, is equivalent.
int: An integer (a finite subset of the integers that is machine dependent). The full form,
integer, is equivalent.
smallint: A small integer (a machine-dependent subset of the integer type).
numeric(p, d): A fixed-point number with user-specified precision. The number consists of
p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus,
numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored
exactly in a field of this type.
real, double precision: Floating-point and double-precision floating-point numbers with
machine-dependent precision.
float(n): A floating-point number with precision of at least n digits.
Each type may include a special value called the null value. A null value indicates an
absent value that may exist but be unknown or that may not exist at all.
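The numeric(p, d) rule above can be expressed as a digit count: at most d digits after the decimal point and at most p − d digits before it. Here is a minimal Python sketch using the standard decimal module. Values are passed as strings so that inputs like 0.32 are not altered by binary floating point. The helper fits_numeric is hypothetical and is not part of any SQL library.

```python
from decimal import Decimal

def fits_numeric(value, p, d):
    """True if value (a decimal string) fits numeric(p, d) exactly:
    at most d digits after the point and at most p - d digits before it."""
    number = Decimal(value)
    if number == 0:
        return True
    _, digits, exponent = number.normalize().as_tuple()
    frac_digits = max(0, -exponent)              # digits after the decimal point
    int_digits = max(0, len(digits) + exponent)  # digits before the decimal point
    return frac_digits <= d and int_digits <= p - d

for v in ["44.5", "444.5", "0.32"]:
    print(v, fits_numeric(v, 3, 1))
# 44.5 True
# 444.5 False
# 0.32 False
```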

Data Definition Language (DDL) Commands in SQL


The DDL commands in Structured Query Language are used to create and modify the schema of the database and its objects. DDL commands have a predefined syntax for describing the data, and they deal with how the data should exist in the database.
Three of the main DDL commands in SQL are:

• CREATE Command
• DROP Command
• ALTER Command
CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database
objects.
Syntax to Create a Database:
CREATE Database Database_Name;

Suppose you want to create a Books database in the SQL database. To do this, you must
write the following DDL Command:
Create Database Books;
Example 2: This example describes how to create a new table using the CREATE DDL
command.
Syntax to create a new table:
CREATE TABLE table_name(
column_Name1 data_type ( size of the column ) ,
column_Name2 data_type ( size of the column) ,
...
column_NameN data_type ( size of the column )
) ;

CREATE TABLE Student(
Roll_No Int,
First_Name Varchar (20),
Last_Name Varchar (20),
Age Int,
Marks float
);
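The CREATE TABLE statement can be tried in SQLite via Python's sqlite3. SQLite accepts these type names but does not enforce the VARCHAR(20) length limit, so this sketch only checks that the columns were created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Student(
    Roll_No    INT,
    First_Name VARCHAR(20),
    Last_Name  VARCHAR(20),
    Age        INT,
    Marks      FLOAT
)
""")
# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
print(columns)  # ['Roll_No', 'First_Name', 'Last_Name', 'Age', 'Marks']
```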

DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL
database. We can easily remove the entire table, view, or index from the database using
this DDL command.
Syntax to remove a database:
DROP DATABASE Database_Name;
Suppose you want to delete the Books database from the SQL database. To do this, you
must write the following DDL command:
DROP DATABASE Books;
Example 2: This example describes how to remove the existing table from the SQL
database.
Syntax:
DROP TABLE Table_Name;
Suppose you want to delete the Student table from the SQL database. To do this, you must
write the following DDL command:
DROP TABLE Student;

ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the
database, and it also changes the schema of database objects.
We can also add and drop constraints of the table using the ALTER command.
Syntax to add a new field in the table:
ALTER TABLE name_of_table ADD column_name column_definition;
Suppose you want to add a 'Fathers_Name' column to the existing Student table. (A column name cannot contain an apostrophe unless it is quoted.) To do this, you must write the following DDL command:
ALTER TABLE Student ADD Fathers_Name Varchar(60);
Example 2: This example describes how to remove existing columns from the table.
Syntax to remove columns from the table (the exact form varies between database systems; this is the form used by MySQL and PostgreSQL):
ALTER TABLE name_of_table DROP COLUMN Column_Name_1, DROP COLUMN Column_Name_2,
….., DROP COLUMN Column_Name_N;
Suppose you want to remove the Age and Marks columns from the existing Student table. To do this, you can write the following DDL command:
ALTER TABLE Student DROP COLUMN Age, DROP COLUMN Marks;
Data Manipulation Language (DML)
The DML commands in Structured Query Language access and change the data present in the SQL database. We can easily retrieve, store, modify, update and delete existing records in the database using DML commands.
The four main DML commands in SQL are:

• SELECT Command
• INSERT Command
• UPDATE Command
• DELETE Command
SELECT Command
The SELECT command shows the records of the specified table. Specific columns can be chosen by listing their names, and specific rows can be filtered by using the WHERE clause.
Syntax:
SELECT column_Name_1,column_Name_2, …..,column_Name_N
FROM Name_of_table;
Here, column_Name_1, column_Name_2, ….., column_Name_N are the names of those
columns whose data we want to retrieve from the table.

If we want to retrieve the data from all the columns of the table, we must use the following SELECT command:
SELECT * FROM table_name;
Example 1: This example shows all the values of every column from the table.
SELECT * FROM Student;
This SQL statement displays the following values of the student table:

Student_ID Student_Name Student_Marks
BCA1001 Abhay 85
BCA1002 Anuj 75
BCA1003 Bheem 60
BCA1004 Ram 79
BCA1005 Sumit 80

Example 2: This example shows all the values of specific columns from the table.
SELECT Emp_Id, Emp_Salary FROM Employee;
This SELECT statement displays all the values of the Emp_Id and Emp_Salary columns of the Employee table:
Emp_Id Emp_Salary
201 25000
202 45000
203 30000
204 29000
205 40000
Example 3: This example describes how to use the WHERE clause with the SELECT DML
command.
Let's take the following Student table:
Student_ID Student_Name Student_Marks
BCA1001 Abhay 80
BCA1002 Ankit 75
BCA1003 Bheem 80
BCA1004 Ram 79
BCA1005 Sumit 80

If you want to access all the records of those students whose marks are 80 from the above table, then you have to write the following DML command in SQL:
SELECT * FROM Student WHERE Student_Marks = 80;
The above SQL query shows the following table as the result:

Student_ID Student_Name Student_Marks
BCA1001 Abhay 80
BCA1003 Bheem 80
BCA1005 Sumit 80
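The same query can be run from Python with the built-in sqlite3 module. This sketch loads the Student table shown above and adds ORDER BY so that the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Student_ID TEXT, Student_Name TEXT, Student_Marks INT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ("BCA1001", "Abhay", 80), ("BCA1002", "Ankit", 75), ("BCA1003", "Bheem", 80),
    ("BCA1004", "Ram", 79), ("BCA1005", "Sumit", 80),
])

# Only the rows whose Student_Marks column equals 80
rows = conn.execute(
    "SELECT * FROM Student WHERE Student_Marks = 80 ORDER BY Student_ID").fetchall()
for row in rows:
    print(row)
```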

INSERT Command
INSERT is another important data manipulation command in Structured Query Language. It allows users to insert data into database tables.
Syntax:
INSERT INTO TABLE_NAME ( column_Name1 , column_Name2 ,
column_Name3 , .... column_NameN ) VALUES (value_1, value_2,
value_3, .... value_N ) ;
Example 1: This example describes how to insert the record in the database table. Let's
take the following student table, which consists of only 2 records of the student.

Stu_Id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
Suppose you want to insert a new record into the student table. For this, you must write the
following DML INSERT command:
INSERT INTO Student (Stu_id, Stu_Name, Stu_Marks, Stu_Age)
VALUES (104, 'Anmol', 89, 19);
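From application code, values are normally passed to INSERT as parameters rather than pasted into the SQL text, which avoids quoting mistakes and SQL injection. Here is a sketch using Python's sqlite3, where ? is that module's placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_Id INT, Stu_Name TEXT, Stu_Marks INT, Stu_Age INT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                 [(101, "Ramesh", 92, 20), (201, "Jatin", 83, 19)])

# Placeholders keep the values separate from the SQL text
conn.execute(
    "INSERT INTO Student (Stu_Id, Stu_Name, Stu_Marks, Stu_Age) VALUES (?, ?, ?, ?)",
    (104, "Anmol", 89, 19),
)
count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(count)  # 3
```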

UPDATE Command
UPDATE is another important data manipulation command in Structured Query Language. It allows users to update or modify existing data in database tables.
Syntax
UPDATE Table_name SET [column_name1= value_1, ….., column_nameN
= value_N] WHERE CONDITION;
Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the name of
the table whose values you want to update.
Example 1: This example describes how to update the value of a single field.
Let's take a Product table consisting of the following records:

Product_Id Product_Name Product_Price Product_Quantity
P101 Chips 20 20
P102 Chocolates 60 40
P103 Maggi 75 5
P201 Biscuits 80 20
P203 Namkeen 40 50
Suppose you want to update the Product_Price of the product whose Product_Id is P102.
To do this, you must write the following DML UPDATE command:
UPDATE Product SET Product_Price=80 WHERE Product_Id = 'P102';
Example 2: This example describes how to update the value of multiple fields of the
database table.
Let's take a Student table consisting of the following records:
Stu_Id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
202 Anuj 85 19
203 Monty 95 21
102 Saket 65 21
103 Sumit 78 19
104 Ashish 98 20
Suppose you want to update Stu_Marks and Stu_Age of the students whose Stu_Id is 103 or 202. To do this, you have to write the following DML UPDATE command:
UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202);
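A condition such as Stu_Id = 103 AND Stu_Id = 202 can never be true for a single row, so it updates nothing. IN (or OR) is needed to match either id. The sketch below uses Python's sqlite3 with the Student table above. cursor.rowcount reports how many rows each UPDATE changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_Id INT, Stu_Name TEXT, Stu_Marks INT, Stu_Age INT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (101, "Ramesh", 92, 20), (201, "Jatin", 83, 19), (202, "Anuj", 85, 19),
    (203, "Monty", 95, 21), (102, "Saket", 65, 21), (103, "Sumit", 78, 19),
    (104, "Ashish", 98, 20),
])

# AND would need one row's Stu_Id to be 103 and 202 at the same time
and_count = conn.execute(
    "UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id = 103 AND Stu_Id = 202"
).rowcount
# IN matches rows whose Stu_Id is either value
in_count = conn.execute(
    "UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202)"
).rowcount
print(and_count, in_count)  # 0 2
```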
DELETE Command
DELETE is a DML command which allows SQL users to remove single or multiple existing records from database tables.
DELETE removes only rows. Unlike DROP, it leaves the table structure in place, and inside a transaction its effect can be rolled back until the transaction is committed. We use the WHERE clause with the DELETE command to select specific rows from the table. Without a WHERE clause, every row of the table is deleted.
Syntax
DELETE FROM Table_Name WHERE condition;
Example 1: This example describes how to delete a single record from the table.
Let's take a Product table consisting of the following records:
Product_Id Product_Name Product_Price Product_Quantity
P101 Chips 20 20
P102 Chocolates 60 40
P103 Maggi 75 5
P201 Biscuits 80 20
P203 Namkeen 40 50

Suppose you want to delete that product from the Product table whose Product_Id is P203.
To do this, you must write the following DML DELETE command:
DELETE FROM Product WHERE Product_Id = 'P203';
Example 2: This example describes how to delete the multiple records or rows from the
database table.
Let's take a Student table consisting of the following records:
Stu_Id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
202 Anuj 85 19
203 Monty 95 21
102 Saket 65 21
103 Sumit 78 19
104 Ashish 98 20
Suppose you want to delete the records of those students whose marks are greater than 70. To do this, you must write the following DML DELETE command:
DELETE FROM Student WHERE Stu_Marks > 70;
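Here is a sketch of the DELETE above using Python's sqlite3 with the same Student table. It also shows that an uncommitted DELETE can be undone with a rollback. In its default mode, Python's sqlite3 implicitly opens a transaction before a DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_Id INT, Stu_Name TEXT, Stu_Marks INT, Stu_Age INT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (101, "Ramesh", 92, 20), (201, "Jatin", 83, 19), (202, "Anuj", 85, 19),
    (203, "Monty", 95, 21), (102, "Saket", 65, 21), (103, "Sumit", 78, 19),
    (104, "Ashish", 98, 20),
])
conn.commit()  # make the seven rows permanent

deleted = conn.execute("DELETE FROM Student WHERE Stu_Marks > 70").rowcount
remaining = conn.execute("SELECT Stu_Name FROM Student").fetchall()
print(deleted, remaining)  # 6 [('Saket',)]

conn.rollback()  # undo the uncommitted DELETE
print(conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0])  # 7
```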

SQL Aggregate Functions


An SQL aggregate function performs a calculation on multiple rows of a single column of a table and returns a single value. Aggregate functions are used to summarize data.
Types of SQL Aggregation Function
1. COUNT Function
The COUNT function is used to count the number of rows in a database table. It can work on both numeric and non-numeric data types.
COUNT(*) returns the count of all the rows in a specified table, including duplicates and rows containing nulls. COUNT(expression), by contrast, ignores rows where the expression is null.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
Sample table:

PRODUCT_MAST
PRODUCT COMPANY QTY RATE COST
Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120
Example: COUNT()
SELECT COUNT(*)
FROM PRODUCT_MAST;
Output:
10
Example: COUNT with WHERE
SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE>=20;
Output:
7
Example: COUNT() with DISTINCT
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;
Output:
3
Example: COUNT() with GROUP BY
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
Example: COUNT() with HAVING
SELECT COMPANY, COUNT(*)

FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
2. SUM Function
The SUM function is used to calculate the sum of all the values in the selected column. It works on numeric fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST)
FROM PRODUCT_MAST;
Output:
670
Example: SUM() with WHERE
SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;
Output:
320
Example: SUM() with GROUP BY
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;
Output:
Com1 150
Com3 170
Example: SUM() with HAVING
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;
Output:

Com1 335
Com3 170
3. AVG function
The AVG function is used to calculate the average value of the numeric type. AVG function
returns the average of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
Output:
30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )

Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Output:
10
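All of the example queries above can be checked against the PRODUCT_MAST table with SQLite via Python's sqlite3. This sketch treats Item6's company as Com1, which is what the GROUP BY outputs above assume. It also adds ORDER BY so that grouped results come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INT, RATE INT, COST INT)")
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
])

def q(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

count_all    = q("SELECT COUNT(*) FROM PRODUCT_MAST")[0][0]
count_rate   = q("SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20")[0][0]
distinct_cos = q("SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST")[0][0]
per_company  = q("SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY ORDER BY COMPANY")
total_cost   = q("SELECT SUM(COST) FROM PRODUCT_MAST")[0][0]
sum_qty_gt_3 = q("SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST WHERE QTY > 3 "
                 "GROUP BY COMPANY ORDER BY COMPANY")
having_170   = q("SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST GROUP BY COMPANY "
                 "HAVING SUM(COST) >= 170 ORDER BY COMPANY")
avg_min_max  = q("SELECT AVG(COST), MIN(RATE), MAX(RATE) FROM PRODUCT_MAST")[0]

print(count_all, count_rate, distinct_cos, total_cost)  # 10 7 3 670
print(per_company)   # [('Com1', 5), ('Com2', 3), ('Com3', 2)]
print(sum_qty_gt_3)  # [('Com1', 150), ('Com3', 170)]
print(having_170)    # [('Com1', 335), ('Com3', 170)]
print(avg_min_max)   # (67.0, 10, 30)
```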

SQL Joins
The SQL JOIN statement is used to combine rows from two tables based on a related (common) column. Which rows appear in the result depends on the type of join.
Types of SQL JOINs
In SQL, we have four main types of joins:

• INNER JOIN
• LEFT JOIN
• RIGHT JOIN
• FULL OUTER JOIN
SQL INNER JOIN
The SQL INNER JOIN statement joins two tables based on a common column and selects
rows that have matching values in these columns.
Syntax
SELECT columns_from_both_tables
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column2
Here,
• table1 and table2 - two tables that are to be joined
• column1 and column2 - the common columns in table1 and table2
Example
-- join the Customers and Orders tables
-- with customer_id and customer fields

SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer;
Here, the SQL command selects the specified rows from both tables if the values of
customer_id (of the Customers table) and customer (of the Orders table) are a match.

INNER JOIN excludes all the rows that do not have a match in both tables.
Here's an example of INNER JOIN with the WHERE clause:
-- join Customers and Orders table
-- with customer_id and customer fields
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
Here, the SQL command joins two tables and selects rows where the amount is greater
than or equal to 500.
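The INNER JOIN queries above can be tried with SQLite via Python's sqlite3. The Customers and Orders rows below are hypothetical. Customer 2 has no orders, and order 4 refers to a customer_id (7) that does not exist, so the effect of the join shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, amount INTEGER, customer INTEGER);
INSERT INTO Customers VALUES (1, 'John'), (2, 'Robert'), (3, 'David');
INSERT INTO Orders VALUES (1, 200, 1), (2, 500, 1), (3, 800, 3), (4, 300, 7);
""")

join_sql = """
    SELECT Customers.customer_id, Customers.first_name, Orders.amount
    FROM Customers
    INNER JOIN Orders
    ON Customers.customer_id = Orders.customer
"""
# Only matched pairs: Robert (no orders) and order 4 (no customer) are left out
all_rows = conn.execute(join_sql + " ORDER BY Orders.amount").fetchall()
big_rows = conn.execute(join_sql + " WHERE Orders.amount >= 500 ORDER BY Orders.amount").fetchall()
print(all_rows)  # [(1, 'John', 200), (1, 'John', 500), (3, 'David', 800)]
print(big_rows)  # [(1, 'John', 500), (3, 'David', 800)]
```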
SQL LEFT JOIN
The SQL LEFT JOIN combines two tables based on a common column. It then selects
records having matching values in these columns and the remaining rows from the left
table.
Syntax
SELECT columns_from_both_tables
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column2
Here,
• table1 is the left table to be joined
• table2 is the right table to be joined
• column1 and column2 are the common columns in the two tables
Example:
-- left join the Customers and Orders tables
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
LEFT JOIN Orders
ON Customers.customer_id = Orders.customer;

Here, the SQL command combines data from the Customers and Orders tables.
The query selects the customer_id and first_name from Customers and the amount from
Orders.
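Hence, the result includes the rows where customer_id from Customers matches customer from Orders, along with all the remaining rows from Customers. For customers without a matching order, amount is NULL.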
We can use the LEFT JOIN statement with an optional WHERE clause. For example,
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
LEFT JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
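Here, the SQL command joins the Customers and Orders tables and selects rows where the amount is greater than or equal to 500. Customers without an order have a NULL amount, so this WHERE condition also removes them. The result is therefore the same as that of the corresponding INNER JOIN.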
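The LEFT JOIN queries above can be tried with SQLite via Python's sqlite3, using hypothetical rows. Customer 2 has no orders, and order 4 refers to a customer that does not exist. SQLite sorts NULL before other values, so the unmatched customer comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, amount INTEGER, customer INTEGER);
INSERT INTO Customers VALUES (1, 'John'), (2, 'Robert'), (3, 'David');
INSERT INTO Orders VALUES (1, 200, 1), (2, 500, 1), (3, 800, 3), (4, 300, 7);
""")

left_sql = """
    SELECT Customers.customer_id, Customers.first_name, Orders.amount
    FROM Customers
    LEFT JOIN Orders
    ON Customers.customer_id = Orders.customer
"""
# Every customer appears; Robert has no order, so his amount is NULL (None)
all_rows = conn.execute(left_sql + " ORDER BY Orders.amount").fetchall()
# Filtering on Orders.amount drops the NULL row, giving the INNER JOIN result
big_rows = conn.execute(left_sql + " WHERE Orders.amount >= 500 ORDER BY Orders.amount").fetchall()
print(all_rows)  # [(2, 'Robert', None), (1, 'John', 200), (1, 'John', 500), (3, 'David', 800)]
print(big_rows)  # [(1, 'John', 500), (3, 'David', 800)]
```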
SQL RIGHT JOIN
The SQL RIGHT JOIN statement joins two tables based on a common column. It selects
records that have matching values in these columns and the remaining rows from the right
table.
Syntax
SELECT columns_from_both_tables
FROM table1
RIGHT JOIN table2
ON table1.column1 = table2.column2
Here,
• table1 is the left table to be joined
• table2 is the right table to be joined
• column1 and column2 are the related columns in the two tables
Example:
-- join Customers and Orders tables
-- based on customer_id of Customers and customer of Orders
-- Customers is the left table
-- Orders is the right table
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
RIGHT JOIN Orders
ON Customers.customer_id = Orders.customer;

Here, the SQL command selects the customer_id and first_name columns (from the
Customers table) and the amount column (from the Orders table).
And the result set will contain those rows where there is a match between customer_id (of
the Customers table) and customer (of the Orders table), along with all the remaining rows
from the Orders table.
The SQL RIGHT JOIN statement can have an optional WHERE clause. For example,
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
RIGHT JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
Here, the SQL command joins the Customers and Orders tables and selects rows where
the amount is greater than or equal to 500.
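SQLite added RIGHT and FULL OUTER JOIN only in version 3.39.0. This sketch therefore emulates the RIGHT JOIN above by swapping the tables in a LEFT JOIN, which returns the same rows. It uses Python's sqlite3 with hypothetical Customers and Orders rows, where order 4 refers to a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, amount INTEGER, customer INTEGER);
INSERT INTO Customers VALUES (1, 'John'), (2, 'Robert'), (3, 'David');
INSERT INTO Orders VALUES (1, 200, 1), (2, 500, 1), (3, 800, 3), (4, 300, 7);
""")

# "Customers RIGHT JOIN Orders" written as "Orders LEFT JOIN Customers":
# every order is kept, and customer columns are NULL where no customer matches
right_sql = """
    SELECT Customers.customer_id, Customers.first_name, Orders.amount
    FROM Orders
    LEFT JOIN Customers
    ON Customers.customer_id = Orders.customer
    ORDER BY Orders.amount
"""
rows = conn.execute(right_sql).fetchall()
print(rows)  # [(1, 'John', 200), (None, None, 300), (1, 'John', 500), (3, 'David', 800)]
```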
SQL FULL OUTER JOIN
The SQL FULL OUTER JOIN statement joins two tables based on a common column. It
selects records that have matching values in these columns and the remaining rows from
both tables.
Syntax
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.column1 = table2.column2;
Here,
• table1 and table2 are the tables to be joined
• column1 and column2 are the related columns in the two tables
Example: SQL FULL OUTER JOIN
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
FULL OUTER JOIN Orders
ON Customers.customer_id = Orders.customer;
Here, the SQL command selects the customer_id and first_name columns (from the
Customers table) and the amount column (from the Orders table).
The result set will contain all rows of both the tables, regardless of whether there is a match
between customer_id (of the Customers table) and customer (of the Orders table).

The SQL FULL OUTER JOIN statement can have an optional WHERE clause. For example,
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
FULL OUTER JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
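Here, the SQL command joins the two tables and selects rows where the amount is greater than or equal to 500. Rows that come only from the Customers table have a NULL amount, so the WHERE clause removes them too.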
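MySQL, and SQLite before version 3.39.0, have no FULL OUTER JOIN. A common workaround is a LEFT JOIN followed by the right-table rows that found no partner. The sketch below shows it with Python's sqlite3 module and hypothetical sample rows. The second query also shows how the WHERE clause above behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, amount INTEGER, customer INTEGER);
INSERT INTO Customers VALUES (1, 'John'), (2, 'Robert'), (3, 'David'), (5, 'Betty');
INSERT INTO Orders VALUES (1, 400, 1), (2, 250, 2), (3, 12000, 3), (6, 800, 9);
""")

# Emulated FULL OUTER JOIN:
#   part 1 = every customer, matched or not (a LEFT JOIN)
#   part 2 = orders whose customer does not exist (the unmatched right rows)
FULL_JOIN = """
    SELECT c.customer_id AS customer_id, c.first_name AS first_name, o.amount AS amount
    FROM Customers AS c LEFT JOIN Orders AS o ON c.customer_id = o.customer
    UNION ALL
    SELECT c.customer_id, c.first_name, o.amount
    FROM Orders AS o LEFT JOIN Customers AS c ON c.customer_id = o.customer
    WHERE c.customer_id IS NULL
"""
all_rows = conn.execute(FULL_JOIN).fetchall()
print(len(all_rows))  # 5: four customers (Betty with amount NULL) + the orphan order

# WHERE amount >= 500 also drops Betty, because NULL >= 500 is not TRUE.
big = conn.execute(f"SELECT * FROM ({FULL_JOIN}) WHERE amount >= 500").fetchall()
print(sorted(big, key=lambda r: r[2]))  # [(None, None, 800), (3, 'David', 12000)]
```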

SQL Views
In SQL, a view contains rows and columns like a table; however, a view doesn't hold the actual data.
You can think of a view as a virtual table, defined by a query on one or more tables, that makes the data easier to work with.
Creating a View in SQL
We can create views in SQL by using the CREATE VIEW command. For example,
CREATE VIEW us_customers AS
SELECT customer_id, first_name
FROM Customers
WHERE Country = 'USA';
Here, a view named us_customers is created from the Customers table.
Now, to select the customers who live in the USA, we can simply run:
SELECT *
FROM us_customers;
Updating a View
It's possible to change or update an existing view using the CREATE OR REPLACE VIEW command, which is supported by systems such as MySQL, PostgreSQL and Oracle. For example,
CREATE OR REPLACE VIEW us_customers AS
SELECT *
FROM Customers
WHERE Country = 'USA';
Here, the us_customers view is updated to show all the fields.
Deleting a View
We can delete views using the DROP VIEW command. For example,
DROP VIEW us_customers;
Here, the SQL command deletes the view named us_customers.
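The view life cycle above can be tried out with Python's sqlite3 module. One assumption needs flagging: SQLite has no CREATE OR REPLACE VIEW, so this sketch "updates" the view by dropping it and creating it again. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
INSERT INTO Customers VALUES (1, 'John', 'USA'), (2, 'Robert', 'USA'), (3, 'David', 'UK');
CREATE VIEW us_customers AS
    SELECT customer_id, first_name FROM Customers WHERE Country = 'USA';
""")
before = conn.execute("SELECT * FROM us_customers ORDER BY customer_id").fetchall()
print(before)  # [(1, 'John'), (2, 'Robert')]

# The view holds no data of its own: a new base-table row shows up at once.
conn.execute("INSERT INTO Customers VALUES (4, 'Betty', 'USA')")
after = conn.execute("SELECT * FROM us_customers ORDER BY customer_id").fetchall()
print(after)  # [(1, 'John'), (2, 'Robert'), (4, 'Betty')]

# SQLite stand-in for CREATE OR REPLACE VIEW: drop the view, then create it again.
conn.executescript("""
DROP VIEW IF EXISTS us_customers;
CREATE VIEW us_customers AS SELECT * FROM Customers WHERE Country = 'USA';
""")
wide = conn.execute("SELECT * FROM us_customers ORDER BY customer_id").fetchall()
print(wide[0])  # (1, 'John', 'USA'): all fields are shown now

conn.execute("DROP VIEW us_customers")  # removes only the view, not Customers
```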

SQL UPDATE
In SQL, the UPDATE statement is used to modify existing records in a database table.
Example
--update a single value in the given row
UPDATE Customers
SET age = 21
WHERE customer_id = 1;
Here, the SQL command updates the age column to 21 where the customer_id equals 1.
SQL UPDATE TABLE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
[WHERE condition];
Here,
• table_name is the name of the table to be modified
• column1, column2, ... are the names of the columns to be modified
• value1, value2, ... are the values to be set to the respective columns
• [WHERE condition] is an optional clause specifying which rows should be updated

Update a Single Value in a Row


In SQL, we can update a single value by using the UPDATE command with a WHERE clause.
For example,
-- update a single value in the given row
UPDATE Customers
SET first_name = 'Johnny'
WHERE customer_id = 1;
Here, the SQL command changes the value of the first_name column to Johnny if
customer_id is equal to 1.

Update Multiple Values in a Row


We can also update multiple values in a single row at once. For example,
-- update multiple values in the given row
UPDATE Customers
SET first_name = 'Johnny', last_name = 'Depp'
WHERE customer_id = 1;
Here, the SQL command changes the value of the first_name column to Johnny and
last_name to Depp if customer_id is equal to 1.
Update Multiple Rows

We use the UPDATE statement to update multiple rows at once. For example,
-- update multiple rows satisfying the condition
UPDATE Customers
SET country = 'NP'
WHERE age = 22;
Here, the SQL command changes the value of the country column to NP if age is 22.
If there is more than one row where age equals 22, all the matching rows will be modified.
Update all Rows
We can update all the rows in a table at once by omitting the WHERE clause. For example,
-- update all rows
UPDATE Customers
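SET country = 'NP';
Here, the SQL command changes the value of the country column to NP in every row of the Customers table, so this form should be used with care.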
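The four UPDATE forms above can be checked with Python's sqlite3 module. For each statement, the cursor's rowcount reports how many rows it changed. The table contents are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, age INTEGER, country TEXT);
INSERT INTO Customers VALUES
  (1, 'John', 'Doe', 31, 'USA'), (2, 'Robert', 'Luna', 22, 'USA'),
  (3, 'David', 'Robinson', 22, 'UK'), (4, 'John', 'Reinhardt', 25, 'UK');
""")

def run(sql):
    """Execute one UPDATE statement and return the number of rows it changed."""
    return conn.execute(sql).rowcount

single = run("UPDATE Customers SET first_name = 'Johnny' WHERE customer_id = 1")
several = run("UPDATE Customers SET first_name = 'Johnny', last_name = 'Depp' "
              "WHERE customer_id = 1")
by_age = run("UPDATE Customers SET country = 'NP' WHERE age = 22")  # two rows have age 22
everyone = run("UPDATE Customers SET country = 'NP'")              # no WHERE: every row
print(single, several, by_age, everyone)  # 1 1 2 4

johnny = conn.execute(
    "SELECT first_name, last_name FROM Customers WHERE customer_id = 1").fetchone()
print(johnny)  # ('Johnny', 'Depp')
```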

SQL LIKE and NOT LIKE Operators


We use the SQL LIKE operator with the WHERE clause to get a result set that matches the
given string pattern.
Example
-- select customers who live in the UK
SELECT first_name
FROM Customers
WHERE country LIKE 'UK';
Here, the SQL command selects the first name of customers whose country is UK.
SQL LIKE Syntax
SELECT column1, column2, ...
FROM table
WHERE column LIKE value;
Here,
• column1, column2, ... are the columns to select the data from
• table is the name of the table
• column is the column we want to apply the filter to
• LIKE matches the column with value
• value is the pattern you want to match in the specified column
Example: SQL LIKE
-- select customers who live in the UK
SELECT *
FROM Customers
WHERE country LIKE 'UK';
Here, the SQL command selects customers whose country is UK.

SQL LIKE With the % Wildcard


The LIKE operator is often used with the % wildcard, which matches any sequence of zero or more characters. For example,
-- select customers whose
-- last name starts with R
SELECT *
FROM Customers
WHERE last_name LIKE 'R%';
Here, % is a wildcard character. Hence, the SQL command selects customers whose last_name starts with R followed by zero or more characters.

SQL LIKE With the _ Wildcard


There are more wildcard characters we can use with LIKE.
Let's look at an example using the _ wildcard character.
-- select customers whose
-- country names start with U
-- followed by a single character
SELECT *
FROM Customers
WHERE country LIKE 'U_';
Here, the SQL command selects customers whose country name starts with U followed by
exactly one character.
SQL NOT LIKE Operator
We can also invert the LIKE operator by using the NOT operator with it. This returns a result set of the rows that don't match the given string pattern.
Syntax
SELECT column1, column2, ...
FROM table_name
WHERE column NOT LIKE value;

Here,
• column1, column2, ... are the columns to select the data from
• table_name is the name of the table
• column is the column we want to apply the filter to
• NOT LIKE keeps only the rows where the column does not match value
• value is the pattern you don't want to match in the specified column
For example,
-- select customers who don't live in the USA
SELECT *
FROM Customers
WHERE country NOT LIKE 'USA';
Here, the SQL command selects all customers except those whose country is USA.
SQL LIKE With Multiple Values
We can use the LIKE operator with multiple string patterns using the OR operator. For
example,
-- select customers whose last_name starts with R and ends with t
-- or customers whose last_name ends with e
SELECT *
FROM Customers
WHERE last_name LIKE 'R%t' OR last_name LIKE '%e';
Here, the SQL command selects customers whose last_name starts with R and ends with t
or customers whose last_name ends with e.
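The LIKE patterns in this section can be compared side by side with Python's sqlite3 module. The rows are hypothetical. Note also that SQLite's LIKE ignores case for ASCII letters by default, while other systems may compare case-sensitively depending on collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, last_name TEXT, country TEXT);
INSERT INTO Customers VALUES
  (1, 'Doe', 'USA'), (2, 'Luna', 'USA'), (3, 'Robinson', 'UK'),
  (4, 'Reinhardt', 'UK'), (5, 'Doe', 'UAE');
""")

def ids(condition):
    """Return the customer_ids of rows satisfying a WHERE condition.
    (Formatting SQL text like this is fine for fixed demo strings, never for user input.)"""
    sql = f"SELECT customer_id FROM Customers WHERE {condition} ORDER BY customer_id"
    return [row[0] for row in conn.execute(sql)]

print(ids("country LIKE 'UK'"))        # [3, 4]     no wildcard: the whole value must match
print(ids("last_name LIKE 'R%'"))      # [3, 4]     R followed by zero or more characters
print(ids("country LIKE 'U_'"))        # [3, 4]     U followed by exactly one character
print(ids("country NOT LIKE 'USA'"))   # [3, 4, 5]
print(ids("last_name LIKE 'R%t' OR last_name LIKE '%e'"))  # [1, 4, 5]
```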
SQL LOGICAL OPERATORS

Operator   Meaning
ALL        TRUE if all of a set of comparisons are TRUE.
AND        TRUE if both Boolean expressions are TRUE.
ANY        TRUE if any one of a set of comparisons is TRUE.
BETWEEN    TRUE if the operand is within a range.
EXISTS     TRUE if a subquery returns any rows.
IN         TRUE if the operand is equal to one of a list of expressions.
LIKE       TRUE if the operand matches a pattern.
NOT        Reverses the value of any other Boolean operator.
OR         TRUE if either Boolean expression is TRUE.
SOME       TRUE if some of a set of comparisons are TRUE (same as ANY).
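Several operators from the table (BETWEEN, IN, EXISTS, NOT and AND) are shown below with Python's sqlite3 module. SQLite does not implement ALL, ANY or SOME, so those are left out. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, age INTEGER, country TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer INTEGER);
INSERT INTO Customers VALUES (1, 31, 'USA'), (2, 22, 'USA'), (3, 22, 'UK'),
                             (4, 25, 'UK'), (5, 28, 'UAE');
INSERT INTO Orders VALUES (1, 4), (2, 4), (3, 3), (4, 1), (5, 2);
""")

def ids(condition):
    """customer_ids of the Customers rows (alias c) that satisfy the condition."""
    sql = f"SELECT customer_id FROM Customers AS c WHERE {condition} ORDER BY customer_id"
    return [row[0] for row in conn.execute(sql)]

print(ids("age BETWEEN 22 AND 25"))        # [2, 3, 4]  both ends are inclusive
print(ids("country IN ('UK', 'UAE')"))     # [3, 4, 5]
has_order = "EXISTS (SELECT 1 FROM Orders AS o WHERE o.customer = c.customer_id)"
print(ids(has_order))                      # [1, 2, 3, 4]
print(ids("NOT " + has_order))             # [5]
print(ids("age > 24 AND country = 'UK'"))  # [4]
```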

Relational Algebra
Relational algebra is a procedural query language that takes relation instances as input and returns relation instances as output. It performs queries with the help of operators, which can be unary or binary. Each operator takes relations as input and produces a relation as output. Because the output of every operator is again a relation, operators can be applied recursively (nested), and intermediate results are also relations.
Relational Algebra Operations
The following are the fundamental operations present in a relational algebra:

• Select Operation
• Project Operation
• Union Operation
• Set Difference Operation
• Cartesian Product Operation
• Rename Operation
Select Operation (or σ)
It selects the tuples of a relation that satisfy a given predicate.
The notation is: $\sigma_p(r)$
Here σ is the selection operator, r is the relation, and p is a propositional logic formula that may use connectives such as ∧ (and), ∨ (or) and ¬ (not). Its terms may use relational operators such as =, ≠, ≥, <, >, ≤.
Example
$\sigma_{subject = \text{"information"}}(Novels)$
Output: the tuples of Novels whose subject is 'information'.
$\sigma_{subject = \text{"information"} \wedge cost = \text{"150"}}(Novels)$
Output: the tuples of Novels whose subject is 'information' and whose cost is 150.
$\sigma_{subject = \text{"information"} \wedge cost = \text{"150"} \vee year > \text{"2015"}}(Novels)$
Output: the tuples of Novels whose subject is 'information' and whose cost is 150, together with the novels published after 2015.
Project Operation (or ∏)
It projects the listed column(s) of a relation and discards all other columns.
The notation is: $\Pi_{B_1, B_2, \dots, B_n}(r)$
Here B1, B2, …, Bn are attribute names of the relation r.
Remember that duplicate rows are eliminated automatically, since a relation is a set.
Example
$\Pi_{subject,\ writer}(Novels)$
Output: the subject and writer columns of the relation Novels.
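Selection and projection translate almost directly into code. The Python sketch below models a relation as a list of dicts. The Novels tuples are hypothetical, and cost is stored as a number rather than as the string "150" used above.

```python
def select(predicate, relation):
    """σ_p(r): keep only the tuples for which the predicate p is true."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Π_{B1..Bn}(r): keep only the listed attributes and drop duplicate
    tuples, because a relation is a set."""
    seen, result = set(), []
    for t in relation:
        values = tuple(t[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

Novels = [
    {"title": "DB Basics", "subject": "information", "writer": "mahesh", "cost": 150, "year": 2016},
    {"title": "Data Age", "subject": "information", "writer": "sita", "cost": 200, "year": 2014},
    {"title": "Old Tales", "subject": "history", "writer": "mahesh", "cost": 120, "year": 2018},
    {"title": "Net Notes", "subject": "information", "writer": "mahesh", "cost": 150, "year": 2012},
]

info = select(lambda t: t["subject"] == "information", Novels)
info_150 = select(lambda t: t["subject"] == "information" and t["cost"] == 150, Novels)
either = select(lambda t: (t["subject"] == "information" and t["cost"] == 150)
                or t["year"] > 2015, Novels)
print([t["title"] for t in either])  # ['DB Basics', 'Old Tales', 'Net Notes']

pairs = project(["subject", "writer"], Novels)
print(len(pairs))  # 3: ('information', 'mahesh') occurs twice in Novels but once here
```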

Union Operation (or ∪)


It performs the binary union of two relations.
The notation is: $r \cup s$
It is defined as follows:
$r \cup s = \{\, t \mid t \in r \text{ or } t \in s \,\}$
Here r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold:
• r and s must have the same number of attributes.
• The domains of corresponding attributes must be compatible.
Duplicate tuples are eliminated automatically.
$\Pi_{writer}(Novels) \cup \Pi_{writer}(Articles)$
Output: the names of writers who have written a novel, an article, or both.
Set Difference Operation (or −)
The result of the set difference operation is the set of tuples that are present in one relation but not in the other.
The notation is: $r - s$
It finds all the tuples that are present in r but not in s:
$r - s = \{\, t \mid t \in r \text{ and } t \notin s \,\}$
$\Pi_{writer}(Novels) - \Pi_{writer}(Articles)$
Output: the names of writers who have written novels but have not written articles.
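Once each relation is a set of tuples, union and set difference map onto Python's set operators. The writer names in the sketch below are hypothetical. The union-compatibility conditions are reduced to an arity check, because plain Python values carry no attribute domains to compare.

```python
def check_compatible(r, s):
    """Simplified union-compatibility: every tuple must have the same arity."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("relations do not have the same number of attributes")

def union(r, s):
    """r ∪ s: duplicates disappear automatically because r and s are sets."""
    check_compatible(r, s)
    return r | s

def difference(r, s):
    """r − s: the tuples of r that are not in s."""
    check_compatible(r, s)
    return r - s

novel_writers = {("mahesh",), ("sita",), ("ram",)}   # Π_writer(Novels)
article_writers = {("sita",), ("hari",)}             # Π_writer(Articles)

print(sorted(union(novel_writers, article_writers)))
# [('hari',), ('mahesh',), ('ram',), ('sita',)]
print(sorted(difference(novel_writers, article_writers)))
# [('mahesh',), ('ram',)]
```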
Cartesian Product Operation (or ×)
It combines every tuple of one relation with every tuple of another relation.
The notation is: $r \times s$
where r and s are relations. The output is defined as follows:
$r \times s = \{\, q\,t \mid q \in r \text{ and } t \in s \,\}$
where q t denotes the tuple formed by concatenating q and t.
$\sigma_{writer = \text{'mahesh'}}(Novels \times Articles)$
Output: a relation that shows all the novels and articles written by mahesh.
Rename Operation (or ρ)
The results of relational algebra expressions are relations, but they have no name. The rename operation allows us to name the output relation. It is denoted by the lowercase Greek letter ρ (rho).
The notation is: $\rho_x(E)$
Here the result of expression E is saved under the name x.
Query to rename the attributes Name, Age of table Department to A, B:
$\rho_{(A, B)}(Department)$
Query to rename the table Project to Pro and its attributes to P, Q, R:
$\rho_{Pro(P, Q, R)}(Project)$
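The sketch below models Cartesian product and rename in Python. A relation is a name, an attribute list and a list of tuples. Rename is what makes a selection on writer over Novels × Articles unambiguous, since both relations have a writer attribute. The product function here refuses clashing attribute names, which is an assumption of this sketch rather than a rule from the notes. The data is hypothetical.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    name: str
    attrs: tuple
    rows: list

def rename(new_name, new_attrs, r):
    """ρ_{x(A1,...,An)}(E): the same tuples under a new relation name and attribute names."""
    if len(new_attrs) != len(r.attrs):
        raise ValueError("rename needs exactly one new name per attribute")
    return Relation(new_name, tuple(new_attrs), list(r.rows))

def product(r, s):
    """r × s: every tuple of r concatenated with every tuple of s."""
    if set(r.attrs) & set(s.attrs):
        raise ValueError("attribute names clash; rename one relation first")
    rows = [q + t for q in r.rows for t in s.rows]
    return Relation(f"{r.name}_x_{s.name}", r.attrs + s.attrs, rows)

Novels = Relation("Novels", ("title", "writer"),
                  [("Old Tales", "mahesh"), ("Data Age", "sita")])
Articles = Relation("Articles", ("title", "writer"),
                    [("SQL Tips", "mahesh"), ("Indexing", "hari"), ("Joins", "mahesh")])

N = rename("N", ("n_title", "n_writer"), Novels)
A = rename("A", ("a_title", "a_writer"), Articles)
NxA = product(N, A)
print(len(NxA.rows))  # 6 = 2 novels × 3 articles

# σ_{n_writer = 'mahesh' ∧ a_writer = 'mahesh'}(N × A)
both_by_mahesh = [t for t in NxA.rows if t[1] == "mahesh" and t[3] == "mahesh"]
print(both_by_mahesh)
# [('Old Tales', 'mahesh', 'SQL Tips', 'mahesh'), ('Old Tales', 'mahesh', 'Joins', 'mahesh')]
```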

Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Types of Join operations:

EMPLOYEE
EMP_CODE | EMP_NAME
101      | Stephan
102      | Jack
103      | Harry

SALARY
EMP_CODE | SALARY
101      | 50000
102      | 30000
103      | 25000

1. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names. It is denoted by ⋈.

Example: Let's use the above EMPLOYEE and SALARY tables:
$\Pi_{EMP\_NAME,\ SALARY}(EMPLOYEE \bowtie SALARY)$
Output:
EMP_NAME | SALARY
Stephan  | 50000
Jack     | 30000
Harry    | 25000
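A natural join can be sketched as a nested loop that keeps each pair of tuples agreeing on all common attributes and merges the pair. The sketch below is a minimal Python illustration over the EMPLOYEE and SALARY tables above, with each tuple as a dict. It is not how a DBMS would implement the join efficiently.

```python
def natural_join(r, s):
    """R ⋈ S: combine tuples that agree on every attribute the relations share.
    Assumes all tuples of a relation have the same attribute names (dict keys)."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # attribute names appearing in both relations
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

def project(attributes, relation):
    """Π: keep the listed attributes, removing duplicate tuples."""
    out = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in out:
            out.append(row)
    return out

EMPLOYEE = [{"EMP_CODE": 101, "EMP_NAME": "Stephan"},
            {"EMP_CODE": 102, "EMP_NAME": "Jack"},
            {"EMP_CODE": 103, "EMP_NAME": "Harry"}]
SALARY = [{"EMP_CODE": 101, "SALARY": 50000},
          {"EMP_CODE": 102, "SALARY": 30000},
          {"EMP_CODE": 103, "SALARY": 25000}]

print(project(["EMP_NAME", "SALARY"], natural_join(EMPLOYEE, SALARY)))
# [('Stephan', 50000), ('Jack', 30000), ('Harry', 25000)]
```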
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information. Example:

EMPLOYEE
EMP_NAME | STREET      | CITY
Ram      | Civil line  | Mumbai
Shyam    | Park street | Kolkata
Ravi     | M.G. Street | Delhi
Hari     | Nehru nagar | Hyderabad

FACT_WORKERS
EMP_NAME | BRANCH  | SALARY
Ram      | Infosys | 10000
Shyam    | Wipro   | 20000
Kuber    | HCL     | 30000
Hari     | TCS     | 50000

EMPLOYEE ⋈ FACT_WORKERS
Output:
EMP_NAME | STREET      | CITY      | BRANCH  | SALARY
Ram      | Civil line  | Mumbai    | Infosys | 10000
Shyam    | Park street | Kolkata   | Wipro   | 20000
Hari     | Nehru nagar | Hyderabad | TCS     | 50000
An outer join is basically of three types:
• Left outer join
• Right outer join
• Full outer join

a. Left outer join:
A left outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names. In addition, it contains the tuples in R that have no matching tuples in S, with NULL for the attributes of S. It is denoted by ⟕.
Example: Using the above EMPLOYEE and FACT_WORKERS tables:
EMPLOYEE ⟕ FACT_WORKERS
Output:
EMP_NAME | STREET      | CITY      | BRANCH  | SALARY
Ram      | Civil line  | Mumbai    | Infosys | 10000
Shyam    | Park street | Kolkata   | Wipro   | 20000
Hari     | Nehru nagar | Hyderabad | TCS     | 50000
Ravi     | M.G. Street | Delhi     | NULL    | NULL
b. Right outer join:
A right outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names. In addition, it contains the tuples in S that have no matching tuples in R, with NULL for the attributes of R. It is denoted by ⟖.
Example: Using the above EMPLOYEE and FACT_WORKERS relations:
EMPLOYEE ⟖ FACT_WORKERS
Output:
EMP_NAME | BRANCH  | SALARY | STREET      | CITY
Ram      | Infosys | 10000  | Civil line  | Mumbai
Shyam    | Wipro   | 20000  | Park street | Kolkata
Hari     | TCS     | 50000  | Nehru nagar | Hyderabad
Kuber    | HCL     | 30000  | NULL        | NULL


c. Full outer join:
A full outer join is like a left or right outer join, except that it contains all rows from both tables.
In a full outer join, the tuples in R that have no matching tuples in S and the tuples in S that have no matching tuples in R (on their common attribute names) are also included, padded with NULL. It is denoted by ⟗.
Example: Using the above EMPLOYEE and FACT_WORKERS tables:
EMPLOYEE ⟗ FACT_WORKERS
Output:
EMP_NAME | STREET      | CITY      | BRANCH  | SALARY
Ram      | Civil line  | Mumbai    | Infosys | 10000
Shyam    | Park street | Kolkata   | Wipro   | 20000
Hari     | Nehru nagar | Hyderabad | TCS     | 50000
Ravi     | M.G. Street | Delhi     | NULL    | NULL
Kuber    | NULL        | NULL      | HCL     | 30000
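The three outer joins differ only in which unmatched tuples they keep and pad with NULL. The Python sketch below uses None for NULL, dicts for tuples, and EMP_NAME as the common attribute. It reproduces the three outputs above and is an illustration, not an efficient implementation.

```python
EMPLOYEE = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]

def outer_join(r, s, keep_left, keep_right):
    """Natural join on the shared attributes plus unmatched tuples padded with None.
    Assumes every tuple of a relation has the same attribute names (dict keys)."""
    common = set(r[0]) & set(s[0])
    r_only = [a for a in r[0] if a not in common]
    s_only = [a for a in s[0] if a not in common]

    def match(tr, ts):
        return all(tr[a] == ts[a] for a in common)

    result = [{**tr, **ts} for tr in r for ts in s if match(tr, ts)]
    if keep_left:    # tuples of R with no partner in S
        result += [{**tr, **dict.fromkeys(s_only)} for tr in r
                   if not any(match(tr, ts) for ts in s)]
    if keep_right:   # tuples of S with no partner in R
        result += [{**dict.fromkeys(r_only), **ts} for ts in s
                   if not any(match(tr, ts) for tr in r)]
    return result

left = outer_join(EMPLOYEE, FACT_WORKERS, keep_left=True, keep_right=False)   # ⟕
right = outer_join(EMPLOYEE, FACT_WORKERS, keep_left=False, keep_right=True)  # ⟖
full = outer_join(EMPLOYEE, FACT_WORKERS, keep_left=True, keep_right=True)    # ⟗

for row in full:
    print(row["EMP_NAME"], row["CITY"], row["BRANCH"])
# Ram Mumbai Infosys
# Shyam Kolkata Wipro
# Hari Hyderabad TCS
# Ravi Delhi None
# Kuber None HCL
```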
3. Equi join:
It is a type of inner join and the most common join. It combines tuples based on matching data according to an equality condition, using the comparison operator (=). Example:
CUSTOMER
CLASS_ID | NAME
1        | John
2        | Harry
3        | Jackson

PRODUCT
PRODUCT_ID | CITY
1          | Delhi
2          | Mumbai
3          | Noida

$CUSTOMER \bowtie_{CUSTOMER.CLASS\_ID = PRODUCT.PRODUCT\_ID} PRODUCT$
Output:
CLASS_ID | NAME    | PRODUCT_ID | CITY
1        | John    | 1          | Delhi
2        | Harry   | 2          | Mumbai
3        | Jackson | 3          | Noida
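Unlike the natural join, an equi join names the two attributes to compare. That is why it works here even though CLASS_ID and PRODUCT_ID have different names. Below is a minimal Python sketch, with tuples as dicts:

```python
def equi_join(r, s, r_attr, s_attr):
    """R ⋈_{R.r_attr = S.s_attr} S: keep the pairs whose two named attributes are equal."""
    return [{**tr, **ts} for tr in r for ts in s if tr[r_attr] == ts[s_attr]]

CUSTOMER = [{"CLASS_ID": 1, "NAME": "John"},
            {"CLASS_ID": 2, "NAME": "Harry"},
            {"CLASS_ID": 3, "NAME": "Jackson"}]
PRODUCT = [{"PRODUCT_ID": 1, "CITY": "Delhi"},
           {"PRODUCT_ID": 2, "CITY": "Mumbai"},
           {"PRODUCT_ID": 3, "CITY": "Noida"}]

result = equi_join(CUSTOMER, PRODUCT, "CLASS_ID", "PRODUCT_ID")
for row in result:
    print(row["CLASS_ID"], row["NAME"], row["PRODUCT_ID"], row["CITY"])
# 1 John 1 Delhi
# 2 Harry 2 Mumbai
# 3 Jackson 3 Noida
```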

Unit 5. Query Processing [11 Marks]


5.1. Introduction to Query Processing
Query processing is the set of activities involved in extracting data from the database. It takes the following steps to fetch the data:
• Parsing and translation
• Optimization
• Evaluation
1. Parsing and Translation
Query processing starts by translating the user's query. Queries are written in a high-level database language such as SQL, which is convenient for humans but not suitable as the system's internal representation of a query. The query is therefore translated into an internal form, such as relational algebra, that can be used at the physical level of the file system. Query-optimizing transformations and the actual evaluation of the query take place after this step. Translation works like a parser: when a user executes a query, the parser checks the syntax of the query and verifies that the relations and attributes named in it exist in the database. It then builds a tree representation of the query, known as the parse tree, and translates it into relational algebra. During this step, it also replaces every use of a view in the query with the view's definition.
The working of query processing is illustrated in the diagram below.
Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. In SQL, suppose a user wants to fetch the salaries of the employees whose salary is greater than 10000. For this, the following query is used:
select salary from Employee where salary > 10000;
To make the system understand the user query, it needs to be translated into relational algebra. The query can be written in either of the following equivalent forms:
$\sigma_{salary > 10000}(\Pi_{salary}(Employee))$
$\Pi_{salary}(\sigma_{salary > 10000}(Employee))$
After the query is translated, each relational algebra operation can be executed using one of several algorithms. This is how query processing begins.
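You can check that the two relational algebra forms give the same answer on a small, hypothetical Employee relation. The Python sketch below uses sets because relational algebra relations are sets. The SQL query itself, by contrast, would return duplicate salaries unless DISTINCT were added.

```python
Employee = [("Ram", 8000), ("Sita", 12000), ("Hari", 15000), ("Gita", 12000)]  # (name, salary)

def select_salary_gt(limit, rows):
    """σ_{salary > limit}; salary is the last attribute in both tuple shapes used here."""
    return {t for t in rows if t[-1] > limit}

def project_salary(rows):
    """Π_salary, returned as a set of 1-tuples."""
    return {(t[-1],) for t in rows}

form_1 = select_salary_gt(10000, project_salary(Employee))   # σ(Π(Employee))
form_2 = project_salary(select_salary_gt(10000, Employee))   # Π(σ(Employee))
print(sorted(form_1), form_1 == form_2)   # [(12000,), (15000,)] True
```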

2. Evaluation
In addition to translating the query into relational algebra, the system must annotate the translated expression with instructions that specify how each operation is to be evaluated. After translating the user query, the system then executes a query evaluation plan.
Query Evaluation Plan

• To fully evaluate a query, the system needs to construct a query evaluation plan.
• The annotations in the evaluation plan may specify the algorithm to be used for a particular operation, or the particular index to be used.
• Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the
operation.
• Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
• A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the
user query.
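Real systems let you inspect the evaluation plan they chose. As one example, SQLite's EXPLAIN QUERY PLAN shows the plan for the salary query switching from a full scan to an index search once an index exists. Here it runs through Python's sqlite3 module on a hypothetical Employee table. The exact wording of the plan text varies between SQLite versions, so the comments show typical output only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee (name, salary) VALUES (?, ?)",
                 [("Ram", 8000), ("Sita", 12000), ("Hari", 15000)])

QUERY = "SELECT salary FROM Employee WHERE salary > 10000"

def plan(sql):
    """Return the 'detail' text of each step in SQLite's chosen evaluation plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

without_index = plan(QUERY)
print(without_index)   # e.g. ['SCAN Employee']: a linear scan of the whole table

conn.execute("CREATE INDEX idx_salary ON Employee(salary)")
with_index = plan(QUERY)
print(with_index)      # e.g. ['SEARCH Employee USING COVERING INDEX idx_salary (salary>?)']
```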
3. Optimization
• The cost of evaluating a query can vary widely between different evaluation plans for the same query. Because the system is responsible for constructing the evaluation plan, the user does not need to write the query in its most efficient form.
• Usually, a database system generates an efficient query evaluation plan, one that minimizes its cost. This task, performed by the database system, is known as query optimization.
• To optimize a query, the query optimizer needs an estimated cost for each operation, because the overall cost depends on factors such as the memory allocated to the operations, their execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces
the output of the query.

5.2. Query Cost Estimation


The cost of a query can be measured by having the system generate several alternative evaluation plans for the query and compare them in terms of their estimated cost. To work out the total estimated cost of a plan, the costs of the individual operations in the plan are estimated and combined into the estimated cost of the whole query evaluation plan.
Example: We use the number of block transfers from disk and the number of disk seeks to estimate the cost of a query evaluation plan. If the disk subsystem takes an average of $t_T$ seconds to transfer a block of data and has an average block-access time (disk seek time plus rotational latency) of $t_S$ seconds, then an operation that transfers b blocks and performs S seeks takes $b \cdot t_T + S \cdot t_S$ seconds. The values of $t_T$ and $t_S$ must be calibrated for the disk system used, but typical values for a high-end disk are $t_S$ = 4 milliseconds and $t_T$ = 0.1 milliseconds, assuming a 4-kilobyte block size and a transfer rate of 40 megabytes per second.
$t_T$ – time to transfer one block
$t_S$ – time for one seek
Cost for b block transfers plus S seeks:
$b \cdot t_T + S \cdot t_S$
The cost of a query evaluation plan depends on the following resources:
• The number of disk accesses.
• The CPU time taken to execute the query.
• The communication cost in distributed or parallel database systems.
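The cost formula can be evaluated directly. The Python sketch below plugs in the typical values quoted above ($t_S$ = 4 ms, $t_T$ = 0.1 ms). The two access patterns it compares are hypothetical examples.

```python
def plan_cost_ms(block_transfers, seeks, t_transfer_ms=0.1, t_seek_ms=4.0):
    """Estimated I/O time b * tT + S * tS, in milliseconds."""
    return block_transfers * t_transfer_ms + seeks * t_seek_ms

# Reading 1000 blocks stored one after another: one seek, then 1000 transfers.
sequential = plan_cost_ms(block_transfers=1000, seeks=1)
# Reading 100 blocks scattered over the disk: one seek per block.
scattered = plan_cost_ms(block_transfers=100, seeks=100)
print(round(sequential, 1), round(scattered, 1))  # 104.0 410.0

# Where tT = 0.1 ms comes from: a 4 KB block at 40 MB per second.
print(round(4 / 40_000 * 1000, 2))  # 0.1 (ms)
```

Even though it moves ten times as many blocks, the sequential read is about four times cheaper, because seeks dominate the cost.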

5.3. Query Operations


Selection

• A1 (linear search): Scan each file block and test every record to see whether it satisfies the selection condition.
• A2 (binary search): Applicable when the selection is an equality comparison on the attribute on which the file is ordered.
• A3 (Primary index on candidate key, equality)
• A4 (primary index on non-key, equality.)
• A5 (secondary index on search key, equality.)
• A6 (primary index, comparison)
• A7 (secondary index, comparison)
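A1 and A2 can be contrasted with a small Python sketch. An in-memory list stands in for the file, which is a simplification, since a real DBMS works block by block. Linear search tests every record. Binary search needs the file ordered on the search attribute and examines only about log2(n) records before reaching the matches.

```python
def linear_search(records, attr, value):
    """A1: test every record; works whatever the file order."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(records, attr, value):
    """A2: equality on the attribute the file is ordered on."""
    lo, hi = 0, len(records)
    while lo < hi:                       # find the first record with rec[attr] >= value
        mid = (lo + hi) // 2
        if records[mid][attr] < value:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(records) and records[lo][attr] == value:   # equal keys are adjacent
        matches.append(records[lo])
        lo += 1
    return matches

# (emp_id, dept) records, ordered on dept as A2 requires
employee_file = sorted([(101, "HR"), (102, "IT"), (103, "IT"), (104, "Sales"), (105, "Admin")],
                       key=lambda rec: rec[1])
print(binary_search(employee_file, 1, "IT"))   # [(102, 'IT'), (103, 'IT')]
```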
Sorting:
- Quick sort (records fit completely in main memory)
- External sort-merge (records are on disk and do not fit in main memory)
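External sort-merge can be sketched in Python by pretending that main memory holds only a few records. Phase 1 sorts runs that fit in memory, and phase 2 merges all the runs with heapq.merge. In a real system each run would be written to disk. Here the runs stay in lists, which is an assumption of the sketch.

```python
import heapq

def external_sort(records, memory_size):
    """Sort-merge: sort runs that fit in 'memory', then k-way merge them."""
    # Phase 1: create sorted runs of at most memory_size records each.
    runs = [sorted(records[i:i + memory_size])
            for i in range(0, len(records), memory_size)]
    # Phase 2: merge the runs; heapq.merge consumes them lazily, one record at a time.
    return list(heapq.merge(*runs)), len(runs)

data = [27, 3, 15, 8, 42, 1, 19, 33, 6, 11]
result, n_runs = external_sort(data, memory_size=3)
print(n_runs, result)   # 4 [1, 3, 6, 8, 11, 15, 19, 27, 33, 42]
```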
Join Operation
1. Nested-loop join
r ⋈θ s
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if it satisfies the join condition θ
        if it does, add tr · ts to the result
    end
end
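The nested-loop pseudocode translates line for line into Python. In this sketch the join condition θ is passed in as a function, so the same code handles any theta join. The EMP_CODE data comes from the EMPLOYEE and SALARY tables used earlier.

```python
def nested_loop_join(r, s, theta):
    """r ⋈θ s: test every pair (tr, ts) and keep the pairs that satisfy θ."""
    result = []
    for tr in r:                        # outer relation
        for ts in s:                    # inner relation, scanned once per outer tuple
            if theta(tr, ts):
                result.append(tr + ts)  # tr · ts: concatenate the two tuples
    return result

employee = [(101, "Stephan"), (102, "Jack"), (103, "Harry")]   # (EMP_CODE, EMP_NAME)
salary = [(101, 50000), (102, 30000), (103, 25000)]            # (EMP_CODE, SALARY)

joined = nested_loop_join(employee, salary, lambda tr, ts: tr[0] == ts[0])
print(joined)
# [(101, 'Stephan', 101, 50000), (102, 'Jack', 102, 30000), (103, 'Harry', 103, 25000)]
```

The join always makes |r| × |s| comparisons, whatever the condition. That cost is why the sort-based and hash-based algorithms below exist.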
2. Merge join
• Sort both relations on their join attributes, then scan them in step, matching tuples that have the same join-attribute values.
• Used for equi-joins and natural joins.
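Here is the merge-join idea as a short Python sketch. Both inputs are sorted on the join attribute and then scanned once in step. The only subtlety is a join value that occurs several times on both sides, where every combination must be produced. The sample tuples are hypothetical.

```python
from operator import itemgetter

def merge_join(r, s, r_key, s_key):
    """Equi-join by sorting both inputs on the join attribute and merging them."""
    r, s = sorted(r, key=r_key), sorted(s, key=s_key)
    i = j = 0
    result = []
    while i < len(r) and j < len(s):
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == kr:
                j_end += 1
            # Pair every r tuple with this value with every tuple in the group.
            while i < len(r) and r_key(r[i]) == kr:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
print(merge_join(r, s, itemgetter(0), itemgetter(0)))
# [(2, 'b', 2, 'x'), (2, 'b', 2, 'z'), (2, 'c', 2, 'x'), (2, 'c', 2, 'z')]
```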
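3. Hash join
• Used for equi-joins and natural joins.
• A hash function h is used to partition the tuples of both relations on the join attributes, so that only tuples in matching partitions need to be compared.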
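A hash join can be sketched in the same style. The hash function h, here Python's built-in hash modulo a hypothetical partition count, splits both relations so that matching tuples always land in the same partition. Each partition of s is then loaded into an in-memory hash table (the build phase) and probed with the tuples of the matching r partition (the probe phase).

```python
from collections import defaultdict
from operator import itemgetter

def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Equi-join: partition both inputs with hash function h, then join partition by partition."""
    def h(value):
        return hash(value) % n_partitions

    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for tr in r:
        r_parts[h(r_key(tr))].append(tr)
    for ts in s:
        s_parts[h(s_key(ts))].append(ts)

    result = []
    for p in range(n_partitions):
        # Build: an in-memory hash index on this partition of s.
        index = defaultdict(list)
        for ts in s_parts[p]:
            index[s_key(ts)].append(ts)
        # Probe: an r tuple can only match s tuples from the same partition.
        for tr in r_parts[p]:
            result.extend(tr + ts for ts in index.get(r_key(tr), []))
    return result

r = [(2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
print(sorted(hash_join(r, s, itemgetter(0), itemgetter(0))))
# [(2, 'b', 2, 'x'), (2, 'b', 2, 'z'), (2, 'c', 2, 'x'), (2, 'c', 2, 'z')]
```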

5.4. Evaluation of Expressions


Materialization:

• Executes a single operation at a time, generating a temporary relation that is used as input for the next operation.
• It is easy to implement but time consuming.
• It walks the parse tree of the relational algebra expression and performs the innermost (lowest-level) operations first.
• The result is materialized and becomes input for the next operations.
• The cost is the sum of the costs of the individual operations plus the cost of writing intermediate results to disk.
• It can always be applied.
Pipelining

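• With pipelining, several operations are combined into a pipeline, and results are passed from one operation to the next as soon as they are computed.
• Avoids intermediate temporary relations.
• Cheaper, as there is no cost of writing intermediate results to disk.
• It is not always possible; for example, a sort must see all of its input before it can output anything.
• In a demand-driven (lazy) pipeline, the system repeatedly requests the next tuple from the top-level operation, and each operation requests the next tuple from its child operations as needed.
• In a producer-driven (eager) pipeline, operators produce tuples eagerly and pass them to their parent operations.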
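Python generators give a compact model of demand-driven pipelining: each operator pulls the next tuple from its child only when its own consumer asks for one. The sketch below uses hypothetical Employee tuples. It computes Π_name(σ_salary>10000(Employee)) both ways and counts how many stored tuples have been read when the first result appears.

```python
def scan(relation, log):
    """Leaf operator: yield stored tuples one at a time, logging each read."""
    for t in relation:
        log.append(t)
        yield t

def select(predicate, child):
    for t in child:              # pulls the next tuple from its child on demand
        if predicate(t):
            yield t

def project(index, child):
    for t in child:
        yield (t[index],)

employee = [("Ram", 8000), ("Sita", 12000), ("Hari", 15000)]  # (name, salary)

# Materialization: each operation runs to completion into a temporary result.
temp = [t for t in employee if t[1] > 10000]     # σ_{salary > 10000}, stored
materialized = [(t[0],) for t in temp]           # Π_name reads the temporary

# Demand-driven pipelining: nothing runs until the consumer asks for a tuple.
reads = []
pipeline = project(0, select(lambda t: t[1] > 10000, scan(employee, reads)))
first = next(pipeline)
reads_after_first = len(reads)
print(first, reads_after_first)   # ('Sita',) 2: only two stored tuples read so far
rest = list(pipeline)
print([first] + rest == materialized)   # True
```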

5.5. Query Optimization


Query optimization is the process of selecting the most efficient execution plan for a given
query. The query optimizer analyzes various potential execution plans and chooses the one
with the lowest estimated cost. It considers factors such as available indexes, statistics
about the data, and the complexity of the query to determine the optimal plan.
The goal of query optimization is to reduce the system resources required to fulfill a query,
and ultimately provide the user with the correct result set faster.
First, it provides the user with faster results, which makes the application seem faster to
the user.
Secondly, it allows the system to service more queries in the same amount of time,
because each request takes less time than unoptimized queries.
Thirdly, query optimization ultimately reduces the amount of wear on the hardware (e.g.
disk drives) and allows the server to run more efficiently (e.g. lower power consumption,
less memory usage).
There are broadly two ways a query can be optimized:

• Analyze and transform equivalent relational expressions: Try to minimize the tuple
and column counts of the intermediate and final query processes (discussed here).
• Using different algorithms for each operation: These underlying algorithms
determine how tuples are accessed from the data structures they are stored in,
indexing, hashing, data retrieval and hence influence the number of disk and block
accesses (discussed in query processing).

Equivalence Rules
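1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{t_1}(\Pi_{t_2}(\dots(\Pi_{t_n}(E))\dots)) = \Pi_{t_1}(E)$, where $t_1 \subseteq t_2 \subseteq \dots \subseteq t_n$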
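4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_\theta(E_1 \times E_2) = E_1 \bowtie_\theta E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
5. Theta-join operations (and natural joins) are commutative.
$E_1 \bowtie_\theta E_2 = E_2 \bowtie_\theta E_1$
6. (a) Natural join operations are associative:
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
(b) Theta joins are associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.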
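7. The selection operation distributes over the theta join operation under the following two conditions:
(a) when all the attributes in selection condition $\theta_0$ involve only the attributes of one of the expressions being joined (say $E_1$):
$\sigma_{\theta_0}(E_1 \bowtie_\theta E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_\theta E_2$
(b) when selection condition $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$:
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_\theta E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_\theta (\sigma_{\theta_2}(E_2))$
8. The projection operation distributes over the theta join operation as follows:
(a) if $\theta$ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_\theta E_2) = (\Pi_{L_1}(E_1)) \bowtie_\theta (\Pi_{L_2}(E_2))$
(b) Consider a join $E_1 \bowtie_\theta E_2$.
- Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
- Let $L_3$ be attributes of $E_1$ that are involved in join condition $\theta$ but are not in $L_1 \cup L_2$, and
- let $L_4$ be attributes of $E_2$ that are involved in join condition $\theta$ but are not in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_\theta E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_\theta (\Pi_{L_2 \cup L_4}(E_2)))$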


9. The set operations union and intersection are commutative.
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
Note: set difference is not commutative.
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

11. The selection operation distributes over ∪, ∩ and −.
$\sigma_\theta(E_1 - E_2) = \sigma_\theta(E_1) - \sigma_\theta(E_2)$
and similarly for ∪ and ∩ in place of −.
Also: $\sigma_\theta(E_1 - E_2) = \sigma_\theta(E_1) - E_2$
and similarly for ∩ in place of −, but not for ∪.


12. The projection operation distributes over union.
$\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$
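Equivalence rules can be spot-checked on small sample relations. A check like the one below is not a proof, but it makes the rules concrete and catches mistakes such as treating set difference as commutative. The relations are Python sets of tuples, and the conditions θ1 and θ2 are hypothetical.

```python
E1 = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
E2 = {(2, "b"), (4, "d"), (5, "e")}

def sigma(theta, relation):
    """Selection σ_θ over a set of tuples."""
    return {t for t in relation if theta(t)}

def theta1(t):
    return t[0] > 1

def theta2(t):
    return t[0] % 2 == 0

rule_1 = sigma(lambda t: theta1(t) and theta2(t), E1) == sigma(theta1, sigma(theta2, E1))
rule_2 = sigma(theta1, sigma(theta2, E1)) == sigma(theta2, sigma(theta1, E1))
rule_9 = ((E1 | E2) == (E2 | E1)) and ((E1 & E2) == (E2 & E1))
rule_11 = (sigma(theta1, E1 - E2) == (sigma(theta1, E1) - sigma(theta1, E2))
           == (sigma(theta1, E1) - E2))
print(rule_1, rule_2, rule_9, rule_11)   # True True True True
print((E1 - E2) == (E2 - E1))            # False: set difference is not commutative
```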

Choice of Evaluation plan


1) Cost-based optimization
a. It explores the space of all query evaluation plans that are equivalent to the given query and chooses the one with the least estimated cost.
b. Exploring the space of all possible plans may be expensive.
c. It finds the plan with the least estimated cost; this plan is optimal only to the extent that the cost estimates are accurate.
2) Heuristic optimization
a. The system uses heuristics to reduce the number of choices to evaluate.
b. A query is transformed by using a set of rules such as:
i. Perform selections early.
ii. Perform projections early.
iii. Perform the most restrictive selection and join operations before other similar operations.
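The heuristic "perform selections early" pays off because it shrinks intermediate results. The Python sketch below uses hypothetical employees and departments. It counts the intermediate tuples produced when the selection on dept_name runs after the Cartesian product and when it runs before it. Both plans return the same 100 tuples.

```python
employees = [(i, f"emp{i}", i % 10) for i in range(1000)]   # (emp_id, name, dept_id)
departments = [(d, f"dept{d}") for d in range(10)]           # (dept_id, dept_name)

# Plan A: σ_{dept_id match ∧ dept_name = 'dept3'}(employees × departments)
product_a = [e + d for e in employees for d in departments]
plan_a = [t for t in product_a if t[2] == t[3] and t[4] == "dept3"]

# Plan B: apply σ_{dept_name = 'dept3'} to departments first, then form the product.
dept3 = [d for d in departments if d[1] == "dept3"]
product_b = [e + d for e in employees for d in dept3]
plan_b = [t for t in product_b if t[2] == t[3]]

print(len(product_a), len(product_b))   # 10000 1000
print(len(plan_a), plan_a == plan_b)    # 100 True
```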

5.6. Performance Tuning
SQL Performance tuning is the process of enhancing SQL queries to speed up the server
performance. Performance tuning in SQL shortens the time it takes for a user to receive a
response after sending a query and utilizes fewer resources in the process. The idea is that
users can occasionally produce the same intended result set with a faster-running query.
Factors Affecting SQL Speed
Some of the major factors that influence the computation and execution time in SQL are:
Table size: Performance may be impacted if your query hits one or more tables with
millions of rows or more.
Joins: Your query is likely to be slow if it joins two tables in a way that significantly raises
the number of rows in the return set.
Aggregations: Adding several rows together to create a single result needs more
processing than just retrieving those values individually.
The following techniques can be used to optimize SQL queries:

• SELECT fields instead of using SELECT *
• Avoid SELECT DISTINCT
• Create queries with INNER JOIN (not WHERE or cross join)
• Use WHERE instead of HAVING to define filters
• Use wildcards at the end of a phrase only
• Use LIMIT to sample query results
• Run your query during off-peak hours
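The effect of wildcard placement can be seen in a query plan. The sketch below uses SQLite through Python's sqlite3 module on a hypothetical Customers table. last_name is declared COLLATE NOCASE because SQLite's default LIKE ignores case, and it can then only use an index with that collation. A pattern with the wildcard at the end can use the index, while a leading wildcard forces a full scan. The plan text differs between SQLite versions, so the comments show typical output only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT COLLATE NOCASE);
CREATE INDEX idx_last_name ON Customers(last_name);
INSERT INTO Customers (first_name, last_name) VALUES
  ('John', 'Doe'), ('Robert', 'Luna'), ('David', 'Robinson'), ('John', 'Reinhardt');
""")

def plan(sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Named columns and a wildcard only at the end: the index can narrow the search.
trailing = plan("SELECT first_name, last_name FROM Customers WHERE last_name LIKE 'R%'")
# A leading wildcard: every row has to be examined.
leading = plan("SELECT first_name, last_name FROM Customers WHERE last_name LIKE '%e'")
print(trailing)   # e.g. SEARCH Customers USING INDEX idx_last_name (last_name>? AND last_name<?)
print(leading)    # e.g. SCAN Customers
```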
SQL Performance Tuning Tools
Here are some tools for SQL Performance Tuning:

• SQL Sentry (SolarWinds)
• SQL Doctor (IDERA)
• Profiler (Microsoft)
• Foglight for SQL Server (Quest Software)
• SQL Index Manager (Red Gate Software)
• Qure Optimizer (DBSophic)
• Database Performance Analyzer (SolarWinds)
• Applications Manager (ManageEngine)
• NaviCat for SQL Server (NaviCat)

Unit 6. Transaction and Concurrency Control [11 Marks]


6.1. Introduction to Transaction
A transaction is a set of operations that together perform a single logical unit of work. Usually, a transaction changes the data present in the database. Protecting user data from system failures is one of the primary uses of a DBMS.
Let us take the example of a simple transaction. Suppose someone transfers Rs 1000 from X’s account to Y’s account. Even this small and simple transaction involves several low-level tasks.
X’s Account
Open_Account(X)
Old_Bank_Balance = X.balance
New_Bank_Balance = Old_Bank_Balance – 1000
X.balance = New_Bank_Balance
Close_Bank_Account(X)
Y’s Account
Open_Account(Y)
Old_Bank_Balance = Y.balance
New_Bank_Balance = Old_Bank_Balance + 1000
Y.balance = New_Bank_Balance
Close_Bank_Account(Y)
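The transfer above is a textbook case for a transaction. The Python sketch below uses the built-in sqlite3 module. The Account table, its CHECK constraint and the TransferError class are hypothetical. Using the connection as a context manager either commits both updates together or rolls both back, which is the atomicity property described below.

```python
import sqlite3

class TransferError(Exception):
    """Raised when an account named in the transfer does not exist."""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("X", 5000), ("Y", 2000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction: both updates happen or neither does."""
    with conn:  # COMMIT if the block finishes, ROLLBACK if an exception escapes it
        debit = conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?",
                             (amount, src))
        credit = conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?",
                              (amount, dst))
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise TransferError(f"unknown account in transfer {src} -> {dst}")

transfer(conn, "X", "Y", 1000)                # succeeds: X 4000, Y 3000

for src, dst, amount in [("X", "Z", 500),     # Z does not exist: X's debit is undone
                         ("X", "Y", 99999)]:  # would make X negative: the CHECK fails
    try:
        transfer(conn, src, dst, amount)
    except (TransferError, sqlite3.IntegrityError) as exc:
        print("rolled back:", exc)

balances = dict(conn.execute("SELECT name, balance FROM Account"))
print(balances)   # {'X': 4000, 'Y': 3000}
```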

ACID Properties of Transaction


The transaction refers to a small unit of any given program that consists of various low-level
tasks. Every transaction in DBMS must maintain ACID – A (Atomicity), C (Consistency), I
(Isolation), D (Durability). One must maintain ACID so as to ensure completeness,
accuracy, and integrity of data.
1. Atomicity
The property of atomicity states that a transaction must be treated as an atomic unit: either all of its operations are executed or none of them are. The database must never be left in a state where a transaction is partially completed. It must be either in the state before the transaction started or in the state after the transaction completed; if the transaction fails or is aborted, its partial effects must be undone.
2. Consistency
The property of consistency states that the database must remain in a consistent state after any transaction. A transaction must never have a damaging effect on the data and information stored in the database: if the database is in a consistent state before a transaction executes, it must remain consistent after the transaction executes.
3. Durability
The property of durability states that once a transaction commits, its updates must persist in the database, even if the system suddenly restarts or fails. If a transaction updates and commits some data, the database holds the modified data. If a transaction commits but the system fails before the data is written to disk, the data is updated once the system comes back up.
4. Isolation
The property of isolation states that when multiple transactions are executed concurrently
in a database system, each transaction must execute as if it were the only transaction in
the system. No transaction should be affected by the intermediate, uncommitted effects of
any other transaction.
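Atomicity can be tried out with Python's built-in sqlite3 module. The sketch below is a minimal illustration, not part of the original notes: the table name `account`, the starting balances, and the helper names `make_db`, `transfer` and `balances` are hypothetical, chosen to mirror the X-to-Y transfer in the pseudocode above. It relies on the documented behaviour that a sqlite3 connection used as a context manager commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts, X and Y."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 5000), ("Y", 2000)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst: both updates happen, or neither does."""
    with conn:  # commit on success, rollback if an exception is raised
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (left,) = conn.execute("SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if left < 0:
            # The debit has already been executed; raising here makes sqlite3 undo it.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "X", "Y", 1000)
    print(balances(conn))  # {'X': 4000, 'Y': 3000}
    try:
        transfer(conn, "X", "Y", 10_000)  # fails after the debit
    except ValueError:
        pass
    print(balances(conn))  # still {'X': 4000, 'Y': 3000}
```

The second transfer fails after its first UPDATE has run, yet the balances are unchanged: the partial debit never becomes visible.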
Operations of Transaction
A user can make different types of requests to access and modify the contents of a
database. So, we have different types of operations relating to a transaction. They are
discussed as follows:
i) Read(X)
A read operation is used to read the value of X from the database and store it in a buffer in
the main memory for further actions such as displaying that value. Such an operation is
performed when a user wishes just to see any content of the database and not make any
changes to it. For example, when a user wants to check his/her account’s balance, a read
operation is performed on the user’s account balance in the database.
ii) Write(X)
A write operation is used to write a value from the buffer in main memory to the database.
For a write operation to be performed, a read operation is first performed to bring the
item’s value into the buffer. Some changes are then made to it, e.g. arithmetic operations
according to the user’s request, and a write operation stores the modified value back in
the database. For example, when a user requests to withdraw some money from his account,
the account balance is fetched from the database using a read operation, the withdrawal
amount is subtracted from this value, and the result is stored back in the database using
a write operation.
iii) Commit
This operation is used to maintain integrity in the database. Due to some failure of power,
hardware, software, etc., a transaction might be interrupted before all its operations are
completed. This may cause ambiguity in the database, i.e. it might be inconsistent before
and after the transaction. To ensure that further operations of other transactions are
performed only after the work of the current transaction is done, a commit operation is
performed to save the changes made by the transaction permanently to the database.
iv) Rollback
This operation is performed to bring the database back to the last saved state when a
transaction is interrupted midway by a power, hardware, or software failure. In simple
words, a rollback operation undoes the operations the transaction performed before its
interruption, to return the database to a safe state and avoid any kind of ambiguity or
inconsistency.
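The four operations can be modelled in a few lines of Python. This is a hypothetical toy, not how a real DBMS is built: a dict stands in for the stored database, `buffer` for main memory, and `undo` for the information needed to roll back. Following the notes, Write(X) puts the buffered value into the database, Commit makes the changes permanent, and Rollback restores the last saved state.

```python
class Transaction:
    """Toy model of Read(X), Write(X), Commit and Rollback (hypothetical)."""

    def __init__(self, db: dict):
        self.db = db        # stands in for the database on disk
        self.buffer = {}    # main-memory copies of the items this transaction uses
        self.undo = {}      # value of each written item before this transaction changed it

    def read(self, item):
        # Read(X): copy the value from the database into the buffer.
        self.buffer[item] = self.db[item]
        return self.buffer[item]

    def write(self, item, value):
        # Write(X): store the (modified) buffer value back in the database,
        # remembering the original value the first time the item is written.
        self.undo.setdefault(item, self.db[item])
        self.buffer[item] = value
        self.db[item] = value

    def commit(self):
        # Commit: the changes become permanent, so the undo information is dropped.
        self.undo.clear()

    def rollback(self):
        # Rollback: put back every original value, returning to the last saved state.
        self.db.update(self.undo)
        self.undo.clear()


db = {"balance": 5000}
t = Transaction(db)
t.write("balance", t.read("balance") - 500)  # withdraw 500
t.rollback()                                  # simulate an interruption before commit
print(db["balance"])                          # 5000
```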

States of Transactions
In a database, a transaction can be in one of these states given below –
Active − This is the state in which a transaction is being executed. It is the initial
state of every transaction.
Partially Committed − A transaction enters the partially committed state when it has
executed its final operation.
Failed − If any check made by the database recovery system fails, the transaction enters
the failed state. A failed transaction cannot proceed further.
Aborted − When a check fails and the transaction enters the failed state, the recovery
manager rolls back all its write operations on the database, bringing the DB (database)
back to its original state (the state it was in before the transaction executed).
Transactions in this state are said to be aborted. After a transaction is aborted, the DB
recovery module can select one of these two operations:

• Re-start
• Kill the transaction
Committed − A transaction is committed when it has executed all of its operations
successfully. All its effects are then permanently established on the DB system.
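The states and the moves between them form a small state machine. The sketch below encodes the transitions of the usual transaction state diagram (active to partially committed or failed, partially committed to committed or failed, failed to aborted); the figure itself is not reproduced here, so treat the table as an assumption. `TRANSITIONS` and `is_valid_history` are hypothetical names.

```python
# Allowed moves between transaction states (usual state diagram, assumed).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": set(),    # terminal: the recovery module may restart it as a new transaction or kill it
    "committed": set(),  # terminal: the effects are permanent
}


def is_valid_history(states: list[str]) -> bool:
    """Check that a transaction's sequence of states is a legal life cycle."""
    if not states or states[0] != "active":
        return False  # every transaction starts in the active state
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))


print(is_valid_history(["active", "partially committed", "committed"]))  # True
print(is_valid_history(["active", "committed"]))  # False: must pass through partially committed
```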

6.2. Scheduling and Serializability


Schedule in DBMS
Transactions are sets of instructions that perform operations on the database. When
multiple transactions run concurrently, their operations must be arranged in some order,
because the database performs them one at a time. This sequence of operations is known as
a schedule, and the process of forming it is known as scheduling.
When multiple transactions execute simultaneously in an uncontrolled manner, several
problems, known as concurrency problems, can arise. Scheduling is required to overcome
these problems.
Types of Schedules
There are mainly two types of schedules: Serial Schedule and Non-serial Schedule.
Serial Schedule
As the name says, all the transactions are executed serially, one after the other. In a
serial schedule, a transaction does not start execution until the currently running
transaction finishes execution. This type of execution is also known as non-interleaved
execution.
Serial schedules are always recoverable, cascadeless, strict, and consistent. A serial
schedule always gives the correct result.
Consider the two transactions T1 and T2 shown above, which perform some operations. If their
operations are not interleaved, there are two possible outcomes: either all T1 operations
are executed, followed by all T2 operations, or all T2 operations are executed, followed by
all T1 operations. In the above figure, the schedule shown is the serial schedule where T1
is followed by T2, i.e. T1 -> T2.

Non-serial Schedule
In a non-serial schedule, multiple transactions execute concurrently. Unlike a serial
schedule, where one transaction must wait for another to complete all its operations, in a
non-serial schedule a transaction may proceed before the previous transaction has
completed. The operations of the transactions are interleaved, or mixed with each other.
Non-serial schedules are NOT always recoverable, cascadeless, strict, and consistent.
In this schedule, there are two transactions, T1 and T2, executing concurrently. The
operations of T1 and T2 are interleaved, so this schedule is an example of a non-serial
schedule.
Non-serial schedules are further categorized into serializable and non-serializable
schedules.
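A schedule can be written down as a list of (transaction, operation, data item) steps. The hypothetical helper below, a sketch rather than anything from the notes, checks whether a schedule is serial, i.e. whether each transaction's operations form one unbroken block.

```python
def is_serial(schedule: list[tuple[str, str, str]]) -> bool:
    """True if no transaction resumes after another transaction has run."""
    finished = set()  # transactions whose block of operations is over
    current = None
    for txn, _op, _item in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn continues after another one ran: operations are interleaved
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
interleaved = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(is_serial(serial), is_serial(interleaved))  # True False
```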

Serializability
Serializability of schedules ensures that a non-serial schedule is equivalent to a serial
schedule. It allows transactions to execute concurrently, with their operations
interleaved, while still producing the same effect as if they had run one after another.
In simple words, serializability is a way to check whether the concurrent execution of two
or more transactions maintains the consistency of the database.
What is a serializable schedule?
A non-serial schedule is called a serializable schedule if it can be converted to an
equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule
produce the same result, the non-serial schedule is called a serializable schedule.

Testing of Serializability
To test the serializability of a schedule, we can use a Serialization Graph, also called a
Precedence Graph. A serialization graph is a directed graph built from all the transactions
of a schedule.
It can be defined as a graph G(V, E) consisting of a set of vertices V = {V1, V2, V3, ..., Vn},
one for each transaction of the schedule, and a set of directed edges E = {E1, E2, E3, ..., Em}.
Each edge represents a pair of conflicting READ/WRITE operations performed by two different
transactions. Two operations conflict if they belong to different transactions, access the
same data item, and at least one of them is a WRITE.
An edge Ti -> Tj means that transaction Ti performs a read or write on a data item before
transaction Tj performs a conflicting operation on the same item.
If there is a cycle in the serialization graph, the schedule is non-serializable, because the
cycle means that each transaction depends on the other: there are one or more pairs of
conflicting operations ordering the transactions both ways. On the other hand, if there is no
cycle, the non-serial schedule is serializable.

Let's take an example of a schedule "S" having three transactions T1, T2, and T3 working
simultaneously, to get a better understanding.
R(x) of T1 conflicts with W(x) of T3, so there is a directed edge from T1 to T3.
R(y) of T1 conflicts with W(y) of T2, so there is a directed edge from T1 to T2.
W(x) of T3 conflicts with W(x) of T1, so there is a directed edge from T3 to T1.
Similarly, we draw an edge for every conflicting pair. Since a cycle (T1 -> T3 -> T1) is
formed, the schedule is not serializable.
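The precedence-graph test can be sketched in Python. Assumptions: a schedule is a list of (transaction, 'R' or 'W', data item) steps, and the helper names are hypothetical. Because the figure for schedule S is not reproduced, the example uses a small made-up schedule that yields the edges T1 -> T3, T1 -> T2 and T3 -> T1.

```python
def precedence_graph(schedule):
    """Edge (Ti, Tj) for every conflicting pair in which Ti's operation comes first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same data item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {node: "new" for node in graph}

    def visit(node):
        state[node] = "on path"
        for nxt in graph[node]:
            if state[nxt] == "on path" or (state[nxt] == "new" and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(state[node] == "new" and visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


s = [("T1", "R", "x"), ("T1", "R", "y"), ("T3", "W", "x"), ("T2", "W", "y"), ("T1", "W", "x")]
print(sorted(precedence_graph(s)))  # [('T1', 'T2'), ('T1', 'T3'), ('T3', 'T1')]
print(is_conflict_serializable(s))  # False
```

Strictly, an acyclic graph shows conflict serializability; a topological order of the graph gives an equivalent serial order.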

6.3. Concurrent execution


In a multi-user system, multiple users can access and use the same database at the same
time. This is known as concurrent execution: the same database is accessed simultaneously
by different users.
When working with database transactions, multiple users often need to use the database to
perform different operations at the same time, and in that case, concurrent execution of
the database is performed.
Concurrent execution interleaves the operations of the different transactions, and it must
be done so that no operation interferes with the other executing operations, thus
maintaining the consistency of the database. Executing transaction operations concurrently
raises several challenging problems that need to be solved.
Concurrency Control Problems
Several problems that arise when numerous transactions execute simultaneously in an
uncontrolled manner are referred to as Concurrency Control Problems.
Dirty Read Problem
The dirty read problem in DBMS occurs when a transaction reads the data that has been
updated by another transaction that is still uncommitted. It arises due to multiple
uncommitted transactions executing simultaneously.
Example: Consider two transactions A and B performing read/write operations on a data item DT
in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in transactions A and B.

Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ READ(DT)
T5 ------ COMMIT
T6 ROLLBACK ------

Transaction A reads the value of DT as 1000 and modifies it to 1500, but does not commit.
Transaction B then reads DT as 1500 and commits. Later, some server error occurs in
transaction A, and it is rolled back to the initial value of DT, i.e. 1000. Transaction B
has therefore read and committed using a value (1500) that was never committed, and this is
the dirty read problem.
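The dirty read can be reproduced by replaying the table step by step. The sketch below is hypothetical: it assumes no concurrency control at all (each WRITE goes straight to the shared database), keeps each transaction's working values in a per-transaction buffer, and implements ROLLBACK by restoring the values the transaction overwrote. The step encoding and the name `replay` are made up for illustration.

```python
def replay(schedule, db):
    """Run (txn, op, *args) steps with no concurrency control; return every read."""
    local = {}   # (txn, item) -> the transaction's working value
    undo = {}    # txn -> {item: value before the txn first wrote it}
    reads = []   # (txn, item, value seen)
    for txn, op, *args in schedule:
        if op == "READ":
            (item,) = args
            local[txn, item] = db[item]
            reads.append((txn, item, db[item]))
        elif op == "ADD":                      # e.g. DT = DT + 500 on the local copy
            item, amount = args
            local[txn, item] += amount
        elif op == "WRITE":
            (item,) = args
            undo.setdefault(txn, {}).setdefault(item, db[item])
            db[item] = local[txn, item]
        elif op == "COMMIT":
            undo.pop(txn, None)
        elif op == "ROLLBACK":
            db.update(undo.pop(txn, {}))
    return reads


db = {"DT": 1000}
steps = [
    ("A", "READ", "DT"), ("A", "ADD", "DT", 500), ("A", "WRITE", "DT"),
    ("B", "READ", "DT"), ("B", "COMMIT"), ("A", "ROLLBACK"),
]
print(replay(steps, db))  # [('A', 'DT', 1000), ('B', 'DT', 1500)]
print(db)                 # {'DT': 1000}: B committed after reading a value that was rolled back
```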
Unrepeatable Read Problem
The unrepeatable read problem occurs when two or more different values of the same data
are read during the read operations in the same transaction.

Example: Consider two transactions A and B performing read/write operations on a data item DT
in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in transactions A and B.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)
Transactions A and B both initially read the value of DT as 1000. Transaction A then
modifies the value of DT from 1000 to 1500, and when transaction B reads the value again, it
finds it to be 1500. Transaction B therefore sees two different values of DT in its two
read operations.

Phantom Read Problem
In the phantom read problem, a transaction reads the same data twice. The first read
operation returns a value, but the second read operation fails with an error saying the
data does not exist, because another transaction has deleted it in between.
Example: Consider two transactions A and B performing read/write operations on a data item DT
in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in transactions A and B.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)
Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from
the database DB and then again transaction B reads the value and finds an error saying the
data DT does not exist in the database DB.
Lost Update Problem
The lost update problem arises when an update made to a data item by one transaction is
overwritten by an update made by a different transaction.
Example: Consider two transactions A and B performing read/write operations on a data item DT
in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in transactions A and B.

Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ DT=DT+300
T5 ------ WRITE(DT)
T6 READ(DT) ------
Transaction A initially reads the value of DT as 1000, modifies it to 1500 and writes it.
Transaction B then modifies the value to 1800. When transaction A reads DT again, it finds
1800 instead of the 1500 it wrote, so from A’s point of view its update has been lost.
Incorrect Summary Problem
The incorrect summary problem occurs when a computed summary, such as the sum of two data
items, is incorrect. This happens when a transaction computes an aggregate (for example, a
sum) over several data items while another transaction changes the value of one of them.
Example: Consider two transactions A and B performing read/write operations on two data
items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and DT2 is 2000. The
following table shows the read/write operations in transactions A and B.

Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------
Transaction A reads the value of DT1 as 1000. It then calculates the sum of DT1 and DT2 in
the variable add (like the aggregate function SUM), but in between, transaction B changes
the value of DT2 from 2000 to 2500. Variable add uses the modified value of DT2 and gives
the resultant sum as 3500 instead of 3000.
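The arithmetic of this example can be replayed in a few lines. The sketch assumes, as the explanation implies but the table leaves implicit, that B's new value of DT2 is written to the database before A executes READ(DT2); the function name `incorrect_summary` is hypothetical.

```python
def incorrect_summary(db: dict) -> int:
    """Replay the interleaving from the table and return A's computed sum."""
    add = 0                        # T2: add = 0
    add = add + db["DT1"]          # T1 and T3: A reads DT1 (1000) and adds it
    db["DT2"] = db["DT2"] + 500    # T4 and T5: B reads DT2 and adds 500 (write assumed)
    add = add + db["DT2"]          # T6 and T7: A reads the changed DT2 and adds it
    return add


db = {"DT1": 1000, "DT2": 2000}
correct = db["DT1"] + db["DT2"]        # 3000: the sum A should have computed
print(incorrect_summary(db), correct)  # 3500 3000
```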

6.4. Lock based Concurrency Control


To attain consistency, isolation between transactions is the most important tool.
Isolation is achieved by preventing a transaction from performing a read/write operation on
a data item while another transaction is using that item in a conflicting way. This is
known as locking. Through lock-based protocols, the desired operations are allowed to
proceed while the undesired (conflicting) operations are locked out.
There are two kinds of locks used in Lock-based protocols:
Shared Lock(S): A shared lock allows a transaction to read a data item but not write it.
Several transactions can hold shared locks on the same item at the same time, but while any
shared lock is held, no transaction can write the item. Shared locks are also known as
read-only locks and are represented by 'S'.
Exclusive Lock(X): An exclusive lock allows a transaction to both read and write a data
item. Only one transaction can hold an exclusive lock on a given data item at a time, and no
other lock can be held on that item meanwhile. Exclusive locks are represented by 'X'.
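The rules for the two lock modes are often summarised as a compatibility matrix. The sketch below is a minimal, hypothetical illustration (the names `COMPATIBLE` and `can_grant` are made up): a lock request on a data item is granted only if it is compatible with every lock other transactions already hold on that item.

```python
# (lock held by another transaction, lock requested) -> can both coexist?
COMPATIBLE = {
    ("S", "S"): True,    # any number of transactions may read the item together
    ("S", "X"): False,   # a writer must wait for readers to finish
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # only one transaction may hold an exclusive lock
}


def can_grant(requested: str, held_by_others: list[str]) -> bool:
    """Grant the lock only if it is compatible with all locks held by others."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
```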

6.5. 2PL and Strict 2PL


Two Phase Locking(2PL)
The two-phase locking protocol divides the execution phase of the transaction into three
parts.

• In the first part, when the execution of the transaction starts, it seeks permission
for the locks it requires.
• In the second part, the transaction acquires all the locks. The third phase starts
as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It only releases
the acquired locks.

There are two phases of 2PL:


Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
If lock conversion is allowed, then the following can happen:
• Upgrading of lock (from S(a) to X(a)) is allowed in the growing phase.
• Downgrading of lock (from X(a) to S(a)) must be done in the shrinking phase.
Example:
The following shows how locking and unlocking work with 2PL.
Transaction T1:
• Growing phase: steps 1 to 3
• Shrinking phase: steps 5 to 7
• Lock point: at step 3
Transaction T2:
• Growing phase: steps 2 to 6
• Shrinking phase: steps 8 to 9
• Lock point: at step 6
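Whether a single transaction obeys 2PL can be checked mechanically: once it has released any lock, it must not acquire another. The hypothetical sketch below takes the transaction's lock and unlock actions in order and returns its lock point (the step of its last lock acquisition). Lock conversions are not modelled; an upgrade would count as a lock and a downgrade as an unlock.

```python
def lock_point(actions: list[tuple[str, str]]) -> int:
    """Return the 1-based step of the lock point, or raise if 2PL is violated."""
    shrinking = False      # becomes True at the first unlock
    point = 0
    for step, (kind, item) in enumerate(actions, start=1):
        if kind == "lock":
            if shrinking:
                raise ValueError(f"step {step}: lock on {item} requested in the shrinking phase")
            point = step   # still in the growing phase
        elif kind == "unlock":
            shrinking = True
    return point


t = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
print(lock_point(t))  # 2: growing phase is steps 1-2, shrinking phase steps 3-4
```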

Strict 2PL

• The first phase of Strict-2PL is like that of 2PL: the transaction acquires the locks it
needs and continues to execute normally.
• The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a
lock immediately after using it.
• Strict-2PL waits until the whole transaction commits, and then releases all the locks
at once.
• Therefore, Strict-2PL does not have a gradual shrinking phase of lock release.

Unlike basic 2PL, Strict-2PL does not suffer from cascading aborts: because locks are held
until commit, no other transaction can read data written by a transaction that has not yet
committed.
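Under Strict-2PL as described here, a transaction's actions can be checked in the same style as 2PL: no lock may be released before the commit, and the commit releases all of them together. The checker below is a hypothetical sketch; it treats ("commit", None) as the commit action.

```python
def follows_strict_2pl(actions: list[tuple[str, object]]) -> bool:
    """True if locks are only ever released, all at once, at commit."""
    held = set()
    for kind, item in actions:
        if kind == "lock":
            held.add(item)
        elif kind == "unlock":
            return False   # an early release is not allowed under Strict-2PL
        elif kind == "commit":
            held.clear()   # every lock is released together at commit
    return not held        # nothing may still be held at the end


print(follows_strict_2pl([("lock", "A"), ("lock", "B"), ("commit", None)]))    # True
print(follows_strict_2pl([("lock", "A"), ("unlock", "A"), ("commit", None)]))  # False
```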

6.6. Timestamp concept


According to this protocol, every transaction has a timestamp attached to it, based on the
time at which the transaction entered the system. In addition, every data item has a read
timestamp and a write timestamp, which record the largest timestamps of the transactions
that have successfully read and written it, respectively.
Timestamp Ordering Protocol:
The timestamp ordering protocol uses the timestamp values of the transactions to resolve
conflicting pairs of operations, thus ensuring serializability among transactions. The
following are the notations used to define the protocol for transaction A on the data
item DT:

Term                                 Notation
Timestamp of transaction A           TS(A)
Read timestamp of data item DT       R-timestamp(DT)
Write timestamp of data item DT      W-timestamp(DT)

The timestamp ordering protocol works according to the following rules:
When transaction A is going to perform a read operation on data item DT:

• TS(A) < W-timestamp(DT): the transaction is rolled back. If transaction A entered the
system before the transaction that last wrote DT, the value A wants to read has already
been overwritten by a younger transaction, so A is rolled back.
• TS(A) >= W-timestamp(DT): the read operation is executed. If transaction A entered the
system at or after the transaction that last wrote DT, the read is allowed.
• When the read is executed, R-timestamp(DT) is updated to the larger of R-timestamp(DT)
and TS(A).
When transaction A is going to perform a write operation on data item DT:

• TS(A) < R-timestamp(DT): the transaction is rolled back. A younger transaction has
already read the old value of DT, so A's write would come too late.
• TS(A) < W-timestamp(DT): the transaction is rolled back. A younger transaction has
already written DT, so A's write would be out of order.
• In all other cases, the write operation is executed and W-timestamp(DT) is set to TS(A).
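The two sets of rules translate almost directly into code. In this minimal sketch (class and method names are hypothetical), timestamps are positive integers, an item that has never been read or written has timestamps of 0, and each call returns whether the operation is executed or the transaction must be rolled back.

```python
class TimestampOrdering:
    """Basic timestamp-ordering checks for read and write requests."""

    def __init__(self):
        self.r_ts = {}  # R-timestamp(item)
        self.w_ts = {}  # W-timestamp(item)

    def read(self, ts: int, item: str) -> str:
        if ts < self.w_ts.get(item, 0):
            return "rollback"                  # a younger transaction already wrote the item
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return "execute"

    def write(self, ts: int, item: str) -> str:
        if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
            return "rollback"                  # a younger transaction already read or wrote it
        self.w_ts[item] = ts
        return "execute"


to = TimestampOrdering()
print(to.read(5, "DT"))   # execute  (R-timestamp(DT) becomes 5)
print(to.write(3, "DT"))  # rollback (3 < R-timestamp(DT) = 5)
print(to.write(7, "DT"))  # execute  (W-timestamp(DT) becomes 7)
print(to.read(6, "DT"))   # rollback (6 < W-timestamp(DT) = 7)
```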

Unit 7. Recovery [11 Marks]


7.1. Failure Classifications
Failure in terms of a database can be defined as its inability to execute the specified
transaction or loss of data from the database. A DBMS is vulnerable to several kinds of
failures and each of these failures needs to be managed differently. There are many
reasons that can cause database failures such as network failure, system crash, natural
disasters, carelessness, sabotage (corrupting the data intentionally), software errors, etc.
Transaction Failure:
If a transaction is not able to execute or it comes to a point from where the transaction
becomes incapable of executing further, then it is termed as a failure in a transaction.
Reasons for a transaction failure in DBMS:
Logical error: A logical error occurs if a transaction is unable to execute because of some
mistakes in the code or due to the presence of some internal faults.
System error: The database system itself terminates an active transaction, either because
of some system issue or because the database management system is unable to proceed with
the transaction. For example, the system ends a running transaction if it reaches a
deadlock condition or if the resources it needs are unavailable.
System Crash:
A system crash usually occurs when there is some sort of hardware or software
breakdown. Some other problems which are external to the system and cause the system
to abruptly stop or eventually crash include failure of the transaction, operating system
errors, power cuts, main memory crash, etc.
These types of failures are often termed soft failures and are responsible for the data losses
in the volatile memory. It is assumed that a system crash does not have any effect on the
data stored in the non-volatile storage and this is known as the fail-stop assumption.
Data-transfer Failure:
When a disk failure occurs during a data-transfer operation and results in the loss of
content from disk storage, the failure is categorized as a data-transfer failure. Other
reasons for disk failures include disk head crash, disk unreachability, formation of bad
sectors, read-write errors on the disk, etc.
To quickly recover from a disk failure that occurs during a data-transfer operation, the
backup copy of the data stored on other tapes or disks can be used. Thus, it’s good
practice to back up your data frequently.

7.2. Recovery and Atomicity


• When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items.
• But according to the ACID properties of DBMS, the atomicity of transactions must be
maintained, that is, either all the operations are executed or none.
• Database recovery means recovering the data when it gets deleted, hacked or
damaged accidentally.
• Atomicity means that a transaction, whether it has finished or not, should either be
reflected in the database permanently or not affect the database at all.
When a DBMS recovers from a crash, it should maintain the following −

• It should check the state of all the transactions which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to be
rolled back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −

• Maintaining the logs of each transaction and writing them onto some stable storage
before modifying the database.
• Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.

7.3. Deferred Update and Immediate Update


Deferred Update: It is a technique for the maintenance of the transaction log files of the
DBMS. It is also called NO-UNDO/REDO technique. It is used for the recovery of
transaction failures that occur due to power, memory, or OS failures. Whenever any
transaction is executed, the updates are not made immediately to the database. They are
first recorded in the log file, and those changes are applied to the database once the
commit is done. This is called the “Re-doing” process. If the transaction is rolled back,
none of the changes are applied to the database and the changes in the log file are
discarded. If the commit is done before the system crashes, then after the system restarts,
the changes recorded in the log file are applied to the database.
Immediate Update: It is a technique for the maintenance of the transaction log files of the
DBMS. It is also called UNDO/REDO technique. It is used for the recovery of transaction
failures that occur due to power, memory, or OS failures. Whenever any transaction is
executed, the updates are made directly to the database and the log file is also maintained
which contains both old and new values. Once the commit is done, all the changes are
stored permanently in the database, and the records in the log file are discarded. If the
transaction is rolled back, the old values are restored in the database and all the
changes made to the database are discarded. This is called the “Un-doing” process. If the
commit is done before the system crashes, then after the system restarts, the changes are
stored permanently in the database.

Deferred Update vs Immediate Update

• Applying changes: In a deferred update, the changes are not applied immediately to the
database. In an immediate update, the changes are applied directly to the database.
• Log contents: In a deferred update, the log file contains all the changes that are to be
applied to the database. In an immediate update, the log file contains both old and new
values.
• Rollback: In a deferred update, once rollback is done, all the records of the log file are
discarded and no changes are applied to the database. In an immediate update, once
rollback is done, the old values are restored to the database using the records of the
log file.
• Techniques used: Concepts of buffering and caching are used in the deferred update
method. The concept of shadow paging is used in the immediate update method.
• Major disadvantage: The deferred update method requires a lot of time for recovery in case
of system failure. The immediate update method has frequent I/O operations while the
transaction is active.
• Recovery procedure: In a deferred update, the changes carried out by a transaction are
first recorded in the log file and then applied to the database on commit; on rollback,
the maintained records are discarded and not applied to the database. In an immediate
update, the database is updated directly after the changes made by the transaction, and
the log file keeps the old and new values; in the case of rollback, these records are used
to restore the old values.

7.4. Log based Recovery


The log is a sequence of log records that records all the update activities in the
database. The logs for each transaction are maintained in stable storage, and every
operation performed on the database is recorded in the log. Before any modification is
made to the database, an update log record is created to reflect that modification. An
update log record, represented as <Ti, Xj, V1, V2>, has these fields:

• Transaction identifier (Ti): Unique identifier of the transaction that performed the write
operation.
• Data item (Xj): Unique identifier of the data item written.
• Old value (V1): Value of the data item prior to the write.
• New value (V2): Value of the data item after the write operation.
Other types of log records are:
• <Ti start>: It contains information about when a transaction Ti starts.
• <Ti commit>: It contains information about when a transaction Ti commits.
• <Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations
Because all database modifications must be preceded by the creation of a log record, the
system has available both the old value prior to the modification of the data item and new
value that is to be written for data item. This allows system to perform redo and undo
operations as appropriate:
Undo: using a log record, sets the data item specified in the log record to its old value.
Redo: using a log record, sets the data item specified in the log record to its new value.
The database can be modified using two approaches –
Deferred Modification Technique (No Undo/Redo): If the transaction does not modify the
database until it has partially committed, it is said to use deferred modification technique.
Immediate Modification Technique (Undo/Redo): If database modification occurs while
the transaction is still active, it is said to use immediate modification technique.
After a system crash has occurred, the system consults the log to determine which
transactions need to be redone and which need to be undone.

• Transaction Ti needs to be undone if the log contains the record <Ti start> but does
not contain either the record <Ti commit> or the record <Ti abort>.
• Transaction Ti needs to be redone if log contains record <Ti start> and either the
record <Ti commit> or the record <Ti abort>.
Example of Log:
<T1,start>
<T1,A,100,200>
<T1,B,200,400>
<T1,commit>
If the log has both the start and commit records of a transaction, the recovery manager
redoes its operations when a failure occurs. However, if the log has only the start record
but no commit record, the recovery manager undoes its operations, assigning the old values.
Another example:
<T1,start>
<T1,A,100,200>
<T1,B,200,400>
<T1,commit>
<T2,start>
<T2,C,250,50>
If a failure occurs at this point, T1 is redone but T2 is undone, since T1 has both start
and commit records while T2 has only a start record and no commit record.
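The redo/undo rules can be applied to a log mechanically. In the sketch below, which is an illustration and not the notes' own algorithm, log records are tuples: (Ti, 'start'), (Ti, 'commit'), (Ti, 'abort') and update records (Ti, Xj, V1, V2). To make redoing an aborted transaction safe, it also assumes, as in common textbook treatments, that rolling back Ti appends redo-only records (Ti, Xj, V1) holding the restored old values before <Ti abort>. The database contents at the time of the crash are made up for the example.

```python
def recover(log: list[tuple], db: dict) -> tuple[set, set]:
    """Redo completed transactions, then undo incomplete ones; return (redo, undo)."""
    started = {r[0] for r in log if r[1:] == ("start",)}
    ended = {r[0] for r in log if r[1:] in (("commit",), ("abort",))}
    redo, undo = started & ended, started - ended

    for rec in log:                       # redo pass: oldest record first
        if rec[0] in redo and len(rec) == 4:
            db[rec[1]] = rec[3]           # set the item to its new value V2
        elif rec[0] in redo and len(rec) == 3:
            db[rec[1]] = rec[2]           # redo-only record written during a rollback

    for rec in reversed(log):             # undo pass: newest record first
        if rec[0] in undo and len(rec) == 4:
            db[rec[1]] = rec[2]           # set the item to its old value V1
    return redo, undo


log = [
    ("T1", "start"), ("T1", "A", 100, 200), ("T1", "B", 200, 400), ("T1", "commit"),
    ("T2", "start"), ("T2", "C", 250, 50),
]
db = {"A": 100, "B": 200, "C": 50}  # assumed disk state at the crash
print(recover(log, db))  # ({'T1'}, {'T2'})
print(db)                # {'A': 200, 'B': 400, 'C': 250}
```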

7.5. Shadow Paging


• Shadow paging is a recovery technique in which the database is considered to be made up
of fixed-size logical units of storage, referred to as pages.
• Pages are mapped into physical blocks of storage with the help of a page table, which
has one entry for each logical page of the database.
• This method uses two page tables, named the current page table and the shadow page
table. The entries in the current page table point to the most recent database pages on
disk.
• When the transaction starts, the current page table is copied into the shadow page
table. The shadow page table is then saved on disk, and the current page table is used by
the transaction.
• Entries in the current page table may change during execution, but the shadow page
table never changes. After the transaction, both tables become identical. This technique
is also known as Out-of-Place updating.

Consider the figure above, in which two write operations are performed on pages 3 and 5.
Before the write operation on page 3 starts, the current page table points to the old
page 3. When the write operation starts, the following steps are performed:
• First, a search is made for an available free block among the disk blocks.
• After a free block is found, page 3 is copied into it; this copy is represented by
Page 3 (New).
• Now the current page table points to Page 3 (New) on disk, but the shadow page table
still points to the old page 3 because it is not modified.
• The changes are now propagated to Page 3 (New), which is pointed to by the current page
table.
COMMIT Operation
To commit a transaction, the following steps should be done:
• All the modifications made by the transaction that are still in buffers are transferred
to the physical database.
• The current page table is output to disk.
• The disk address of the current page table is written to the fixed location in stable
storage that holds the address of the shadow page table. This overwrites the address of
the old shadow page table, so the current page table becomes the shadow page table and
the transaction is committed.
Failure
If the system crashes during the execution of a transaction, before the commit operation,
it is sufficient to free the modified database pages and discard the current page table.
The state of the database before the transaction executed is recovered by reinstalling the
shadow page table. If the system crashes after the last write operation, it does not affect
the propagation of the changes made by the transaction: these changes are preserved and
there is no need to perform a redo operation.
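The bookkeeping behind shadow paging can be shown with two Python lists acting as page tables. This is a hypothetical in-memory sketch: `disk` is a list of blocks, searching for a free block is modelled by appending a new one, writing the page-table address to the fixed location in stable storage is modelled by a single assignment, and old blocks are never reclaimed.

```python
class ShadowPagedDB:
    """Toy shadow paging: page tables map logical page numbers to disk blocks."""

    def __init__(self, pages: list[str]):
        self.disk = list(pages)                # block number -> page contents
        self.shadow = list(range(len(pages)))  # shadow page table (on stable storage)
        self.current = None                    # current page table, only during a transaction

    def begin(self):
        self.current = list(self.shadow)       # the transaction starts with a copy

    def read(self, page: int) -> str:
        table = self.shadow if self.current is None else self.current
        return self.disk[table[page]]

    def write(self, page: int, data: str):
        if self.current[page] == self.shadow[page]:
            # First write to this page: copy it into a free block and repoint
            # the current table; the shadow table still points to the old block.
            self.disk.append(self.disk[self.current[page]])
            self.current[page] = len(self.disk) - 1
        self.disk[self.current[page]] = data

    def commit(self):
        # Making the current table the shadow table is the single step that commits.
        self.shadow, self.current = self.current, None

    def crash(self):
        # Crash before commit: discard the current table; the shadow table is intact.
        self.current = None


db = ShadowPagedDB([f"page {i}" for i in range(6)])
db.begin()
db.write(3, "page 3 (new)")
db.write(5, "page 5 (new)")
db.crash()
print(db.read(3), "|", db.read(5))  # page 3 | page 5
```

No undo or redo is needed in either case: after a crash the shadow table already describes the old state, and after a commit it describes the new one.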

7.6. Local Recovery Manager

7.7. UNDO and REDO protocol
