DBMS Complete Notes
History
1950s and early 1960s:
• Data processing using magnetic tapes for storage.
• Tapes provided only sequential access.
• Punched cards for input.
Late 1960s and 1970s:
• Hard disks allowed direct access to data.
• Hierarchical and network data models in widespread use
✓ IBM’s DL/I (Data Language One)
✓ CODASYL's DBTG (Data Base Task Group) model
• Ted Codd defines the relational data model.
✓ IBM Research develops System R prototype.
✓ UC Berkeley develops Ingres prototype.
• Entity Relationship Model for database design
1980s:
Late 2000s:
What is Database?
A database is an organized collection of data, stored and accessed electronically.
Databases are used to store and manage large amounts of structured and unstructured
data, and they can be used to support a wide range of activities, including data storage,
data analysis, and data management.
There are many different types of databases, including relational databases, object-
oriented databases, and NoSQL databases, and they can be used in a variety of settings,
including business, scientific, and government organizations.
Database Applications
1. Universities: student information, teacher information, non-teaching staff information,
course information, section information, grade report information, and many more.
2. Banking: customer details, asset details, banking transactions, balance sheets, credit
card and debit card details, loans, fixed deposits, and much more.
Characteristics
Self-describing nature of a database system:
A database system is referred to as self-describing because it not only contains the
database itself, but also metadata which defines and describes the data and
relationships between tables in the database. This information is used by the DBMS
software or database users if needed. This separation of data and information about
the data makes a database system totally different from the traditional file-based
system in which the data definition is part of the application programs.
Insulation between program and data
In the file-based system, the structure of the data files is defined in the application
programs so if a user wants to change the structure of a file, all the programs that
access that file might need to be changed as well.
On the other hand, in the database approach, the data structure is stored in the
system catalogue and not in the programs. Therefore, one change is all that is
needed to change the structure of a file. This insulation between the programs and
data is also called program-data independence.
Support for multiple views of data
A database supports multiple views of data. A view is a subset of the database,
which is defined and dedicated for users of the system. Multiple users in the system
might have different views of the system. Each view might contain only the data of
interest to a user or group of users.
Sharing of data and multiuser system
Current database systems are designed for multiple users. That is, they allow many
users to access the same database at the same time. This access is achieved
through features called concurrency control strategies. These strategies ensure that
the data accessed is always correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement from
those in the past which restricted usage to one person at a time.
Control of data redundancy
In the database approach, ideally, each data item is stored in only one place in the
database. In some cases, data redundancy still exists to improve system
performance, but such redundancy is controlled by application programming and
kept to a minimum by introducing as little redundancy as possible when designing the
database.
Data sharing
The integration of all the data, for an organization, within a database system has
many advantages. First, it allows for data sharing among employees and others who
have access to the system. Second, it gives users the ability to generate more
information from a given amount of data than would be possible without the
integration.
Enforcement of integrity constraints
Database management systems must provide the ability to define and enforce
certain constraints to ensure that users enter valid information and maintain data
integrity. A database constraint is a restriction or rule that dictates what can be
entered or edited in a table such as a postal code using a certain format or adding a
valid city in the City field.
There are many types of database constraints. Data type, for example, determines
the sort of data permitted in a field, for example numbers only. Data uniqueness
such as the primary key ensures that no duplicates are entered. Constraints can be
simple (field based) or complex (programming).
Restriction of unauthorized access
Not all users of a database system will have the same accessing privileges. For
example, one user might have read-only access (i.e., the ability to read a file but not
make changes), while another might have read and write privileges, which is the
ability to both read and modify a file. For this reason, a database management
system should provide a security subsystem to create and control different types of
user accounts and restrict unauthorized access.
Data independence
Another advantage of a database management system is how it allows for data
independence. In other words, the system data descriptions or data describing data
(metadata) are separated from the application programs. This is possible because
changes to the data structure are handled by the database management system
and are not embedded in the program itself.
Transaction processing
A database management system must include concurrency control subsystems.
This feature ensures that data remains consistent and valid during transaction
processing even if several users update the same information.
Provision for multiple views of data
By its very nature, a DBMS permits many users to have access to its database either
individually or simultaneously. It is not important for users to be aware of how and
where the data they access is stored.
Backup and recovery facilities
Backup and recovery are methods that allow you to protect your data from loss. The
database system provides a separate process, from that of a network backup, for
backing up and recovering data. If a hard drive fails and the database stored on the
hard drive is not accessible, the only way to recover the database is from a backup.
If a computer system fails in the middle of a complex update process, the recovery
subsystem is responsible for making sure that the database is restored to its original
state. These are two more benefits of a database management system.
Architecture
• The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with many PCs, web servers, database servers
and other components that are connected with networks.
• The users of the database should not worry about the physical implementation
and internal workings of the database such as data compression and encryption
techniques, hashing, optimization of the internal structures etc.
• All users should be able to access the same data according to their requirements.
• The DBA should be able to change the conceptual structure of the database without
affecting the users' view of the data.
• The internal structure of the database should be unaffected by changes to
physical aspects of the storage.
Data Abstraction:
Data abstraction refers to the hiding of details from users at certain levels, for
authentication and security purposes. Any DBMS architecture mainly consists of
three levels: Conceptual level, Internal level, and External level.
Internal level
• This is the lowest level of abstraction and describes how the data is physically stored
• It provides details about the complex data structures that are used for storage
of data
• The internal level provides indexes and clusters to control and manage the data
physically stored on the hard disk
Data Independence
These levels of abstraction provide data independence, i.e., transactions or changes
made at one level do not affect the other levels.
Schema vs. Instance — uses:
• Schema is used for defining the basic structure of any database; it defines how the
available data is to be stored.
• Instance is used for referring to the set of information stored at any given instant
of time.
Classifications of DBMS
1) Centralized Database
It is the type of database that stores data in a centralized database system. It allows the
users to access the stored data from different locations through several applications.
These applications contain the authentication process to let users access data securely.
An example of a Centralized database can be the Central Library that carries a central
database of each library in a college/university.
Advantages
• It reduces data-management risk, i.e., manipulating data will not affect the core
data.
• Data consistency is maintained as it manages data in a central repository.
• It provides better data quality, which enables organizations to establish data
standards.
• It is less costly because fewer vendors are required to handle the data sets.
Disadvantages
• The size of the centralized database is large, which increases the response time for
fetching the data.
• It is not easy to update such an extensive database system.
• If the server fails, the entire data set may become inaccessible or lost, which could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among
different database systems of an organization. These database systems are connected via
communication links. Such links help the end-users to access the data easily. Examples of
the Distributed database are Apache Cassandra, HBase, Ignite, etc.
Homogeneous DDB: database systems that execute on the same operating system, use
the same application process, and carry the same hardware devices.
Heterogeneous DDB: database systems that execute on different operating systems,
under different application procedures, and carry different hardware devices.
3) Cloud Database
A type of database where data is stored in a virtual environment and executed over a
cloud computing platform. It provides users with various cloud computing services (SaaS,
PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms that
host such databases.
SQL Commands
• SQL commands are instructions used to communicate with the database and to
perform specific tasks, functions, and queries on data.
• SQL can perform various tasks such as creating a table, adding data to tables,
dropping a table, modifying a table, and setting permissions for users.
1. Data Definition Language (DDL)
• DDL changes the structure of the table like creating a table, deleting a table, altering
a table, etc.
• All the commands of DDL are auto-committed, which means they permanently save
all the changes in the database.
• Here are some commands that come under DDL:
➢ CREATE
➢ ALTER
➢ DROP
➢ TRUNCATE
2. Data Manipulation Language (DML)
• DML commands are used to modify the database. They are responsible for all forms
of changes in the database.
• DML commands are not auto-committed, which means the changes are not saved
permanently until committed. They can be rolled back.
• Here are some commands that come under DML:
➢ INSERT
➢ UPDATE
➢ DELETE
3. Data Control Language (DCL)
• DCL commands are used to grant and take back authority from any database user.
• Here are some commands that come under DCL:
➢ GRANT
➢ REVOKE
Attributes: Attributes are the properties or traits of an entity. They describe the data
associated with an entity.
Entity Type: A category or class of entities that share the same attributes is referred to as
an entity type.
Entity Instance: An entity instance is a particular occurrence of an entity within an
entity type. Each entity instance has a unique identity, often given by the primary key.
Primary Key: A primary key is a unique identifier for every entity instance within an entity
type.
It can be classified into two types:
Strong Entity Set
Strong entity sets exist independently, and each instance of a strong entity set has a unique
primary key.
Example: a Laptop entity, with attributes such as Color, RAM, etc.
Weak Entity Set
Weak entity sets depend on a strong entity set for their existence and do not have a
primary key of their own.
Kinds of Entities
There are two types of Entities:
Tangible Entity
• Examples of tangible entities are physical goods or physical products (for example,
“inventory items” in an inventory database) or people (for example, customers or
employees).
Intangible Entity
• Intangible entities are abstract or conceptual objects that are not physically present
but have meaning in the database.
• They are typically defined by attributes or properties that are not directly visible.
• Examples of intangible entities include concepts or categories (such as “Product
Categories” or “Service Types”) and events or occurrences (such as appointments
or transactions).
Entity Types in DBMS
Strong Entity Types: These are entities that exist independently and have a unique
identifier.
Weak Entity Types: These entities depend on another entity for their existence and do
not have a unique identifier of their own.
Associative Entity Types: These represent relationships between two or more entities and
may have attributes of their own.
Derived Entity Types: These entities are derived from other entities through a process or
calculation.
Multi-Valued Entity Types: These entities can have more than one value for an
attribute.
Attributes
In DBMS, there are various types of attributes available:
• Simple Attributes
• Composite Attributes
• Single Valued Attributes
• Multi-Valued Attributes
• Derived Attributes
• Complex Attributes (Rarely used attributes)
• Key Attributes
• Stored Attributes
Simple Attributes
Simple attributes in an ER model diagram are independent attributes that can't be
classified further and can't be subdivided into any other component. These attributes are
also known as atomic attributes.
Composite Attributes
Composite attributes have functionality opposite to that of simple attributes: we can
further subdivide composite attributes into different components or sub-parts that form
simple attributes. In simple terms, a composite attribute is composed of one or more
simple attributes.
Single-Valued Attributes
Single-valued attributes are those attributes that hold a single value for each entity
instance and can't store more than one value at a time, for example, a person's date of
birth.
Multi-Valued Attributes
Multi-valued attributes have opposite functionality to that of single-valued attributes, and
as the name suggests, multi-valued attributes can take up and store more than one value
at a time for an entity instance from a set of possible values. These attributes are
represented by concentric ellipses in an ER diagram, and we can also use curly braces { }
to list the multi-valued attributes inside them.
Derived Attributes
Derived attributes in DBMS are those attributes whose values can be derived from the
values of other attributes. They are always dependent upon other attributes for their value.
For example, as we were discussing above, DOB is a single-valued attribute and remains
constant for an entity instance. From DOB, we can derive the Age attribute, which changes
every year, and can easily calculate the age of a person from his/her date of birth value.
Hence, the Age attribute here is derived attribute from the DOB single-valued attribute.
Key Attributes
Key attributes are special types of attributes that act as the primary key for an entity and
they can uniquely identify an entity from an entity set. The values that key attributes store
must be unique and non-repeating.
Stored Attributes
Values of stored attributes remain constant and fixed for an entity instance, and they
help in deriving the derived attributes. For example, the Age attribute can be derived from
the Date of Birth attribute, and the Date of birth attribute has a fixed and constant value
throughout the life of an entity. Hence, the Date of Birth attribute is a stored attribute.
Relationship Types
A relationship in DBMS is the way in which two or more data sets are linked, i.e., any
association between two entity types is called a relationship. So, an entity takes part in the
relationship, and it is represented by a diamond shape. Three specific types of
relationships can exist between tables, namely One-to-One, One-to-Many, and
Many-to-Many relationships.
One-to-One Relationship
A One-to-one relationship means a single record in Table A is related to a single record in
Table B and vice-versa.
For example, consider two entities, 'Person' (Name, Age, Address, Contact No.) and
'Citizenship Card' (Name, Citizenship Number). Each person can have only one
citizenship card, and a single citizenship card belongs to only one person.
This type of relationship is often used for security purposes. In the above example, we
could store the citizenship number in the 'Person' entity itself, but we created another
table for it because the citizenship number may be sensitive data and should be hidden
from others. It is also represented as a 1:1 relationship.
One-to-Many Relationship
Such a relationship exists when each record of table A can be related to one or more
records of another table i.e., table B. However, a single record in table B will have a link to a
single record in table A.
For example, if there are two entities, 'Customer' and 'Account', then each customer can
have more than one account, and each account is owned by one customer only.
It is also represented as a 1: N relationship.
Many-to-Many Relationship
A many-to-many relationship exists between the tables if a single record of the first table is
related to one or more records of the second table and a single record in the second table
is related to one or more records of the first table.
For example, consider the two tables i.e., a student table and a courses table. A particular
student may enroll himself in one or more than one course, while a course also may have
one or more students. It is also represented as an M: N relationship.
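A minimal sketch of how an M:N relationship is realized physically: a junction table holding foreign keys to both sides. The Student/Course/Enrollment names, the sample data, and the use of Python's built-in sqlite3 module are illustrative assumptions, not part of the notes.

```python
import sqlite3

# An M:N relationship (students <-> courses) is stored via a
# junction table (Enrollment) referencing both entity tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT);
CREATE TABLE Course  (Course_Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Enrollment (
    Stu_Id    INTEGER REFERENCES Student(Stu_Id),
    Course_Id INTEGER REFERENCES Course(Course_Id),
    PRIMARY KEY (Stu_Id, Course_Id)   -- one row per (student, course) pair
);
INSERT INTO Student VALUES (1, 'Abhay'), (2, 'Ankit');
INSERT INTO Course  VALUES (10, 'DBMS'), (20, 'Networks');
-- Abhay takes both courses; Ankit takes DBMS only.
INSERT INTO Enrollment VALUES (1, 10), (1, 20), (2, 10);
""")
# Which students are enrolled in DBMS?
cur.execute("""
    SELECT s.Stu_Name FROM Student s
    JOIN Enrollment e ON e.Stu_Id = s.Stu_Id
    JOIN Course c ON c.Course_Id = e.Course_Id
    WHERE c.Title = 'DBMS' ORDER BY s.Stu_Name
""")
print([row[0] for row in cur.fetchall()])  # ['Abhay', 'Ankit']
```

The composite primary key on Enrollment guarantees a student cannot enroll in the same course twice, which is exactly the constraint an M:N relationship needs.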
A many-to-many relationship from the perspective of table A.
E-R diagrams
An Entity Relationship Diagram in DBMS is a blueprint of the database that can be later
implemented as an actual database in the form of tables. It is a "diagrammatic
representation of the database."
Why Use ER Diagrams?
The main reasons for using the ER diagram before constructing an actual database are as
follows:
• An Entity Relationship Diagram is used for modeling the data that will be stored in a
database.
• The database designers get a better understanding of the information that will be
contained in the database using the Entity Relationship Diagram.
Create ER diagrams
Functional Dependency
Functional dependency (FD) is a set of constraints between two attributes in a relation.
Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An,
then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X
functionally determines Y. The left-hand side attributes determine the values of attributes
on the right-hand side.
Armstrong's Axioms
Reflexivity Rule − If alpha is a set of attributes and beta is a subset of alpha, then
alpha → beta holds.
Augmentation Rule − If a → b holds and c is a set of attributes, then ac → bc also holds.
That is, adding the same attributes to both sides of a dependency does not change the
basic dependency.
Transitivity Rule − As with the transitive rule in algebra, if a → b holds and b → c holds,
then a → c also holds. In a → b, a is said to functionally determine b.
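Armstrong's axioms justify a simple fixed-point algorithm for computing the closure of an attribute set X, i.e., every attribute that X functionally determines. A Python sketch (representing each FD as a pair of frozensets is an assumed encoding):

```python
def closure(attrs, fds):
    """Compute the closure of attribute set `attrs` under the
    functional dependencies `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the whole LHS,
            # transitivity lets us add the RHS attributes too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example: Stu_ID -> Zip and Zip -> City (the transitive chain used
# in the normalization discussion later in these notes).
fds = [(frozenset({"Stu_ID"}), frozenset({"Zip"})),
       (frozenset({"Zip"}), frozenset({"City"}))]
print(sorted(closure({"Stu_ID"}, fds)))  # ['City', 'Stu_ID', 'Zip']
```

If the closure of X contains every attribute of the relation, X is a super key; this is how FDs and keys connect in practice.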
Anomalies on Database
Update Anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
by having its copies scattered over several places, a few instances get updated properly
while a few others are left with old values. Such instances leave the database in an
inconsistent state.
Deletion Anomalies − We try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insertion Anomalies − We try to insert data into a record that does not exist at all.
Normalization
Normalization is a method to remove all the above anomalies and bring the database to a
consistent state.
Consider the following relation, in which the Content attribute holds multiple values and
therefore violates First Normal Form:
Course Content
Programming Java, C++
Web HTML, PHP, ASP
We re-arrange the relation (table) as below to convert it to First Normal Form.
Course Content
Programming Java
Programming C++
Web HTML
Web PHP
Web ASP
Each attribute must contain only a single value from its pre-defined domain.
We broke the relation in two (one relation keeping Proj_id and Proj_name), so there
exists no partial dependency and the relation is in Second Normal Form.
Here, Zip is not a super key, nor is City a prime attribute. Additionally, Stu_ID → Zip →
City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows.
Student_Detail
Zip City
Boyce-Codd Normal Form (BCNF)
❖ BCNF is the advanced version of 3NF. It is stricter than 3NF.
❖ A table is in BCNF if, for every functional dependency X → Y, X is a super key of the
table.
❖ For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table
Id Country
100 Nepal
200 India
EMP_DEPT table:
Department Depart_Type Depart_No
Designing D123 02
Testing D123 03
Stores D234 04
Developing D234 05
EMP_DEPT_MAPPING table:
Id Department
100 Designing
100 Testing
200 Stores
200 Developing
Functional dependencies:
Id → Country
Department → {Depart_Type, Depart_No}
Candidate keys:
For the first table: Id
For the second table: Department
For the third table: {Id, Department}
Now, this is in BCNF because the left-side part of both the functional dependencies is a
key.
Integrity Constraints
Integrity constraints are a set of rules. It is used to maintain the quality of information.
Integrity constraints ensure that the data insertion, updating, and other processes must be
performed in such a way that data integrity is not affected. Thus, integrity constraint is used
to guard against accidental damage to the database.
Types of Integrity Constraint
1. Domain constraints
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.
2. Entity integrity constraints
The entity integrity constraint states that a primary key value cannot be null, because the
primary key is used to identify individual rows in a relation; if it were null, those rows
could not be identified.
3. Referential integrity constraints
A referential integrity constraint is specified between two tables: if a foreign key in Table 1
refers to the primary key of Table 2, then every value of that foreign key in Table 1 must be
null or match some primary key value in Table 2.
4. Key constraints
Keys are the entity set that is used to identify an entity within its entity set uniquely.
An entity set can have multiple keys, but one of them is chosen as the primary key. A
primary key must contain unique values and cannot be null in the relational table.
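A quick illustration of a key constraint being enforced, using Python's sqlite3 with a hypothetical Person table: duplicate and null primary-key values are rejected. (SQLite needs an explicit NOT NULL on non-integer primary keys, a known quirk of that engine.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# NOT NULL is added explicitly because SQLite historically allows
# NULLs in non-integer PRIMARY KEY columns.
cur.execute("CREATE TABLE Person (Nid TEXT PRIMARY KEY NOT NULL, Name TEXT)")
cur.execute("INSERT INTO Person VALUES ('N-001', 'Ram')")
try:
    cur.execute("INSERT INTO Person VALUES ('N-001', 'Shyam')")  # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)
try:
    cur.execute("INSERT INTO Person VALUES (NULL, 'Hari')")      # null key
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both offending inserts raise IntegrityError, so only the first valid row survives; this is the DBMS guarding data integrity automatically.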
Features of SQL
Data-definition language (DDL): The SQL DDL provides commands for defining relation
schemas, deleting relations, and modifying relation schemas.
Data-manipulation language (DML): The SQL DML provides the ability to query
information from the database and to insert tuples into, delete tuples from, and
modify tuples in the database.
Integrity: The SQL DDL includes commands for specifying integrity constraints that the
data stored in the database must satisfy. Updates that violate integrity constraints are
disallowed.
View definition: The SQL DDL includes commands for defining views.
Transaction control: SQL includes commands for specifying the beginning and end points
of transactions.
Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL
statements can be embedded within general-purpose programming languages, such as C,
C++, and Java.
Authorization: The SQL DDL includes commands for specifying access rights to relations
and views.
Basic Types
The SQL standard supports a variety of built-in types, including:
char(n): A fixed-length character string with user-specified length n. The full form,
character, can be used instead.
varchar(n): A variable-length character string with user-specified maximum length n. The
full form, character varying, is equivalent.
int: An integer (a finite subset of the integers that is machine dependent). The full form,
integer, is equivalent.
smallint: A small integer (a machine-dependent subset of the integer type).
numeric(p, d): A fixed-point number with user-specified precision. The number consists of
p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus,
numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored
exactly in a field of this type.
real, double precision: Floating-point and double-precision floating-point numbers with
machine-dependent precision.
float(n): A floating-point number with precision of at least n digits.
Each type may include a special value called the null value. A null value indicates an
absent value that may exist but be unknown or that may not exist at all.
The main DDL commands are:
• CREATE Command
• DROP Command
• ALTER Command
CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database
objects.
Syntax to Create a Database:
CREATE DATABASE Database_Name;
Suppose you want to create a Books database in the SQL database. To do this, you must
write the following DDL Command:
CREATE DATABASE Books;
Example 2: This example describes how to create a new table using the CREATE DDL
command.
Syntax to create a new table:
CREATE TABLE table_name (
    column_Name1 data_type (size_of_the_column),
    column_Name2 data_type (size_of_the_column),
    ...
    column_NameN data_type (size_of_the_column)
);
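As a sanity check, the syntax above can be exercised through SQLite from Python (an assumed environment; the Student columns are illustrative). The self-describing nature mentioned earlier is visible here: the new table appears in SQLite's catalogue table, sqlite_master.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A concrete instance of the CREATE TABLE syntax.
cur.execute("""
    CREATE TABLE Student (
        Stu_Id    INTEGER,
        Stu_Name  VARCHAR(40),
        Stu_Marks INTEGER
    )
""")
# The DBMS records the new table in its own catalogue (metadata).
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
print(cur.fetchall())  # [('Student',)]
```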
DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL
database. We can easily remove the entire table, view, or index from the database using
this DDL command.
Syntax to remove a database:
DROP DATABASE Database_Name;
Suppose you want to delete the Books database from the SQL database. To do this, you
must write the following DDL command:
DROP DATABASE Books;
Example 2: This example describes how to remove the existing table from the SQL
database.
Syntax:
DROP TABLE Table_Name;
Suppose you want to delete the Student table from the SQL database. To do this, you must
write the following DDL command:
DROP TABLE Student;
ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the
database, and it also changes the schema of database objects.
We can also add and drop constraints of the table using the ALTER command.
Syntax to add a new field in the table:
ALTER TABLE name_of_table ADD column_name column_definition;
Suppose you want to add a 'Father_Name' column to the existing Student table. To do
this, you must write the following DDL command:
ALTER TABLE Student ADD Father_Name Varchar(60);
Example 2: This example describes how to remove the existing column from the table.
Syntax to remove a column from the table:
ALTER TABLE name_of_table DROP COLUMN column_Name;
Suppose you want to remove the Age and Marks columns from the existing Student table.
To do this, you must write the following DDL command:
ALTER TABLE Student DROP COLUMN Age, DROP COLUMN Marks;
Data Manipulation Language (DML)
The DML commands in Structured Query Language change the data present in the SQL
database. We can easily access, store, modify, update and delete the existing records from
the database using DML commands.
Following are the four main DML commands in SQL:
• SELECT Command
• INSERT Command
• UPDATE Command
• DELETE Command
SELECT Command
The SELECT command shows the records of the specified table. With the WHERE clause,
it can also show only those records that satisfy a particular condition.
Syntax:
SELECT column_Name_1,column_Name_2, …..,column_Name_N
FROM Name_of_table;
Here, column_Name_1, column_Name_2, ….., column_Name_N are the names of those
columns whose data we want to retrieve from the table.
Example 2: This example shows all the values of specific columns from the table.
SELECT Emp_Id, Emp_Salary FROM Employee;
This SELECT statement displays all the values of the Emp_Id and Emp_Salary columns of
the Employee table.
Example 3: This example describes how to use the WHERE clause with the SELECT DML
command.
Let's take the following Student table:
Student_ID Student_Name Student_Marks
BCA1001 Abhay 80
BCA1002 Ankit 75
BCA1003 Bheem 80
BCA1004 Ram 79
BCA1005 Sumit 80
If you want to access all the records of those students whose marks are 80 from the above
table, then you have to write the following DML command in SQL:
SELECT * FROM Student WHERE Student_Marks = 80;
The above SQL query shows the following table in result:
BCA1001 Abhay 80
BCA1003 Bheem 80
BCA1005 Sumit 80
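The same query can be reproduced with Python's sqlite3 module (an assumed environment; any SQL engine would do). Note that Abhay (BCA1001) also has 80 marks, so three rows match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Student_ID TEXT, Student_Name TEXT, Student_Marks INTEGER)")
rows = [("BCA1001", "Abhay", 80), ("BCA1002", "Ankit", 75),
        ("BCA1003", "Bheem", 80), ("BCA1004", "Ram", 79),
        ("BCA1005", "Sumit", 80)]
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)", rows)
# The WHERE clause filters rows before they are returned.
cur.execute("SELECT * FROM Student WHERE Student_Marks = 80")
for r in cur.fetchall():
    print(r)  # three rows match, including BCA1001 (Abhay)
```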
INSERT Command
INSERT is another important data manipulation command in Structured Query Language;
it allows users to insert data into database tables.
Syntax:
INSERT INTO TABLE_NAME ( column_Name1 , column_Name2 ,
column_Name3 , .... column_NameN ) VALUES (value_1, value_2,
value_3, .... value_N ) ;
Example 1: This example describes how to insert the record in the database table. Let's
take the following student table, which consists of only 2 records of the student.
UPDATE Command
UPDATE is another important data manipulation command in Structured Query Language;
it allows users to update or modify existing data in database tables.
Syntax
UPDATE Table_name SET [column_name1= value_1, ….., column_nameN
= value_N] WHERE CONDITION;
Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the name of
the table whose values you want to update.
Example 1: This example describes how to update the value of a single field.
Let's take a Product table consisting of the following records:
Product_Id Product_Name Product_Price Product_Quantity
P101 Chips 20 20
P102 Chocolates 60 40
P103 Maggi 75 5
P201 Biscuits 80 20
P203 Namkeen 40 50
Suppose you want to update the Product_Price of the product whose Product_Id is P102.
To do this, you must write the following DML UPDATE command:
UPDATE Product SET Product_Price=80 WHERE Product_Id = 'P102';
Example 2: This example describes how to update the value of multiple fields of the
database table.
Let's take a Student table consisting of the following records:
Stu_Id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
202 Anuj 85 19
203 Monty 95 21
102 Saket 65 21
103 Sumit 78 19
104 Ashish 98 20
Suppose you want to update the Stu_Marks and Stu_Age of the students whose Stu_Id is
103 or 202. To do this, you have to write the following DML UPDATE command:
UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202);
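A runnable sketch of the multi-row update, using sqlite3 with a subset of the data above. Note that a condition like `Stu_Id = 103 AND Stu_Id = 202` would match no rows at all, since a single row cannot hold two different ids at once; IN (or OR) is required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Name TEXT, Stu_Marks INTEGER, Stu_Age INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                [(103, "Sumit", 78, 19), (202, "Anuj", 85, 19), (101, "Ramesh", 92, 20)])
# IN selects both target rows; other rows are untouched.
cur.execute("UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202)")
print(cur.rowcount)  # 2
```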
DELETE Command
DELETE is a DML command which allows SQL users to remove single or multiple existing
records from the database tables.
This Data Manipulation Language command does not delete the data permanently until
the change is committed, so it can be rolled back. We use the WHERE clause with the
DELETE command to select specific rows from the table.
Syntax
DELETE FROM Table_Name WHERE condition;
Example 1: This example describes how to delete a single record from the table.
Let's take a Product table consisting of the following records:
Product_Id Product_Name Product_Price Product_Quantity
P101 Chips 20 20
P102 Chocolates 60 40
P103 Maggi 75 5
P201 Biscuits 80 20
P203 Namkeen 40 50
Suppose you want to delete that product from the Product table whose Product_Id is P203.
To do this, you must write the following DML DELETE command:
DELETE FROM Product WHERE Product_Id = 'P203';
Example 2: This example describes how to delete the multiple records or rows from the
database table.
Let's take a Student table consisting of the following records:
Stu_Id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
202 Anuj 85 19
203 Monty 95 21
102 Saket 65 21
103 Sumit 78 19
104 Ashish 98 20
Suppose you want to delete the record of those students whose Marks is greater than 70.
To do this, you must write the following DML DELETE command:
DELETE FROM Student WHERE Stu_Marks > 70 ;
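Both commands can be exercised end to end; a sketch using Python's built-in sqlite3
module with the Student table from the examples above:

```python
import sqlite3

# In-memory database mirroring the Student table from the examples above
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Name TEXT, "
            "Stu_Marks INTEGER, Stu_Age INTEGER)")
rows = [(101, "Ramesh", 92, 20), (201, "Jatin", 83, 19), (202, "Anuj", 85, 19),
        (203, "Monty", 95, 21), (102, "Saket", 65, 21), (103, "Sumit", 78, 19),
        (104, "Ashish", 98, 20)]
cur.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", rows)

# Update two students at once: IN (or OR) is needed, since a single
# Stu_Id can never equal 103 AND 202 at the same time
cur.execute("UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 "
            "WHERE Stu_Id IN (103, 202)")
print(cur.rowcount)  # 2 rows updated

# Delete every student whose marks exceed 70
cur.execute("DELETE FROM Student WHERE Stu_Marks > 70")
print(sorted(n for (n,) in cur.execute("SELECT Stu_Name FROM Student")))
# ['Saket'] -- only Saket (65 marks) survives
```

Note that because the UPDATE ran first, students 103 and 202 hold 80 marks and are
therefore also removed by the DELETE.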
PRODUCT_MAST
PRODUCT COMPANY QTY RATE COST
Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120
Example: COUNT()
SELECT COUNT(*)
FROM PRODUCT_MAST;
Output:
10
Example: COUNT with WHERE
SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE>=20;
Output:
7
Example: COUNT() with DISTINCT
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;
Output:
3
Example: COUNT() with GROUP BY
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
Example: COUNT() with HAVING
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
2. SUM Function
Sum function is used to calculate the sum of all selected columns. It works on numeric
fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST)
FROM PRODUCT_MAST;
Output:
670
Example: SUM() with WHERE
SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;
Output:
320
Example: SUM() with GROUP BY
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;
Output:
Com1 150
Com3 170
Example: SUM() with HAVING
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;
Output:
Com1 335
Com3 170
3. AVG function
The AVG function is used to calculate the average value of the numeric type. AVG function
returns the average of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
Output:
30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Output:
10
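The aggregate outputs above can be reproduced directly; a sketch using Python's sqlite3
module loaded with the PRODUCT_MAST data (Item6's company taken as Com1):

```python
import sqlite3

# PRODUCT_MAST from the running example above
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, "
            "QTY INTEGER, RATE INTEGER, COST INTEGER)")
data = [("Item1","Com1",2,10,20), ("Item2","Com2",3,25,75), ("Item3","Com1",2,30,60),
        ("Item4","Com3",5,10,50), ("Item5","Com2",2,20,40), ("Item6","Com1",3,25,75),
        ("Item7","Com1",5,30,150), ("Item8","Com1",3,10,30), ("Item9","Com2",2,25,50),
        ("Item10","Com3",4,30,120)]
cur.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", data)

print(cur.execute("SELECT COUNT(*) FROM PRODUCT_MAST").fetchone()[0])              # 10
print(cur.execute("SELECT COUNT(*) FROM PRODUCT_MAST "
                  "WHERE RATE >= 20").fetchone()[0])                               # 7
print(cur.execute("SELECT SUM(COST) FROM PRODUCT_MAST").fetchone()[0])             # 670
print(cur.execute("SELECT AVG(COST) FROM PRODUCT_MAST").fetchone()[0])             # 67.0
print(cur.execute("SELECT MAX(RATE), MIN(RATE) FROM PRODUCT_MAST").fetchone())     # (30, 10)
print(cur.execute("SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST "
                  "GROUP BY COMPANY HAVING COUNT(*) > 2").fetchall())
# [('Com1', 5), ('Com2', 3)] -- Com3 has only 2 rows and is filtered by HAVING
```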
SQL Joins
The SQL JOIN statement is used to combine rows from two tables based on a common
column and selects records that have matching values in these columns.
Types of SQL JOINs
In SQL, we have four main types of joins:
• INNER JOIN
• LEFT JOIN
• RIGHT JOIN
• FULL OUTER JOIN
SQL INNER JOIN
The SQL INNER JOIN statement joins two tables based on a common column and selects
rows that have matching values in these columns.
Syntax
SELECT columns_from_both_tables
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column2
Here,
• table1 and table2 - two tables that are to be joined
• column1 and column2 - columns common to table1 and table2
Example
-- join the Customers and Orders tables
-- with customer_id and customer fields
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer;
As you can see, INNER JOIN excludes all the rows that are not common between the two
tables.
Here's an example of INNER JOIN with the WHERE clause:
-- join Customers and Orders table
-- with customer_id and customer fields
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
Here, the SQL command joins two tables and selects rows where the amount is greater
than or equal to 500.
SQL LEFT JOIN
The SQL LEFT JOIN combines two tables based on a common column. It then selects
records having matching values in these columns and the remaining rows from the left
table.
Syntax
SELECT columns_from_both_tables
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column2
Example
-- join Customers and Orders table
-- with customer_id and customer fields
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
LEFT JOIN Orders
ON Customers.customer_id = Orders.customer;
Here, the SQL command combines data from the Customers and Orders tables.
The query selects the customer_id and first_name from Customers and the amount from
Orders.
Hence, the result includes rows where customer_id from Customers matches customer
from Orders, along with all the remaining rows from the Customers table.
We can use the LEFT JOIN statement with an optional WHERE clause; the condition is
applied after the join, so only matched rows that also satisfy it are returned.
SQL RIGHT JOIN
The SQL RIGHT JOIN combines two tables based on a common column. It then selects
records having matching values in these columns and the remaining rows from the right
table.
Syntax
SELECT columns_from_both_tables
FROM table1
RIGHT JOIN table2
ON table1.column1 = table2.column2
Here, the SQL command selects the customer_id and first_name columns (from the
Customers table) and the amount column (from the Orders table).
And the result set will contain those rows where there is a match between customer_id (of
the Customers table) and customer (of the Orders table), along with all the remaining rows
from the Orders table.
The SQL RIGHT JOIN statement can have an optional WHERE clause. For example,
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
RIGHT JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
Here, the SQL command joins the Customers and Orders tables and selects rows where
the amount is greater than or equal to 500.
SQL FULL OUTER JOIN
The SQL FULL OUTER JOIN statement joins two tables based on a common column. It
selects records that have matching values in these columns and the remaining rows from
both tables.
Syntax
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.column1 = table2.column2;
The SQL FULL OUTER JOIN statement can have an optional WHERE clause. For example,
SELECT Customers.customer_id, Customers.first_name,
Orders.amount
FROM Customers
FULL OUTER JOIN Orders
ON Customers.customer_id = Orders.customer
WHERE Orders.amount >= 500;
Here, the SQL command joins two tables and selects rows where the amount is greater
than or equal to 500.
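The join variants are easy to compare side by side; a sketch using Python's sqlite3 module
with hypothetical Customers and Orders tables shaped like the ones referenced above
(FULL OUTER JOIN is omitted because older SQLite versions do not support it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Customers (customer_id INTEGER, first_name TEXT)")
cur.execute("CREATE TABLE Orders (order_id INTEGER, customer INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, "John"), (2, "Robert"), (3, "David")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(1, 1, 400), (2, 2, 500), (3, 5, 900)])  # customer 5 has no Customers row

# INNER JOIN keeps only the matching rows (customers 1 and 2)
inner = cur.execute("""SELECT c.customer_id, c.first_name, o.amount
                       FROM Customers c INNER JOIN Orders o
                       ON c.customer_id = o.customer""").fetchall()
print(inner)  # [(1, 'John', 400), (2, 'Robert', 500)]

# LEFT JOIN also keeps unmatched left-table rows, padding them with NULL
left = cur.execute("""SELECT c.customer_id, c.first_name, o.amount
                      FROM Customers c LEFT JOIN Orders o
                      ON c.customer_id = o.customer""").fetchall()
print(left)   # David appears with amount None; order 3 (customer 5) does not
```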
SQL Views
In SQL, views contain rows and columns similar to a table, however, views don't hold the
actual data.
You can think of a view as a virtual table environment that's created from one or more
tables so that it's easier to work with data.
Creating a View in SQL
We can create views in SQL by using the CREATE VIEW command. For example,
CREATE VIEW us_customers AS
SELECT customer_id, first_name
FROM Customers
WHERE Country = 'USA';
Here, a view named us_customers is created from the customers table.
Now, to select the customers who live in the USA, we can simply run,
SELECT *
FROM us_customers;
Updating a View
It's possible to change or update an existing view using the CREATE OR REPLACE VIEW
command. For example,
CREATE OR REPLACE VIEW us_customers AS
SELECT *
FROM Customers
WHERE Country = 'USA';
Here, the us_customers view is updated to show all the fields.
Deleting a View
We can delete views using the DROP VIEW command. For example,
DROP VIEW us_customers;
Here, the SQL command deletes the view named us_customers.
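The view lifecycle can be exercised end to end; a sketch using Python's sqlite3 module with
a hypothetical Customers table (note SQLite lacks CREATE OR REPLACE VIEW; there you
would DROP and re-CREATE instead):

```python
import sqlite3

# A hypothetical Customers table; column names follow the view example above
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Customers (customer_id INTEGER, first_name TEXT, Country TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                [(1, "John", "USA"), (2, "Robert", "UK"), (3, "David", "USA")])

# A view stores the query, not the data
cur.execute("""CREATE VIEW us_customers AS
               SELECT customer_id, first_name FROM Customers
               WHERE Country = 'USA'""")
usa_rows = cur.execute("SELECT * FROM us_customers").fetchall()
print(usa_rows)                        # [(1, 'John'), (3, 'David')]

# The view tracks its base table: a new USA customer appears automatically
cur.execute("INSERT INTO Customers VALUES (4, 'Maria', 'USA')")
n = cur.execute("SELECT COUNT(*) FROM us_customers").fetchone()[0]
print(n)                               # 3

cur.execute("DROP VIEW us_customers")  # the base table is untouched
```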
SQL UPDATE
In SQL, the UPDATE statement is used to modify existing records in a database table.
Example
--update a single value in the given row
UPDATE Customers
SET age = 21
WHERE customer_id = 1;
Here, the SQL command updates the age column to 21 where the customer_id equals 1.
SQL UPDATE TABLE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
[WHERE condition];
We use the UPDATE statement to update multiple rows at once. For example,
-- update multiple rows satisfying the condition
UPDATE Customers
SET country = 'NP'
WHERE age = 22;
Here, the SQL command changes the value of the country column to NP if age is 22.
If there is more than one row where age equals to 22, all the matching rows will be
modified.
Update all Rows
We can update all the rows in a table at once by omitting the WHERE clause. For example,
-- update all rows
UPDATE Customers
SET country = 'NP';
SQL Operators
SELECT *
FROM Customers
WHERE country LIKE 'UK';
Here, the SQL command selects customers whose country is UK.
SQL supports the following logical operators:
Operator Meaning
ALL TRUE if all of a set of comparisons are TRUE.
AND TRUE if both Boolean expressions are TRUE.
ANY TRUE if any one of a set of comparisons are TRUE.
BETWEEN TRUE if the operand is within a range.
EXISTS TRUE if a subquery contains any rows.
IN TRUE if the operand is equal to one of a list of expressions.
LIKE TRUE if the operand matches a pattern.
NOT Reverses the value of any other Boolean operator.
OR TRUE if either Boolean expression is TRUE.
SOME TRUE if some of a set of comparisons are TRUE.
Relational Algebra
Relational algebra refers to a procedural query language that takes relation instances as
input and returns relation instances as output. It performs queries with the help of
operators. A binary or unary operator can be used. They take in relations as input and
produce relations as output. Recursive relational algebra is applied to a relationship, and
intermediate outcomes are also considered relations.
Relational Algebra Operations
The following are the fundamental operations present in a relational algebra:
• Select Operation
• Project Operation
• Union Operation
• Set Difference Operation
• Cartesian Product Operation
• Rename Operation
Select Operation (or σ)
It selects tuples from a relation that satisfy the provided predicate.
The notation is: σp(r)
Here σ stands for the selection operator while r stands for the relation; p refers to the
propositional logic formula, which may use connectors such as and, or, and not. These
terms may also make use of relational operators such as =, ≠, ≥, <, >, ≤.
Example
σsubject = “information”(Novels)
The output would be − Selecting tuples from Novels where the subject is ‘information’.
σsubject = “information” and cost = “150”(Novels)
The output would be − Selecting tuples from Novels where the subject is ‘information’ and
the cost is 150.
σsubject = “information” and cost = “150” or year > “2015”(Novels)
The output would be − Selecting tuples from Novels where the subject is ‘information’ and
the cost is 150, or those novels published after 2015.
Project Operation (or ∏)
It projects those column(s) of a relation that are named in the attribute list.
The notation is: ∏B1, B2, ..., Bn(r)
Here B1, B2, ..., Bn refer to the attribute names of the relation r.
Remember that duplicate rows are eliminated automatically, since a relation is a set.
Example
∏subject, writer (Novels)
The output would be − Selecting and projecting columns named as writer as well as the
subject from the relation Novels.
Union Operation (or ∪)
The notation is: r ∪ s
It keeps every tuple that appears in r, in s, or in both; duplicates are eliminated.
Set Difference Operation (or −)
The notation is: r − s
It finds the tuples that are present in one relation but not in the other.
Example
∏writer(Novels) − ∏writer(Articles)
The output would be − Providing the writer names who have written novels but have
not written articles.
Cartesian Product Operation (or Χ)
It helps in combining data and info of two differing relations into one.
The notation is: r Χ s
Where r and s refer to the relations. Their output would be defined as follows:
r Χ s = { t q | t ∈ r and q ∈ s }
Example
σwriter = ‘mahesh'(Novels Χ Articles)
The output would be − Yielding a relation that shows all the articles and novels written by
mahesh.
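Select, project, and Cartesian product can be mimicked over plain Python relations (lists of
dicts); a sketch with a made-up Novels relation:

```python
# Relations as lists of dicts; a sketch of sigma, pi, and the Cartesian product
novels = [
    {"title": "N1", "subject": "information", "cost": 150, "writer": "mahesh"},
    {"title": "N2", "subject": "fiction",     "cost": 200, "writer": "asha"},
    {"title": "N3", "subject": "information", "cost": 300, "writer": "ravi"},
]

def select(predicate, relation):            # sigma_p(r)
    return [t for t in relation if predicate(t)]

def project(attrs, relation):               # pi_{B1..Bn}(r), duplicates removed
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def product(r, s):                          # r x s: every pairing of tuples
    # (a real product assumes disjoint attribute names in r and s)
    return [{**t, **q} for t in r for q in s]

print(select(lambda t: t["subject"] == "information", novels))  # 2 tuples survive
print(project(["subject"], novels))         # two distinct subjects survive
print(len(product(novels, novels)))         # 3 x 3 = 9 pairings
```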
Rename Operation (or ρ)
Relations are the results of the relational algebra, but without any name. Thus, the rename
operation would allow us to rename the relation output. The ‘rename’ operation is basically
denoted by the small Greek letter ρ or rho.
The notation is: ρx(E)
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Types of Join operations:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
1. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names. It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
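A natural join can be mimicked over plain Python data: match tuples on their shared
attribute names (here EMP_CODE), then merge them. A sketch using the EMPLOYEE and
SALARY relations above:

```python
# EMPLOYEE and SALARY relations from the tables above
employee = [{"EMP_CODE": 101, "EMP_NAME": "Stephan"},
            {"EMP_CODE": 102, "EMP_NAME": "Jack"},
            {"EMP_CODE": 103, "EMP_NAME": "Harry"}]
salary = [{"EMP_CODE": 101, "SALARY": 50000},
          {"EMP_CODE": 102, "SALARY": 30000},
          {"EMP_CODE": 103, "SALARY": 25000}]

def natural_join(r, s):
    """Pair tuples that agree on all shared attribute names, then merge."""
    common = set(r[0]) & set(s[0])          # shared attribute names
    return [{**t, **q} for t in r for q in s
            if all(t[a] == q[a] for a in common)]

for row in natural_join(employee, salary):
    print(row["EMP_NAME"], row["SALARY"])   # Stephan 50000, Jack 30000, Harry 25000
```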
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information. Example:
EMPLOYEE
EMP_NAME STREET CITY
Ram Civil line Mumbai
Shyam Park street Kolkata
Ravi M.G. Street Delhi
Hari Nehru nagar Hyderabad
FACT_WORKERS
EMP_NAME BRANCH SALARY
Ram Infosys 10000
Shyam Wipro 20000
Hari TCS 50000
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
EMP_NAME STREET CITY BRANCH SALARY
Ram Civil line Mumbai Infosys 10000
Shyam Park street Kolkata Wipro 20000
Hari Nehru nagar Hyderabad TCS 50000
An outer join is basically of three types:
a. Left outer join (⟕): keeps all tuples of the left relation, padding unmatched ones with NULLs.
b. Right outer join (⟖): keeps all tuples of the right relation, padding unmatched ones with NULLs.
c. Full outer join (⟗): tuples in R that have no matching tuples in S, and tuples in S that have
no matching tuples in R, are both kept, padded with NULLs in their missing attributes.
3. Equi Join:
An equi join is a join that uses the equality operator in the join condition. Example:
CUSTOMER
CLASS_ID NAME PRODUCT_ID
1 John 1
2 Harry 2
3 Harry 3
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
CUSTOMER ⋈ PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Harry 3 Noida
2. Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with the instructions used for specifying and
evaluating each operation. Thus, after translating the user query, the system executes a
query evaluation plan.
Query Evaluation Plan
• To fully evaluate a query, the system needs to construct a query evaluation plan.
• The annotations in the evaluation plan may refer to the algorithms to be used for
the specific operations or the indexes to be consulted.
• Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the
operation.
• Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
• A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the
user query.
3. Optimization
• The cost of the query evaluation can vary for different types of queries. The system is
responsible for constructing the evaluation plan, so the user does not need to write the
query in its most efficient form.
Selection Operation:
• A1 (linear search): Scan each file block and test all records to see whether they
satisfy the selection condition.
• A2 (binary search): Applicable if the selection is an equality comparison on the
attribute on which the file is ordered.
• A3 (Primary index on candidate key, equality)
• A4 (primary index on non-key, equality.)
• A5 (secondary index on search key, equality.)
• A6 (primary index, comparison)
• A7 (secondary index, comparison)
Sorting:
- Quick sort (records completely in main memory)
- External sort (records are on disk)
Join Operation
1. Nested loops join
r ⋈θ s
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr · ts to the result
    end
end
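The pseudocode above translates almost line for line into Python; a sketch joining two
small illustrative relations on an arbitrary condition θ:

```python
def nested_loops_join(r, s, theta):
    """For each tuple tr in r, scan every tuple ts in s and keep the
    concatenated pair whenever the join condition theta(tr, ts) holds."""
    result = []
    for tr in r:            # outer relation
        for ts in s:        # inner relation, scanned once per outer tuple
            if theta(tr, ts):
                result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (4, "z")]
# theta: equality on the first attribute of each tuple
print(nested_loops_join(r, s, lambda tr, ts: tr[0] == ts[0]))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```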
Materialization
• Executes a single operation at a time which generates a temporary file that will be
used as input for the next operation.
• It is easy to implement but time consuming.
• It walks the parse tree of relational algebra and performs innermost operations first.
• The result is materialized and becomes input for the next operation.
• The cost is the sum of individual operations plus the cost of writing intermediate
results to disks.
• It can always be applied.
Pipelining
• With pipelining, operations are arranged in a pipeline and results are passed
from one operation to the next as they are calculated.
• Avoids intermediate temporary relations.
• Cheaper as no cost of writing results to disk
• It is not always possible.
• In a demand-driven (lazy) pipeline, the system requests the next tuple from the
top-level operation, and each operation in turn requests the next tuple from its
child operations.
• In a producer-driven (eager) pipeline, operators eagerly produce tuples and pass
them up to their parent operations.
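The demand-driven model maps naturally onto Python generators: each operator pulls the
next tuple from its child only when its own next tuple is requested, so no intermediate
relation is materialized. A sketch:

```python
def scan(relation):                 # leaf operator: yields base tuples on demand
    yield from relation

def select_op(predicate, child):    # sigma as a pipelined operator
    for t in child:
        if predicate(t):
            yield t

def project_op(attrs, child):       # pi as a pipelined operator
    for t in child:
        yield tuple(t[a] for a in attrs)

rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 1}, {"id": 3, "qty": 7}]

# Build the plan: nothing executes until tuples are demanded from the top
plan = project_op(["id"], select_op(lambda t: t["qty"] > 3, scan(rows)))
print(list(plan))  # [(1,), (3,)]
```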
• Analyze and transform equivalent relational expressions: try to minimize the tuple
and column counts of the intermediate and final results.
• Use different algorithms for each operation: these underlying algorithms determine
how tuples are accessed from the data structures they are stored in (indexing,
hashing, data retrieval) and hence influence the number of disk and block accesses.
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual
selections.
σθ1^θ2(E) = σθ1(σθ2(E))
2. Selection operations are commutative.
σθ1(σθ2(E)) = σθ2(σθ1(E))
3. Only the last in a sequence of projection operations is needed, the others can be
omitted.
Πt1(Πt2(....(Πtn(E)))) = Πt1(E)
4. Selections can be combined with Cartesian products and theta joins.
a. σθ(E1 X E2) = E1 ⋈θ E2
b. σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1^θ2 E2
8. The projection operation distributes over the theta join under the following conditions.
a. If the join condition θ involves only attributes in L1 ∪ L2:
ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))
b. In general:
- Let L3 be attributes of E1 that are involved in join condition θ, but are not in L1 ∪ L2, and
- let L4 be attributes of E2 that are involved in join condition θ, but are not in L1 ∪ L2.
ΠL1∪L2(E1 ⋈θ E2) = ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))
9. Set union and intersection are commutative (set difference is not).
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
10. Set union and intersection are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
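Rules 1 and 2 are easy to sanity-check on concrete data by treating selections as Python
filters; a small sketch:

```python
# Check rule 1 (cascade of sigma) and rule 2 (commutativity of sigma)
# on a small sample relation of (a, b) tuples
E = [(1, 10), (2, 20), (3, 30), (4, 40)]
theta1 = lambda t: t[0] % 2 == 0      # first attribute is even
theta2 = lambda t: t[1] >= 20         # second attribute is at least 20

sigma = lambda theta, rel: [t for t in rel if theta(t)]

# Rule 1: sigma_{theta1 ^ theta2}(E) == sigma_{theta1}(sigma_{theta2}(E))
lhs = sigma(lambda t: theta1(t) and theta2(t), E)
rhs = sigma(theta1, sigma(theta2, E))
print(lhs == rhs)  # True; both give [(2, 20), (4, 40)]

# Rule 2: sigma_{theta1}(sigma_{theta2}(E)) == sigma_{theta2}(sigma_{theta1}(E))
print(sigma(theta1, sigma(theta2, E)) == sigma(theta2, sigma(theta1, E)))  # True
```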
6. Performance Tuning
SQL Performance tuning is the process of enhancing SQL queries to speed up the server
performance. Performance tuning in SQL shortens the time it takes for a user to receive a
response after sending a query and utilizes fewer resources in the process. The idea is that
users can occasionally produce the same intended result set with a faster-running query.
Factors Affecting SQL Speed
Some of the major factors that influence the computation and execution time in SQL are:
Table size: Performance may be impacted if your query hits one or more tables with
millions of rows or more.
Joins: Your query is likely to be slow if it joins two tables in a way that significantly raises
the number of rows in the return set.
Aggregations: Adding several rows together to create a single result needs more
processing than just retrieving those values individually.
The following techniques can be used to optimize SQL queries:
iii) Commit
This operation in transactions is used to maintain integrity in the database. Due to some
failure of power, hardware, or software, etc., a transaction might get interrupted before all
its operations are completed. This may cause ambiguity in the database, i.e. it might become
inconsistent before and after the transaction. To ensure that further operations of any other
transaction are performed only after the work of the current transaction is done, a commit
operation is performed to make the changes of a transaction permanent in the database.
iv) Rollback
This operation is performed to bring the database to the last saved state when any
transaction is interrupted in between due to any power, hardware, or software failure. In
simple words, it can be said that a rollback operation does undo the operations of
transactions that were performed before its interruption to achieve a safe state of the
database and avoid any kind of ambiguity or inconsistency.
States of Transactions
In a database, a transaction can be in one of these states given below –
Active − This is the state in which a transaction is being executed. Thus, it is like the initial
state of any given transaction.
Partially Committed − A transaction is in its partially committed state whenever it
executes the final operation.
Failed − In case any check made by a database recovery system fails, then that transaction
is in a failed state. Remember that a failed transaction cannot proceed further.
Aborted − In case any check fails, leading the transaction to a failed state, the recovery
manager then rolls all its write operations back on the database so that it can bring the DB
(database) back to the original state (the state where it was prior to the transaction
execution). The transactions in this state are known to be aborted. A DB recovery module
can select one of these two operations after the abortion of a transaction –
• Re-start
• Kill the transaction
Committed − We can say that a transaction is committed in case it executes all of its
operations successfully. In such a case, all its effects are now established permanently on
the DB system.
There are mainly two types of scheduling - Serial Schedule and Non-serial Schedule.
Serial Schedule
As the name says, all the transactions are executed serially one after the other. In serial
Schedule, a transaction does not start execution until the currently running transaction
finishes execution.
This type of execution of the transaction is
also known as non-interleaved execution.
Serial schedules are always recoverable, cascadeless, strict, and consistent. A serial
schedule always gives the correct result.
Consider two transactions T1 and T2 shown above, which perform some operations. If
there is no interleaving of operations, then there are two possible outcomes: either all T1
operations execute, followed by all T2 operations, or all T2 operations execute, followed
by all T1 operations. In the above figure, the schedule shows the serial schedule where
T1 is followed by T2, i.e. T1 -> T2.
Non-serial Schedule
In a non-serial Schedule, multiple
transactions execute
concurrently/simultaneously, unlike the
serial Schedule, where one transaction
must wait for another to complete all its
operations. In the Non-Serial Schedule, the
other transaction proceeds without the
completion of the previous transaction. All
the transaction operations are interleaved
or mixed with each other.
Non-serial schedules are NOT always recoverable, cascadeless, strict, and consistent.
In this Schedule, there are two transactions,
T1 and T2, executing concurrently. The
operations of T1 and T2 are interleaved. So,
this Schedule is an example of a Non-Serial
Schedule.
Non-serial schedules are further
categorized into serializable and non-
serializable schedules.
Serializability
Serializability of schedules ensures that a non-serial schedule is equivalent to some serial
schedule. It allows transactions to execute concurrently while still producing a result that
some serial order would produce. In simple words, serializability is a way to check whether
the execution of two or more concurrent transactions maintains database consistency.
What is a serializable schedule?
A non-serial schedule is called a serializable schedule if it can be converted to its
equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule
result in the same then the non-serial schedule is called a serializable schedule.
Testing of Serializability
To test the serializability of a schedule, we can use Serialization Graph or Precedence
Graph. A serialization Graph is nothing but a Directed Graph of the entire transactions of a
schedule.
It can be defined as a Graph G(V, E)
consisting of a set of directed-edges E =
{E1, E2, E3, ..., En} and a set of
vertices V = {V1, V2, V3, ...,Vn}.
An edge connects two transactions when one of the two operations - READ or WRITE -
performed by one transaction conflicts with an operation of the other on the same data
item.
Ti -> Tj means transaction Ti performs its conflicting read or write before transaction Tj
does.
If there is a cycle present in the serialization graph, then the schedule is non-serializable,
because a cycle indicates that one transaction is dependent on the other and vice versa.
It also means that there are one or more conflicting pairs of operations in the
transactions. On the other hand, no cycle means that the non-serial schedule is
serializable.
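The cycle test on a precedence graph is a plain directed-graph check; a sketch using
depth-first search (transaction names and edges are illustrative):

```python
def has_cycle(edges):
    """edges: dict mapping each transaction to the transactions it precedes.
    Returns True if the precedence graph contains a cycle (non-serializable)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in edges}

    def dfs(v):
        color[v] = GRAY
        for w in edges.get(v, []):
            if color.get(w, WHITE) == GRAY:     # back edge: cycle found
                return True
            if color.get(w, WHITE) == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in edges)

# T1 -> T2 -> T3: conflict-serializable (no cycle)
print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": []}))  # False
# T1 -> T2 and T2 -> T1: not serializable
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))            # True
```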
Unrepeatable Read Problem
The unrepeatable read problem arises when a transaction reads the same data twice and
gets two different values, because another transaction updated the data in between.
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)
Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value
of DT from 1000 to 1500 and then again transaction B reads the value and finds it to be
1500. Transaction B finds two different values of DT in its two different read operations.
Phantom Read Problem
The phantom read problem arises when a transaction reads a data item that is then
deleted by another transaction, so a subsequent read finds that the data no longer exists.
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)
Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from
the database DB and then again transaction B reads the value and finds an error saying the
data DT does not exist in the database DB.
Lost Update Problem
The Lost Update problem arises when an update in the data is done over another update
but by two different transactions.
Example: Consider two transactions A and B performing read/write operations on a data DT
in the database DB. The current value of DT is 1000: The following table shows the
read/write operations in A and B transactions.
Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ DT=DT+300
T5 ------ WRITE(DT)
T6 READ(DT) ------
Transaction A initially reads the value of DT as 1000. Transaction A modifies the value of DT
from 1000 to 1500 and then again transaction B modifies the value to 1800. Transaction A
again reads DT and finds 1800 in DT and therefore the update done by transaction A has
been lost.
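The lost-update interleaving is easy to replay in code; a sketch of the canonical schedule in
which both transactions read before either writes, which is precisely what makes the later
write silently discard the earlier one (values mirror the example: DT starts at 1000, A adds
500, B adds 300):

```python
# Shared data item DT, initially 1000
db = {"DT": 1000}

# Both transactions read DT before either one writes it back
a_local = db["DT"]          # A reads 1000
b_local = db["DT"]          # B reads 1000

db["DT"] = a_local + 500    # A writes 1500
db["DT"] = b_local + 300    # B writes 1300, silently discarding A's +500

print(db["DT"])             # 1300, not 1800: A's update is lost
```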
Incorrect Summary Problem
The incorrect summary problem occurs when a transaction computes an aggregate (such
as a sum) over several data items while another transaction changes the value of one of
those items, so the computed summary is incorrect.
Example: Consider two transactions A and B performing read/write operations on two data
DT1 and DT2 in the database DB. The current value of DT1 is 1000 and DT2 is 2000: The
following table shows the read/write operations in A and B transactions.
Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------
Transaction A reads the value of DT1 as 1000. It uses an aggregate function SUM which
calculates the sum of two data DT1 and DT2 in variable add but in between the value of DT2
get changed from 2000 to 2500 by transaction B. Variable add uses the modified value of
DT2 and gives the resultant sum as 3500 instead of 3000.
Shared Lock(S): The locks which disable the write operations but allow read operations for
any data in a transaction are known as shared locks. They are also known as read-only
locks and are represented by 'S'.
Exclusive Lock(X): The locks which allow both the read and write operations for any data in
a transaction are known as exclusive locks. An exclusive lock can be held by only one
transaction at a time on a given data item. They are represented by 'X'.
Two-Phase Locking (2-PL)
The two-phase locking protocol divides the execution of a transaction into phases:
• In the first part, when the execution of the transaction starts, it seeks permission for
the lock it requires.
• In the second part (the growing phase), the transaction acquires all the locks. The
third phase starts as soon as the transaction releases its first lock.
• In the third phase (the shrinking phase), the transaction cannot demand any new
locks. It only releases the acquired locks.
Strict 2PL
• The first phase of Strict-2PL is like 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
• The only difference between 2PL and strict 2PL is that Strict-2PL does not release a
lock after using it.
• Strict-2PL waits until the whole transaction commits, and then it releases all the
locks at once.
• Strict-2PL protocol does not have shrinking phase of lock release.
Terms Denotations
Timestamp of transaction A TS(A)
Read time-stamp of data-item DT R-timestamp(DT)
Write time-stamp of data-item DT W-timestamp(DT)
The following are the rules on which the Timestamp-ordering protocol works:
When transaction A is going to perform a read operation on data item DT:
• TS(A) < W-timestamp (DT): Transaction will rollback. If the timestamp of transaction
A at which it has entered in the system is less than the write timestamp of DT that is
the latest time at which DT has been updated, then the transaction will roll back.
• TS(A) >= W-timestamp (DT): Transaction will be executed. If the timestamp of
transaction A at which it has entered in the system is greater than or equal to the
write timestamp of DT that is the latest time at which DT has been updated, then the
read operation will be executed.
• In either case, all the relevant data-item timestamps are updated.
When transaction A is going to perform a write operation on data item DT:
• TS(A) < R-timestamp (DT): Transaction will rollback. If the timestamp of transaction
A at which it has entered in the system is less than the read timestamp of DT that is
the latest time at which DT has been read, then the transaction will rollback.
• TS(A) < W-timestamp (DT): Transaction will rollback. If the timestamp of transaction
A at which it has entered in the system is less than the write timestamp of DT that is
the latest time at which DT has been updated, then the transaction will rollback.
• All the operations other than this will be executed.
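The read and write rules can be condensed into a small checker; a sketch (timestamps are
plain integers; "ROLLBACK" merely reports the decision rather than aborting anything):

```python
def check_read(ts_a, w_ts):
    """Timestamp-ordering rule for a read by transaction A on item DT."""
    if ts_a < w_ts:            # A is older than DT's last writer: too late to read
        return "ROLLBACK"
    return "EXECUTE"           # read allowed; R-timestamp(DT) then becomes
                               # max(R-timestamp(DT), TS(A))

def check_write(ts_a, r_ts, w_ts):
    """Timestamp-ordering rule for a write by transaction A on item DT."""
    if ts_a < r_ts or ts_a < w_ts:
        return "ROLLBACK"      # a younger transaction already read or wrote DT
    return "EXECUTE"           # write allowed; W-timestamp(DT) then becomes TS(A)

print(check_read(ts_a=5, w_ts=8))            # ROLLBACK: DT was written at time 8
print(check_read(ts_a=9, w_ts=8))            # EXECUTE
print(check_write(ts_a=5, r_ts=7, w_ts=3))   # ROLLBACK: DT read at time 7
print(check_write(ts_a=9, r_ts=7, w_ts=8))   # EXECUTE
```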
System Crash:
A system crash usually occurs when there is some sort of hardware or software
breakdown. Some other problems which are external to the system and cause the system
to abruptly stop or eventually crash include failure of the transaction, operating system
errors, power cuts, main memory crash, etc.
These types of failures are often termed soft failures and are responsible for the data losses
in the volatile memory. It is assumed that a system crash does not have any effect on the
data stored in the non-volatile storage and this is known as the fail-stop assumption.
Data-transfer Failure:
When a disk failure occurs amid data-transfer operation resulting in loss of content from
disk storage then such failures are categorized as data-transfer failures. Some other
reasons for disk failures include disk head crash, disk unreachability, formation of bad
sectors, read-write errors on the disk, etc.
To quickly recover from a disk failure caused amid a data-transfer operation, the backup
copy of the data stored on other tapes or disks can be used. Thus, it’s good practice to
backup your data frequently.
When the system recovers from a crash, the DBMS should do the following:
• It should check the states of all the transactions which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to be
rolled back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
• Maintaining the logs of each transaction and writing them onto some stable storage
before modifying the database.
• Maintaining shadow paging, where the changes are done on volatile memory, and
later, the actual database is updated.
If a rollback is done, the old values are restored in the database and all the changes made
to the database are discarded. This is called the "undoing" process. If the commit is done
before the system crashes, then after restarting the system the changes are stored
permanently in the database.
An update log record describes a single database write and has these fields:
• Transaction identifier: Unique identifier of the transaction that performed the write
operation.
• Data item: Unique identifier of the data item written.
• Old value: Value of data item prior to write.
• New value: Value of data item after write operation.
Other types of log records are:
• <Ti start>: Transaction Ti has started.
• <Ti commit>: Transaction Ti has committed.
• <Ti abort>: Transaction Ti has aborted.
• Transaction Ti needs to be undone if the log contains the record <Ti start> but does
not contain either the record <Ti commit> or the record <Ti abort>.
• Transaction Ti needs to be redone if log contains record <Ti start> and either the
record <Ti commit> or the record <Ti abort>.
Example of Log:
<T1,start>
<T1,A,100,200>
<T1,B,200,400>
<T1,commit>
If the log has both start and commit records, then the log recovery manager redoes the
operation when a failure occurs. However, if the log has only a start record but no commit,
then the log recovery manager undoes the operation, restoring the old value.
Another example,
<T1,start>
<T1,A,100,200>
<T1,B,200,400>
<T1,commit>
<T2,start>
<T2,C,250,50>
If a failure occurred here, then T1 will be redone but T2 will be undone, since T1 has both
start and commit records while T2 has only a start record.
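The redo/undo decision can be read straight off the log; a sketch scanning log records
shaped like the examples above:

```python
def recovery_plan(log):
    """Classify each transaction in the log: redo those with a commit (or
    abort) record, undo those that started but never finished."""
    started, finished = set(), set()
    for record in log:
        if record[1] == "start":
            started.add(record[0])
        elif record[1] in ("commit", "abort"):
            finished.add(record[0])
    redo = sorted(started & finished)
    undo = sorted(started - finished)
    return redo, undo

# Log mirroring the second example above; update records are (txn, item, old, new)
log = [("T1", "start"), ("T1", "A", 100, 200), ("T1", "B", 200, 400),
       ("T1", "commit"),
       ("T2", "start"), ("T2", "C", 250, 50)]   # crash happens here

print(recovery_plan(log))  # (['T1'], ['T2'])
```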
Considering the above figure, two write operations are performed on pages 3 and 5.
Before the start of the write operation on page 3, the current page table points to the old
page 3. To commit the transaction, the following steps are performed:
• All the modifications done by the transaction that are present in buffers are
transferred to the physical database.
• Output the current page table to disk.
• Output the disk address of the current page table to the fixed location in stable
storage that contains the address of the shadow page table. This operation
overwrites the address of the old shadow page table. With this, the current page
table becomes the shadow page table and the transaction is committed.
Failure
If the system crashes during execution of the transaction but before the commit operation,
it is sufficient to free the modified database pages and discard the current page table. The
state of the database before the transaction executed is recovered by reinstalling the
shadow page table. If the system crashes after the last write operation (the commit), then
the propagation of the changes made by the transaction is not affected. These changes
are preserved and there is no need to perform a redo operation.