
UNIT I: Introduction: Concepts and Definitions, Relational models, Data Modeling and Query Languages,

Database Objects. Normalization Techniques: Functional Dependency, 1NF, 2NF, 3NF, BCNF; Multi valued
Dependency; Loss-less Join and Dependency Preservation.

Relational Models: -
The relational model represents how data is stored in Relational Databases. A relational database consists of a
collection of tables each of which is assigned a unique name. Consider a relation STUDENT with attributes
ROLL_NO, NAME, ADDRESS, PHONE, and AGE shown in the table.

Data Modeling and Query Languages

Data Modeling

Data modeling is the process of defining the structure of a database to ensure efficient data organization,
storage, and retrieval. It involves conceptualizing real-world entities, their attributes, and relationships in a
structured format.

Types of Data Models

1. Hierarchical Model – Organizes data in a tree-like structure with parent-child relationships.


2. Network Model – Uses a graph structure where records can have multiple parent and child records.
3. Relational Model – Represents data in tables (relations) with rows (tuples) and columns (attributes).
4. Entity-Relationship (ER) Model – Uses entities, attributes, and relationships to model real-world data.
5. Object-Oriented Model – Extends relational models by incorporating object-oriented principles (e.g.,
classes, inheritance).
6. NoSQL Models – Non-relational databases like document-based (MongoDB), key-value (Redis),
column-family (Cassandra), and graph databases (Neo4j).

Entity-Relationship (ER) Model


 Entity: A real-world object (e.g., Student, Course).
 Attributes: Properties of an entity (e.g., StudentID, Name).
 Primary Key: Uniquely identifies an entity instance.
 Relationships: Defines associations between entities (one-to-one, one-to-many, many-to-many).
 Cardinality: Specifies the number of entity occurrences in a relationship.

Query Languages

Query languages allow users to interact with databases to retrieve, manipulate, and manage data. The most
widely used query language is SQL (Structured Query Language).

Types of SQL

1. Data Definition Language (DDL) – Defines database schema.


o CREATE TABLE, ALTER TABLE, DROP TABLE
2. Data Manipulation Language (DML) – Manages and modifies data.
o INSERT, UPDATE, DELETE
3. Data Query Language (DQL) – Retrieves data from the database.
o SELECT, WHERE, ORDER BY, JOIN
4. Data Control Language (DCL) – Controls access and permissions.
o GRANT, REVOKE
5. Transaction Control Language (TCL) – Manages transactions.
o COMMIT, ROLLBACK, SAVEPOINT

Example SQL Queries

 Create Table

CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT
);

 Insert Data

INSERT INTO Students (StudentID, Name, Age) VALUES (1, 'Alice', 22);

 Retrieve Data

SELECT * FROM Students WHERE Age > 20;

 Join Tables

SELECT Students.Name, Courses.CourseName
FROM Students
JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
JOIN Courses ON Enrollments.CourseID = Courses.CourseID;

NoSQL Query Languages

 MongoDB (Document-based)

db.students.find({ "age": { "$gt": 20 } })

 Cypher (Graph Databases - Neo4j)

MATCH (s:Student)-[:ENROLLED_IN]->(c:Course) RETURN s.name, c.name;

Database Objects

Database objects are the components of a database that store, manage, and manipulate data. These objects help
in organizing data efficiently and ensuring its integrity. The most common database objects include tables,
views, indexes, stored procedures, triggers, sequences, and more.

1. Tables

Tables are the core database objects where data is stored in a structured format.

 Rows (Tuples): Represent individual records.


 Columns (Attributes): Define the properties of data.
 Primary Key: Uniquely identifies each row.
 Foreign Key: Establishes relationships between tables.

Example:

CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT,
CourseID INT,
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

2. Views

A view is a virtual table that represents a saved SQL query. It does not store data but provides a way to simplify
complex queries.

Example:
CREATE VIEW StudentDetails AS
SELECT Students.Name, Courses.CourseName
FROM Students
JOIN Courses ON Students.CourseID = Courses.CourseID;

3. Indexes

Indexes improve the speed of data retrieval operations by creating an efficient lookup mechanism.

Example:

CREATE INDEX idx_student_name ON Students(Name);

Types of Indexes:

 Clustered Index: Sorts the table based on key values.


 Non-clustered Index: Stores a separate structure for faster access.

4. Stored Procedures

Stored procedures are predefined SQL statements that execute complex tasks efficiently and securely.

Example:

CREATE PROCEDURE GetStudentDetails(@StudentID INT)
AS
BEGIN
SELECT * FROM Students WHERE StudentID = @StudentID;
END;

Execution:

EXEC GetStudentDetails 1;

5. Triggers

Triggers are automated database actions that execute in response to specific events, such as inserts, updates, or
deletes.

Example:

CREATE TRIGGER StudentInsertTrigger
ON Students
AFTER INSERT
AS
BEGIN
PRINT 'New student record added!';
END;

6. Sequences

A sequence is used to generate unique values, often for auto-incrementing primary keys.

Example:

CREATE SEQUENCE StudentID_Seq
START WITH 1
INCREMENT BY 1;

Usage:

INSERT INTO Students (StudentID, Name, Age)
VALUES (NEXT VALUE FOR StudentID_Seq, 'Alice', 22);

7. Synonyms

A synonym is an alias for a database object, providing an easier reference.

Example:

CREATE SYNONYM StudentSyn FOR Students;
SELECT * FROM StudentSyn;

Normalization (Schema Refinement)

Problems Caused by Redundancy: -

 Storing the same information redundantly, that is, in more than one place within the database, can lead to several problems.
1. Redundant Storage
Some information is stored repeatedly.
2. Update anomalies
If copies of a data item are scattered across the database and are not linked to each other properly, then when we try to update that item, some of its copies may get updated while others are left with their old values.
This leaves the database in an inconsistent state.
This anomaly is caused due to data redundancy. Redundant information makes updates more difficult
since, for example, changing the name of the student 501 would require that all tuples containing 501 in
Regno must be updated. If for some reason, all tuples are not updated, we might have a database that gives
two names for a student, which is inconsistent information. This problem is called update anomaly. An
update anomaly results in data inconsistency.
3. Insertion anomalies
An insertion anomaly is the inability to insert certain information without also inserting unrelated data. Let the primary key of the above relation be (Regno, Course code). Any new tuple to be inserted in the relation must have a value for the primary key, since the entity integrity constraint requires that a key may not be totally or partially NULL. However, in the given relation, if one wanted to insert the code and name of a new subject in the database, it would not be possible until a student enrols in that course. Similarly, information about a new student cannot be inserted in the database until the student enrols in a course. These problems are called insertion anomalies.
4. Deletion anomalies
A deletion anomaly is the loss of useful information when a tuple is deleted, because that tuple may be the only place where some other fact is recorded. For example, if we delete the tuple corresponding to student 502 enrolled for CS-104, we also lose relevant information about the course, i.e., the course name. This is called a deletion anomaly.

Decomposition:

 Decomposition is the process of breaking down the relation R into two or more relation schemas that
each contain a subset of the attributes of R and together include all attributes in R.



 Decomposition helps in eliminating some of the problems of bad design such as redundancy,
inconsistency and anomalies.
There are two types of decomposition
1. Lossless join decomposition
2. Lossy Decomposition

Lossy Decomposition

 The decomposition of R into R1 and R2 is lossy when the join of R1 and R2 does not yield the same relation as R.
 One of the disadvantages of decomposition into two or more tables is that some information may be lost when the original relation is reconstructed.
 e.g. if we have a table STUDENT (Rollno, sname, dept) and we decompose it into two tables, Student_info(Rollno, sname) and Student_dept(sname, dept), then when we join the two tables we may get some spurious or extra tuples, which makes the data inconsistent.
Lossless Join Decomposition (Non-additive Join)

 The decomposition of R into R1 and R2 is lossless when the join of R1 and R2 yields the same relation as R (a small sketch after this list replays both the lossy and the lossless case).
 e.g. if we have a table STUDENT (Rollno,sname,dept). If we decompose the table into two tables
one with attributes Student_info(Rollno,sname) and another as Student_dept(Rollno,dept).
 When we join the two tables then we get the same relation as that of student.
 Here no spurious or extra tuples are generated.
 Hence, care must be taken before decomposing a relation into parts.
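
The following small Python sketch is not part of the original example; the sample rows and names are hypothetical. It replays the STUDENT decomposition above: projecting on sname and joining back produces spurious tuples (lossy), while projecting on Rollno reconstructs the original relation exactly (lossless).

# Illustrative sketch: lossy vs lossless decomposition of STUDENT(Rollno, sname, dept).
STUDENT = [
    {"Rollno": 1, "sname": "Ravi", "dept": "CSE"},
    {"Rollno": 2, "sname": "Ravi", "dept": "ECE"},   # two students share a name
]

def project(rows, cols):
    # Projection with duplicate elimination, as in relational algebra.
    return [dict(t) for t in {tuple((c, r[c]) for c in cols) for r in rows}]

def natural_join(r1, r2):
    # Join rows that agree on all common column names.
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[c] == b[c] for c in common)]

# Lossy: the shared attribute 'sname' is not a key of either projection.
lossy = natural_join(project(STUDENT, ["Rollno", "sname"]),
                     project(STUDENT, ["sname", "dept"]))
print(len(lossy))      # 4 rows: two spurious tuples appear

# Lossless: the shared attribute 'Rollno' is a key of Student_info.
lossless = natural_join(project(STUDENT, ["Rollno", "sname"]),
                        project(STUDENT, ["Rollno", "dept"]))
print(len(lossless))   # 2 rows: exactly the original STUDENT relation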
Functional dependency

Definition
 A functional dependency is a constraint between two sets of attributes from the database
 A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
 The constraint is that, for any two tuples t1 and t2, if t1[X] = t2[X], then we must also have t1[Y] = t2[Y].
 This means that the values of the Y component of a tuple depend on, or are determined by, the values
of the X component; or alternatively, the values of the X component of a tuple uniquely (or
functionally) determine the values of the Y component.
 We also say that there is a functional dependency from X to Y or that Y is functionally dependent on
X.
 The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-hand side.
 Functional dependency is represented by an arrow sign (→); that is, A → B means that A functionally determines B (a small check of this definition is sketched below).
Determinant: Attribute or set of attributes on the left hand side of the arrow.

Dependent: Attribute or set of attributes on the right hand side of the arrow.
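
As a quick illustration of the definition, the sketch below (illustrative only; the relation instance and attribute names are hypothetical) checks whether an FD X → Y holds in a given set of tuples.

# Minimal sketch: testing whether X -> Y holds, using the tuple-based definition above.
def fd_holds(rows, X, Y):
    # rows: list of dicts; X, Y: lists of attribute names.
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False          # two tuples agree on X but differ on Y
        seen[x_val] = y_val
    return True

# Hypothetical STUDENT rows: ROLL_NO -> NAME holds, NAME -> ROLL_NO does not.
rows = [
    {"ROLL_NO": 1, "NAME": "Ravi", "AGE": 20},
    {"ROLL_NO": 2, "NAME": "Ravi", "AGE": 21},
]
print(fd_holds(rows, ["ROLL_NO"], ["NAME"]))   # True
print(fd_holds(rows, ["NAME"], ["ROLL_NO"]))   # False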

Properties of FD: [Armstrong's Axioms]

 If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies logically implied by F. Armstrong's axioms are a set of rules that, when applied repeatedly, generate the closure of F (an attribute-closure sketch follows the rules below).

1. Reflexive rule: If A is a set of attributes and B is a subset of A, then A → B holds.
   If A ⊇ B, then A → B.

2. Augmentation rule: If A → B holds and Y is a set of attributes, then AY → BY also holds. That is, adding attributes to both sides of a dependency does not change the basic dependency.

3. Transitivity rule: As with the transitive rule in algebra, if A → B holds and B → C holds, then A → C also holds. A → B is read as "A functionally determines B".

4. Decomposition rule: If X → YZ, then X → Y and X → Z.

5. Union rule: If X → Y and X → Z, then X → YZ.

6. Pseudotransitivity rule: If X → Y and WY → Z, then WX → Z.
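
The rules above are usually applied through the attribute-closure algorithm: X → Y is in F+ exactly when Y is contained in the closure of X under F. The following is a minimal sketch of that computation, with a hypothetical FD set.

# Illustrative sketch: attribute closure, the usual way to test membership in F+.
def attribute_closure(attrs, fds):
    # attrs: iterable of attribute names; fds: list of (lhs, rhs) pairs of frozensets.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs        # augmentation and transitivity, in effect
                changed = True
    return closure

def implies(fds, X, Y):
    # X -> Y is in F+ iff Y is contained in the closure of X under F.
    return set(Y) <= attribute_closure(X, fds)

F = [(frozenset("A"), frozenset("B")),      # A -> B
     (frozenset("B"), frozenset("C"))]      # B -> C
print(attribute_closure({"A"}, F))          # {'A', 'B', 'C'}
print(implies(F, {"A"}, {"C"}))             # True, by transitivity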

Trivial Functional Dependency

 Trivial: If an FD X → Y holds where Y is a subset of X, then it is called a trivial FD.

 Non-trivial: If an FD X → Y holds where Y is not a subset of X, then it is called a non-trivial FD.

Normalization

 If a database design is not perfect, it may contain anomalies, which lead to inconsistency of the database itself. Managing a database with anomalies is next to impossible.
 Normalization is the process of efficiently organizing data in the database.

There are two goals of the normalization process:

 Eliminate redundant data (for example, storing the same data in more than one table) and
 Ensure data dependencies make sense (only storing related data in a table).

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.
Normal Forms Based on FDS:

 The normalization process, as first proposed by Codd (1972a), takes a relation schema through a
series of tests to "certify" whether it satisfies a certain normal form.

 The normal forms based on FDs are 1st normal form (1NF), second normal form (2NF), third normal
form (3NF), and Boyce-Codd normal form (BCNF).

 Normalization of data can hence be looked upon as a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of

1) minimizing redundancy and


2) minimizing the insertion, deletion, and update anomalies

ADVANTAGES OF NORMALIZATION

The following are the advantages of the normalization.

 More efficient data structure.

 Avoid redundant fields or columns.

 More flexible data structure i.e. we should be able to add new rows and data values easily

 Better understanding of data.


 Ensures that distinct tables exist when necessary.



 Easier to maintain data structure i.e. it is easy to perform operations and complex queries can be
easily handled.

 Minimizes data duplication.

 Close modeling of real world entities, processes and their relationships.

DISADVANTAGES OF NORMALIZATION

The following are disadvantages of normalization.

 You cannot start building the database before you know what the user needs.

 On normalizing relations to higher normal forms, i.e. 4NF and 5NF, performance may degrade.

 Normalizing relations of higher degree is a very time-consuming and difficult process.

 Careless decomposition may lead to a bad database design, which in turn may lead to serious problems.

EXAMPLE DATA:

A company obtains parts from a number of suppliers. Each supplier is located in one city. A city can
have more than one supplier located there and each city has a status code associated with it. Each supplier
may provide many parts. The company creates a simple relational table to store this information that can be
expressed in relational notation as:
FIRST (s#, status, city, p#, qty)
where
s# – supplier identification number (this, together with p#, forms the primary key)
status – status code assigned to the city
city – name of the city where the supplier is located
p# – part number of the part supplied
qty – quantity of parts supplied to date

First Normal Form [1NF]:-

Definition: A relation is said to be in 1NF if it contains atomic values and each row can provide a unique
combination of values.

 For example, all the fields in the table below are atomic, with single values, and each row contains a unique combination of values, so it is in 1NF.

 1NF does not allow multivalued attributes. If multivalued attributes are present, then the relation needs to be decomposed.

 Although the table FIRST is in 1NF it contains redundant data. For example, information about the
supplier's location and the location's status have to be repeated for every part supplied.

 Redundancy causes what are called update anomalies.

Update anomalies are problems that arise when information is inserted, deleted, or updated.

 INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until that supplier supplies a part.

 DELETE. If a row is deleted, then not only is the information about quantity and part lost but also
information about the supplier.

 UPDATE. If supplier s1 moved from London to New York, then six rows would have to be updated
with this new information.

Second Normal Form (2NF):-

Definition: A relation is said to be in 2NF if it is already in 1NF and every non-key attribute fully depends on the primary key of the relation.

 Speaking inversely, if a table has some attribute that is not fully dependent on the primary key of that table, then it is not in 2NF.

 That is, every non-key column must be dependent upon the entire primary key. FIRST is in 1NF but not in 2NF because status and city are functionally dependent only upon the column s# of the composite primary key (s#, p#).

 This can be illustrated by listing the functional dependencies in the table:

s# → city, status
city → status
(s#, p#) → qty
The process for transforming a 1NF table to 2NF is:

1. Identify any determinants other than the composite key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it determines.
3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table except for the determinant, which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.

To transform FIRST into 2NF we move the columns s#, status, and city to a new table called SECOND. The
column s# becomes the primary key of this new table



 Tables in 2NF but not in 3NF still contain modification anomalies. In the example of SECOND, they
are:

 INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be
inserted until there is a supplier in the city.

 DELETE. Deleting any row in SECOND destroys the status information about the city as well as the association between supplier and city.

Third Normal Form (3NF):-

Definition: A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive
dependency in that relation.

 Speaking inversely, if a table contains transitive dependency, then it is not in 3NF, and the table must
be split to bring it into 3NF

What is a transitive dependency? Within a relation if we see


A → B [B depends on A]
and
B → C [C depends on B]
then we may derive
A → C [C depends on A]

Table PARTS is already in 3NF. The non-key column, qty, is fully dependent upon the primary key
(s#, p#). SUPPLIER is in 2NF but not in 3NF because it contains a transitive dependency. A transitive dependency occurs when a non-key column that is determined by the primary key is itself the determinant of other columns. The concept of a transitive dependency can be illustrated by showing the functional dependencies in SUPPLIER:

s# → city
city → status
Then,
s# → status

Note that SUPPLIER.status is determined both by the primary key s# and the non-key column city. The
process of transforming a table into 3NF is:



1. Identify any determinants, other than the primary key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it determines.



3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table except for the determinant, which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.

To transform SUPPLIER into 3NF, we create a new table called CITY_STATUS and move the
columns city and status into it. Status is deleted from the original table, city is left behind to serve as a
foreign key to CITY_STATUS, and the original table is renamed to SUPPLIER_CITY to reflect its semantic
meaning. The results are shown in Figure 3 below.

Figure 3: Tables in 3NF

Putting the original table into 3NF has created three tables. These can be represented in "pseudo-SQL" as:

PARTS (s#, p#, qty)
Primary Key (s#, p#)
Foreign Key (s#) references SUPPLIER_CITY.s#

SUPPLIER_CITY(s#, city)
Primary Key (s#)
Foreign Key (city) references CITY_STATUS.city

CITY_STATUS (city, status)


Primary Key (city)

Advantages of Third Normal Form

 The advantage of having relational tables in 3NF is that it eliminates redundant data, which in turn saves space and reduces manipulation anomalies.

For example, the improvements to our sample database are:

 INSERT. Facts about the status of a city (Rome has a status of 50) can be added even though there is no supplier in that city. Likewise, facts about new suppliers can be added even though they have not yet supplied parts.
 DELETE. Information about parts supplied can be deleted without destroying information about a
supplier or a city.
 UPDATE. Changing the location of a supplier or the status of a city requires modifying only one
row.

Boyce-Codd normal form (BCNF)



Definition: - A relation is said to be in BCNF if it is already in 3NF and the left-hand side (determinant) of every dependency is a candidate key.



 A relation which is in 3NF is almost always in BCNF. There can be some situations when a 3NF relation is not in BCNF; this may happen when the following conditions are found to be true.

1. The candidate keys are composite.


2. There are more than one candidate keys in the relation.
3. There are some common attributes in the relation.

Consider, as an example, a relation with attributes Professor code, Department, Head of Department, and Percent time. It is assumed that:

1. A professor can work in more than one department.


2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The relation diagram for the above relation is given as the following:

 The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that
Rao is the Head of Department of Chemistry.

 The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.



Fourth Normal Form (4NF):

 In general, a multivalued dependency occurs when a relation R has attributes A, B, and C such that A
determines a set of values for B, A determines a set of values for C, and B and C are independent of
each other.
Definition: - A relation is said to be in 4NF if it is in BCNF and contains no nontrivial multivalued dependencies.
(Or)
Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs. R is in 4NF if, for every nontrivial MVD X →→ Y that holds over R, X is a superkey.

 Trivial multivalued dependency: an MVD X →→ Y is trivial if Y ⊆ X or X ∪ Y = R.

Now consider another table example involving the attributes Ename, Projname, and Children_name.

The attributes 'Projname' and 'Children_name' are multivalued facts about the attribute 'Ename'.

However, since a project has no influence over a child's name, these multivalued facts are independent of each other. Thus the table contains MVDs. Multivalued facts are represented by →→.

Here, in the above database the following MVDs exist:

Ename →→ Projname

Ename →→ Children_name

Here, projects and children are independent of each other, so all the anomalies occur in the above table.

Solution of above anomalies with Fourth Normal Form

This problem of MVD is handled in Fourth Normal Form. To put it into 4NF, two separate tables are formed as
shown below:



Emp-Proj(ename,projname)

Emp-Children (ename,childrenname)

5th Normal Form (5NF):

Definition: - A relation is said to be in 5NF if and only if every join dependency in relation is implied by the
candidate keys of relation.
Or
Simply we can say: “A table is in fifth normal form (5NF) if it is in 4NF and it cannot be further decomposed, without loss, into any number of smaller tables.”

The dependencies that hold in the relation are

PNO →→ SNO

SNO →→ JNO

PNO →→ JNO

where PNO is the project number, SNO is the supplier number, and JNO is the parts number.

The above table is not in 5NF. To bring it into 5NF, the table is decomposed into three tables: Supplier-Project, Supplier-Parts, and Project-Parts.

When we join all these tables, we get the original relation.



Desirable Properties of Decomposition

Two important properties


Lossless-join decomposition

 Required for semantic correctness


 When we join the decomposed relations, we should get back exact original relation contents
 No spurious tuples

Dependency preservation:

 It is a property of decomposition. The FDs that hold on the relation R should be preserved, directly or indirectly, even after decomposition.
 i.e. we should decompose the relation R in such a way that the FDs that hold on R can be derived from, or hold on, the decomposed relations.

Candidate Keys



 For some entities, there may be more than one field or collection of fields that can serve as the
primary key. These are called candidate keys.

UNIT II: Transaction Processing: Consistency, Atomicity, Isolation and Durability, Serializable Schedule, Recoverable Schedule, Concurrency Control, Time-stamp based protocols, Isolation Levels, Online Analytical Processing, Database Performance Tuning and Query Optimization: Query Tree, Cost of Query, Join, Selection and Projection Implementation Algorithms and Optimization, Database Security: Access Control, MAC, RBAC, Authorization, SQL Injection Attacks

Transaction - Single-User versus Multiuser Systems

What is a Transaction…?
 A transaction, in the context of a database, is a logical unit that is independently executed for data retrieval
or updates.
 A logical unit of work that must be either entirely completed or aborted.
Any action that reads from and/or writes to a database may consist of
o Simple SELECT statement to generate a list of table contents
o A series of related UPDATE statements to change the values of attributes in
various tables
o A series of INSERT statements to add rows to one or more tables
o A combination of SELECT, UPDATE, and INSERT statements
 Successful transaction changes the database from one consistent state to another
 Most real-world database transactions are formed by two or more database requests.
 A transaction has four properties: atomicity, consistency, isolation, and durability (the ACID properties).
Single-User versus Multiuser Systems:-

o One criterion for classifying a database system is according to the number of users who can use the system
concurrently—that is, at the same time
o A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users
can use the system—and hence users can access the database—concurrently.
 Single Users:
o Single-user database supports one user at a time. Single-user can access the database at one point of
time. These types of systems are optimized for a personal desktop experience not for multiple users of
the system at the same time. All the resources are always available for the user to work.
o Example: Stand-alone personal computers, Microsoft Access, etc.
 Multi Users:
o Most other DBMSs are multiuser.
o Example: an airline reservations system is used by hundreds of travel agents and reservation clerks
concurrently.
o Systems in banks, insurance agencies, stock exchanges, supermarkets, and the like are also operated on
by many users who submit transactions concurrently to the system
o Multiple users can access databases—and use computer systems—simultaneously because of the
concept of multiprogramming, which allows the computer to execute multiple programs or processes at
the same time.
o If only a single central processing unit (CPU) exists, it can actually execute at most one process at a time.
o However, multiprogramming operating systems execute some commands from one process, then
suspend that process and execute some commands from the next process, and so on. A process is
resumed at the point where it was suspended whenever it gets its turn to use the CPU again. Hence,
concurrent execution of processes is actually interleaved which shows two processes A and B executing
concurrently in an interleaved fashion.
 Interleaving keeps the CPU busy when a process requires an input or output (I/O) operation, such as
reading a block from disk.
 The CPU is switched to execute another process rather than remaining idle during I/O time.
 Interleaving also prevents a long process from delaying other processes.

Fig: Interleaved processing versus parallel processing of concurrent transactions.


 If the computer system has multiple hardware processors (CPUs), parallel processing of multiple
processes is possible
Transactions, Database Items, Read and Write Operations, and DBMS Buffers

o A transaction is an executing program that forms a logical unit of database processing.


o A transaction includes one or more database access operations—these can include insertion, deletion,
modification, or retrieval operations.
o The database operations that form a transaction can either be embedded within an application program
or they can be specified interactively via a high-level query language such as SQL
o One way of specifying the transaction boundaries is by specifying explicit begin transaction and end
transaction statements in an application program
o A single application program may contain more than one transaction if it contains several transaction
boundaries.
o If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction; otherwise it is known as a read-write transaction.
o A database is basically represented as a collection of named data items.
o Using the simplified database model, the basic database access operations that a transaction can include
are as follows:
 read_item(X). Reads a database item named X into a program variable. To simplify our notation, we
assume that the program variable is also named X.
 write_item(X). Writes the value of program variable X into the database item named X.

Executing a read_item(X) command includes the following steps:


1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main
memory buffer).
3. Copy item X from the buffer to the program variable named X.
Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main
memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
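
A minimal sketch of these steps is given below. The dictionaries standing in for the disk blocks, the buffer pool, and the block addresses are assumptions made purely for illustration.

# Illustrative sketch of read_item(X)/write_item(X) over a tiny in-memory "disk".
disk = {"block_X": {"X": 90}, "block_Y": {"Y": 90}}   # block -> items on "disk"
buffers = {}                                          # blocks cached in main memory
block_of = {"X": "block_X", "Y": "block_Y"}           # item -> address of its block

def read_item(name, program_vars):
    block = block_of[name]                        # 1. find the disk block holding the item
    if block not in buffers:
        buffers[block] = dict(disk[block])        # 2. copy the block into a main-memory buffer
    program_vars[name] = buffers[block][name]     # 3. copy the item into the program variable

def write_item(name, program_vars, flush=True):
    block = block_of[name]                        # 1. find the disk block holding the item
    if block not in buffers:
        buffers[block] = dict(disk[block])        # 2. copy the block into a main-memory buffer
    buffers[block][name] = program_vars[name]     # 3. copy the program variable into the buffer
    if flush:
        disk[block] = dict(buffers[block])        # 4. write the block back (now or later)

vars_T1 = {}
read_item("X", vars_T1)     # X is brought from "disk" into a buffer, then into vars_T1
vars_T1["X"] -= 10
write_item("X", vars_T1)    # the updated block is written back to "disk"
print(disk["block_X"])      # {'X': 80}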
o The DBMS will maintain in the database cache, a number of data buffers in main memory.
o Each buffer typically holds the contents of one database disk block, which contains some of the database
items being processed
o When these buffers are all occupied, and additional database disk blocks must be copied into memory,
some buffer replacement policy is used to choose which of the current buffers is to be replaced. If the
chosen buffer has been modified, it must be written back to disk before it is reused.
o The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set
of all items that the transaction writes.
o Concurrency control and recovery mechanisms are mainly concerned with the database commands in a
Transaction.
o Transactions submitted by the various users may execute concurrently and may access and update the
same database items
o If this concurrent execution is uncontrolled, it may lead to problems, such as an inconsistent database

Transaction States and Additional Operations:-

o A transaction is an atomic unit of work that should either be completed in its entirety or not done at all.
o For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and
commits or aborts

Therefore, the recovery manager of the DBMS needs to keep track of the following operations:

 BEGIN_TRANSACTION : This marks the beginning of transaction execution.


 READ or WRITE : These specify read or write operations on the database items that are executed as part of
a transaction.
 END_TRANSACTION : This specifies that READ and WRITE transaction operations have ended and
marks the end of transaction execution. However, at this point it may be necessary to check whether the
changes introduced by the transaction can be permanently applied to the database (committed) or whether
the transaction has to be aborted because it violates serializability or for some other reason.
 COMMIT_TRANSACTION: This signals a successful end of the transaction so that any changes (updates)
executed by the transaction can be safely committed to the database and will not be undone.
 ROLLBACK (or ABORT): This signals that the transaction has ended unsuccessfully, so that any changes or
effects that the transaction may have applied to the database must be undone.

o A transaction goes into an active state immediately after it starts execution, where it can execute its
READ and WRITE operations
o When the transaction ends, it moves to the partially committed state
o At this point, some recovery protocols need to ensure that a system failure will not result in an inability
to record the changes of the transaction permanently
o Once this check is successful, the transaction is said to have reached its commit point and enters the
committed state.
o When a transaction is committed, it has concluded its execution successfully and all its changes must be
recorded permanently in the database, even if a system failure occurs.
o However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted
during its active state.
o The transaction may then have to be rolled back to undo the effect of its WRITE operations on the
database.
o The terminated state corresponds to the transaction leaving the system.
o The transaction information that is maintained in system tables while the transaction has been running is removed when the transaction terminates.
o Failed or aborted transactions may be restarted later—either automatically or after being resubmitted by the user—as brand new transactions.
The System Log

o To be able to recover from failures that affect transactions, the system maintains a log to keep track of
all transaction operations that affect the values of database items, as well as other transaction
information that may be needed to permit recovery from failures.
o The log is a sequential, append-only file that is kept on disk, so it is not affected by any type of failure
except for disk or catastrophic failure.
o Typically, one (or more) main memory buffers hold the last part of the log file, so that log entries are
first added to the main memory buffer.
o When the log buffer is filled, or when certain other conditions occur, the log buffer is appended to the
end of the log file on disk. In addition, the log file from disk is periodically backed up to archival storage
(tape) to guard against catastrophic failures.

The following are the types of entries—called log records—that are written to the log file and the
corresponding action for each log record. In these entries, T refers to a unique transaction-id that is generated
automatically by the system for each transaction and that is used to identify each transaction:
1. [start_transaction, T]: Indicates that transaction T has started execution.
2. [write_item, T, X, old_value, new_value] : Indicates that transaction T has changed the value of
database item X from old_value to new_value.
3. [read_item, T, X] : Indicates that transaction T has read the value of database item X.
4. [commit, T]: Indicates that transaction T has completed successfully, and affirms that its effect can be
committed (recorded permanently) to the database.
5. [abort, T] : Indicates that transaction T has been aborted.

o It is possible to undo the effect of these WRITE operations of a transaction T by tracing backward
through the log and resetting all items changed by a WRITE operation of T to their old_values.
o Redo of an operation may also be necessary if a transaction has its updates recorded in the log but a
failure occurs before the system can be sure that all these new_values have been written to the actual
database on disk from the main memory buffers.
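
The sketch below illustrates this undo step in a simplified, in-memory form: [write_item, T, X, old_value, new_value] records are appended to a log, and rolling back a transaction scans the log backward and restores the old values. No disk writes or redo are modelled, and the sample values are hypothetical.

# Illustrative sketch of write-ahead log records and undo by backward scan.
log = []
db = {"X": 90, "Y": 90}

def write_item(T, item, new_value):
    log.append(("write_item", T, item, db[item], new_value))  # record old and new values
    db[item] = new_value

def rollback(T):
    # Trace backward through the log, resetting items changed by T to their old values.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] == T:
            _, _, item, old_value, _ = rec
            db[item] = old_value
    log.append(("abort", T))

log.append(("start_transaction", "T1"))
write_item("T1", "X", 85)
write_item("T1", "Y", 95)
rollback("T1")
print(db)   # {'X': 90, 'Y': 90}: T1's effects are undone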
Commit Point of a Transaction

o A transaction T reaches its commit point when all its operations that access the database have been
executed successfully and the effect of all the transaction operations on the database have been recorded
in the log.
o Beyond the commit point, the transaction is said to be committed, and its effect must be permanently
recorded in the database.
o The transaction then writes a commit record [commit, T] into the log. If a system failure occurs, we can
search back in the log for all transactions T that have written a [start_transaction, T] record into the log
but have not written their [commit, T] record yet; these transactions may have to be rolled back to undo
their effect on the database during the recovery process.
o Transactions that have written their commit record in the log must also have recorded all their WRITE
operations in the log, so their effect on the database can be redone from the log records.
o It is common to keep one or more blocks of the log file in main memory buffers, called the log buffer,
until they are filled with log entries and then to write them back to disk only once, rather than writing to
disk every time a log entry is added.
o This saves the overhead of multiple disk writes of the same log file buffer.
o At the time of a system crash, only the log entries that have been written back to disk are considered in
the recovery process because the contents of main memory may be lost.
o Hence, before a transaction reaches its commit point, any portion of the log that has not been written to
the disk yet must now be written to the disk. This process is called force-writing the log buffer before
committing a transaction.
Desirable Properties of Transactions

Transactions should possess several properties, often called the ACID properties; they should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the ACID
properties:
 Atomicity. A transaction is an atomic unit of processing; it
should either be performed in its entirety or not performed at
all.
 Consistency preservation. A transaction should be
consistency preserving, meaning that if it is completely
executed from beginning to end without interference from
other transactions, it should take the database from one
consistent state to another.
 Isolation. A transaction should appear as though it is being
executed in isolation from other transactions, even though many transactions are executing concurrently.
That is, the execution of a transaction should not be interfered with by any other transactions executing
concurrently.
 Durability or permanency. The changes applied to the database by a committed transaction must persist in
the database. These changes must not be lost because of any failure
Serializability

When more than one transaction is executed by the operating system in a multiprogramming environment, there is a possibility that the instructions of one transaction are interleaved with those of another transaction.
Schedule: A schedule (or history) S of transactions (T1, T2, ..., Tn) is an interleaving of the set of actions contained in (T1, T2, ..., Tn) such that the actions of any single transaction appear in their original order.
Ex. If T1 has (a, b, c) operations and T2 has (p, q, r, s) operations then possible schedules are,
S1: (a, b, p, c, q, r, s)
S2: (p, a, q, b, c, r, s)
Serial Schedule: A schedule in which transactions are aligned in such a way that one transaction is executed first; when the first transaction completes its cycle, then the next transaction is executed. Transactions are ordered one after another. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.

Characterizing Schedules Based on Serializability


Suppose that two users—for example, two airline reservations agents—submit to the DBMS transactions T1
and T2 at approximately the same time. If no interleaving of operations is permitted, there are only two possible
outcomes:
1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of transaction
T2 (in sequence).
2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of transaction
T1 (in sequence).
These two schedules—called serial schedules.
If interleaving of operations is allowed, there will be many possible orders in which the system can
execute the individual operations of the transactions.
The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules.

Figure:

Examples of serial and nonserial schedules involving transactions T1 and T2. (a) Serial schedule A: T1 followed
by T2. (b) Serial schedule B: T2 followed by T1. (c) Two non serial schedules C and D with interleaving of
operations.

o Assume that the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2. After
executing transactions T1 and T2, we would expect the database values to be X = 89 and Y = 93,
according to the meaning of the transactions.
o Sure enough, executing either of the serial schedules A or B gives the correct results.
o Now consider the non serial schedules C and D. Schedule C gives the results X = 92 and Y = 93, in which the X value is erroneous, whereas schedule D gives the correct results.

Serial, Non serial, and Conflict-Serializable Schedules


o Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the operations
of T are executed consecutively in the schedule; otherwise, the schedule is called non serial.
o Therefore, in a serial schedule, only one transaction at a time is active—the commit (or abort) of the
active transaction initiates execution of the next transaction.
o No interleaving occurs in a serial schedule.
o As long as every transaction is executed from beginning to end in isolation from the operations of other
transactions, we get a correct end result on the database.
o The problem with serial schedules is that they limit concurrency by prohibiting interleaving of operations.
o In a serial schedule, if a transaction waits for an I/O operation to complete, we cannot switch the CPU
processor to another transaction, thus wasting valuable CPU processing time.
o Additionally, if some transaction T is quite long, the other transactions must wait for T to complete all
its operations before starting.
o Hence, serial schedules are considered unacceptable in practice. However, if we can determine which
other schedules are equivalent to a serial schedule, we can allow these schedules to occur.

The definition of serializable schedule is as follows:

 A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n


transactions.
o Notice that there are n! possible serial schedules of n transactions and many more possible non serial
schedules.
o We can form two disjoint groups of the non serial schedules—those that are equivalent to one (or more)
of the serial schedules and hence are serializable, and those that are not equivalent to any serial schedule
and hence are not serializable.
o Two schedules are called result equivalent if they produce the same final state of the database. However, two different schedules may accidentally produce the same final state.
o Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.
o We define a schedule S to be conflict serializable if it is conflict equivalent to some serial schedule S`.
In such a case, we can reorder the non-conflicting operations in S until we form the equivalent serial
schedule S`

Testing for Conflict Serializability of a Schedule

o There is a simple algorithm for determining whether a particular schedule is conflict serializable or not.
o Most concurrency control methods do not actually test for serializability.
o Rather protocols, or rules, are developed that guarantee that any schedule that follows these rules will be
serializable.
o The algorithm looks at only the read_item and write_item operations in a schedule to construct a
precedence graph (or serialization graph), which is a directed graph G = (N, E) that consists of a set of
nodes N = {T1, T2, ..., Tn } and a set of directed edges E = {e1, e2, ..., em }.
o There is one node in the graph for each transaction Ti in the schedule. Each edge ei in the graph is of the
form (Tj → Tk ), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei and Tk is the ending node of ei .
o Such an edge from node Tj to node Tk is created by the algorithm if one of the operations in Tj appears
in the schedule before some conflicting operation in Tk.
Algorithm. Testing Conflict Serializability of a Schedule S
1) For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
2) For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti
→ Tj ) in the precedence graph.
3) For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge
(Ti → Tj ) in the precedence graph.
4) For each case in S where Tj executes a write_item(X) after Ti executes write_item(X), create an edge (Ti
→ Tj ) in the precedence graph.
5) The schedule S is serializable if and only if the precedence graph has no cycles

o If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S
is serializable
o If there is no cycle in the precedence graph, we can create an equivalent serial schedule S` that is
equivalent to S, by ordering the transactions that participate in S as follows: Whenever an edge exists in
the precedence graph from Ti to Tj , Ti must appear before Tj in the equivalent serial schedule S`.
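
A small sketch of this test follows. A schedule is represented as a list of (transaction, operation, item) triples in execution order; the edge-creation cases correspond to steps 2-4 of the algorithm, and serializability is decided by checking the graph for cycles. The two example schedules are modelled on schedules C and D discussed above.

# Illustrative sketch: conflict-serializability test via a precedence graph.
def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Add Ti -> Tj when a later operation of Tj conflicts with an earlier
            # operation of Ti on the same item (at least one of them is a write).
            if ti != tj and x_i == x_j and (op_i == "w" or op_j == "w"):
                edges.add((ti, tj))

    # The schedule is conflict serializable iff the precedence graph has no cycle.
    nodes = {t for t, _, _ in schedule}
    adj = {n: [b for a, b in edges if a == n] for n in nodes}

    def has_cycle(node, visiting, done):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        cyclic = any(has_cycle(m, visiting, done) for m in adj[node])
        visiting.discard(node)
        done.add(node)
        return cyclic

    return not any(has_cycle(n, set(), set()) for n in nodes)

# Schedule D-style interleaving: serializable, equivalent to T1 followed by T2.
schedule_d = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "X"), ("T2", "w", "X"),
              ("T1", "r", "Y"), ("T1", "w", "Y")]
print(conflict_serializable(schedule_d))   # True

# Schedule C-style interleaving: T1 and T2 conflict on X in both directions.
schedule_c = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X")]
print(conflict_serializable(schedule_c))   # False (cycle T1 -> T2 -> T1)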

Figure Constructing the precedence graphs for


schedules A to D to test for conflict serializability.
(a) Precedence graph for serial schedule A.
(b) Precedence graph for serial schedule B.
(c) Precedence graph for schedule C (not serializable).
(d) Precedence graph for schedule D (serializable,
equivalent to schedule A).

How Serializability Is Used for Concurrency Control

o A serial schedule represents inefficient processing because no interleaving of operations from different
transactions is permitted.
o This can lead to low CPU utilization while a transaction waits for disk I/O, or for another transaction to
terminate, thus slowing down processing considerably.
o A serializable schedule gives the benefits of concurrent execution. In practice, it is quite difficult to test
for the serializability of a schedule.
o The interleaving of operations from concurrent transactions—which are usually executed as processes
by the operating system—is typically determined by the operating system scheduler which allocates
resources to all processes.
o Factors such as system load, time of transaction submission, and priorities of processes contribute to the
ordering of operations in a schedule.
o Hence, it is difficult to determine how the operations of a schedule will be interleaved beforehand to
ensure serializability.

Fig : The read and write operations of three transactions T1, T2, and T3
Another example of serializability testing.
(d) Precedence graph for schedule E.
(e) Precedence graph for schedule F.
(f) Precedence graph with two equivalent serial schedules.
Two-Phase Locking Techniques for Concurrency Control

o A lock is a variable associated with a data item that describes the status of the item with respect to
possible operations that can be applied to it.
o Generally, there is one lock for each data item in the database.
o Locks are used as a means of synchronizing the access by concurrent transactions to the database items.
o Two problems associated with the use of locks deadlock and starvation and show how these problems
are handled in concurrency control protocols.
Types of Locks and System Lock Tables
1. Binary Locks.
2. Shared/Exclusive (or Read/Write) Locks.
3. Conversion of Locks.
Binary Locks

A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A distinct
lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item. If
the value of the lock on X is 0, the item can be accessed when requested, and the lock value is changed
to 1.
Two operations, lock_item and unlock_item, are used with binary locking.
A transaction requests access to an item X by first issuing a lock_item(X) operation.
If LOCK(X) = 1, the transaction is forced to wait.
If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item
X.
When the transaction is through using the item, it issues an unlock_item(X) operation, which sets
LOCK(X) back to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary lock
enforces mutual exclusion on the data item.
A description of the lock_item(X) and unlock_item(X) operations is shown below

It is quite simple to implement a binary lock; all that is needed is a binary-valued variable, LOCK, associated with each data item X in the database.
In its simplest form, each lock can be a record with three fields (the data item name, the LOCK value, and the locking transaction), plus a queue for transactions that are waiting to access the item.
The system needs to maintain only these records for the items that are currently locked in a lock table, which could be organized as a hash file on the item name.

Items not in the lock table are considered to be unlocked.


Figure. Lock and unlock operations for binary locks.

The DBMS has a lock manager subsystem to keep track of and control access to locks.

If the simple binary locking scheme described here is used, every transaction must obey the following rules:

1. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X)
operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X)
operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.
Between the lock_item(X) and unlock_item(X) operations in transaction T, T is said to hold the lock on
item X. At most one transaction can hold the lock on a particular item. Thus no two transactions can access the
same item concurrently.
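
A simplified sketch of the lock_item(X) and unlock_item(X) operations follows. Waiting is modelled by placing the transaction identifier in the item's queue rather than actually blocking the transaction; the data structures are illustrative assumptions.

# Illustrative sketch of binary locking with a waiting queue.
lock_table = {}   # item -> {"LOCK": 0 or 1, "holder": T, "queue": [...]}; absent means unlocked

def lock_item(X, T):
    entry = lock_table.setdefault(X, {"LOCK": 0, "holder": None, "queue": []})
    if entry["LOCK"] == 1:                  # item already locked: T must wait
        entry["queue"].append(T)
        return False
    entry["LOCK"], entry["holder"] = 1, T   # LOCK(X) was 0: set to 1 and grant access
    return True

def unlock_item(X, T):
    entry = lock_table[X]
    assert entry["holder"] == T             # only the lock holder may unlock (rule 4 above)
    if entry["queue"]:                      # pass the lock to one waiting transaction, if any
        entry["holder"] = entry["queue"].pop(0)
    else:
        entry["LOCK"], entry["holder"] = 0, None

print(lock_item("X", "T1"))        # True: T1 holds the lock on X
print(lock_item("X", "T2"))        # False: T2 is placed in the waiting queue
unlock_item("X", "T1")             # the lock on X is passed to T2
print(lock_table["X"]["holder"])   # T2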
Shared/Exclusive (or Read/Write) Locks: The preceding binary locking scheme is too restrictive for
database items because at most, one transaction can hold a lock on a given item. We should allow several
transactions to access the same item X if they all access X for reading purposes only.
This is because read operations on the same item by different transactions are not conflicting. However,
if a transaction is to write an item X, it must have exclusive access to X.
For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme called
shared/exclusive or read/write locks there are three locking operations: read_lock(X), write_lock(X), and
unlock(X).
A lock associated with an item X, LOCK(X), now has three possible states: read-locked, write-locked, or unlocked.
A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.
Each record in the lock table will have four fields (the data item name, the LOCK state, the number of readers, and the locking transaction or transactions). Again, to save space, the system needs to maintain lock records only for locked items in the lock table.
The value (state) of LOCK is either read-locked or write-locked, suitably coded (if we assume no records are kept in the lock table for unlocked items).
Fig: Locking and unlocking operations for two-mode (read/write or shared/exclusive) locks.

If LOCK(X) = write-locked, the value of locking_transaction(s) is a single transaction that holds the exclusive (write) lock on X.
If LOCK(X) = read-locked, the value of locking_transaction(s) is a list of one or more transactions that hold the shared (read) lock on X.
As before, each of the three locking operations should be considered indivisible; no interleaving should
be allowed once one of the operations is started until either the operation terminates by granting the lock or the
transaction is placed in a waiting queue for the item
When we use the shared/exclusive locking scheme, the system must enforce the following rules:
 A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X)
operation is performed in T.
 A transaction T must issue the operation write_lock(X) before any write_item(X) operation is
performed in T.
 A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X)
operations are completed in T.
 A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a
write (exclusive) lock on item X.
 A transaction T will not issue a write_lock(X) operation if it already holds a read (shared) lock or
write (exclusive) lock on item X.
 A transaction T will not issue an unlock(X) operation unless it already holds a read (shared) lock or a
write (exclusive) lock on item X.
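
A simplified, non-blocking sketch of the read_lock(X), write_lock(X), and unlock(X) operations follows: each call returns whether the lock is granted, and no waiting queue or lock upgrading is modelled. The data structures are illustrative assumptions.

# Illustrative sketch of shared/exclusive (read/write) locks.
lock_table = {}   # item -> {"LOCK": "read-locked"/"write-locked", "no_of_reads": n, "txns": [...]}

def read_lock(X, T):
    e = lock_table.get(X)
    if e is None:
        lock_table[X] = {"LOCK": "read-locked", "no_of_reads": 1, "txns": [T]}
        return True
    if e["LOCK"] == "read-locked":          # sharing is allowed among readers
        e["no_of_reads"] += 1
        e["txns"].append(T)
        return True
    return False                            # write-locked by another transaction: must wait

def write_lock(X, T):
    if X not in lock_table:                 # exclusive lock only if no one holds the item
        lock_table[X] = {"LOCK": "write-locked", "no_of_reads": 0, "txns": [T]}
        return True
    return False

def unlock(X, T):
    e = lock_table[X]
    e["txns"].remove(T)
    if e["LOCK"] == "read-locked":
        e["no_of_reads"] -= 1
    if not e["txns"]:                       # last holder released: the item becomes unlocked
        del lock_table[X]

print(read_lock("X", "T1"), read_lock("X", "T2"))   # True True: shared read lock
print(write_lock("X", "T3"))                        # False: readers still hold X
unlock("X", "T1"); unlock("X", "T2")
print(write_lock("X", "T3"))                        # True: X is now free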
Conversion of Locks

o Sometimes it is desirable to relax conditions 4 and 5 in the preceding list in order to allow lock
conversion; that is, a transaction that already holds a lock on item X is allowed under certain conditions
to convert the lock from one locked state to another.
o For example, it is possible for a transaction T to issue a read_lock(X) and then later to upgrade the lock
by issuing a write_lock(X) operation.
o If T is the only transaction holding a read lock on X at the time it issues the write_lock(X) operation, the
lock can be upgraded; otherwise, the transaction must wait.
o It is also possible for a transaction T to issue a write_lock(X) and then later to downgrade the lock by
issuing a read_lock(X) operation.
o When upgrading and downgrading of locks is used, the lock table must include transaction identifiers in
the record structure for each lock (in the locking_transaction(s) field) to store the information on which
transactions hold locks on the item.
o The descriptions of the read_lock(X) and write_lock(X) operations must be changed appropriately to
allow for lock upgrading and downgrading.
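As a continuation of the sketch above (again illustrative; the method names upgrade_lock and downgrade_lock are assumptions), lock conversion could be added to the LockManager as follows:

    def upgrade_lock(self, X, T):
        rec = self.lock_table.get(X)
        # Upgrade is possible only if T is the sole holder of the read lock on X.
        if rec and rec.state == 'read-locked' and rec.locking_transactions == {T}:
            rec.state = 'write-locked'
            rec.no_of_reads = 0
            return True
        return False                             # otherwise T must wait

    def downgrade_lock(self, X, T):
        rec = self.lock_table.get(X)
        # A write lock held by T is downgraded to a shared (read) lock.
        if rec and rec.state == 'write-locked' and T in rec.locking_transactions:
            rec.state = 'read-locked'
            rec.no_of_reads = 1
            return True
        return False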
Concurrency Control Based on Timestamp Ordering
o The use of locks, combined with the 2PL protocol, guarantees serializability of schedules.
o The serializable schedules produced by 2PL have their equivalent serial schedules based on the order in
which executing transactions lock the items they acquire.
o If a transaction needs an item that is already locked, it may be forced to wait until the item is released.
o Some transactions may be aborted and restarted because of the deadlock problem.
o A different approach that guarantees serializability involves using transaction timestamps to order
transaction execution for an equivalent serial schedule.
Timestamps
o A timestamp is a unique identifier created by the DBMS to identify a transaction; the timestamp of transaction T is denoted TS(T).
o Typically, timestamp values are assigned in the order in which the transactions are submitted to the
system, so a timestamp can be thought of as the transaction start time.
o Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlocks cannot
occur.
o Timestamps can be generated in several ways.
o One possibility is to use a counter that is incremented each time its value is assigned to a transaction.
o The transaction timestamps are numbered 1, 2, 3, ... in this scheme.
o A computer counter has a finite maximum value, so the system must periodically reset the counter to
zero when no transactions are executing for some short period of time.
o Another way to implement timestamps is to use the current date/time value of the system clock and
ensure that no two timestamp values are generated during the same tick of the clock.
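A minimal sketch of counter-based timestamp generation (illustrative only; the class name TimestampGenerator is an assumption):

import itertools

class TimestampGenerator:
    def __init__(self):
        self._counter = itertools.count(1)       # produces timestamps 1, 2, 3, ...

    def new_timestamp(self):
        # Each newly submitted transaction T receives the next value as TS(T).
        return next(self._counter)

# A clock-based alternative would return the current system time instead,
# taking care that no two transactions receive the same value.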
The Timestamp Ordering Algorithm
o The idea for this scheme is to order the transactions based on their timestamps.
o A schedule in which the transactions participate is then serializable, and the only equivalent serial
schedule permitted has the transactions in order of their timestamp values.
o This is called timestamp ordering (TO).
o The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the
order in which the item is accessed does not violate the timestamp order.
o To do this, the algorithm associates with each database item X two timestamp (TS) values:
1. read_TS(X). The read timestamp of item X is the largest timestamp among all the timestamps of
transactions that have successfully read item X—that is, read_TS(X) = TS(T), where T is the youngest
transaction that has read X successfully.
2. write_TS(X). The write timestamp of item X is the largest of all the timestamps of transactions that have
successfully written item X—that is, write_TS(X) = TS(T), where T is the youngest transaction that has written
X successfully.
Basic Timestamp Ordering (TO): Whenever some transaction T tries to issue a read_item(X) or a write_item(X)
operation, the basic TO algorithm compares the timestamp of T with read_TS(X) and write_TS(X) to ensure
that the timestamp order of transaction execution is not violated.
If this order is violated, then transaction T is aborted and resubmitted to the system as a new transaction
with a new timestamp.
If T is aborted and rolled back, any transaction T1 that may have used a value written by T must also be
rolled back.
Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so
on. This effect is known as cascading rollback and is one of the problems associated with basic TO, since the
schedules produced are not guaranteed to be recoverable. An additional protocol must be enforced to ensure that
the schedules are recoverable, cascadeless, or strict.
The concurrency control algorithm must check whether conflicting operations violate the timestamp ordering in
the following two cases:
1. Whenever a transaction T issues a write_item(X) operation, the following is checked:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and reject the operation.
This should be done because some younger transaction with a timestamp greater than TS(T)—and hence
after T in the timestamp ordering—has already read or written the value of item X before T had a chance
to write X, thus violating the timestamp ordering.
b. If the condition in part (a) does not occur, then execute the write_item(X) operation of T and set
write_TS(X) to TS(T).
2. Whenever a transaction T issues a read_item(X) operation, the following is checked:
a. If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This should be done
because some younger transaction with timestamp greater than TS(T)—and hence after T in the
timestamp ordering—has already written the value of item X before T had a chance to read X.
b. If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and set read_TS(X) to the
larger of TS(T) and the current read_TS(X).
o Whenever the basic TO algorithm detects two conflicting operations that occur in the incorrect order, it
rejects the later of the two operations by aborting the transaction that issued it.
o The schedules produced by basic TO are hence guaranteed to be conflict serializable, like the 2PL
protocol.
o However, some schedules are possible under each protocol that are not allowed under the other.
o Thus, neither protocol allows all possible serializable schedules.
Strict Timestamp Ordering (TO):
o A variation of basic TO called strict TO ensures that the schedules are both strict (for easy
recoverability) and (conflict) serializable.
o In this variation, a transaction T that issues a read_item(X) or write_item(X) such that TS(T) >
write_TS(X) has its read or write operation delayed until the transaction T′ that wrote the value of X
(hence TS(T′) = write_TS(X)) has committed or aborted.
o To implement this algorithm, it is necessary to simulate the locking of an item X that has been written
by transaction T′ until T′ is either committed or aborted. This algorithm does not cause deadlock, since T
waits for T′ only if TS(T) > TS(T′).
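A sketch of the extra delay imposed by strict TO (illustrative; item is an ItemTS record as in the basic TO sketch above, and writer_active is an assumed flag telling whether the transaction T′ with TS(T′) = write_TS(X) is still active, i.e., has neither committed nor aborted):

def strict_to_access(item, ts_T, writer_active):
    if ts_T > item.write_TS and writer_active:
        return 'wait'                            # delay T until T′ commits or aborts
    return 'apply basic TO checks'               # then proceed exactly as in basic TO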
Thomas’s Write Rule:
A modification of the basic TO algorithm, known as Thomas’s write rule, does not enforce conflict
serializability, but it rejects fewer write operations by modifying the checks for the write_item(X) operation as
follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue processing. This is because
some transaction with timestamp greater than TS(T)—and hence after T in the timestamp ordering—has
already written the value of X. Thus, we must ignore the write_item(X) operation of T because it is already
outdated and obsolete. Notice that any conflict arising from this situation would be detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then execute the write_item(X)
operation of T and set write_TS(X) to TS(T).
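Compared with basic_to_write in the earlier sketch, Thomas's write rule only changes the write check (illustrative; names are assumptions):

def thomas_write(item, ts_T):
    if item.read_TS > ts_T:
        return 'abort'        # rule 1: a younger transaction has already read X
    if item.write_TS > ts_T:
        return 'ignore'       # rule 2: the write is obsolete, so skip it and continue
    item.write_TS = ts_T      # rule 3: otherwise perform the write
    return 'execute'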