Module 6
Transaction
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions performed by a single user to access or
modify the contents of the database.
Operations of a Transaction:
The main operations of a transaction are as follows:
1. Read(X):
Read operation is used to read the value of X from the database and
store it in a buffer in main memory.
Example: Data Query Language
SELECT * FROM student;
2. Write(X):
Write operation is used to write the value back to the database from the
buffer.
Example: Data Manipulation Language
UPDATE student
SET Name = 'Bhavna'
WHERE Sid = 1186;
Let's take the example of a debit transaction on an account, which consists of the following
operations:
R(X);
X = X - 500;
W(X);
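The three steps above can be sketched in Python. This is only an illustration: the `database` and `buffer` dictionaries and the opening balance of 4000 are assumptions made for the example, not part of any real DBMS.

```python
# Hypothetical in-memory stand-ins for the database on disk and the main-memory buffer.
database = {"X": 4000}   # assumed opening balance
buffer = {}

def read(item):
    """R(X): copy the value of the item from the database into the buffer."""
    buffer[item] = database[item]

def write(item):
    """W(X): copy the buffered value of the item back to the database."""
    database[item] = buffer[item]

def debit(item, amount):
    read(item)                    # R(X)
    buffer[item] -= amount        # X = X - amount, done in the buffer only
    in_between = database[item]   # the database still holds the old value here
    write(item)                   # W(X)
    return in_between

old_value_seen = debit("X", 500)
```

Until W(X) runs, the change exists only in the buffer, which is why a failure between the steps leaves the database with the old value.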
1) Atomicity
Atomicity means that a transaction is treated as a single, indivisible unit. Either all of
its operations are executed completely, or none of them are executed at all. A transaction
must never stop partway through or execute only partially.
Example:
In the above diagram, it can be seen that after crediting $10, the amount is still $100 in
account B. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully.
Thus the transaction is atomic.
When a transaction loses atomicity in a bank system, this becomes a huge
issue, and so atomicity is the main focus in bank systems.
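A minimal sketch of atomicity, using Python's built-in sqlite3 module. The `account` table, the balances, and the `transfer` helper are assumptions made for this illustration. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises an exception.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # the debit above is undone
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

ok_small = transfer(conn, "A", "B", 10)   # both debit and credit are applied
ok_large = transfer(conn, "A", "B", 50)   # fails midway, so neither is applied
```

After the failed transfer the balances are the same as before it started, so no partial result is ever visible.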
2) Consistency
Consistency means that the integrity of the data must always be preserved. Every
transaction must take the database from one consistent state to another, so that the
database is consistent both before and after the transaction. The data should always
be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A makes transaction T to
B and then to C, one after the other. Two operations take place, i.e., Debit and Credit.
First, A transfers $50 to account B; the balance of A read by B before the transaction
is $300. After the successful transaction T, the available amount in B becomes $150. Next,
A transfers $20 to account C, and this time the balance of A read by C is $250 (which is
correct, as the debit of $50 to B has already been done). The debit and credit operations
from account A to C are then done successfully. The transaction completes successfully and
the value is read correctly each time, so the data is consistent. If both B and C had read
$300, the data would be inconsistent, because the second read would not reflect the debit
that had already executed.
3) Isolation
The term 'isolation' means separation. In DBMS, isolation is the property that
transactions executing concurrently do not affect one another. The result of running
transactions at the same time should be the same as if they had run one after another.
When two or more transactions occur simultaneously, consistency must be maintained: any
change made by one transaction is not visible to other transactions until that change
is committed.
Example: If two operations are running concurrently on two different accounts, then the
value of neither account should be affected by the other operation. As
you can see in the below diagram, account A is making transactions T1 and T2 to
accounts B and C, but both execute independently without affecting each other. This
is known as isolation.
4) Durability
Durability ensures permanence. In DBMS, durability means that once a transaction has
executed successfully, its changes become permanent in the database. The changes must
survive even if the system fails or crashes. If data is lost in a failure, it is the
responsibility of the recovery manager to restore it and so ensure durability. To make
changes permanent, the COMMIT command must be used every time we make
changes.
Transaction States
A transaction is a unit of database processing which contains a set of operations.
For example, deposit of money, balance enquiry, reservation of tickets etc.
Every transaction starts with delimiters begin transaction and terminates with
end transaction delimiters. The set of operations within these two delimiters
constitute one transaction.
A transaction is divided into states to handle various situations such as failure. It
passes through various states during its lifetime. The state of a transaction is
defined by the current activity it is performing.
Active state
o The active state is the first state of every transaction. In this state, the transaction is being
executed.
o For example: inserting, deleting, or updating a record is done here, but the changes
are not yet saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is
still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.
Committed
o A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If a transaction fails partway through, all the operations it has already executed
are rolled back so that the database returns to its previous consistent state.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
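The states above can be sketched as a small state machine. The `TRANSITIONS` table follows the usual textbook state diagram, and the `Transaction` class is a hypothetical helper written only for this illustration.

```python
# Allowed moves between the transaction states described above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state (a restart begins a new transaction)
}

class Transaction:
    def __init__(self):
        self.state = "active"       # every transaction starts in the active state
        self.history = ["active"]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)

good = Transaction()
good.move_to("partially committed")
good.move_to("committed")

bad = Transaction()
bad.move_to("failed")
bad.move_to("aborted")
```

Any move that is not in the table, such as going from aborted to committed, is rejected with an error.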
TCL Commands
Refer to the theory, syntax, and examples of the TCL commands in the PDF of
Experiment no. 7.
Transactions Schedule
A series of operations from one or more transactions, arranged in the order in which
they execute, is known as a schedule. A schedule preserves the order of the operations
within each individual transaction.
A transaction that successfully completes its execution must commit all the
instructions executed by it at the end of its execution.
Serial Executions/Transactions/Schedule
o The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction.
o In the serial schedule, when the first transaction completes its cycle, then the next
transaction is executed.
o In the given (a) figure, Schedule A shows the serial schedule where T1 is followed
by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 is followed
by T1.
Concurrent Execution/Transactions/Schedule
o If interleaving of operations is allowed, then the schedule is a concurrent (non-serial)
schedule.
o There are many possible orders in which the system can execute the individual
operations of the transactions.
o In the given figures (c) and (d), Schedule C and Schedule D are non-serial schedules.
They have interleaving of operations.
Advantages
The advantages of concurrent execution are as follows:
Waiting time will decrease.
Response time will decrease.
Resource utilization will increase.
System performance and efficiency will increase.
Serializable schedule
o Serializability is used to identify the non-serial schedules that allow
transactions to execute concurrently without interfering with one another.
o It identifies which schedules are correct when the executions of transactions have
interleaved operations.
o The serializability of a non-serial schedule can be verified in two main ways:
1. Conflict Serializability
2. View Serializability
Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into the other by swapping
non-conflicting operations. In the given example, S2 is conflict equivalent to S1 (S1 can be
converted to S2 by swapping non-conflicting operations).
T1: Read(A), Write(A), Read(B), Write(B)
T2: Read(A), Write(A), Read(B), Write(B)
View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.
Example:
Schedule S
Schedule S1
In both schedules S and S1, there is no read other than the initial read, so the updated
read condition does not need to be checked.
The initial read operation in S is done by T1, and in S1 it is also done by T1.
The final write operation in S is done by T3, and in S1 it is also done by T3. So, S and S1
are view equivalent.
The first schedule S1 satisfies all three conditions, so we don't need to check another
schedule.
Explanation:
The precedence graph for schedule S1 contains a cycle, so Schedule S1 is
non-serializable.
Explanation:
The precedence graph for schedule S2 contains no cycle, so Schedule S2 is
serializable.
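The precedence graph test can be sketched as follows. A schedule is assumed to be a list of (transaction, operation, data item) tuples in execution order. This representation, the function names, and the two sample schedules are made up for the illustration and are not the schedules from the figures.

```python
def precedence_graph(schedule):
    """Add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search with three colours to detect a cycle in a directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(graph, WHITE)

    def visit(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[u] == WHITE and visit(u) for u in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# Interleaved, but every conflict goes from T1 to T2, so the graph has no cycle.
interleaved = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
               ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
# T1 reads A, T2 writes A, then T1 writes A: edges in both directions form a cycle.
cyclic = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```

A schedule whose graph has no cycle is conflict equivalent to the serial schedule obtained by ordering the transactions along the edges.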
Concurrency Control
o Concurrency control is the management procedure required for controlling the
concurrent execution of operations on a database.
o The DBMS uses concurrency control to manage a multi-user database.
o In a multi-user system, multiple users can access and use the same database at
one time, which is known as concurrent execution. It means
that the same database is accessed simultaneously by
different users.
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it acquires
an appropriate lock on it.
Locking is necessary in a concurrent environment to ensure that one process does
not retrieve or update a record while another process is updating it.
There are two types of lock:
o Shared Lock
o Exclusive Lock
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read
by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.
o If transaction T1 has obtained a shared lock on data item X, then transaction T1 can only
read data item X, but cannot write on data item X.
o Example
LOCK TABLE customer IN
SHARE MODE;
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive, so multiple transactions cannot modify the same data
simultaneously.
o If transaction T1 has obtained an exclusive lock on data item X, then transaction T1 can
read data item X and can also write on data item X.
o Example
LOCK TABLE customer IN
EXCLUSIVE MODE;
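The compatibility of the two lock types can be sketched with a small lock table. The `LockTable` class is a hypothetical helper for illustration only. It returns False where a real lock manager would make the requester wait, and it does not handle upgrading a shared lock to an exclusive one.

```python
class LockTable:
    """Tracks, for each data item, the lock mode ("S" or "X") and who holds it."""

    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:                          # item is free: grant any lock
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":      # shared locks are compatible
            holders.add(txn)
            return True
        return False                              # any combination with "X" conflicts

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:                           # last holder gone: item is free again
            del self.locks[item]

table = LockTable()
t1_shared = table.acquire("T1", "X", "S")      # granted
t2_shared = table.acquire("T2", "X", "S")      # granted, shared with T1
t3_exclusive = table.acquire("T3", "X", "X")   # refused while readers hold the item
```

Once T1 and T2 release their shared locks, T3 can obtain the exclusive lock, and every other request on the item is then refused until T3 releases it.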
Two-Phase Locking Protocol
A transaction is said to follow the Two-Phase Locking (2PL) protocol if all of its
lock operations (read locks and write locks) come before its first unlock operation.
Strict two-phase locking
The transaction can release a shared lock after the lock point.
The transaction cannot release any exclusive lock until the transaction
commits.
Under basic two-phase locking, if one transaction rolls back, then other
transactions that read its uncommitted data also have to roll back. The
transactions are dependent on each other. This is called a cascading rollback.
Strict two-phase locking avoids it, because no other transaction can read data
written by a transaction that has not yet committed.
Rigorous two-phase locking
The transaction cannot release either kind of lock, i.e., neither a shared lock
nor an exclusive lock, until it commits.
Serializability is guaranteed in a rigorous two-phase locking protocol.
Freedom from deadlock is not guaranteed in a rigorous two-phase locking protocol.
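The two-phase rule (no lock may be requested after the first unlock) can be checked with a short sketch. The operation strings such as "lock_S(A)" and "unlock(A)" are a made-up notation for this illustration.

```python
def is_two_phase(operations):
    """Return True if no lock is acquired after the first unlock (2PL rule)."""
    shrinking = False                 # becomes True at the first unlock
    for op in operations:
        if op.startswith("unlock"):
            shrinking = True
        elif op.startswith("lock") and shrinking:
            return False              # a lock in the shrinking phase violates 2PL
    return True

# Growing phase (all locks) followed by shrinking phase (all unlocks).
two_phase = ["lock_S(A)", "read(A)", "lock_X(B)", "write(B)", "unlock(A)", "unlock(B)"]
# Locks B after already unlocking A, so it is not two-phase.
not_two_phase = ["lock_S(A)", "read(A)", "unlock(A)", "lock_X(B)", "write(B)", "unlock(B)"]
```

The position of the last lock request in the first list is the lock point of that transaction.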
Timestamp Ordering Protocol
Advantages
The TO protocol ensures serializability, since every edge in the precedence graph goes
from a transaction with a smaller timestamp to one with a larger timestamp, so the graph
has no cycle.
The TO protocol ensures freedom from deadlock, because no transaction ever waits.
Disadvantages
The schedule is not necessarily recoverable and may not even be cascade-free.
Thomas Write Rule
Thomas Write Rule guarantees a serializable order for the protocol.
It improves the Basic Timestamp Ordering Algorithm.
The basic Thomas write rules are as follows:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the
operation is rejected.
o If TS(T) < W_TS(X), then the W_item(X) operation of the transaction is not
executed, and processing continues.
o If neither condition 1 nor condition 2 holds, then transaction T is allowed to
execute the WRITE operation, and W_TS(X) is set to TS(T).
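The three rules can be written as one small function. The `Item` class and the returned labels ("abort", "ignore", "write") are names chosen for this sketch; the timestamps in the usage lines are arbitrary.

```python
from dataclasses import dataclass

@dataclass
class Item:
    value: int = 0
    r_ts: int = 0   # R_TS(X): largest timestamp of a transaction that read X
    w_ts: int = 0   # W_TS(X): largest timestamp of a transaction that wrote X

def write_item(item, ts, new_value):
    """Apply the Thomas Write Rule for a write by a transaction with timestamp ts."""
    if ts < item.r_ts:
        return "abort"     # rule 1: a younger transaction already read X
    if ts < item.w_ts:
        return "ignore"    # rule 2: obsolete write, skip it and continue
    item.value = new_value # rule 3: perform the write and update W_TS(X)
    item.w_ts = ts
    return "write"

x = Item(value=10)
first = write_item(x, ts=2, new_value=20)    # T1 (timestamp 2) writes X
second = write_item(x, ts=1, new_value=99)   # T2 (timestamp 1) writes later: obsolete
```

Under the Basic Timestamp Ordering protocol the second call would abort the transaction; here it is skipped and the value written by the later transaction stays in place.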
Outdated Write Example:
The main change in Thomas Write Rule is that obsolete write operations are
ignored. A write is obsolete when some transaction with a timestamp greater
than TS(T) (i.e., a transaction after T in TS ordering) has already written the value
of X. Hence, logically, the Write(X) operation of T can be ignored. Let us see
this through an example:
Suppose a schedule has two transactions T1 and T2, with TS(T2) < TS(T1).
This means T1 arrived after T2 and hence has a larger TS value than T2.
This implies that the serializability order allowed for the schedule is T2 -> T1. Consider
the partial schedule given below:
Obsolete writes are hence ignored under this rule, in accordance with the
2nd rule above. This is more efficient, as it skips the unnecessary
procedure of restarting the entire transaction. This protocol is just a
modification of the Basic TO protocol.
Log-Based Recovery
o The log is a sequence of records. The log of each transaction is maintained in
stable storage so that if any failure occurs, the database can be recovered from it.
o If any operation is performed on the database, then it is recorded in the log.
o The log record must be stored before the actual change is
applied to the database.
o There are several types of log record:
1. Initial log record
It records that the transaction has started.
Log Record: <Tn Start>
Example: <T1 Start>
Transaction T1 is started
2. Update log record
An update log record describes a single database write.
Log Record: <Tn, X, V1, V2>, where X is the data item, V1 is its old value,
and V2 is its new value
Example: <T1, A, 100, 500>
3. Completion log record
It notes that all work has been done for this particular
transaction. (It has been fully committed or aborted)
Example: <T1 Commit> - T1 is committed to server
<T1 Rollback> - T1 aborts its transaction.
4. Checkpoint record
It notes that a checkpoint has been made.
These are used to speed up recovery.
It marks transaction status.
Example: <T1 Checkpoint A> - Transaction T1 is committed
to server.
Checkpoint
o A checkpoint is a mechanism by which all the previous log records are removed from the
system and their effects are permanently stored on the storage disk.
o A checkpoint is like a bookmark. Checkpoints are marked during the execution of
transactions, and as each step of a transaction executes, log records are
created.
o When a checkpoint is reached, the changes made so far are written to the database,
and the log file up to that point is removed. The log file is then
updated with the new steps of the transactions until the next checkpoint, and so on.
o A checkpoint declares a point before which the DBMS was in a consistent
state and all transactions were committed.
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from this failure:
[Figure: timeline of transactions with their START and COMMIT points, ending at FAILURE]
o The recovery system reads the log file from the end to the start, i.e., from T4 to T1.
o The recovery system maintains two lists, a redo-list and an undo-list.
o A transaction is put into the redo-list if its commit record appears in the log. For
example: in the log file, transactions T2 and T3 have <Tn, Start> and <Tn,
Commit>. Transaction T1 has only <Tn, Commit> in the log file, because it
committed after the checkpoint was crossed. Hence T1, T2 and T3 are put
into the redo-list.
o A transaction is put into the undo-list if the recovery system sees a log with <Tn, Start>
but no commit or abort log. All the transactions in the undo-list are undone, and
their logs are removed.
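The construction of the two lists can be sketched as follows. The log is assumed to be a list of (record type, transaction) pairs, which is a simplification of the log records above, and the sample log is made up to match the T1 to T4 example. For simplicity the sketch scans the log forward instead of from the end.

```python
def recovery_lists(log):
    """Return (redo_list, undo_list) for a log of (kind, transaction) records."""
    # Position of the last checkpoint record (-1 if there is none).
    checkpoint = max((i for i, (kind, _) in enumerate(log) if kind == "checkpoint"),
                     default=-1)
    # Transactions that finished before the checkpoint need no recovery work.
    done_before = {t for kind, t in log[:checkpoint + 1] if kind in ("commit", "abort")}
    started = [t for kind, t in log if kind == "start" and t not in done_before]
    ended = {t: kind for kind, t in log[checkpoint + 1:] if kind in ("commit", "abort")}
    redo = [t for t in started if ended.get(t) == "commit"]   # committed: redo
    undo = [t for t in started if t not in ended]             # no commit/abort: undo
    return redo, undo

# Hypothetical log: T1 starts before the checkpoint, T4 is still running at the failure.
log = [("start", "T1"), ("checkpoint", None),
       ("start", "T2"), ("commit", "T1"), ("commit", "T2"),
       ("start", "T3"), ("commit", "T3"),
       ("start", "T4")]
redo_list, undo_list = recovery_lists(log)
```

With this log, T1, T2 and T3 end up in the redo-list and T4 in the undo-list, as in the example above.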
Limitations
Database Storage Checkpoints can only be used to restore from logical errors (for
example, a human error).
Because all the data blocks are on the same physical device, Database Storage
Checkpoints cannot be used to restore files after a media failure.
Shadow Paging
Shadow paging is one of the techniques used to recover a database from failure.
Recovery means getting back information that was lost.
It helps to maintain database consistency in case of failure.
In this recovery technique, the database is considered to be made up of fixed-size
logical units of storage, referred to as pages, which may be stored in any order
on the disk.
To identify the location of any given page, a page table is used.
Pages are mapped into physical blocks of storage with the help of the page
table, which has one entry for each logical page of the database.
This method uses two page tables, named the current page table and the shadow page
table.
The entries in the current page table point to the
most recent database pages on disk. The shadow page
table is created when the transaction starts, by copying the current page
table.
After this, the shadow page table is saved on disk, and the current page table
is used for the transaction.
Entries in the current page table may change during execution,
but the shadow page table never changes. After the transaction commits, both
tables become identical.
This technique is also known as out-of-place updating.
In the figure, two write operations are performed, on pages 3 and 5. Before the start of
the write operation on page 3, the current page table points to the old page 3. When the
write operation starts, the following steps are performed:
o First, a search starts for an available free block among the disk blocks.
o After a free block is found, page 3 is copied to it; the copy is
represented by Page 3 (New).
o The current page table now points to Page 3 (New) on disk, but the shadow page
table still points to the old page 3, because it is not modified.
o The changes are now propagated to Page 3 (New), which is pointed to by
the current page table.
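The steps above can be sketched with two dictionaries acting as page tables. The `ShadowPagedDB` class and its method names are hypothetical, and blocks are never reclaimed in this sketch, so old copies stay on the simulated disk.

```python
class ShadowPagedDB:
    """Toy model: page tables map a logical page number to a disk block number."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))                 # block number -> contents
        self.shadow = {p: p for p in range(len(pages))}    # shadow page table
        self.current = dict(self.shadow)                   # current page table

    def write(self, page, data):
        if self.current[page] == self.shadow[page]:
            # First write to this page: copy it to a free block and repoint
            # the current page table; the shadow table is left untouched.
            free_block = max(self.disk) + 1
            self.disk[free_block] = self.disk[self.shadow[page]]
            self.current[page] = free_block
        self.disk[self.current[page]] = data               # update the new copy only

    def read(self, page):
        return self.disk[self.current[page]]

    def commit(self):
        self.shadow = dict(self.current)   # the new pages become the saved state

    def abort(self):
        self.current = dict(self.shadow)   # fall back to the untouched old pages

db = ShadowPagedDB(["page0", "page1", "page2"])
db.write(1, "page1 (new)")
old_copy = db.disk[db.shadow[1]]   # the shadow table still reaches the old page
db.abort()
after_abort = db.read(1)
```

Because the old pages are never overwritten, recovery is just a matter of going back to the shadow page table, with no undo or redo pass.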
Advantages:
This method requires fewer disk accesses to perform an operation.
In this method, recovery from a crash is inexpensive and quite fast.
There is no need for operations like Undo and Redo.
Disadvantages:
Data Fragmentation
Commit overhead
Garbage Collection
ARIES Algorithm
ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics.
ARIES is a recovery algorithm that is designed to work with a no-force, steal
database approach.
Principles of ARIES Algorithm
1. Write-ahead logging:
Any change to an object is first recorded in the log, and the log
must be written to stable storage before changes to the object are written
to disk.
2. Repeating history during Redo:
On restart after a crash, ARIES retraces the actions of a database
before the crash and brings the system back to the exact state that it was
in before the crash. Then it undoes the transactions still active at crash
time.
3. Logging changes during Undo:
Changes made to the database while undoing transactions are
logged to ensure such an action isn't repeated in the event of repeated
restarts.
Deadlock Prevention
o The deadlock prevention method is suitable for a large database. If resources are
allocated in such a way that deadlock never occurs, then deadlock is
prevented.
o The database management system analyzes the operations of a transaction to
determine whether they can create a deadlock situation. If they can, then the DBMS
never allows that transaction to be executed.
o Deadlock prevention using timestamps:
1. Wait-Die scheme
2. Wound-Wait scheme
Wait-Die scheme
In this scheme, if a transaction requests a resource which is already held with a
conflicting lock by another transaction, then the DBMS simply checks the timestamps of
both transactions. It allows the older transaction to wait until the resource is available for
execution.
Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any
transaction T. If Tj holds a lock on a resource and Ti requests that
resource, then the following actions are performed by the DBMS:
1. Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj holds the resource, then
Ti is allowed to wait until the data-item is available for execution. That means if the older
transaction is waiting for a resource which is locked by the younger transaction, then the
older transaction is allowed to wait for the resource until it is available.
2. Check if TS(Ti) > TS(Tj) - If Ti is the younger transaction and requests a resource held
by the older transaction Tj, then Ti is killed and restarted later with a random delay but
with the same timestamp.
Wound-Wait scheme
o In the wound-wait scheme, if the older transaction requests a resource which is held by the
younger transaction, then the older transaction forces the younger one to abort and
release the resource. After a short delay, the younger transaction is restarted, but with
the same timestamp.
o If the older transaction holds a resource which is requested by the younger transaction,
then the younger transaction is asked to wait until the older one releases it.
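Both schemes reduce to a comparison of two timestamps, where a smaller timestamp means an older transaction. The function names and return labels below are chosen for this sketch only.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits; a younger requester dies (is rolled back)."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "wound" if ts_requester < ts_holder else "wait"

# Timestamps 1 and 2 are arbitrary; 1 is the older transaction.
older_requests_wait_die = wait_die(1, 2)        # older waits for younger
younger_requests_wait_die = wait_die(2, 1)      # younger is killed and restarted
older_requests_wound_wait = wound_wait(1, 2)    # younger holder is aborted
younger_requests_wound_wait = wound_wait(2, 1)  # younger waits for older
```

In both schemes the transaction that is rolled back is always the younger one, and it keeps its original timestamp when restarted, so it eventually becomes the oldest and cannot starve.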
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should
detect whether the transaction is involved in a deadlock or not. The lock manager
maintains a wait-for graph to detect deadlock cycles in the database.
Wait-for Graph
o This is a suitable method for deadlock detection. In this method, a graph is created based
on the transactions and their locks. If the created graph has a cycle or closed loop, then there
is a deadlock.
o The wait-for graph is maintained by the system for every transaction which is waiting
for some data held by others. The system keeps checking the graph for
cycles.
The wait-for graph for the above scenario is shown below:
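Cycle detection on a wait-for graph can be sketched as follows. The graph is assumed to be a dictionary mapping each transaction to the set of transactions it is waiting for; the function name and the sample graphs are hypothetical and do not reproduce the scenario in the figure.

```python
def find_deadlock(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    visited = set()

    def dfs(node, path):
        if node in path:                      # came back to a node on the current path
            return path[path.index(node):]    # that part of the path is the cycle
        if node in visited:                   # already explored without finding a cycle
            return None
        visited.add(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for txn in wait_for:
        cycle = dfs(txn, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 waits for T3, and T3 waits for T1: a closed loop.
deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
# T1 waits for T2, but T2 waits for nobody: no deadlock.
no_deadlock = {"T1": {"T2"}, "T2": set()}

cycle = find_deadlock(deadlocked)
```

Returning the transactions on the cycle, rather than only a yes or no answer, gives the recovery step a set of candidates to roll back.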
Deadlock Recovery
When a detection algorithm detects that there is a deadlock, the system
must be able to recover from the deadlock.
The most common solution to recover from the dead-lock situation is
to roll back one or more transactions and break the deadlock.
Here, three actions are needed to be taken:
(a) Selection of a Victim:
Given a set of deadlocked transactions, one must determine which
transaction has to be rolled back to break the deadlock. The transaction
whose rollback incurs the minimum cost should be chosen. Many
other factors also play a role in deciding which transactions need to be
rolled back.
(b) Rollback:
Once it is decided that a particular transaction must be rolled back, one must
determine how far this transaction should be rolled back.
The simplest solution is ‘Total Rollback’. That is, we abort the transaction
and then restart it.
Sometimes, it is possible to do ‘partial rollback’, which retains the
consistency of the database. ‘Partial Rollback’ requires the system to
maintain additional information about the state of all the running
transactions.
(c) Starvation:
In a system where the selection of a victim is based primarily on cost
factors, it is possible that the same transaction is always picked as the
‘victim’. As a result, that transaction never completes its
designated task, i.e., there is ‘starvation’. So, one should ensure
that a transaction can be selected as the victim only a finite number of
times. Then ‘starvation’ does not occur.
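Victim selection with a cap on rollbacks can be sketched as below. The cost values, the `max_rollbacks` limit of 3, and the function name are all assumptions made for the illustration; a real system would derive the cost from factors such as the work already done by each transaction.

```python
def choose_victim(deadlocked, cost, rollbacks, max_rollbacks=3):
    """Pick the cheapest transaction that has not been rolled back too often."""
    eligible = [t for t in deadlocked if rollbacks.get(t, 0) < max_rollbacks]
    candidates = eligible or list(deadlocked)    # fall back if everyone hit the cap
    victim = min(candidates, key=lambda t: cost[t])
    rollbacks[victim] = rollbacks.get(victim, 0) + 1
    return victim

cost = {"T1": 5, "T2": 1}    # hypothetical rollback costs; T2 is always the cheapest
rollbacks = {}               # how many times each transaction has been the victim
picks = [choose_victim(["T1", "T2"], cost, rollbacks) for _ in range(4)]
```

T2 is chosen for the first three deadlocks; on the fourth it has reached the cap, so T1 is chosen instead and T2 gets a chance to finish.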