TRANSACTION PROCESSING
Transaction
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account.
This small transaction consists of several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Read(X): The read operation is used to read the value of X from the database and store it in a
buffer in main memory.
Write(X): The write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations (a code sketch of this appears below):
1. R(X);
2. X = X - 500;
3. W(X);
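The following is a minimal Python sketch of such a debit transaction; the in-memory database dictionary, the buffer variable, and the Rs 500 amount are illustrative assumptions, not part of any specific DBMS API.

```python
# Illustrative sketch of a debit transaction: read into a buffer, modify, write back.
# The 'database' dict and the item name "X" are assumptions for demonstration only.

database = {"X": 2000}          # persistent store (simplified as a dict)

def read_item(name):
    """R(X): copy the value from the database into a main-memory buffer."""
    return database[name]

def write_item(name, value):
    """W(X): write the buffered value back to the database."""
    database[name] = value

# The debit transaction: R(X); X = X - 500; W(X);
buffer_x = read_item("X")       # R(X)
buffer_x = buffer_x - 500       # X = X - 500
write_item("X", buffer_x)       # W(X)

print(database["X"])            # 1500
```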
A transaction in DBMS is a set of logically related operations executed as a single unit. These
operations are performed in a way that maintains the integrity and consistency of the data, so that
concurrent actions from different users do not corrupt the database. The transfer of money from one
account to another in a bank management system is the classic example of a transaction.
A transaction goes through several states during its lifetime. A transaction state refers to the
current phase or condition of the transaction during its execution in the database; it indicates the
transaction's progress and determines whether it will successfully complete (commit) or fail (abort).
A transaction log is used to keep track of this progress.
Read Operation: Reads data from the database, stores it temporarily in memory (buffer), and
uses it as needed.
Write Operation: Updates the database with the changed data using the buffer.
From the start of executing instructions to the end, these operations are treated as a single
transaction. This ensures the database remains consistent and reliable throughout the process.
Different Types of Transaction States in DBMS
1. Active State – This is the first stage of any transaction, when its instructions are being
executed.
Operations such as insertion, deletion, or updation are performed during this state.
During this state, the data records being manipulated are not yet saved to the database; they
remain in a buffer in main memory.
2. Partially Committed – The transaction has finished its final operation, but the changes are not
yet saved to the database.
After completing all read and write operations, the modifications are initially stored in main
memory or a local buffer. If the changes are made permanent in the database, the state changes to
the "committed" state; in case of failure, it moves to the "failed" state.
3. Failed State – If any of the transaction-related operations causes an error during the active or
partially committed state, further execution of the transaction is stopped and it is brought into
a failed state. Here, the database recovery system makes sure that the database is in a
consistent state.
4. Aborted State – If a transaction reaches the failed state due to some failure, the database
recovery system will attempt to restore the database to a consistent state. If recovery is not
possible, the transaction is rolled back (aborted) to ensure the database remains consistent.
In the aborted state, the DBMS recovery system performs one of two actions:
Kill the transaction: The system terminates the transaction to prevent it from affecting other
operations.
Restart the transaction: After making necessary adjustments, the system reverts the
transaction to an active state and attempts to continue its execution.
5. Committed – This state is reached when all the transaction-related operations have been
executed successfully along with the Commit operation, i.e., the data is saved into the database
after the required manipulations. This marks the successful completion of a transaction.
6. Terminated State – If there is no rollback, or the transaction comes from the "committed"
state, then the system is consistent and ready for a new transaction, and the old transaction is
terminated.
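As a rough illustration of these states and the transitions between them, the following Python sketch models the lifecycle described above; the enum names and the transition table are assumptions made for illustration, not part of any DBMS.

```python
# Illustrative model of the transaction lifecycle described above.
# The state names follow the text; the transition table is an assumption.
from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    TERMINATED = auto()

# Allowed transitions between states.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: {TxState.TERMINATED, TxState.ACTIVE},  # kill or restart
    TxState.COMMITTED: {TxState.TERMINATED},
    TxState.TERMINATED: set(),
}

def move(current, target):
    """Change state only if the transition is allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

# Example: a successful transaction.
state = TxState.ACTIVE
state = move(state, TxState.PARTIALLY_COMMITTED)
state = move(state, TxState.COMMITTED)
state = move(state, TxState.TERMINATED)
print(state.name)   # TERMINATED
```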
ACID Properties in DBMS
A transaction is a single logical unit of work that interacts with the database,
potentially modifying its content through read and write operations. To maintain
database consistency both before and after a transaction, specific properties, known as
the ACID properties, must be followed. The acronym ACID stands for Atomicity,
Consistency, Isolation, and Durability; these properties are essential for ensuring data
consistency, integrity, and reliability during database transactions.
Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t
happen at all. There is no midway i.e. transactions do not occur partially. Each
transaction is considered as one unit and either runs to completion or is not executed
at all. It involves the following two operations.
— Abort : If a transaction aborts, changes made to the database are not visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
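The following Python sketch illustrates this all-or-nothing behaviour for a transfer between two accounts; the account dictionary, the balances, and the helper names are illustrative assumptions.

```python
# All-or-nothing transfer: either both the debit and the credit are applied,
# or neither is. Account names and balances are assumptions for illustration.

accounts = {"X": 500, "Y": 200}

def transfer(src, dst, amount):
    snapshot = dict(accounts)          # remember the state before the transaction
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount        # debit
        accounts[dst] += amount        # credit
        # commit point: changes become visible
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # abort: roll back to the old state
        raise

transfer("X", "Y", 100)
print(accounts)                        # {'X': 400, 'Y': 300}
```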
Consistency:
Consistency ensures that a database remains in a valid state before and after a
transaction. It guarantees that any transaction will take the database from one
consistent state to another, maintaining the rules and constraints defined for the data.
For example, consider a transaction T that transfers Rs 100 from account X (balance 500)
to account Y (balance 200). The total amount before and after the transaction must be
maintained:
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency would occur if the debit on X
completed but the credit on Y failed.
Isolation:
This property ensures that multiple transactions can occur concurrently without
leading to inconsistency of the database state. Transactions occur independently,
without interference. Changes made by a particular transaction are not visible to any
other transaction until that change has been committed. This property ensures that when
multiple transactions run at the same time, the result is the same as if they had been
run one after another in some serial order.
Let X = 500 and Y = 500, and consider two transactions T and T''. T updates both items
(for example, X = X * 100 and Y = Y - 50), while T'' simply reads both items and
computes the sum X + Y.
Suppose T has executed up to its read of Y (it has already written X = 50,000) when T''
starts. Because of this interleaving, T'' reads the correct (new) value of X but the old
value of Y, so the sum it computes,
T'': X + Y = 50,000 + 500 = 50,500,
is not consistent with the sum at the end of the transaction,
T: X + Y = 50,000 + 450 = 50,450.
This results in a database inconsistency: the two views differ by 50 units. Hence,
transactions must take place in isolation, and changes should become visible to other
transactions only after they have been committed.
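A minimal Python sketch of this interleaving is shown below; the transaction steps (multiply X by 100, subtract 50 from Y) follow the example above, while the generator-based scheduling and the variable names are purely illustrative assumptions.

```python
# Reproduces the interleaving from the example above: T'' reads X after T has
# written it, but reads Y before T has updated it. Scheduling is simulated
# with a generator; this is an illustration, not a real scheduler.

db = {"X": 500, "Y": 500}

def T():
    db["X"] = db["X"] * 100       # write the new X (50,000)
    yield                          # T is interrupted here, before updating Y
    db["Y"] = db["Y"] - 50         # write the new Y (450)

def T2():
    return db["X"] + db["Y"]       # the sum T'' currently sees

t = T()
next(t)                            # run T up to the yield: X updated, Y not yet
dirty_sum = T2()                   # T'' runs in the middle of T
try:
    next(t)                        # finish T
except StopIteration:
    pass
final_sum = db["X"] + db["Y"]

print(dirty_sum)   # 50500 (inconsistent view)
print(final_sum)   # 50450 (value after T completes)
```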
Durability:
This property ensures that once the transaction has completed execution, the updates
and modifications to the database are stored in and written to disk and they persist
even if a system failure occurs. These updates now become permanent and are stored
in non-volatile memory. The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and
consistency of a database in a way such that each transaction is a group of operations
that acts as a single unit, produces consistent results, acts in isolation from other
operations, and updates that it makes are durably stored.
Concurrency Control
The main goal of concurrency control is to ensure that simultaneous transactions do not lead
to data conflicts or violate the consistency of the database. The concept of serializability is
often used to achieve this goal.
Advantages of Concurrency
In general, concurrency means that more than one transaction can work on a system.
The advantages of a concurrent system are:
Waiting Time: The time a process spends in the ready state before it gets the system resources it
needs to execute. Concurrency leads to less waiting time.
Response Time: The time taken to get the first response from the system after a request is
submitted. Concurrency leads to a lower response time.
Resource Utilization: The proportion of system resources (such as CPU, memory, and disk) that is
actually in use. Since multiple transactions can run in parallel, concurrency leads to higher
resource utilization.
Efficiency: The amount of output produced in comparison to the given input. Concurrency leads to
greater efficiency.
Disadvantages of Concurrency
Deadlocks: Deadlocks can occur when two or more transactions are waiting for each other to
release resources, causing a circular dependency that can prevent any of the transactions from
completing. Deadlocks can be difficult to detect and resolve, and can result in reduced
throughput and increased latency.
Reduced concurrency: Concurrency control can limit the number of users or applications
that can access the database simultaneously. This can lead to reduced concurrency and slower
performance in systems with high levels of concurrency.
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it acquires an
appropriate lock on it.
What is a Lock?
A lock is a variable associated with each data item that represents the state of the item with
respect to the operations that can be performed on it. Its value is used in a locking scheme to
control concurrent access to, and manipulation of, the associated data item. Locking a data item
that is being used by a transaction prevents other transactions running simultaneously from using
that data item. This is one of the most commonly used techniques for ensuring serializability;
changing the value of a lock is called locking.
Types of Locks
Various types of locks are used to control concurrency. Depending on the type of lock, the
lock manager grants or denies other operations access to the same data item. We first discuss
binary locks, which are simple but of limited practical use, and then shared and exclusive locks
(also called read/write locks), which have greater locking capability and wide practical use.
Binary Locks
A binary lock can have two states:
o Locked
o Unlocked
These are represented by 1 and 0, respectively. Each data item has a separate lock associated
with it. If the data item is locked, it cannot be accessed by database operations that request
it; if it is unlocked, it can be accessed when requested.
The following two operations are associated with binary locking of a data item A:
o Lock (A)
o Unlock (A)
A transaction requests access to a data item by first locking it with the Lock() operation. If
another operation of a concurrent transaction tries to access the same data item while it is
locked, that operation is forced to wait until the transaction holding the lock releases it with
the Unlock() operation. When the transaction that has locked a data item completes all its
operations on that item, it unlocks the item so that other transactions can use it.
The following rules must be followed whenever a binary locking scheme is used:
o Lock (): This operation must be issued by the transaction before any read or write
operation is performed on the data item.
o Unlock (): This operation must be issued by the transaction after all its read and write
operations on the data item have completed.
o A Lock () operation cannot be issued by a transaction if it already holds a lock on the
data item.
o An Unlock () operation cannot be issued by a transaction unless it already holds a
lock on the data item.
Consider two transactions T1 and T2 that both update an account balance, by Rs 200 and Rs 300
respectively, and consider the possible schedules when these transactions are made to run
concurrently (a locking sketch of this scenario appears below).
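The following Python sketch shows how binary locking keeps T1 and T2 from interleaving their read-modify-write sequences; the use of threading.Lock, the starting balance, and the function names are assumptions made for illustration.

```python
# Two transactions update the same account balance concurrently. The binary
# lock ensures each read-modify-write runs as a unit. The starting balance and
# the amounts (Rs 200 and Rs 300) follow the scenario in the text.
import threading

balance = 1000                      # assumed starting balance
lock = threading.Lock()             # binary lock on the data item "balance"

def update(amount):
    global balance
    lock.acquire()                  # Lock(balance)
    try:
        old = balance               # R(balance)
        balance = old + amount      # compute and write the new value
    finally:
        lock.release()              # Unlock(balance)

t1 = threading.Thread(target=update, args=(200,))   # T1
t2 = threading.Thread(target=update, args=(300,))   # T2
t1.start(); t2.start()
t1.join(); t2.join()

print(balance)                      # always 1500: no lost update
```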
Handling Deadlocks
Deadlock is a situation where a process or a set of processes is blocked, waiting for some
other resource that is held by some other waiting process. It is an undesirable state of the
system. In other words, Deadlock is a critical situation in computing where a process, or a
group of processes, becomes unable to proceed because each is waiting for a resource that is
held by another process in the same group. This scenario leads to a complete standstill,
rendering the affected processes inactive and the system inefficient.
Necessary Condition for a Deadlock
The following are the four conditions that must hold simultaneously for a deadlock to
occur.
Mutual Exclusion: A resource can be used by only one process at a time. If another process
requests for that resource, then the requesting process must be delayed until the resource has
been released.
Hold and wait: Some processes must be holding some resources in the non-shareable mode
and at the same time must be waiting to acquire some more resources, which are currently
held by other processes in the non-shareable mode.
No pre-emption: Resources granted to a process can be released back to the system only as a
result of voluntary action of that process after the process has completed its task.
Circular wait: Deadlocked processes are involved in a circular chain such that each process
holds one or more resources being requested by the next process in the chain.
1. Deadlock Prevention
The strategy of deadlock prevention is to design the system in such a way that the possibility
of deadlock is excluded. The indirect methods prevent the occurrence of one of the three
necessary conditions of deadlock, i.e., mutual exclusion, no pre-emption, and hold and wait;
the direct method prevents the occurrence of circular wait.
Mutual exclusion: This condition is generally supported by the OS for non-shareable resources
and cannot simply be denied.
Hold and wait: This condition can be prevented by requiring that a process requests all of its
required resources at one time and blocking the process until all of its requests can be granted
simultaneously. This prevention does not yield good results, however, because a process may hold
resources for a long time before actually using them, which lowers resource utilization.
No pre-emption: If a process that is holding some resources requests another resource that cannot
be immediately allocated to it, all resources currently being held are released and, if necessary,
requested again together with the additional resource. Alternatively, if a process requests a
resource that is currently held by another process, the OS may pre-empt the second process and
require it to release its resources. This works only if the two processes do not have the same
priority.
Circular wait: One way to ensure that this condition never holds is to impose a total ordering
of all resource types and to require that each process requests resources in increasing order of
enumeration, i.e., if a process has been allocated resources of type R, then it may subsequently
request only resources of types that follow R in the ordering (a lock-ordering sketch of this
approach appears below).
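The following Python sketch illustrates the resource-ordering idea for preventing circular wait: every process acquires locks in one global order, so a cycle of waits cannot form. The resource names, the ordering, and the threading setup are illustrative assumptions.

```python
# Circular-wait prevention by lock ordering: all processes acquire the locks
# they need in the same global order, so no cycle of waits can form.
# Resource names and the ordering itself are assumptions for illustration.
import threading

LOCKS = {"R1": threading.Lock(), "R2": threading.Lock(), "R3": threading.Lock()}
ORDER = ["R1", "R2", "R3"]          # the imposed total ordering of resources

def acquire_in_order(names):
    """Acquire the requested locks in the global order, never out of order."""
    for name in sorted(names, key=ORDER.index):
        LOCKS[name].acquire()

def release_all(names):
    for name in names:
        LOCKS[name].release()

def worker(needed):
    acquire_in_order(needed)        # e.g. always R1 before R3
    try:
        pass                        # ... use the resources ...
    finally:
        release_all(needed)

# Even though the two workers want overlapping resources, ordering prevents deadlock.
a = threading.Thread(target=worker, args=(["R1", "R3"],))
b = threading.Thread(target=worker, args=(["R3", "R1"],))
a.start(); b.start(); a.join(); b.join()
print("finished without deadlock")
```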
2. Deadlock Avoidance
A deadlock avoidance algorithm works by proactively looking for potential deadlock situations
before they occur. It does this by tracking the resource usage of each process and identifying
conflicts that could potentially lead to a deadlock. If a potential deadlock is identified, the
algorithm takes steps to resolve the conflict, such as rolling back one of the processes or
pre-emptively allocating resources to other processes. Deadlock avoidance is designed to minimize
the chances of a deadlock occurring, although it cannot guarantee that a deadlock will never
occur. This approach allows the three necessary conditions of deadlock but makes judicious
choices to ensure that the deadlock point is never reached, and it allows more concurrency than
prevention. A decision is made dynamically whether the current resource-allocation request will,
if granted, potentially lead to deadlock; this requires knowledge of future process requests.
Two techniques are commonly used to avoid deadlock: denying process initiation and denying
resource allocation (the Banker's Algorithm, discussed below).
Banker’s Algorithm
The Banker’s Algorithm is based on the concept of resource allocation graphs. A resource
allocation graph is a directed graph where each node represents a process, and each edge
represents a resource. The state of the system is represented by the current allocation of
resources between processes. For example, if the system has three processes, each of which is
using two resources, processes A, B, and C would be the nodes of the resource allocation graph,
and the resources they hold or request would be the edges connecting them. The Banker's Algorithm
works by analyzing the state of the system and determining whether it is in a safe state or at
risk of entering a deadlock.
To determine whether a system is in a safe state, the Banker's Algorithm uses an available
vector, which records the amount of each resource currently free, together with matrices
recording what each process has been allocated and what it still needs (the need matrix).
The algorithm then checks whether all processes can be completed without overloading the
system. It does this by looking for a process whose remaining need can be satisfied from the
available resources; that process is assumed to run to completion, and its allocated resources
are added back to the available pool. If every process can finish in some such order, the state
is safe and the request is allowed to proceed; otherwise, the request is blocked until more
resources become available.
The Banker's Algorithm is a powerful tool for resource-allocation problems, but it is not
foolproof. Its guarantees depend on processes declaring their resource needs accurately:
processes that request more resources than they actually need, or that consume resources in an
unpredictable manner, can reduce its effectiveness. To prevent these types of problems, it is
important to carefully monitor the system to ensure that it remains in a safe state.
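A minimal Python sketch of the safety check just described is shown below; the matrix layout, variable names, and example values are assumptions for illustration, not taken from the text above.

```python
# A minimal sketch of the Banker's Algorithm safety check (illustrative only).

def is_safe(available, allocation, need):
    """Return True if the system is in a safe state."""
    work = available[:]                      # resources currently free
    finish = [False] * len(allocation)       # which processes can complete
    progress = True
    while progress:
        progress = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # A process can finish if its remaining need fits in 'work'.
            if not finish[i] and all(n <= w for n, w in zip(nd, work)):
                # Assume it runs to completion and returns its allocation.
                work = [w + a for w, a in zip(work, alloc)]
                finish[i] = True
                progress = True
    return all(finish)

# Example (hypothetical values): 3 processes, 2 resource types.
available = [3, 2]
allocation = [[1, 0], [2, 1], [0, 1]]
need       = [[2, 2], [1, 1], [3, 1]]
print(is_safe(available, allocation, need))   # True if a safe ordering exists
```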
3. Deadlock Detection
Deadlock detection employs an algorithm that tracks circular waiting and kills one or more
processes so that the deadlock is removed. The system state is examined periodically to
determine whether a set of processes is deadlocked. A deadlock is resolved by aborting and
restarting a process, relinquishing all the resources that the process held.
• This technique does not limit resource access or restrict process action.
• Requested resources are granted to processes whenever possible.
• It never delays the process initiation and facilitates online handling.
• The disadvantage is the inherent pre-emption losses.
4. Deadlock Ignorance
In the deadlock ignorance method, the OS acts as if deadlock never occurs and completely
ignores it even when it does. This method is appropriate only if deadlocks occur very rarely.
The algorithm is very simple: "if a deadlock occurs, simply reboot the system and act as if the
deadlock never occurred." This is why it is called the Ostrich Algorithm.
Disadvantages
• Ostrich Algorithm does not provide any information about the deadlock situation.
• It can lead to reduced performance of the system as the system may be blocked for a
long time.
Timestamp Ordering Protocol
The Timestamp Ordering Protocol is a method used in database systems to order transactions
based on their timestamps. A timestamp is a unique identifier assigned to each transaction,
typically obtained from the system clock or a logical counter. Transactions are executed in
ascending order of their timestamps, so older transactions get higher priority.
For example:
If transaction T1 enters the system first, it gets a timestamp TS(T1) = 007 (assumption), and a
transaction T2 that enters later gets TS(T2) = 009. This means T1 is "older" than T2, and T1
should execute before T2 to maintain consistency.
Transaction Priority:
Older transactions (those with smaller timestamps) are given higher priority.
For example, since transaction T1 has a timestamp of 007 and transaction T2 has a timestamp of
009, T1 executes first because it entered the system earlier.
Ensuring Serializability:
The protocol ensures that the schedule of transactions is serializable. This means the
transactions can be executed in an order that is logically equivalent to their timestamp order.
The protocol manages concurrent execution such that the timestamps determine the
serializability order.
The timestamp ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order.
Whenever a transaction T issues a W_item(X) operation:
If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation.
Otherwise, execute the W_item(X) operation of T and set W_TS(X) to the larger of TS(T) and the
current W_TS(X).
Whenever a transaction T issues an R_item(X) operation:
If W_TS(X) > TS(T), then abort and roll back T and reject the operation.
Otherwise (W_TS(X) <= TS(T)), execute the R_item(X) operation of T and set R_TS(X) to the
larger of TS(T) and the current R_TS(X).
Whenever the Basic TO algorithm detects two conflicting operations that occur in an
incorrect order, it rejects the latter of the two operations by aborting the Transaction that
issued it.
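The following Python sketch mirrors the checks just described; the class and function names are illustrative assumptions, not a standard API.

```python
# A minimal sketch of the Basic Timestamp Ordering checks described above.

class TimestampOrderingError(Exception):
    """Raised when a conflicting operation arrives out of timestamp order."""

class DataItem:
    def __init__(self):
        self.read_ts = 0    # R_TS(X): largest timestamp that has read X
        self.write_ts = 0   # W_TS(X): largest timestamp that has written X
        self.value = None

def read_item(item, ts):
    # Reject the read if a younger transaction has already written X.
    if item.write_ts > ts:
        raise TimestampOrderingError("abort: read arrives after a younger write")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write_item(item, ts, value):
    # Reject the write if a younger transaction has already read or written X.
    if item.read_ts > ts or item.write_ts > ts:
        raise TimestampOrderingError("abort: write arrives after a younger access")
    item.value = value
    item.write_ts = max(item.write_ts, ts)
```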
Advantages
• Conflict Serializable: Ensures all conflicting operations follow the timestamp order.
Disadvantages
• Starvation of Newer Transactions: Older transactions are prioritized, which can delay or
starve newer transactions.
• High Overhead: Maintaining and updating timestamps for every data item adds significant
system overhead.
• Inefficient for High Concurrency: The strict ordering can reduce throughput in systems with
many concurrent transactions.
The Strict Timestamp Ordering Protocol is an enhanced version of the Basic Timestamp
Ordering Protocol. It ensures a stricter control over the execution of transactions to avoid
cascading rollbacks and maintain a more consistent schedule.
Database Recovery Techniques
Database systems, like any other computer system, are subject to failures, but the data stored
in them must be available as and when required. When a database fails, it must possess
facilities for fast recovery. It must also provide atomicity, i.e., either a transaction
completes successfully and is committed (its effect is recorded permanently in the database) or
the transaction has no effect on the database at all.
Database recovery techniques are used in database management systems (DBMS) to restore a
database to a consistent state after a failure or error has occurred. The main goal of recovery
techniques is to ensure data integrity and consistency and to prevent data loss.
The rollback/undo recovery technique is based on the principle of backing out or undoing the
effects of a transaction that has not been completed successfully due to a system failure or
error. This technique is accomplished by undoing the changes made by the transaction using
the log records stored in the transaction log. The transaction log contains a record of all the
transactions that have been performed on the database. The system uses the log records to
undo the changes made by the failed transaction and restore the database to its previous state.
The commit/redo recovery technique is based on the principle of reapplying the changes
made by a transaction that has been completed successfully to the database. This technique is
accomplished by using the log records stored in the transaction log to redo the changes made
by the transaction that was in progress at the time of the failure or error. The system uses the
log records to reapply the changes made by the transaction and restore the database to its
most recent consistent state.
Checkpoint Recovery is a technique used to improve data integrity and system stability,
especially in databases and distributed systems. It entails preserving the system's state at
regular intervals, known as checkpoints, at which all ongoing transactions are either completed
or not yet initiated. This saved state, which may include memory contents and CPU registers, is
kept in stable, non-volatile storage so that it can survive system crashes. In the event of a
failure, the system can be restored to the most recent checkpoint, which reduces data loss and
downtime. The frequency of checkpoint creation is carefully regulated to keep system overhead
low while ensuring that recent data can be restored quickly.
Overall, recovery techniques are essential to ensure data consistency and availability in
Database Management System, and each technique has its own advantages and limitations
that must be considered in the design of a recovery system.
There are both automatic and manual ways of backing up data and recovering from failure
situations. The techniques used to recover data lost due to system crashes, transaction errors,
viruses, catastrophic failures, incorrect command execution, and so on are called database
recovery techniques. To prevent data loss, recovery techniques based on deferred updates and
immediate updates, or on backing up data, can be used. Recovery techniques depend heavily on
the existence of a special file known as the system log. It contains information about the
start and end of each transaction and any updates that occur during the transaction. The log
keeps track of all transaction operations that affect the values of database items; this
information is needed to recover from transaction failures.
The log is kept on disk and contains the following types of entries:
start_transaction(T): This log entry records that transaction T starts execution.
read_item(T, X): This log entry records that transaction T reads the value of database
item X.
write_item(T, X, old_value, new_value): This log entry records that transaction T changes
the value of database item X from old_value to new_value. The old value is sometimes known as
the before-image of X, and the new value as the after-image of X.
commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) in the database.
checkpoint: A checkpoint is a mechanism by which all earlier log records are written out of
main memory and stored permanently on disk. A checkpoint declares a point before which the
DBMS was in a consistent state and all transactions had been committed.
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully i.e. the transaction has reached the point at which it will not abort
(terminate without completing). Once committed, the transaction is permanently recorded in
the database. Commitment always involves writing a commit entry to the log and writing the
log to disk. At the time of a system crash, the log is searched backwards for all transactions
T that have written a start_transaction(T) entry into the log but have not yet written a
commit(T) entry; these transactions may have to be rolled back to undo their effect on the
database during the recovery process.
Undoing: If a transaction crashes, the recovery manager may undo the transaction, i.e., reverse
its operations. This involves examining the log for each write_item(T, X, old_value, new_value)
entry of the transaction and setting the value of item X in the database back to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures: deferred
updates and immediate updates (a log-based undo sketch follows below).
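The following Python sketch shows the undo step using before-images from such a log; the log format, the database dictionary, and the record layout are assumptions made for illustration.

```python
# Undo using before-images: scan the log backwards and restore old_value for
# every write made by transactions that never reached commit. The log layout
# and the data below are illustrative assumptions.

database = {"X": 400, "Y": 300}

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 500, 400),   # (T, item, old_value, new_value)
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 200, 300),
    ("commit", "T1"),
    # system crashes here: T2 has no commit record
]

committed = {rec[1] for rec in log if rec[0] == "commit"}

# Roll back uncommitted transactions by scanning the log backwards.
for rec in reversed(log):
    if rec[0] == "write_item" and rec[1] not in committed:
        _, txn, item, old_value, new_value = rec
        database[item] = old_value          # restore the before-image

print(database)   # {'X': 400, 'Y': 200} - T2's update is undone, T1's is kept
```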
Deferred Update: This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates are
recorded in the local transaction workspace. If a transaction fails before reaching its commit
point, it will not have changed the database in any way so UNDO is not needed. It may be
necessary to REDO the effect of the operations that are recorded in the local transaction
workspace, because their effect may not yet have been written in the database. Hence, a
deferred update is also known as the No-undo/redo algorithm.
Immediate Update: In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However, these
operations are recorded in a log on disk before they are applied to the database, making
recovery still possible. If a transaction fails to reach its commit point, the effect of its
operation must be undone i.e. the transaction must be rolled back hence we require both undo
and redo. This technique is known as undo/redo algorithm.
Caching/Buffering: In this technique, one or more disk pages that include the data items to be
updated are cached into main-memory buffers and updated in memory before being written back to
disk. A collection of in-memory buffers, called the DBMS cache, is kept under the control of
the DBMS for holding these pages. A directory is used to keep track of which database items are
in the buffers. A dirty bit is associated with each buffer: it is 0 if the buffer has not been
modified and 1 if it has (a small cache sketch follows below).
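A small Python sketch of such a cache with a directory and dirty bits is shown below; the page granularity, the flush policy, and all names are illustrative assumptions.

```python
# A toy DBMS cache: pages are read from disk into memory buffers, a directory
# maps items to buffers, and a dirty bit marks buffers that must be flushed.
# Everything here (names, page layout, flush policy) is an assumption.

disk = {"X": 500, "Y": 200}             # simplified "disk pages", one item per page

cache = {}                               # directory: item -> {"value": ..., "dirty": 0/1}

def read(item):
    if item not in cache:                # bring the page into the DBMS cache
        cache[item] = {"value": disk[item], "dirty": 0}
    return cache[item]["value"]

def write(item, value):
    read(item)                           # make sure the page is cached
    cache[item]["value"] = value
    cache[item]["dirty"] = 1             # mark the buffer as modified

def flush():
    """Write every dirty buffer back to disk and clear its dirty bit."""
    for item, buf in cache.items():
        if buf["dirty"]:
            disk[item] = buf["value"]
            buf["dirty"] = 0

write("X", read("X") - 100)
print(disk["X"], cache["X"])             # 500 {'value': 400, 'dirty': 1}
flush()
print(disk["X"])                         # 400
```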
Backward Recovery: The terms "rollback" and "UNDO" also refer to backward recovery. When a
backup of the data is not available and previous modifications need to be undone, this technique
is helpful. With the backward recovery method, unwanted modifications are removed and the
database is returned to its prior state. All adjustments made during the previous transaction
are reversed during backward recovery; in other words, the erroneous database updates are undone.
Forward Recovery: The terms "roll forward" and "REDO" refer to forward recovery. This technique
is helpful when a database needs to be brought up to date by reapplying all verified changes.
Changes made by committed transactions that were lost in the failure are reapplied to the
restored database to roll those modifications forward; in other words, the database is restored
using preserved data and the valid transactions recorded since the last save.
Backup Techniques
There are different types of Backup Techniques. Some of them are listed below.
Full Database Backup: The full database, including the data and the database meta-information
needed to restore the whole database (such as full-text catalogs), is backed up at predefined
intervals.
Differential Backup: It stores only the data changes that have occurred since the last full
database backup. When some data has changed many times since the last full database backup, a
differential backup stores only the most recent version of the changed data. To restore from it,
the last full database backup must be restored first.
Transaction Log Backup: In this, all events that have occurred in the database, i.e., a record
of every single statement executed, are backed up. It is a backup of the transaction log entries
and contains all transactions that have happened to the database. Through this, the database can
be recovered to a specific point in time. It is even possible to perform a backup from the
transaction log if the data files are destroyed, so that not even a single committed transaction
is lost.