Module-6 Transaction Management in DBMS
Transaction Management
and Concurrency Control
in DBMS
Transaction Management
A transaction is a set of operations used to perform a logical
unit of work.
A transaction usually means that the data in the database has
changed. One of the major uses of DBMS is to protect the user
data from system failures. It is done by ensuring that all the data
is restored to a consistent state when the computer is restarted
after a crash.
A transaction is any one execution of a user program in a
DBMS. One of the important properties of a transaction is that
it contains a finite number of steps. Executing the same program
multiple times will generate multiple transactions.
Example: Consider the following example of transaction operations to be
performed to withdraw cash from an ATM vestibule.
Steps for ATM Transaction
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
A transaction can include the following basic database access operations.
• Read/Access data (R): Reads a database item from disk (where the database stores data) into a
memory variable.
• Write/Change data (W): Writes a data item from a memory variable back to disk.
• Commit: A transaction control language command that permanently saves the changes
made in a transaction.
Example: Transfer of ₹50 from Account A to Account B. Initially A = ₹500 and B = ₹800. This data is
brought into RAM from the hard disk.
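The read/write steps of this transfer can be sketched as follows; this is a toy illustration (not a real DBMS), using the initial balances from the text:

```python
# Illustrative sketch: the R/W steps of transferring 50 from account A to B.
disk = {"A": 500, "B": 800}   # persistent store on the hard disk
memory = {}                    # buffer variables in RAM

def read(item):                # R(X): bring the item from disk into memory
    memory[item] = disk[item]

def write(item):               # W(X): write the memory variable back to disk
    disk[item] = memory[item]

# Transaction T: transfer 50 from A to B
read("A")
memory["A"] -= 50
write("A")
read("B")
memory["B"] += 50
write("B")
# commit: at this point the changes become permanent
print(disk)  # {'A': 450, 'B': 850}
```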
Transaction States
A transaction passes through several states during its lifetime: Active (its operations are being executed), Partially Committed (its final operation has executed), Committed (its changes are made permanent), Failed (normal execution can no longer proceed), and Aborted (its changes are rolled back).
Desirable Properties of Transaction (ACID Properties)
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity:
• All operations of a transaction take place as a single unit; if they cannot, the
transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.
• Atomicity involves the following two operations:
• Abort: If a transaction stops or fails, none of the changes it made will be
saved or visible.
• Commit: If a transaction completes successfully, all the changes it made will
be saved and visible.
Consistency:
• The rules (integrity constraints) that keep the database accurate
and consistent are followed before and after a transaction.
• When a transaction is completed, it leaves the database either as
it was before or in a new stable state.
• This property means every transaction works with a reliable and
consistent version of the database.
• A transaction transforms the database from one consistent
state to another consistent state.
Isolation:
• Data being used during the execution of one transaction
cannot be used by a second transaction until the first one
has completed.
• If transaction T1 is being executed and is using the data
item X, then that data item cannot be accessed by any
other transaction T2 until T1 ends.
• The concurrency control subsystem of the DBMS
enforces the isolation property.
Durability:
• Durability guarantees that once a transaction commits, the
changes it made are permanent.
• Committed changes cannot be lost through the erroneous
operation of a faulty transaction or through system failure.
When a transaction completes, the database reaches a
consistent state, and that state cannot be lost even in the
event of a system failure.
• The recovery subsystem of the DBMS is responsible for the
durability property.
Detailed Explanation of Transaction Operations
Read Operation : Reading data from a database is like checking your bank balance.
The system fetches the data and shows it to you. This operation doesn’t change the
data; it just displays it.
Write Operation : Writing data updates the database. Imagine transferring money
between accounts. The system reads the current balances, updates them, and
writes the new values back. This operation changes the stored data.
Commit Operation : The commit operation saves all changes made during the
transaction permanently. Think of it as confirming your purchase in an online store.
Once committed, the changes are final, and you can’t undo them.
Rollback Operation : Rollback undoes all changes made during a transaction. If
something goes wrong, the database returns to its previous state. It’s like canceling
an order before it’s shipped, ensuring no unwanted changes persist.
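These operations can be seen in action with Python's built-in sqlite3 module; the table name and values below are illustrative:

```python
# Commit and rollback with Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 800)")
conn.commit()

# Successful transaction: both updates are saved together.
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.commit()

# Failed transaction: rollback undoes the uncommitted change.
conn.execute("UPDATE account SET balance = balance - 999 WHERE name = 'A'")
conn.rollback()

print(dict(conn.execute("SELECT name, balance FROM account")))
# {'A': 450, 'B': 850}
```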
Concept of Schedule in DBMS
A schedule, as the name suggests, is the process of lining up
transactions and executing them one by one. When multiple
transactions run concurrently and the order of operations needs
to be set so that the operations do not overlap each other,
scheduling is brought into play and the transactions are timed
accordingly.
Serial Schedules: Schedules in which the
transactions are executed non-interleaved,
i.e., in which no transaction starts until the
running transaction has ended, are called
serial schedules. Example: Consider the
following schedule involving two
transactions T1 and T2, where R(A) denotes
that a read operation is performed on some
data item 'A'. This is a serial schedule, since
the transactions execute serially in the
order T1 → T2.
Non-Serial Schedule:
In a non-serial Schedule, multiple
transactions execute
concurrently/simultaneously, unlike the
serial Schedule, where one transaction
must wait for another to complete all its
operations.
In the Non-Serial Schedule, the other
transaction proceeds without the
completion of the previous transaction.
All the transaction operations
are interleaved or mixed with each other.
Non-serial schedules are not always
recoverable, cascadeless, strict, or
consistent.
Serializability in DBMS
Serializability in DBMS is a concept that helps to identify which non-serial schedules
are correct and will maintain the consistency of the database.
A serializable schedule always leaves the database in a consistent state.
A serial schedule is always a serializable schedule because, in a serial Schedule, a
transaction only starts when the other transaction has finished execution.
A non-serial schedule of n transactions is said to be a serializable schedule, if it is
equivalent to the serial Schedule of those n transactions. A serial schedule does not
allow concurrency; only one transaction executes at a time, and the other starts
when the already running transaction is finished.
Difference between Serial Schedule and
Serializable Schedule
A serial schedule allows no concurrency: transactions execute strictly one
after another. A serializable schedule allows transactions to interleave, but
it is guaranteed to be equivalent to some serial schedule, so it preserves
consistency while permitting concurrent execution.
Types of Serializability:
1. Conflict Serializability
2. View Serializability
Conflict Serializability: A schedule is called conflict serializable if it can
be transformed into a serial schedule by swapping non-conflicting
operations. A pair of operations is conflicting if all of the following
conditions hold:
1. The operations belong to different transactions.
2. They access the same data item.
3. At least one of them is a write operation.
Conflict Serializability
Conflict serializability checks if a schedule of transactions can be
transformed into a sequence where transactions are executed one
after another, without overlapping, while keeping the same
results. This type of serializability focuses on the order of
conflicting operations, meaning those that can affect each other’s
outcomes.
In this schedule, Write(A)/Read(A) and Write(B)/Read(B) are conflicting
operations, because all of the above conditions hold true for them.
Thus, by swapping the non-conflicting second read/write pair on data
item 'A' with the first read/write pair on data item 'B', this non-serial
schedule can be converted into a serial schedule. Therefore, it is conflict
serializable.
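The standard test for conflict serializability builds a precedence graph (an edge Ti → Tj for each conflicting pair where Ti's operation comes first) and checks it for cycles. The sketch below assumes a schedule represented as (transaction, operation, item) triples:

```python
# Precedence-graph test: a schedule is conflict serializable iff its
# precedence graph is acyclic.

def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            # conflicting pair: different transactions, same item, >= one write
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))

    def has_cycle(node, visiting, done):   # depth-first cycle detection
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        for a, b in edges:
            if a == node and has_cycle(b, visiting, done):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# All of T1 before T2: only T1 -> T2 edges, so serializable
s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# Interleaving creates edges T1 -> T2 and T2 -> T1: a cycle, not serializable
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
print(conflict_serializable(s1), conflict_serializable(s2))  # True False
```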
Advantages of Conflict Serializability
• Data Integrity: Ensures that the final state of the database is consistent, as it
prevents conflicting transactions from interfering with each other.
• Clear Rules: The rules for conflict serializability are straightforward and easy to
understand, making it easier to implement in database systems.
• Efficient Execution: Many database systems can optimize transaction execution
based on conflict serializability, potentially improving performance.
• Detectable Issues: The use of precedence graphs makes it easier to detect
cycles and conflicts, allowing for quicker identification of problematic
schedules.
• Strong Isolation: Provides a strong level of isolation between transactions,
which can be crucial for applications requiring high reliability.
Disadvantages of Conflict Serializability
• Restrictive: The strict nature of conflict serializability can lead to reduced
concurrency, as it may unnecessarily block transactions that could otherwise run
simultaneously.
• Complexity in Management: Implementing conflict serializability may require
additional mechanisms, such as locking, which can complicate transaction
management.
• Performance Overhead: The need to check for conflicts and maintain locks can
introduce performance overhead, especially in high-load environments.
• Not Always Necessary: In some applications, the strict guarantees of conflict
serializability may be more than what is needed, leading to inefficiencies.
• Deadlock Potential: The use of locks to enforce conflict serializability can lead to
deadlocks, where two or more transactions are waiting indefinitely for each other
to release resources.
View Serializability
• A schedule is view serializable if it
is view equivalent to a serial
schedule. If a schedule is conflict
serializable, then it is also view
serializable. A schedule that is
view serializable but not conflict
serializable contains blind writes.
• On the other hand, view
serializability is a bit broader. It
ensures that even if the
transactions overlap, they
produce the same final state as
some serial execution. This means
that as long as the final view of
the database is consistent with a
serial order, the schedule is
considered valid.
Advantages of View Serializability
• Greater Flexibility: View serializability allows transactions to overlap, which can
improve system performance and resource utilization compared to stricter
methods.
• Increased Concurrency: By permitting non-conflicting transactions to run
simultaneously, view serializability can enhance throughput in high-transaction
environments.
• Maintains Consistency: It ensures that the final state of the database is consistent
with some serial execution, which is essential for data integrity.
• Broader Applicability: It can be used in scenarios where the strict order of
operations is not necessary, making it suitable for many real-world applications.
• Simpler Management: Since it allows more overlapping operations, the
management of transactions can sometimes be less complex compared to conflict
serializability.
Disadvantages of View Serializability
• Complexity of Validation: Determining whether a schedule is view serializable
can be more complex than checking for conflict serializability, requiring
detailed analysis of transaction outcomes.
• Potential for Inconsistency: While it aims to maintain a consistent final state,
the overlapping nature of transactions can lead to challenges in ensuring that
all intermediate states are valid.
• Less Strong Isolation: It does not provide as strong a level of isolation as
conflict serializability, which might be a concern for applications requiring high
reliability.
• Performance Issues: In some cases, allowing too much overlap can lead to
performance bottlenecks or resource contention, particularly if transactions are
not carefully managed.
• Not Always Enforced: Some database systems may not fully support view
serializability, limiting its practical application in certain environments.
Difference Between Conflict and View Serializability
Every conflict-serializable schedule is also view serializable, but not vice versa:
schedules that are view serializable without being conflict serializable involve
blind writes. Conflict serializability can be tested efficiently with a precedence
graph, whereas testing view serializability is computationally much harder.
Advantages of Serializability
1. Execution is predictable: In a serializable execution, transactions behave as if
they ran one after another, so the DBMS holds no surprises: no data loss or
corruption occurs, and all variables are updated as intended.
2. DBMS executes each thread independently, making it much simpler to
understand and troubleshoot each database thread. This can greatly simplify
the debugging process. The concurrent process is therefore not a concern for
us.
3. Lower Costs: The cost of the hardware required for the efficient operation of
the database can be decreased with the aid of the serializable property. It may
also lower the price of developing the software.
4. Increased Performance: Since serializable executions provide developers the
opportunity to optimize their code for performance, they occasionally
outperform non-serializable equivalents.
Non-Serializability in DBMS
A non-serial schedule that is not serializable is called a non-serializable
schedule. Non-serializable schedules may or may not be consistent or
recoverable. Non-serializable schedules are divided into two types:
1. Recoverable Schedule
2. Non-recoverable Schedule
Recoverable Schedule
A schedule is recoverable if each transaction commits only after all
the transactions from which it has read have committed. In other
words, if some transaction Tj is reading a value updated or written
by some other transaction Ti, then the commit of Tj must occur
after the commit of Ti. Example: Consider the following schedule
involving two transactions T1 and T2.
1. Cascading Schedule:
When a failure in one transaction leads to the rolling back or
aborting of other dependent transactions, such scheduling is
referred to as cascading rollback or cascading abort. A schedule
that prevents this is said to avoid cascading aborts/rollbacks
(ACA). Example:
2. Cascadeless Schedule:
Schedules in which transactions read values only after the transactions
that wrote them have committed are called cascadeless schedules. This
avoids a single transaction abort leading to a series of transaction
rollbacks.
A strategy to prevent cascading aborts is to disallow a transaction from
reading uncommitted changes of another transaction in the same
schedule.
In other words, if some transaction Tj wants to read a value updated or
written by some other transaction Ti, then Tj must read it only after Ti
commits. Example: Consider the following schedule involving two
transactions T1 and T2.
This schedule is cascadeless, since
the updated value of A is read by
T2 only after the updating transaction,
i.e. T1, commits.
Example: Consider another schedule
involving two transactions T1 and T2.
It is a recoverable schedule, but it
does not avoid cascading aborts: if
T1 aborts, T2 will have to be aborted
too in order to maintain the
correctness of the schedule, as T2 has
already read the uncommitted value
written by T1.
3. Strict Schedule
A schedule is strict if, for any two transactions Ti and Tj,
whenever a write operation of Ti precedes a conflicting
operation of Tj (either read or write), the commit or abort
of Ti also precedes that conflicting operation of Tj. In
other words, Tj can read or write a value updated or
written by Ti only after Ti commits or aborts.
Such a schedule is strict since T2 reads and writes A,
which is written by T1, only after the commit of T1.
Non-Recoverable Schedule
Consider the following schedule
involving two transactions T1 and T2.
T2 reads the value of A written by T1
and commits. T1 later aborts;
therefore the value read by T2 is
wrong, but since T2 has already
committed, this schedule is non-recoverable.
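The distinctions above can be checked mechanically. The sketch below classifies a schedule, represented as (transaction, operation, item) triples with "C" for commit, as cascadeless, recoverable, or non-recoverable:

```python
# Classify a schedule's recoverability from its reads-from relationships.

def classify(schedule):
    """Return 'cascadeless', 'recoverable', or 'non-recoverable'."""
    committed_at = {}     # transaction -> position of its commit
    last_writer = {}      # item -> transaction that last wrote it
    reads_from = []       # (read position, reader, writer)
    for pos, (txn, op, item) in enumerate(schedule):
        if op == "C":
            committed_at[txn] = pos
        elif op == "W":
            last_writer[item] = txn
        elif op == "R" and last_writer.get(item) not in (None, txn):
            reads_from.append((pos, txn, last_writer[item]))
    # cascadeless: every read of another transaction's write occurs after
    # the writer has committed
    if all(w in committed_at and committed_at[w] < pos
           for pos, _, w in reads_from):
        return "cascadeless"
    # recoverable: any committed reader commits after the writer it read from
    if all(r not in committed_at
           or (w in committed_at and committed_at[w] < committed_at[r])
           for _, r, w in reads_from):
        return "recoverable"
    return "non-recoverable"

# T2 reads A only after T1 commits: cascadeless
s_cascadeless = [("T1", "W", "A"), ("T1", "C", None),
                 ("T2", "R", "A"), ("T2", "C", None)]
# T2 reads uncommitted A but commits after T1: recoverable, not cascadeless
s_recoverable = [("T1", "W", "A"), ("T2", "R", "A"),
                 ("T1", "C", None), ("T2", "C", None)]
# T2 reads uncommitted A and commits before T1: non-recoverable
s_nonrecoverable = [("T1", "W", "A"), ("T2", "R", "A"),
                    ("T2", "C", None), ("T1", "C", None)]
print(classify(s_cascadeless), classify(s_recoverable), classify(s_nonrecoverable))
```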
Concurrency Control
Concurrency control is a concept in Database Management Systems
(DBMS) that ensures multiple transactions can simultaneously access or
modify data without causing errors or inconsistencies. It provides
mechanisms to handle concurrent execution in a way that maintains the
ACID properties. The primary aim is to maintain consistency, integrity,
and isolation when multiple users or applications access the database
simultaneously.
But concurrent execution can lead to various challenges:
1. Lost Updates: Consider two users trying to update the same data. If one
user reads a data item and then another user reads the same item and
updates it, the first user’s updates could be lost if they weren’t aware of
the second user’s actions.
2. Uncommitted Data: If one user accesses data that another user has
updated but not yet committed (finalized), and then the second user
decides to abort (cancel) their transaction, the first user has invalid data.
3. Inconsistent Retrievals: A transaction reads several values from the
database, but another transaction modifies some of those values in the
middle of its operation.
To address these challenges, the DBMS employs concurrency
control techniques.
Ensure Database Consistency: Without concurrency control, simultaneous transactions could interfere
with each other, leading to inconsistent database states. Proper concurrency control ensures the database
remains consistent even after numerous concurrent transactions.
Avoid Conflicting Updates: When two transactions attempt to update the same data simultaneously, one
update might overwrite the other without proper control. Concurrency control ensures that updates don’t
conflict and cause unintended data loss.
Prevent Dirty Reads: Without concurrency control, one transaction might read data that another
transaction is in the middle of updating (but hasn’t finalized). This can lead to inaccurate or “dirty”
reads, where the data doesn’t reflect the final, committed state.
Enhance System Efficiency: By managing concurrent access to the database, concurrency control allows
multiple transactions to be processed in parallel. This improves system throughput and makes optimal use
of resources.
Protect Transaction Atomicity: For a series of operations within a transaction, it’s crucial that all
operations succeed (commit) or none do (abort). Concurrency control ensures that transactions are
atomic and treated as a single indivisible unit, even when executed concurrently with others.
Lock-Based Protocol
• The lock-based protocol is a crucial concurrency control mechanism that controls
concurrent access to a data item.
• It ensures that one transaction cannot retrieve and update a record while another
transaction is performing a write operation on it.
Example
Like a traffic light that indicates stop and go, where one lane is allowed to pass at a
time while the other lanes are locked, in a database only one transaction operates on a
locked item at a time while other transactions wait.
• If this locking is not done correctly, inconsistent and incorrect data will be displayed.
• It maintains the order between conflicting pairs among transactions during execution.
• There are two lock modes:
• Shared Lock (S)
• Exclusive Lock (X)
Shared Lock (S)
• A transaction holding a shared lock can only read the data item, without making any
changes to it.
• Shared locks are represented by S.
• An S-lock is requested using the lock-S instruction.
Exclusive Lock (X)
• A transaction holding an exclusive lock can perform both read and write operations
on the data item.
• Exclusive locks are represented by X.
• An X-lock is requested using the lock-X instruction.
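A minimal sketch of a lock table with these two modes, assuming a simple in-memory representation where a conflicting request returns False rather than waiting:

```python
# Shared (S) and exclusive (X) lock compatibility: S is compatible only
# with S; X conflicts with everything.

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        self.locks = {}   # item -> list of (transaction, mode)

    def request(self, txn, item, mode):
        holders = self.locks.setdefault(item, [])
        for holder, held_mode in holders:
            if holder != txn and not COMPATIBLE[(held_mode, mode)]:
                return False      # conflict: the requester must wait
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        self.locks[item] = [h for h in self.locks.get(item, []) if h[0] != txn]

lm = LockManager()
print(lm.request("T1", "A", "S"))  # True  - shared lock granted
print(lm.request("T2", "A", "S"))  # True  - shared locks are compatible
print(lm.request("T3", "A", "X"))  # False - exclusive conflicts with shared
lm.release("T1", "A"); lm.release("T2", "A")
print(lm.request("T3", "A", "X"))  # True  - all shared locks released
```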
Concurrency Control Techniques in DBMS
The various concurrency control techniques are:
Two-phase locking Protocol
Time stamp ordering Protocol
Multi version concurrency control
Validation concurrency control
1. Two-phase locking Protocol
Two-phase locking (2PL) is a protocol used in database management systems to
control concurrency and ensure transactions are executed in a way that preserves
the consistency of a database. It’s called “two-phase” because, during each
transaction, there are two distinct phases: the Growing phase and the Shrinking
phase.
Breakdown of the Two-Phase Locking protocol
1. Phases:
1. Growing Phase: During this phase, a transaction can obtain (acquire)
any number of locks as required but cannot release any. This phase
continues until the transaction has acquired all the locks it needs and
requests no more.
2. Shrinking Phase: Once the transaction releases its first lock, the
Shrinking phase starts. During this phase, the transaction can release
but not acquire any more locks.
2. Lock Point: The exact moment when the transaction switches from the
Growing phase to the Shrinking phase (i.e. when it releases its first lock) is
termed the lock point.
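The two phases can be sketched as a single invariant: once any lock is released, further acquisitions are rejected. The class below is an illustration of the rule, not a real lock manager:

```python
# Two-phase locking rule: no lock may be acquired after the first release.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True at the lock point

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire after first release (2PL)")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True    # lock point: the growing phase ends here
        self.held.discard(item)

t = TwoPhaseTransaction("T1")
t.acquire("A")          # growing phase
t.acquire("B")          # growing phase
t.release("A")          # lock point reached; shrinking phase begins
try:
    t.acquire("C")      # violates the two-phase rule
except RuntimeError as e:
    print(e)
```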
2. Time stamp ordering Protocol
The Timestamp Ordering Protocol is a concurrency control method used in database management
systems to maintain the serializability of transactions. This method uses a timestamp for each
transaction to determine its order in relation to other transactions. Instead of using locks, it ensures
transaction order based on their timestamps.
Breakdown of the Time stamp ordering protocol
1. Read Timestamp (RTS):
1. This is the latest or most recent timestamp of a transaction that has read the data item.
2. Every time a data item X is read by a transaction T with timestamp TS, the RTS of X is updated to TS
if TS is more recent than the current RTS of X.
2. Write Timestamp (WTS):
1. This is the latest or most recent timestamp of a transaction that has written or updated the data
item.
2. Whenever a data item X is written by a transaction T with timestamp TS, the WTS of X is updated to
TS if TS is more recent than the current WTS of X.
The timestamp ordering protocol uses these timestamps to determine whether a transaction’s request
to read or write a data item should be granted. The protocol ensures a consistent ordering of
operations based on their timestamps, preventing the formation of cycles and, therefore, deadlocks.
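The read and write rules can be sketched as follows; this is the basic timestamp-ordering test (without the Thomas write rule), with a rejected request standing in for rolling the transaction back and restarting it with a new timestamp:

```python
# Basic timestamp ordering: each item keeps RTS and WTS; requests from
# transactions that are "too old" are rejected.

class Item:
    def __init__(self):
        self.rts = 0   # largest timestamp of a transaction that read this item
        self.wts = 0   # largest timestamp of a transaction that wrote this item

def read(item, ts):
    if ts < item.wts:              # a younger transaction already wrote it
        return False               # reject: roll back the reader
    item.rts = max(item.rts, ts)
    return True

def write(item, ts):
    if ts < item.rts or ts < item.wts:   # a younger transaction read or wrote it
        return False                     # reject: roll back the writer
    item.wts = ts
    return True

x = Item()
print(write(x, ts=5))   # True  - WTS(X) becomes 5
print(read(x, ts=3))    # False - reader is older than the writer with ts=5
print(read(x, ts=8))    # True  - RTS(X) becomes 8
print(write(x, ts=6))   # False - writer is older than the reader with ts=8
```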
3. Multi version concurrency control
Multi version Concurrency Control (MVCC) is a technique used in database management systems to
handle concurrent operations without conflicts, using multiple versions of a data item. Instead of
locking the items for write operations (which can reduce concurrency and lead to bottlenecks or
deadlocks), MVCC will create a separate version of the data item being modified.
Breakdown of the Multi version concurrency control (MVCC)
1. Multiple Versions: When a transaction modifies a data item, instead of changing the item in
place, it creates a new version of that item. This means that multiple versions of a database
object can exist simultaneously.
2. Reads aren’t Blocked: One of the significant advantages of MVCC is that read operations don’t
get blocked by write operations. When a transaction reads a data item, it sees a version of that
item consistent with the last time it began a transaction or issued a read, even if other
transactions are currently modifying that item.
3. Timestamps or Transaction IDs: Each version of a data item is tagged with a unique identifier,
typically a timestamp or a transaction ID. This identifier determines which version of the data
item a transaction sees when it accesses that item. A transaction will always see its own
writes, even if they are uncommitted.
4. Garbage Collection: As transactions create newer versions of data items, older versions can
become obsolete. There’s typically a background process that cleans up these old versions, a
procedure often referred to as “garbage collection.”
5. Conflict Resolution: If two transactions try to modify the same data item concurrently, the
system will need a way to resolve this. Different systems have different methods for conflict
resolution. A common one is that the first transaction to commit will succeed, and the other
transaction will be rolled back or will need to resolve the conflict before proceeding.
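A minimal sketch of the versioning idea, assuming timestamps as version tags; garbage collection and conflict resolution are omitted:

```python
# MVCC sketch: each write creates a new version tagged with the writer's
# timestamp; a reader sees the newest version not younger than itself.

class MVStore:
    def __init__(self):
        self.versions = {}   # item -> list of (write_timestamp, value)

    def write(self, item, ts, value):
        self.versions.setdefault(item, []).append((ts, value))
        self.versions[item].sort()          # keep versions ordered by timestamp

    def read(self, item, ts):
        # newest version with write_timestamp <= the reader's timestamp
        visible = [(wts, v) for wts, v in self.versions.get(item, []) if wts <= ts]
        return visible[-1][1] if visible else None

store = MVStore()
store.write("A", ts=1, value=500)
store.write("A", ts=5, value=450)   # a newer version; the old one still exists
print(store.read("A", ts=3))        # 500 - a reader at ts=3 sees the ts=1 version
print(store.read("A", ts=7))        # 450 - a reader at ts=7 sees the ts=5 version
```

Note how the reader at ts=3 is never blocked by the writer at ts=5: it simply reads the older version, which is the key advantage described above.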
4. Validation concurrency control (Optimistic
Concurrency Control)
Validation (or Optimistic) Concurrency Control (VCC) is an advanced database
concurrency control technique. Instead of acquiring locks on data items, as is done in
most traditional (pessimistic) concurrency control techniques, validation concurrency
control allows transactions to work on private copies of database items and validates the
transactions only at the time of commit.
The central idea behind optimistic concurrency control is that conflicts between
transactions are rare, and it’s better to let transactions run to completion and only
check for conflicts at commit time.
Breakdown of Validation Concurrency Control (VCC):
Phases: Each transaction in VCC goes through three distinct phases:
Read Phase: The transaction reads values from the database and makes changes to its
private copy without affecting the actual database.
Validation Phase: Before committing, the transaction checks if the changes made to its
private copy can be safely written to the database without causing any conflicts.
Write Phase: If validation succeeds, the transaction updates the actual database with the
changes made to its private copy.
Validation Criteria: During the validation phase, the system checks for potential conflicts
with other transactions. If a conflict is found, the system can either roll back the
transaction or delay it for a retry, depending on the specific strategy implemented.
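The three phases can be sketched as follows. This is a deliberately simplified backward-validation scheme (a commit fails if any item in the transaction's read set was written by a transaction that already committed), not a production implementation:

```python
# Optimistic concurrency control sketch: read phase on a private copy,
# validation at commit, write phase only if validation succeeds.

class DB:
    def __init__(self, data):
        self.data = data
        self.committed_writes = set()        # items written by committed txns

class OptimisticTxn:
    def __init__(self, db):
        self.db = db
        self.read_set, self.write_set = set(), {}

    def read(self, item):                    # read phase: track what we read
        self.read_set.add(item)
        return self.write_set.get(item, self.db.data[item])

    def write(self, item, value):            # buffered in the private copy
        self.write_set[item] = value

    def commit(self):
        # validation phase: fail if something we read was changed meanwhile
        if self.read_set & self.db.committed_writes:
            return False                     # conflict: roll back / retry
        # write phase: apply the private copy to the actual database
        self.db.data.update(self.write_set)
        self.db.committed_writes |= set(self.write_set)
        return True

db = DB({"A": 500, "B": 800})
t1, t2 = OptimisticTxn(db), OptimisticTxn(db)
t1.write("A", t1.read("A") - 50)             # both transactions run concurrently
t2.write("B", t2.read("A"))                  # t2 also reads A
print(t1.commit())   # True  - no committed writer touched t1's read set
print(t2.commit())   # False - t1 committed a write to A, which t2 read
```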