L5 Transaction and Concurrency Control
A transaction is a logical unit of work that comprises one or more database operations (such as
read, write, commit, and rollback). Read and write are the fundamental operations of a
transaction, and the DBMS coordinates them so that the ACID properties (data consistency and
integrity) hold.
Read (R) -> A read operation retrieves (fetches) data from the database.
Write (W) -> A write operation modifies data in the database.
Ex : Swiggy order -> payment page -> enter bank details -> OTP -> network failure/issues -> rollback
Concurrency control ensures that multiple transactions can run concurrently without
compromising data consistency.
Example : Consider a banking system where two transactions happen concurrently:
1. Ram gives Shyam 100rs
2. Shyam gives Ram 50rs
The data must be consistent after both transactions.
ACID properties are the properties that ensure transactions are processed reliably and
accurately, even in complex situations (system failures/network issues).
They guarantee that the database remains in a consistent state despite any failures or
interruptions during the transaction.
Ex : Consider Ram transferring money to Shyam. The transaction must deduct the amount from
Ram's account and add it to Shyam's account as a single operation.
If any part of this transaction fails (e.g., due to insufficient funds/system error/network
error), the entire transaction is rolled back, so that neither account is left partially updated.
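The transfer example can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `accounts` table, the `transfer` and `balances` helpers, and the starting balances are assumptions made for the example, not part of the lecture.

```python
import sqlite3

def transfer(conn, sender, receiver, amount):
    """Move `amount` from sender to receiver as one all-or-nothing transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (sender,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
        return True
    except ValueError:
        return False  # the deduction above was rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical starting data: Ram has 100rs, Shyam has 0rs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("Ram", 100), ("Shyam", 0)])
conn.commit()
```

Calling `transfer(conn, "Ram", "Shyam", 500)` fails the funds check, and the rollback leaves both balances exactly as they were before the call.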
LET’S START WITH DBMS :)
ACID Properties
C-> Consistency : It guarantees that the database remains in a consistent state before and
after the execution of each transaction.
Ex : Suppose you have 100rs in your account and want 50rs in cash, so you transfer 50rs to a
person X, who gives you 50rs in cash.
Before transaction - 100rs (in acc)
After transaction - 100rs (50rs in acc + 50rs cash)
I-> Isolation : It ensures that if there are two transactions 1 and 2, then the changes
made by Transaction 1 are not visible to Transaction 2 until Transaction 1 commits.
While the transaction is reading data, the dbms ensures that the data is consistent and
isolated from other transactions. This means that other transactions cannot modify the
data being read by the current transaction until it is committed or rolled back.
D-> Durability : Once a transaction is committed, its changes are permanent, even if the
system fails afterwards.
Most DBMSs use a technique called Write-Ahead Logging (WAL) to ensure durability.
Before modifying data in the database, the DBMS writes the changes sequentially to a
transaction log (often stored on disk). This ensures that after a failure, the database can
recover to a consistent state.
Ex : If you are transferring 100rs to your friend and there is a sudden power outage or system
crash right after the transaction is committed, the changes (the transfer of 100rs) are still
saved in the database. When the system is back up, both your account and your friend's account
reflect the updated balances.
T1 T2
W R
W W
Isolation level : It determines the degree to which the operations in one transaction are
isolated from those in other transactions.
[Figure: transactions T1 and T2 issued by the application against the DB]
Isolation levels and its types
T1 T2
W(A)
R(A)
If T2 modifies data that T1 has already read, and T1 then continues its transaction, T1 sees
changed data when it reads again (a non-repeatable read).
T1          T2
R(A)
            R(A)
            W(A)
            Commit
R(A)
Read Uncommitted: The lowest isolation level, where transactions can see uncommitted
changes made by other transactions. If Transaction T1 is writing a value to a table,
Transaction T2 can read this value before T1 commits.
Dirty Reads: Yes
Non-Repeatable Reads: Yes
Phantom Reads: Yes
Read Committed: It ensures that any data read during the transaction was committed at the
moment it is read. If T1 has performed a write, T2 can read that data only after T1 has
committed.
Dirty Reads: No
Non-Repeatable Reads: Yes
Phantom Reads: Yes
Repeatable Read: It ensures that if a transaction reads a row, it will see the same values for
that row during the entire transaction, even if other transactions modify the data and
commit. If Transaction T1 reads a value, Transaction T2 cannot modify that value until T1
completes. But T2 can insert new rows that T1 can see on subsequent reads.
Dirty Reads: No
Non-Repeatable Reads: No
Phantom Reads: Yes
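Serializable: The highest isolation level. Concurrent transactions behave as if they had been
executed one after another in some serial order.
Dirty Reads: No
Non-Repeatable Reads: No
Phantom Reads: No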
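The Yes/No lists for the isolation levels can be kept as a small lookup table. The sketch below uses the four standard SQL level names; the `ANOMALIES` table and the `weakest_level` helper are illustrative names, and real DBMS products may behave more strictly than this table suggests.

```python
# Which read anomalies each isolation level still allows (True = can occur).
ANOMALIES = {
    "READ UNCOMMITTED": {"dirty": True,  "non_repeatable": True,  "phantom": True},
    "READ COMMITTED":   {"dirty": False, "non_repeatable": True,  "phantom": True},
    "REPEATABLE READ":  {"dirty": False, "non_repeatable": False, "phantom": True},
    "SERIALIZABLE":     {"dirty": False, "non_repeatable": False, "phantom": False},
}

def weakest_level(forbidden):
    """Return the weakest level that rules out every anomaly in `forbidden`."""
    for level, allowed in ANOMALIES.items():  # dicts keep insertion order
        if not any(allowed[a] for a in forbidden):
            return level
    raise ValueError("no level satisfies the request")
```

For example, `weakest_level(["dirty", "non_repeatable"])` picks "REPEATABLE READ", the first level in the table where both entries are False.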
Schedule and its Types
Schedule : It refers to the sequence in which the operations (such as read, write, commit, and
abort) of a set of concurrent transactions are executed. Analysing schedules helps ensure data
consistency and integrity.
If there are n transactions T1, T2, T3, ..., Tn, then the number of possible serial
schedules = n! (n factorial)
Ex : Schedule sc1 T1 T2
R(A)
R(A)
W(A)
Commit
Commit
Incomplete schedule : An incomplete schedule is one where not all transactions have
reached their final state of either commit or abort.
T1          T2
R(A)
W(A)
            R(B)
            W(B)
            Commit
Here, T1 is still in progress as there is no COMMIT for transaction T1.
Complete schedule : A complete schedule is one where all the transactions in the schedule
have either committed or aborted.
T1          T2
R(A)
W(A)
Commit
            R(B)
            W(B)
            Commit
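The difference between a complete and an incomplete schedule can be checked mechanically. In this sketch a schedule is a list of (transaction, operation, item) tuples; that encoding and the `is_complete` name are assumptions made for illustration.

```python
def is_complete(schedule):
    """True if every transaction in the schedule ends in COMMIT or ABORT."""
    seen = {txn for txn, _op, _item in schedule}
    finished = {txn for txn, op, _item in schedule if op in ("COMMIT", "ABORT")}
    return seen == finished

# The two example schedules over T1 and T2.
incomplete = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "COMMIT", None),
]
complete = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "COMMIT", None),
    ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "COMMIT", None),
]
```

`is_complete(incomplete)` is False because T1 never commits or aborts, while `is_complete(complete)` is True.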
Types of Schedule
1. Serial Schedule
2. Concurrent or Non-Serial Schedule
3. Conflict-Serializable Schedule
4. View-Serializable Schedule
5. Recoverable Schedule
6. Irrecoverable Schedule
7. Cascading Schedule
8. Cascadeless Schedule
9. Strict Schedule
1. Serial Schedule : A serial schedule is one where transactions are executed one after
another. For two transactions T1 and T2, T1 must run to completion (commit) before T2 starts.
T1          T2
R(A)
W(A)
Commit
            R(B)
            W(B)
            Commit
Challenges:
1. Throughput (number of transactions completed per unit time) and memory utilisation are
poor, so serial execution is inefficient and not recommended.
2. Since wait time is high, fewer transactions are completed.
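Whether a schedule is serial can be tested by checking that no transaction resumes after another one has started. A minimal sketch, assuming a schedule is a list of (transaction, operation, item) tuples; `is_serial` is an illustrative name.

```python
def is_serial(schedule):
    """True if no transaction's operations are interleaved with another's."""
    finished = set()   # transactions whose block of operations has ended
    current = None
    for txn, _op, _item in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "COMMIT", None),
    ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "COMMIT", None),
]
interleaved = [("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A")]
```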
2. Non-Serial/Concurrent Schedule : A non-serial schedule is one where multiple
transactions execute concurrently, i.e., the operations of different transactions are
interleaved. For two transactions T1 and T2, T2 doesn’t need to wait for T1 to commit; it can
start at any point.
T1 T2 T3
Example : T1 ,T2, T3
R(A)
R(A)
R(B)
W(A)
COMMIT
Challenges:
1. Consistency issues may arise because of non-serial execution, so robust concurrency
control mechanisms are required to ensure data consistency and integrity.
2. We can use serializability and concurrency control mechanisms to ensure consistency.
3. Conflict-Serializable Schedule : A schedule is conflict-serializable if it can be
transformed into a serial schedule by swapping adjacent non-conflicting operations.
R(A)
W(A)
R(A)
W(A)
COMMIT
COMMIT
Recoverable Schedule
6. Irrecoverable Schedule : An irrecoverable schedule allows a transaction to commit even
if it has read data from another uncommitted transaction. This can lead to inconsistencies
and make it impossible to recover from certain failures.
T1          T2
R(A)
W(A)
            R(A)
            W(A)
            COMMIT
FAIL
Irrecoverable Schedule
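The rule behind recoverability (a transaction may commit only after every transaction it read from has committed) can be sketched as a check over a schedule. The tuple encoding and the `is_recoverable` name are assumptions for illustration, FAIL is written as ABORT, and aborts are handled in a simplified way.

```python
def is_recoverable(schedule):
    """True if no transaction commits before a transaction it read from."""
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # reader -> uncommitted writers it has read from
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "COMMIT":
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False  # commits on top of uncommitted data
            committed.add(txn)
        elif op == "ABORT":
            # Simplification: forget the aborted writes, do not restore older writers.
            last_writer = {i: t for i, t in last_writer.items() if t != txn}
    return True

# T2 reads T1's uncommitted write, then commits before T1 fails.
irrecoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "COMMIT", None),
    ("T1", "ABORT", None),
]
# Same operations, but T1 commits before T2 does.
recoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "COMMIT", None), ("T2", "COMMIT", None),
]
```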
7. Cascading Schedule : This schedule happens when the failure or abort of one transaction
causes a series of other transactions to also abort.
T1 Writes to A: T1 writes to data item A
T2 Reads A: T2 reads the uncommitted value of A written by T1
Now, if T1 fails and aborts, T2 must also abort because it has read an uncommitted value
from T1.
T1          T2          T3
R(A)
W(A)
            R(A)
                        R(A)
Cascading Schedule
Issues:
1. Performance degradation because multiple transactions need to be rolled back
2. Wasted CPU resources, since the work of the rolled-back transactions is discarded
8. Cascadeless Schedule : It ensures transactions only read committed data, so the abort of
one transaction never forces other transactions to abort.
T1 Writes to A: T1 writes to data item A
T2 Reads A: T2 reads the committed value of A
[Figure: schedule over T1, T2, T3 with operations R(A), R(A), W(A), COMMIT]
Strict Schedule
Concurrent VS Parallel Schedule
Example (concurrent) : Multi-threading on a single-core CPU, where threads take turns using
the CPU.
Example (parallel) : Multi-threading on a multi-core CPU, where threads run simultaneously on
different cores.
Serializability and its types
Serializability: It ensures that concurrent transactions yield results that are consistent with
some serial execution i.e the final state of the database after executing a set of transactions
concurrently should be the same as if the transactions had been executed one after another
in some order.
[Figure: a serial schedule and a concurrent schedule of T1 and T2, shown side by side]
A concurrent schedule does not always have a cycle.
A concurrent schedule can be conflict-serializable, meaning that it is equivalent to some serial schedule of
transactions and its conflict graph does not have any cycles.
[Figure: schedule of T1 and T2 with operations R(A), R(A), W(A), W(A), and its conflict graph]
Now, since a cycle is detected, we need to serialize them.
So, we use serializability here:
Conflict-Serializability-> to detect the cycle using the conflict graph
View-Serializability-> to check if the schedule is serializable after a cycle is detected.
Now, why do we swap only the non-conflicting pairs and not the conflicting ones?
Suppose the order of execution was
T1 : R(A)
T2 : W(A)
If we swap this conflicting pair, the result may change: originally A was read first and then
modified, but after the swap A is modified first and T1 reads the modified value.
Conflict-Serializability
Q. Find a conflict equivalent for a schedule S1
[Figure: schedule S1 over T1 and T2 with operations R(X), W(X), R(X), R(Y), W(Y), R(Y), redrawn
after each swap of adjacent non-conflicting operations]
2. After the first swap, search again for adjacent non-conflicting pairs.
Conflicts occur when two operations from different transactions access the same
data item and at least one of them is a write operation.
Cycle Detection: The schedule is conflict-serializable if and only if the conflict graph is
acyclic. If there are no cycles in the graph, it means that the schedule can be serialized
without violating the order of conflicting operations.
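The conflict rule and the cycle test can be put together in a short sketch. It assumes a schedule is a list of (transaction, operation, item) tuples with operations "R" and "W"; the function names are illustrative.

```python
def conflict_graph(schedule):
    """Edges Ti -> Tj for each conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: a cycle exists
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(conflict_graph(schedule))

# Hypothetical schedules: the first has edges T1->T2 and T2->T1 (a cycle).
cyclic = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
acyclic = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
```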
Example schedule over T1, T2, T3 (execution order is top to bottom):
T1          T2          T3
R(A)                                T1 reads A
            R(A)                    T2 reads A
W(A)                                T1 writes A
                        W(A)        T3 writes A
            W(B)                    T2 writes B
                        R(B)        T3 reads B
Find the indegree (the number of edges directed into a node) of each transaction in the
conflict graph. A transaction with indegree 0 can come first in the serial execution.
T1 - 1, T2 - 0, T3 - 2, so T2 comes first as its indegree is 0
T2 must precede T1
T1 must precede T3
Therefore, one possible equivalent serial schedule is T2→T1→T3.
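The indegree procedure is a topological sort of the conflict graph. The sketch below assumes the graph has edges T2→T1, T1→T3 and T2→T3, which is consistent with the indegrees listed above; `serial_order` is an illustrative name.

```python
from collections import deque

def serial_order(nodes, edges):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    for _src, dst in edges:
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Removing `node` lowers the indegree of every transaction it precedes.
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(nodes) else None

edges = {("T2", "T1"), ("T1", "T3"), ("T2", "T3")}
print(serial_order(["T1", "T2", "T3"], edges))  # ['T2', 'T1', 'T3']
```

If the graph has a cycle, no transaction in the cycle ever reaches indegree 0, so the function returns None instead of an order.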
View-Serializability
[Figures: three pairs of schedules S and S' over T1, T2, T3, compared for view equivalence. The
pairs involve the operations R(A), W(A), W(B); then W(B), R(B), R(A); then W(A), R(A), W(A)]
The number of possible serial schedules for n transactions is given by the number of permutations of the
transactions: n!
[Figure: schedule over T1, T2, T3 with operations R(A), W(A), W(A), W(A)]
Step 1: Check whether the schedule is conflict-serializable.
Step 2: Find the possible serial schedules -> 3!
Step 3: Choose one possibility and check the view-equivalence conditions (T1->T2->T3)
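Steps 2 and 3 can be sketched as a brute-force search over all n! serial orders. Two schedules are treated as view equivalent when every read sees the same writer (or the initial value) and every item has the same final writer. The assignment of the operations R(A), W(A), W(A), W(A) to T1, T2, T1, T3 is a guess at the slide's layout, and the function names are illustrative.

```python
from itertools import permutations

def view(schedule):
    """Summarise a schedule by what each read sees and who writes last."""
    last_writer = {}  # item -> most recent writer (missing = initial value)
    reads = {}        # txn -> list of (item, writer read from), in program order
    for txn, op, item in schedule:
        if op == "R":
            reads.setdefault(txn, []).append((item, last_writer.get(item)))
        elif op == "W":
            last_writer[item] = txn
    return reads, last_writer

def view_equivalent_serial_orders(schedule):
    txns = sorted({txn for txn, _op, _item in schedule})
    target = view(schedule)
    matches = []
    for order in permutations(txns):  # all n! serial schedules
        serial = [step for t in order for step in schedule if step[0] == t]
        if view(serial) == target:
            matches.append(order)
    return matches

# Assumed layout of the example: T1 reads A, then T2, T1 and T3 write A.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
print(view_equivalent_serial_orders(s))  # [('T1', 'T2', 'T3')]
```

Under that assumed layout the schedule is not conflict-serializable (T1 and T2 form a cycle), yet it is view equivalent to the serial order T1->T2->T3.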
How it helps?
1. Data Consistency: Ensures that data remains accurate and reliable despite
concurrent access.
2. Isolation: Maintains the isolation property of transactions, so the outcome of a
transaction is not affected by other concurrently executing transactions.
3. Serializability: Ensures that the result of concurrent transactions is the same as if
the transactions had been executed serially
Concurrency control mechanisms
Dirty Reads: Occur when a transaction reads data that has been modified by another
transaction but not yet committed. If the writing transaction rolls back, the reading
transaction will have read invalid data. (write-read, WR, conflict)
Phantom Reads: Occurs when a transaction reads a set of rows that satisfy a
condition, but another transaction inserts or deletes rows that affect the set
before the first transaction completes. This results in the first transaction reading
different sets of rows if it re-executes the query.
a. Binary Locks: A simple mechanism where a data item can be either locked (in use) or
unlocked. If a transaction tries to acquire the lock when it's already locked, it must wait
until the lock is released by the transaction currently holding it.
Shared Lock (S-lock): Allows multiple transactions to read a data item simultaneously
but prevents any of them from modifying it. Multiple transactions can hold a shared
lock on the same data item at the same time.
Exclusive Lock (X-lock): Allows a transaction to both read and modify a data item.
When an exclusive lock is held by a transaction, no other transaction can read or
modify the data item.
Note : When a transaction acquires a shared lock on a data item, other transactions can also
acquire shared locks on that same item, enabling concurrent reads. However, no transaction
can acquire an exclusive lock on that item as long as one or more shared locks are held.
When a transaction acquires an exclusive lock on a data item, it has full control over that
item, meaning it can both read and modify it. No other transaction can acquire a lock on the
same data item until the exclusive lock is released.
While shared and exclusive locks are vital for maintaining data integrity and consistency in
concurrent environments, they can introduce significant challenges in terms of performance,
deadlocks, reduced concurrency, and system complexity.
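The shared/exclusive rules amount to a compatibility table. A minimal single-threaded sketch follows; the `LockTable` class and its methods are illustrative names, and a real lock manager would also queue waiting transactions.

```python
# (mode already held, mode requested) -> can both be held at once?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> list of (transaction, mode)

    def try_lock(self, txn, item, mode):
        """Grant the lock only if it is compatible with other holders."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # the caller would have to wait
        holders.append((txn, mode))
        return True

    def unlock(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]
```

Two readers can hold S-locks on the same item together, but an X-lock request is refused until every other holder has unlocked.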
Drawbacks of shared-exclusive locks
Performance issues : Managing locks requires additional CPU and memory resources. The
process of acquiring, releasing, and managing locks can introduce significant overhead
Concurrency issues : Exclusive locks prevent other transactions from accessing locked
data, which can significantly reduce concurrency.
Deadlocks : Shared and exclusive locks can lead to deadlocks, where two or more
transactions hold locks that the other transactions need.
Irrecoverable : If Transaction B reads a value modified by Transaction A after A releases its
lock, and B commits before A later fails, B's committed result is based on a value that gets
rolled back and cannot be undone.
Deadlock : It is a situation where two or more transactions wait for one another to give up
locks.
[Figure: resource-allocation diagram with resource R1 and processes P1 and P2, with edges
labelled "Assigned to" and "Waiting for"]
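A deadlock shows up as a cycle in the wait-for graph. The sketch below assumes each transaction waits for at most one other transaction, so the graph fits in a dictionary; `find_deadlock` is an illustrative name.

```python
def find_deadlock(waits_for):
    """Return the transactions on a wait-for cycle, or None if there is none."""
    for start in waits_for:
        path = [start]
        node = start
        while node in waits_for:
            node = waits_for[node]
            if node in path:
                return path[path.index(node):]  # the cycle itself
            path.append(node)
    return None

# T1 waits for a lock held by T2, and T2 waits for a lock held by T1.
print(find_deadlock({"T1": "T2", "T2": "T1"}))  # ['T1', 'T2']
print(find_deadlock({"T1": "T2", "T2": "T3"}))  # None
```

Once a cycle is found, a DBMS typically aborts one of the transactions on it so that the others can proceed.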
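Two-Phase Locking (2PL) : This protocol ensures serializability by dividing the execution
of a transaction into two distinct phases: a growing phase, in which the transaction acquires
locks and releases none, and a shrinking phase, in which it releases locks and acquires none.
Any schedule in which every transaction follows 2PL is conflict-serializable, which preserves
consistency.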
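The two phases are usually called the growing phase (locks are only acquired) and the shrinking phase (locks are only released). Whether one transaction obeys this can be checked from its sequence of lock actions. A minimal sketch; the ("lock"/"unlock", item) encoding and the `follows_2pl` name are assumptions for illustration.

```python
def follows_2pl(actions):
    """True if the transaction never acquires a lock after releasing one."""
    shrinking = False
    for action, _item in actions:
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False              # acquiring again violates 2PL
    return True

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
```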
Two-Phase Locking (2PL)
Advantages :
1. It guarantees that the schedule of transactions will be serializable, meaning the
results of executing transactions concurrently will be the same as if they were
executed in some serial order.
2. By ensuring that transactions are serializable, 2PL helps maintain data integrity and
consistency, which is critical in environments where data accuracy is essential.
Disadvantages :
1. Deadlocks, starvation and cascading rollbacks
2. Transactions must wait for locks to be released by other transactions. This can lead
to increased waiting times and lower system throughput.
3. In case of a system failure, recovering from a crash can be complex
Strict Two-Phase Locking (Strict 2PL)
Advantages:
Prevents cascading aborts
Ensures serializable and strict (recoverable) schedules
Disadvantages:
Since write locks are held until the end of the transaction, other transactions may be
blocked for extended periods
Transactions may experience longer wait times to acquire locks
Deadlocks and starvation are still possible
Rigorous Two-Phase Locking (Rigorous 2PL)
Advantages:
Since all locks are held until the end of the transaction, the system can easily ensure
that transactions are serializable and can be recovered
Prevents cascading aborts and dirty reads
Disadvantages:
Performance bottlenecks
Increased transaction duration
Deadlocks and starvation are still possible
Conservative Two-Phase Locking (Conservative 2PL) : A transaction requests all the locks it
needs before it starts executing.
If the transaction is unable to acquire all the required locks (because some are already
held by other transactions), it waits and retries. The transaction only starts execution
once it has successfully acquired all the necessary locks.
Since a transaction never starts executing until it has all the locks it needs, deadlocks
cannot occur: no transaction ever holds some locks while waiting for others.
In this scenario, deadlocks cannot occur because neither T1 nor T2 starts execution until
it has all the locks it needs.
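The all-or-nothing acquisition can be sketched as follows. The sketch is single-threaded and uses exclusive locks only; the `lock_owner` dictionary and the function names are illustrative assumptions.

```python
def acquire_all(lock_owner, txn, items):
    """Lock every item for txn, or lock nothing if any item is already taken."""
    if any(lock_owner.get(item, txn) != txn for item in items):
        return False  # the caller waits and retries later, holding no locks
    for item in items:
        lock_owner[item] = txn
    return True

def release_all(lock_owner, txn):
    for item in [i for i, owner in lock_owner.items() if owner == txn]:
        del lock_owner[item]

locks = {}
print(acquire_all(locks, "T1", ["A", "B"]))  # True
print(acquire_all(locks, "T2", ["B", "C"]))  # False, and C stays unlocked
```

Because T2 takes no lock at all when B is unavailable, it never holds C while waiting for B, which is the hold-and-wait situation that a deadlock needs.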
Write Timestamp (W_TS): The largest (most recent) timestamp of any transaction that has
successfully written the data item.
1. Check the following conditions whenever a transaction Ti issues a Read(A) operation:
If W_TS(A) > TS(Ti), then the operation is rejected (roll back Ti).
If W_TS(A) <= TS(Ti), then the operation is executed (set R_TS(A) to
max(R_TS(A), TS(Ti))).
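The read rule translates almost directly into code. In this sketch `w_ts` and `r_ts` are dictionaries from data item to timestamp, and items never accessed before are assumed to have timestamp 0; the names are illustrative.

```python
def read(item, ts, w_ts, r_ts):
    """Apply the timestamp-ordering read rule for a transaction with timestamp ts."""
    if w_ts.get(item, 0) > ts:
        return False  # a younger transaction already wrote the item: roll back
    r_ts[item] = max(r_ts.get(item, 0), ts)
    return True

w_ts = {"A": 10}
r_ts = {"A": 5}
print(read("A", 12, w_ts, r_ts), r_ts["A"])  # True 12
print(read("A", 8, w_ts, r_ts), r_ts["A"])   # False 12
```

The second call is rejected because W_TS(A) = 10 is greater than the reader's timestamp 8, and R_TS(A) is left unchanged.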
System Failure: Occurs when the entire system crashes due to hardware or
software failures, leading to loss of in-memory data.
Media Failure: Occurs when the physical storage (e.g., hard drives) is damaged,
resulting in data loss or corruption.
Database recovery management
Recovery Phases
Analysis Phase: Identifies the point of failure and the transactions that were
active at that time.
Recovery Techniques
Backup and Restore: Regular backups are taken to ensure data can be
restored. Full, incremental, and differential backups are common types.