Transaction Management PDEU April 2023
Concurrency Control
Minal Bhise
Distributed Databases Research Group
DA-IICT, Gandhinagar
Outline
Definition
ACID properties
Transaction states
Schedule: Serial, Interleaved, equivalent
Serializability
Concurrency Control Protocols:
Lock based Protocols: Two Phase Locking, Time stamp based, deadlock detection, prevention,
recovery
Time Stamp based Protocols, Validation based Protocols
Crash Recovery Protocol
Algorithms for Recovery and Isolation Exploiting Semantics (ARIES)
Transaction
• A transaction is a logical unit of work that contains one or more SQL statements. A
transaction is an atomic unit: the effects of all the SQL statements in a
transaction are either all committed (applied to the database) or all rolled
back (undone from the database).
• A user’s program may carry out many operations on the data retrieved from the
database, but the DBMS is only concerned about what data is read/written
from/to the database
• A transaction is the DBMS’s abstract view of a user program: a sequence of reads
and writes
Transaction Concept
• A transaction is a unit of program execution that accesses and
possibly updates various data items.
• E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
If the transaction fails after step 3 and before step 6, money will be
“lost” leading to an inconsistent database state
• Failure could be due to software or hardware
• The system should ensure that updates of a partially executed transaction are
not reflected in the database
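The all-or-nothing requirement above can be sketched in a few lines. This is only an illustration, not how a DBMS implements atomicity: the dictionary `db`, the function name `transfer`, the `fail_after_debit` flag, and the starting balances of 100 are all assumptions made for the example. A copy of the data taken at the start plays the role of the undo information.

```python
def transfer(db, src, dst, amount, fail_after_debit=False):
    """Run the six-step transfer so that it takes effect fully or not at all."""
    snapshot = dict(db)            # copy kept only so partial updates can be undone
    try:
        a = db[src]                # 1. read(A)
        a = a - amount             # 2. A := A - 50
        db[src] = a                # 3. write(A)
        if fail_after_debit:       # simulated failure between steps 3 and 6
            raise RuntimeError("simulated failure")
        b = db[dst]                # 4. read(B)
        b = b + amount             # 5. B := B + 50
        db[dst] = b                # 6. write(B)
        return True                # commit
    except RuntimeError:
        db.clear()
        db.update(snapshot)        # roll back: restore the state before step 1
        return False


accounts = {"A": 100, "B": 100}
ok = transfer(accounts, "A", "B", 50, fail_after_debit=True)
print(ok, accounts)  # False {'A': 100, 'B': 100}
```

Without the rollback in the `except` branch, the failed run would leave A at 50 and B at 100, which is the "lost" money described above.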
Durability
Transaction to transfer $50 from account A to account B. t=0, A=B=100
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the
transaction must persist even if there are software or hardware failures.
Consistency
• The sum of A and B is unchanged by the execution of the transaction
(A+B=200)
• In general, consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign keys
• Implicit integrity constraints
• e.g., sum of balances of all accounts, minus sum of loan amounts must
equal value of cash-in-hand
• A transaction must see a consistent database.
• During transaction execution the database may be temporarily inconsistent.
• When the transaction completes successfully the database must be consistent
• Erroneous transaction logic can lead to inconsistency
Isolation
• If, between steps 3 and 6, another transaction T2 is allowed to access
the partially updated database, it will see an inconsistent database
(the sum A + B will be less than it should be).
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
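The interleaving above can be replayed by hand to see what T2 observes. The sketch below assumes starting balances A = B = 100 (as on the Durability slide) and uses a plain dictionary `db` as a stand-in for the database; both are choices made for illustration.

```python
db = {"A": 100, "B": 100}

# T1, steps 1-3: debit A
a = db["A"]
a = a - 50
db["A"] = a

# T2 runs here, between steps 3 and 6 of T1: read(A), read(B), print(A+B)
observed_sum = db["A"] + db["B"]

# T1, steps 4-6: credit B
b = db["B"]
b = b + 50
db["B"] = b

final_sum = db["A"] + db["B"]
print(observed_sum, final_sum)  # 150 200
```

T2 sees a sum of 150 even though the sum is 200 both before and after T1, which is exactly the inconsistency that isolation is meant to prevent.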
Schedules
• Sequences of instructions that specify the chronological order in
which instructions of concurrent transactions are executed
• A schedule for a set of transactions must consist of all instructions
of those transactions
• Must preserve the order in which the instructions appear in each
individual transaction.
• A transaction that successfully completes its execution will have a
commit instruction as the last statement
• By default, a transaction is assumed to execute a commit instruction as
its last step
• A transaction that fails to successfully complete its execution will have
an abort instruction as the last statement
Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
• At t=0, A=1000, B=2000, so A+B=3000; after the schedule, A=855, B=2145 (A+B is still 3000)
• A serial schedule in which T1 is followed by T2:
Schedule 2
• A serial schedule where T2 is followed by T1
• At t=0, A=1000, B=2000, so A+B=3000; after the schedule, A=850, B=2150 (A+B is still 3000)
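The final balances of the two serial schedules can be checked with a short simulation. The function names `t1`, `t2`, and `run_serial` are made up for this sketch, and integer division is used for the 10% so that the arithmetic stays exact.

```python
def t1(db):
    """T1: transfer 50 from A to B."""
    db["A"] -= 50
    db["B"] += 50


def t2(db):
    """T2: transfer 10% of A's current balance from A to B."""
    temp = db["A"] // 10   # integer arithmetic keeps the example exact
    db["A"] -= temp
    db["B"] += temp


def run_serial(order):
    """Run the transactions one after another on fresh starting balances."""
    db = {"A": 1000, "B": 2000}
    for txn in order:
        txn(db)
    return db


schedule_1 = run_serial([t1, t2])   # T1 followed by T2
schedule_2 = run_serial([t2, t1])   # T2 followed by T1
print(schedule_1)  # {'A': 855, 'B': 2145}
print(schedule_2)  # {'A': 850, 'B': 2150}
```

The two serial orders give different final balances, but both preserve A + B = 3000, so both are consistent outcomes.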
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a
serial schedule, but it is equivalent to Schedule 1
• At t=0, A=1000, B=2000, so A+B=3000; after the schedule, A=855, B=2145 (the same result as Schedule 1)
• read(X) transfers the data item X from the database to a variable, also
called X, in a buffer in main memory belonging to the transaction that
executed the read operation.
• write(X) transfers the value in the variable X in the main-memory
buffer of the transaction that executed the write to the data item X in
the database.
Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and
only if there exists some item Q accessed by both li and lj, and at least
one of these instructions wrote Q
1. li = read(Q), lj = read(Q). li and lj don’t conflict
2. li = read(Q), lj = write(Q). They conflict
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical) temporal order
between them
• If li and lj are consecutive in a schedule and they do not conflict, their
results would remain the same even if they had been interchanged in
the schedule
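The four cases above reduce to a single test. In this sketch the `Op` record and the function name `conflicts` are illustrative choices, not part of any particular DBMS.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str      # transaction issuing the instruction, e.g. "Ti"
    action: str   # "read" or "write"
    item: str     # data item accessed, e.g. "Q"


def conflicts(li, lj):
    """True if li and lj belong to different transactions, access the
    same item, and at least one of them is a write."""
    return (
        li.txn != lj.txn
        and li.item == lj.item
        and "write" in (li.action, lj.action)
    )


print(conflicts(Op("Ti", "read", "Q"), Op("Tj", "read", "Q")))    # False
print(conflicts(Op("Ti", "read", "Q"), Op("Tj", "write", "Q")))   # True
```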
Conflict Serializability
• If a schedule S can be transformed into a schedule S’ by a series of
swaps of non-conflicting instructions, we say that S and S’ are conflict
equivalent.
• We say that a schedule S is conflict serializable if it is conflict
equivalent to a serial schedule
Conflict Serializability
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2
follows T1, by a series of swaps of non-conflicting instructions. Therefore
Schedule 3 is conflict serializable.
(Figure: Schedule 3 and Schedule 6, shown side by side)
Conflict Serializability
• A schedule that is not conflict serializable:
Precedence Graph
• This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of
edges.
• The set of vertices consists of all the transactions participating in the schedule.
The set of edges consists of all edges Ti → Tj for which one of three conditions
holds:
• 1. Ti executes write(Q) before Tj executes read(Q).
• 2. Ti executes read(Q) before Tj executes write(Q).
• 3. Ti executes write(Q) before Tj executes write(Q).
• If an edge Ti →Tj exists in the precedence graph, then, in any serial schedule S′
equivalent to S, Ti must appear before Tj.
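A minimal sketch of the test this definition leads to: build the edges from a schedule, then check for a cycle (a schedule is conflict serializable exactly when its precedence graph is acyclic). The schedule encoding as `(transaction, action, item)` tuples and the function names are assumptions made for the example, and the two sample schedules are invented.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, action, item) tuples in execution order.
    Returns the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            # Edge Ti -> Tj when an earlier action of Ti conflicts with a later one of Tj
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" (on the current path) or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


good = [("T1", "read", "A"), ("T1", "write", "A"),
        ("T2", "read", "A"), ("T2", "write", "A")]
bad = [("T1", "read", "A"), ("T2", "write", "A"),
       ("T2", "write", "B"), ("T1", "read", "B")]

print(has_cycle(precedence_edges(good)))  # False
print(has_cycle(precedence_edges(bad)))   # True
```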
Example
• A schedule that is not conflict serializable:
• One node per transaction; edge from Ti to Tj if an action of Ti precedes and conflicts with one of
Tj’s actions
• The cycle in the graph G = (V, E) reveals the problem: the output of T1 depends on T2, and vice
versa
(Figure: precedence graph with nodes T1 and T2 and a cycle formed by edges labeled A and B)
Locks
• Phase 1: Growing Phase
• Transaction may obtain locks
• Transaction may not release locks
• Phase 2: Shrinking Phase
• Transaction may release locks
• Transaction may not obtain locks
• Require that each transaction locks all its data items before it begins
execution (predeclaration).
• Impose partial ordering of all data items and require that a transaction can
lock data items only in the order specified by the partial order (graph-based
protocol).
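The two-phase rule from the Locks slide (no lock may be obtained once any lock has been released) can be sketched as a small guard object. The class name `TwoPhaseTransaction` and its methods are hypothetical, and the sketch deliberately ignores lock modes and conflicts between transactions; it only enforces the phase discipline of a single transaction.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            # Shrinking phase: the transaction may not obtain locks
            raise RuntimeError(f"{self.name}: cannot lock {item} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True    # growing phase is over


t = TwoPhaseTransaction("T1")
t.lock("A")
t.lock("B")        # growing phase: obtaining locks is allowed
t.unlock("A")      # first release starts the shrinking phase
try:
    t.lock("C")
    violated = False
except RuntimeError:
    violated = True
print(violated)  # True
```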
Deadlock Prevention Strategies
• Following schemes use transaction timestamps for the sake of
deadlock prevention alone.
• TS(T1) < TS (T2)
• wait-die scheme
• older transaction may wait for younger one to release data item. Younger
transactions never wait for older ones; they are rolled back instead.
• a transaction may die several times before acquiring needed data item
• wound-wait scheme
• older transaction wounds (forces rollback) of younger transaction instead of
waiting for it. Younger transactions may wait for older ones.
• may be fewer rollbacks than wait-die scheme
Deadlock Prevention
• Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two
policies are possible:
• Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
• Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits
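Both policies are a one-line decision on the two timestamps. The sketch below assumes that a smaller timestamp means an older transaction and therefore higher priority; the function names and the returned strings are illustrative only.

```python
def wait_die(ts_requester, ts_holder):
    """Ti (requester) wants a lock held by Tj (holder)."""
    # Older requester waits; younger requester dies (is rolled back).
    return "requester waits" if ts_requester < ts_holder else "requester aborts"


def wound_wait(ts_requester, ts_holder):
    """Ti (requester) wants a lock held by Tj (holder)."""
    # Older requester wounds the holder; younger requester waits.
    return "holder aborts" if ts_requester < ts_holder else "requester waits"


# T1 has timestamp 1 (older), T2 has timestamp 2 (younger)
print(wait_die(1, 2))    # requester waits
print(wait_die(2, 1))    # requester aborts
print(wound_wait(1, 2))  # holder aborts
print(wound_wait(2, 1))  # requester waits
```

In both schemes the older transaction is never the one rolled back, so a transaction that keeps its original timestamp across restarts eventually becomes the oldest and cannot be rolled back again.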
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E):
• V is a set of vertices (all the transactions in the system)
• E is a set of edges; each element is an ordered pair Ti → Tj.
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is
waiting for Tj to release a data item.
• When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is
inserted in the wait-for graph. This edge is removed only when Tj is no longer
holding a data item needed by Ti.
• The system is in a deadlock state if and only if the wait-for graph has a cycle.
Must invoke a deadlock-detection algorithm periodically to look for cycles.
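One simple way to look for cycles, sketched here under assumed names: repeatedly remove every transaction that is not waiting for anyone (it can finish and release its items). Whatever remains is on a cycle or blocked behind one, so the system is deadlocked exactly when the remainder is non-empty. The dictionary encoding of the wait-for graph and the function name `find_deadlocked` are choices made for this example.

```python
def find_deadlocked(wait_for):
    """wait_for maps each transaction to the set of transactions it waits for.
    Returns the transactions that can never proceed (empty set: no deadlock)."""
    graph = {t: set(ws) for t, ws in wait_for.items()}
    for ws in wait_for.values():
        for t in ws:
            graph.setdefault(t, set())   # holders that wait for nobody
    changed = True
    while changed:
        changed = False
        for t in list(graph):
            if not graph[t]:             # t waits for nobody, so it can finish
                del graph[t]
                for ws in graph.values():
                    ws.discard(t)        # edges into t disappear
                changed = True
    return set(graph)


cycle = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
chain = {"T1": {"T2"}, "T2": {"T3"}}
print(sorted(find_deadlocked(cycle)))  # ['T1', 'T2', 'T3']
print(find_deadlocked(chain))          # set()
```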
Deadlock Recovery
When deadlock is detected: Selection of victim, rollback, starvation
• Selection of victim:
• Select that transaction as victim that will incur minimum cost
• How long it has computed and how long is still to go?
• Number of data items already used by it
• Number of data items to be used further by it
• Number of transactions involved in rollback
Deadlock Recovery
• Rollback -- determine how far to roll back transaction
Total rollback: Abort the transaction and then restart it.
Partial rollback: More effective to roll back transaction only as far as necessary to break
deadlock.
• Needs to keep additional information such as the state of all running transactions,
the sequence of lock requests/grants, and the updates performed by
transactions
• Detection mechanism should decide which locks the selected
transaction needs to release to break the deadlock
• The selected transaction must be rolled back to the point where it obtained first of
these locks, undoing all actions it took after that
point
• Starvation
• happens if same transaction is always chosen as victim.
• Include the number of rollbacks in the cost factor to avoid starvation
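Victim selection with a starvation guard can be sketched as a cost function that also counts earlier rollbacks. The field names, the weight of 10 per rollback, and the way the factors are added together are all hypothetical; a real system would choose its own cost model.

```python
def rollback_cost(txn):
    """Hypothetical cost: work already done plus items used, plus a
    penalty for each time this transaction was already rolled back."""
    return txn["work_done"] + txn["items_used"] + 10 * txn["rollbacks"]


def choose_victim(candidates):
    """Pick the transaction in the deadlock cycle that is cheapest to roll back."""
    return min(candidates, key=rollback_cost)


t1 = {"name": "T1", "work_done": 5, "items_used": 2, "rollbacks": 0}
t2 = {"name": "T2", "work_done": 3, "items_used": 1, "rollbacks": 0}

first = choose_victim([t1, t2])["name"]
t2["rollbacks"] += 1                      # T2 was rolled back once
second = choose_victim([t1, t2])["name"]
print(first, second)  # T2 T1
```

Because the rollback count raises T2's cost after its first rollback, the same transaction is not chosen again and again.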
Multiple Granularity
• Allow data items to be of various sizes and define a hierarchy of data
granularities, where the small granularities are nested within larger ones.
• Can be represented graphically as a tree (but don't confuse with tree-locking
protocol)
• When a transaction locks a node in the tree explicitly, it implicitly locks all the
node's descendants in the same mode.
• Granularity of locking (level in tree where locking is done):
• fine granularity (lower in tree): high concurrency, high locking overhead
• coarse granularity (higher in tree): low locking overhead, low concurrency
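Implicit locking of descendants can be sketched by walking from a node up to the root and returning the first explicit lock found. The node names in `parent` and the function name `effective_lock` are invented for this example; intention lock modes, which a full multiple-granularity protocol also needs, are left out.

```python
# Hypothetical hierarchy: each node maps to its parent; the root has no entry.
parent = {
    "file1": "db",
    "page1": "file1",
    "page2": "file1",
    "tuple1": "page1",
}


def effective_lock(node, explicit_locks):
    """Return the mode in which `node` is locked, explicitly or through
    an ancestor, or None if neither it nor any ancestor is locked."""
    while node is not None:
        if node in explicit_locks:
            return explicit_locks[node]
        node = parent.get(node)   # move one level up the tree
    return None


locks = {"file1": "X"}            # one explicit lock on the whole file
print(effective_lock("tuple1", locks))  # X
print(effective_lock("db", locks))      # None
```

A single coarse lock on `file1` covers every page and tuple below it with one lock-table entry, which is the low-overhead, low-concurrency end of the trade-off above.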
Granularity
Database contains Files, Files contain Pages, Pages contain Tuples
Example of Granularity Hierarchy
• the oldest active transaction has timestamp > 9, then Q5 will never be
required again