
Ashish Dadhich

DBMS

Unit 4 Concurrency Control


When several transactions execute concurrently in the database, the isolation property may no longer be preserved. To ensure that it is, the system must control the interaction among the concurrent transactions; this control is achieved through one of a variety of mechanisms called concurrency-control schemes.

Lock-Based Protocols
One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item. The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.

Locks
There are various modes in which a data item may be locked. In this section, we restrict our attention to two modes:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q, then Ti can both read and write Q.
We require that every transaction request a lock in an appropriate mode on data item Q, depending on the types of operations that it will perform on Q. The transaction makes the request to the concurrency-control manager, and can proceed with the operation only after the concurrency-control manager grants the lock.

Lock-compatibility matrix comp:

          S      X
    S   true   false
    X   false  false

An element comp(A, B) of the matrix has the value true if and only if mode A is compatible with mode B. To access a data item, transaction Ti must first lock that item. If the data item is already locked by another transaction in an incompatible mode, the concurrency-control manager does not grant the lock; Ti is made to wait until all incompatible locks held by other transactions have been released.
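
To make the compatibility test concrete, the following Python sketch encodes comp as a table and checks a requested mode against the modes currently held by other transactions. The names (COMPAT, can_grant) are illustrative, not part of any real lock manager:

    # Minimal sketch of the lock-compatibility test; COMPAT encodes the matrix above.
    COMPAT = {
        ("S", "S"): True,
        ("S", "X"): False,
        ("X", "S"): False,
        ("X", "X"): False,
    }

    def comp(a, b):
        # True iff a lock in mode a is compatible with a held lock in mode b.
        return COMPAT[(a, b)]

    def can_grant(requested, held_modes):
        # The manager grants the lock only if the requested mode is compatible
        # with every lock currently held by other transactions.
        return all(comp(requested, h) for h in held_modes)

    print(can_grant("S", ["S", "S"]))  # True: shared locks coexist
    print(can_grant("X", ["S"]))       # False: Ti must wait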

Transaction Ti may unlock a data item that it had locked at some earlier point. Note that a transaction must hold a lock on a data item as long as it accesses that item. Moreover, unlocking a data item immediately after its final access is not always desirable, since serializability may not be ensured.


Schedule 1 (figure omitted: two transactions each wait for a lock the other holds)

Unfortunately, locking can lead to an undesirable situation: a state in which neither transaction can ever proceed with its normal execution. This situation is called deadlock. When deadlock occurs, the system must roll back one of the two transactions. Once a transaction has been rolled back, the data items that were locked by that transaction are unlocked and become available to the other transaction, which can continue with its execution. If we do not use locking, or if we unlock data items as soon as possible after reading or writing them, we may get inconsistent states. On the other hand, if we do not unlock a data item before requesting a lock on another data item, deadlocks may occur. We shall require that each transaction in the system follow a set of rules, called a locking protocol, indicating when a transaction may lock and unlock each of the data items. Locking protocols restrict the set of possible schedules: the set of all schedules legal under a given protocol is a proper subset of all possible schedules.

Two-Phase Locking Protocol


This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. Note that the unlock instructions do not need to appear at the end of the transaction. The point in the schedule where the transaction has obtained its final lock (the end of its growing phase) is called the lock point of the transaction.
Two-phase locking does not ensure freedom from deadlock, and cascading rollback may occur under it. Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking protocol. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken by a transaction be held until that transaction commits. This requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other transaction from reading the data. Another variant of two-phase locking is the rigorous two-phase locking protocol, which requires that all locks be held until the transaction commits. With rigorous two-phase locking, transactions can be serialized in the order in which they commit. Most database systems implement either strict or rigorous two-phase locking.
There is also a refinement of the basic two-phase locking protocol in which lock conversions are allowed: a shared lock may be upgraded to an exclusive lock, and an exclusive lock may be downgraded to a shared lock. We denote conversion from shared to exclusive mode by upgrade, and from exclusive to shared by downgrade. Lock conversion cannot be allowed arbitrarily: upgrading can take place only in the growing phase, whereas downgrading can take place only in the shrinking phase.
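
As a rough illustration, the following Python sketch (the class and method names are hypothetical) enforces only the two-phase discipline: once a transaction releases any lock, further lock requests are rejected.

    class TwoPhaseTransaction:
        # Hypothetical wrapper that enforces only the two-phase rule.
        def __init__(self):
            self.shrinking = False        # starts in the growing phase
            self.held = set()

        def lock(self, item, mode):
            if self.shrinking:
                raise RuntimeError("2PL violation: no new locks after first unlock")
            self.held.add((item, mode))   # growing phase: acquire as needed

        def unlock(self, item, mode):
            self.held.discard((item, mode))
            self.shrinking = True         # the first release ends the growing phase

    t = TwoPhaseTransaction()
    t.lock("A", "X")
    t.lock("B", "S")    # still growing; the lock point is reached here
    t.unlock("B", "S")  # shrinking phase begins
    # t.lock("C", "S") would now raise: the protocol forbids new locks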

Timestamp-Based Protocols
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.
To implement this scheme, we associate with each data item Q two timestamp values:
W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed. The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) >= W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
If a transaction Ti is rolled back by the concurrency-control scheme as a result of issuing either a read or a write operation, the system assigns it a new timestamp and restarts it. There are schedules that are possible under the two-phase locking protocol but not under the timestamp protocol, and vice versa. The timestamp-ordering protocol ensures conflict serializability, because conflicting operations are processed in timestamp order. The protocol also ensures freedom from deadlock, since no transaction ever waits.
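
The read and write tests above translate directly into code. In the following Python sketch (the function names and dictionary-based timestamp tables are illustrative), returning False stands for the protocol rejecting the operation and rolling Ti back:

    W_ts, R_ts = {}, {}   # W-timestamp(Q) and R-timestamp(Q), defaulting to 0

    def read(ts_ti, q):
        if ts_ti < W_ts.get(q, 0):
            return False                       # rule 1a: reject; Ti is rolled back
        R_ts[q] = max(R_ts.get(q, 0), ts_ti)   # rule 1b: execute, update R-timestamp
        return True

    def write(ts_ti, q):
        if ts_ti < R_ts.get(q, 0):
            return False                       # rule 2a: a later read needed the old value
        if ts_ti < W_ts.get(q, 0):
            return False                       # rule 2b: obsolete write
        W_ts[q] = ts_ti                        # rule 2c: execute the write
        return True

    print(write(5, "Q"), read(3, "Q"))  # True False: T3 must not see T5's write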

Validation-Based Protocols
A concurrency-control scheme imposes overhead of code execution and possible delay of transactions. It may be better to use an alternative scheme that imposes less overhead. A difficulty in reducing the overhead is that we do not know in advance which transactions will be involved in a conflict. To gain that knowledge, we need a scheme for monitoring the system.
We assume that each transaction Ti executes in two or three different phases in its lifetime, depending on whether it is a read-only or an update transaction. The phases are, in order:
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items and stores them in variables local to Ti. It performs all write operations on temporary local variables, without updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables that hold the results of write operations without causing a violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the actual updates to the database. Otherwise, the system rolls back Ti.
Each transaction must go through the phases in the order shown. However, the phases of concurrently executing transactions can be interleaved. To perform the validation test, we need to know when the various phases of transaction Ti took place. We shall, therefore, associate three different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
We determine the serializability order by the timestamp-ordering technique, using the value of the timestamp Validation(Ti). Thus, TS(Ti) = Validation(Ti) and, if TS(Tj) < TS(Tk), then any produced schedule must be equivalent to a serial schedule in which transaction Tj appears before transaction Tk. The reason we choose Validation(Ti), rather than Start(Ti), as the timestamp of transaction Ti is that we can expect faster response time, provided that conflict rates among transactions are indeed low.
The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj), one of the following two conditions must hold:
1. Finish(Ti) < Start(Tj). Since Ti completes its execution before Tj starts, the serializability order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its write phase before Tj starts its validation phase (Start(Tj) < Finish(Ti) < Validation(Tj)). This condition ensures that the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect the read of Tj, and since Tj cannot affect the read of Ti, the serializability order is indeed maintained.
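
The validation test itself is a small predicate. The Python sketch below checks the two conditions for a pair Ti, Tj with TS(Ti) < TS(Tj); the Txn record and its field names are assumptions for illustration, not from the text:

    from dataclasses import dataclass, field

    @dataclass
    class Txn:
        start: int        # Start(Ti)
        validation: int   # Validation(Ti); also used as TS(Ti)
        finish: int       # Finish(Ti)
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)

    def validate(ti, tj):
        # Does Tj pass validation against an earlier Ti (TS(Ti) < TS(Tj))?
        if ti.finish < tj.start:                  # condition 1: Ti finished first
            return True
        if not (ti.write_set & tj.read_set) and ti.finish < tj.validation:
            return True                           # condition 2: disjoint, writes done
        return False

    ti = Txn(start=1, validation=2, finish=3, write_set={"A"})
    tj = Txn(start=2, validation=4, finish=5, read_set={"B"})
    print(validate(ti, tj))  # True: Ti's write set is disjoint from Tj's read set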

Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. None of the transactions can make progress in such a situation. The only remedy to this undesirable situation is for the system to invoke some drastic action, such as rolling back some of the transactions involved in the deadlock. Rollback of a transaction may be partial: That is, a transaction may be rolled back to the point where it obtained a lock whose release resolves the deadlock. There are two principal methods for dealing with the deadlock problem. We can use a deadlock prevention protocol to ensure that the system will never enter a deadlock state. Alternatively, we can allow the system to enter a deadlock state, and then try to recover by using a deadlock detection and deadlock recovery scheme. As we shall see, both methods may result in transaction rollback. Prevention is commonly used if the probability that the system would enter a deadlock state is relatively high; otherwise, detection and recovery are more efficient.

Deadlock Prevention
There are two approaches to deadlock prevention. One approach ensures that no cyclic waits can occur by ordering the requests for locks, or by requiring all locks to be acquired together. The other approach is closer to deadlock recovery, and performs transaction rollback instead of waiting for a lock, whenever the wait could potentially result in a deadlock. The simplest scheme under the first approach requires that each transaction lock all its data items before it begins execution; moreover, either all are locked in one step or none are locked. There are two main disadvantages to this protocol: (1) it is often hard to predict, before the transaction begins, what data items need to be locked; (2) data-item utilization may be very low, since many of the data items may be locked but unused for a long time. Another approach for preventing deadlocks is to impose an ordering of all data items, and to require that a transaction lock data items only in a sequence consistent with the ordering. We have seen one such scheme in the tree protocol, which uses a partial ordering of data items. The second approach for preventing deadlocks is to use preemption and transaction rollbacks. In preemption, when a transaction T2 requests a lock that transaction T1 holds, the lock granted to T1 may be preempted by rolling T1 back and granting the lock to T2. To control the preemption, we assign a unique timestamp to each transaction. The system uses these timestamps only to decide whether a transaction should wait or roll back; locking is still used for concurrency control. If a transaction is rolled back, it retains its old timestamp when restarted. Two different deadlock prevention schemes using timestamps have been proposed:

1. The wait-die scheme is a non-preemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is rolled back (dies).
2. The wound-wait scheme is a preemptive technique and a counterpart to the wait-die scheme. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
In simpler words, taking the older transaction as the higher-priority one:
1. Wait-die: if Ti has higher priority, it is allowed to wait; otherwise it is aborted.
2. Wound-wait: if Ti has higher priority, abort Tj; otherwise Ti waits.
The major problem with both of these schemes is that unnecessary rollbacks may occur.
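
The two decisions can be stated in a few lines. In the following sketch, smaller timestamps mean older transactions; the function names are illustrative:

    def wait_die(ts_ti, ts_tj):
        # Ti requests an item held by Tj; older requesters may wait.
        return "Ti waits" if ts_ti < ts_tj else "Ti dies (rolled back)"

    def wound_wait(ts_ti, ts_tj):
        # Older requesters preempt (wound) the younger holder.
        return "Tj wounded (rolled back)" if ts_ti < ts_tj else "Ti waits"

    print(wait_die(1, 2), "|", wound_wait(1, 2))  # older T1: waits | wounds T2
    print(wait_die(2, 1), "|", wound_wait(2, 1))  # younger T2: dies | waits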

Deadlock Detection and Recovery


If a system does not employ some protocol that ensures deadlock freedom, then a detection and recovery scheme must be used. An algorithm that examines the state of the system is invoked periodically to determine whether a deadlock has occurred. If one has, then the system must attempt to recover from the deadlock. To do so, the system must:
Maintain information about the current allocation of data items to transactions, as well as any outstanding data-item requests.
Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.
Recover from the deadlock when the detection algorithm determines that a deadlock exists.

Deadlock Detection
Deadlocks can be described precisely in terms of a directed graph called a wait-for graph. This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions in the system. Each element in the set E of edges is an ordered pair Ti -> Tj. If Ti -> Tj is in E, then there is a directed edge from transaction Ti to Tj, implying that transaction Ti is waiting for transaction Tj to release a data item that it needs. When transaction Ti requests a data item currently being held by transaction Tj, the edge Ti -> Tj is inserted into the wait-for graph. This edge is removed only when transaction Tj is no longer holding a data item needed by transaction Ti. A deadlock exists in the system if and only if the wait-for graph contains a cycle. Each transaction involved in the cycle is said to be deadlocked. To detect deadlocks, the system needs to maintain the wait-for graph, and periodically to invoke an algorithm that searches for a cycle in the graph.
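
A cycle search over the wait-for graph is a standard depth-first traversal. The sketch below is illustrative, with the graph given as a dictionary mapping each transaction to the transactions it waits for:

    def has_cycle(edges):
        WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / finished
        color = {}

        def dfs(t):
            color[t] = GRAY
            for u in edges.get(t, []):
                if color.get(u, WHITE) == GRAY:   # back edge: cycle, hence deadlock
                    return True
                if color.get(u, WHITE) == WHITE and dfs(u):
                    return True
            color[t] = BLACK
            return False

        return any(color.get(t, WHITE) == WHITE and dfs(t) for t in edges)

    # T1 waits for T2 and T2 waits for T1: the graph has a cycle, so deadlock.
    print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True
    print(has_cycle({"T1": ["T2"], "T2": []}))      # False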

Recovery from Deadlock


When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The most common solution is to roll back one or more transactions to break the deadlock. Three actions need to be taken:

1. Selection of a victim: we must determine which transaction (or transactions) to roll back to break the deadlock. We should roll back those transactions that will incur the minimum cost.
2. Rollback: the simplest solution is a total rollback: abort the transaction and then restart it. However, it is more effective to roll back the transaction only as far as necessary to break the deadlock. Such partial rollback requires the system to maintain additional information about the state of all the running transactions.
3. Starvation: in a system where the selection of victims is based primarily on cost factors, it may happen that the same transaction is always picked as a victim. As a result, this transaction never completes its designated task, and starvation occurs. We must ensure that a transaction can be picked as a victim only a (small) finite number of times. The most common solution is to include the number of rollbacks in the cost factor.


Database Failure and Recovery


A computer system, like any other device, is subject to failure from a variety of causes: disk crash, power outage, software error, a fire in the machine room, even sabotage. In any failure, information may be lost. Therefore, the database system must take actions in advance to ensure that the atomicity and durability properties of transactions are preserved. An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure. The recovery scheme must also provide high availability; that is, it must minimize the time for which the database is not usable after a crash.

Failure Classification
There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner. The simplest type of failure is one that does not result in the loss of information in the system. The failures that are more difficult to deal with are those that result in loss of information. The types of failure are:
1. Transaction failure. There are two types of errors that may cause a transaction to fail:
Logical error: the transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
System error: the system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be re-executed at a later time.
2. System crash. There is a hardware malfunction, or a bug in the database software or the operating system, that causes the loss of the content of volatile storage and brings transaction processing to a halt. The content of nonvolatile storage remains intact and is not corrupted. The assumption that hardware errors and software bugs bring the system to a halt, but do not corrupt the nonvolatile storage contents, is known as the fail-stop assumption. Well-designed systems have numerous internal checks, at the hardware and the software level, that bring the system to a halt when there is an error. Hence, the fail-stop assumption is a reasonable one.
3. Disk failure. A disk block loses its content as a result of either a head crash or failure during a data-transfer operation. Copies of the data on other disks, or archival backups on tertiary media such as tapes, are used to recover from the failure.
Algorithms to ensure database consistency and transaction atomicity despite failures, known as recovery algorithms, have two parts:
1. Actions taken during normal transaction processing to ensure that enough information exists to allow recovery from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures database consistency, transaction atomicity, and durability.


Recovery Schemes
Log-Based Recovery
The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records. An update log record describes a single database write. It has these fields:
Transaction identifier: the unique identifier of the transaction that performed the write operation.
Data-item identifier: the unique identifier of the data item written. Typically, it is the location on disk of the data item.
Old value: the value of the data item prior to the write.
New value: the value that the data item will have after the write.
Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. We denote the various types of log records as:
<Ti start>: transaction Ti has started.
<Ti, Xj, V1, V2>: transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the write.
<Ti commit>: transaction Ti has committed.
<Ti abort>: transaction Ti has aborted.
Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we can output the modification to the database if that is desirable. We also have the ability to undo a modification that has already been output to the database, by using the old-value field in log records.
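
For the recovery sketches that follow, it is convenient to model log records as plain Python tuples mirroring the record types above; this tuple format is an assumption for illustration, not a real logging API:

    # Log records modeled as tuples, one shape per record type above.
    def start_rec(ti):              return ("start", ti)
    def update_rec(ti, xj, v1, v2): return ("update", ti, xj, v1, v2)  # old V1, new V2
    def commit_rec(ti):             return ("commit", ti)
    def abort_rec(ti):              return ("abort", ti)

    # Example: T0 deducts 50 from A; the record is written before the database.
    log = [start_rec("T0"), update_rec("T0", "A", 1000, 950), commit_rec("T0")]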

Deferred Database Modification


The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits.

When a transaction partially commits, the information on the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored.

The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log. When transaction Ti partially commits, the records associated with it in the log are used in executing the deferred writes. Since a failure may occur while this updating is taking place, we must ensure that, before the start of these updates, all the log records are written out to stable storage. Once they have been written, the actual updating takes place, and the transaction enters the committed state. Observe that only the new value of the data item is required by the deferred-modification technique; thus, we can simplify the general update-log record that we saw in the previous section by omitting the old-value field.

Using the log, the system can handle any failure that results in the loss of information on volatile storage. The recovery scheme uses the following recovery procedure: redo(Ti) sets the value of all data items updated by transaction Ti to the new values. The set of data items updated by Ti and their respective new values can be found in the log. The redo operation must be idempotent; that is, executing it several times must be equivalent to executing it once. This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the recovery process. After a failure, the recovery subsystem consults the log to determine which transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after a transaction completes its execution, the recovery scheme uses the information in the log to restore the system to a consistent state in which the transaction's updates are reflected.
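
A minimal sketch of this recovery procedure, using the tuple-shaped log records modeled earlier (illustrative, not a real recovery implementation):

    def recover_deferred(log, db):
        # Redo exactly the transactions with both <start> and <commit> in the log.
        committed = {r[1] for r in log if r[0] == "commit"}
        for rec in log:                    # forward scan: redo in log order
            if rec[0] == "update" and rec[1] in committed:
                db[rec[2]] = rec[4]        # redo writes the new value; idempotent

    db = {"A": 1000}
    recover_deferred([("start", "T0"), ("update", "T0", "A", 1000, 950),
                      ("commit", "T0")], db)
    print(db)  # {'A': 950}: T0 committed, so its deferred write is redone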

Immediate Database Modification


The immediate-modification technique allows database modifications to be output to the database while the transaction is still in the active state. Data modifications written by active transactions are called uncommitted modifications. In the event of a crash or a transaction failure, the system must use the old-value field of the log records to restore the modified data items to the value they had prior to the start of the transaction. The undo operation, described next, accomplishes this restoration.

Before a transaction Ti starts its execution, the system writes the record <Ti start> to the log. During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update record to the log. When Ti partially commits, the system writes the record <Ti commit> to the log. Since the information in the log is used in reconstructing the state of the database, we cannot allow the actual update to the database to take place before the corresponding log record is written out to stable storage. We therefore require that, before execution of an output(B) operation, the log records corresponding to B be written onto stable storage.

Using the log, the system can handle any failure that does not result in the loss of information in nonvolatile storage. The recovery scheme uses two recovery procedures:
undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective old and new values can be found in the log. The undo and redo operations must be idempotent to guarantee correct behavior even if a failure occurs during the recovery process. After a failure has occurred, the recovery scheme consults the log to determine which transactions need to be redone, and which need to be undone:
Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.
Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.
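
Continuing with the same tuple-shaped log records, here is a minimal sketch of recovery under immediate modification: a backward pass undoes incomplete transactions, then a forward pass redoes committed ones.

    def recover_immediate(log, db):
        started   = {r[1] for r in log if r[0] == "start"}
        committed = {r[1] for r in log if r[0] == "commit"}
        # Undo pass: backward scan, restore old values of uncommitted transactions.
        for rec in reversed(log):
            if rec[0] == "update" and rec[1] in started - committed:
                db[rec[2]] = rec[3]        # old value V1
        # Redo pass: forward scan, reapply new values of committed transactions.
        for rec in log:
            if rec[0] == "update" and rec[1] in committed:
                db[rec[2]] = rec[4]        # new value V2

    db = {"A": 950, "B": 2050}   # crash left T0 committed, T1 incomplete
    log = [("start", "T0"), ("update", "T0", "A", 1000, 950), ("commit", "T0"),
           ("start", "T1"), ("update", "T1", "B", 2000, 2050)]
    recover_immediate(log, db)
    print(db)  # {'A': 950, 'B': 2000}: T1 undone, T0 redone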


Checkpoints
The system periodically performs checkpoints, which require the following sequence of actions to take place:
1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.
Transactions are not allowed to perform any update actions, such as writing to a buffer block or writing a log record, while a checkpoint is in progress. The presence of a <checkpoint> record in the log allows the system to streamline its recovery procedure.

Shadow Paging Recovery


An alternative to log-based crash-recovery techniques is shadow paging. The shadow-paging technique is essentially an improvement on the shadow-copy technique. Under certain circumstances, shadow paging may require fewer disk accesses than do the log-based methods discussed previously. There are, however, disadvantages to the shadow-paging approach, as we shall see, that limit its use. For example, it is hard to extend shadow paging to allow multiple transactions to execute concurrently.


As before, the database is partitioned into some number of fixed-length blocks, which are referred to as pages. Assume that there are n pages, numbered 1 through n. These pages do not need to be stored in any particular order on disk. However, there must be a way to find the ith page of the database for any given i. We use a page table for this purpose. The page table has n entries, one for each database page. Each entry contains a pointer to a page on disk. The first entry contains a pointer to the first page of the database, the second entry points to the second page, and so on. The logical order of database pages does not need to correspond to the physical order in which the pages are placed on disk.

The key idea behind the shadow-paging technique is to maintain two page tables during the life of a transaction: the current page table and the shadow page table. When the transaction starts, both page tables are identical. The shadow page table is never changed over the duration of the transaction. The current page table may be changed when a transaction performs a write operation. All input and output operations use the current page table to locate database pages on disk.

Suppose that the transaction Tj performs a write(X) operation, and that X resides on the ith page. The system executes the write operation as follows:
1. If the ith page (that is, the page on which X resides) is not already in main memory, then the system issues input(X).
2. If this is the first write performed on the ith page by this transaction, then the system modifies the current page table as follows:
a. It finds an unused page on disk. Usually, the database system has access to a list of unused (free) pages.
b. It deletes the page found in step 2a from the list of free page frames; it copies the contents of the ith page to the page found in step 2a.
c. It modifies the current page table so that the ith entry points to the page found in step 2a.
3. It writes the new value of X into the buffer page.

Intuitively, the shadow-page approach to recovery is to store the shadow page table in nonvolatile storage, so that the state of the database prior to the execution of the transaction can be recovered in the event of a crash or transaction abort. When the transaction commits, the system writes the current page table to nonvolatile storage. The current page table then becomes the new shadow page table, and the next transaction is allowed to begin execution. It is important that the shadow page table be stored in nonvolatile storage, since it provides the only means of locating database pages. The current page table may be kept in main memory (volatile storage). We do not care whether the current page table is lost in a crash, since the system recovers by using the shadow page table.
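
A minimal sketch of the write steps, with dictionaries standing in for the page tables, the disk, and the free list. Detecting the first write by comparing the current and shadow entries is a simplification assumed here:

    def shadow_write(current, shadow, disk, free_pages, i, new_contents):
        # First write to page i by this transaction: repoint the current table.
        if current[i] == shadow[i]:
            fresh = free_pages.pop()          # step 2a/2b: take a free disk page
            disk[fresh] = disk[current[i]]    # copy the old contents of page i
            current[i] = fresh                # step 2c: current table now differs
        disk[current[i]] = new_contents       # step 3: write via the current table
        # The shadow table still points at the unmodified original page.

    disk = {10: "old A"}
    free = [11]
    current, shadow = {1: 10}, {1: 10}
    shadow_write(current, shadow, disk, free, 1, "new A")
    print(shadow[1], disk[shadow[1]])    # 10 old A: a crash recovers the old state
    print(current[1], disk[current[1]])  # 11 new A: visible once the txn commits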


Recovery with Concurrent Transactions


Regardless of the number of concurrent transactions, the system has a single disk buffer and a single log. All transactions share the buffer blocks. We allow immediate modification, and permit a buffer block to have data items updated by one or more transactions.

Interaction with Concurrency Control


The recovery scheme depends greatly on the concurrency-control scheme that is used. To roll back a failed transaction, we must undo the updates performed by the transaction. Suppose that a transaction T0 has to be rolled back, and a data item Q that was updated by T0 has to be restored to its old value. Using the log-based schemes for recovery, we restore the value by using the undo information in a log record. Suppose now that a second transaction T1 has performed yet another update on Q before T0 is rolled back. Then, the update performed by T1 will be lost if T0 is rolled back. Therefore, we require that, if a transaction T has updated a data item Q, no other transaction may update the same data item until T has committed or been rolled back. We can ensure this requirement easily by using strict two-phase locking, that is, two-phase locking with exclusive locks held until the end of the transaction.

Transaction Rollback
We roll back a failed transaction, Ti, by using the log. The system scans the log backward; for every log record of the form <Ti, Xj, V1, V2> found in the log, the system restores the data item Xj to its old value V1. Scanning of the log terminates when the log record <Ti start> is found. Scanning the log backward is important, since a transaction may have updated a data item more than once. As an illustration, consider the pair of log records
<Ti, A, 10, 20>
<Ti, A, 20, 30>
The log records represent a modification of data item A by Ti, followed by another modification of A by Ti. Scanning the log backward sets A correctly to 10. If the log were scanned in the forward direction, A would be set to 20, which is incorrect. If strict two-phase locking is used for concurrency control, locks held by a transaction T may be released only after the transaction has been rolled back as described.
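
A sketch of this backward scan over the tuple-shaped log records used earlier (illustrative), reproducing the example above:

    def rollback(ti, log, db):
        # Backward scan: undo the most recent write first.
        for rec in reversed(log):
            if rec[0] == "start" and rec[1] == ti:
                break                      # stop at <Ti start>
            if rec[0] == "update" and rec[1] == ti:
                db[rec[2]] = rec[3]        # restore the old value V1

    db = {"A": 30}
    rollback("Ti", [("start", "Ti"), ("update", "Ti", "A", 10, 20),
                    ("update", "Ti", "A", 20, 30)], db)
    print(db)  # {'A': 10}: the backward scan sets A correctly to 10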

Checkpoints
We used checkpoints to reduce the number of log records that the system must scan when it recovers from a crash. Since we assumed no concurrency, it was necessary to consider only the following transactions during recovery:
Those transactions that started after the most recent checkpoint.
The one transaction, if any, that was active at the time of the most recent checkpoint.
The situation is more complex when transactions can execute concurrently, since several transactions may have been active at the time of the most recent checkpoint. In a concurrent transaction-processing system, we require that the checkpoint log record be of the form <checkpoint L>, where L is a list of transactions active at the time of the checkpoint. Again, we assume that transactions do not perform updates either on the buffer blocks or on the log while the checkpoint is in progress.


Restart Recovery
When the system recovers from a crash, it constructs two lists: the undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone. The system constructs the two lists as follows. Initially, they are both empty. The system scans the log backward, examining each record, until it finds the first <checkpoint L> record:
For each record found of the form <Ti commit>, it adds Ti to redo-list.
For each record found of the form <Ti start>, if Ti is not in redo-list, then it adds Ti to undo-list.
When the system has examined all the appropriate log records, it checks the list L in the checkpoint record. For each transaction Ti in L, if Ti is not in redo-list, then it adds Ti to undo-list.
Once the redo-list and undo-list have been constructed, the recovery proceeds as follows:
1. The system rescans the log from the most recent record backward, and performs an undo for each log record that belongs to a transaction Ti on the undo-list. Log records of transactions on the redo-list are ignored in this phase. The scan stops when the <Ti start> records have been found for every transaction Ti in the undo-list.
2. The system locates the most recent <checkpoint L> record on the log. Notice that this step may involve scanning the log forward, if the checkpoint record was passed in step 1.
3. The system scans the log forward from the most recent <checkpoint L> record, and performs redo for each log record that belongs to a transaction Ti on the redo-list. It ignores log records of transactions on the undo-list in this phase.
It is important in step 1 to process the log backward, to ensure that the resulting state of the database is correct. After the system has undone all transactions on the undo-list, it redoes those transactions on the redo-list; it is important, in this case, to process the log forward. It is also important to undo the transactions in the undo-list before redoing transactions in the redo-list, using the algorithm in steps 1 to 3; otherwise, a problem may occur. When the recovery process has completed, transaction processing resumes.
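
A sketch of the list-construction step, again over tuple-shaped log records, with the checkpoint record carrying its list L (illustrative):

    def build_lists(log):
        redo, undo = [], []
        # Backward scan until the most recent <checkpoint L> record.
        for rec in reversed(log):
            if rec[0] == "checkpoint":
                for ti in rec[1]:                     # L: active at checkpoint time
                    if ti not in redo and ti not in undo:
                        undo.append(ti)
                break
            if rec[0] == "commit" and rec[1] not in redo:
                redo.append(rec[1])
            if rec[0] == "start" and rec[1] not in redo:
                undo.append(rec[1])
        return redo, undo

    log = [("checkpoint", ["T0"]), ("start", "T1"), ("commit", "T1"), ("start", "T2")]
    print(build_lists(log))  # (['T1'], ['T2', 'T0']): redo T1, undo T2 and T0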
