
Transactions Management and
Concurrency

Transaction concept, Transaction state diagram, ACID properties,
Concurrent Execution, Serializability: Conflict and View Serializability
Concurrency Control: Lock-based, timestamp-based protocols.
Recovery System: Failure Classification, Log-based recovery, ARIES,
Checkpoint, Shadow Paging.
Deadlock Handling: Avoidance, Prevention, Wait-die
Transaction concept
• Database Transaction is an atomic unit that
contains one or more SQL statements.
• It is a series of operations that performs as a
single unit of work against a database.
• It is a logical unit of work.
• It has a beginning and an end to specify its
boundary.
• Let's take a simple example of a bank
transaction. Suppose a bank clerk transfers Rs.
1000 from X's account to Y's account.

• X's Account
• open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)
• Rs. 1000 is debited from X's account, the new
(current) balance is saved, and the final step
of the transaction closes the account.
• Y's Account
• open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)
• Rs. 1000 is credited to Y's account, the new
(current) balance is saved, and the final step
of the transaction closes the account.

• The above example is a very simple and small
transaction that shows how transaction
management actually works.
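The transfer above can be written as one atomic unit of work. Below is a
minimal sketch using Python's built-in sqlite3 module; the accounts
table, its starting balances and the account names are hypothetical,
chosen only to mirror the example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("X", 5000), ("Y", 2000)])
    conn.commit()

    try:
        # Both updates form one logical unit of work with a clear
        # beginning (the first statement) and end (the COMMIT).
        conn.execute("UPDATE accounts SET balance = balance - 1000 WHERE name = 'X'")
        conn.execute("UPDATE accounts SET balance = balance + 1000 WHERE name = 'Y'")
        conn.commit()        # make both changes permanent together
    except sqlite3.Error:
        conn.rollback()      # on any failure, neither change is applied

Either both accounts change or neither does, which is exactly the
boundary that the open and close steps above are meant to mark.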
Transaction States
• A transaction is a small unit of a program which
contains several low-level tasks.
• It is an event which occurs on the database.
• It has the following states:

1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed
• 1. Active : Active is the initial state of every
transaction. The transaction stays in Active state
during execution.
2. Partially Committed : Partially committed
state defines that the transaction has executed
the final statement.
3. Failed : Failed state means that normal
execution of the transaction can proceed no
further.
4. Aborted : Aborted state means that the
transaction has been rolled back and the database
has been restored to its state before the
transaction started.
5. Committed : If the transaction has completed
its execution successfully, then it is said to be
committed.
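The five states and the legal moves between them can be summarized as a
small transition table. The sketch below is illustrative only; it is
not any particular DBMS's API.

    # Legal moves in the transaction state diagram (illustrative).
    ALLOWED = {
        "ACTIVE":              {"PARTIALLY_COMMITTED", "FAILED"},
        "PARTIALLY_COMMITTED": {"COMMITTED", "FAILED"},
        "FAILED":              {"ABORTED"},
        "ABORTED":             set(),    # terminal state
        "COMMITTED":           set(),    # terminal state
    }

    def transition(state, new_state):
        if new_state not in ALLOWED[state]:
            raise ValueError(f"illegal transition {state} -> {new_state}")
        return new_state

    state = "ACTIVE"
    state = transition(state, "PARTIALLY_COMMITTED")
    state = transition(state, "COMMITTED")   # successful end of transaction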
Transaction Properties
• Following are the transaction properties,
referred to by the acronym ACID:

1. Atomicity
2. Consistency
3. Isolation
4. Durability

ACID properties are the most important concepts of database theory.
• A transaction is a small unit of a program which contains several
low-level tasks.
• These properties guarantee that the database transactions are
processed reliably.
1. Atomicity

• Atomicity means that either all operations of a
transaction are executed or none are.
• Atomicity is also known as 'All or Nothing': either
all the operations are performed or none of them
at all.
• It is maintained in the presence of deadlocks, CPU
failures, disk failures, and database or application
software failures.
• It can be turned off at system level and session
level.
Example: Atomicity
• Business Transactions
• A business transaction might involve confirming
a shipping address, charging the customer and
creating an order. If one of these steps fails, all
should fail.
• File Systems
• A file operation such as cut-and-paste whereby
the source file isn't deleted unless it is
successfully pasted.
• Suppose Account A has a balance of 400$ & B has
700$. Account A is transferring 100$ to Account B.
This is a transaction that has two operations,
a) Debiting 100$ from A’s balance
b) Crediting 100$ to B’s balance.
Suppose the first operation succeeds while the
second fails. In that case A’s balance would be
300$ while B would still have 700$ instead of
800$. This is unacceptable in a banking system.
Either the transaction should fail without
executing any of the operations, or it should
process both of them. The Atomicity property
ensures exactly that.
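A minimal sketch of this all-or-nothing behavior, with the balances
kept in a plain Python dict and a failure injected between the two
operations; the before-image snapshot plays the role of the undo
information.

    balances = {"A": 400, "B": 700}

    def transfer(amount, fail_after_debit=False):
        snapshot = dict(balances)       # before-image used to undo
        try:
            balances["A"] -= amount     # operation (a): debit A
            if fail_after_debit:
                raise RuntimeError("crash between the two operations")
            balances["B"] += amount     # operation (b): credit B
        except Exception:
            balances.update(snapshot)   # atomicity: undo the partial work
            raise

    try:
        transfer(100, fail_after_debit=True)
    except RuntimeError:
        pass
    assert balances == {"A": 400, "B": 700}   # neither operation took effect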
2. Consistency
• Consistency defines that after the transaction
is finished, the database must remain in a
consistent state.
• It preserves consistency of the database.
• If execution of a transaction is successful, then
the database remains in a consistent state. If the
transaction fails, then it will be rolled back and
the database will be restored to a consistent
state.
Example: Consistency
• For example,
• Account A has a balance of 400$ and it is
transferring 100$ to account B and 100$ to account C,
so we have two transactions here.
• Let’s say these transactions run concurrently and both
read a balance of 400$; in that case the final balance
of A would be 300$ instead of 200$. This is wrong.
• If the transactions ran in isolation, the second
transaction would have read the correct balance of 300$
(before debiting its own 100$) once the first
transaction completed successfully.
3. Isolation
• Isolation defines that the transactions are
securely and independently processed at the
same time without interference.
• Isolation property does not ensure the order of
transactions.
• The operations cannot access or see the data in
an intermediate state during a transaction.
• Isolation is needed when there are concurrent
transactions occurring at the same time.
Example: Isolation
• If Joe issues a transaction against a database at the same
time that Mary issues a different transaction, both
transactions should operate on the database in an isolated
manner.
• The database should either perform Joe’s entire
transaction before executing Mary’s or vice-versa.
• This prevents Joe’s transaction from reading intermediate
data produced as a side effect of part of Mary’s
transaction that will not eventually be committed to the
database.
• It is important to note that the isolation property does not
ensure that a specific transaction will execute first, only
that they will not interfere with each other.
Isolation levels

• There are four levels of isolation.
• Higher isolation limits the ability of users to
concurrently access the same data. The higher the
isolation level, the greater the system resources
required, and the more likely it is that database
transactions will block one another.
• As the isolation level is lowered, the chance
increases that users will encounter read phenomena
such as uncommitted dependencies, also known as
dirty reads, in which data is read from a row that
has been modified by another user but not yet
committed to the database.
• Serializable is the highest level: transactions
execute as if each one completed before another
is able to start.
• Repeatable read guarantees that data read once
within a transaction returns the same values when
read again, even before the transaction has
finished.
• Read committed allows data to be accessed only
after the changes have been committed to the
database, but not before then.
• Read uncommitted is the lowest level of isolation
and allows data to be accessed before the changes
have been committed, so dirty reads are possible.
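How an application requests one of these levels depends on the database
and driver. The sketch below assumes a PostgreSQL server and the
psycopg2 driver; the connection string and the accounts table are
placeholders.

    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")  # placeholder DSN

    # Applies to transactions subsequently opened on this connection;
    # "REPEATABLE READ", "READ COMMITTED" or "READ UNCOMMITTED" also work.
    conn.set_session(isolation_level="SERIALIZABLE")

    with conn.cursor() as cur:
        cur.execute("SELECT balance FROM accounts WHERE name = %s", ("X",))
        print(cur.fetchone())
    conn.commit()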
4. Durability
• Durability states that once a transaction
completes successfully, its changes are
permanently recorded in the database.
• The database retains the latest committed
updates even if the system fails or restarts.
• It has the ability to recover committed
transaction updates even if the storage media
fails.
Concurrent Execution

• Concurrency control is the process of managing the
simultaneous execution of transactions (such as
queries, updates, inserts, deletes and so on) in a
multiprocessing database system without having
them interfere with one another.
• This property of DBMS allows many transactions
to access the same database at the same time
without interfering with each other.
Concurrent Execution
• The primary goal of concurrency is to ensure the
atomicity of the execution of transactions in a
multi-user database environment.
• Concurrency control mechanisms attempt to
interleave the READ and WRITE operations of
multiple transactions so that the interleaved
execution yields results identical to those of
some serial schedule.
• Examples: railway reservation and online
shopping systems.
What is transaction schedule?
• A transaction schedule is a sequence of database
instructions.
• A transaction reads and writes database objects
while executing its actions.
• In a serial schedule, the execution of the second
transaction starts only after the first transaction
finishes.
• 'R' denotes a “Read” operation and 'W' denotes a
“Write” operation performed by a transaction.
• Whether the transaction fails or succeeds, it must
specify its final action as a commit or an abort.
Example : Consider two transactions T1 and T2.
T1 : Deposits Rs. 1000 to both accounts X and Y.
T2 : Doubles the balance of accounts X and Y.

T1              T2
Read (X)        Read (X)
X ← X + 1000    X ← X * 2
Write (X)       Write (X)
Read (Y)        Read (Y)
Y ← Y + 1000    Y ← Y * 2
Write (Y)       Write (Y)
• Now, if we perform a serial schedule of the above
transactions, it can be represented as below.

Serial Schedule 1

In Serial Schedule 1, we first execute transaction T1 and then
execute T2. This can be represented as T1 → T2.

Serial Schedule 2

In Serial Schedule 2, we first execute transaction T2 and then
execute T1, represented as T2 → T1.
• Serial Schedule 1 (T1 → T2) executes in the following way:

Step 1: Accounts X and Y initially have Rs. 1000 each.
Transaction T1 updates X to Rs. 2000 and Y to Rs. 2000.

Step 2: T2 reads the updated values of X and Y, and updates
X to Rs. 4000 and Y to Rs. 4000.

At the end of T2, X + Y = 4000 + 4000 = 8000, which is exactly
the result that the serial order T1 → T2 must produce.

Hence, we see that the execution of this schedule keeps the
database in a consistent state.
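The two serial orders can be replayed in a few lines of Python, with
the deposit and doubling steps written as plain functions. Both orders
leave the database consistent, although they produce different totals,
because each matches some serial execution.

    def t1(db):                # deposits Rs. 1000 to X and Y
        db["X"] += 1000
        db["Y"] += 1000

    def t2(db):                # doubles the balances of X and Y
        db["X"] *= 2
        db["Y"] *= 2

    for order in ([t1, t2], [t2, t1]):
        db = {"X": 1000, "Y": 1000}
        for txn in order:
            txn(db)
        print([f.__name__ for f in order], db)
    # [t1, t2] -> X = 4000, Y = 4000   (the walkthrough above)
    # [t2, t1] -> X = 3000, Y = 3000   (also a valid serial result)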
What is Concurrency control?
• Concurrency control manages transactions running simultaneously
without letting them interfere with each other.
• The main objective of concurrency control is to allow many users
to perform different operations at the same time.
• Running more than one transaction concurrently improves the
performance of the system.
• If concurrent operations are not controlled properly, serious
problems can arise, such as loss of data integrity and consistency.
• Concurrency control increases throughput (the number of
transactions per second) by handling multiple transactions
simultaneously.
• It reduces the waiting time of transactions.
Example: Consider again transactions T1 and T2.
T1 : Deposits Rs. 1000 to both accounts X and Y.
T2 : Doubles the balance of accounts X and Y.

T1              T2
Read (X)        Read (X)
X ← X + 1000    X ← X * 2
Write (X)       Write (X)
Read (Y)        Read (Y)
Y ← Y + 1000    Y ← Y * 2
Write (Y)       Write (Y)

The above two transactions can be executed concurrently as below.
Schedule C1
• The above concurrent schedule executes in the following manner:

Step 1: Part of transaction T1 is executed: it updates X to Rs. 2000.

Step 2: The processor switches to transaction T2, which is executed
and updates X to Rs. 4000. Then the processor switches back to T1,
and the remaining part of T1, which updates Y to Rs. 2000, is
executed.

Step 3: Finally, the remaining part of T2 reads Y as Rs. 2000 and
updates it to Rs. 4000 by doubling the value of Y.

This concurrent schedule maintains the consistency of the database:
X + Y = 4000 + 4000 = 8000, the same result as the serial order
T1 → T2.

Therefore, the above schedule is equivalent to a serial schedule and
hence it is a consistent schedule.
What is serializability?
• Serializability is a concurrency scheme in which a
concurrent schedule is equivalent to one that
executes the transactions serially.
• A schedule is a list of operations from a set of
transactions.
• In a serial schedule, each transaction is executed
consecutively without any interference from other
transactions.
• In a non-serial schedule, the operations from a
group of concurrent transactions are interleaved.
• In a non-serial schedule, if the interleaving is not
proper, problems can arise such as lost update,
uncommitted dependency and inconsistent analysis.
• The main objective of serializability is to find non-
serial schedules that allow transactions to execute
concurrently without interference, and that produce a
database state that could be produced by some serial
execution.
1. Conflict Serializability

• Two instructions of two different transactions
conflict when they access the same data item and
at least one of them performs a write operation.
• Conflict serializability deals with detecting the
instructions that conflict and specifying the
order in which conflicting instructions must
execute.
• A conflict arises only when at least one of the
instructions is a write operation.
The following rules are important in Conflict Serializability:

1. If two operations are both reads, then they are not in
conflict and can be swapped freely.

2. If one transaction wants to perform a read operation and the
other wants to perform a write operation on the same data item,
then they are in conflict and cannot be swapped.

3. If both transactions want to write the same data item, then
they are also in conflict and cannot be swapped, because the
final value of the item depends on which write executes last.
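These rules lead to the standard precedence-graph test: draw an edge
Ti → Tj whenever an operation of Ti conflicts with a later operation of
Tj, and the schedule is conflict serializable exactly when the graph
has no cycle. A minimal sketch, with an illustrative
(transaction, operation, item) encoding:

    def conflict_serializable(schedule):
        # schedule: list of (transaction, operation, item) in execution order
        edges = set()
        for k, (t1, op1, x1) in enumerate(schedule):
            for t2, op2, x2 in schedule[k + 1:]:
                if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                    edges.add((t1, t2))      # t1's operation must come first
        graph = {}
        for u, v in edges:
            graph.setdefault(u, set()).add(v)

        def cyclic(u, path):                 # depth-first cycle detection
            if u in path:
                return True
            return any(cyclic(v, path | {u}) for v in graph.get(u, ()))

        return not any(cyclic(u, set()) for u in graph)

    # T1 precedes T2 on X, but T2 precedes T1 on Y: a cycle, not serializable
    s = [("T1", "W", "X"), ("T2", "W", "X"), ("T2", "W", "Y"), ("T1", "W", "Y")]
    print(conflict_serializable(s))          # False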
2. View Serializability

• View serializability is another type of
serializability.
• It is checked by comparing a schedule with
another schedule over the same set of
transactions.
• Example : Assume two transactions T1 and T2 and two schedules SH1
and SH2 over them, where T1 and T2 access the same data item. SH1 and
SH2 are view equivalent if three conditions hold:

1. If in SH1, T1 reads the initial value of the data item, then in
SH2, T1 should also read the initial value of that same data item.

2. If in SH1, T1 writes a value in the data item which is read by T2,
then in SH2, T1 should also write the value in the data item before
T2 reads it.

3. If in SH1, T1 performs the final write operation on the data item,
then in SH2, T1 should also perform the final write operation on it.

If a concurrent schedule is view equivalent to a serial schedule over
the same transactions, then it is said to be view serializable.
Concurrency control can be divided into two protocols

• 1. Lock-Based Protocol
2. Timestamp Based Protocol
1. Lock-Based Protocol

• A lock is a mechanism that is important in concurrency control.
• It controls concurrent access to a data item.
• It ensures that one process does not retrieve or update a record
that another process is updating.

For example, in traffic there are signals indicating stop and go.
When one direction is allowed to pass at a time, the other directions
are locked. Similarly, in database transactions, only one transaction
operates on a locked item at a time while the other transactions wait.

If locking is not done properly, it can result in inconsistent or
corrupt data.
• It manages the order between the conflicting pairs among
transactions at the time of execution.
• There are two lock modes,

1. Shared Lock
2. Exclusive Lock
• 1. Shared Lock (S): A shared lock is placed when we are reading the
data. Multiple shared locks can be placed on a data item, but while a
shared lock is held no exclusive lock can be placed.
• For example, when two transactions are reading Steve’s account
balance, let them both read by placing shared locks; but if at the same
time another transaction wants to update Steve’s account balance by
placing an exclusive lock, do not allow it until the reading is
finished.
• 2. Exclusive Lock (X): An exclusive lock is placed when we want to
read and write the data. This lock allows both read and write
operations. Once this lock is placed on a data item, no other lock
(shared or exclusive) can be placed on it until the exclusive lock is
released.
• For example, when a transaction wants to update Steve’s account
balance, let it do so by placing an X lock on it. But if a second
transaction wants to read the data (S lock), do not allow it; if
another transaction wants to write the data (X lock), do not allow
that either.
So based on this, we can create the following table:

Lock Compatibility Matrix

        S       X
S       True    False
X       False   False

How to read this matrix?
There are two rows. The first row says that when an S lock is held,
another S lock can be acquired (marked True), but no exclusive lock
can be acquired (marked False).
The second row says that when an X lock is held, neither an S nor an
X lock can be acquired, so both entries are marked False.
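The matrix translates directly into a grant check: a new lock request
is granted only if it is compatible with every lock already held on
the item. A small illustrative sketch:

    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    def can_grant(requested, held_modes):
        return all(COMPATIBLE[(held, requested)] for held in held_modes)

    print(can_grant("S", ["S", "S"]))   # True : shared locks coexist
    print(can_grant("X", ["S"]))        # False: writer waits for readers
    print(can_grant("S", ["X"]))        # False: reader waits for the writer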
• There are four types of lock protocols available:
1. Simplistic Lock Protocol

Simplistic lock-based protocols allow transactions to obtain a lock
on every object before a 'write' operation is performed. Transactions
may unlock the data item after completing the ‘write’ operation.
2. Pre-claiming Lock Protocol

• Pre-claiming protocols evaluate their operations and create a list
of data items on which they need locks. Before initiating execution,
the transaction requests the system for all the locks it needs
beforehand. If all the locks are granted, the transaction executes
and releases all the locks when all its operations are over. If all
the locks are not granted, the transaction rolls back and waits until
all the locks are granted.
3. Two-Phase Locking (2PL)

• This locking protocol divides the execution of a transaction into
three parts. In the first part, when the transaction starts
executing, it seeks permission for the locks it requires. The second
part is where the transaction acquires all the locks. As soon as the
transaction releases its first lock, the third part starts; in this
part, the transaction cannot demand any new locks and only releases
the locks it has acquired.
• Two-phase locking thus has two phases: a growing
phase, in which all the locks are being acquired by
the transaction, and a shrinking phase, in which the
locks held by the transaction are being released.
• In some 2PL variants, to claim an exclusive (write)
lock, a transaction first acquires a shared (read)
lock and then upgrades it to an exclusive lock.
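A transaction-side guard for the two phases might look like the sketch
below: once the first lock is released, the lock point has passed and
any further lock request is rejected. Actual lock acquisition is
abstracted away; the class is illustrative.

    class TwoPhaseTxn:
        def __init__(self):
            self.held = set()
            self.shrinking = False    # becomes True at the first unlock

        def lock(self, item):
            if self.shrinking:
                raise RuntimeError("2PL violation: lock after first unlock")
            self.held.add(item)       # growing phase

        def unlock(self, item):
            self.shrinking = True     # shrinking phase begins
            self.held.discard(item)

    t = TwoPhaseTxn()
    t.lock("X"); t.lock("Y")          # growing phase
    t.unlock("X")                     # lock point has passed
    # t.lock("Z") would now raise: forbidden under two-phase locking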
4. Strict Two-Phase Locking

• The first phase of Strict-2PL is the same as in 2PL. After
acquiring all the locks in the first phase, the transaction continues
to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock immediately after using it. Strict-2PL holds all the
locks until the commit point and releases them all at once.
• Unlike basic 2PL, Strict-2PL does not suffer from cascading aborts.
2. Timestamp Based Protocol

• A timestamp is a unique identifier that helps the DBMS order
transactions.
• Each transaction is issued a timestamp when it enters the system.
• The timestamp protocol determines the serializability order.
• It is the most commonly used concurrency control protocol.
• It uses either the system time or a logical counter as a
timestamp.
• It starts working as soon as a transaction is created.
Timestamp Ordering Protocol
• The TO Protocol ensures serializability among
transactions in their conflicting read and write
operations.
• The timestamp of transaction T is denoted TS(T).
• The read timestamp of data item X is denoted
R–timestamp(X).
• The write timestamp of data item X is denoted
W–timestamp(X).
Basic Timestamp Ordering –

• Every transaction is issued a timestamp based on when it enters
the system.
• If an old transaction Ti has timestamp TS(Ti), a new transaction
Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the
timestamps determine the serializability order.
• The timestamp ordering protocol ensures that any conflicting read
and write operations are executed in timestamp order.
• Whenever a transaction T tries to issue an R_item(X) or a
W_item(X), the Basic TO algorithm compares the timestamp of T with
R_TS(X) and W_TS(X) to ensure that the timestamp order is not
violated. The Basic TO protocol works in the following two cases.
1. Whenever a transaction T issues a W_item(X) operation, check
the following conditions:

– If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T
and reject the operation; else,
– Execute the W_item(X) operation of T and set W_TS(X) to TS(T).

2. Whenever a transaction T issues an R_item(X) operation, check
the following conditions:

– If W_TS(X) > TS(T), then abort and roll back T and reject the
operation; else,
– If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and
set R_TS(X) to the larger of TS(T) and the current R_TS(X).
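Both checks fit in a few lines. In the sketch below, R_TS and W_TS are
kept in dicts keyed by data item, and returning False stands for
"abort and roll back T"; the encoding is illustrative.

    r_ts, w_ts = {}, {}    # read/write timestamps per data item

    def write_item(t_ts, x):
        if r_ts.get(x, 0) > t_ts or w_ts.get(x, 0) > t_ts:
            return False                     # abort and roll back T
        w_ts[x] = t_ts                       # execute write, set W_TS(X)
        return True

    def read_item(t_ts, x):
        if w_ts.get(x, 0) > t_ts:
            return False                     # abort and roll back T
        r_ts[x] = max(r_ts.get(x, 0), t_ts)  # execute read, raise R_TS(X)
        return True

    print(read_item(5, "X"))    # True
    print(write_item(3, "X"))   # False: a younger transaction already read X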
• One drawback of the Basic TO protocol is that
cascading rollback is still possible. Suppose
transaction T2 has used a value written by
transaction T1. If T1 is aborted and resubmitted
to the system, then T2 must also be aborted and
rolled back. So the problem of cascading aborts
still prevails.
Strict Timestamp Ordering –

• A variation of Basic TO called Strict TO ensures
that the schedules are both strict and conflict
serializable.
• In this variation, a transaction T that issues an
R_item(X) or W_item(X) such that TS(T) > W_TS(X)
has its read or write operation delayed until the
transaction T‘ that wrote the value of X has
committed or aborted.
• 3. Thomas's Write Rule
Thomas's write rule does not enforce conflict serializability. It
rejects fewer write operations, by modifying the checks for the
write_item(X) operation as follows:

1. If Read_TS(X) > TS(T) (the read timestamp is greater than the
transaction's timestamp), then abort and roll back transaction T and
reject the operation.

2. If Write_TS(X) > TS(T) (the write timestamp is greater than the
transaction's timestamp), then do not execute the write operation but
continue processing. Some transaction with a timestamp greater than
TS(T), i.e. after T in timestamp order, has already written the value
of X, so this write is obsolete.

3. If neither condition 1 nor condition 2 occurs, then execute the
Write_item(X) operation of transaction T and set Write_TS(X) to TS(T).
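Compared with the Basic TO write check sketched earlier, only the
second case changes: the obsolete write is skipped instead of aborting
T. Again illustrative, reusing per-item timestamp dicts.

    r_ts, w_ts = {}, {}    # per-item timestamps, as in the Basic TO sketch

    def write_item_thomas(t_ts, x):
        if r_ts.get(x, 0) > t_ts:
            return False    # rule 1: abort and roll back T
        if w_ts.get(x, 0) > t_ts:
            return True     # rule 2: obsolete write, skip it but continue
        w_ts[x] = t_ts      # rule 3: execute the write, set Write_TS(X)
        return True

    w_ts["X"] = 7
    print(write_item_thomas(5, "X"))   # True: the write is skipped, T goes on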
What is deadlock?

• A deadlock is a condition that occurs when two or more
database tasks are waiting for each other, and none of
the tasks is willing to give up the resources that the
other tasks need.
• It is an unwanted situation that may arise when two or
more transactions are each waiting for locks held by the
other to be released.
• In a deadlock situation, no task ever finishes and all
remain in a waiting state forever.
• Deadlocks are not good for the system.
For example, process P1 holds resource R2 and waits for resource R1,
while process P2 holds resource R1 and waits for resource R2. These
processes are in a deadlock state.
The only way to break a deadlock is to abort one or more transactions.
Once a transaction is aborted and rolled back, all the locks held by
that transaction are released, and the remaining transactions can
continue their execution. The DBMS should then automatically restart
the aborted transactions.
Deadlock Conditions

• Following are the deadlock conditions:

1. Mutual Exclusion
2. Hold and Wait
3. No Preemption
4. Circular Wait

A deadlock may occur only if all the above conditions hold true.
• Mutual Exclusion states that at least one resource cannot be used
by more than one process at a time; such resources cannot be shared
between processes.

Hold and Wait states that a process holding a resource is requesting
additional resources which are being held by other processes in the
system.

No Preemption states that a resource cannot be forcibly taken from a
process; only the process holding a resource can release it.

Circular Wait states that one process is waiting for a resource held
by a second process, the second process is waiting for a third, and
so on, with the last process waiting for the first. This forms a
circular chain of waiting.
Deadlock Prevention
• Deadlock prevention ensures that the system never enters a deadlock
state.

Following are the requirements to keep the system deadlock-free:

1. No Mutual Exclusion : No mutual exclusion means making all
resources sharable.

2. No Hold and Wait : The hold-and-wait condition can be removed if a
process acquires all the resources it needs before starting out.

3. Allow Preemption : Allowing preemption is as good as removing
mutual exclusion. The only requirement is to restore the state of the
resource for the preempted process, rather than letting it run at the
same time as the preemptor.

4. Removing Circular Wait : The circular wait can be removed only if
the resources are maintained in a hierarchy and each process requests
resources in increasing order of precedence.
• Wait-Die Scheme
• In this scheme, if a transaction requests to lock a resource (data
item) which is already held with a conflicting lock by another
transaction, then one of two possibilities may occur:
• If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting
lock, is older than Tj, then Ti is allowed to wait until the data
item is available.
• If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies.
Ti is restarted later with a random delay but with the same
timestamp.
• This scheme allows the older transaction to wait but kills the
younger one.
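The decision itself is a one-line comparison of timestamps. A minimal
illustrative sketch:

    def wait_die(ts_requester, ts_holder):
        if ts_requester < ts_holder:
            return "wait"    # requester is older: allowed to wait
        return "die"         # requester is younger: abort, restart with same TS

    print(wait_die(ts_requester=10, ts_holder=20))   # 'wait' (older requester)
    print(wait_die(ts_requester=30, ts_holder=20))   # 'die'  (younger requester)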
Deadlock Avoidance

• Deadlock avoidance helps avoid rolling back
conflicting transactions.
• Aborting a transaction after a deadlock has
occurred is not a good approach.
• Rather, deadlock avoidance should be used to
detect any potential deadlock situation in advance.
Causes of Database Failures

• A database holds a huge amount of data and transactions.
• If the system crashes or a failure occurs, it can be very
difficult to recover the database.

There are some common causes of failures, such as:
1. System Crash
2. Transaction Failure
3. Network Failure
4. Disk Failure
5. Media Failure

• Every transaction has the ACID properties. If we fail to maintain
the ACID properties, that is a failure of the database system.
1. System Crash
• A system crash occurs when there is a hardware or
software failure, or an external factor such as a
power failure.
• The data in secondary storage is usually not
affected by a system crash, because the database
maintains its integrity there; checkpoints prevent
the loss of data from secondary memory.
2. Transaction Failure
• A transaction failure affects only a few tables
or processes and is caused by logical errors in
the code.
• It also occurs when there are system errors such
as deadlock or unavailability of the system
resources needed to execute the transaction.
3. Network Failure
• A network failure occurs when the communication
network connecting a client-server configuration
or a distributed database system breaks down.
4. Disk Failure
• Disk Failure occurs when there are issues with
hard disks like formation of bad sectors, disk
head crash, unavailability of disk etc.
5. Media Failure
• Media failure is the most dangerous kind of failure
because it takes more time to recover from than any
other kind of failure.
• A disk controller or disk head crash is a typical
example of media failure.
• Natural disasters like floods, earthquakes and
power failures can also damage the data.
What is recovery?

• Recovery is the process of restoring a database to the correct
state in the event of a failure.
• It ensures that the database is reliable and remains in a
consistent state after a failure.

Database recovery can be classified into two parts:

1. Rolling Forward applies redo records to the corresponding data
blocks.
2. Rolling Back applies rollback (undo) segments to the data files;
this undo information is tracked in transaction tables.

We can recover the database using Log-Based Recovery.
Log-Based Recovery

• Logs are sequences of records that record the actions performed
by transactions.
• In log-based recovery, the log of each transaction is maintained
in stable storage. If any failure occurs, the database can be
recovered from the log.
• The log contains information about the transaction being
executed, the values that have been modified, and the transaction
state.
• All this information is stored in the order of execution.
• Example:
Assume a transaction that modifies the address of an employee. The
following logs are written for this transaction:

Log 1: The transaction is initiated and writes a 'START' log record.
Log: <Tn, START>

Log 2: The transaction modifies the address from 'Pune' to 'Mumbai'.
Log: <Tn, Address, 'Pune', 'Mumbai'>

Log 3: The transaction is completed. The log indicates the end of
the transaction.
Log: <Tn, COMMIT>
• There are two methods of creating the log files and updating the
database:
1. Deferred Database Modification
2. Immediate Database Modification

1. In Deferred Database Modification, all the log records for the
transaction are first created and stored in stable storage. In the
above example, the three log records are created and stored first,
and only then is the database updated with those steps.

2. In Immediate Database Modification, the database is modified
immediately after each log record is created. In the above example,
the database is modified at each step of log entry: after the first
log entry the transaction hits the database to fetch the record, then
the second log record is written followed by updating the employee's
address, then the third log record is written followed by committing
the database changes.
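A toy sketch of the deferred approach, using the <Tn, field,
old-value, new-value> record format from the address example: log
records are appended first, and the database (a dict here) is touched
only once <Tn, COMMIT> has been logged. All names are illustrative.

    log, db = [], {"address": "Pune"}

    def run_deferred(txn_id, field, old, new):
        log.append((txn_id, "START"))
        log.append((txn_id, field, old, new))   # change logged, db untouched
        log.append((txn_id, "COMMIT"))
        # redo phase: apply logged updates of committed transactions only
        committed = {t for (t, *rest) in log if rest and rest[0] == "COMMIT"}
        for t, *rest in log:
            if t in committed and len(rest) == 3:
                db[rest[0]] = rest[2]           # install the new value

    run_deferred("T1", "address", "Pune", "Mumbai")
    print(db)   # {'address': 'Mumbai'}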
Recovery with Concurrent Transactions

• When two transactions are executed in parallel,
their logs are interleaved. It would be difficult
for the recovery system to trace all the logs back
to a previous point and then start recovering from
there.
• To overcome this situation, a 'checkpoint' is
used.
Checkpoint
• A checkpoint acts like a bookmark. During the execution of
transactions, checkpoints are marked as the transactions execute,
and the log files are created as usual with the steps of the
transactions. When a checkpoint is reached, the updates logged so
far are written to the database, and all the log records up to that
point are removed from the file. The log file is then filled with
the new steps of the transactions until the next checkpoint, and so
on.
• Care should be taken when creating a checkpoint: if a checkpoint
is created before a transaction is fully complete and its data is
written to the database, the purpose of the log file and the
checkpoint is defeated. Checkpoints are useful when they are created
where each transaction is complete, i.e. where the database is in a
consistent state.
Suppose there are 4 concurrent transactions T1, T2, T3 and T4. A
checkpoint is added in the middle of T1, and there is a failure while
executing T4. Let us see how a recovery system recovers the database
from this failure.
• The recovery system reads the log files from the end back to the
start, so that it can reverse the transactions, i.e. it reads the log
files from transaction T4 back to T1.
• The recovery system always maintains an undo list and a redo list.
The log entries in the undo list are used to undo transactions,
whereas the entries in the redo list are re-executed. A transaction
is put into the redo list if its log contains both <Tn, Start> and
<Tn, Commit>; that is, all the transactions that are fully complete
go into the redo list to be re-executed after recovery. In the above
example, transactions T2 and T3 have both <Tn, Start> and
<Tn, Commit> in the log file. Transaction T1 has only <Tn, Commit>,
because it was committed after the checkpoint was crossed; its
earlier log records, starting with <Tn, Start>, were already written
to the database and removed at the checkpoint. Hence T1, T2 and T3
are put into the redo list.
• Transactions whose logs contain only <Tn, Start> are put into the
undo list, because they are not complete and can lead to an
inconsistent state of the database. In the above example, T4 is put
into the undo list, since this transaction is not yet complete and
failed in between.
• This is how a DBMS recovers the data in case of a concurrent
transaction failure.
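Building the two lists is a simple scan over the log. The sketch below
mirrors the T1..T4 walkthrough; the log contents are illustrative.

    log = [("T1", "START"), ("CHECKPOINT",),
           ("T1", "COMMIT"), ("T2", "START"), ("T2", "COMMIT"),
           ("T3", "START"), ("T3", "COMMIT"), ("T4", "START")]   # crash here

    started   = {r[0] for r in log if len(r) == 2 and r[1] == "START"}
    committed = {r[0] for r in log if len(r) == 2 and r[1] == "COMMIT"}

    redo = committed              # fully complete: re-execute after recovery
    undo = started - committed    # incomplete: must be rolled back
    print(sorted(redo))   # ['T1', 'T2', 'T3']
    print(sorted(undo))   # ['T4']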
Shadow Paging
• Shadow paging is a method in which all the transactions are
executed on a shadow copy of the database in primary memory. Once
the transactions have completely executed, the changes are installed
in the database. Hence, if there is any failure in the middle of a
transaction, it is not reflected in the database; the database is
updated only after the whole transaction is complete.
• A database pointer always points to the consistent copy of the
database, while the shadow copy of the database is the one that
transactions update. Once all the transactions are complete, the DB
pointer is modified to point to the new copy of the database, and
the old copy is deleted. If there is any failure during the
transaction, the pointer still points to the old copy of the
database, and the shadow copy is deleted. If the transactions
complete, the pointer is changed to point to the shadow copy, and
the old copy is deleted.
• The DB pointer therefore always points to a consistent and stable
database. This mechanism assumes that there is no disk failure and
that only one transaction executes at a time, so that the shadow
copy can hold the data for that transaction. It is useful if the
database is comparatively small, because the shadow copy consumes
the same storage space as the actual database; hence it is not
efficient for huge databases. In addition, it cannot handle
concurrent execution of transactions; it is suitable for one
transaction at a time.
ARIES
• ARIES (Algorithm for Recovery and Isolation Exploiting
Semantics) is a recovery method based on the Write-Ahead
Logging (WAL) protocol. Every update operation writes a
log record, which is one of:
• An undo-only log record: only the before-image is
logged, so an undo operation can be performed to retrieve
the old data.
• A redo-only log record: only the after-image is logged,
so a redo operation can be attempted.
• An undo-redo log record: both the before-image and the
after-image are logged.
• Every log record is assigned a unique and monotonically increasing
log sequence number (LSN). Every data page has a page LSN field that
is set to the LSN of the log record corresponding to the last update
on the page. WAL requires that the log record corresponding to an
update reach stable storage before the data page corresponding to
that update is written to disk.
• For performance reasons, each log write is not immediately forced
to disk. A log tail is maintained in main memory to buffer log
writes, and is flushed to disk when it gets full. A transaction
cannot be declared committed until its commit log record makes it to
disk.
• Once in a while, the recovery subsystem writes a checkpoint record
to the log. The checkpoint record contains the transaction table
(the list of active transactions) and the dirty page table (the list
of data pages in the buffer pool that have not yet made it to disk).
A master log record is maintained separately, in stable storage, to
store the LSN of the latest checkpoint record that made it to disk.
On restart, the recovery subsystem reads the master log record to
find the checkpoint's LSN, reads the checkpoint record, and starts
recovery from there.
• The actual recovery process consists of three passes:
• Analysis. The recovery subsystem determines the
earliest log record from which the next pass must start.
It also scans the log forward from the checkpoint record
to construct a snapshot of what the system looked like
at the instant of the crash.
• Redo. Starting at the earliest LSN determined in the
analysis pass, the log is read forward and each update
is redone.
• Undo. The log is scanned backward and the updates
corresponding to loser transactions are undone.
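A skeleton of the three passes over an in-memory log, where an LSN is
simply a list index. The record layout, and the apply_redo/apply_undo
placeholders standing in for reapplying after-images and restoring
before-images, are illustrative; real ARIES also consults the tables
stored in the checkpoint record itself.

    def apply_redo(rec):   # placeholder: reapply the after-image to rec["page"]
        pass

    def apply_undo(rec):   # placeholder: restore the before-image of rec["page"]
        pass

    def aries_restart(log, checkpoint_lsn):
        # 1. Analysis: scan forward from the checkpoint to rebuild the
        # transaction table (active) and the dirty page table (dirty).
        active, dirty = set(), {}
        for lsn in range(checkpoint_lsn, len(log)):
            rec = log[lsn]
            if rec["type"] == "update":
                active.add(rec["txn"])
                dirty.setdefault(rec["page"], lsn)   # recLSN of the page
            elif rec["type"] == "commit":
                active.discard(rec["txn"])
        # 2. Redo: repeat history forward from the earliest recLSN.
        for lsn in range(min(dirty.values(), default=len(log)), len(log)):
            if log[lsn]["type"] == "update":
                apply_redo(log[lsn])
        # 3. Undo: scan backward, undoing updates of loser transactions.
        for lsn in range(len(log) - 1, -1, -1):
            rec = log[lsn]
            if rec["type"] == "update" and rec["txn"] in active:
                apply_undo(rec)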
• It is clear from this description of ARIES that the following features are
required for a log manager:
• Ability to write log records. The log manager should maintain a log tail in
main memory and write log records to it. The log tail should be written to
stable storage on demand or when the log tail gets full. Implicit in this
requirement is the fact that the log tail can become full halfway through
the writing of a log record. It also means that a log record can be longer
than a page.
• Ability to wraparound. The log is typically maintained on a separate disk.
When the log reaches the end of the disk, it is wrapped around back to
the beginning.
• Ability to store and retrieve the master log record. The master log record
is stored separately in stable storage, possibly on a different duplex-disk.
• Ability to read log records given an LSN. Also, the ability to scan the log
forward from a given LSN to the end of log. Implicit in this requirement is
that the log manager should be able to detect the end of the log and
distinguish the end of the log from a valid log record's beginning.
• Ability to create a log. In actual practice, this will require setting up a
duplex-disk for the log, a duplex-disk for the master log record, and a raw
device interface to read and write the disks bypassing the Operating
System.
• Ability to maintain the log tail. This requires some sort of shared
memory, because the log tail is common to all transactions accessing
the database the log corresponds to. Mutual exclusion of log writes
and reads has to be taken care of.
