Distributed Transaction Management
Introduction:
Transaction:
A collection of actions that make consistent transformations of system states while preserving overall system consistency (thereby providing concurrency transparency and failure transparency).
Properties of Transactions:
Atomicity, Consistency, Isolation, and Durability (the ACID properties).
Summary:
Characterization of Transactions:
Transactions can be characterized based on application area ((non-)distributed, heterogeneous transactions), timing (on-line vs. batch), organization of read/write actions (two-step, restricted, action model), and structure.
Flat transactions: a sequence of primitive operations between begin transaction and end transaction.
Nested Transactions: an operation of a transaction may itself be a transaction. Nested transactions have the same properties as their parents, i.e., they may themselves contain other nested transactions. Nesting introduces concurrency control and recovery concepts within a transaction.
Closed Nesting:
(1) sub-transactions begin after their parent and finish before it,
(2) commit of a sub-transaction is conditional upon the commit of its parent.
Open Nesting:
(1) sub-transactions can execute and commit independently,
(2) compensation may be necessary.
Transactions provide:
(1) atomic and reliable execution in the presence of failures,
(2) correct execution in the presence of multiple user accesses, and
(3) correct management of replicas.
Scheduler (SC): responsible for implementing a specific concurrency control algorithm.
Local Recovery Manager (RM): implements procedures to recover from failures.
Concurrency Control: the problem of synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, a maximum degree of concurrency is achieved.
Anomalies: lost update, dirty read, inconsistent analysis.
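For example, the interleaving R1(x), R2(x), W1(x), W2(x), C1, C2 exhibits a lost update: both transactions read the same value of x, and T2's write overwrites T1's write, so T1's update is lost.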
Execution Schedule: the concept of a schedule provides a mechanism to express and reason about the (possible) concurrent execution of transactions. A schedule describes the order in which the operations of a set of transactions are executed, and it can be defined as a partial order over these operations.
Equivalent Schedules: two schedules S and S' are said to be equivalent if they contain the same transactions and operations and they order all conflicting operations of non-aborting transactions in the same way.
Serializable Schedule: a schedule S is said to be serializable if it is equivalent to some serial schedule.
Example:
S1 = {R1(A), W1(A), R1(B), W1(B), C1, R2(A), R2(B), C2}
S2 = {R1(A), W1(A), R2(A), R2(B), C2, R1(B), W1(B), C1}
S3 = {R1(A), W1(A), R2(A), R1(B), W1(B), C1, R2(B), C2}
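To make the serializability test concrete, here is a minimal Python sketch (not part of the original notes) that builds the conflict graph of a schedule and reports whether it is acyclic; the schedule encoding and the function name are assumptions, and all transactions are assumed to commit.

# Minimal conflict-serializability check: build the conflict graph and look for cycles.
# A schedule is a list of (transaction_id, action, item); action is 'R', 'W', or 'C'.
def conflict_serializable(schedule):
    ops = [(t, a, x) for (t, a, x) in schedule if a in ('R', 'W')]
    edges = set()
    for i, (ti, ai, xi) in enumerate(ops):
        for (tj, aj, xj) in ops[i + 1:]:
            # Two operations conflict if they are on the same item, come from
            # different transactions, and at least one of them is a write.
            if ti != tj and xi == xj and 'W' in (ai, aj):
                edges.add((ti, tj))
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def has_cycle(node, stack, done):     # depth-first search for a cycle
        stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in stack or (nxt not in done and has_cycle(nxt, stack, done)):
                return True
        stack.remove(node)
        done.add(node)
        return False
    done = set()
    return not any(has_cycle(n, set(), done) for n in list(graph) if n not in done)

# S2 orders W1(A) before R2(A) (edge T1 -> T2) but R2(B) before W1(B) (edge T2 -> T1),
# so its conflict graph is cyclic and S2 is not serializable.
S2 = [(1, 'R', 'A'), (1, 'W', 'A'), (2, 'R', 'A'), (2, 'R', 'B'), (2, 'C', None),
      (1, 'R', 'B'), (1, 'W', 'B'), (1, 'C', None)]
# S3 orders both conflicts T1 -> T2, so it is serializable (equivalent to T1; T2).
S3 = [(1, 'R', 'A'), (1, 'W', 'A'), (2, 'R', 'A'), (1, 'R', 'B'), (1, 'W', 'B'),
      (1, 'C', None), (2, 'R', 'B'), (2, 'C', None)]
print(conflict_serializable(S2), conflict_serializable(S3))  # False True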
Concurrency control algorithms can be classified as follows:
Pessimistic:
  Two-Phase Locking (2PL): centralized (primary site) 2PL, primary copy 2PL, distributed 2PL
  Timestamp Ordering (TO): basic TO, multiversion TO, conservative TO
  Hybrid
Optimistic:
  Locking-based
  Timestamp ordering-based
Timestamp Ordering (TO)
(1) Does not maintain serializability by mutual exclusion, but "selects" a serialization order in advance and executes transactions accordingly.
(2) Assumes a global (system-wide) monotonically increasing counter (a global clock, which is itself problematic in a distributed system).
(3) Each transaction Ti is assigned a globally unique timestamp TS(Ti); the order < on timestamps defines the relationship between older and younger transactions.
(4) The transaction manager attaches the timestamp to all operations issued by a transaction.
(5) Each object x has two timestamps:
RTS(x) := max{TS(T) | T has read x}
WTS(x) := max{TS(T) | T has written x}
(6) Conflicting operations are resolved by timestamp order:
an operation can proceed only if all conflicting operations of older transactions have already been processed. Assume transaction T wants to operate on object x:
case operation of
read:  if TS(T) < WTS(x) then reject (reschedule) read
       else begin
         execute read(x);
         RTS(x) := max{RTS(x), TS(T)};
       end;
write: if TS(T) < max{RTS(x), WTS(x)} then reject write
       else begin
         execute write(x);
         WTS(x) := TS(T);
       end;
end case.
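A minimal Python sketch of these read/write rules (the class and method names are illustrative assumptions); a rejected operation means the issuing transaction is aborted and restarted with a new, larger timestamp.

# Basic timestamp ordering for a single object; rejection means the issuing
# transaction must be restarted with a larger timestamp.
class TOObject:
    def __init__(self):
        self.rts = 0      # largest timestamp of any transaction that read the object
        self.wts = 0      # largest timestamp of any transaction that wrote the object
        self.value = None

    def read(self, ts):
        if ts < self.wts:              # a younger transaction has already written x
            return False               # reject: restart the reading transaction
        self.rts = max(self.rts, ts)
        return True                    # read executed

    def write(self, ts, value):
        if ts < max(self.rts, self.wts):   # a younger transaction has read or written x
            return False                   # reject: restart the writing transaction
        self.value = value
        self.wts = ts
        return True                        # write executed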
Problem: Basic TO tries to execute operations as soon as possible. The TO algorithm never makes operations wait; instead it restarts them. This is an advantage (it is deadlock-free) but also a disadvantage, because numerous restarts have adverse performance implications.
A conservative variant combines TO with locks and defers conflicting operations in queues instead of restarting them:
case operation of
read:  if TS(T) < WTS(x) then reject read
       else if lock_queue(x) contains a lock with a smaller timestamp then
         insert read(x) into read_queue(x)
       else begin
         execute read(x);
         RTS(x) := max{RTS(x), TS(T)};
       end;
lock:  if TS(T) < max{RTS(x), WTS(x)} then reject lock
       else insert lock(x) into lock_queue(x);
write: lr_queue(x) := lock_queue(x) ∪ read_queue(x);
       if TS(T) > min{TS(T') | T' ∈ lr_queue(x)} then
         insert write(x) into write_queue(x)
       else begin
         execute write(x); delete the corresponding lock from lock_queue(x);
         WTS(x) := TS(T);
         check read_queue(x) and write_queue(x) for executable operations;
       end;
end case.
Comments:
read: the timestamp of the read must be current with respect to the write timestamp of the object (as in basic TO). Whether the read is executed immediately or deferred depends on whether a lock with a smaller timestamp already exists for this object; in that case the read is deferred (inserted into read_queue(x)), otherwise it is executed.
lock: lock_queue(x) corresponds to the lock request queue in the two-phase locking protocol.
write: the write is only deferred if locks or reads with smaller timestamps exist for the object and need to be executed first; after write(x) is executed, the exclusive lock is released.
Multiversion TO: a write Wi(x) is rejected if the scheduler has already processed a read Rj(xr) of a version xr such that TS(xr) < TS(Ti) < TS(Tj); otherwise a new version of x with timestamp TS(Ti) is created. The algorithm trades space (storing multiple versions) for time.
Optimistic Concurrency Control Algorithms
Pessimistic execution: validate → read → compute → write (commit)
Optimistic execution: read → compute → validate → write (commit)
Underlying transaction execution model: a transaction is divided into sub-transactions, each of which executes at one site; sub-transaction Tij executes at site j. Transactions run independently at each site until they reach the end of their read phases.
All sub-transactions are assigned a timestamp at the end of their read phase.
A validation test is performed during the validation phase; if one sub-transaction fails validation, all of them are rejected.
Optimistic CC Validation Test (local validation of Tij):
1. If all transactions Tk with TS(Tk) < TS(Tij) have completed their write phase before Tij started its read phase, validation succeeds (the transactions execute in serial order).
2. If there is a transaction Tk with TS(Tk) < TS(Tij) that completes its write phase while Tij is in its read phase, validation succeeds if Tij does not read any object written by Tk.
3. If there is a transaction Tk with TS(Tk) < TS(Tij) that completes its read phase before Tij completes its read phase, validation succeeds if they do not access any common data object.
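A minimal Python sketch of this backward-validation test; the record fields (read_phase_start, read_phase_end, write_phase_end, read/write sets) are assumed bookkeeping, not an actual API.

from dataclasses import dataclass, field

@dataclass
class TInfo:                        # assumed bookkeeping kept per (sub)transaction
    ts: int
    read_phase_start: float
    read_phase_end: float
    write_phase_end: float
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(tij, others):
    # Check Tij against every Tk with TS(Tk) < TS(Tij) that overlapped with it.
    for tk in others:
        if tk.ts >= tij.ts:
            continue                # only older transactions are checked
        if tk.write_phase_end <= tij.read_phase_start:
            continue                # rule 1: Tk finished writing before Tij started reading
        if tk.write_phase_end <= tij.read_phase_end:
            # rule 2: Tk wrote while Tij was reading -> Tij must not read what Tk wrote
            if tk.write_set & tij.read_set:
                return False
        else:
            # rule 3: Tk finished its read phase first but may still be writing ->
            # the two must not access any common data object
            if (tk.read_set | tk.write_set) & (tij.read_set | tij.write_set):
                return False
    return True                     # Tij may enter its write phase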
Once a transaction is locally validated, ensuring that local database consistency is maintained, it also needs to be globally validated; there are no known optimistic methods for doing this efficiently. A major problem with optimistic CC algorithms is their high storage cost: in order to validate Tij, the read and write sets of terminated transactions that were in progress when Tij arrived at site j need to be stored.
Deadlock Management
A transaction is deadlocked if it is blocked and will remain blocked until there is an intervention.
Ignore: let the application programmers deal with it, or restart the system.
Prevention: guaranteeing that deadlocks can never occur in the first place. Check a transaction when it is initiated. Requires no run-time support.
Avoidance: detecting potential deadlocks in advance and taking action to ensure that a deadlock will not occur. Requires run-time support.
Detection and Recovery: allowing deadlocks to form, then finding and breaking them. Requires run-time support.
Deadlock Prevention
All resources that may be needed by a transaction must be predeclared. The system must guarantee that none of these resources will be needed by an ongoing transaction. Resources must only be reserved, not allocated, ahead of time.
Evaluation: unsuitable in a database environment, since it is hard to predict at programming time which data a transaction will access; suitable for systems that have no provisions for undoing processes.
Deadlock Avoidance
Transactions are not required to request resources a priori.
Transactions are allowed to proceed unless a requested resource is unavailable.
In case of conflict, transactions may be allowed to wait for a fixed time interval (timeout).
Order either the data items or the sites and always request locks in that order.
More attractive than prevention in database environment.
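The Wait-Die and Wound-Wait rules named in the next paragraph are the classic timestamp-based avoidance schemes; a minimal Python sketch, assuming each transaction keeps the timestamp assigned at its first start (smaller timestamp = older):

# When requester Ti conflicts with lock holder Tj, decide whether Ti waits or one
# of the two is aborted (to be restarted later with its ORIGINAL timestamp, so it
# eventually becomes the oldest and cannot starve).
def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester "dies" (is aborted and restarted).
    return "wait" if ts_requester < ts_holder else "abort_requester"

def wound_wait(ts_requester, ts_holder):
    # Older requester "wounds" (aborts) the younger holder; younger requester waits.
    return "abort_holder" if ts_requester < ts_holder else "wait"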
Wait-Die and Wound-Wait Algorithms: under Wait-Die, an older requester waits for a younger holder, while a younger requester is aborted (dies) and restarted later with its original timestamp; under Wound-Wait, an older requester aborts (wounds) the younger holder, while a younger requester waits.
Deadlock Detection and Resolution
Transactions are allowed to wait freely
Wait-for graphs and cycles
(1) Centralized
(2) Distributed
(3) Hierarchical
The local WFGs are formed at each site and passed on to the other sites. Each local WFG is modified as follows:
1. Since each site receives the potential deadlock cycles from the other sites, these edges are added to the local WFG.
2. The edges in the local WFG which show that local transactions are waiting for transactions at other sites are joined with the edges in the received WFGs which show that remote transactions are waiting for local ones.
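A minimal Python sketch of the underlying mechanics (the edge encoding is an assumption): merging local WFGs amounts to taking the union of their edge sets, and a global deadlock corresponds to a cycle in the merged graph.

# Each local WFG is a set of edges (Ti, Tj) meaning "Ti waits for Tj";
# a global deadlock shows up as a cycle in the union of the local WFGs.
def detect_deadlock(local_wfgs):
    edges = set().union(*local_wfgs)
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def cycle_from(node, path):
        for nxt in graph.get(node, ()):
            if nxt in path:
                return path[path.index(nxt):]   # the transactions forming the cycle
            found = cycle_from(nxt, path + [nxt])
            if found:
                return found
        return None
    for start in graph:
        cycle = cycle_from(start, [start])
        if cycle:
            return cycle     # a victim chosen from this cycle would be aborted
    return None

# Example: site 1 sees T1 -> T2, site 2 sees T2 -> T1; the merged WFG has a cycle.
print(detect_deadlock([{("T1", "T2")}, {("T2", "T1")}]))  # ['T1', 'T2'] (or a rotation)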
Reliability:
Fundamental Definitions:
Reliability:
A measure of success with which a system conforms to some authoritative
specification of its behavior.
Probability that the system has not experienced any failures within a given period
of time.
Failure:
The deviation of a system from the behavior that is described in its specification.
Erroneous State:
Internal state of the system such that there exist circumstances in which further
processing, by normal algorithms of the system, will lead to a failure which is not
attributed to a subsequent fault.
Fault: An error in the internal states of the components of a system or in the design of a
system.
Types of faults:
Soft faults: transient or intermittent faults; they account for more than 90% of all failures and lead to soft failures.
Fault-tolerance measures include reliability, availability, mean time between failures (MTBF), and mean time to repair (MTTR).
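These measures are commonly related by the standard approximation availability = MTBF / (MTBF + MTTR); for example, an MTBF of 1000 hours and an MTTR of 10 hours give an availability of 1000/1010, roughly 99%.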
Types of Failures in Distributed DBMS:
Transaction failures, site (system) failures, media (disk) failures, and communication (network) failures.
Logging: the log contains information used by the recovery process to restore the consistency of the system. This information may include, for example, the before-images (for undo) and after-images (for redo) of updated data items.
WAL protocol:
1. Before the stable database is updated in place, the undo portion of the log must be written to the stable log (i.e., forced from the log buffer to disk).
2. When a transaction commits, the redo portion of the log must be written to the stable log prior to the updating of the stable database.
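A minimal Python sketch of the ordering these two rules impose (force_log and write_stable_db are hypothetical placeholders for the real storage primitives):

LOG_ON_DISK = []                   # stand-in for the stable log
STABLE_DB = {}                     # stand-in for the stable database

def force_log(records):
    LOG_ON_DISK.extend(records)    # append to the log buffer and force it to disk

def write_stable_db(updates):
    STABLE_DB.update(updates)      # overwrite the stable database in place

def update_in_place(tid, updates, before_images):
    # Rule 1: the undo information (before-images) must reach the stable log
    # before the corresponding stable database pages are overwritten.
    force_log([("UNDO", tid, before_images)])
    write_stable_db(updates)

def commit(tid, deferred_updates):
    # Rule 2: the redo information (after-images) must reach the stable log
    # before the stable database is updated with the committed values.
    force_log([("REDO", tid, deferred_updates), ("COMMIT", tid)])
    write_stable_db(deferred_updates)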
Commit protocols:
Issue:
How to ensure atomicity and durability?
Termination protocols:
If a failure occurs, how can the remaining sites deal with it?
Non-blocking:
The occurrence of failure should not force the sites to wait until the failure is repaired to
terminate the transaction.
Recovery protocols:
When a failure occurs, how do the sites where the failure occurred deal with it?
Independent:
A failed site can determine the outcome of a transaction without having to obtain remote
information.
Two-Phase Commit (2PC):
All sites (local transaction managers) participating in a global transaction decide whether to globally commit or abort the transaction.
All local decisions are collected at one site (the coordinator). This site also makes the final decision with respect to global commit/abort.
Execution of the protocol is initiated by the coordinator after the last step of the global transaction has been reached.
Note that when the protocol is initiated, the transaction may still be executing at some local sites.
Phase 1: the coordinator gets the participants ready to write the results into the database.
Phase 2: everybody writes the results into the database.
Coordinator: the process at the site where the transaction originates and which controls the execution.
Participants: the processes at the other sites that participate in executing the transaction.
1. Coordinator aborts a transaction if and only if at least one participant votes to abort it.
2. Coordinator commits a transaction if and only if all participants vote to commit it.
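A minimal Python sketch of the coordinator's side of this decision rule (send_to_all and collect_votes are hypothetical messaging placeholders; a real coordinator would also force-write a log record before each message):

# Hypothetical messaging helpers standing in for real network/IO primitives.
def send_to_all(participants, message):
    for p in participants:
        print(f"to {p}: {message}")        # stand-in for a network send

def collect_votes(participants, timeout):
    # Stand-in: assume every participant answers "commit" within the timeout.
    return ["commit" for _ in participants]

def two_phase_commit(participants, timeout=30):
    # Phase 1: ask every participant to prepare and collect the votes.
    send_to_all(participants, "PREPARE")
    votes = collect_votes(participants, timeout)
    # Commit iff ALL participants voted commit; a missing vote (timeout) means abort.
    if len(votes) == len(participants) and all(v == "commit" for v in votes):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    # Phase 2: broadcast the decision and (not shown) wait for the acknowledgements.
    send_to_all(participants, decision)
    return decision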
2PC Protocol Actions
Site Failures - 2PC Termination:
C: Timeout in INITIAL → no problem
C: Timeout in WAIT → cannot unilaterally commit, can unilaterally abort
C: Timeout in ABORT or COMMIT → stay blocked and wait for the acks
P: Timeout in INITIAL → coordinator must have failed; abort
P: Timeout in READY → stay blocked
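The same timeout rules as a small lookup table (a sketch; the role and state names simply mirror the list above):

# (role, state in which the timeout occurs) -> action of the termination protocol
TIMEOUT_ACTIONS = {
    ("coordinator", "INITIAL"): "no problem",
    ("coordinator", "WAIT"):    "unilaterally abort (cannot unilaterally commit)",
    ("coordinator", "COMMIT"):  "stay blocked and wait for the acks",
    ("coordinator", "ABORT"):   "stay blocked and wait for the acks",
    ("participant", "INITIAL"): "coordinator must have failed: abort",
    ("participant", "READY"):   "stay blocked",
}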
Site Failures - 2PC Recovery:
C: Failure in INITIAL or WAIT → start/restart the commit protocol upon recovery
C: Failure in ABORT or COMMIT → nothing special if all the acks have been received; otherwise the termination protocol is invoked
P: Failure in INITIAL → unilaterally abort upon recovery
P: Failure in READY → the coordinator has been informed about the local decision; treat it as a timeout in the READY state and invoke the termination protocol
P: Failure in COMMIT or ABORT → nothing special needs to be done
2PC Recovery - Additional Cases
These cases arise due to the non-atomicity of the log write and message send actions:
Coordinator fails after writing the begin_commit log record but before sending the prepare command: treat it as a failure in the WAIT state; send the prepare command upon recovery.
Participant fails after writing the ready record in the log but before the vote-commit message is sent: treat it as a failure in the READY state; alternatively, it can send vote-commit upon recovery.
Participant fails after writing abort in the log but before the vote-abort message is sent.
Coordinator fails after logging its final decision but before sending the decision to the participants.
Participant fails after writing abort or commit in the log but before the ack is sent.
Problems with 2PC: 2PC is a blocking protocol; if the coordinator fails while the participants are in the READY state, they remain blocked until the coordinator recovers.
Conclusion:
Databases are common transactional resources and, often, transactions span several such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations. The isolation property (the I of ACID) poses a special challenge for multi-database transactions, since the (global) serializability property can be violated even if each database provides it (see also global serializability). In practice, most commercial database systems use strong strict two-phase locking (SS2PL) for concurrency control, which ensures global serializability if all the participating databases employ it (see also commitment ordering for multidatabases).
There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car, and a hotel. Since booking the flight might take up to a day to be confirmed, two-phase commit is not applicable here, as it would lock the resources for too long. In such cases, more sophisticated techniques that involve multiple undo levels are used. Just as you can undo a hotel booking by calling the desk and cancelling the reservation, a system can be designed to undo certain operations (unless they have been irreversibly completed).
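A minimal Python sketch of this compensation idea (the step and cancellation functions are purely illustrative):

# Run each step and remember how to compensate it; if a later step fails,
# undo the completed steps in reverse order instead of holding locks for days.
def run_with_compensation(steps):
    done = []                       # (name, compensate) for each completed step
    for name, action, compensate in steps:
        try:
            action()
            done.append((name, compensate))
        except Exception:
            for prev_name, undo in reversed(done):
                undo()              # e.g. cancel the hotel reservation by "calling the desk"
            return False            # trip booking aborted, all completed steps undone
    return True

# Illustrative only: book_flight/cancel_flight etc. would wrap real (slow) services.
# run_with_compensation([("flight", book_flight, cancel_flight),
#                        ("car",    book_car,    cancel_car),
#                        ("hotel",  book_hotel,  cancel_hotel)])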