Chapter 4
NON-LOCKING SCHEDULERS

4.1 INTRODUCTION
In this chapter we will examine two scheduling techniques that do not use
locks, timestamp ordering (TO) and serialization graph testing (SGT). As with
2PL, we’ll see aggressive and conservative as well as centralized and distributed
versions of both techniques.
We will also look at a very aggressive variety of schedulers, called certi-
fiers. A certifier never delays operations submitted by TMs. It always outputs
them right away. When a transaction is ready to commit, the certifier runs a
“certification test” to determine whether the transaction was involved in a non-
SR execution. If the transaction fails this test, then the certifier aborts the
transaction. Otherwise, it allows the transaction to commit. We will describe
certifiers based on all three techniques: 2PL, TO, and SGT.
In the final section, we will show how to combine scheduling techniques
into composite schedulers. For example, a composite scheduler could use 2PL
for part of its synchronization activity and TO for another part. Using the
composition rules, you can use the basic techniques we have discussed to
construct hundreds of different types of schedulers, all of which produce SR
executions.
Unlike 2PL, the techniques described in this chapter are not currently used
in many commercial products. Moreover, their performance relative to 2PL is
not well understood. Therefore, the material in this chapter is presented
4.2 TIMESTAMP ORDERING (TO)

Introduction
By enforcing the TO rule, we are ensuring that every pair of conflicting opera-
tions is executed in timestamp order. Thus, a TO execution has the same effect
as a serial execution in which the transactions appear in timestamp order. In
the rest of this section we will present ways of enforcing the TO rule.
Basic TO
Basic TO is a simple and aggressive implementation of the TO rule. It accepts
operations from the TM and immediately outputs them to the DM in first-
come-first-served order. To ensure that this order does not violate the TO rule,
the scheduler rejects operations that it receives too late. An operation pi[x] is too late if it arrives after the scheduler has already output some conflicting operation qj[x] with ts(Tj) > ts(Ti). If pi[x] is too late, then it cannot be scheduled without violating the TO rule. Since the scheduler has already output qj[x], it can only solve the problem by rejecting pi[x].
If pi[x] is rejected, then Ti must abort. When Ti is resubmitted, it must be assigned a larger timestamp, one large enough that its operations are less likely
to be rejected during its second execution. Notice the difference with
timestamp-based deadlock prevention, where an aborted transaction is resub-
mitted with the same timestamp to avoid cyclic restart. Here it is resubmitted
with a new and larger timestamp to avoid certain rejection.
To determine if an operation has arrived too late, the Basic TO scheduler
maintains for every data item x the maximum timestamps of Reads and Writes
on x that it has sent to the DM, denoted max-r-scheduled[x] and max-w-
scheduled[x] (respectively). When the scheduler receives pi[x], it compares ts(Ti) to max-q-scheduled[x] for all operation types q that conflict with p. If ts(Ti) < max-q-scheduled[x], then the scheduler rejects pi[x], since it has already scheduled a conflicting operation with a larger timestamp. Otherwise, it schedules pi[x] and, if ts(Ti) > max-p-scheduled[x], it updates max-p-scheduled[x] to ts(Ti).
The scheduler must handshake with the DM to guarantee that operations
are processed by the DM in the order that the scheduler sent them. Even if the
scheduler decides that pi[x] can be scheduled, it must not send it to the DM
until every conflicting qj[x] that it previously sent has been acknowledged by
the DM. Notice that 2PL automatically takes care of this problem. 2PL does
not schedule an operation until all conflicting operations previously scheduled
have released their locks, which does not happen until after the DM acknow-
ledges those operations.
To enforce this handshake, the Basic TO scheduler also maintains, for each
data item x, the number of Reads and Writes that have been sent to, but not yet
acknowledged by, the DM. These are denoted r-in-transit[x] and w-in-
transit[x] (respectively). For each data item x the scheduler also maintains a
queue, queue[x], of operations that can be scheduled insofar as the TO rule is
concerned, but are waiting for acknowledgments from the DM to previously
sent conflicting operations. Conflicting operations are in the queue in times-
tamp order.
Let us consider a simple scenario to see how the scheduler uses these data
structures to enforce the TO rule. For simplicity, assume that the timestamp of each transaction (or operation) is equal to its subscript (i.e., ts(Ti) = i). We use ack(oj[x]) to denote the acknowledgment that the DM sends to the scheduler indicating that oj[x] has been processed. Suppose initially max-r-scheduled[x] = 0, r-in-transit[x] = 0, and queue[x] is empty.
1. r1[x] arrives and the scheduler dispatches it to the DM. It sets max-r-scheduled[x] to 1 and r-in-transit[x] to 1.
2. w2[x] arrives. Although the TO rule says w2[x] can be scheduled, since r-in-transit[x] = 1 the scheduler must wait until it receives ack(r1[x]). It therefore appends w2[x] to queue[x]. (w-in-transit[x] is unaffected.)
3. r4[x] arrives and although the TO rule says r4[x] can be scheduled, the scheduler must wait until it receives ack(w2[x]). It therefore appends r4[x] to queue[x] (after w2[x]). (r-in-transit[x] is unaffected.)
4. r3[x] arrives. Just like r4[x], it must wait for w2[x]. So, the scheduler appends it to queue[x] (after r4[x]).
5. ack(r1[x]) arrives from the DM. The scheduler decrements r-in-transit[x] to 0. It can now dispatch w2[x], so it removes w2[x] from queue[x], sends it to the DM, and sets max-w-scheduled[x] to 2 and w-in-transit[x] to 1. It cannot yet dispatch r4[x] and r3[x] because w-in-transit[x] > 0, indicating that the DM has not yet acknowledged some conflicting Write.
6. ack(w2[x]) arrives from the DM. The scheduler decrements w-in-transit[x] to 0. Now it can send both r4[x] and r3[x] to the DM simultaneously. So, it sets max-r-scheduled[x] to 4 and r-in-transit[x] to 2, and queue[x] becomes empty again.
The principles of operation of a Basic TO scheduler should now be clear. When it receives an operation pi[x], it accepts it for scheduling if ts(Ti) ≥ max-q-scheduled[x] for all operation types q that conflict with p. Otherwise, it rejects pi[x] and Ti must be aborted. Once pi[x] is accepted for scheduling, the scheduler dispatches it to the DM immediately if, for all operation types q that conflict with p, q-in-transit[x] = 0 and there are no q operations in queue[x]. Otherwise a conflicting operation qj[x] is in transit between the scheduler and the DM, or is waiting in queue[x], and so pi[x] must be delayed; it is therefore inserted in queue[x]. Finally, when it receives ack(pi[x]), the scheduler updates p-in-transit[x] accordingly, removes all the operations at the head of queue[x] that can now be dispatched, and sends them to the DM.
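The bookkeeping just described can be sketched as follows. This is a minimal model of our own devising for a single data item x (a real scheduler keeps this state per data item); the DM is represented only by a `dispatched` list and explicit `ack` calls, and all names are ours.

```python
# Sketch of a Basic TO scheduler for one data item x (our names; the DM
# is modeled by the `dispatched` list and explicit ack() calls).
class BasicTO:
    def __init__(self):
        self.max_scheduled = {"r": 0, "w": 0}  # max-r/w-scheduled[x]
        self.in_transit = {"r": 0, "w": 0}     # r/w-in-transit[x]
        self.queue = []                        # queue[x]: (op, ts) pairs
        self.dispatched = []                   # operations sent to the DM

    @staticmethod
    def conflicts(p, q):
        # Reads and Writes conflict iff at least one is a Write
        return p == "w" or q == "w"

    def receive(self, op, ts):
        """Process pi[x]; returns 'rejected', 'dispatched', or 'queued'."""
        if any(self.conflicts(op, q) and ts < self.max_scheduled[q]
               for q in ("r", "w")):
            return "rejected"                  # too late: violates the TO rule
        self.max_scheduled[op] = max(self.max_scheduled[op], ts)
        blocked = any(self.conflicts(op, q) and self.in_transit[q] > 0
                      for q in ("r", "w"))
        blocked = blocked or any(self.conflicts(op, q)
                                 for q, _ts in self.queue)
        if blocked:                            # wait for conflicting acks
            self.queue.append((op, ts))
            return "queued"
        self._dispatch(op, ts)
        return "dispatched"

    def _dispatch(self, op, ts):
        self.in_transit[op] += 1
        self.dispatched.append((op, ts))

    def ack(self, op):
        """DM acknowledgment: drain the head of queue[x] while possible."""
        self.in_transit[op] -= 1
        while self.queue:
            head_op, head_ts = self.queue[0]
            if any(self.conflicts(head_op, q) and self.in_transit[q] > 0
                   for q in ("r", "w")):
                break
            self.queue.pop(0)
            self._dispatch(head_op, head_ts)
```

Replaying the six-step scenario above (r1[x], w2[x], r4[x], r3[x], then the two acknowledgments) reproduces the dispatch order r1[x], w2[x], r4[x], r3[x].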
Strict TO
Although the TO rule enforces serializability, it does not necessarily ensure
recoverability. For example, suppose that ts(T1) = 1 and ts(T2) = 2, and consider the following history: w1[x] r2[x] w2[y] c2. Conflicting operations appear in timestamp order. Thus this history could be produced by Basic TO. Yet it is not recoverable: T2 reads x from T1, T2 is committed, but T1 is not.
As we discussed in Chapters 1 and 2, we usually want the scheduler to
enforce an even stronger condition than recoverability, namely, strictness. Here
is how Basic TO can be modified to that end.
Recall that w-in-transit[x] denotes the number of w[x] operations that the
scheduler has sent to the DM but that the DM has not yet acknowledged. Since
two conflicting operations cannot be “in transit” at the same time and Writes on the same data item conflict, w-in-transit[x] at any time is either 0 or 1.
The Strict TO scheduler works like Basic TO in every respect, except that it does not set w-in-transit[x] to 0 when it receives the DM’s acknowledgment of wi[x]; rather, it does so when it receives an acknowledgment that the DM has processed Ti’s Commit or Abort.
Timestamp Management
Suppose we store timestamps in a table, where each entry is of the form [x,
max-r-scheduled[x], max-w-scheduled[x]]. This table could consume a lot of
space. Indeed, if data items are small, this timestamp information (and o-in-
transit) could occupy as much space as the database itself. This is a potentially
serious problem.
We can solve the problem by exploiting the following observation.
Suppose TMs use relatively accurate real time clocks to generate timestamps,
and suppose transactions execute for relatively short periods of time. Then at
any given time t, the scheduler can be pretty sure it won’t receive any more operations with timestamps smaller than t - δ, where δ is large compared to transaction execution time. The only reason the scheduler needs the timestamps in max-r-scheduled[x] and max-w-scheduled[x], say ts_r and ts_w, is to reject Reads and Writes with even smaller timestamps. So, once ts_r and ts_w are smaller than t - δ, they are of little value to the scheduler, because it is unlikely to receive any operation with a timestamp smaller than ts_r or ts_w.
Using this observation, we can periodically purge from the timestamp
table entries that have uselessly small timestamps. Each Purge operation
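A purge pass over such a table might be sketched as follows. The representation is our own assumption: the table maps each data item to its pair of maximum scheduled timestamps, and `DELTA` stands for the δ just discussed.

```python
# Hypothetical purge over the timestamp table (our names). An entry whose
# Read and Write timestamps are both below now - DELTA can no longer cause
# a rejection, so it may be dropped; an absent entry would later be treated
# as having some conservative default timestamp.
DELTA = 60.0  # must be large compared to transaction execution time

def purge(ts_table, now):
    """ts_table maps x -> (max_r_scheduled[x], max_w_scheduled[x])."""
    cutoff = now - DELTA
    stale = [x for x, (ts_r, ts_w) in ts_table.items()
             if ts_r < cutoff and ts_w < cutoff]
    for x in stale:
        del ts_table[x]
    return ts_table
```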
Distributed TO Schedulers
TO schedulers are especially easy to distribute. Each site can have its own TO
scheduler which schedules operations that access the data stored at that site. The decision to schedule, delay, or reject an operation oi[x] depends only on
other operations accessing x. Each scheduler can maintain all the information
about the operations accessing the data items it manages. It can therefore go
about its decisions independently of the other schedulers. Unlike distributed
2PL, where coordination among distributed schedulers is usually needed to
handle distributed deadlocks, distributed TO requires no inter-scheduler
communication whatsoever.
Conservative TO
If a Basic TO scheduler receives operations in an order widely different from
their timestamp order, then it may reject too many operations, thereby causing
too many transactions to abort. This is due to its aggressive nature. We can
remedy this problem by designing more conservative schedulers based on the
TO rule.
One approach is to require the scheduler to artificially delay each opera-
tion it receives for some period of time. To see why this helps avoid rejections,
consider some operation oi[x]. The danger in scheduling oi[x] right away is that the scheduler may later receive a conflicting operation with a smaller timestamp, which it will therefore have to reject. However, if it holds oi[x] for
a while before scheduling it, then there is a better chance that any conflicting
operations with smaller timestamps will arrive in time to be scheduled. The
longer the scheduler holds each operation before scheduling it, the fewer rejec-
tions it will be forced to make. Like other conservative schedulers, conserva-
tive TO delays operations to avoid rejections.
Of course, delaying operations for too long also has its problems, since the
delays slow down the processing of transactions. When designing a conserva-
tive TO scheduler, one has to strike a balance by adding enough delay to avoid
too many rejections without slowing down transactions too much.
An “ultimate conservative” TO scheduler never rejects operations and thus
never causes transactions to abort. Such a scheduler can be built if we make
certain assumptions about the system. As with Conservative 2PL, one such
assumption is that transactions predeclare their readset and writeset and the
TM conveys this information to the scheduler. We leave the construction of a
conservative TO scheduler based on this assumption as an exercise (Exercise
4.11).
In this section we’ll concentrate on an ultimate conservative TO scheduler
based on a different assumption, namely, that each TM submits its operations
to each DM in timestamp order. One way to satisfy this assumption is to adopt
the following architecture. At any given time, each TM supervises exactly one
transaction (e.g., there is one TM associated with each terminal from which
users can initiate transactions). Each TM’s timestamp generator returns
increasing timestamps every time it’s called. Thus, each TM runs transactions
serially, and each transaction gets a larger timestamp than previous ones super-
vised by that TM. Of course, since many TMs may be submitting operations
to the scheduler in parallel, the scheduler does not necessarily receive operations serially.
Under these assumptions we can build an ultimate conservative TO sched-
uler as follows. The scheduler maintains a queue, called unsched-queue,
containing operations it has received from the TMs but has not yet scheduled.
The operations in unsched-queue are kept in timestamp order, the operations
with the smallest timestamp being at the head of the queue. Operations with
the same timestamp are placed according to the order received, the earlier ones
being closer to the head.
When the scheduler receives pi[x] from a TM, it inserts pi[x] at the appropriate place in unsched-queue to maintain the order properties just given. The
scheduler then checks if the operation at the head of unsched-queue is ready to
be dispatched to the DM. The head of unsched-queue, say qj[x], is said to be
ready if
1. unsched-queue contains at least one operation from every TM, and
2. all operations conflicting with qj[x] previously sent to the DM have been
acknowledged by the DM.
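The two readiness conditions above can be sketched as follows. The data layout is our own assumption: each queued operation carries its timestamp, originating TM, operation type, and data item, and `in_transit` plays the role of the per-item counters.

```python
# Sketch of the readiness test for the head of unsched-queue (our names).
# Each queued operation is (ts, tm, op, x); in_transit[(q, x)] counts
# operations of type q on x sent to the DM but not yet acknowledged.
def conflicts(p, q):
    # Reads and Writes conflict iff at least one is a Write
    return p == "w" or q == "w"

def head_is_ready(unsched_queue, all_tms, in_transit):
    if not unsched_queue:
        return False
    ts, tm, op, x = unsched_queue[0]
    # 1. unsched-queue contains at least one operation from every TM
    if not set(all_tms) <= {t for _, t, _, _ in unsched_queue}:
        return False
    # 2. all conflicting operations previously sent to the DM have been
    #    acknowledged
    return all(in_transit.get((q, x), 0) == 0
               for q in ("r", "w") if conflicts(op, q))
```

Condition (1) is what makes the scheduler safe: since each TM submits operations in timestamp order, seeing one operation from every TM guarantees that no operation with a smaller timestamp is still on its way.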
4.3 SERIALIZATION GRAPH TESTING (SGT)
is usually maintained by an SGT scheduler, in two ways. First, the SGT sched-
uler’s SG may not include nodes corresponding to all committed transactions,
especially those that committed long ago. Second, it usually includes nodes for
all active transactions, which by definition are not yet committed. Due to these
differences, we use a different term, Stored SG (SSG), to denote the SG main-
tained by an SGT scheduler.
Basic SGT
When an SGT scheduler receives an operation pi[x] from the TM, it first adds a node for Ti in its SSG, if one doesn’t already exist. Then it adds an edge from Tj to Ti for every previously scheduled operation qj[x] that conflicts with pi[x]. We have two cases:
1. The resulting SSG contains a cycle. This means that if pi[x] were to be scheduled now (or at any point in the future), the resulting execution would be non-SR. Thus the scheduler rejects pi[x]. It sends ai to the DM and, when ai is acknowledged, it deletes from the SSG Ti and all edges incident with Ti. Deleting Ti makes the SSG acyclic again, since all cycles that existed involved Ti. Since the SSG is acyclic, the execution produced by the scheduler now (with Ti aborted) is SR.
2. The resulting SSG is still acyclic. In this case, the scheduler can accept pi[x]. It can schedule pi[x] immediately, if all conflicting operations previously scheduled have been acknowledged by the DM; otherwise, it must delay pi[x] until the DM acknowledges all conflicting operations. This handshake can be implemented as in Basic TO. Namely, for each data item x the scheduler maintains queue[x], where delayed operations are inserted in first-in-first-out order, and two counts, r-in-transit[x] and w-in-transit[x], for keeping track of unacknowledged Reads and Writes on x sent to the DM.
To determine if an operation conflicts with a previously scheduled one, the scheduler can maintain, for each transaction Tj that has a node in the SSG, the sets of data items for which Reads and Writes have been scheduled. These sets will be denoted r-scheduled[Tj] and w-scheduled[Tj], respectively. Then, pi[x] conflicts with a previously scheduled operation of transaction Tj iff x ∈ q-scheduled[Tj], for q conflicting with p.
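Putting the pieces together, the accept/reject decision of Basic SGT might be sketched like this (our data structures throughout; the queueing handshake with the DM is omitted):

```python
# Sketch of Basic SGT (our names; DM handshaking omitted). ssg[t] is the
# set of SSG successors of t; scheduled[t][q] is q-scheduled[t], the items
# for which a q-operation of t has been scheduled.
def has_cycle(ssg):
    # simple depth-first cycle test over the SSG
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in ssg}

    def visit(n):
        color[n] = GRAY
        for m in ssg[n]:
            if color[m] == GRAY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in ssg)

def receive(ssg, scheduled, ti, op, x):
    """Process pi[x] where op is 'r' or 'w'; returns 'accepted' or
    'rejected' (the latter means Ti must be aborted)."""
    ssg.setdefault(ti, set())
    scheduled.setdefault(ti, {"r": set(), "w": set()})
    for tj, sets in scheduled.items():
        if tj != ti:
            for q in ("r", "w"):
                if (op == "w" or q == "w") and x in sets[q]:
                    ssg[tj].add(ti)          # edge Tj -> Ti
    if has_cycle(ssg):
        del ssg[ti]                          # abort Ti: drop its node
        for succs in ssg.values():
            succs.discard(ti)                # ...and all incident edges
        del scheduled[ti]
        return "rejected"
    scheduled[ti][op].add(x)
    return "accepted"
```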
A significant practical consideration is when the scheduler may discard the
information it has collected about a transaction. To detect conflicts, we have to
maintain the readset and writeset of every transaction, which could consume a
lot of space. It is therefore important to discard this information as soon as
possible.
One may naively assume that the scheduler can delete information about a
transaction as soon as it commits. Unfortunately, this is not so. For example, consider the (partial) history
Conservative SGT
A conservative SGT scheduler never rejects operations but may delay them. As
with 2PL and TO, we can achieve this if each transaction Ti predeclares its readset and writeset, denoted r-set[Ti] and w-set[Ti], by attaching them to its Start operation.
When the scheduler receives Ti’s Start, it saves r-set[Ti] and w-set[Ti]. It then creates a node for Ti in the SSG and adds edges Tj → Ti for every Tj in the SSG such that p-set[Ti] ∩ q-set[Tj] ≠ { }, for all pairs of conflicting operation types p and q.
For each data item x the scheduler maintains the usual queue[x] of delayed
operations that access x. Conflicting operations in queue[x], say pi[x] and qj[x], are kept in an order consistent with SSG edges. That is, if Tj → Ti is in the SSG, then qj[x] is closer to the head of queue[x] than pi[x]; thus, qj[x] will be dequeued before pi[x]. The order of nonconflicting operations in queue[x] (i.e., Reads) is immaterial; for specificity, let’s say they are kept in order of arrival. When the scheduler receives operation oi[x] from the TM, it inserts oi[x] in queue[x] in accordance with the ordering just specified.
The scheduler may send the operation at the head of some queue to the
DM iff the operation is “ready.” An operation pi[x] is ready if
1. all operations that conflict with pi[x] and were previously sent to the DM have been acknowledged; and
2. for every Tj that directly precedes Ti in the SSG (i.e., Tj → Ti is in the SSG) and for every operation type q that conflicts with p, either x ∉ q-set[Tj] or qj[x] has already been received by the scheduler (i.e., x ∈ q-scheduled[Tj]).
Condition (1) amounts to the usual handshake that makes sure the DM
processes conflicting operations in the order they are scheduled. Condition (2)
is what makes this scheduler avoid aborts. The rationale for it is this. Suppose Tj precedes Ti in the SSG. If the SSG is acyclic, then the execution is equivalent to a serial one in which Tj executes before Ti. Thus if pi[x] and qj[x] conflict, qj[x] must be scheduled before pi[x]. So if pi[x] is received before qj[x], it must be delayed. Otherwise, when qj[x] is eventually received it will have to be rejected, as its acceptance would create a cycle involving Ti and Tj in the SSG.
Note that to evaluate condition (2), the Conservative SGT scheduler must, in addition to o-set[Tj], maintain the sets o-scheduled[Tj], as discussed in Basic SGT.
One final remark about condition (2). You may wonder why we have limited it only to transactions that directly precede Ti. The reason is that the condition is necessarily satisfied by any transaction Tj that indirectly precedes Ti; that is, where the shortest path from Tj to Ti has more than one edge. In that case Ti and Tj do not issue conflicting operations. In particular, x ∈ p-set[Ti] implies x ∉ q-set[Tj] for all conflicting operation types p, q.
Every time it receives pi[x] from the TM or an acknowledgment of some qj[x] from the DM, the scheduler checks if the head of queue[x] is ready. If so,
it dequeues the operation and sends it to the DM. The scheduler then repeats
the same process with the new head of queue[x] until the queue is empty or its
head is not ready. The policy for discarding information about terminated
transactions is the same as for Basic SGT.
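The two-part readiness test for Conservative SGT can be sketched as follows (our names; `preds` holds the direct SSG predecessors, `pre_sets` the predeclared q-sets, and `scheduled` the q-scheduled sets):

```python
# Sketch of the Conservative SGT readiness test for pi[x] (our names).
# preds[ti]: direct SSG predecessors of Ti; pre_sets[tj][q]: q-set[Tj]
# (predeclared); scheduled[tj][q]: q-scheduled[Tj] (received so far);
# in_transit[(q, x)]: unacknowledged q-operations on x at the DM.
def conflicts(p, q):
    return p == "w" or q == "w"

def is_ready(ti, op, x, in_transit, preds, pre_sets, scheduled):
    # 1. every conflicting operation previously sent to the DM has been
    #    acknowledged
    if any(conflicts(op, q) and in_transit.get((q, x), 0) > 0
           for q in ("r", "w")):
        return False
    # 2. each direct predecessor Tj has either not predeclared a
    #    conflicting access to x, or has already submitted it
    for tj in preds.get(ti, set()):
        for q in ("r", "w"):
            if (conflicts(op, q) and x in pre_sets[tj][q]
                    and x not in scheduled[tj][q]):
                return False
    return True
```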
Recoverability Considerations
Basic and Conservative SGT produce SR histories, but not necessarily recoverable (much less cascadeless or strict) ones.
Both types of SGT schedulers can be modified to produce only strict (and
SR) histories by using the same technique as Strict TO. The scheduler sets w-in-transit[x] to 1 when it sends wi[x] to the DM. But rather than decrementing it back to zero when the DM acknowledges wi[x], the scheduler does so when it receives an acknowledgment that the DM processed ai or ci. Recall that w-in-transit[x] is used to delay sending rj[x] and wj[x] operations until a previously sent wi[x] is processed. By postponing the setting of w-in-transit[x] to zero, the scheduler delays rj[x] and wj[x] until the transaction that last wrote into x has
terminated, thereby ensuring the execution is strict.
It’s also easy to modify Basic or Conservative SGT to enforce only the weaker condition of avoiding cascading aborts. For this, it is only necessary to make sure that before scheduling ri[x], the transaction from which Ti will read x has committed. To do this, every time the scheduler receives an acknowledgment of a Commit operation, cj, it marks node Tj in the SSG as “committed.” Now suppose the scheduler receives ri[x] from the TM and accepts it. Let Tj be a transaction that satisfies:
1. x ∈ w-scheduled[Tj]; and
2. for any Tk ≠ Tj such that x ∈ w-scheduled[Tk], Tk → Tj is in the SSG.
At most one Tj can satisfy both of these conditions. (It is possible that no trans-
action does, for instance, if all transactions that have ever written into x have
been deleted from the SSG by now.) The scheduler can send ri[x] to the DM
only if Tj is marked “committed” or no such Tj exists.
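The test for the last writer of x can be sketched as follows (our representation: the SSG as a set of edge pairs, a map from transactions to their w-scheduled sets, and the set of nodes marked committed):

```python
# Sketch of the cascadeless-read check (our names). Among transactions
# with a scheduled Write on x, find the Tj that every other such writer
# precedes in the SSG; ri[x] may be sent only if that Tj is marked
# committed, or if no transaction in the SSG has written x.
def may_send_read(x, w_scheduled, ssg_edges, committed):
    """w_scheduled maps Tj -> set of items written; ssg_edges is a set
    of (Tk, Tj) pairs meaning Tk -> Tj; committed is a set of nodes."""
    writers = {t for t, items in w_scheduled.items() if x in items}
    for tj in writers:
        # condition 2: every other writer Tk has an edge Tk -> Tj
        if all((tk, tj) in ssg_edges for tk in writers - {tj}):
            return tj in committed
    return True  # no transaction in the SSG has written x
```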
The same idea can be used to enforce only the weaker condition of recoverability. The difference is that instead of delaying individual Reads, the scheduler now delays Ti’s Commit until all transactions Tj from which Ti has read either are marked “committed” or have been deleted from the SSG. Also, since in this case cascading aborts are possible, when the scheduler either receives Ti’s Abort from the TM or causes Ti to abort to break a cycle in the SSG, it also aborts any transaction Tj that read from Ti. An SGT scheduler can detect if Tj has read from Ti by checking if
1. Ti → Tj is in the SSG, and
2. there is some x ∈ r-scheduled[Tj] ∩ w-scheduled[Ti] such that for every Tk where Ti → Tk and Tk → Tj are in the SSG, x ∉ w-scheduled[Tk].
Suppose that there are k sites, and xi is stored at site i, for 1 ≤ i ≤ k. At each site i < k, the local SSG contains the edge Ti → Ti+1. Now, if T1 issues w1[xk], the local SSG at site k contains the edge Tk → T1. Thus, globally we have a cycle T1 → T2 → ... → Tk → T1, yet all local SSGs are acyclic (each consists of a single edge).
This is essentially the same problem we had with global deadlock detec-
tion in 2PL (Section 3.11). There is an important difference, however, that
makes our new problem more severe. In global deadlock detection any trans-
actions involved in a cycle in the WFG are just waiting for each other and thus
none can proceed; in particular, none can commit. So we may take our time in
checking for global cycles, merely at the risk of delaying the discovery of a
deadlock. On the other hand, transactions that lie along an SSG cycle do not
wait for each other. Since a transaction should not commit until the scheduler
has checked that the transaction is not in an SSG cycle, global cycle detection
must take place at least at the same rate as transactions are processed. In typi-
cal applications, the cost of this is prohibitive.
*See Section A.3 of the Appendix for the definition of transitive closure of a directed graph.
We compute SSG+ before deleting the node corresponding to Ti, the transaction that committed, to avoid losing indirect precedence information (represented as paths in the SSG). For example, suppose that there is a path from Tj to Tk in the SSG and that all such paths pass through node Ti. Then, deleting Ti will produce a graph that doesn’t represent the fact that Tj must precede Tk. SSG+ contains an edge Tj → Tk iff the SSG has a path from Tj to Tk. Thus SSG+ represents exactly the same precedence information as the SSG. Moreover, the only precedence information lost by deleting Ti from SSG+ pertains to Ti itself (in which we are no longer interested, since it has terminated) and to no other active transactions.
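The closure-then-delete step can be sketched as follows (our representation: the SSG as a set of node names plus a set of directed edge pairs; the saturation loop is a naive transitive closure, not an efficient one):

```python
# Sketch of deleting a terminated transaction from the SSG without losing
# indirect precedence: take the transitive closure first, then drop the
# node. Edges are (a, b) pairs meaning a -> b.
def transitive_closure(edges):
    """Naive saturation: a -> b and b -> c imply a -> c."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if c == b and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def delete_committed(nodes, edges, ti):
    """Delete committed Ti; paths through Ti survive as direct edges."""
    closure = transitive_closure(edges)
    return (nodes - {ti},
            {(a, b) for a, b in closure if ti not in (a, b)})
```

For instance, with edges T2 → T1 and T1 → T3, deleting T1 directly would lose the fact that T2 must precede T3; deleting it from the closure keeps the edge T2 → T3.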
The scheduler as described produces SR executions. By using the tech-
niques of the previous subsection we can further restrict its output to be recov-
erable, cascadeless, or strict (see Exercise 4.19).
4.4 CERTIFIERS
Introduction
So far we have been assuming that every time it receives an operation, a sched-
uler must decide whether to accept, reject, or delay it. A different approach is
to have the scheduler immediately schedule each operation it receives. From
time to time, it checks to see what it has done. If it thinks all is well, it
continues scheduling. On the other hand, if it discovers that in its hurry to
process operations it has inappropriately scheduled conflicting operations,
then it must abort certain transactions.
When it’s about to schedule a transaction Ti’s Commit, the scheduler checks whether the execution that includes ci is SR. If not, it rejects the Commit, thereby forcing Ti to abort. (It cannot check less often than on every Commit, as it would otherwise risk committing a transaction involved in a
non-SR execution.) Such schedulers are called certifiers. The process of check-
ing whether a transaction’s Commit can be safely scheduled or must be rejected
is called certification. Certifiers are sometimes called optimistic schedulers,
because they aggressively schedule operations, hoping nothing bad, such as a
non-SR execution, will happen.
There are certifiers based on all three types of schedulers - 2PL, TO, and
SGT - with either centralized or distributed control. We will explore all of
these possibilities in this section.
2PL Certification
When a 2PL certifier receives an operation from the TM, it notes the data item
accessed by the operation and immediately submits it to the DM. When it
receives a Commit, ci, the certifier checks if there is any operation pi[x] of Ti that conflicts with some operation qj[x] of some other active transaction, Tj. If so, the certifier rejects ci and aborts Ti.³ Otherwise it certifies Ti by passing ci to the DM, thereby allowing Ti to terminate successfully.
The 2PL certifier uses several data structures: a set containing the names
of active transactions, and two sets, r-scheduled[Ti] and w-scheduled[Ti], for each active transaction Ti, which contain the data items read and written, respectively, by Ti so far.
When the 2PL certifier receives ri[x] (or wi[x]), it adds x to r-scheduled[Ti] (or w-scheduled[Ti]). When the scheduler receives ci, Ti has finished executing, so r-scheduled[Ti] and w-scheduled[Ti] contain Ti’s readset and writeset, respectively. Thus, testing for conflicts can be done by looking at intersections of the r-scheduled and w-scheduled sets. To process ci, the certifier checks every other active transaction, Tj, to determine if any one of r-scheduled[Ti] ∩ w-scheduled[Tj], w-scheduled[Ti] ∩ r-scheduled[Tj], or w-scheduled[Ti] ∩ w-scheduled[Tj] is nonempty. If so, it rejects ci. Otherwise, it certifies Ti and removes Ti from the set of active transactions.
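The certification test reduces to three set intersections, which can be sketched as follows (our names; the forwarding of operations to the DM is elided):

```python
# Sketch of a 2PL certifier (our names). Operations are recorded as they
# are passed to the DM; the certification test runs at Commit time.
class Certifier2PL:
    def __init__(self):
        self.active = {}   # Ti -> {"r": readset so far, "w": writeset}

    def operation(self, ti, op, x):
        """Record ri[x] (op='r') or wi[x] (op='w'); the operation itself
        would be forwarded to the DM immediately."""
        self.active.setdefault(ti, {"r": set(), "w": set()})[op].add(x)

    def commit(self, ti):
        """Certification test for ci: True = certify, False = abort Ti."""
        me = self.active.pop(ti, {"r": set(), "w": set()})
        return all(not (me["r"] & o["w"] or me["w"] & o["r"]
                        or me["w"] & o["w"])
                   for o in self.active.values())
```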
To prove that the 2PL certifier only produces SR executions, we will
follow the usual procedure of proving that every history it allows must have an
acyclic SG.
We called this certifier a 2PL certifier, yet there was no mention of locks
anywhere! The name is justified if one thinks of an imaginary read (or write)
lock being obtained by Ti on x when x is added to r-scheduled[Ti] (or w-scheduled[Ti]). If there is ever a lock conflict between two active transactions,
³ A transaction is committed only when the scheduler acknowledges to the TM the processing of Commit, not when the TM sends the Commit to the scheduler. So, it is perfectly legitimate for the scheduler to reject ci and abort Ti at this point.
the first of them to attempt certification will be aborted. This is very similar to
a 2PL scheduler that never allows a conflicting operation to wait, but rather
always rejects it. In fact, the committed projection of every history produced
by a 2PL certifier could also have been produced by a 2PL scheduler.
To enforce recoverability, when a 2PL certifier aborts a transaction Ti, it must also abort any other active transaction Tj such that w-scheduled[Ti] ∩ r-scheduled[Tj] ≠ { }. Note that this may cause Tj to be aborted unnecessarily if, for example, there are data items in w-scheduled[Ti] ∩ r-scheduled[Tj] but Tj actually read them before Ti wrote them. However, the 2PL certifier does not keep track of the order in which conflicting operations were processed; it can’t distinguish, at certification time, the case just described from the case in which Tj read some of the items in w-scheduled[Ti] ∩ r-scheduled[Tj] after Ti wrote them. For safety, then, it must abort Tj.
One can modify the 2PL certifier to enforce the stronger conditions of
cascadelessness or strictness, although this involves delaying operations and
therefore runs counter to the optimistic philosophy of certifiers (see Exercise
4.25).
To understand the performance of 2PL certification, let’s compare it to its
on-line counterpart, Basic 2PL. Both types of scheduler check for conflicts
between transactions. Thus, the overhead for checking conflicts in the two
methods is about the same. If transactions rarely conflict, then Basic 2PL
doesn’t delay many transactions and neither Basic 2PL nor 2PL certification
aborts many. Thus, in this case throughput for the two methods is about the
same.
At higher conflict rates, 2PL certification performs more poorly. To see
why, suppose Ti issues an operation that conflicts with that of some other active transaction Tj. In 2PL certification, Ti and Tj would execute to completion, even though at least one of them is doomed to be aborted. The execution effort in completing that doomed transaction is wasted. By contrast, in 2PL Ti would be delayed. This ensures that at most one of Ti and Tj will be aborted due to the conflict. Even if delaying Ti causes a deadlock, the victim is aborted
before it executes completely, so less of its execution effort is wasted than in
2PL certification.
Quantitative studies are consistent with this intuition. In simulation and
analytic modelling of the methods, 2PL certification has lower throughput
than 2PL for most application parameters. The difference in throughput
increases with increasing conflict rate.
SGT Certification
SGT lends itself very naturally to a certifier. The certifier dynamically main-
tains an SSG of the execution it has produced so far, exactly as in Basic SGT.
Every time it receives an operation pi[x], it adds the edge Tj → Ti to the SSG for every transaction Tj such that the certifier has already sent to the DM an operation qj[x] of Tj that conflicts with pi[x].
TO Certification
A TO certifier schedules Reads and Writes without delay, except for reasons
related to handshaking between the certifier and the DM. When the certifier
receives ci, it certifies Ti if all conflicts involving operations of Ti are in timestamp order. Otherwise, it rejects ci and Ti is aborted. Thus, Ti is certified iff the execution so far satisfies the TO rule. That is, in the execution produced thus far, if some operation pi[x] precedes some conflicting operation qj[x] of transaction Tj, then ts(Ti) < ts(Tj). This is the very same condition that Basic
TO checks when it receives each operation. However, when Basic TO finds a
violation of the TO rule, it immediately rejects the operation, whereas the TO
certifier delays this rejection until it receives the transaction’s Commit. Since
allowing such a transaction to complete involves extra work, with no hope
that it will ultimately commit, Basic TO is preferable to a TO certifier.
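The certification test just described can be sketched as follows (an illustrative Python sketch under the assumption that the certifier logs every operation it outputs; names are invented): at commit, Ti passes iff every conflict involving Ti occurred in timestamp order.

```python
from collections import defaultdict

class TOCertifier:
    """Illustrative TO certifier: outputs operations without delay and
    checks the TO rule only when a transaction's Commit arrives."""

    def __init__(self):
        self.log = defaultdict(list)  # item -> [(ts, txn, op)] in output order

    def submit(self, ts, txn, op, item):
        # No delay: just record the operation in the order it was output.
        self.log[item].append((ts, txn, op))

    def certify(self, txn):
        # Ti is certified iff, for every conflicting pair involving Ti,
        # execution order agrees with timestamp order.
        for history in self.log.values():
            for i, (ts1, t1, op1) in enumerate(history):
                for (ts2, t2, op2) in history[i + 1:]:
                    conflict = t1 != t2 and 'w' in (op1, op2)
                    if conflict and txn in (t1, t2) and ts1 > ts2:
                        return False   # conflict out of timestamp order: abort
        return True
```

With ts(T1) < ts(T2), outputting w2[x] before w1[x] violates the TO rule, so T1 fails certification; Basic TO would instead have rejected w1[x] the moment it arrived.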
Distributed Certifiers
A distributed certifier consists of a collection of certifier processes, one for
each site. As with distributed schedulers, we assume that the certifier at a site is
responsible for regulating access to exactly those data items stored at that site.
Although each certifier sends operations to its respective DM independently
of other certifiers, the activity of transaction certification must be
carried out in a coordinated manner. To certify a transaction, a decision must
be reached involving all of the certifiers that received an operation of that
transaction. In SGT certification, the certifiers must exchange their local SSGs
to ensure that the global SSG does not have a cycle involving the transaction
being certified. If no such cycle exists, then the transaction is certified (by all
certifiers involved); otherwise it is aborted.
In 2PL or TO certification, each certifier can make a local decision
whether or not to certify a transaction, based on conflict information for the
data items it manages. A global decision must then be reached by consensus. If
the local decision of all certifiers involved in the certification of a transaction is
to certify the transaction, then the global decision is to certify. If even
one certifier's local decision is to abort the transaction, then the global
decision is to
abort. The fate of a transaction is decided only after this global decision has
been reached. A certifier cannot proceed to certify a transaction on the basis of
its local decision only.
This kind of consensus can be reached by using the following communication
protocol between the TM that is supervising Ti and the certifiers that
processed Ti's operations. The TM distributes Ti's Commit to all certifiers
that participated in the execution of Ti. When a certifier receives ci, it
makes a local decision, called its vote, on whether to certify Ti or not, and
sends its vote to the TM. After receiving the votes from all the certifiers
that participate in Ti's certification, the TM makes the global decision
accordingly. It then sends the global decision to all participating
certifiers, which carry it out as soon as they receive it.4
Using this method for reaching a unanimous decision, a certifier may vote
to certify a transaction, yet end up having to abort it because some other certi-
fier voted not to certify. Thus, a certifier that votes to certify experiences a
period of uncertainty on the fate of the transaction, namely, the period
between the moment it sends its vote and the moment it receives the global
decision from the TM. Of course, a certifier that votes to abort is not
uncertain about the transaction. It knows it will eventually be aborted by all
certifiers.
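This voting protocol can be sketched as follows (an illustrative Python sketch; the class, method names, and the boolean `will_certify` flag are invented stand-ins for each site's local certification test): the TM collects one vote per participating certifier and the global decision is commit only if the votes are unanimous.

```python
class SiteCertifier:
    """Illustrative per-site certifier participating in the vote."""

    def __init__(self, will_certify):
        self.will_certify = will_certify  # stand-in for the local test
        self.decisions = {}               # txn -> global decision received

    def local_vote(self, txn):
        # Between sending a "certify" vote and hearing the global
        # decision, this certifier is uncertain about txn's fate.
        return self.will_certify

    def apply(self, txn, decision):
        # Carry out the global decision as soon as it is received.
        self.decisions[txn] = decision


def certify_distributed(txn, certifiers):
    """TM side: distribute the Commit, gather votes, decide by consensus."""
    votes = [c.local_vote(txn) for c in certifiers]
    decision = 'commit' if all(votes) else 'abort'
    for c in certifiers:
        c.apply(txn, decision)
    return decision
```

A single "abort" vote forces every site, including those that voted to certify, to abort the transaction.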
T4, where ts(T1) < ts(T2) < ts(T3) < ts(T4); T1, T2, and T4 just write x; and
T3 reads x. Now, suppose the scheduler schedules w1[x], r3[x], w4[x] in that
order, and then receives w2[x]. TWR says that it's safe for the ww
synchronizer to accept w2[x] but not process it. But this seems wrong, since
T2 should have written x before r3[x] read the value written by w1[x]. This is
true. But the problem is one of synchronizing Reads against Writes and
therefore none of the ww synchronizer's business. The rw synchronizer must
somehow prevent this situation.
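The ww half of this division of labor can be sketched as follows (an illustrative Python sketch of TWR, not the book's pseudocode): a write whose timestamp is older than the largest write timestamp already applied to an item is acknowledged but not processed.

```python
class TWRSynchronizer:
    """Illustrative Thomas Write Rule ww synchronizer."""

    def __init__(self):
        self.max_w_ts = {}   # item -> largest write timestamp applied so far
        self.value = {}      # item -> current value (stand-in for the DM)

    def write(self, ts, item, value):
        if ts >= self.max_w_ts.get(item, -1):
            self.max_w_ts[item] = ts
            self.value[item] = value
            return 'applied'
        # Safe with respect to ww conflicts only; the rw conflict with
        # any Read that should have seen this write is not TWR's business.
        return 'ignored'
```

Replaying the example's writes with ts(T1) = 1 < ts(T2) = 2 < ts(T4) = 4: w1[x] and w4[x] are applied, and the late w2[x] is ignored, leaving w4[x]'s value in place. The Read r3[x] is deliberately omitted, since synchronizing it against w2[x] is the rw synchronizer's job.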
This example drives home the division of labor between rw and ww
synchronizers. And it emphasizes once more that care must be taken in inte-
grating rw and ww synchronizers to obtain a correct scheduler.
We will examine two integrated schedulers, one using Basic TO for rw
synchronization and TWR for ww synchronization, and another using 2PL for
rw synchronization and TWR for ww synchronization. The first is a pure
integrated scheduler because both rw and ww synchronization are achieved by the
same mechanism, TO. The second is a mixed integrated scheduler because it
combines a 2PL rw synchronizer with a TWR ww synchronizer.
BIBLIOGRAPHIC NOTES
Early TO based algorithms appear in [Shapiro, Millstein 77a], [Shapiro,
Millstein 77b], and [Thomas 79]. The latter paper, first published in 1976 as
a technical report, also introduced certification and TWR, and was the first
to apply voting to replicated data (see Chapter 8). An elaborate TO algorithm
using TWR, classes, and conflict graphs was built in the SDD-1 distributed DBS
[Bernstein et al. 78], [Bernstein, Shipman 80], [Bernstein, Shipman, Rothnie
80], and [McLean 81]. Other TO algorithms include [Cheng, Belford 80] and
[Kaneko et al. 79]. Multigranularity locking ideas are applied to TO in
[Carey 83].

Serialization graph testing has been studied in [Badal 79], [Casanova 81],
[Hadzilacos, Yannakakis 86], and [Schlageter 78]. [Casanova 81] contains the
space-efficient SGT scheduler in Section 4.3.

The term "optimistic" scheduler was coined in [Kung, Robinson 81], who
developed the concept of certifier independently of [Thomas 79]. Other work
on certifiers includes [Haerder 84], [Kersten, Tebra 84], and [Robinson 82].
[Lai, Wilkinson 84] studies the atomicity of the certification activity. The
performance of certifiers is analyzed in [Menasce, Nakanishi 82a], [Morris,
Wong 84], [Morris, Wong 85], and [Robinson 82].

The rw and ww synchronization paradigm of Section 4.5 is from [Bernstein,
Goodman 81]. The 2PL and TWR mixed integrated scheduler is from [Bernstein,
Goodman, Lai 83].
EXERCISES
4.1 In Basic TO, suppose the scheduler adjusts max-q-scheduled[x] when
it sends qi[x] to the DM, instead of when it adds qi[x] to queue[x]. What
effect does this have on the rate at which the scheduler rejects operations?
What are the benefits of this modification to Basic TO?
4.2 Modify Basic TO to avoid cascading aborts. Modify it to enforce the
weaker condition of recoverability. Explain why your modified schedulers
satisfy the required conditions.
4.3 In Basic TO, under what conditions (if any) is it necessary to insert an
operation pi[x] into queue[x] other than at the end?
4.4 Generalize the Basic TO scheduler to handle arbitrary operations
(e.g., Increment and Decrement).
4.5 Modify the Basic TO scheduler of the previous problem to avoid
cascading aborts. Does the compatibility matrix contain enough information
to make this modification? If not, explain what additional information is
needed. Prove that the resulting scheduler produces histories that are
cascadeless.
4.6 Compare the behavior of distributed 2PL with Wait-Die deadlock
prevention to that of distributed Basic TO.
4.7 Prove that the Strict TO scheduler of Section 4.2 produces strict
histories.
4.8 Design a conservative TO scheduler that uses knowledge of process
speeds and message delays to avoid rejecting operations.
4.9 Prove that the ultimate conservative TO scheduler in Section 4.2
produces SR histories.
4.10 Modify the ultimate conservative TO scheduler in Section 4.2 so that
each TM can manage more than one transaction concurrently.
4.11 Design an ultimate conservative TO scheduler that avoids rejections
by exploiting predeclaration. (Do not use the TM architecture of Section
4.2, where each TM submits operations to the DM in timestamp order.)
Prove that your scheduler produces SR executions.
4.12 Design a way of changing class definitions on-line in conservative TO.
4.13 Design a TO scheduler that guarantees the following property: For
any history H produced by the scheduler, there is an equivalent serial
history Hs such that if Ti is committed before Tj in H, then Ti precedes Tj
in Hs. (Ti and Tj may not have operations that conflict with each other.)
Prove that it has the property.
4.14 A conflict graph for a set of classes is an undirected graph whose nodes
include RI and WI for each class I, and whose edges include

- (RI, WI) for all I,
- (RI, WJ) if the readset of class I intersects the writeset of class J, and
- (WI, WJ) if the writeset of class I intersects the writeset of class
  J (I ≠ J).

Suppose each class is managed by one TM, and that each TM executes
transactions serially. A transaction can only be executed by a TM if it is a
member of the TM's class.
a. Suppose the conflict graph has no cycles. What additional constraints,
if any, must be imposed by the scheduler to ensure SR executions?
Prove that the scheduler produces SR executions.
b. Suppose the scheduler uses TWR for ww synchronization, and the
conflict graph has no cycles containing an (RI, WJ) edge (i.e., all cycles
only contain (WI, WJ) edges). What additional constraints, if any,
must be imposed by the scheduler to ensure SR executions? Prove that
the scheduler produces SR executions.
4.15 If the size of the timestamp table is too small, then too many recent
timestamps will have to be deleted in order to make room for even more
recent ones. This will cause a TO scheduler to reject some older operations
that access data items whose timestamps were deleted from the table. An
interesting project is to study this effect quantitatively, either by simulation
or by mathematical analysis.
4.16 Prove that the conservative SGT scheduler described in Section 4.3
produces SR executions.
4.17 Show that for any history produced by an SGT scheduler, there exists
an assignment of timestamps to transactions such that the same history
could be produced by a TO scheduler.
4.18 Design an SGT scheduler that guarantees the following property: For
any history H produced by the scheduler, there is an equivalent serial
history Hs such that if Ti is committed before Tj in H, then Ti precedes Tj
in Hs. Prove that it has the property.
4.19 Modify the space-efficient SGT scheduler described in Section 4.3 to
produce recoverable, cascadeless, and strict executions. Explain why each
of your modified schedulers satisfies the required condition.
4.20 Give a serializability theoretic correctness proof of the space-efficient
SGT scheduler described in Section 4.3.
4.21 Although the space requirements of both 2PL and space-efficient SGT
are proportional to a · |D|, where a is the number of active transactions
and |D| is the size of the database, space-efficient SGT will usually require
somewhat more space than 2PL. Explain why.
4.22 Since a certifier does not control the order in which Reads and Writes
execute, a transaction may read arbitrarily inconsistent data. A correct
certifier will eventually abort any transaction that reads inconsistent data,
but this may not be enough to avoid bad results. In particular, a program
may not check the data it reads from the database carefully enough to
avoid malfunctioning in the event that it reads inconsistent data; for
example, it may go into an infinite loop. Give a realistic example of a
transaction that malfunctions in this way using 2PL certification, but never
malfunctions using Basic 2PL.
4.23 Prove that the committed projection of every history produced by a
2PL certifier could have been produced by a 2PL scheduler.
4.24 Give an example of a complete history that could be produced by a
2PL certifier but not by a 2PL scheduler. (In view of the previous exercise,
the history must include aborted transactions.)
4.25 Modify the 2PL certifier so that it avoids cascading aborts. What
additional modifications are needed to ensure strictness? Explain why
each modified certifier satisfies the required condition.
4.26 If a 2PL certifier is permitted to certify two (or more) transactions
concurrently, is there a possibility that it will produce a non-SR execution?
Suppose the 2PL certifier enforces recoverability. If it is permitted to certify