5
MULTIVERSION
CONCURRENCY CONTROL

5.1 INTRODUCTION
In a multiversion concurrency control algorithm, each Write on a data item x
produces a new copy (or version) of x. The DM that manages x therefore
keeps a list of versions of x, which is the history of values that the DM has
assigned to x. For each Read(x), the scheduler not only decides when to send
the Read to the DM, but it also tells the DM which one of the versions of x to
read.
The benefit of multiple versions for concurrency control is to help the
scheduler avoid rejecting operations that arrive too late. For example, the
scheduler normally rejects a Read because the value it was supposed to read
has already been overwritten. With multiversions, such old values are never
overwritten and are therefore always available to tardy Reads. The scheduler
can avoid rejecting the Read simply by having the Read read an old version.
Maintaining multiple versions may not add much to the cost of concur-
rency control, because the versions may be needed anyway by the recovery
algorithm. As we’ll see in the next chapter, many recovery algorithms have to
maintain some before image information, at least of those data items that have
been updated by active transactions; the recovery algorithm needs those before
images in case any of the active transactions abort. The before images of a data
item are exactly its list of old versions. It is a small step for the DM to make
those versions explicitly available to the scheduler.
An obvious cost of maintaining multiple versions is storage space. To
control this storage requirement, versions must periodically be purged or
archived. Since certain versions may be needed by active transactions, purging
versions must be synchronized with respect to active transactions. This purging
activity is another cost of multiversion concurrency control.
We assume that if a transaction is aborted, any versions it created are
destroyed. In our subsequent discussion, the term “version” will refer to the
value of a data item produced by a transaction that’s either active or commit-
ted. Thus, when the scheduler decides to assign a particular version of x to
Read(x), the value returned is not one produced by an aborted transaction. If
the version read is one produced by an active transaction, recoverability
requires that the reader’s commitment be delayed until the transaction that
produced the version has committed. If that transaction actually aborts
(thereby invalidating its version), the reader must also be aborted.
The existence of multiple versions is only visible to the scheduler and DM,
not to user transactions. Transactions still reference data items, such as x and
y. Users therefore expect the DBS to behave as if there were only one version of
each data item, namely, the last one that was written from that user’s perspec-
tive. The scheduler may use multiple versions to improve performance by
rejecting operations less frequently. But it must not change the system’s
functionality over a single version view of the database.
There are many applications of databases in which users do want to
explicitly access each of the multiple versions of a data item. For example, a
user may wish to maintain several versions of a design database: the last design
released for manufacturing, the last design checked for correctness, and the
most recent working design. The user may update each version of the design
independently. Since the existence of these multiple versions is not transparent
to the user, such applications are not appropriate for the multiversion concur-
rency control algorithms described in this chapter.

Analyzing Correctness
To analyze the correctness of multiversion concurrency control algorithms, we
need to extend serializability theory. This extension requires two types of
histories: multiversion (MV) histories that represent the DM’s execution of
operations on a multiversion database, and single version (1V) histories that
represent the interpretation of MV histories in the users’ single version view of
the database. Serial 1V histories are the histories that the user regards as
correct. But the system actually produces MV histories. So, to prove that a
concurrency control algorithm is correct, we must prove that each of the MV
histories that it can produce is equivalent to a serial 1V history.
What does it mean for an MV history to be equivalent to a 1V history?
Let’s try to answer this by extending the definition of equivalence of 1V histories
that we used in Chapters 2-4. To attempt this extension, we need a little
notation. For each data item x, we denote the versions of x by xi, xj, ..., where
the subscript is the index of the transaction that wrote the version. Thus, each
Write in an MV history is always of the form wi[xi], where the version
subscript equals the transaction subscript. Reads are denoted in the usual way,
such as ri[xj].
Suppose we adopt a definition of equivalence that says an MV history
HMV is equivalent to a 1V history H1V if every pair of conflicting operations in
HMV is in the same order in H1V. Consider the MV history

H1 = w0[x0] c0 w1[x1] c1 r2[x0] w2[y2] c2.

The only two operations in H1 that conflict are w0[x0] and r2[x0]. The operation
w1[x1] does not conflict with either w0[x0] or r2[x0], because it operates on a
different version of x than those operations, namely x1. Now consider the 1V
history

H2 = w0[x] c0 w1[x] c1 r2[x] w2[y] c2.

We constructed H2 by mapping each operation on versions x0, x1, and y2 in H1
into the same operation on the corresponding data items x and y. Notice that
the two operations in H1 that conflict, w0[x0] and r2[x0], are in the same order
in H2 as in H1. So, according to the definition of equivalence just given, H1 is
equivalent to H2. But this is not reasonable. In H1, T2 reads x from T0, whereas
in H2, T2 reads x from T1.¹ Since T2 reads a different value of x in H1 and H2, it
may write a different value in y.
This definition of equivalence based on conflicts runs into trouble because
MV and 1V histories have slightly different operations: version operations
versus data item operations. These operations have different conflict properties.
For example, w1[x1] does not conflict with r2[x0], but their corresponding
1V operations w1[x] and r2[x] do conflict. Therefore, a definition of equivalence
based on conflicts is inappropriate.
To solve this problem, we need to return to first principles by adopting the
more fundamental definition of view equivalence developed in Section 2.6.
Recall that two histories are view equivalent if they have the same reads-from
relationships and the same final writes. Comparing histories H1 and H2, we see
that T2 reads x from T0 in H1, but T2 reads x from T1 in H2. Thus, H1 is not
view equivalent to H2.
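
The comparison of reads-from relationships can be checked mechanically. The following is a minimal sketch, not part of the original text: the tuple encoding of operations and the function names reads_from_mv and reads_from_1v are assumptions made for illustration, and the 1V helper assumes a totally ordered history with no aborts.

```python
# Assumed encoding (not from the text): ("w", txn, item, version),
# ("r", txn, item, version), ("c", txn). In a 1V history the version is None.

def reads_from_mv(history):
    """In an MV history, r_j[x_i] means that T_j reads x from T_i."""
    return {(op[1], op[2], op[3]) for op in history if op[0] == "r"}

def reads_from_1v(history):
    """In a 1V history without aborts, a Read reads from the latest
    preceding Write on the same data item."""
    last_writer = {}
    result = set()
    for op in history:
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r":
            result.add((op[1], op[2], last_writer[op[2]]))
    return result

# MV history: w0[x0] c0 w1[x1] c1 r2[x0] w2[y2] c2
H1 = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2)]
# 1V history: the same operations mapped onto data items x and y
H2 = [op[:3] + (None,) if op[0] != "c" else op for op in H1]

# Each triple is (reader, data item, writer).
print(reads_from_mv(H1))   # {(2, 'x', 0)}
print(reads_from_1v(H2))   # {(2, 'x', 1)}
```

The two sets differ, which is exactly why the MV history and its 1V image are not view equivalent even though their conflicting operations appear in the same order.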
¹Recall from Section 2.4 that Ti reads x from Tj in H if (1) wj[x] < ri[x], (2) aj ≮ ri[x], and (3)
if there is some wk[x] such that wj[x] < wk[x] < ri[x], then ak < ri[x].

Now that we have a satisfactory definition of equivalence, we need a way
of showing that every MV history H produced by a given multiversion concurrency
control algorithm is equivalent to a serial 1V history. One way would be
to show that SG(H) is acyclic, so H is equivalent to a serial MV history. Unfortunately,
this doesn’t help much, because not every serial MV history is equivalent
to a serial 1V history. For example, consider the serial MV history

H3 = w0[x0] w0[y0] c0 r1[x0] r1[y0] w1[x1] w1[y1] c1 r2[x0] r2[y1] c2.

If we treat the versions of x and y as independent data items, then we get

SG(H3) = T0 → T1 → T2.

Although H3 is serial and SG(H3) is acyclic, H3 is not equivalent to a serial
1V history. For example, consider the 1V history

H4 = w0[x] w0[y] c0 r1[x] r1[y] w1[x] w1[y] c1 r2[x] r2[y] c2.

We can show that H3 is not view equivalent to H4 by showing that they do not
have the same reads-from relationships. In H4, T2 reads x and y from T1. But in
H3, T2 reads x from T0 and reads y from T1. Since T2 reads different values in
H3 and H4, the two histories are not equivalent. Similarly, H3 is not equivalent
to the 1V history

H5 = w0[x] w0[y] c0 r2[x] r2[y] c2 r1[x] r1[y] w1[x] w1[y] c1.

Clearly, H3 is not equivalent to any 1V serial history over the same set of
transactions.
Only a subset of serial MV histories, called 1-serial MV histories, are
equivalent to serial 1V histories. Intuitively, a serial MV history is 1-serial if
for each reads-from relationship, say Ti reads x from Tj, Tj is the last transaction
preceding Ti that writes any version of x. Notice that H3 is not 1-serial
because T2 reads x from T0, not T1, which is the last transaction preceding T2
that writes x.
All 1-serial MV histories are equivalent to serial 1V histories, so we can
define 1-serial histories to be correct. To prove that a multiversion concurrency
control algorithm is correct, we must show that its MV histories are equivalent
to 1-serial MV histories. We will do this by defining a new graph structure
called a multiversion serialization graph (MVSG). An MV history is equivalent
to a 1-serial MV history iff it has an acyclic MVSG. Now proving multiversion
concurrency control algorithms correct is just like standard serializability
theory. We simply prove that the algorithm’s histories have acyclic MVSGs. We now
proceed with a formal development of this line of proof.

5.2 *MULTIVERSION SERIALIZABILITY THEORY²

Let T = {T0, ..., Tn} be a set of transactions, where the operations of Ti are
ordered by <i for 0 ≤ i ≤ n. To process operations from T, a multiversion
scheduler must translate T’s operations on (single version) data items into
operations on specific versions of those data items. We formalize this translation
by a function h that maps each wi[x] into wi[xi], each ri[x] into ri[xj] for
some j, each ci into ci, and each ai into ai.

²This section requires reading Section 2.6 as a prerequisite. We recommend skipping this and
other starred sections of this chapter on the first reading, to gain some intuition for multiversion
algorithms before studying their serializability theory.

A complete multiversion (MV) history H over T is a partial order with
ordering relation < where
1. H = h(T0 ∪ T1 ∪ ... ∪ Tn) for some translation function h;
2. for each Ti and all operations pi, qi in Ti, if pi <i qi, then h(pi) < h(qi);
3. if h(rj[x]) = rj[xi], then wi[xi] < rj[xi];
4. if wi[x] <i ri[x], then h(ri[x]) = ri[xi]; and
5. if h(rj[x]) = rj[xi], i ≠ j, and cj ∈ H, then ci < cj.
Condition (1) states that the scheduler translates each operation submitted
by a transaction into an appropriate multiversion operation. Condition (2)
states that the MV history preserves all orderings stipulated by transactions.
Condition (3) states that a transaction may not read a version until it has been
produced.³ Condition (4) states that if a transaction writes into a data item x
before it reads x, then it must read the version of x that it previously created.
This ensures that H is consistent with the implied semantics of the transactions
over which it is defined. If H satisfies condition (4), we say that it preserves
reflexive reads-from relationships. Condition (5) says that before a transaction
commits, all the transactions that produced versions it read must have already
committed. If H satisfies this condition we say it is recoverable.
An MV history H is a prefix of a complete MV history. We say that an MV
history preserves reflexive reads-from relationships (or is recoverable) if it is
the prefix of a complete MV history that does so. As in 1V histories, a transaction
Ti is committed (respectively aborted) in an MV history H if ci (respectively
ai) is in H. Also, the committed projection of an MV history H, denoted
C(H), is defined as for 1V histories; that is, C(H) is obtained by removing from
H the operations of all but the committed transactions. It is easy to check that
if H is an MV history then C(H) is a complete MV history, i.e., C(H) satisfies
conditions (1) - (5) (see Exercise 5.2).
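
As an illustration, conditions (3) through (5) can be checked by a single scan when the history is given as a total order. The sketch below is our own, under stated assumptions: the tuple encoding and the name check_mv_history are hypothetical, and conditions (1) and (2), which relate the history to the transaction programs, are not checked.

```python
def check_mv_history(history):
    """Check conditions (3)-(5) on a totally ordered MV history.

    Assumed encoding: ("w", i, x, i) for w_i[x_i], ("r", j, x, i) for
    r_j[x_i], and ("c", i) for c_i. Returns None if the three conditions
    hold, otherwise the number of the first condition found violated.
    """
    produced = set()    # versions written so far, as (item, writer)
    sources = {}        # reader -> set of writers whose versions it read
    committed = set()
    for op in history:
        kind, txn = op[0], op[1]
        if kind == "w":
            produced.add((op[2], txn))
        elif kind == "r":
            item, writer = op[2], op[3]
            if (item, writer) not in produced:
                return 3    # read of a version that has not been produced
            if (item, txn) in produced and writer != txn:
                return 4    # wrote x earlier but did not read its own version
            sources.setdefault(txn, set()).add(writer)
        elif kind == "c":
            if any(w != txn and w not in committed
                   for w in sources.get(txn, ())):
                return 5    # commits before a transaction it read from
            committed.add(txn)
    return None

w0, c0 = ("w", 0, "x", 0), ("c", 0)
r1, w1, c1 = ("r", 1, "x", 0), ("w", 1, "x", 1), ("c", 1)

print(check_mv_history([w0, c0, r1, w1, c1]))   # None
print(check_mv_history([r1, w0, c0, c1]))       # 3
print(check_mv_history([w0, c0, w1, r1, c1]))   # 4
print(check_mv_history([w0, r1, c1, c0]))       # 5
```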
For example, given transactions {T0, T1, T2, T3, T4}, where operations grouped
in braces are unordered with respect to each other,

T0 = {w0[x], w0[y], w0[z]} → c0
T1 = r1[x] → w1[y] → c1
T2 = {r2[x], r2[z]} → w2[x] → c2
T3 = r3[z] → {w3[y], w3[z]} → c3
T4 = {r4[x], r4[y], r4[z]} → c4

the following history, H6, is a complete MV history over {T0, T1, T2, T3, T4}.

³To ensure condition (3), we will normally include in our examples an initializing transaction,
T0, that creates the initial version of each data item.

All complete MV histories over a set of transactions must have the same
Writes, but they need not have the same Reads. For example, H7 has r4[y1]
instead of r4[y3].

MV History Equivalence
Two 1V histories over the same transactions are view equivalent if they contain
the same operations, have the same reads-from relationships, and the same
final writes. However, for MV histories, we can safely drop “and the same
final writes” from the definition. If two histories are over the same transactions,
then they have the same Writes. Since no versions are overwritten, all
Writes are effectively final writes. Thus, if two MV histories over the same
transactions have the same operations and the same reads-from relationships,
then they have the same final writes and are therefore view equivalent.
To formalize the definition of equivalence, we must formalize the notion of
reads-from in MV histories. To do this, we replace the notion of data item by
version in the ordinary definition of reads-from for 1V histories. Transaction
Tj reads x from Ti in MV history H if Tj reads the version of x produced by Ti.
Since the version of x produced by Ti is xi, Tj reads x from Ti in H iff Tj reads
xi, that is, iff rj[xi] ∈ H.
Two MV histories over a set of transactions T are equivalent, denoted ≡,
if they have the same operations and the same reads-from relationships. In
view of the preceding discussion, having the same reads-from relationships
amounts to having the same Read operations. Therefore, equivalence of MV
histories reduces to a trivial condition, as stated in the following proposition.

Proposition 5.1: Two MV histories over a set of transactions are equivalent
iff the histories have the same operations. □
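
In code, the proposition reduces the equivalence test to a comparison of operation sets; the order of operations plays no role. A minimal sketch, assuming our own tuple encoding of operations and a hypothetical function name equivalent:

```python
def equivalent(h1, h2):
    """MV histories over the same transactions are equivalent iff they
    contain the same operations (the order of operations is ignored)."""
    return set(h1) == set(h2)

# Assumed encoding: ("w", i, x, i) is w_i[x_i], ("r", j, x, i) is r_j[x_i].
a = [("w", 0, "x", 0), ("c", 0), ("r", 1, "x", 0), ("w", 1, "x", 1),
     ("c", 1), ("r", 2, "x", 0), ("c", 2)]
# Same operations, with T2 moved ahead of T1.
b = [("w", 0, "x", 0), ("c", 0), ("r", 2, "x", 0), ("c", 2),
     ("r", 1, "x", 0), ("w", 1, "x", 1), ("c", 1)]
# Same order as a, but T2 reads x1 instead of x0: a different Read operation.
c = a[:5] + [("r", 2, "x", 1), ("c", 2)]

print(equivalent(a, b))   # True
print(equivalent(a, c))   # False
```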

Next we want to define the equivalence of an MV history HMV to a 1V history
H1V. We will only be interested in such an equivalence if H1V is a valid one
version view of HMV. That is, H1V and HMV must be over the same set of transactions
and their operations must be in one-to-one correspondence. More
precisely, there must be a bijective (one-to-one and onto) function from the
operations of H1V to those of HMV, mapping ci to ci, ai to ai, ri[x] to ri[xj] for
some version xj of x, and wi[x] to wi[xi].
Given that the operations of HMV and H1V are in one-to-one correspondence,
we can talk about their reads-from relationships being the same. We
need not worry about final writes; all of the final writes in H1V must be part of
the state produced by HMV, because HMV retains all versions written in it. So,
just like MV histories, an MV history and 1V history are equivalent if they
have the same reads-from relationships.⁴

Serialization Graphs
Two operations in an MV history conflict if they operate on the same version
and one is a Write. Only one pattern of conflict is possible in an MV history: if
pi < qj and these operations conflict, then pi is wi[xi] and qj is rj[xi] for some
data item x. Conflicts of the form wi[xi] < wj[xi] are impossible, because each
Write produces a unique new version. Conflicts of the form rj[xi] < wi[xi] are
impossible, because Tj cannot read xi until it has been produced. Thus, all
conflicts in an MV history correspond to reads-from relationships.
The serialization graph for an MV history is defined as for a 1V history.
But since only one kind of conflict is possible in an MV history, SGs are quite
simple. Let H be an MV history. SG(H) has nodes for the committed transactions
in H and edges Ti → Tj (i ≠ j) whenever for some x, Tj reads x from Ti.
That is, Ti → Tj is present iff for some x, rj[xi] (i ≠ j) is an operation of C(H).
This gives us the following proposition.
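
Because every edge of SG(H) comes from a Read of a committed transaction, the graph can be read directly off the Read operations. The following sketch assumes our own tuple encoding and a hypothetical function name serialization_graph; the sample history is w0[x0] c0 w1[x1] c1 r2[x0] w2[y2] c2 extended with a Read by an uncommitted transaction T3.

```python
def serialization_graph(history):
    """Return (nodes, edges) of SG(H) for an MV history H.

    Assumed encoding: ("w", i, x, i), ("r", j, x, i) for r_j[x_i], ("c", i).
    Nodes are the committed transactions; Ti -> Tj is an edge iff
    r_j[x_i] with i != j is an operation of the committed projection.
    """
    committed = {op[1] for op in history if op[0] == "c"}
    edges = {(op[3], op[1]) for op in history
             if op[0] == "r" and op[1] != op[3]
             and op[1] in committed and op[3] in committed}
    return committed, edges

history = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
           ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2),
           ("r", 3, "x", 1)]          # T3 is still active

nodes, edges = serialization_graph(history)
print(sorted(nodes))   # [0, 1, 2]
print(edges)           # {(0, 2)}
```

T3 contributes neither a node nor an edge, since only the committed projection matters.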

Proposition 5.2: Let H and H' be MV histories. If H ≡ H', then SG(H)
= SG(H'). □

The serialization graphs of H6 and H7 follow.

[Figure: the serialization graphs SG(H6) and SG(H7), each over the nodes T0, T1, T2, T3, T4.]

⁴Two 1V histories can be equivalent in this sense without being view equivalent to each other,
because they don’t have the same final writes.

One Copy Serializability


A complete MV history is serial if for every two transactions Ti and Tj that
appear in H, either all of Ti’s operations precede all of Tj’s or vice versa. Not all
serial MV histories behave like ordinary serial 1V histories, for example, H3.
The subset of serial MV histories that are equivalent to serial 1V histories is
defined as follows.
A serial MV history H is one-copy serial (or 1-serial) if for all i, j, and x, if
Ti reads x from Tj, then i = j, or Tj is the last transaction preceding Ti that
writes into any version of x. Since H is serial, the word last in this definition is
well defined. History H3 is not 1-serial because T2 reads x from T0 but w0[x0] <
w1[x1] < r2[x0]. The history that follows is 1-serial.

An MV history is one-copy serializable (or 1SR) if its committed projection
is equivalent to a 1-serial MV history.⁵ For example, H6 is 1SR because
C(H6) = H6 is equivalent to the 1-serial history just given, which you can verify
by Proposition 5.1. C(H7) = H7 is equivalent to no 1-serial history, and thus H7
is not 1SR.
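
For small histories the definition can be tested by brute force: by Proposition 5.1, any serial arrangement of the committed transactions has the same operations and is therefore equivalent to the committed projection, so it suffices to look for one arrangement that is 1-serial. The sketch below is our own illustration (the encoding and the names is_one_serial and one_copy_serializable are assumptions), and it is exponential in the number of transactions.

```python
from itertools import permutations

def is_one_serial(serial_history):
    """Scan a serial MV history; every Read r_j[x_i] must name the last
    transaction that wrote any version of x before it (T_j itself if T_j
    wrote x earlier)."""
    last_writer = {}
    for op in serial_history:
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r" and last_writer.get(op[2]) != op[3]:
            return False
    return True

def one_copy_serializable(history):
    """Return a 1-serial order of the committed transactions, or None."""
    committed = [op[1] for op in history if op[0] == "c"]
    ops = {t: [op for op in history if op[1] == t] for t in committed}
    for order in permutations(committed):
        candidate = [op for t in order for op in ops[t]]
        if is_one_serial(candidate):
            return order
    return None

# w0[x0] c0 r1[x0] w1[x1] c1 r2[x0] c2: serial, not 1-serial, but 1SR.
example = [("w", 0, "x", 0), ("c", 0), ("r", 1, "x", 0), ("w", 1, "x", 1),
           ("c", 1), ("r", 2, "x", 0), ("c", 2)]
# w0[x0] w0[y0] c0 r1[x0] r1[y0] w1[x1] w1[y1] c1 r2[x0] r2[y1] c2
h3 = [("w", 0, "x", 0), ("w", 0, "y", 0), ("c", 0),
      ("r", 1, "x", 0), ("r", 1, "y", 0), ("w", 1, "x", 1), ("w", 1, "y", 1),
      ("c", 1), ("r", 2, "x", 0), ("r", 2, "y", 1), ("c", 2)]

print(is_one_serial(example))           # False
print(one_copy_serializable(example))   # (0, 2, 1)
print(one_copy_serializable(h3))        # None
```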
A serial history can be 1SR even though it is not 1-serial. For example,
the serial MV history

w0[x0] c0 r1[x0] w1[x1] c1 r2[x0] c2

is not 1-serial since T2 reads x from T0 instead of T1. But it is 1SR, because it is
equivalent to

w0[x0] c0 r2[x0] c2 r1[x0] w1[x1] c1.

To justify the value of one-copy serializability as a correctness criterion, we
need to show that the committed projection of every 1SR history is equivalent
to a serial 1V history.

⁵It turns out that this is a prefix commit-closed property. Unlike view serializability, we need
not require that the committed projection of every prefix of an MV history be equivalent to a
1-serial MV history. This follows from the fact that the committed projection of the history
itself is equivalent to a 1-serial MV history (see Exercise 5.4).

Theorem 5.3: Let H be an MV history over T. C(H) is equivalent to a
serial, 1V history over T iff H is 1SR.
Proof: (If) Since H is 1SR, C(H) is equivalent to a 1-serial MV history
Hs. Translate Hs into a serial 1V history Hs' by translating each wi[xi] into
wi[x] and rj[xi] into rj[x]. To show Hs ≡ Hs', consider any reads-from relationship
in Hs, say Tj reads x from Ti. Since Hs is 1-serial, no wk[xk] lies
between wi[xi] and rj[xi]. Hence no wk[x] lies between wi[x] and rj[x] in Hs'.
Thus Tj reads x from Ti in Hs'. Now consider a reads-from relationship in
Hs', say Tj reads x from Ti. If rj[x] was translated from rj[xi] in Hs, then Tj
reads x from Ti in Hs and we’re done. So assume instead that rj[x] was
translated from rj[xk], k ≠ i. If i = j, then k = i by condition (4) in the
definition of MV history and we’re done. If i ≠ j, then since Hs is 1-serial,
either wi[xi] < wk[xk] or rj[xk] < wi[xi]. But then, translating these operations
into Hs' implies that Tj does not read x from Ti in Hs', a contradiction.
Thus Tj reads x from Ti in Hs. This establishes Hs' ≡ Hs. Since Hs ≡
C(H), C(H) ≡ Hs' follows by transitivity of equivalence.
(Only if) Let Hs' be the hypothesized serial 1V history equivalent to
C(H). Translate Hs' into a serial MV history Hs by translating each ci into
ci, wi[x] into wi[xi], and rj[x] into rj[xi] such that Tj reads x from Ti in Hs'.
We must show that Hs is indeed a complete MV history. It is immediate
that it satisfies conditions (1) and (2) of the complete MV history definition.
For condition (3), it is enough to show that each rj[x] is preceded by
some wi[x] in Hs'. Since H is an MV history, each rj[xk] in C(H) is preceded
by wk[xk]. Since Hs' has the same reads-from relationships as the MV
history C(H), every Read in Hs' must be preceded by a Write on the same
data item, as desired. To show Hs satisfies condition (4), note that if wj[x]
< rj[x] in Hs', then since Hs' is serial, Tj reads x from Tj in Hs' and rj[x] is
translated into rj[xj] in Hs. Finally, for condition (5) we must show that
if rj[xi] (i ≠ j) is in Hs then ci < cj. If rj[xi] is in Hs then Tj reads x from Ti
in Hs'. Since Hs' is serial and Ti, Tj are committed in it, we have ci < cj in
Hs'. By the translation then, it follows that ci < cj in Hs, as wanted. This
concludes the proof that Hs is indeed an MV history. Since the translation
preserves reads-from relationships, Hs ≡ Hs'. By transitivity,
C(H) ≡ Hs.
It remains to prove that Hs is 1-serial. So consider any reads-from relationship
in Hs, say Tj reads x from Ti, where i ≠ j. Since Hs' is a 1V history,
no wk[x] lies between wi[x] and rj[x]. Hence no wk[xk] lies between wi[xi]
and rj[xi] in Hs. Thus, Hs is 1-serial, as desired. □

The 1-Serializability Theorem


To determine if a multiversion concurrency control algorithm is correct, we
must determine if all of its histories are 1SR. To do this, we use a modified SG.
The modification is motivated by the fact that all known multiversion concurrency
control algorithms sort the versions of each data item into a total order.
We use this total order of versions to define an appropriately modified SG.
Given an MV history H and a data item x, a version order, <<, for x in H is
a total order of versions of x in H. A version order for H is the union of the
version orders for all data items. For example, a possible version order for H6
is x0 << x2, y0 << y1 << y3, and z0 << z3.
Given an MV history H and a version order <<, the multiversion serialization
graph for H and <<, MVSG(H, <<), is SG(H) with the following version
order edges added: for each rk[xj] and wi[xi] in C(H) where i, j, and k are
distinct, if xi << xj then include Ti → Tj, otherwise include Tk → Ti.⁶ (Note that
there is no version order edge if j = k, that is, if a transaction reads from
itself.) For example, given the preceding version order for H6,

The version order edges that are in MVSG(H6, <<) but not in SG(H6) are T1 →
T2, T1 → T3, and T2 → T3. Except for T0 → T1, all edges in SG(H6) are also
version order edges.
Given an MV history H, suppose SG(H) is acyclic. We know that a serial
MV history Hs obtained by topologically sorting SG(H) may not be equivalent
to any serial 1V history. The reason is that some of Hs’s reads-from relationships
may be changed by mapping version operations into data item operations.
The purpose of version order edges is to detect when this happens. If
rk[xj] and wi[xi] are in C(H), then the version order edge forces wi[xi] to either
precede wj[xj] or follow rk[xj] in Hs. That way, when operations on xi and xj are
mapped to operations on x when changing Hs to a 1V history, the reads-from
relationship is undisturbed. This ensures that we can map Hs into an equivalent
1V history. Of course, all of this is possible only if SG(H) is still acyclic
after adding version order edges. This observation leads us to the following
theorem, which is our main tool for analyzing multiversion concurrency
control algorithms.
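
The construction of MVSG(H, <<) and the acyclicity test can be sketched as follows. This is an illustration under our own assumptions: the tuple encoding, the representation of a version order as a list of writer indices per data item (oldest version first), and the names mvsg and is_acyclic are not from the text.

```python
def mvsg(history, version_order):
    """Edges of MVSG(H, <<).

    Assumed encoding: ("w", i, x, i), ("r", k, x, j) for r_k[x_j], ("c", i).
    version_order maps each data item to the list of transactions that
    wrote it, oldest version first.
    """
    committed = {op[1] for op in history if op[0] == "c"}
    reads = [op for op in history if op[0] == "r" and op[1] in committed]
    writes = [op for op in history if op[0] == "w" and op[1] in committed]
    edges = set()
    for _kind, k, item, j in reads:
        if j != k:
            edges.add((j, k))                # reads-from edge of SG(H)
        rank = version_order[item].index
        for _kind2, i, written_item, _version in writes:
            if written_item != item or len({i, j, k}) < 3:
                continue                     # i, j, k must be distinct
            if rank(i) < rank(j):
                edges.add((i, j))            # xi << xj: Ti -> Tj
            else:
                edges.add((k, i))            # xj << xi: Tk -> Ti
    return edges

def is_acyclic(edges):
    """Repeatedly remove nodes that have no incoming edge."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return True

# w0[x0] c0 r1[x0] w1[x1] c1 r2[x0] c2 with version order x0 << x1
h = [("w", 0, "x", 0), ("c", 0), ("r", 1, "x", 0), ("w", 1, "x", 1),
     ("c", 1), ("r", 2, "x", 0), ("c", 2)]
print(sorted(mvsg(h, {"x": [0, 1]})))       # [(0, 1), (0, 2), (2, 1)]
print(is_acyclic(mvsg(h, {"x": [0, 1]})))   # True

# w0[x0] w0[y0] c0 r1[x0] r1[y0] w1[x1] w1[y1] c1 r2[x0] r2[y1] c2
h3 = [("w", 0, "x", 0), ("w", 0, "y", 0), ("c", 0),
      ("r", 1, "x", 0), ("r", 1, "y", 0), ("w", 1, "x", 1), ("w", 1, "y", 1),
      ("c", 1), ("r", 2, "x", 0), ("r", 2, "y", 1), ("c", 2)]
orders = [{"x": xo, "y": yo} for xo in ([0, 1], [1, 0]) for yo in ([0, 1], [1, 0])]
print([is_acyclic(mvsg(h3, vo)) for vo in orders])   # [False, False, False, False]
```

For the first history the version order edge T2 → T1 forces T2 ahead of T1, giving the topological sort T0 T2 T1. For the second history every version order produces a cycle.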

⁶Recall that the nodes of SG(H) and, therefore, of MVSG(H, <<) are the committed transactions
in H.

Theorem 5.4: An MV history H is 1SR iff there exists a version order <<
such that MVSG(H, <<) is acyclic.
Proof: (If) Let Hs be a serial MV history Ti1 Ti2 ... Tin, where Ti1, Ti2, ...,
Tin is a topological sort of MVSG(H, <<). Since C(H) is an MV history, it
follows that Hs is as well. Since Hs has the same operations as C(H), by
Proposition 5.1, Hs ≡ C(H).
It remains to prove that Hs is 1-serial. Consider any reads-from relationship
in Hs, say Tk reads x from Tj, k ≠ j. Let wi[xi] (i ≠ j and i ≠ k) be
any other Write on x. If xi << xj, then MVSG(H, <<) includes the version
order edge Ti → Tj, which forces Tj to follow Ti in Hs. If xj << xi, then
MVSG(H, <<) includes the version order edge Tk → Ti, which forces Tk to
precede Ti in Hs. Therefore, no transaction that writes x falls in between
Tj and Tk in Hs. Thus, Hs is 1-serial.
(Only if) Given H and <<, let MV(H, <<) be the graph containing only
version order edges. Version order edges depend only on the operations in
H and <<; they do not depend on the order of operations in H. Thus, if H
and H' are MV histories with the same operations, then MV(H, <<) =
MV(H', <<) for all version orders <<.
Let Hs be a 1-serial MV history equivalent to C(H). All edges in SG(Hs)
go “left-to-right;” that is, if Ti → Tj then Ti precedes Tj in Hs. Define << as
follows: xi << xj only if Ti precedes Tj in Hs. All edges in MV(Hs, <<) are
also left-to-right. Therefore all edges in MVSG(Hs, <<) = SG(Hs) ∪
MV(Hs, <<) are left-to-right. This implies MVSG(Hs, <<) is acyclic.
By Proposition 5.1, C(H) and Hs have the same operations.
Hence MV(C(H), <<) = MV(Hs, <<). By Propositions 5.1 and 5.2,
SG(C(H)) = SG(Hs). Therefore MVSG(C(H), <<) = MVSG(Hs, <<). Since
MVSG(Hs, <<) is acyclic, so is MVSG(C(H), <<), which is identical to
MVSG(H, <<). □

5.3 MULTIVERSION TIMESTAMP ORDERING


We can define schedulers for multiversion concurrency control based on each
of the three basic types of schedulers: 2PL, TO, and SGT. We begin with a
multiversion scheduler based on TO because it is the easiest to prove correct.
As for all TO schedulers, each transaction has a unique timestamp,
denoted ts(Ti). Each operation carries the timestamp of its corresponding
transaction. Each version is labeled by the timestamp of the transaction that
wrote it.
A multiversion TO (MVTO) scheduler processes operations first-come-first-served.
It translates operations on data items into operations on versions
to make it appear as if it processed these operations in timestamp order on a
single version database. The scheduler processes ri[x] by first translating it into
ri[xk], where xk is the version of x with the largest timestamp less than or equal
to ts(Ti), and then sending ri[xk] to the DM. It processes wi[x] by considering
two cases. If it has already processed a Read rj[xk] such that ts(Tk) < ts(Ti) <
ts(Tj), then it rejects wi[x]. Otherwise, it translates wi[x] into wi[xi] and sends it
to the DM. Finally, to ensure recoverability, the scheduler must delay the
processing of ci until it has processed cj for all transactions Tj that wrote
versions read by Ti.
To understand MVTO, it is helpful to compare its effect to an execution,
say H1V, on a single version database in which operations execute in timestamp
order. In H1V, each Read, ri[x], reads the value of x with the largest timestamp
less than or equal to ts(Ti). This is the value of the version that the MVTO
scheduler selects when it processes ri[x].
Since MVTO need not process operations in timestamp order, a Write
could arrive whose processing would invalidate a Read that the scheduler
already processed. For example, suppose w0[x0] < r2[x0] represents the execution
so far, where ts(Ti) = i for all transactions. Now if w1[x] arrives, the
scheduler has a problem. If it translates w1[x] into w1[x1] and sends it to the
DM, then it produces a history that no longer has the same effect as a TO
execution on a single version database. For in such an execution, r2[x] would
have read the value of x written by T1, but in the execution w0[x0] r2[x0] w1[x1],
it reads the value written by T0. In this case, we say that w1[x] would have
invalidated r2[x0]. To avoid this problem, the scheduler rejects w1[x] in this
case. In general, it rejects wi[x] if it has already processed a Read rj[xk] such
that ts(Tk) < ts(Ti) < ts(Tj). This is exactly the situation in which processing
wi[x] would invalidate rj[xk].
To select the appropriate version to read and to avoid invalidating Reads,
the scheduler maintains some timestamp information about operations it has
already processed. For each version, say xi, it maintains a timestamp interval,
denoted interval(xi) = [wts, rts], where wts is the timestamp of xi and rts is the
largest timestamp of any Read that read xi; if no such Read exists, then rts =
wts. Let intervals(x) = {interval(xi) | xi is a version of x}.
To process ri[x], the scheduler examines intervals(x) to find the version xj
whose interval, interval(xj) = [wts, rts], has the maximal wts less than or
equal to ts(Ti). If ts(Ti) > rts, then it sets rts to ts(Ti).
To process wi[x], the scheduler again examines intervals(x) to find the
version xj whose interval [wts, rts] has the maximal wts less than ts(Ti). If rts
> ts(Ti), then it rejects wi[x]. Otherwise, it sends wi[xi] to the DM and creates
a new interval, interval(xi) = [wts, rts], where wts = rts = ts(Ti).
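
The interval bookkeeping for Reads and Writes can be sketched in a few lines. This is a simplified illustration, not a full scheduler: the class name MVTOScheduler and its methods are our own, timestamps stand in for transactions, a Write that finds no older version is accepted (an assumption, since the rule above does not cover that case), and the delaying of commits for recoverability is not modeled.

```python
class MVTOScheduler:
    """Keeps intervals(x) as a list of [wts, rts] pairs, one per version."""

    def __init__(self):
        self.intervals = {}   # data item -> list of [wts, rts]

    def _latest(self, item, ts, inclusive):
        # Version with maximal wts < ts (or <= ts when inclusive).
        candidates = [iv for iv in self.intervals.get(item, [])
                      if iv[0] < ts or (inclusive and iv[0] == ts)]
        return max(candidates, key=lambda iv: iv[0], default=None)

    def read(self, item, ts):
        """Return the wts of the version selected, or None if there is none."""
        iv = self._latest(item, ts, inclusive=True)
        if iv is None:
            return None
        iv[1] = max(iv[1], ts)    # raise rts if this Read is the youngest
        return iv[0]

    def write(self, item, ts):
        """Return True if the Write is accepted, False if it is rejected."""
        iv = self._latest(item, ts, inclusive=False)
        if iv is not None and iv[1] > ts:
            return False          # would invalidate a Read already processed
        self.intervals.setdefault(item, []).append([ts, ts])
        return True

s = MVTOScheduler()
print(s.write("x", 0))   # True   w0[x0]
print(s.read("x", 2))    # 0      r2[x] becomes r2[x0]
print(s.write("x", 1))   # False  w1[x] would invalidate r2[x0]
print(s.write("x", 3))   # True   w3[x3]
print(s.read("x", 5))    # 3      r5[x] becomes r5[x3]
```

The first three calls replay the example in which w1[x] arrives after w0[x0] and r2[x0] have been processed.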
Eventually, the scheduler will run out of space for storing intervals, or the
DM will run out of space for storing versions. At this point, old versions and
their corresponding intervals must be deleted. To avoid incorrect behavior, it is
essential that versions be deleted from oldest to newest. To see why, consider
the following history,

where ts(Ti) = i for 0 ≤ i ≤ 4. Suppose the system deleted x2 but not x0. If
r3[x] now arrives, the scheduler will incorrectly translate it into r3[x0]. Suppose
instead that the system deleted x0. If r1[x] now arrives, the scheduler will find
no version whose interval has a wts < ts(T1). This condition indicates that the
DBS has deleted the version that r1[x] has to read, so the scheduler must reject
r1[x].
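
The effect of the deletion order can be seen with version selection alone. In the sketch below, the function select_version and the list of write timestamps are illustrative assumptions; only the versions x0 and x2 mentioned above are modeled.

```python
def select_version(write_timestamps, ts):
    """Pick the version with the largest write timestamp <= ts, or None."""
    candidates = [w for w in write_timestamps if w <= ts]
    return max(candidates, default=None)

versions = [0, 2]                       # x0 and x2 exist

print(select_version(versions, 3))      # 2: r3[x] correctly reads x2
print(select_version([0], 3))           # 0: x2 deleted first, r3[x] silently reads x0
print(select_version([2], 1))           # None: x0 deleted, r1[x] must be rejected
```

Deleting a newer version while keeping an older one yields a wrong answer that the scheduler cannot detect, whereas deleting from oldest to newest yields a detectable missing version.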

*Proof of Correctness
To prove MVTO correct, we must describe it in serializability theory. As usual,
we do this by inferring properties that all histories produced by MVTO will
satisfy. Using these properties as our formal specification of the algorithm, we
prove that all histories produced by MVTO have an acyclic MVSG and hence
are 1SR.
The following properties describe the essential characteristics of every
MVTO history H over {T0, ..., Tn}.
MVTO1. For each Ti, there is a unique timestamp ts(Ti); that is, ts(Ti) =
ts(Tj) iff i = j.
MVTO2. For every rk[xj] ∈ H, wj[xj] < rk[xj] and ts(Tj) ≤ ts(Tk).
MVTO3. For every rk[xj] and wi[xi] ∈ H, i ≠ j, either (a) ts(Ti) < ts(Tj) or
(b) ts(Tk) < ts(Ti) or (c) i = k and rk[xj] < wi[xi].
MVTO4. If rj[xi] ∈ H, i ≠ j, and cj ∈ H, then ci < cj.
Property MVTO1 says that transactions have unique timestamps. Property
MVTO2 says that each transaction Tk only reads versions with timestamps
no greater than ts(Tk). Property MVTO3 states that when the scheduler processes
rk[xj], xj is the version of x with the largest timestamp less than or equal to
ts(Tk). Moreover, if the scheduler later receives wi[xi], it will reject it if ts(Tj) <
ts(Ti) < ts(Tk). MVTO4 states that H is recoverable.
These conditions ensure that H preserves reflexive reads-from relationships.
To see this, suppose not, that is, wk[xk] < rk[xj] and j ≠ k. By MVTO1, MVTO2,
and j ≠ k, ts(Tj) < ts(Tk). By MVTO3, either (a) ts(Tk) < ts(Tj), (b) ts(Tk) <
ts(Tk), or (c) rk[xj] < wk[xk]. All three cases are impossible, a contradiction.
We now prove that any history satisfying these properties is 1SR. In other
words, MVTO is a correct scheduler.
Theorem 5.5: Every history H produced by MVTO is 1SR.
Proof: Define a version order as follows: $x_i \ll x_j$ iff $ts(T_i) < ts(T_j)$. We
now prove that MVSG(H, $\ll$) is acyclic by showing that for every edge
$T_i \rightarrow T_j$ in MVSG(H, $\ll$), $ts(T_i) < ts(T_j)$.
Suppose $T_i \rightarrow T_j$ is an edge of SG(H). This edge corresponds to a reads-
from relationship. That is, for some x, $T_j$ reads x from $T_i$. By MVTO$_2$,
$ts(T_i) \le ts(T_j)$. By MVTO$_1$, $ts(T_i) \ne ts(T_j)$. So, $ts(T_i) < ts(T_j)$ as desired.
Let $r_k[x_j]$ and $w_i[x_i]$ be in H where i, j, and k are distinct, and consider
the version order edge that they generate. There are two cases: (1) $x_i \ll x_j$,
which implies $T_i \rightarrow T_j$ is in MVSG(H, $\ll$); and (2) $x_j \ll x_i$, which implies
$T_k \rightarrow T_i$ is in MVSG(H, $\ll$). In case (1), by definition of $\ll$, $ts(T_i) <
ts(T_j)$. In case (2), by MVTO$_3$, either $ts(T_i) < ts(T_j)$ or $ts(T_k) < ts(T_i)$. The
first option is impossible, because $x_j \ll x_i$ implies $ts(T_j) < ts(T_i)$. So, $ts(T_k)
< ts(T_i)$ as desired.
Since all edges in MVSG(H, $\ll$) are in timestamp order, MVSG(H, $\ll$) is
acyclic. By Theorem 5.4, H is 1SR. □
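
The edge construction used in this proof can be checked mechanically on small examples. The following is a sketch only: the tuple encodings `(k, x, j)` for a Read of version $x_j$ by $T_k$ and `(i, x)` for a Write by $T_i$, and the helper names `mvsg_edges` and `in_timestamp_order`, are assumptions introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

Read = Tuple[int, str, int]   # (k, x, j) stands for r_k[x_j]
Write = Tuple[int, str]       # (i, x) stands for w_i[x_i]


def mvsg_edges(reads: List[Read], writes: List[Write],
               ts: Dict[int, int]) -> Set[Tuple[int, int]]:
    """Edges of the MVSG when x_i precedes x_j iff ts[i] < ts[j]."""
    edges: Set[Tuple[int, int]] = set()
    for k, x, j in reads:
        if j != k:
            edges.add((j, k))                 # reads-from edge T_j -> T_k
        for i, y in writes:
            if y == x and len({i, j, k}) == 3:
                if ts[i] < ts[j]:
                    edges.add((i, j))         # x_i before x_j: T_i -> T_j
                else:
                    edges.add((k, i))         # x_j before x_i: T_k -> T_i
    return edges


def in_timestamp_order(edges: Set[Tuple[int, int]], ts: Dict[int, int]) -> bool:
    """True if every edge goes from a smaller to a larger timestamp."""
    return all(ts[a] < ts[b] for a, b in edges)


ts = {0: 0, 1: 1, 2: 2, 3: 3}

# r1[x0] and r3[x2]: each Read sees the latest version below its timestamp.
good = mvsg_edges([(1, "x", 0), (3, "x", 2)], [(0, "x"), (2, "x")], ts)

# r2[x0] although w1[x1] exists with ts(T0) < ts(T1) < ts(T2): violates MVTO_3.
bad = mvsg_edges([(2, "x", 0)], [(0, "x"), (1, "x")], ts)

print(in_timestamp_order(good, ts), in_timestamp_order(bad, ts))
```

The second history produces the edge from $T_2$ back to $T_1$, which is exactly the kind of edge that MVTO$_3$ rules out.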

5.4 MULTIVERSION TWO PHASE LOCKING


Two Version 2PL
In 2PL, a write lock on a data item x prevents transactions from obtaining read
locks on x. We can avoid this locking conflict by using two versions of x. When
a transaction $T_i$ writes into x, it creates a new version $x_i$ of x. It sets a lock on x
that prevents other transactions from reading $x_i$ or writing a new version of x.
However, other transactions are allowed to read the previous version of x.
Thus, Reads on x are not delayed by a concurrent writer of x. In the language
of Section 4.5, we are using 2PL for ww synchronization and version selection
for rw synchronization. As we will see, there is also certification activity
involved.
To use this scheme, the DM must store one or two versions of each data
item. If a data item has two versions, then only one of those versions was writ-
ten by a committed transaction. Once a transaction $T_i$ that wrote x commits,
its version of x becomes the unique committed version of x, and the previous
committed version of x becomes inaccessible.
The two versions of each data item could be the same two versions used by
the DM’s recovery algorithm. If $T_i$ wrote x but has not yet committed, then the
two versions of x are $T_i$’s before image of x and the value of x it wrote. As we
will see, $T_i$’s before image can be deleted once $T_i$ commits. So, an old version
becomes dispensable for both concurrency control and recovery reasons at
approximately the same time.²
A two version 2PL (2V2PL) scheduler uses three types of locks: read locks,
write locks, and certify locks. These locks are governed by the compatibility
matrix in Fig. 5-1. The scheduler sets read and write locks at the usual time,
when it processes Reads and Writes. When it learns that a transaction has
terminated and is about to commit, it converts all of the transaction’s write
locks into certify locks. We will explain the handling and significance of certify
locks in a moment.
When a 2V2PL scheduler receives a Write, $w_i[x]$, it attempts to set
$wl_i[x]$. Since write locks conflict with certify locks and with each other, the
scheduler delays $w_i[x]$ if another transaction already has a write or certify lock
on x. Otherwise, it sets $wl_i[x]$, translates $w_i[x]$ into $w_i[x_i]$, and sends $w_i[x_i]$ to
the DM.
When the scheduler receives a Read, $r_i[x]$, it attempts to set $rl_i[x]$. Since
read locks only conflict with certify locks, it can set this lock as long as no

² This fits especially well with the shadow page recovery techniques used, for example, in the
no-undo/no-redo algorithm of Section 6.7.

            Read    Write   Certify

Read         y       y       n
Write        y       n       n
Certify      n       n       n

FIGURE 5-1
Compatibility Matrix for Two Version 2PL

transaction already owns a certify lock on x. If $T_i$ already owns $wl_i[x]$ and has
therefore written $x_i$, then the scheduler translates $r_i[x]$ into $r_i[x_i]$, which it sends
to the DM. Otherwise, it waits until it can set a read lock, and then sets the
lock, translates $r_i[x]$ into $r_i[x_j]$, where $x_j$ is the most recently (and therefore
only) committed version of x, and sends $r_i[x_j]$ to the DM. Note that since only
committed versions may be read (except for versions produced by the reader
itself), the scheduler avoids cascading aborts and, a fortiori, ensures that the
MV histories it produces are recoverable.
When the scheduler receives a Commit, $c_i$, indicating that $T_i$ has termi-
nated, it attempts to convert $T_i$’s write locks into certify locks. Since certify
locks conflict with read locks, the scheduler can only do this lock conversion
on those data items that have no read locks owned by other transactions. On
those data items where such read locks exist, the lock conversion is delayed
until all read locks are released. Thus, the effect of certify locks is to delay $T_i$’s
commitment until there are no active readers of data items it is about to over-
write.
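
The lock rules described so far reduce to table lookups against Fig. 5-1. The sketch below is illustrative only: the `COMPATIBLE` dictionary, the `can_grant` helper, and the list-of-pairs lock table are assumptions, and queueing of delayed requests is omitted.

```python
from typing import Dict, List, Tuple

# COMPATIBLE[(held, requested)] mirrors the matrix of Fig. 5-1.
COMPATIBLE: Dict[Tuple[str, str], bool] = {
    ("read", "read"): True,     ("read", "write"): True,     ("read", "certify"): False,
    ("write", "read"): True,    ("write", "write"): False,   ("write", "certify"): False,
    ("certify", "read"): False, ("certify", "write"): False, ("certify", "certify"): False,
}


def can_grant(requested: str, txn: int, held: List[Tuple[int, str]]) -> bool:
    """True if txn may set the requested lock given (owner, mode) pairs on the item.

    Locks owned by txn itself never block it, which is what permits the
    conversion of its own write lock into a certify lock.
    """
    return all(COMPATIBLE[(mode, requested)] for owner, mode in held if owner != txn)


locks_on_x = [(1, "write")]                          # T1 holds wl_1[x]
reader_ok = can_grant("read", 2, locks_on_x)         # T2 may read the old version
writer_ok = can_grant("write", 2, locks_on_x)        # a second writer is delayed
locks_on_x.append((2, "read"))                       # T2 sets rl_2[x]
certify_ok = can_grant("certify", 1, locks_on_x)     # T1's conversion must wait
locks_on_x.remove((2, "read"))                       # T2 releases its read lock
certify_after = can_grant("certify", 1, locks_on_x)  # now the conversion succeeds

print(reader_ok, writer_ok, certify_ok, certify_after)
```

The third and fourth results show the trade described above: the reader was never delayed by the writer, but the writer's commitment waits for the reader.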
Note that lock conversions can lead to deadlock just as in standard 2PL.
For example, suppose $T_i$ has a read lock on x and $T_j$ has a write lock on x. If $T_i$
tries to convert its read lock to a write lock and $T_j$ tries to convert its write lock
to a certify lock, then the transactions are deadlocked. We can use any dead-
lock detection or prevention technique: cycle detection in a WFG, timestamp-
based prevention, etc.
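
As one instance of the detection option, the conversion deadlock in this example shows up as a two-node cycle in the waits-for graph. The sketch below assumes a dictionary representation of the WFG and a hypothetical helper `has_cycle`; it uses an ordinary depth-first search and is not specific to 2V2PL.

```python
from typing import Dict, List, Set


def has_cycle(wfg: Dict[str, List[str]]) -> bool:
    """Detect a cycle in a waits-for graph given as adjacency lists."""
    visiting: Set[str] = set()   # nodes on the current search path
    done: Set[str] = set()       # nodes already proven cycle-free

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True          # reached a node on the current path again
        visiting.add(node)
        if any(visit(nxt) for nxt in wfg.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(node) for node in wfg)


# Ti waits for Tj (write locks conflict); Tj waits for Ti (certify conflicts with read).
conversion_deadlock = has_cycle({"Ti": ["Tj"], "Tj": ["Ti"]})

# If only Ti waits, there is no deadlock.
plain_wait = has_cycle({"Ti": ["Tj"], "Tj": []})

print(conversion_deadlock, plain_wait)
```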
Since a transaction may deadlock while trying to convert its write locks, it
may be aborted during this activity. Therefore, it must not release its locks or
be committed until it has obtained all of its certify locks.
Certify locks in 2V2PL behave much like write locks in ordinary 2PL.
Since the time to certify a transaction is usually much less than the total time to
execute it, 2V2PL’s certify locks delay Reads for less time than 2PL’s write
locks delay Reads. However, since existing read locks delay a transaction’s
certification in 2V2PL, the improved concurrency of Reads comes at the
expense of delaying the certification and therefore the commitment of update
transactions.

Using More than Two Versions


The only purpose served by write locks is to ensure that only two versions of a
data item exist at a time. They are not needed to attain 1-serializability. If we
relax the conflict rules so that write locks do not conflict, then a data item may
have many uncertified versions (i.e., versions written by uncommitted transac-
tions). However, if we follow the remaining 2V2PL locking rules, then only
the most recently certified version may be read.
If we are willing to cope with cascading aborts, then we can be a little
more flexible by allowing a transaction to read any of the uncertified versions.
(We could make the same allowance in 2V2PL, in which there is at most one
uncertified version to read.) To get the same correct synchronization behavior
as 2V2PL, we have to modify the scheduler in two ways. First, a transaction
cannot be certified until all of the versions it read (except for ones it wrote
itself) have been certified. And second, the scheduler can only convert a write
lock on x into a certify lock if there are no read locks on certified versions of x.
Essentially, the scheduler is ignoring a read lock on an uncertified version
until either that version is certified or the transaction that owns the read lock
tries to become certified. This is just like delaying the granting of that read lock
until after the version to be read is certified. The only difference is that cascad-
ing aborts are now possible. If the transaction that produced an uncertified
version aborts, then transactions that read that version must also abort.

*Correctness of 2V2PL
To list the properties of histories produced by executions of 2V2PL, we need to
include the operation $f_i$, denoting the certification of $T_i$.
Let H be a history over $\{T_0, \ldots, T_n\}$ produced by 2V2PL. Then H must
satisfy the following properties.

2V2PL$_1$. For every $T_i$, $f_i$ follows all of $T_i$’s Reads and Writes and precedes
$T_i$’s commitment.
2V2PL$_2$. For every $r_k[x_j]$ in H, if $j \ne k$, then $c_j < r_k[x_j]$; otherwise
$w_k[x_k] < r_k[x_k]$.
2V2PL$_3$. For every $w_k[x_k]$ and $r_k[x_j]$ in H, if $w_k[x_k] < r_k[x_j]$, then $j = k$.
Property 2V2PL$_2$ says that every Read $r_k[x_j]$ either reads a certified version
or reads a version written by itself (i.e., $T_k$). Property 2V2PL$_3$ says that if $T_k$
wrote x before the scheduler received $r_k[x]$, then it translates $r_k[x]$ into $r_k[x_k]$.
2V2PL$_4$. If $r_k[x_j]$ and $w_i[x_i]$ are in H, then either $f_i < r_k[x_j]$ or $r_k[x_j] < f_i$.

Property 2V2PL$_4$ says that $r_k[x_j]$ is strictly ordered with respect to the certi-
fication operation of every transaction that writes x. This is because each
transaction $T_i$ that writes x must obtain a certify lock on x. For each transac-
tion $T_k$ that reads x, either $T_i$ must delay its certification until $T_k$ has been
certified (if it has not already been so), or else $T_k$ must wait for $T_i$ to be certi-
fied before it can set its read lock on x and therefore read x.
2V2PL$_5$. For every $r_k[x_j]$ and $w_i[x_i]$ (i, j, and k distinct), if $f_i < r_k[x_j]$,
then $f_i < f_j$.
Property 2V2PL$_5$, combined with 2V2PL$_2$, says that each Read $r_k[x_j]$
either reads a version written by $T_k$ or reads the most recently certified version
of x.
2V2PL$_6$. For every $r_k[x_j]$ and $w_i[x_i]$, $i \ne j$ and $i \ne k$, if $r_k[x_j] < f_i$, then
$f_k < f_i$.
Property 2V2PL$_6$ says that a transaction $T_i$ that writes x cannot be
certified until all transactions that previously read a version of x have already
been certified. This follows from the fact that certify locks conflict with read
locks.
2V2PL$_7$. For every $w_i[x_i]$ and $w_j[x_j]$, either $f_i < f_j$ or $f_j < f_i$.
Property 2V2PL$_7$ says that the certifications of any two transactions that
write the same data item are atomic with respect to each other.

Theorem 5.6: Every history H produced by a 2V2PL scheduler is 1SR.


Proof: By 2V2PL$_1$, 2V2PL$_2$ and 2V2PL$_3$, H preserves reflexive reads-
from relationships and is recoverable, and therefore is an MV history.
Define a version order $\ll$ by $x_i \ll x_j$ only if $f_i < f_j$. By 2V2PL$_7$, $\ll$ is indeed
a version order. We will prove that all edges in MVSG(H, $\ll$) are in certifi-
cation order. That is, if $T_i \rightarrow T_j$ in MVSG(H, $\ll$), then $f_i < f_j$.
Let $T_i \rightarrow T_j$ be in SG(H). This edge corresponds to a reads-from rela-
tionship, such as $T_j$ reads x from $T_i$. By 2V2PL$_2$, $c_i < r_j[x_i]$, and since
$f_i < c_i$ by 2V2PL$_1$, $f_i < r_j[x_i]$. By 2V2PL$_1$, $r_j[x_i] < f_j$. Hence, $f_i < f_j$.
Consider a version order edge induced by $w_i[x_i]$, $w_j[x_j]$, and $r_k[x_j]$ (i, j,
and k distinct). There are two cases: $x_i \ll x_j$ and $x_j \ll x_i$. If $x_i \ll x_j$, then
the version order edge is $T_i \rightarrow T_j$, and $f_i < f_j$ follows directly from the defi-
nition of $\ll$. If $x_j \ll x_i$, then the version order edge is $T_k \rightarrow T_i$. Since $x_j \ll
x_i$, $f_j < f_i$. By 2V2PL$_4$, either $f_i < r_k[x_j]$ or $r_k[x_j] < f_i$. In the former case,
2V2PL$_5$ implies $f_i < f_j$, contradicting $f_j < f_i$. Thus $r_k[x_j] < f_i$, and by
2V2PL$_6$, $f_k < f_i$ as desired.
This proves that all edges in MVSG(H, $\ll$) are in certification order.
Since the certification order is embedded in a history, which is acyclic
by definition, MVSG(H, $\ll$) is acyclic too. So, by Theorem 5.4, H is
1SR. □

5.5 A MULTIVERSION MIXED METHOD


As we have seen, multiversions give the scheduler more flexibility in scheduling
Reads. If the scheduler knows in advance which transactions will only issue
Reads (and no Writes), then it can get even more concurrency among transac-
tions. Recall from Section 1.1 that transactions that issue Reads but no Writes
are called queries, while those that issue Writes (and possibly Reads as well)
are called updaters. In this section we’ll describe an algorithm that uses MVTO
to synchronize queries and Strict 2PL to synchronize updaters.
When a transaction begins executing, it tells its TM whether it’s an
updater or a query. If it’s an updater, then the TM simply passes that fact to the
scheduler, which executes operations from that transaction using Strict 2PL.
When the TM receives the updater’s Commit, indicating that the updater has ter-
minated, the TM assigns a timestamp to the updater, using the timestamp gen-
eration method of Section 4.5 for integrating Strict 2PL and TWR. This ensures
that updaters have timestamps that are consistent with their order in the SG.
Unlike Basic 2PL, in this method each Write produces a new version. The
scheduler tags each version with the timestamp of the transaction that wrote it.
The scheduler uses these timestamped versions to synchronize Reads from
queries using MVTO.
When a TM receives the first operation from a transaction that identified
itself as a query, it assigns to that query a timestamp smaller than that of any
updater that has not yet committed (and therefore, of any future updater as well).
When a scheduler receives an $r_i[x]$ from a query $T_i$, it finds the version of x
with the largest timestamp less than $ts(T_i)$. By the timestamp assignment rule,
this version was written by a committed transaction. Moreover, by the same
rule, assigning this version of x to $r_i[x]$ will not invalidate the Read at any time
in the future (so future Writes need never be rejected).
Note that a query does not set any locks. It is therefore never forced to
wait for updaters and never causes updaters to wait for it. The scheduler can
always process a query’s Read without delay.
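
A query's Read therefore amounts to a lookup by timestamp with no lock table involved. The following sketch assumes a hypothetical `VersionedItem` class that keeps committed versions sorted by writer timestamp; the integer timestamps and string values are made up for illustration.

```python
import bisect
from typing import Any, List


class VersionedItem:
    """Committed versions of one data item, ordered by writer timestamp."""

    def __init__(self) -> None:
        self._ts: List[int] = []
        self._values: List[Any] = []

    def install(self, writer_ts: int, value: Any) -> None:
        """Record the version written by an updater with timestamp writer_ts."""
        pos = bisect.bisect_left(self._ts, writer_ts)
        self._ts.insert(pos, writer_ts)
        self._values.insert(pos, value)

    def read_as_of(self, query_ts: int) -> Any:
        """Return the version with the largest timestamp below query_ts."""
        pos = bisect.bisect_left(self._ts, query_ts)  # count of versions with ts < query_ts
        if pos == 0:
            raise LookupError("no version older than the query")
        return self._values[pos - 1]


x = VersionedItem()
x.install(10, "a")
x.install(20, "b")
first_read = x.read_as_of(15)    # selects the version with timestamp 10
x.install(30, "c")               # a later updater commits meanwhile
second_read = x.read_as_of(15)   # the query's view is unchanged

print(first_read, second_read)
```

Because later updaters always receive larger timestamps than the query, installing their versions cannot change what the query reads.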
In a centralized DBS, selecting the timestamp of a query is easy, because
active updaters are not assigned timestamps until they terminate. In a distrib-
uted DBS, TMs can ensure that each query has a sufficiently small timestamp
by deliberately selecting an especially small timestamp. Suppose that the local
clocks at any two TMs are known to differ by at most $\delta$. If a TM’s clock reads
t, then it is safe to assign a new query any timestamp less than $t - \delta$. Any
updater that terminates after this point in real time will be assigned a time-
stamp of at least $t - \delta$, so the problem of the previous paragraph cannot arise.
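
The clock-skew rule is a one-line computation. The sketch below assumes integer clock ticks and a hypothetical helper `safe_query_timestamp`; the numbers are made up.

```python
def safe_query_timestamp(local_clock: int, max_skew: int) -> int:
    """Largest integer timestamp strictly below local_clock - max_skew."""
    return local_clock - max_skew - 1


t, delta = 1000, 25
query_ts = safe_query_timestamp(t, delta)

# Per the bound in the text, any updater terminating later gets at least t - delta.
earliest_later_updater_ts = t - delta

print(query_ts, earliest_later_updater_ts)
```

The price of the safety margin is staleness: the larger the skew bound, the further in the past the query reads.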
We can argue the correctness of this by using MVSGs as follows.³ Let H be
a history produced by the method. Define the version order for H as in
MVTO: $x_i \ll x_j$ iff $ts(T_i) < ts(T_j)$. We show that MVSG(H, $\ll$) is acyclic by
showing that for each of its edges $T_i \rightarrow T_j$, $ts(T_i) < ts(T_j)$. First, consider an
edge $T_i \rightarrow T_j$ in SG(H). Each such edge is due to a reads-from relationship. If
$T_j$ is an updater, then by the way timestamps are assigned to updaters, $ts(T_i)
< ts(T_j)$ (cf. Section 4.5). If $T_j$ is a query, then by MVTO version selection,
$ts(T_i) < ts(T_j)$. Now consider a version order edge in MVSG(H, $\ll$) that arises
because $T_j$ reads x from $T_i$ and $T_k$ writes x (i, j, k distinct). If $x_k \ll x_i$, then we
have the edge $T_k \rightarrow T_i$ in MVSG(H, $\ll$) and $ts(T_k) < ts(T_i)$. Otherwise, we
have the edge $T_j \rightarrow T_k$, so we must show $ts(T_j) < ts(T_k)$. If $T_j$ is an updater,
then $T_j$ released $rl_j[x]$ before $T_k$ obtained $wl_k[x]$, so by the timestamp assign-
ment method, $ts(T_j) < ts(T_k)$. If $T_j$ is a query, then it is assigned a timestamp
smaller than all active or future updaters. So again $ts(T_j) < ts(T_k)$. Thus, all
edges in MVSG(H, $\ll$) are in timestamp order, and MVSG(H, $\ll$) is acyclic.
By Theorem 5.4, H is 1SR.

³ This paragraph requires an understanding of Section 5.2, on Multiversion Serializability
Theory, a starred section.
To avoid running out of space, the scheduler must have a way of deleting
“old” versions. Any committed version may be eliminated as soon as the sched-
uler can be assured that no query will need to read that copy in the future. For
this reason, the scheduler maintains a non-decreasing value min, which is the
minimum timestamp that can be assigned to a query. Whenever the scheduler
wants to release some space used by versions, it sets min to be the smallest
timestamp assigned to any active query. It can then discard a committed
version $x_i$ if $ts(T_i) < \mathit{min}$ and there is another committed version $x_j$ such that
$ts(T_i) < ts(T_j) < \mathit{min}$, so that every active or future query selects $x_j$ or a later
version rather than $x_i$.
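
The discard rule can be sketched as a filter over one item's committed versions. The helper name `discardable` and the integer timestamps are assumptions. The sketch also takes a conservative reading of the rule: the newer committed version that justifies a discard must itself lie below min, so that a query whose timestamp equals min still finds the version it needs.

```python
from typing import List


def discardable(committed_ts: List[int], min_query_ts: int) -> List[int]:
    """Timestamps of versions that no query with timestamp >= min_query_ts can select.

    Every version below min_query_ts except the newest one is shadowed by a
    newer committed version that is also below min_query_ts.
    """
    below = sorted(t for t in committed_ts if t < min_query_ts)
    return below[:-1]   # keep the newest version below min


versions = [5, 10, 20, 30]
drop = discardable(versions, min_query_ts=25)     # versions 5 and 10 are shadowed by 20
nothing = discardable(versions, min_query_ts=5)   # no version lies below min

print(drop, nothing)
```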
The main benefit of this method is that queries and updaters never delay
each other. A query can always read the data it wants without delay. Although
updaters may delay each other, queries set no locks and therefore never delay
updaters. This is in sharp contrast to 2PL, where a query may set many locks
and thereby delay many updaters. This delay is also inherent in multiversion
2PL and 2V2PL, since an updater $T_i$ cannot commit until there are no read
locks held by other transactions on $T_i$’s writeset.
The main disadvantages of the method are that queries may read out-of-
date data and that the tagging and interpretation of timestamps on versions
may add significant scheduling overhead. Both problems can be mitigated by
using the methods described next.

Replacing Timestamps by Commit Lists


Tagging versions with timestamps may be costly because when a scheduler
processes $w_i[x]$ by creating a new version of x, it doesn’t know $T_i$’s timestamp.
Only after $T_i$ terminates can the scheduler learn $T_i$’s timestamp. However, by
this time, the version may already have been moved to disk; it needs to be
reread in order to be tagged, and then subsequently rewritten to disk.
One can avoid timestamps altogether by using instead a list of identifiers
of committed transactions, called the commit list. When a query begins execut-
ing, the TM makes a copy of the commit list and associates it with the query. It
attaches the commit list to every Read that it sends to the scheduler, essentially
treating the list like a timestamp. When the scheduler receives $r_i[x]$ for a query
$T_i$, it finds the most recently committed version of x whose tag is in $T_i$’s copy of
the commit list. To do this efficiently, all versions of a data item are kept in a
linked list, from newest to oldest. That is, whenever a new version is created, it
is added to the top of the version list. Since updaters use Strict 2PL, two trans-
actions may not concurrently create new versions of the same data item. Thus,
the order of a data item’s versions (and hence the version list) is well defined.
Given this organization for versions, to process $r_i[x]$ for a query $T_i$, the
scheduler scans the version list of x until it finds a version written by a transac-
tion that appears in the commit list associated with $T_i$. This is just like reading
the most recently committed version of x whose timestamp is less than $ts(T_i)$ (if
$T_i$ had a timestamp). This technique is used in DBS products by Prime
Computer, and in the Adaplex DBS by Computer Corporation of America.
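
The scan can be sketched in a few lines. Here the version list is modeled as a Python list of (writer identifier, value) pairs ordered newest first, and the query's copy of the commit list as a set; both representations and the name `read_for_query` are assumptions for illustration.

```python
from typing import Any, List, Set, Tuple


def read_for_query(version_list: List[Tuple[str, Any]], commit_list: Set[str]) -> Any:
    """Return the newest version whose writer is in the query's commit list copy."""
    for writer, value in version_list:       # newest first
        if writer in commit_list:
            return value
    raise LookupError("no version written by a transaction in the commit list")


# Versions of x, newest first. T7 committed after the query copied the list.
versions_of_x = [("T7", "c"), ("T5", "b"), ("T2", "a")]
query_commit_list = {"T2", "T5"}

value_read = read_for_query(versions_of_x, query_commit_list)
print(value_read)
```

The version written by T7 is skipped even though it is newest, because T7 is not in the copy taken when the query began.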
The problem with this scheme is the size and structure of commit lists.
First, each list must be small. In a centralized system, every query will have a
copy of the list consuming main memory. In a distributed system, every Read
sent to a DM will have a copy of the list, which consumes communication
bandwidth. Second, since the scheduler must search the list on every Read
from a query, the list should be structured to make it easy to determine whether
a given transaction identifier is in the list.
A good way to accomplish these goals is to store the commit list as a bit
map. That is, the commit list is an array, CL, where CL[i] = 1 if $T_i$ is commit-
ted; otherwise CL[i] = 0. Using the bit map, the scheduler can easily tell
whether a version’s tag is in the list. It simply looks up the appropriate position
in the array. However, as time goes on, the list grows without limit. So we need
a way to keep the list small.
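
A bit map of this kind might look as follows. The sketch stores the bits in a single Python integer, which is an implementation choice made here rather than anything the text prescribes; the class name `CommitBitmap` and its methods are hypothetical.

```python
class CommitBitmap:
    """Commit list stored as a bit map: bit i is 1 iff T_i has committed."""

    def __init__(self, bits: int = 0) -> None:
        self._bits = bits

    def mark_committed(self, txn_id: int) -> None:
        self._bits |= 1 << txn_id

    def is_committed(self, txn_id: int) -> bool:
        return bool((self._bits >> txn_id) & 1)

    def copy(self) -> "CommitBitmap":
        """Snapshot handed to a query when it begins executing."""
        return CommitBitmap(self._bits)


cl = CommitBitmap()
cl.mark_committed(2)
cl.mark_committed(5)
query_copy = cl.copy()    # the query starts here
cl.mark_committed(7)      # T7 commits afterwards

print(query_copy.is_committed(5), query_copy.is_committed(7), cl.is_committed(7))
```

Membership is a constant-time bit test, but the map still grows with the largest identifier, which is the problem the text turns to next.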
We can shorten the list by observing that old transaction identifiers eventu-
ally become useless. A transaction identifier is only needed as long as there is a
version whose tag is that identifier. Suppose we know that all versions whose
tags are less than n (where n is a transaction identifier) have either been
committed or discarded before all active queries began. Then when the sched-
uler reads a version whose tag is less than n, it may treat that tag as if it were
in the commit list. Only transactions whose identifiers are greater than or equal
to n need to be kept in the list.
The commit list can be kept short as follows. When the list has exceeded a
certain size, the scheduler asks the TM for a transaction identifier, n, that is
smaller than that which has been assigned to any active query or updater, or
will be assigned to any future query or updater. The scheduler can then discard
the prefix of the commit list through transaction identifier n, thereby shorten-
ing the list. To process $r_i[x]$ of some query $T_i$, the scheduler returns the first
version in the version list of x written by a transaction whose identifier is either
in, or smaller than any identifier in, the commit list given to $T_i$ when it started.
We are assuming here, as always, that when a transaction aborts, all versions it
has produced are removed from the version lists.
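
Truncation and the modified lookup can be sketched together. The class `TruncatedCommitList` and the function `first_visible` are hypothetical. As a simplifying assumption, the sketch records the truncation point explicitly as `base` instead of inferring it from the smallest identifier left in the list, so that an empty list is also handled.

```python
from typing import Any, FrozenSet, List, Optional, Set, Tuple


class TruncatedCommitList:
    """Commit list in which every identifier below `base` counts as committed."""

    def __init__(self) -> None:
        self.base = 0
        self.ids: Set[int] = set()      # committed identifiers >= base

    def add(self, txn_id: int) -> None:
        self.ids.add(txn_id)

    def truncate(self, n: int) -> None:
        """Drop identifiers below n; n must satisfy the TM's guarantee in the text."""
        self.ids = {t for t in self.ids if t >= n}
        self.base = max(self.base, n)

    def snapshot(self) -> Tuple[int, FrozenSet[int]]:
        """Copy given to a query when it starts."""
        return self.base, frozenset(self.ids)


def first_visible(version_list: List[Tuple[int, Any]],
                  snapshot: Tuple[int, FrozenSet[int]]) -> Optional[Any]:
    """Newest version whose writer is below the base or in the snapshot."""
    base, ids = snapshot
    for writer, value in version_list:  # newest first
        if writer < base or writer in ids:
            return value
    return None


cl = TruncatedCommitList()
for txn in (3, 4, 9):
    cl.add(txn)
cl.truncate(5)
snap = cl.snapshot()

x_versions = [(11, "d"), (9, "c"), (4, "b"), (3, "a")]
y_versions = [(12, "q"), (4, "p")]
print(first_visible(x_versions, snap), first_visible(y_versions, snap))
```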
When the scheduler receives n from the TM for the purpose of reducing
the size of the commit list, it can also garbage collect versions. In particular, it
can discard a committed version of x, provided there is a more recent commit-
ted version of x whose identifier is less than n.

Distributed Commit Lists


In a distributed DBS, using a commit list in place of timestamps requires
special care, because the commit lists maintained at different sites may not be
instantaneously identical. For example, suppose an updater $T_1$ commits at site
A, where it updated x, and is added to $CL_A$, the commit list at site A; but
suppose $T_1$ has not yet committed at site B, where it updated y. Next, suppose
an updater $T_2$ starts executing at A, reads the version of x written by $T_1$, writes
a new version of z at site B, and commits, thereby adding its transaction identi-
fier to $CL_A$ and $CL_B$. ($T_1$ still hasn’t committed at site B.) Now suppose a query
starts executing at site B, reads $CL_B$ (which contains $T_2$ but not $T_1$), and reads y
and z at site B. It will read the version of z produced by $T_2$ (which read x from
$T_1$) but not the version of y produced by $T_1$. The result is not 1SR.
We can avoid this problem by ensuring that whenever a commit list at a
site contains a transaction $T_i$, then it also contains all transactions from which
$T_i$ read a data item (at the same site or any other site). To do this, before an
updater transaction $T_j$ commits, it reads the commit lists at all sites where it
read data items and takes the union of those commit lists along with $\{T_j\}$,
producing a temporary commit list $CL_{temp}$. Then, instead of merely adding $T_j$
to the commit list at every site where it wrote, it unions $CL_{temp}$ into those
commit lists. Using this method in the example of the previous paragraph, $T_2$
would read $CL_A$, which includes $T_1$, and would union it into $CL_B$. The query
that reads $CL_B$ now reads $T_1$’s version of y, as required to be 1SR.
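
The union step can be sketched with sets. The function `commit_updater`, the site names, and the dictionary of per-site commit lists are assumptions for illustration; message passing and atomic commitment between sites are left out.

```python
from typing import Dict, Iterable, Set


def commit_updater(txn: str, read_sites: Iterable[str], write_sites: Iterable[str],
                   commit_lists: Dict[str, Set[str]]) -> Set[str]:
    """Union the commit lists of all read sites, plus txn, into every write site."""
    cl_temp = {txn}
    for site in read_sites:
        cl_temp |= commit_lists[site]       # gather what txn may have read from
    for site in write_sites:
        commit_lists[site] |= cl_temp       # publish txn together with its sources
    return cl_temp


# T1 has committed at site A but not yet at site B.
commit_lists = {"A": {"T1"}, "B": set()}

# T2 read x at A (from T1) and wrote z at B.
cl_temp = commit_updater("T2", read_sites=["A"], write_sites=["B"],
                         commit_lists=commit_lists)

print(sorted(commit_lists["B"]))
```

After the union, a query that copies the commit list at site B sees both identifiers, so it reads the version of y written by T1 as well as the version of z written by T2.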
Using this method, each query reads a database that was effectively
produced by a serial execution of updaters. However, executions may not be
1SR in the sense that two different queries may see mutually inconsistent
views. For example, suppose $TM_1$ and $TM_2$ supervise the execution of queries
$T_1$ and $T_2$, respectively, both of which read data items x and y stored, respec-
tively, at sites A and B. Consider now the following sequence of events:
1. $TM_1$ reads $CL_A$.
2. $TM_2$ reads $CL_B$.
3. $T_3$ writes x at site A and commits, thereby adding $T_3$ to $CL_A$.
4. $T_4$ writes y at site B and commits, thereby adding $T_4$ to $CL_B$.
5. $TM_1$ reads $CL_B$.
6. $TM_2$ reads $CL_A$.
Now, $T_1$ reads a database state that includes $T_4$’s Write on y but not $T_3$’s
Write on x, while $T_2$ reads a database state that includes $T_3$’s Write on x but not
$T_4$’s Write on y. Thus, from $T_1$’s viewpoint, transactions executed in the order
$T_4$ $T_1$ $T_3$, but from $T_2$’s viewpoint, transactions executed in the order $T_3$ $T_2$ $T_4$.
There is no serial 1V history including all four transactions that is equivalent
to this execution. Yet, the execution consisting only of updaters is 1SR, and in
a sense, each query reads consistent data. We leave the proof of these proper-
ties as an exercise (see Exercise 5.22).
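
The six events can be replayed directly to see the two incompatible views. This is a sketch only: the commit lists are modeled as sets of identifier strings, and each query's view is taken to be the union of the copies its TM obtained.

```python
# Commit lists at the two sites, initially empty for the purposes of the example.
cl = {"A": set(), "B": set()}

t1_copy_a = set(cl["A"])    # 1. TM1 reads CL_A
t2_copy_b = set(cl["B"])    # 2. TM2 reads CL_B
cl["A"].add("T3")           # 3. T3 writes x at A and commits
cl["B"].add("T4")           # 4. T4 writes y at B and commits
t1_copy_b = set(cl["B"])    # 5. TM1 reads CL_B
t2_copy_a = set(cl["A"])    # 6. TM2 reads CL_A

t1_sees = t1_copy_a | t1_copy_b   # updaters visible to query T1
t2_sees = t2_copy_a | t2_copy_b   # updaters visible to query T2

print(sorted(t1_sees), sorted(t2_sees))
```

Query T1 sees T4 but not T3, while query T2 sees T3 but not T4, so no single serial order of T3 and T4 is consistent with both views.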

BIBLIOGRAPHIC NOTES
The serializability theoretic model of multiversion concurrency control is from
[Bernstein, Goodman 83]. Other theoretical aspects are explored in [Hadzilacos,
Papadimitriou 85], [Ibaraki, Kameda 83], [Lausen 83], and [Papadimitriou, Kanellakis
84]. The two version 2PL algorithm in Section 5.4 is similar to that of [Stearns,
Rosenkrantz 81], which uses timestamp-based deadlock prevention. A similar method
that uses SGT certification for rw synchronization is described in [Bayer et al. 80] and
[Bayer, Heller, Reiser 80]. A multiversion tree locking algorithm appears in [Silber-
schatz 82]. Multiversion TO was introduced in [Reed 78], [Reed 79], and [Reed 83].
Multiversion mixed methods like those in Section 5.5 are described in [Bernstein,
Goodman 81], [Chan et al. 82], [Chan, Gray 85], [Dubourdieu 82], and [Weihl 85].
[Dubourdieu 82] describes a method used in a product of Prime Computer. [Lai,
Wilkinson 84] describes a multiversion 2PL certifier, where queries are never delayed,
and each updater $T_i$ is certified by checking its readset and writeset against the writeset
of all transactions that committed after $T_i$ starts.

EXERCISES
5.1* Consider the following history:

a. Prove that this satisfies the definition of MV history.
b. Is this history serializable?
c. Is it one-copy serializable? If so, give a version order that produces an
acyclic MVSG.
d. Suppose we add the operation $r_4[y_3]$ (where $w_3[y_3] < r_4[y_3]$) to the
history. Answer (c) for this new history.
5.2* Give a careful proof of the fact that if H is an MV history then C(H) is
a complete MV history. Suppose in the definition of MV histories we
required only conditions (1) - (4), but not recoverability. Prove that in that
case, C(H) would not necessarily be a complete MV history. (Incidentally,
this is the reason for making recoverability part of the definition of MV
histories, whereas in 1V serializability theory we treated recoverability as
a property that some histories have and others do not.)
5.3* Prove Proposition 5.2.
5.4* Prove that if H is a 1SR MV history, then so is any prefix of H.
5.5* Suppose no transaction ever reads a data item that it previously wrote.
Then we can redefine MV history, such that it need not preserve reflexive
reads-from relationships (since they cannot exist). Using this revised defi-
nition prove Theorem 5.3, making as many simplifications as possible.
5.6 MVTO can reject transactions whose Writes arrive too late. Design a
conservative MVTO scheduler that never rejects Reads or Writes. Prove it
correct. To show why your conservative MVTO is not worse than single
version conservative TO, characterize those situations in which the latter
will delay operations while the former will not. Are there situations where
the opposite is true?
5.7 In MVTO, suppose that we store timestamp intervals in the data
items themselves rather than in a separate table. For example, suppose the
granularity of data items is a fixed size page and that each page has a
header containing timestamp interval information. How does this organi-
zation affect the efficiency with which the MVTO scheduler processes
operations? How does it affect the way the scheduler garbage collects old
versions?
5.8 Since MVTO doesn’t use locks, we need to add a mechanism for
preventing transactions from reading uncommitted data and thereby
avoiding cascading aborts. Propose such a mechanism. How much
concurrency do you lose through this mechanism? Compare the amount of
concurrency you get with the one you proposed for Exercise 5.6.
5.9 Show that there does or does not exist a sequence of Reads and Writes
in which
a. Basic TO rejects an operation and MVTO does not;
b. Basic TO delays an operation and MVTO does not;
c. MVTO rejects an operation and Basic TO does not; and
d. MVTO delays an operation and Basic TO does not.
That is, for each situation, either give an example sequence with the
desired property, or prove that such a sequence does not exist.
5.10 Modify MVTO so that it correctly handles transactions that write into
a data item more than once.
5.11 Describe the precise conditions under which MVTO can safely discard
a version without affecting any future transaction.

5.12 It is incorrect to use MVTO for rw synchronization and TWR for ww
synchronization. Explain why.
5.13 Assume no transaction ever reads a data item that it previously wrote.
Consider the following variation of standard 2PL, called 2PL with delayed
writes. Each TM holds all writes used by a transaction until the transac-
tion terminates. It then sends all those held Writes to the appropriate
DMs. DMs use standard 2PL. Compare the behavior of 2V2PL to 2PL
with delayed writes.
5.14* Let H be the set of all 1V histories equivalent to the MV histories
produced by 2V2PL. Is H identical to the set of histories produced by 2PL?
Prove your answer.
5.15 Suppose we modify multiversion 2PL as follows. As in Section 5.5, we
distinguish queries from updaters. Updaters set certify locks in the usual
way. Queries set no (read) locks. To read a data item X, a query reads the
most recently certified version of x. Does this algorithm produce 1SR
executions? If so, prove it. If not, give a counterexample.
5.16 Suppose no transaction ever reads a data item that it previously wrote.
Use this knowledge to simplify the 2V2PL algorithm. Does your simplifi-
cation improve performance?
5.17 Show how to integrate timestamp-based deadlock prevention into
2V2PL. If most write locks will eventually be converted into certify locks
(i.e., if very few transactions spontaneously abort), is it better to perform
the deadlock prevention early using write locks or later using certify locks?
5.18* Prove the correctness of the extension to 2V2PL that uses more than
two versions, described at the end of Section 5.4.
5.19 Compare the behavior of the multiple version extension to 2V2PL to
standard 2V2PL. How would you expect them to differ in the number of
delays and aborts they induce?
5.20* Prove that the mixed method of Strict 2PL and “MVTO” that uses
commit lists for queries in a centralized DBS (in Section 5.5) is correct.
5.21 Consider the distributed Strict 2PL and “MVTO” mixed method in
Section 5.5 that uses commit lists for queries. The method only guarantees
that any execution of updaters is 1SR, and that each query reads consistent
data. Propose a modification to the algorithm that ensures that queries do
not read mutually inconsistent data; that is, any execution of updaters and
queries is 1SR. Compare the cost of your method to the cost of the one in
the chapter.
5.22* Prove that the distributed Strict 2PL and “MVTO” mixed method in
Section 5.5 that uses commit lists for queries is correct, in the sense that
any execution of updaters and one query is 1SR.
5.23 Design a multiversion concurrency control algorithm that uses SGT
certification for rw synchronization and 2PL for ww synchronization.
Prove that your algorithm is correct.
