Chapter 5
MULTIVERSION CONCURRENCY CONTROL
5.1 INTRODUCTION
In a multiversion concurrency control algorithm, each Write on a data item x
produces a new copy (or version) of x. The DM that manages x therefore
keeps a list of versions of x, which is the history of values that the DM has
assigned to x. For each Read(x), the scheduler not only decides when to send
the Read to the DM, but it also tells the DM which one of the versions of x to
read.
The benefit of multiple versions for concurrency control is to help the
scheduler avoid rejecting operations that arrive too late. For example, the
scheduler normally rejects a Read because the value it was supposed to read
has already been overwritten. With multiversions, such old values are never
overwritten and are therefore always available to tardy Reads. The scheduler
can avoid rejecting the Read simply by having the Read read an old version.
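As a minimal sketch of this idea, assume (hypothetically) that the DM tags each version with the timestamp of the transaction that wrote it, and that a Read with timestamp ts should see the newest version written at or before ts. The class `MultiversionItem` and its methods are illustrative names, not a layout prescribed by this chapter.

```python
from bisect import bisect_right


class MultiversionItem:
    """All versions of one data item, ordered by writer timestamp (a sketch)."""

    def __init__(self):
        self._stamps = []  # sorted writer timestamps
        self._values = []  # values, parallel to _stamps

    def write(self, ts, value):
        # A Write never overwrites; it inserts a new version in timestamp order.
        pos = bisect_right(self._stamps, ts)
        self._stamps.insert(pos, ts)
        self._values.insert(pos, value)

    def read(self, ts):
        # Return the newest version whose writer timestamp is <= ts.
        pos = bisect_right(self._stamps, ts)
        if pos == 0:
            raise LookupError("no version is old enough for this Read")
        return self._values[pos - 1]


x = MultiversionItem()
x.write(0, "initial")  # T0 creates the first version
x.write(2, "from T2")  # T2 creates a newer version
print(x.read(1))       # initial  (a tardy Read by T1 still finds its value)
print(x.read(3))       # from T2
```

A single-version store would have replaced "initial" with "from T2", leaving the scheduler no choice but to reject T1's Read.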
Maintaining multiple versions may not add much to the cost of concur-
rency control, because the versions may be needed anyway by the recovery
algorithm. As we’ll see in the next chapter, many recovery algorithms have to
maintain some before image information, at least of those data items that have
been updated by active transactions; the recovery algorithm needs those before
images in case any of the active transactions abort. The before images of a data
item are exactly its list of old versions. It is a small step for the DM to make
those versions explicitly available to the scheduler.
An obvious cost of maintaining multiple versions is storage space. To
control this storage requirement, versions must periodically be purged or
Analyzing Correctness
To analyze the correctness of multiversion concurrency control algorithms, we
need to extend serializability theory. This extension requires two types of
histories: multiversion (MV) histories that represent the DM’s execution of
operations on a multiversion database, and single version (1V) histories that
represent the interpretation of MV histories in the users’ single version view of
the database. Serial 1V histories are the histories that the user regards as
correct. But the system actually produces MV histories. So, to prove that a
concurrency control algorithm is correct, we must prove that each of the MV
histories that it can produce is equivalent to a serial 1V history.
What does it mean for an MV history to be equivalent to a 1V history?
Let’s try to answer this by extending the definition of equivalence of 1V histo-
ries that we used in Chapters 2-4. To attempt this extension, we need a little
notation. For each data item x, we denote the versions of x by xi, xj, . . . , where
the subscript is the index of the transaction that wrote the version. Thus, each
Write in an MV history is always of the form wi[xi], where the version
subscript equals the transaction subscript. Reads are denoted in the usual way,
such as ri[xj].
Suppose we adopt a definition of equivalence that says an MV history
HMV is equivalent to a 1V history H1V if every pair of conflicting operations in
HMV is in the same order in H1V. Consider the MV history
H1 = w0[x0] c0 w1[x1] c1 r2[x0] w2[y2] c2.
The only two operations in H1 that conflict are w0[x0] and r2[x0]. The operation
w1[x1] does not conflict with either w0[x0] or r2[x0], because it operates on a
different version of x than those operations, namely x1. Now consider the 1V
history
H2 = w0[x] c0 w1[x] c1 r2[x] w2[y] c2.
We constructed H2 by mapping each operation on versions x0, x1, and y2 in H1
into the same operation on the corresponding data items x and y. Notice that
the two operations in H1 that conflict, w0[x0] and r2[x0], are in the same order
in H2 as in H1. So, according to the definition of equivalence just given, H1 is
equivalent to H2. But this is not reasonable. In H2, T2 reads x from T1, whereas
in H1, T2 reads x from T0. Since T2 reads a different value of x in H1 and H2, it
may write a different value in y.
This definition of equivalence based on conflicts runs into trouble because
MV and 1V histories have slightly different operations: version operations
versus data item operations. These operations have different conflict properties.
For example, w1[x1] does not conflict with r2[x0], but their corresponding
1V operations w1[x] and r2[x] do conflict. Therefore, a definition of equivalence
based on conflicts is inappropriate.
To solve this problem, we need to return to first principles by adopting the
more fundamental definition of view equivalence developed in Section 2.6.
Recall that two histories are view equivalent if they have the same reads-from
relationships and the same final writes. Comparing histories H1 and H2, we see
that T2 reads x from T0 in H1, but T2 reads x from T1 in H2. Thus, H1 is not
view equivalent to H2.
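The reads-from comparison of H1 and H2 can be checked mechanically. The sketch below uses a hypothetical tuple encoding of operations (version subscripts explicit in MV histories, absent in 1V histories) and, for the 1V case, assumes that no transaction aborts and that every Read is preceded by some Write of the same item.

```python
def mv_reads_from(history):
    """Reads-from triples (reader, item, writer) of an MV history.

    An MV operation is ("w", i, item, i) for wi[item_i],
    ("r", k, item, j) for rk[item_j], or ("c", i) for ci.
    """
    return {(op[1], op[2], op[3]) for op in history if op[0] == "r"}


def one_v_reads_from(history):
    """Reads-from triples of a 1V history with no aborts.

    A 1V operation is ("w", i, item), ("r", i, item), or ("c", i).
    Each Read reads from the last preceding Write on the same item.
    """
    last_writer = {}
    triples = set()
    for op in history:
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r":
            triples.add((op[1], op[2], last_writer[op[2]]))
    return triples


H1 = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2)]
H2 = [("w", 0, "x"), ("c", 0), ("w", 1, "x"), ("c", 1),
      ("r", 2, "x"), ("w", 2, "y"), ("c", 2)]

print(mv_reads_from(H1))     # {(2, 'x', 0)}: T2 reads x from T0
print(one_v_reads_from(H2))  # {(2, 'x', 1)}: T2 reads x from T1
```

The two sets differ, which is exactly why H1 and H2 are not view equivalent even though their conflicting pairs appear in the same order.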
Now that we have a satisfactory definition of equivalence, we need a way
of showing that every MV history H produced by a given multiversion concurrency
control algorithm is equivalent to a serial 1V history. One way would be
to show that SG(H) is acyclic, so H is equivalent to a serial MV history. Unfortunately,
this doesn’t help much, because not every serial MV history is equivalent
to a serial 1V history. For example, consider the serial MV history
Recall from Section 2.4 that Ti reads x from Tj in H if (1) wj[x] < ri[x], (2) aj ≮ ri[x], and (3)
if there is some wk[x] such that wj[x] < wk[x] < ri[x], then ak < ri[x].
We can show that H3 is not view equivalent to H4 by showing that they do not
have the same reads-from relationships. In H4, T2 reads x and y from T1. But in
H3, T2 reads x from T0 and reads y from T1. Since T2 reads different values in
H3 and H4, the two histories are not equivalent. Similarly, H3 is not equivalent
to the 1V history
Clearly, H3 is not equivalent to any 1V serial history over the same set of transactions.
Only a subset of serial MV histories, called 1-serial MV histories, are
equivalent to serial 1V histories. Intuitively, a serial MV history is 1-serial if
for each reads-from relationship, say Ti reads x from Tj, Tj is the last transaction
preceding Ti that writes any version of x. Notice that H3 is not 1-serial
because T2 reads x from T0, not T1, which is the last transaction preceding T2
that writes x.
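The 1-serial condition is a single left-to-right scan. Since H3 is not reproduced here, the sketch applies the test to H1 from earlier in this section, which is also a serial MV history in which T2 reads x0 although T1 is the last transaction before T2 that writes x. The tuple encoding and the function name are hypothetical.

```python
def is_one_serial(serial_history):
    """Test whether a serial MV history is 1-serial.

    Operations: ("w", i, item, i) for wi[item_i], ("r", k, item, j) for
    rk[item_j], ("c", i) for ci. The input is assumed to be serial.
    """
    last_writer = {}  # item -> last transaction so far that wrote a version of it
    for op in serial_history:
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r":
            _, k, item, j = op
            if j != k and last_writer.get(item) != j:
                return False  # Tk skipped over a later writer of item
    return True


H1 = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2)]
# Same history, but T2 reads the version written by T1 instead.
H1_fixed = H1[:4] + [("r", 2, "x", 1)] + H1[5:]

print(is_one_serial(H1))        # False
print(is_one_serial(H1_fixed))  # True
```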
All 1-serial MV histories are equivalent to serial 1V histories, so we can
define 1-serial histories to be correct. To prove that a multiversion concurrency
control algorithm is correct, we must show that its MV histories are equivalent
to 1-serial MV histories. We will do this by defining a new graph structure
called a multiversion serialization graph (MVSG). An MV history is equivalent
to a 1-serial MV history iff it has an acyclic MVSG. Now proving multiversion
concurrency control algorithms correct is just like standard serializability
theory: we simply prove that the algorithm’s histories have acyclic MVSGs. We now
proceed with a formal development of this line of proof.
This section requires reading Section 2.6 as a prerequisite. We recommend skipping this and
other starred sections of this chapter on the first reading, to gain some intuition for multiversion
algorithms before studying their serializability theory.
5.2 *MULTIVERSION SERIALIZABILITY THEORY
tion by a function h that maps each wi[x] into wi[xi], each ri[x] into ri[xj] for
some j, each ci into ci, and each ai into ai.
A complete multiversion (MV) history H over T is a partial order with
ordering relation < where
1. H = h(T0 ∪ T1 ∪ ... ∪ Tn) for some translation function h;
2. for each Ti and all operations pi, qi in Ti, if pi <i qi, then h(pi) < h(qi);
3. if h(rj[x]) = rj[xi], then wi[xi] < rj[xi];
4. if wi[x] <i ri[x], then h(ri[x]) = ri[xi]; and
5. if h(rj[x]) = rj[xi], i ≠ j, and cj ∈ H, then ci < cj.
Condition (1) states that the scheduler translates each operation submitted
by a transaction into an appropriate multiversion operation. Condition (2)
states that the MV history preserves all orderings stipulated by transactions.
Condition (3) states that a transaction may not read a version until it has been
produced. Condition (4) states that if a transaction writes into a data item x
before it reads x, then it must read the version of x that it previously created.
This ensures that H is consistent with the implied semantics of the transactions
over which it is defined. If H satisfies condition (4), we say that it preserves
reflexive reads-from relationships. Condition (5) says that before a transaction
commits, all the transactions that produced versions it read must have already
committed. If H satisfies this condition, we say it is recoverable.
An MV history H is a prefix of a complete MV history. We say that an MV
history preserves reflexive reads-from relationships (or is recoverable) if it is
the prefix of a complete MV history that does so. As in 1V histories, a transaction
Ti is committed (respectively aborted) in an MV history H if ci (respectively
ai) is in H. Also, the committed projection of an MV history H, denoted
C(H), is defined as for 1V histories; that is, C(H) is obtained by removing from
H the operations of all but the committed transactions. It is easy to check that
if H is an MV history then C(H) is a complete MV history, i.e., C(H) satisfies
conditions (1) through (5) (see Exercise 5.2).
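The committed projection and the recoverability condition (5) can be sketched as follows. This assumes a hypothetical tuple encoding and that a history is given as one total order (a linearization of its partial order); the function names are illustrative.

```python
def committed_projection(history):
    """C(H): keep only the operations of committed transactions."""
    committed = {op[1] for op in history if op[0] == "c"}
    return [op for op in history if op[1] in committed]


def is_recoverable(history):
    """Condition (5): if Tj reads xi (i != j) and cj is in H, then ci < cj."""
    commit_at = {op[1]: pos for pos, op in enumerate(history) if op[0] == "c"}
    for op in history:
        if op[0] != "r":
            continue
        _, j, _, i = op  # rj[xi]
        if i != j and j in commit_at:
            if i not in commit_at or commit_at[i] > commit_at[j]:
                return False
    return True


H = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1),
     ("r", 2, "x", 1), ("c", 2), ("c", 1)]
print(is_recoverable(H))                             # False: c2 precedes c1
print(is_recoverable(H[:4] + [("c", 1), ("c", 2)]))  # True
print(committed_projection(H[:4]))                   # [('w', 0, 'x', 0), ('c', 0)]
```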
For example, given transactions {T0, T1, T2, T3, T4}, where operations inside braces are unordered with respect to each other,
T0 = {w0[x], w0[y], w0[z]} → c0
T1 = r1[x] → w1[y] → c1
T2 = {r2[x], r2[z]} → w2[x] → c2
T3 = r3[z] → {w3[y], w3[z]} → c3
T4 = {r4[x], r4[y], r4[z]} → c4
To ensure condition (3), we will normally include in our examples an initializing transaction,
T0, that creates the initial version of each data item.
the following history, H6, is a complete MV history over {T0, T1, T2, T3, T4}.
MV History Equivalence
Two 1V histories over the same transactions are view equivalent if they contain
the same operations, have the same reads-from relationships, and have the same
final writes. However, for MV histories, we can safely drop “and the same
final writes” from the definition. If two histories are over the same transactions,
then they have the same Writes. Since no versions are overwritten, all
Writes are effectively final writes. Thus, if two MV histories over the same
transactions have the same operations and the same reads-from relationships,
then they have the same final writes and are therefore view equivalent.
To formalize the definition of equivalence, we must formalize the notion of
reads-from in MV histories. To do this, we replace the notion of data item by
version in the ordinary definition of reads-from for 1V histories. Transaction
Tj reads x from Ti in MV history H if Tj reads the version of x produced by Ti.
Since the version of x produced by Ti is xi, Tj reads x from Ti in H iff Tj reads
xi, that is, iff rj[xi] ∈ H.
Two MV histories over a set of transactions T are equivalent, denoted ≡,
if they have the same operations and the same reads-from relationships. In
view of the preceding discussion, having the same reads-from relationships
amounts to having the same Read operations. Therefore, equivalence of MV
histories reduces to a trivial condition, as stated in the following proposition.
Serialization Graphs
Two operations in an MV history conflict if they operate on the same version
and one is a Write. Only one pattern of conflict is possible in an MV history: if
pi < qj and these operations conflict, then pi is wi[xi] and qj is rj[xi] for some
data item x. Conflicts of the form wi[xi] < wj[xi] are impossible, because each
Write produces a unique new version. Conflicts of the form rj[xi] < wi[xi] are
impossible, because Tj cannot read xi until it has been produced. Thus, all
conflicts in an MV history correspond to reads-from relationships.
The serialization graph for an MV history is defined as for a 1V history.
But since only one kind of conflict is possible in an MV history, SGs are quite
simple. Let H be an MV history. SG(H) has nodes for the committed transactions
in H and edges Ti → Tj (i ≠ j) whenever for some x, Tj reads x from Ti.
That is, Ti → Tj is present iff for some x, rj[xi] (i ≠ j) is an operation of C(H).
This gives us the following proposition.
SG(H6) = [Figure: a graph over T0, T1, T2, T3, T4]
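Because every conflict in an MV history is a reads-from relationship, building SG(H) needs only a scan of the Reads. The following sketch uses a hypothetical tuple encoding and the illustrative function name `serialization_graph`, applied to H1 from Section 5.1.

```python
def serialization_graph(history):
    """SG(H) for an MV history: nodes are committed transactions, and
    Ti -> Tj is an edge iff rj[xi] (i != j) is an operation of C(H)."""
    committed = {op[1] for op in history if op[0] == "c"}
    edges = set()
    for op in history:
        if op[0] == "r":
            _, j, _, i = op  # rj[xi]
            if i != j and i in committed and j in committed:
                edges.add((i, j))
    return committed, edges


H1 = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2)]
nodes, edges = serialization_graph(H1)
print(sorted(nodes), edges)  # [0, 1, 2] {(0, 2)}
```

Note that SG(H1) is acyclic even though H1 is not equivalent to the serial 1V history H2, which is the gap that version order edges are meant to close.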
Two 1V histories can be equivalent in this sense without being view equivalent to each other,
because they don’t have the same final writes.
It turns out that this is a prefix commit-closed property. Unlike view serializability, we need
not require that the committed projection of every prefix of an MV history be equivalent to a
1-serial MV history. This follows from the fact that the committed projection of the history
itself is equivalent to a 1-serial MV history (see Exercise 5.4).
The version order edges that are in MVSG(H6, ≪) but not in SG(H6) are T1 → T2,
T1 → T3, and T2 → T3. Except for T0 → T1, all edges in SG(H6) are also
version order edges.
Given an MV history H, suppose SG(H) is acyclic. We know that a serial
MV history Hs obtained by topologically sorting SG(H) may not be equivalent
to any serial 1V history. The reason is that some of Hs’s reads-from relationships
may be changed by mapping version operations into data item operations.
The purpose of version order edges is to detect when this happens. If
rk[xj] and wi[xi] are in C(H), then the version order edge forces wi[xi] to either
precede wj[xj] or follow rk[xj] in Hs. That way, when operations on xi and xj are
mapped to operations on x when changing Hs to a 1V history, the reads-from
relationship is undisturbed. This ensures that we can map Hs into an equivalent
1V history. Of course, all of this is possible only if SG(H) is still acyclic
after adding version order edges. This observation leads us to the following
theorem, which is our main tool for analyzing multiversion concurrency
control algorithms.
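The construction of MVSG(H, ≪) and its acyclicity test can be sketched as follows, assuming a hypothetical tuple encoding, a history that already contains only committed transactions (that is, C(H)), and a caller-supplied version order. The topological sort uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on a cycle; the function names are illustrative. The sketch checks one given version order, not all of them.

```python
from graphlib import CycleError, TopologicalSorter


def mvsg_edges(history, version_order):
    """Nodes and edges of MVSG(H, <<) for a history containing only C(H).

    version_order maps each item to its writers, oldest version first;
    {"x": [0, 1]} means x0 << x1.
    """
    nodes = {op[1] for op in history if op[0] == "c"}
    reads = [op for op in history if op[0] == "r"]
    writes = [op for op in history if op[0] == "w"]
    edges = set()
    for _, k, item, j in reads:  # rk[xj]
        if j != k:
            edges.add((j, k))  # reads-from edge of SG(H)
        order = version_order[item]
        for _, i, w_item, _ in writes:  # wi[xi]
            if w_item != item or len({i, j, k}) < 3:
                continue  # version order edges need i, j, k distinct
            if order.index(i) < order.index(j):
                edges.add((i, j))  # xi << xj gives Ti -> Tj
            else:
                edges.add((k, i))  # xj << xi gives Tk -> Ti
    return nodes, edges


def equivalent_serial_order(nodes, edges):
    """A topological order of MVSG(H, <<), or None if it has a cycle."""
    predecessors = {n: set() for n in nodes}
    for a, b in edges:
        predecessors.setdefault(b, set()).add(a)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


H1 = [("w", 0, "x", 0), ("c", 0), ("w", 1, "x", 1), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2)]
print(equivalent_serial_order(*mvsg_edges(H1, {"x": [0, 1], "y": [2]})))
# [0, 2, 1]

skew = [("w", 0, "x", 0), ("w", 0, "y", 0), ("c", 0),
        ("r", 1, "x", 0), ("r", 2, "y", 0),
        ("w", 1, "y", 1), ("w", 2, "x", 2), ("c", 1), ("c", 2)]
print(equivalent_serial_order(*mvsg_edges(skew, {"x": [0, 2], "y": [0, 1]})))
# None
```

For H1 with x0 ≪ x1, the version order edge T2 → T1 forces T2 before T1, giving the serial order T0 T2 T1, in which T2 does read x from T0. The second history, where each of T1 and T2 reads the item the other writes, has a cycle under the version order shown.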
Recall that the nodes of SG(H) and, therefore, of MVSG(H, ≪) are the committed transactions in H.
5.3 MULTIVERSION TIMESTAMP ORDERING
where ts(Ti) = i for 0 ≤ i ≤ 4. Suppose the system deleted x2 but not x0. If
r3[x] now arrives, the scheduler will incorrectly translate it into r3[x0]. Suppose
instead that the system deleted x0. If r1[x] now arrives, the scheduler will find
no version whose interval has a wts < ts(T1). This condition indicates that the
DBS has deleted the version that r1[x] has to read, so the scheduler must reject
r1[x].
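Both failure modes can be reproduced with a minimal version-selection function. The sketch assumes, hypothetically, that x has exactly the versions x0 and x2 before any deletion and that ts(Ti) = i; the history this passage refers to is not reproduced here, and `select_version` is an illustrative name.

```python
def select_version(surviving_wts, ts):
    """MVTO version selection: the largest write timestamp <= ts.

    Returns None when no surviving version is old enough, in which case
    the scheduler must reject the Read.
    """
    candidates = [wts for wts in surviving_wts if wts <= ts]
    return max(candidates) if candidates else None


print(select_version({0, 2}, 3))  # 2: the correct version for r3[x]
print(select_version({0}, 3))     # 0: x2 was deleted, so r3[x] is mistranslated
print(select_version({2}, 1))     # None: x0 was deleted, so r1[x] is rejected
```

The second case is the dangerous one: nothing signals an error, so garbage collection must never remove a version that some future Read could still select.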
*Proof of Correctness
To prove MVTO correct, we must describe it in serializability theory. As usual,
we do this by inferring properties that all histories produced by MVTO will
satisfy. Using these properties as our formal specification of the algorithm, we
prove that all histories produced by MVTO have an acyclic MVSG and hence
are 1SR.
The following properties describe the essential characteristics of every
MVTO history H over {T0, ..., Tn}.
MVTO1. For each Ti, there is a unique timestamp ts(Ti); that is, ts(Ti) =
ts(Tj) iff i = j.
MVTO2. For every rk[xj] ∈ H, wj[xj] < rk[xj] and ts(Tj) ≤ ts(Tk).
MVTO3. For every rk[xj] and wi[xi] ∈ H, i ≠ j, either (a) ts(Ti) < ts(Tj) or
(b) ts(Tk) < ts(Ti) or (c) i = k and rk[xj] < wi[xi].
MVTO4. If rj[xi] ∈ H, i ≠ j, and cj ∈ H, then ci < cj.
Property MVTO1 says that transactions have unique timestamps. Property
MVTO2 says that each transaction Tk only reads versions with timestamps
no larger than ts(Tk). Property MVTO3 states that when the scheduler processes
rk[xj], xj is the version of x with the largest timestamp less than or equal to
ts(Tk). Moreover, if the scheduler later receives wi[xi], it will reject it if ts(Tj) <
ts(Ti) < ts(Tk). Property MVTO4 states that H is recoverable.
These conditions ensure that H preserves reflexive reads-from relationships.
To see this, suppose not, that is, wk[xk] < rk[xj] and j ≠ k. By MVTO2 and
MVTO1 (since j ≠ k), ts(Tj) < ts(Tk). By MVTO3 with i = k, either (a) ts(Tk) < ts(Tj),
(b) ts(Tk) < ts(Tk), or (c) rk[xj] < wk[xk]. Case (a) contradicts ts(Tj) < ts(Tk),
case (b) is absurd, and case (c) contradicts wk[xk] < rk[xj]. All three cases are
impossible, a contradiction.
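Property MVTO3 pins down both halves of the scheduler: how Reads pick versions and when Writes are rejected. The following is a minimal single-item sketch under stated assumptions (integer timestamps, at most one Write per transaction per item, cf. Exercise 5.10, and no commit or abort handling, so MVTO4 is not enforced); `MVTOItem` and its fields are hypothetical names, not the chapter's algorithm verbatim.

```python
class MVTOItem:
    """Hypothetical MVTO bookkeeping for one data item x.

    For every version, keyed by its writer's timestamp, we remember the
    largest timestamp of any transaction that has read it.
    """

    def __init__(self):
        self.max_read = {0: 0}  # version x0 from the initializing T0

    def _version_for(self, ts):
        older = [wts for wts in self.max_read if wts <= ts]
        if not older:
            raise LookupError("the needed version was deleted; reject")
        return max(older)

    def read(self, ts):
        """Translate rk[x] into rk[xj]; returns ts(Tj)."""
        wts = self._version_for(ts)
        self.max_read[wts] = max(self.max_read[wts], ts)
        return wts

    def write(self, ts):
        """Accept wi[xi] unless some Tk with ts(Tj) < ts(Ti) < ts(Tk) read xj."""
        wts = self._version_for(ts)
        if self.max_read[wts] > ts:
            return False  # reject: accepting would violate MVTO3
        self.max_read[ts] = ts
        return True


x = MVTOItem()
print(x.read(3))   # 0: T3 reads x0
print(x.write(2))  # False: T3 already read x0, so w2[x2] arrives too late
print(x.write(4))  # True: T4 creates x4
print(x.read(5))   # 4
print(x.read(1))   # 0: a tardy Read by T1 is still served from x0
```

Reads are never rejected here unless the needed version has been garbage collected; only Writes that would invalidate an earlier Read are refused.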
We now prove that any history satisfying these properties is 1SR. In other
words, MVTO is a correct scheduler.
Theorem 5.5: Every history H produced by MVTO is 1SR.
Proof: Define a version order as follows: xi ≪ xj iff ts(Ti) < ts(Tj). We
now prove that MVSG(H, ≪) is acyclic by showing that for every edge
Ti → Tj in MVSG(H, ≪), ts(Ti) < ts(Tj).
Suppose Ti → Tj is an edge of SG(H). This edge corresponds to a reads-from
relationship. That is, for some x, Tj reads x from Ti. By MVTO2,
ts(Ti) ≤ ts(Tj). By MVTO1, ts(Ti) ≠ ts(Tj). So, ts(Ti) < ts(Tj) as desired.
Let rk[xj] and wi[xi] be in H where i, j, and k are distinct, and consider
the version order edge that they generate. There are two cases: (1) xi ≪ xj,
which implies Ti → Tj is in MVSG(H, ≪); and (2) xj ≪ xi, which implies
Tk → Ti is in MVSG(H, ≪). In case (1), by definition of ≪, ts(Ti) <
ts(Tj). In case (2), by MVTO3, either ts(Ti) < ts(Tj) or ts(Tk) < ts(Ti). The
first option is impossible, because xj ≪ xi implies ts(Tj) < ts(Ti). So, ts(Tk)
< ts(Ti) as desired.
This fits especially well with the shadow page recovery techniques used, for example, in the
no-undo/no-redo algorithm of Section 6.7.
5.4 MULTIVERSION TWO PHASE LOCKING
           Read   Write   Certify
Read        y      y       n
Write       y      n       n
Certify     n      n       n
FIGURE 5-1
Compatibility Matrix for Two Version 2PL
transaction already owns a certify lock on x. If Ti already owns wli[x] and has
therefore written xi, then the scheduler translates ri[x] into ri[xi], which it sends
to the DM. Otherwise, it waits until it can set a read lock, and then sets the
lock, translates ri[x] into ri[xj], where xj is the most recently (and therefore
only) committed version of x, and sends ri[xj] to the DM. Note that since only
committed versions may be read (except for versions produced by the reader
itself), the scheduler avoids cascading aborts and, a fortiori, ensures that the
MV histories it produces are recoverable.
When the scheduler receives a Commit, ci, indicating that Ti has terminated,
it attempts to convert Ti’s write locks into certify locks. Since certify
locks conflict with read locks, the scheduler can only do this lock conversion
on those data items that have no read locks owned by other transactions. On
those data items where such read locks exist, the lock conversion is delayed
until all read locks are released. Thus, the effect of certify locks is to delay Ti’s
commitment until there are no active readers of data items it is about to overwrite.
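The lock rules of Figure 5-1 and the commit-time conversion to certify locks can be sketched for a single data item. This is a hypothetical model: requests that would block raise `MustWait` instead of waiting, certification happens inside `commit` for this one item only, and deadlocks across items are not modeled. Class and method names are illustrative.

```python
class MustWait(Exception):
    """Raised where a real 2V2PL scheduler would block the request."""


# Figure 5-1 as (held, requested) -> compatible; the matrix is symmetric.
COMPATIBLE = {
    ("read", "read"): True, ("read", "write"): True, ("read", "certify"): False,
    ("write", "read"): True, ("write", "write"): False, ("write", "certify"): False,
    ("certify", "read"): False, ("certify", "write"): False,
    ("certify", "certify"): False,
}


class TwoVersionItem:
    """Hypothetical 2V2PL state for one data item x."""

    def __init__(self, value):
        self.committed = value  # the only committed version of x
        self.pending = None     # (writer, value) for the uncommitted version
        self.locks = {}         # transaction -> "read", "write" or "certify"

    def _grantable(self, tid, mode):
        return all(COMPATIBLE[(held, mode)]
                   for other, held in self.locks.items() if other != tid)

    def read(self, tid):
        if self.pending and self.pending[0] == tid:
            return self.pending[1]  # Ti already wrote x: ri[x] becomes ri[xi]
        if not self._grantable(tid, "read"):
            raise MustWait("another transaction holds a certify lock on x")
        self.locks.setdefault(tid, "read")
        return self.committed       # otherwise read the committed version

    def write(self, tid, value):
        if not self._grantable(tid, "write"):
            raise MustWait("x is write- or certify-locked by another transaction")
        self.locks[tid] = "write"
        self.pending = (tid, value)

    def commit(self, tid):
        """Certify and install Ti's version, or return False if readers delay it."""
        if self.pending and self.pending[0] == tid:
            if not self._grantable(tid, "certify"):
                return False        # conversion delayed until read locks are released
            self.locks[tid] = "certify"
            self.committed = self.pending[1]
            self.pending = None
        self.locks.pop(tid, None)   # commitment releases Ti's lock on x
        return True


x = TwoVersionItem(10)
x.write(1, 20)      # T1 creates an uncommitted version
print(x.read(2))    # 10: T2 reads the committed version despite T1's write lock
print(x.read(1))    # 20: T1 reads its own version
print(x.commit(1))  # False: T2's read lock delays T1's certification
print(x.commit(2))  # True: T2 finishes and releases its read lock
print(x.commit(1))  # True
print(x.read(3))    # 20
```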
Note that lock conversions can lead to deadlock just as in standard 2PL.
For example, suppose Ti has a read lock on x and Tj has a write lock on x. If Ti
tries to convert its read lock to a write lock and Tj tries to convert its write lock
to a certify lock, then the transactions are deadlocked. We can use any deadlock
detection or prevention technique: cycle detection in a WFG, timestamp-based
prevention, etc.
Since a transaction may deadlock while trying to convert its write locks, it
may be aborted during this activity. Therefore, it must not release its locks or
be committed until it has obtained all of its certify locks.
Certify locks in 2V2PL behave much like write locks in ordinary 2PL.
Since the time to certify a transaction is usually much less than the total time to
execute it, 2V2PL’s certify locks delay Reads for less time than 2PL’s write
locks delay Reads. However, since existing read locks delay a transaction’s
certification in 2V2PL, the improved concurrency of Reads comes at the
expense of delaying the certification and therefore the commitment of update
transactions.
*Correctness of 2V2PL
To list the properties of histories produced by executions of 2V2PL, we need to
include the operation fi, denoting the certification of Ti.
Let H be a history over {T0, ..., Tn} produced by 2V2PL. Then H must
satisfy the following properties.
2V2PL1. For every Ti, fi follows all of Ti’s Reads and Writes and precedes
Ti’s commitment.
2V2PL2. For every rk[xj] in H, if j ≠ k, then cj < rk[xj]; otherwise wk[xk]
< rk[xk].
2V2PL3. For every wk[xk] and rk[xj] in H, if wk[xk] < rk[xj], then j = k.
Property 2V2PL2 says that every Read rk[xj] either reads a certified version
or reads a version written by itself (i.e., Tk). Property 2V2PL3 says that if Tk
wrote x before the scheduler received rk[x], then it translates rk[x] into rk[xk].
2V2PL4. If rk[xj] and wi[xi] are in H, then either fi < rk[xj] or rk[xj] < fi.
Property 2V2PL4 says that rk[xj] is strictly ordered with respect to the certification
operation of every transaction that writes x. This is because each
transaction Ti that writes x must obtain a certify lock on x. For each transaction
Tk that reads x, either Ti must delay its certification until Tk has been
certified (if it has not already been so), or else Tk must wait for Ti to be certified
before it can set its read lock on x and therefore read x.
2V2PL5. For every rk[xj] and wi[xi] (i, j, and k distinct), if fi < rk[xj],
then fi < fj.
Property 2V2PL5, combined with 2V2PL2, says that each Read rk[xj]
either reads a version written by Tk or reads the most recently certified version
of x.
2V2PL6. For every rk[xj] and wi[xi], i ≠ j and i ≠ k, if rk[xj] < fi, then
fk < fi.
Property 2V2PL6 says that a transaction Ti that writes x cannot be
certified until all transactions that previously read a version of x have already
been certified. This follows from the fact that certify locks conflict with read
locks.
2V2PL7. For every wi[xi] and wj[xj], either fi < fj or fj < fi.
Property 2V2PL7 says that the certifications of every two transactions that
write the same data item are atomic with respect to each other.
MVTO: xi ≪ xj iff ts(Ti) < ts(Tj). We show that MVSG(H, ≪) is acyclic by
showing that for each of its edges Ti → Tj, ts(Ti) < ts(Tj). First, consider an
edge Ti → Tj in SG(H). Each such edge is due to a reads-from relationship. If
Tj is an updater, then by the way timestamps are assigned to updaters, ts(Ti)
< ts(Tj) (cf. Section 4.5). If Tj is a query, then by MVTO version selection,
ts(Ti) < ts(Tj). Now consider a version order edge in MVSG(H, ≪) that arises
because Tj reads x from Ti and Tk writes x (i, j, k distinct). If xk ≪ xi, then we
have the edge Tk → Ti in MVSG(H, ≪) and ts(Tk) < ts(Ti). Otherwise, we
have the edge Tj → Tk, so we must show ts(Tj) < ts(Tk). If Tj is an updater,
then Tj released rlj[x] before Tk obtained wlk[x], so by the timestamp assignment
method, ts(Tj) < ts(Tk). If Tj is a query, then it is assigned a timestamp
smaller than all active or future updaters. So again ts(Tj) < ts(Tk). Thus, all
edges in MVSG(H, ≪) are in timestamp order, and MVSG(H, ≪) is acyclic.
By Theorem 5.4, H is 1SR.
To avoid running out of space, the scheduler must have a way of deleting
“old” versions. Any committed version may be eliminated as soon as the scheduler
can be assured that no query will need to read that copy in the future. For
this reason, the scheduler maintains a non-decreasing value min, which is the
minimum timestamp that can be assigned to a query. Whenever the scheduler
wants to release some space used by versions, it sets min to be the smallest
timestamp assigned to any active query. It can then discard a committed
version xi if ts(Ti) < min and there is another committed version xj such that
ts(Ti) < ts(Tj) < min.
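The discard rule can be sketched over the writer timestamps of an item's committed versions. This assumes the reading that the newer committed version xj must itself be older than min (ts(Ti) < ts(Tj) < min), so that every query with a timestamp of at least min reads xj or something newer instead of xi; `discardable` is a hypothetical helper name.

```python
def discardable(committed_wts, min_ts):
    """Committed versions (by writer timestamp) that no query can still read.

    Keeps the newest committed version older than min_ts, because a query
    with timestamp >= min_ts may still need it; every older one can go.
    """
    older = sorted(wts for wts in committed_wts if wts < min_ts)
    return older[:-1]


print(discardable({1, 4, 7, 12}, 9))  # [1, 4]: x7 and x12 must be kept
```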
The main benefit of this method is that queries and updaters never delay
each other. A query can always read the data it wants without delay. Although
updaters may delay each other, queries set no locks and therefore never delay
updaters. This is in sharp contrast to 2PL, where a query may set many locks
and thereby delay many updaters. This delay is also inherent in multiversion
2PL and 2V2PL, since an updater Ti cannot commit until there are no read
locks held by other transactions on Ti’s writeset.
The main disadvantages of the method are that queries may read out-of-
date data and that the tagging and interpretation of timestamps on versions
may add significant scheduling overhead. Both problems can be mitigated by
using the methods described next.
ing, the TM makes a copy of the commit list and associates it with the query. It
attaches the commit list to every Read that it sends to the scheduler, essentially
treating the list like a timestamp. When the scheduler receives ri[x] for a query
Ti, it finds the most recently committed version of x whose tag is in Ti’s copy of
the commit list. To do this efficiently, all versions of a data item are kept in a
linked list, from newest to oldest. That is, whenever a new version is created, it
is added to the top of the version list. Since updaters use Strict 2PL, two transactions
may not concurrently create new versions of the same data item. Thus,
the order of a data item’s versions (and hence the version list) is well defined.
Given this organization for versions, to process ri[x] for a query Ti, the
scheduler scans the version list of x until it finds a version written by a transaction
that appears in the commit list associated with Ti. This is just like reading
the most recently committed version of x whose timestamp is less than ts(Ti) (if
Ti had a timestamp). This technique is used in DBS products by Prime
Computer, and in the Adaplex DBS by Computer Corporation of America.
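The version-list scan for a query can be sketched in a few lines. The representation is an assumption for illustration: the version list is a Python list of (tag, value) pairs, newest first, and the query's copy of the commit list is a set of transaction identifiers; `read_for_query` is a hypothetical name.

```python
def read_for_query(version_list, commit_list):
    """Return the newest version whose tag is in the query's commit list.

    version_list holds (tag, value) pairs, newest first; commit_list is
    the set of transaction identifiers the query was given when it began.
    """
    for tag, value in version_list:
        if tag in commit_list:
            return value
    raise LookupError("no committed version is visible to this query")


versions_of_x = [(7, "x7"), (5, "x5"), (2, "x2")]   # hypothetical version list
print(read_for_query(versions_of_x, {1, 2, 3, 5}))  # x5: T7 committed too late
```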
The problem with this scheme is the size and structure of commit lists.
First, each list must be small. In a centralized system, every query will have a
copy of the list consuming main memory. In a distributed system, every Read
sent to a DM will have a copy of the list, which consumes communication
bandwidth. Second, since the scheduler must search the list on every Read
from a query, the list should be structured to make it easy to determine whether
a given transaction identifier is in the list.
A good way to accomplish these goals is to store the commit list as a bit
map. That is, the commit list is an array, CL, where CL[i] = 1 if Ti is committed;
otherwise CL[i] = 0. Using the bit map, the scheduler can easily tell
whether a version’s tag is in the list. It simply looks up the appropriate position
in the array. However, as time goes on, the list grows without limit. So we need
a way to keep the list small.
We can shorten the list by observing that old transaction identifiers eventually
become useless. A transaction identifier is only needed as long as there is a
version whose tag is that identifier. Suppose we know that all versions whose
tags are less than n (where n is a transaction identifier) have either been
committed or discarded before all active queries began. Then when the scheduler
reads a version whose tag is less than n, it may assume that the version’s tag is in the
commit list. Only transactions whose identifiers are greater than or equal to n
need to be kept in the list.
The commit list can be kept short as follows. When the list has exceeded a
certain size, the scheduler asks the TM for a transaction identifier, n, that is
smaller than that which has been assigned to any active query or updater, or
will be assigned to any future query or updater. The scheduler can then discard
the prefix of the commit list through transaction identifier n, thereby shortening
the list. To process ri[x] of some query Ti, the scheduler returns the first
version in the version list of x written by a transaction whose identifier is either
in, or smaller than any identifier in, the commit list given to Ti when it started.
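A bit-map commit list with a moving lower bound n can be sketched as follows. This is a hypothetical representation: a Python int serves as an unbounded bit array, bit (t - base) is set iff transaction t ≥ base committed, and every identifier below base is treated as committed, following the argument above. Names are illustrative.

```python
class CommitList:
    """Hypothetical bit-map commit list that forgets identifiers below n."""

    def __init__(self):
        self.base = 0  # the current n
        self.bits = 0  # bit (t - base) is 1 iff transaction t committed

    def mark_committed(self, tid):
        if tid >= self.base:
            self.bits |= 1 << (tid - self.base)

    def __contains__(self, tid):
        if tid < self.base:
            return True  # identifiers below n count as committed
        return bool((self.bits >> (tid - self.base)) & 1)

    def truncate(self, n):
        """Drop the prefix below n (n comes from the TM and never decreases)."""
        if n > self.base:
            self.bits >>= n - self.base
            self.base = n


def read_for_query(version_list, commit_list):
    for tag, value in version_list:  # newest first
        if tag in commit_list:
            return value
    raise LookupError("no visible version")


cl = CommitList()
for t in (1, 2, 4):
    cl.mark_committed(t)  # T3 is assumed to have aborted
cl.truncate(3)
print(1 in cl, 3 in cl, 4 in cl)                              # True False True
print(read_for_query([(5, "x5"), (4, "x4"), (1, "x1")], cl))  # x4
```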
5.5 A MULTIVERSION MIXED METHOD
We are assuming here, as always, that when a transaction aborts, all versions it
has produced are removed from the version lists.
When the scheduler receives n from the TM for the purpose of reducing
the size of the commit list, it can also garbage collect versions. In particular, it
can discard a committed version of x, provided there is a more recent commit-
ted version of x whose identifier is less than n.
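The garbage collection rule can be sketched as one pass over the version list. Assumptions for illustration: the version list holds (tag, value) pairs, newest first, and `committed` is a caller-supplied predicate on transaction identifiers; `prune_versions` is a hypothetical name.

```python
def prune_versions(version_list, n, committed):
    """Drop committed versions that have a more recent committed version with id < n."""
    kept = []
    covered = False  # a more recent committed version with tag < n was seen
    for tag, value in version_list:
        if covered and committed(tag):
            continue
        kept.append((tag, value))
        if tag < n and committed(tag):
            covered = True
    return kept


versions_of_x = [(9, "x9"), (6, "x6"), (3, "x3"), (1, "x1")]
print(prune_versions(versions_of_x, 7, lambda t: t != 9))
# [(9, 'x9'), (6, 'x6')]
```

Here x6 is the newest committed version whose identifier is below n = 7, so every query sees x6 or something newer, and x3 and x1 can go; the uncommitted x9 is untouched.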
Now, T, reads a database state that includes T4’s Write on y but not T,’s
Write on x, while T, reads a database state that includes T,’s Write on x but not
T4’s Write on y. Thus, from T,’s viewpoint, transactions executed in the order
T, T, T,, but from TL’s viewpoint, transactions executed in the order T, TL T,,.
There is no serial 1V history including all four transactions that is equivalent
to this execution. Yet, the execution consisting only of updaters is 1SR, and in
a sense, each query reads consistent data. We leave the proof of these properties
as an exercise (see Exercise 5.22).
BIBLIOGRAPHIC NOTES
The serializability theoretic model of multiversion concurrency control is from
[Bernstein, Goodman 83]. Other theoretical aspects are explored in [Hadzilacos,
Papadimitriou 85], [Ibaraki, Kameda 83], [Lausen 83], and [Papadimitriou, Kanellakis
84]. The two version 2PL algorithm in Section 5.4 is similar to that of [Stearns,
Rosenkrantz 81], which uses timestamp-based deadlock prevention. A similar method
that uses SGT certification for rw synchronization is described in [Bayer et al. 80] and
[Bayer, Heller, Reiser 80]. A multiversion tree locking algorithm appears in [Silberschatz
82]. Multiversion TO was introduced in [Reed 78], [Reed 79], and [Reed 83].
Multiversion mixed methods like those in Section 5.5 are described in [Bernstein,
Goodman 81], [Chan et al. 82], [Chan, Gray 85], [Dubourdieu 82], and [Weihl 85].
[Dubourdieu 82] describes a method used in a product of Prime Computer. [Lai,
Wilkinson 84] describes a multiversion 2PL certifier, where queries are never delayed,
and each updater Ti is certified by checking its readset and writeset against the writeset
of all transactions that committed after Ti starts.
EXERCISES
5.1* Consider the following history:
required only conditions (1) - (4), but not recoverability. Prove that in that
case, C(H) would not necessarily be a complete MV history. (Incidentally,
this is the reason for making recoverability part of the definition of MV
histories, whereas in 1V serializability theory we treated recoverability as
a property that some histories have and others do not.)
5.3* Prove Proposition 5.2.
5.4* Prove that if H is a 1SR MV history, then so is any prefix of H.
5.5 Suppose no transaction ever reads a data item that it previously wrote.
Then we can redefine MV history, such that it need not preserve reflexive
reads-from relationships (since they cannot exist). Using this revised definition,
prove Theorem 5.3, making as many simplifications as possible.
5.6 MVTO can reject transactions whose Writes arrive too late. Design a
conservative MVTO scheduler that never rejects Reads or Writes. Prove it
correct. To show why your conservative MVTO is not worse than single
version conservative TO, characterize those situations in which the latter
will delay operations while the former will not. Are there situations where
the opposite is true?
5.7 In MVTO, suppose that we store timestamp intervals in the data
items themselves rather than in a separate table. For example, suppose the
granularity of data items is a fixed size page and that each page has a
header containing timestamp interval information. How does this organi-
zation affect the efficiency with which the MVTO scheduler processes
operations? How does it affect the way the scheduler garbage collects old
versions?
5.8 Since MVTO doesn’t use locks, we need to add a mechanism for
preventing transactions from reading uncommitted data and thereby
avoiding cascading aborts. Propose such a mechanism. How much
concurrency do you lose through this mechanism? Compare the amount of
concurrency you get with the one you proposed for Exercise 5.6.
5.9 Show that there does or does not exist a sequence of Reads and Writes
in which
a. Basic TO rejects an operation and MVTO does not;
b. Basic TO delays an operation and MVTO does not;
c. MVTO rejects an operation and Basic TO does not; and
d. MVTO delays an operation and Basic TO does not.
That is, for each situation, either give an example sequence with the
desired property, or prove that such a sequence does not exist.
5.10 Modify MVTO so that it correctly handles transactions that write into
a data item more than once.
5.11 Describe the precise conditions under which MVTO can safely discard
a version without affecting any future transaction.