Software Transactional Memory Introductory Paper
May 2, 2007
1 Introduction
Software transactional memory (STM) is a shared-memory concurrency model, originally inspired
by cache coherency protocols in hardware design, but having the flavour of version control systems
such as CVS. The main idea is that a process initiates a transaction which obtains a private copy
of the data to be modified, does local computation, and when finished attempts to commit the
results back to shared memory. The commit will succeed only if validation checks ascertain that
the transaction has seen a consistent view of memory; otherwise it must retry. The transaction
appears to execute atomically at some point in time within its execution, or in other words STM is
linearizable. Although lock-based code tends to run more efficiently, the STM approach has appeal
in that locks need not be used, so that sequential code can in most cases be safely converted to
concurrent code simply by wrapping it in modular, composable transactions.
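For a first taste of what this looks like to the programmer, here is a minimal sketch using the GHC STM interface covered in Section 4 (the function name incrementBoth is ours, purely illustrative):

import Control.Concurrent.STM

-- The whole do-block runs as one transaction: it appears to execute atomically,
-- and the runtime re-executes it if a conflicting transaction committed first.
incrementBoth :: TVar Int -> TVar Int -> IO ()
incrementBoth a b = atomically $ do
  x <- readTVar a
  y <- readTVar b
  writeTVar a (x + 1)
  writeTVar b (y + 1)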
P1: P2:
acquire(lock1); acquire(lock2);
acquire(lock2); acquire(lock1);
... ...
The example illustrates two locks being acquired in a different order by two different processes,
resulting in a deadlock - the progress condition can no longer be satisfied and the two processes
will wait on each other forever.
One solution to the above problem is to always ensure that a process acquires all of its locks in
the same order as all other processes. The only way to do this is by implementing lock acquisition
order in the program logic itself - this either places an undue burden on the programmer or, much
more commonly, cannot be done at all, because the particular locks a process needs to acquire
cannot be anticipated in advance when the shared objects manipulated by each process must be
chosen at runtime.
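As a minimal sketch of this discipline (Haskell with MVars standing in for the locks; all names are illustrative), both threads below take lock1 strictly before lock2, so the cyclic wait from the example above cannot arise:

import Control.Concurrent
import Control.Concurrent.MVar

main :: IO ()
main = do
  lock1 <- newMVar ()
  lock2 <- newMVar ()
  let withBothLocks body = do
        takeMVar lock1            -- every thread acquires lock1 first ...
        takeMVar lock2            -- ... and lock2 second
        r <- body
        putMVar lock2 ()          -- release in the opposite order
        putMVar lock1 ()
        return r
  done <- newEmptyMVar
  _ <- forkIO (withBothLocks (putStrLn "P1 critical section") >> putMVar done ())
  withBothLocks (putStrLn "P2 critical section")
  takeMVar done                   -- wait for the forked thread to finish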
Another solution is to use elaborate deadlock-detection algorithms to detect and break dead-
lock. Although conceptually an interesting approach that solves the problem, currently known
deadlock detection algorithms impose very high runtime overhead and so are rarely used in prac-
tice.
The choice for ensuring progress (deadlock-freedom) for lock-based programs comes down to the
inflexible (acquire all locks in the same order) or the impractical (deadlock detection algorithms).
It is then no wonder that currently, concurrent programming is considered a very difficult subject
by the majority of practicing programmers.
Composability of lock-based code breaks down because of the same need to know the details of
the locking protocol of all objects that need to be acted upon. Say we developed a banking system
where each account is represented by an object with synchronized withdraw and deposit methods.
Since they are synchronized, withdraws and deposits happen atomically. Now we wish to implement
transfers between accounts, which we would also like to be atomic. Obviously, transfer(A,B,x)
{ withdraw(A, x); deposit(B, x); } is not going to work, since another thread can observe (or
modify) the accounts between the two calls. Now the locking protocol of the accounts will have
to be exposed, so perhaps something like the following might work:
transfer(A,B,x) {
synchronized(A) {
synchronized(B) {
withdraw(A,x);
deposit(B,x);
}
}
}
However, what if there is a global TotalBankBalance object that holds the sum of the
bank’s account balances and is updated by the withdraw and deposit methods? If the programmer
wants transfers to be atomic, she would have to know about it and lock it as well. Any time the set
of objects needing to be synchronized on in a particular method changes, all code that composes
the method in an atomic way will have to update the set of locks it tries to acquire to reflect the
change.
3 Focus on DSTM
The dynamic software transactional memory (DSTM) of Herlihy et al. is a system with both
practical and theoretical appeal. Although a specific implementation with examples in Java is
given in [3], we focus here on the model and proof of correctness, with some discussion of progress
guarantees.
Figure 1: Progress of two transactions, TA and TB, with contention over object O. (a) TB prepares a new
locator, eventually for O. (b) TA commits successfully, so that d′ is the current value of O; TB prepares the
old and new fields of its locator. (c) TB CASes O's locator pointer to its locator. (d) d′ is still currently
referenced, but the rest of the old locator and what it points to is garbage. Consider also that between (a)
and (b) TB probably attempted to CAS the status of TA to ABORTED, but failed because TA committed in
the interim.
transaction A, B has to go through the same process as when it opens in WRITE mode, except that
when B finally obtains a current version of the data (by aborting A, or by A committing before
it could be aborted), it doesn’t create a new locator. Rather, it leaves A’s old locator intact, but
it makes an entry for the object in B’s thread-local read-only table. Every transaction’s entire
read-only table is validated on every open operation, and at the beginning of the commit phase.
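For concreteness, the locator structure that Figure 1 depicts can be sketched as follows (a model in Haskell rather than the Java of [3]; the type and field names are ours):

import Data.IORef

data Status = Active | Committed | Aborted deriving (Eq, Show)

newtype Transaction = Transaction (IORef Status)

-- A locator names the transaction that installed it and holds the old and new
-- versions of the object's data.
data Locator a = Locator
  { owner  :: Transaction
  , oldVal :: a
  , newVal :: a
  }

-- A TM object is just a mutable pointer to its current locator; opening the
-- object in WRITE mode CASes this pointer to a freshly prepared locator.
newtype TMObject a = TMObject (IORef (Locator a))

-- The current (logically committed) value is the locator's new version if its
-- owner has committed, and the old version otherwise.
currentValue :: TMObject a -> IO a
currentValue (TMObject ref) = do
  loc <- readIORef ref
  let Transaction st = owner loc
  s <- readIORef st
  return (if s == Committed then newVal loc else oldVal loc)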
3.2 Linearizability
There are two basic approaches to proving linearizability: either to prove that any possible execution
of the model admits a linearization, or to prove that there exists a fixed choice of linearization
point for any operation such that any execution behaves as if all operations executed atomically
at those points. It is the latter approach which is used here, as well as in other STM linearizability
proofs with the possible exception of [9]. The DSTM is linearized at the start of the final read
check (beginning of the commit phase). No correctness proof is given in [3], and we attempt to
remedy the deficiency here.
We wish to prove that the DSTM implementation described in the last section preserves strict
consistency – every read returns the value of the last write occurring in any successfully committed
transaction. Let’s establish some proof notation and formalize the problem.
T.Xi The ith operation of transaction T .1
S.X → T.Y Operation X in transaction S happens before Y in T .
T.B Transaction T begins.
T.E T ends (either by being aborted or by committing successfully).2
T.C T performs a successful commit CAS, swapping its own
status from ACTIVE to COMMITTED.
T.OW (a) T opens TM object a in WRITE mode.
T.W (a, v) T writes value v to (local copy of) object a.3
T.OR (a) T opens TM object a in READ mode.
T.R(a, v) T reads value v for object a from T ’s read-only table.
T.K(a) T performs read-check on object a.2
T.K T performs a read-check on every object in its read-only table.
1 X and Y stand for generic operations. The rest of the operation symbols are specific (and mnemonic).
2 We assume that a transaction which fails any read check self-aborts, although this is not made explicit in [3].
3 We occasionally write T.W (a) or T.R(a) when we don’t care about the value.
Formally, what we need to prove is that when the following scheme holds prior to linearization,
Figure 2: Counterexamples showing why the DSTM linearization point cannot be before or after the start of
the commit-phase read-only table validation. (a) Illustrating why DSTM cannot be linearized at an arbitrary
point prior to the read-only table validation at the start of the commit phase. (b) Why the DSTM linearization
point cannot be later than the start of the read-only table validation; in particular, DSTM cannot be linearized
at the commit CAS, contrary to the claim in [6].
To begin, consider why the DSTM linearization point (if one exists) could not be chosen at the
final CAS of the commit. Figure 2a depicts a history which fails to linearize at an arbitrary point
prior to the read-only table validation (commit read-check). The complementary Figure 2b shows
why linearization can fail if the linearization point is chosen beyond the start of the read-only table
validation.
For what follows it is useful to consider histories in which the same value is never written twice
to the same TM object. While this is not necessarily the case in an arbitrary DSTM history,
it suffices to limit our argument to this ‘worst case’, where validity is actually threatened. The
advantage is that, if linearization reorders our ops, we don't need to worry that the value read
might coincidentally still be valid due to some unforeseen earlier write of the same value.
Lemma. ¬(T ≠ U ∧ {T.W(a), U.W(a)} → {T.C, U.C}).

Proof. Suppose the contrary: T ≠ U, both transactions write a, and both writes happen before
both commits. Since both T and U open a in WRITE mode, either T.OW(a) → U.OW(a) or
U.OW(a) → T.OW(a), both of which are contradictions. Consider w.l.o.g. the first disjunct. When
U attempts to open the TM object a for writing, it will attempt to abort T. If the abort succeeds,
T never commits, contradicting T.C. If T manages to commit before being aborted, the above
sequence does not reflect the actual order of ops, since T.C → U.OW(a) in that case.

Figure 3: The only remaining possibility for a write and read to the same TM object: T performs W(a, v)
and commits; S performs R(a, v) and commits, with the read in S following the commit in T. The extent of
overlap between T and S could vary, but in any case the read in S must follow the commit in T. This
linearizes correctly.
Theorem. DSTM is linearizable.
Proof. The possible history has been reduced to a situation like that shown in Figure 3. This
linearizes correctly – if all operations were to occur at the linearization points shown, this would
not affect the apparent ordering of write and read to a.
4 GHC STM
The GHC STM system, originally described in [2], is an implementation of software transactional
memory for the Glasgow Haskell Compiler. STM operates on shared mutable places called TVars
(short for Transactional Variables), which can hold values of any type (since Haskell is statically
typed, once a TVar is declared to be of a particular type, it can only hold values of that type).
data TVar a
Given an argument of type a, newTVar returns an STM action that returns a new transactional
variable which initially has the given argument as data.
Given a transactional variable as an argument, readTVar returns an STM action that yields
the contents of that transactional variable. The contents can be bound to a variable with the <-
operator.
If we’ve created a TVar of type a, call it A, then foo <- readTVar A takes the type STM a
(yielded by readTVar) to a and binds it to the variable foo.3
Given a transactional variable and a new value for it, writeTVar creates an STM action that
writes the new value to that transactional variable.
atomically takes an STM action (transaction) and returns an IO action that can be composed
(sequenced) with other IO actions to build up a Haskell program.
The following code listing provides an example of how the above can be put together, as well
as illustrating the use of higher-order programming with monads:
import Control.Concurrent.STM
import Control.Concurrent
makeAccount x = atomically (do { tvar <- newTVar x; return (Balance tvar) })
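Returning to the bank-account example from the discussion of composability, a possible continuation of this listing is sketched below (it assumes Balance wraps a TVar Int; the withdraw, deposit, and transfer names are ours, not from [2]):

import Control.Concurrent.STM

data Account = Balance (TVar Int)   -- assumed definition, matching makeAccount above

withdraw :: Account -> Int -> STM ()
withdraw (Balance tvar) x = do
  bal <- readTVar tvar
  writeTVar tvar (bal - x)

deposit :: Account -> Int -> STM ()
deposit acc x = withdraw acc (negate x)

-- Because withdraw and deposit are STM actions rather than IO actions, they can
-- be sequenced inside a single atomically block; the composed transfer is atomic
-- without exposing any locking protocol of the accounts involved.
transfer :: Account -> Account -> Int -> IO ()
transfer from to x = atomically (do { withdraw from x; deposit to x })

Any further composition, say wrapping a transfer together with an update of a global total-balance TVar, is again just a larger atomically block, with no change to the accounts themselves.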
In addition to the usual transactional operations, GHC STM provides two operators that when
taken together form a powerful new synchronization mechanism for transactional code:
retry :: STM a
retry, when called, causes the transaction to retry from the beginning. However, instead
of running right away, the transaction will block on all the transactional variables that were
previously accessed by that transaction until at least one of them is modified (since TVars are
the only stateful objects inside an STM monad, non-determinism/conditional execution inside
transactions is entirely dependent upon them - if all the TVars have the same values, re-running
the transaction will lead down the same execution path that led to the retry).
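As a concrete sketch (limitedWithdraw is our name, not one from [2]), the following transaction blocks until the account holds enough funds:

import Control.Concurrent.STM

limitedWithdraw :: TVar Int -> Int -> STM ()
limitedWithdraw bal amount = do
  b <- readTVar bal
  if b < amount
    then retry                       -- block until another transaction writes bal
    else writeTVar bal (b - amount)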
4.2 GHC STM semantics
Because Haskell’s type system restricts side-effects to IO actions, the GHC Haskell STM system
can be described with an operational semantics that largely corresponds to the Haskell code. In
[2], Harris et al. provide such semantics. They will not be reproduced here; however, the authors
feel that they warrant some comment.
The semantics are provided in terms of state transition rules corresponding to STM and IO
actions. It is important to note that the semantics describe the results of the system interface
only - most implementation details are left out. In particular, while IO transitions from several
threads may be interleaved, an STM action is taken all the way to its return or throw in a single
step, rather than one transition at a time as with IO transitions. So an STM action that has
been wrapped in 'atomically' yields a single atomic transition - in effect, linearizability is already
assumed.
An important property of the semantics is that they illuminate tricky design decisions with
orElse and exceptions - if the first action throws an exception, should it be discarded and the
second action attempted (which at first glance may seem like a reasonable thing to do), or should
the exception propagate? If the former happens, then what if the second action throws an
exception? This can’t be ignored (otherwise orElse would trap all exceptions!), so any exceptions
thrown by action2 would be propagated, but not those of action1 - besides being inconsistent,
this would also break the relation that retry is a unit of orElse.
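To make the composition concrete, here is a small sketch (takeFrom and takeFirstAvailable are illustrative names of ours): the first alternative is attempted; if it calls retry, its effects are discarded and the second alternative runs, whereas an exception raised in the first alternative propagates out of the whole composition.

import Control.Concurrent.STM

takeFirstAvailable :: TVar (Maybe Int) -> TVar (Maybe Int) -> STM Int
takeFirstAvailable a b = takeFrom a `orElse` takeFrom b
  where
    -- Empty the slot if it holds a value; otherwise retry, which inside an
    -- orElse simply causes the other alternative to be attempted.
    takeFrom slot = do
      contents <- readTVar slot
      case contents of
        Nothing -> retry
        Just x  -> do writeTVar slot Nothing
                      return x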
Another critically important aspect of exception handling inside transactions, one which to the
authors' knowledge has not been considered elsewhere, is revealed in [2]. Software transactional memory
systems with lazy acquire semantics (see Section 6) open the possibility that an exception can
be thrown because an inconsistent state has been observed (this is not possible in eager acquire
systems because validation occurs each time an object is opened for subsequent reading and/or
writing). From a transactional point of view, the correct way to handle this situation is to catch the
exception, validate the transaction, and if the validation fails, discard the exception and retry the
transaction, and if it succeeds, re-throw the exception.4 This is the approach that the GHC STM
system employs. Precisely because side-effects are limited to transactional variables in the STM
monad, this particular choice is guaranteed to be correct. However, in systems with unrestricted
side-effects, the possibility arises that the above mechanism will discard valid exceptions carrying
meaningful state. The authors feel that this issue can lead to potentially difficult-to-diagnose bugs,
and the choice of exception handling semantics for lazy-acquire software transactional memory
systems should be carefully considered and specified by the implementers of those systems, as this
affects the behavior of programs written by their users.
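The rule can be spelled out schematically as follows (our own model of the mechanism just described, not GHC's actual runtime code; the transaction body and the read-set validation are passed in as stand-ins for the real machinery):

import Control.Exception (SomeException, throwIO, try)

data Outcome a = Committed a | MustRetry

runWithExceptionRule :: IO a        -- run the body on thread-local copies
                     -> IO Bool     -- validate the read set afterwards
                     -> IO (Outcome a)
runWithExceptionRule body validate = do
  result <- try body
  case result of
    Right value -> do ok <- validate
                      return (if ok then Committed value else MustRetry)
    Left exn    -> do ok <- validate
                      -- A failed validation means the exception may stem from an
                      -- inconsistent view: discard it and retry the transaction.
                      if ok then throwIO (exn :: SomeException) else return MustRetry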
4 Languages with less primitive condition handling facilities, such as Common Lisp, enable handling of these
cases without needing to catch and re-throw. Since the exception system is implemented monadically in Haskell, it
does not suffer from this weakness either.

The biggest hole left from the viewpoint of someone attempting to prove correctness properties
about the GHC STM system is the assumption made by the semantics that the STM transitions
are already linearized. Of course, in order to prove this piece of the puzzle, we will need to
dive down into the STM implementation code at its operating system interface. For portability,
interfacing and performance reasons, much of the implementation code is written in C (as is
much of the GHC multithreading system), which makes a formal proof of all but the most trivial
properties of the source code nearly impossible. However, we can assume that the implementation
code is correct, and prove properties about the high-level description of the design of the
implementation, as was done earlier in the paper for DSTM.
value of the most recent write which was previously successfully committed, and that this value
is unchanged at the commit of their own transaction. No ops occur outside of a transaction,
and no transaction nesting within a thread is permitted.
A conflict function C : H × D × D → {true, false} is defined to be a boolean function over the
product of histories and descriptor pairs. Two transactions with descriptors s and t in a history H conflict precisely
when C(H, s, t) holds true. The conflict functions are required to preserve three properties:
1. C(H, s, t) = C(H, t, s),
2. if s = t or s and t refer to nonoverlapping transactions, then C(H, s, t) is false, and
3. if H[s,t) = I[s,t) then C(H, s, t) = C(I, s, t), where H[s,t) denotes the subhistory of H consisting
only of ops of s and t, and not including any ops beyond the end of s or t.
The first property implies that it is the pair (s, t) that conflicts in H, and that neither s nor t is
the cause of it. (Scott introduces arbitration functions to break this symmetry, for contention
management, but we cannot elaborate on that here.) The second property upholds our sequential
specification in that isolated transactions must commit successfully. The third property means
that other transactions don’t influence the conflict valuation between s and t.
Figure 4c illustrates the nested structure of STM models which is induced by some natural
alternative conflict definitions. The regions can be thought of as subspaces of H × D × D – specifically,
inverse images of true, C⁻¹(true). Thus, the enclosing conflict function is less permissive
than the enclosed, so that ‘overlap conflict’ is the most stringent, and ‘lazy invalidation’ the most
permissive.
Lazy invalidation defines there to be a conflict if in two overlapping transactions s and t, s writes
to some object that t reads, and s commits successfully before t finishes. Lazy invalidation conflict
is the most permissive consistency-preserving definition of conflict, by definition, since if a conflict
was not generated by this situation, the value read in t would be invalid for the duration of t after
the commit of s. It is instructive to examine how the conflict functions differ; as an example,
consider now ‘eager W-R’ invalidation (Figure 4a). Unlike lazy invalidation, eager W-R requires
that the write in s happens before the read in t, but it does not require that either s or t commit
for a conflict to arise. One says that the write in s ‘threatens’ the read in t, because there exists
a possible future history in which the read is invalidated by the write, namely a history in which
s commits first.
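To make the contrast concrete, here is a toy sketch (an encoding of our own over per-transaction event times, not Scott's formal notation) of the two one-directional checks; the full conflict functions are the symmetric closures of these, as property 1 requires:

import Data.Maybe (fromMaybe, isJust)

-- Each transaction is summarized by the logical times of the events relevant
-- to a single object; Nothing means the event never happened.
data TxnSummary = TxnSummary
  { tBegin  :: Int
  , tEnd    :: Int
  , tWrite  :: Maybe Int   -- time of its write to the object, if any
  , tRead   :: Maybe Int   -- time of its read of the object, if any
  , tCommit :: Maybe Int   -- time of its successful commit, if any
  }

overlaps :: TxnSummary -> TxnSummary -> Bool
overlaps s t = tBegin s < tEnd t && tBegin t < tEnd s

-- Lazy invalidation: s writes the object, t reads it, and s commits before t ends.
lazyWR :: TxnSummary -> TxnSummary -> Bool
lazyWR s t = overlaps s t
          && isJust (tWrite s) && isJust (tRead t)
          && fromMaybe False ((< tEnd t) <$> tCommit s)

-- Eager W-R: the write in s merely happens before the read in t; neither
-- transaction needs to commit for the conflict to arise.
eagerWR :: TxnSummary -> TxnSummary -> Bool
eagerWR s t = overlaps s t
           && fromMaybe False ((<) <$> tWrite s <*> tRead t)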
Let’s leave Scott by proving one of his main results, which requires one more definition. The
authors regret that this introduces two more undefined terms, but if all the terminology were to be
defined, this section would be as long as Scott's entire paper! The terms in question will be defined
informally in the course of the following proof. The result being proved is that, for any conflict
function C, C-based TM is a sequential specification, every history of which is linearizable.
Proof. C-based TM denotes the set of all consistent, C-respecting histories. A sequential specifi-
cation is a prefix-closed set of sequential histories. A C-respecting history is one in which both
1. it is never the case that both of a conflicting pair of transactions commit, and
2. any transaction which has no conflicts succeeds.

Figure 4: A classification of STM systems by definition of conflict; diagrams reproduced from [8]. (a) Some
nontrivial conflict definitions. (b) Demonstrating nondegeneracy in the definitions.
The properties of a conflict function imply that, if a history is C-respecting, it is linearizable,
by Theorem 1 of [8]. It only remains to prove that C-based TM is prefix-closed. Supposing the
contrary, let P be a prefix of a C-respecting history H, such that P is not C-respecting. We take
the two cases of the definition of C-respecting in order.
In the first case, there must exist two C-conflicting transactions S and T in P which both
commit successfully in P . But since P is a subhistory of H and commits are irrevocable, these two
commits also succeed in H. Also, the conflicting transactions must still conflict in H by property
3 of the definition of conflict function, since P[S,T ) = H[S,T ) . This is a contradiction, since H could
not be C-respecting under these conditions.
In the second case, there must exist a transaction T which fails in P despite having no conflict in
P. Since aborts are irrevocable, T also fails in H, and since H is C-respecting, T must conflict with
some transaction S in H. Since T failed in P, it must have ended in P, which means again that
P[S,T) = H[S,T). That would imply that C(H, S, T) = C(P, S, T), a contradiction.
In a nutshell, OSTM is the winner in read-dominated tasks (due to the lack of indirection), but is
the slower of the two in write-dominated tasks (due to the extra overhead of validating writers at
every open() op, whereas DSTM only requires that readers be validated).
Like DSTM, ASTM (both eager and lazy variants) maintains a read-list which is validated on
each open() op, but lazy ASTM goes further and never attempts to acquire any object until commit
time, instead maintaining an additional write-list analogous to the read-list. This does not make
the ASTM lock-free, however, as it still relies on a contention manager to resolve conflicts. The OSTM
achieves lock-free progress by maintaining a sorted order of the objects, and using recursive helping
to allow transactions to expedite the task completion of contending transactions, a technique which
has been used in STM systems since Shavit and Touitou [9]. ASTM also incorporates the early
release feature found in DSTM, allowing a transaction to erase read-table entries for objects it no
longer needs, a practice which can decrease the validation complexity from
quadratic to linear. This is especially useful for structures which need to be traversed from a
common ingress, such as trees and linked lists. However, it has the disadvantage of breaking
linearizability! The system is no longer safe; it becomes incumbent on the programmer to exercise
correct judgement in deciding when it is safe to release an object. The OSTM makes validation
checks the responsibility of the programmer a priori, and therefore suffers in this respect also.
Under the assumption that all necessary validation checks are in place, both OSTM and DSTM
linearize at the beginning of the commit phase, before the read-check validation. (The linearization
point is incorrectly reported in [6] to be the final CAS which effects the commit.)
In terms of Marathe et al.'s STM design space, GHC STM is a lazy-acquiring, per-transaction-metadata,
indirect-object-referencing, and lock-free design (a transaction is only aborted if another conflicting
transaction was the first to commit).
// this works
P1: P2:
synchronize(o1) { synchronize(o2) {
while (!flagA) {} flagA := true;
flagB := true; while (!flagB) {}
} }
// this does not work: each spin loop runs inside a transaction, so neither
// thread can observe the other's write before that transaction commits
atomic { atomic {
while (!flagA) {} flagA := true;
flagB := true; while (!flagB) {}
} }
It is also important to note that the composability of transactions poses its own pitfalls. In
particular, composition of transactions does not preserve the progress property of otherwise correct
transactional code. In the following example (also borrowed from [7]), P1's two transactions make
progress when run one after the other, but sequentially composing them inside a single enclosing
transaction leads to deadlock:
// this works
int A := 0;
int B := 0;
P1: P2:
atomic { atomic {
A := 1; if (A != 1) then retry
} else B := 1;
}
atomic {
if (B != 1) then retry
else...
}
// this deadlocks
int A := 0;
int B := 0;
P1: P2:
atomic { atomic {
atomic { if (A != 1) then retry
A := 1; else B := 1;
} }
atomic {
if (B != 1) then retry
else...
}
}
References
[1] K. Fraser and T. Harris. Concurrent programming without locks, 2007.
[3] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for
dynamic-sized data structures, 2003.
[5] V. J. Marathe, W. N. Scherer III, and M. L. Scott. Design tradeoffs in modern software
transactional memory systems. In LCR ’04: Proceedings of the 7th workshop on Workshop on
languages, compilers, and run-time support for scalable systems, pages 1–7, New York, NY,
USA, 2004. ACM Press.
[6] V. J. Marathe, W. N. Scherer III, and M. L. Scott. Adaptive software transactional memory. In
Proceedings of the 19th International Symposium on Distributed Computing, Cracow, Poland,
Sep 2005. Earlier but expanded version available as TR 868, University of Rochester Computer
Science Dept., May 2005.
[7] M. Martin, C. Blundell, and E. Lewis. Subtleties of transactional memory atomicity semantics.
IEEE Computer Architecture Letters, 5(2), 2006.
[8] M. L. Scott. Sequential specification of transactional memory semantics. In ACM SIGPLAN
Workshop on Transactional Computing. Jun 2006. Held in conjunction with PLDI 2006.